mbox series

[net-next,v4,0/3] net: A lightweight zero-copy notification

Message ID 20240528212103.350767-1-zijianzhang@bytedance.com (mailing list archive)
Headers show
Series net: A lightweight zero-copy notification | expand

Message

Zijian Zhang May 28, 2024, 9:21 p.m. UTC
From: Zijian Zhang <zijianzhang@bytedance.com>

Original title is "net: socket sendmsg MSG_ZEROCOPY_UARG". 

Original notification mechanism needs poll + recvmmsg which is not
easy for applcations to accommodate. And, it also incurs unignorable
overhead including extra system calls and usage of socket optmem.

While making maximum reuse of the existing MSG_ZEROCOPY related code,
this patch set introduces a new zerocopy socket notification mechanism.
Users of sendmsg pass a control message as a placeholder for the incoming
notifications. Upon returning, kernel embeds notifications directly into
user arguments passed in. By doing so, we can significantly reduce the
complexity and overhead for managing notifications. In an ideal pattern,
the user will keep calling sendmsg with SCM_ZC_NOTIFICATION msg_control,
and the notification will be delivered as soon as possible.

Users need to pass in a user space address pointing to an array of struct
zc_info_elem, and the cmsg_len should be the memory size of the array
instead of the size of the pointer itself.

As Willem commented,

> The main design issue with this series is this indirection, rather
> than passing the array of notifications as cmsg.

> This trick circumvents having to deal with compat issues and having to
> figure out copy_to_user in ____sys_sendmsg (as msg_control is an
> in-kernel copy).

> This is quite hacky, from an API design PoV.

> As is passing a pointer, but expecting msg_controllen to hold the
> length not of the pointer, but of the pointed to user buffer.

> I had also hoped for more significant savings. Especially with the
> higher syscall overhead due to meltdown and spectre mitigations vs
> when MSG_ZEROCOPY was introduced and I last tried this optimization.

Changelog:
  v1 -> v2:
    - Reuse errormsg queue in the new notification mechanism,
      users can actually use these two mechanisms in hybrid way
      if they want to do so.
    - Update case SCM_ZC_NOTIFICATION in __sock_cmsg_send
      1. Regardless of 32-bit, 64-bit program, we will always handle
      u64 type user address.
      2. The size of data to copy_to_user is precisely calculated
      in case of kernel stack leak.
    - fix (kbuild-bot)
      1. Add SCM_ZC_NOTIFICATION to arch-specific header files.
      2. header file types.h in include/uapi/linux/socket.h

  v2 -> v3:
    - 1. Users can now pass in the address of the zc_info_elem directly
      with appropriate cmsg_len instead of the ugly user interface. Plus,
      the handler is now compatible with MSG_CMSG_COMPAT and 32-bit
      pointer.
    - 2. Suggested by Willem, another strategy of getting zc info is
      briefly taking the lock of sk_error_queue and move to a private
      list, like net_rx_action. I thought sk_error_queue is protected by
      sock_lock, so that it's impossible for the handling of zc info and
      users recvmsg from the sk_error_queue at the same time.
      However, sk_error_queue is protected by its own lock. I am afraid
      that during the time it is handling the private list, users may
      fail to get other error messages in the queue via recvmsg. Thus,
      I don't implement the splice logic in this version. Any comments?

  v3 -> v4:
    - 1. Change SOCK_ZC_INFO_MAX to 64 to avoid large stack frame size.
    - 2. Fix minor typos.
    - 3. Change cfg_zerocopy from int to enum in msg_zerocopy.c

* Performance

I extend the selftests/msg_zerocopy.c to accommodate the new mechanism,
test result is as follows,

cfg_notification_limit = 1, in this case the original method approximately
aligns with the semantics of new one. In this case, the new flag has
around 13% cpu savings in TCP and 18% cpu savings in UDP.

+---------------------+---------+---------+---------+---------+
| Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
+---------------------+---------+---------+---------+---------+
| ZCopy (MB)          | 5147    | 4885    | 7489    | 7854    |
+---------------------+---------+---------+---------+---------+
| New ZCopy (MB)      | 5859    | 5505    | 9053    | 9236    |
+---------------------+---------+---------+---------+---------+
| New ZCopy / ZCopy   | 113.83% | 112.69% | 120.88% | 117.59% |
+---------------------+---------+---------+---------+---------+


cfg_notification_limit = 32, the new mechanism performs 8% better in TCP.
For UDP, no obvious performance gain is observed and sometimes may lead
to degradation. Thus, if users don't need to retrieve the notification
ASAP in UDP, the original mechanism is preferred.

+---------------------+---------+---------+---------+---------+
| Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
+---------------------+---------+---------+---------+---------+
| ZCopy (MB)          | 6272    | 6138    | 12138   | 10055   |
+---------------------+---------+---------+---------+---------+
| New ZCopy (MB)      | 6774    | 6620    | 11504   | 10355   |
+---------------------+---------+---------+---------+---------+
| New ZCopy / ZCopy   | 108.00% | 107.85% | 94.78%  | 102.98% |
+---------------------+---------+---------+---------+---------+

Zijian Zhang (3):
  selftests: fix OOM problem in msg_zerocopy selftest
  sock: add MSG_ZEROCOPY notification mechanism based on msg_control
  selftests: add MSG_ZEROCOPY msg_control notification test

 arch/alpha/include/uapi/asm/socket.h        |   2 +
 arch/mips/include/uapi/asm/socket.h         |   2 +
 arch/parisc/include/uapi/asm/socket.h       |   2 +
 arch/sparc/include/uapi/asm/socket.h        |   2 +
 include/uapi/asm-generic/socket.h           |   2 +
 include/uapi/linux/socket.h                 |  10 ++
 net/core/sock.c                             |  68 ++++++++++++
 tools/testing/selftests/net/msg_zerocopy.c  | 114 ++++++++++++++++++--
 tools/testing/selftests/net/msg_zerocopy.sh |   1 +
 9 files changed, 196 insertions(+), 7 deletions(-)

Comments

Willem de Bruijn May 29, 2024, 1:59 p.m. UTC | #1
zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> Original title is "net: socket sendmsg MSG_ZEROCOPY_UARG". 
> 
> Original notification mechanism needs poll + recvmmsg which is not
> easy for applcations to accommodate. And, it also incurs unignorable
> overhead including extra system calls and usage of socket optmem.
> 
> While making maximum reuse of the existing MSG_ZEROCOPY related code,
> this patch set introduces a new zerocopy socket notification mechanism.
> Users of sendmsg pass a control message as a placeholder for the incoming
> notifications. Upon returning, kernel embeds notifications directly into
> user arguments passed in. By doing so, we can significantly reduce the
> complexity and overhead for managing notifications. In an ideal pattern,
> the user will keep calling sendmsg with SCM_ZC_NOTIFICATION msg_control,
> and the notification will be delivered as soon as possible.
> 
> Users need to pass in a user space address pointing to an array of struct
> zc_info_elem, and the cmsg_len should be the memory size of the array
> instead of the size of the pointer itself.
> 
> As Willem commented,
> 
> > The main design issue with this series is this indirection, rather
> > than passing the array of notifications as cmsg.
> 
> > This trick circumvents having to deal with compat issues and having to
> > figure out copy_to_user in ____sys_sendmsg (as msg_control is an
> > in-kernel copy).
> 
> > This is quite hacky, from an API design PoV.
> 
> > As is passing a pointer, but expecting msg_controllen to hold the
> > length not of the pointer, but of the pointed to user buffer.
> 
> > I had also hoped for more significant savings. Especially with the
> > higher syscall overhead due to meltdown and spectre mitigations vs
> > when MSG_ZEROCOPY was introduced and I last tried this optimization.

Thanks for quoting this.

This revision does not address either of these concerns, right?

> 
> Changelog:
>   v1 -> v2:
>     - Reuse errormsg queue in the new notification mechanism,
>       users can actually use these two mechanisms in hybrid way
>       if they want to do so.
>     - Update case SCM_ZC_NOTIFICATION in __sock_cmsg_send
>       1. Regardless of 32-bit, 64-bit program, we will always handle
>       u64 type user address.
>       2. The size of data to copy_to_user is precisely calculated
>       in case of kernel stack leak.
>     - fix (kbuild-bot)
>       1. Add SCM_ZC_NOTIFICATION to arch-specific header files.
>       2. header file types.h in include/uapi/linux/socket.h
> 
>   v2 -> v3:
>     - 1. Users can now pass in the address of the zc_info_elem directly
>       with appropriate cmsg_len instead of the ugly user interface. Plus,
>       the handler is now compatible with MSG_CMSG_COMPAT and 32-bit
>       pointer.
>     - 2. Suggested by Willem, another strategy of getting zc info is
>       briefly taking the lock of sk_error_queue and move to a private
>       list, like net_rx_action. I thought sk_error_queue is protected by
>       sock_lock, so that it's impossible for the handling of zc info and
>       users recvmsg from the sk_error_queue at the same time.
>       However, sk_error_queue is protected by its own lock. I am afraid
>       that during the time it is handling the private list, users may
>       fail to get other error messages in the queue via recvmsg. Thus,
>       I don't implement the splice logic in this version. Any comments?
> 
>   v3 -> v4:
>     - 1. Change SOCK_ZC_INFO_MAX to 64 to avoid large stack frame size.
>     - 2. Fix minor typos.
>     - 3. Change cfg_zerocopy from int to enum in msg_zerocopy.c
> 
> * Performance
> 
> I extend the selftests/msg_zerocopy.c to accommodate the new mechanism,
> test result is as follows,
> 
> cfg_notification_limit = 1, in this case the original method approximately
> aligns with the semantics of new one. In this case, the new flag has
> around 13% cpu savings in TCP and 18% cpu savings in UDP.
> 
> +---------------------+---------+---------+---------+---------+
> | Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
> +---------------------+---------+---------+---------+---------+
> | ZCopy (MB)          | 5147    | 4885    | 7489    | 7854    |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy (MB)      | 5859    | 5505    | 9053    | 9236    |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy / ZCopy   | 113.83% | 112.69% | 120.88% | 117.59% |
> +---------------------+---------+---------+---------+---------+
> 
> 
> cfg_notification_limit = 32, the new mechanism performs 8% better in TCP.
> For UDP, no obvious performance gain is observed and sometimes may lead
> to degradation. Thus, if users don't need to retrieve the notification
> ASAP in UDP, the original mechanism is preferred.
> 
> +---------------------+---------+---------+---------+---------+
> | Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
> +---------------------+---------+---------+---------+---------+
> | ZCopy (MB)          | 6272    | 6138    | 12138   | 10055   |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy (MB)      | 6774    | 6620    | 11504   | 10355   |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy / ZCopy   | 108.00% | 107.85% | 94.78%  | 102.98% |
> +---------------------+---------+---------+---------+---------+
> 
> Zijian Zhang (3):
>   selftests: fix OOM problem in msg_zerocopy selftest
>   sock: add MSG_ZEROCOPY notification mechanism based on msg_control
>   selftests: add MSG_ZEROCOPY msg_control notification test
> 
>  arch/alpha/include/uapi/asm/socket.h        |   2 +
>  arch/mips/include/uapi/asm/socket.h         |   2 +
>  arch/parisc/include/uapi/asm/socket.h       |   2 +
>  arch/sparc/include/uapi/asm/socket.h        |   2 +
>  include/uapi/asm-generic/socket.h           |   2 +
>  include/uapi/linux/socket.h                 |  10 ++
>  net/core/sock.c                             |  68 ++++++++++++
>  tools/testing/selftests/net/msg_zerocopy.c  | 114 ++++++++++++++++++--
>  tools/testing/selftests/net/msg_zerocopy.sh |   1 +
>  9 files changed, 196 insertions(+), 7 deletions(-)
> 
> -- 
> 2.20.1
>
Zijian Zhang May 29, 2024, 5:34 p.m. UTC | #2
On 5/29/24 6:59 AM, Willem de Bruijn wrote:
> zijianzhang@ wrote:
>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>
>> Original title is "net: socket sendmsg MSG_ZEROCOPY_UARG".
>>
>> Original notification mechanism needs poll + recvmmsg which is not
>> easy for applcations to accommodate. And, it also incurs unignorable
>> overhead including extra system calls and usage of socket optmem.
>>
>> While making maximum reuse of the existing MSG_ZEROCOPY related code,
>> this patch set introduces a new zerocopy socket notification mechanism.
>> Users of sendmsg pass a control message as a placeholder for the incoming
>> notifications. Upon returning, kernel embeds notifications directly into
>> user arguments passed in. By doing so, we can significantly reduce the
>> complexity and overhead for managing notifications. In an ideal pattern,
>> the user will keep calling sendmsg with SCM_ZC_NOTIFICATION msg_control,
>> and the notification will be delivered as soon as possible.
>>
>> Users need to pass in a user space address pointing to an array of struct
>> zc_info_elem, and the cmsg_len should be the memory size of the array
>> instead of the size of the pointer itself.
>>
>> As Willem commented,
>>
>>> The main design issue with this series is this indirection, rather
>>> than passing the array of notifications as cmsg.
>>
>>> This trick circumvents having to deal with compat issues and having to
>>> figure out copy_to_user in ____sys_sendmsg (as msg_control is an
>>> in-kernel copy).
>>
>>> This is quite hacky, from an API design PoV.
>>
>>> As is passing a pointer, but expecting msg_controllen to hold the
>>> length not of the pointer, but of the pointed to user buffer.
>>
>>> I had also hoped for more significant savings. Especially with the
>>> higher syscall overhead due to meltdown and spectre mitigations vs
>>> when MSG_ZEROCOPY was introduced and I last tried this optimization.
> 
> Thanks for quoting this.
> 
> This revision does not address either of these concerns, right?
> 

Right, this revision just fixed some code according to your comments.

>>
>> Changelog:
>>    v1 -> v2:
>>      - Reuse errormsg queue in the new notification mechanism,
>>        users can actually use these two mechanisms in hybrid way
>>        if they want to do so.
>>      - Update case SCM_ZC_NOTIFICATION in __sock_cmsg_send
>>        1. Regardless of 32-bit, 64-bit program, we will always handle
>>        u64 type user address.
>>        2. The size of data to copy_to_user is precisely calculated
>>        in case of kernel stack leak.
>>      - fix (kbuild-bot)
>>        1. Add SCM_ZC_NOTIFICATION to arch-specific header files.
>>        2. header file types.h in include/uapi/linux/socket.h
>>
>>    v2 -> v3:
>>      - 1. Users can now pass in the address of the zc_info_elem directly
>>        with appropriate cmsg_len instead of the ugly user interface. Plus,
>>        the handler is now compatible with MSG_CMSG_COMPAT and 32-bit
>>        pointer.
>>      - 2. Suggested by Willem, another strategy of getting zc info is
>>        briefly taking the lock of sk_error_queue and move to a private
>>        list, like net_rx_action. I thought sk_error_queue is protected by
>>        sock_lock, so that it's impossible for the handling of zc info and
>>        users recvmsg from the sk_error_queue at the same time.
>>        However, sk_error_queue is protected by its own lock. I am afraid
>>        that during the time it is handling the private list, users may
>>        fail to get other error messages in the queue via recvmsg. Thus,
>>        I don't implement the splice logic in this version. Any comments?
>>
>>    v3 -> v4:
>>      - 1. Change SOCK_ZC_INFO_MAX to 64 to avoid large stack frame size.
>>      - 2. Fix minor typos.
>>      - 3. Change cfg_zerocopy from int to enum in msg_zerocopy.c
>>
>> * Performance
>>
>> I extend the selftests/msg_zerocopy.c to accommodate the new mechanism,
>> test result is as follows,
>>
>> cfg_notification_limit = 1, in this case the original method approximately
>> aligns with the semantics of new one. In this case, the new flag has
>> around 13% cpu savings in TCP and 18% cpu savings in UDP.
>>
>> +---------------------+---------+---------+---------+---------+
>> | Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
>> +---------------------+---------+---------+---------+---------+
>> | ZCopy (MB)          | 5147    | 4885    | 7489    | 7854    |
>> +---------------------+---------+---------+---------+---------+
>> | New ZCopy (MB)      | 5859    | 5505    | 9053    | 9236    |
>> +---------------------+---------+---------+---------+---------+
>> | New ZCopy / ZCopy   | 113.83% | 112.69% | 120.88% | 117.59% |
>> +---------------------+---------+---------+---------+---------+
>>
>>
>> cfg_notification_limit = 32, the new mechanism performs 8% better in TCP.
>> For UDP, no obvious performance gain is observed and sometimes may lead
>> to degradation. Thus, if users don't need to retrieve the notification
>> ASAP in UDP, the original mechanism is preferred.
>>
>> +---------------------+---------+---------+---------+---------+
>> | Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
>> +---------------------+---------+---------+---------+---------+
>> | ZCopy (MB)          | 6272    | 6138    | 12138   | 10055   |
>> +---------------------+---------+---------+---------+---------+
>> | New ZCopy (MB)      | 6774    | 6620    | 11504   | 10355   |
>> +---------------------+---------+---------+---------+---------+
>> | New ZCopy / ZCopy   | 108.00% | 107.85% | 94.78%  | 102.98% |
>> +---------------------+---------+---------+---------+---------+
>>
>> Zijian Zhang (3):
>>    selftests: fix OOM problem in msg_zerocopy selftest
>>    sock: add MSG_ZEROCOPY notification mechanism based on msg_control
>>    selftests: add MSG_ZEROCOPY msg_control notification test
>>
>>   arch/alpha/include/uapi/asm/socket.h        |   2 +
>>   arch/mips/include/uapi/asm/socket.h         |   2 +
>>   arch/parisc/include/uapi/asm/socket.h       |   2 +
>>   arch/sparc/include/uapi/asm/socket.h        |   2 +
>>   include/uapi/asm-generic/socket.h           |   2 +
>>   include/uapi/linux/socket.h                 |  10 ++
>>   net/core/sock.c                             |  68 ++++++++++++
>>   tools/testing/selftests/net/msg_zerocopy.c  | 114 ++++++++++++++++++--
>>   tools/testing/selftests/net/msg_zerocopy.sh |   1 +
>>   9 files changed, 196 insertions(+), 7 deletions(-)
>>
>> -- 
>> 2.20.1
>>
> 
>