Message ID | 20240528212103.350767-1-zijianzhang@bytedance.com (mailing list archive) |
---|---|
Headers | show |
Series | net: A lightweight zero-copy notification | expand |
zijianzhang@ wrote: > From: Zijian Zhang <zijianzhang@bytedance.com> > > Original title is "net: socket sendmsg MSG_ZEROCOPY_UARG". > > Original notification mechanism needs poll + recvmmsg which is not > easy for applcations to accommodate. And, it also incurs unignorable > overhead including extra system calls and usage of socket optmem. > > While making maximum reuse of the existing MSG_ZEROCOPY related code, > this patch set introduces a new zerocopy socket notification mechanism. > Users of sendmsg pass a control message as a placeholder for the incoming > notifications. Upon returning, kernel embeds notifications directly into > user arguments passed in. By doing so, we can significantly reduce the > complexity and overhead for managing notifications. In an ideal pattern, > the user will keep calling sendmsg with SCM_ZC_NOTIFICATION msg_control, > and the notification will be delivered as soon as possible. > > Users need to pass in a user space address pointing to an array of struct > zc_info_elem, and the cmsg_len should be the memory size of the array > instead of the size of the pointer itself. > > As Willem commented, > > > The main design issue with this series is this indirection, rather > > than passing the array of notifications as cmsg. > > > This trick circumvents having to deal with compat issues and having to > > figure out copy_to_user in ____sys_sendmsg (as msg_control is an > > in-kernel copy). > > > This is quite hacky, from an API design PoV. > > > As is passing a pointer, but expecting msg_controllen to hold the > > length not of the pointer, but of the pointed to user buffer. > > > I had also hoped for more significant savings. Especially with the > > higher syscall overhead due to meltdown and spectre mitigations vs > > when MSG_ZEROCOPY was introduced and I last tried this optimization. Thanks for quoting this. This revision does not address either of these concerns, right? > > Changelog: > v1 -> v2: > - Reuse errormsg queue in the new notification mechanism, > users can actually use these two mechanisms in hybrid way > if they want to do so. > - Update case SCM_ZC_NOTIFICATION in __sock_cmsg_send > 1. Regardless of 32-bit, 64-bit program, we will always handle > u64 type user address. > 2. The size of data to copy_to_user is precisely calculated > in case of kernel stack leak. > - fix (kbuild-bot) > 1. Add SCM_ZC_NOTIFICATION to arch-specific header files. > 2. header file types.h in include/uapi/linux/socket.h > > v2 -> v3: > - 1. Users can now pass in the address of the zc_info_elem directly > with appropriate cmsg_len instead of the ugly user interface. Plus, > the handler is now compatible with MSG_CMSG_COMPAT and 32-bit > pointer. > - 2. Suggested by Willem, another strategy of getting zc info is > briefly taking the lock of sk_error_queue and move to a private > list, like net_rx_action. I thought sk_error_queue is protected by > sock_lock, so that it's impossible for the handling of zc info and > users recvmsg from the sk_error_queue at the same time. > However, sk_error_queue is protected by its own lock. I am afraid > that during the time it is handling the private list, users may > fail to get other error messages in the queue via recvmsg. Thus, > I don't implement the splice logic in this version. Any comments? > > v3 -> v4: > - 1. Change SOCK_ZC_INFO_MAX to 64 to avoid large stack frame size. > - 2. Fix minor typos. > - 3. Change cfg_zerocopy from int to enum in msg_zerocopy.c > > * Performance > > I extend the selftests/msg_zerocopy.c to accommodate the new mechanism, > test result is as follows, > > cfg_notification_limit = 1, in this case the original method approximately > aligns with the semantics of new one. In this case, the new flag has > around 13% cpu savings in TCP and 18% cpu savings in UDP. > > +---------------------+---------+---------+---------+---------+ > | Test Type / Protocol| TCP v4 | TCP v6 | UDP v4 | UDP v6 | > +---------------------+---------+---------+---------+---------+ > | ZCopy (MB) | 5147 | 4885 | 7489 | 7854 | > +---------------------+---------+---------+---------+---------+ > | New ZCopy (MB) | 5859 | 5505 | 9053 | 9236 | > +---------------------+---------+---------+---------+---------+ > | New ZCopy / ZCopy | 113.83% | 112.69% | 120.88% | 117.59% | > +---------------------+---------+---------+---------+---------+ > > > cfg_notification_limit = 32, the new mechanism performs 8% better in TCP. > For UDP, no obvious performance gain is observed and sometimes may lead > to degradation. Thus, if users don't need to retrieve the notification > ASAP in UDP, the original mechanism is preferred. > > +---------------------+---------+---------+---------+---------+ > | Test Type / Protocol| TCP v4 | TCP v6 | UDP v4 | UDP v6 | > +---------------------+---------+---------+---------+---------+ > | ZCopy (MB) | 6272 | 6138 | 12138 | 10055 | > +---------------------+---------+---------+---------+---------+ > | New ZCopy (MB) | 6774 | 6620 | 11504 | 10355 | > +---------------------+---------+---------+---------+---------+ > | New ZCopy / ZCopy | 108.00% | 107.85% | 94.78% | 102.98% | > +---------------------+---------+---------+---------+---------+ > > Zijian Zhang (3): > selftests: fix OOM problem in msg_zerocopy selftest > sock: add MSG_ZEROCOPY notification mechanism based on msg_control > selftests: add MSG_ZEROCOPY msg_control notification test > > arch/alpha/include/uapi/asm/socket.h | 2 + > arch/mips/include/uapi/asm/socket.h | 2 + > arch/parisc/include/uapi/asm/socket.h | 2 + > arch/sparc/include/uapi/asm/socket.h | 2 + > include/uapi/asm-generic/socket.h | 2 + > include/uapi/linux/socket.h | 10 ++ > net/core/sock.c | 68 ++++++++++++ > tools/testing/selftests/net/msg_zerocopy.c | 114 ++++++++++++++++++-- > tools/testing/selftests/net/msg_zerocopy.sh | 1 + > 9 files changed, 196 insertions(+), 7 deletions(-) > > -- > 2.20.1 >
On 5/29/24 6:59 AM, Willem de Bruijn wrote: > zijianzhang@ wrote: >> From: Zijian Zhang <zijianzhang@bytedance.com> >> >> Original title is "net: socket sendmsg MSG_ZEROCOPY_UARG". >> >> Original notification mechanism needs poll + recvmmsg which is not >> easy for applcations to accommodate. And, it also incurs unignorable >> overhead including extra system calls and usage of socket optmem. >> >> While making maximum reuse of the existing MSG_ZEROCOPY related code, >> this patch set introduces a new zerocopy socket notification mechanism. >> Users of sendmsg pass a control message as a placeholder for the incoming >> notifications. Upon returning, kernel embeds notifications directly into >> user arguments passed in. By doing so, we can significantly reduce the >> complexity and overhead for managing notifications. In an ideal pattern, >> the user will keep calling sendmsg with SCM_ZC_NOTIFICATION msg_control, >> and the notification will be delivered as soon as possible. >> >> Users need to pass in a user space address pointing to an array of struct >> zc_info_elem, and the cmsg_len should be the memory size of the array >> instead of the size of the pointer itself. >> >> As Willem commented, >> >>> The main design issue with this series is this indirection, rather >>> than passing the array of notifications as cmsg. >> >>> This trick circumvents having to deal with compat issues and having to >>> figure out copy_to_user in ____sys_sendmsg (as msg_control is an >>> in-kernel copy). >> >>> This is quite hacky, from an API design PoV. >> >>> As is passing a pointer, but expecting msg_controllen to hold the >>> length not of the pointer, but of the pointed to user buffer. >> >>> I had also hoped for more significant savings. Especially with the >>> higher syscall overhead due to meltdown and spectre mitigations vs >>> when MSG_ZEROCOPY was introduced and I last tried this optimization. > > Thanks for quoting this. > > This revision does not address either of these concerns, right? > Right, this revision just fixed some code according to your comments. >> >> Changelog: >> v1 -> v2: >> - Reuse errormsg queue in the new notification mechanism, >> users can actually use these two mechanisms in hybrid way >> if they want to do so. >> - Update case SCM_ZC_NOTIFICATION in __sock_cmsg_send >> 1. Regardless of 32-bit, 64-bit program, we will always handle >> u64 type user address. >> 2. The size of data to copy_to_user is precisely calculated >> in case of kernel stack leak. >> - fix (kbuild-bot) >> 1. Add SCM_ZC_NOTIFICATION to arch-specific header files. >> 2. header file types.h in include/uapi/linux/socket.h >> >> v2 -> v3: >> - 1. Users can now pass in the address of the zc_info_elem directly >> with appropriate cmsg_len instead of the ugly user interface. Plus, >> the handler is now compatible with MSG_CMSG_COMPAT and 32-bit >> pointer. >> - 2. Suggested by Willem, another strategy of getting zc info is >> briefly taking the lock of sk_error_queue and move to a private >> list, like net_rx_action. I thought sk_error_queue is protected by >> sock_lock, so that it's impossible for the handling of zc info and >> users recvmsg from the sk_error_queue at the same time. >> However, sk_error_queue is protected by its own lock. I am afraid >> that during the time it is handling the private list, users may >> fail to get other error messages in the queue via recvmsg. Thus, >> I don't implement the splice logic in this version. Any comments? >> >> v3 -> v4: >> - 1. Change SOCK_ZC_INFO_MAX to 64 to avoid large stack frame size. >> - 2. Fix minor typos. >> - 3. Change cfg_zerocopy from int to enum in msg_zerocopy.c >> >> * Performance >> >> I extend the selftests/msg_zerocopy.c to accommodate the new mechanism, >> test result is as follows, >> >> cfg_notification_limit = 1, in this case the original method approximately >> aligns with the semantics of new one. In this case, the new flag has >> around 13% cpu savings in TCP and 18% cpu savings in UDP. >> >> +---------------------+---------+---------+---------+---------+ >> | Test Type / Protocol| TCP v4 | TCP v6 | UDP v4 | UDP v6 | >> +---------------------+---------+---------+---------+---------+ >> | ZCopy (MB) | 5147 | 4885 | 7489 | 7854 | >> +---------------------+---------+---------+---------+---------+ >> | New ZCopy (MB) | 5859 | 5505 | 9053 | 9236 | >> +---------------------+---------+---------+---------+---------+ >> | New ZCopy / ZCopy | 113.83% | 112.69% | 120.88% | 117.59% | >> +---------------------+---------+---------+---------+---------+ >> >> >> cfg_notification_limit = 32, the new mechanism performs 8% better in TCP. >> For UDP, no obvious performance gain is observed and sometimes may lead >> to degradation. Thus, if users don't need to retrieve the notification >> ASAP in UDP, the original mechanism is preferred. >> >> +---------------------+---------+---------+---------+---------+ >> | Test Type / Protocol| TCP v4 | TCP v6 | UDP v4 | UDP v6 | >> +---------------------+---------+---------+---------+---------+ >> | ZCopy (MB) | 6272 | 6138 | 12138 | 10055 | >> +---------------------+---------+---------+---------+---------+ >> | New ZCopy (MB) | 6774 | 6620 | 11504 | 10355 | >> +---------------------+---------+---------+---------+---------+ >> | New ZCopy / ZCopy | 108.00% | 107.85% | 94.78% | 102.98% | >> +---------------------+---------+---------+---------+---------+ >> >> Zijian Zhang (3): >> selftests: fix OOM problem in msg_zerocopy selftest >> sock: add MSG_ZEROCOPY notification mechanism based on msg_control >> selftests: add MSG_ZEROCOPY msg_control notification test >> >> arch/alpha/include/uapi/asm/socket.h | 2 + >> arch/mips/include/uapi/asm/socket.h | 2 + >> arch/parisc/include/uapi/asm/socket.h | 2 + >> arch/sparc/include/uapi/asm/socket.h | 2 + >> include/uapi/asm-generic/socket.h | 2 + >> include/uapi/linux/socket.h | 10 ++ >> net/core/sock.c | 68 ++++++++++++ >> tools/testing/selftests/net/msg_zerocopy.c | 114 ++++++++++++++++++-- >> tools/testing/selftests/net/msg_zerocopy.sh | 1 + >> 9 files changed, 196 insertions(+), 7 deletions(-) >> >> -- >> 2.20.1 >> > >