diff mbox series

[net,1/2] selftests: fix OOM in msg_zerocopy selftest

Message ID 20240701225349.3395580-2-zijianzhang@bytedance.com (mailing list archive)
State Accepted
Commit af2b7e5b741aaae9ffbba2c660def434e07aa241
Delegated to: Netdev Maintainers
Headers show
Series fix OOM and order check in msg_zerocopy selftest | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 5 maintainers not CCed: kuba@kernel.org pabeni@redhat.com edumazet@google.com linux-kselftest@vger.kernel.org shuah@kernel.org
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-03--12-00 (tests: 666)

Commit Message

Zijian Zhang July 1, 2024, 10:53 p.m. UTC
From: Zijian Zhang <zijianzhang@bytedance.com>

In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.

Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.

Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
---
 tools/testing/selftests/net/msg_zerocopy.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Willem de Bruijn July 2, 2024, 1:17 p.m. UTC | #1
zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
> on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
> until the socket is not writable. Typically, it will start the receiving
> process after around 30+ sendmsgs. However, as the introduction of commit
> dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
> always writable and does not get any chance to run recv notifications.
> The selftest always exits with OUT_OF_MEMORY because the memory used by
> opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
> different value to trigger OOM on older kernels too.
> 
> Thus, we introduce "cfg_notification_limit" to force sender to receive
> notifications after some number of sendmsgs.
> 
> Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>
Jakub Kicinski July 4, 2024, 1:50 a.m. UTC | #2
On Mon,  1 Jul 2024 22:53:48 +0000 zijianzhang@bytedance.com wrote:
> In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
> on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
> until the socket is not writable. Typically, it will start the receiving
> process after around 30+ sendmsgs. However, as the introduction of commit
> dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
> always writable and does not get any chance to run recv notifications.
> The selftest always exits with OUT_OF_MEMORY because the memory used by
> opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
> different value to trigger OOM on older kernels too.

This test doesn't fail in netdev CI. Is the problem fix in net-next
somehow? Or the "always exits with OUT_OF_MEMORY" is an exaggerations?
(TBH I'm not even sure what it means to "exit with OUT_OF_MEMORY" in
this context.)

TAP version 13
1..1
# timeout set to 3600
# selftests: net: msg_zerocopy.sh
# ipv4 tcp -t 1
# tx=164425 (10260 MB) txc=0 zc=n
# rx=59526 (10260 MB)
# ipv4 tcp -z -t 1
# tx=111332 (6947 MB) txc=111332 zc=n
# rx=55245 (6947 MB)
# ok
# ipv6 tcp -t 1
# tx=168140 (10492 MB) txc=0 zc=n
# rx=64633 (10492 MB)
# ipv6 tcp -z -t 1
# tx=108388 (6763 MB) txc=108388 zc=n
# rx=54146 (6763 MB)
# ok
# ipv4 udp -t 1
# tx=173970 (10856 MB) txc=0 zc=n
# rx=173936 (10854 MB)
# ipv4 udp -z -t 1
# tx=117728 (7346 MB) txc=117728 zc=n
# rx=117703 (7345 MB)
# ok
# ipv6 udp -t 1
# tx=225502 (14072 MB) txc=0 zc=n
# rx=225502 (14072 MB)
# ipv6 udp -z -t 1
# tx=111215 (6940 MB) txc=111215 zc=n
# rx=111212 (6940 MB)
# ok
# OK. All tests passed
ok 1 selftests: net: msg_zerocopy.sh
Zijian Zhang July 4, 2024, 2:32 a.m. UTC | #3
On 7/3/24 6:50 PM, Jakub Kicinski wrote:
> On Mon,  1 Jul 2024 22:53:48 +0000 zijianzhang@bytedance.com wrote:
>> In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
>> on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
>> until the socket is not writable. Typically, it will start the receiving
>> process after around 30+ sendmsgs. However, as the introduction of commit
>> dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
>> always writable and does not get any chance to run recv notifications.
>> The selftest always exits with OUT_OF_MEMORY because the memory used by
>> opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
>> different value to trigger OOM on older kernels too.
> 
> This test doesn't fail in netdev CI. Is the problem fix in net-next
> somehow? Or the "always exits with OUT_OF_MEMORY" is an exaggerations?
> (TBH I'm not even sure what it means to "exit with OUT_OF_MEMORY" in
> this context.)
>
The reason why this test doesn't fail in CI:

According to the test output,
# ipv4 tcp -z -t 1
# tx=111332 (6947 MB) txc=111332 zc=n
zerocopy is false here.

This is because of some limitation of zerocopy in localhost.
Specifically, the subsection "Notification Latency" in the sendmsg
zerocopy the paper.

In order to make "zc=y", we may need to update skb_orphan_frags_rx to
the same as skb_orphan_frags, recompile the kernel, and run the test.

By OUT_OF_MEMORY I mean:

Each calling of sendmsg with zerocopy will allocate an skb with
sock_omalloc. If users never recv the notifications but keep calling
sendmsg with zerocopy. The send system call will finally return with
-ENOMEM.

I hope this clarifies your confusion :)

> TAP version 13
> 1..1
> # timeout set to 3600
> # selftests: net: msg_zerocopy.sh
> # ipv4 tcp -t 1
> # tx=164425 (10260 MB) txc=0 zc=n
> # rx=59526 (10260 MB)
> # ipv4 tcp -z -t 1
> # tx=111332 (6947 MB) txc=111332 zc=n
> # rx=55245 (6947 MB)
> # ok
> # ipv6 tcp -t 1
> # tx=168140 (10492 MB) txc=0 zc=n
> # rx=64633 (10492 MB)
> # ipv6 tcp -z -t 1
> # tx=108388 (6763 MB) txc=108388 zc=n
> # rx=54146 (6763 MB)
> # ok
> # ipv4 udp -t 1
> # tx=173970 (10856 MB) txc=0 zc=n
> # rx=173936 (10854 MB)
> # ipv4 udp -z -t 1
> # tx=117728 (7346 MB) txc=117728 zc=n
> # rx=117703 (7345 MB)
> # ok
> # ipv6 udp -t 1
> # tx=225502 (14072 MB) txc=0 zc=n
> # rx=225502 (14072 MB)
> # ipv6 udp -z -t 1
> # tx=111215 (6940 MB) txc=111215 zc=n
> # rx=111212 (6940 MB)
> # ok
> # OK. All tests passed
> ok 1 selftests: net: msg_zerocopy.sh
Jakub Kicinski July 4, 2024, 2:42 a.m. UTC | #4
On Wed, 3 Jul 2024 19:32:33 -0700 Zijian Zhang wrote:
> > This test doesn't fail in netdev CI. Is the problem fix in net-next
> > somehow? Or the "always exits with OUT_OF_MEMORY" is an exaggerations?
> > (TBH I'm not even sure what it means to "exit with OUT_OF_MEMORY" in
> > this context.)
> >  
> The reason why this test doesn't fail in CI:
> 
> According to the test output,
> # ipv4 tcp -z -t 1
> # tx=111332 (6947 MB) txc=111332 zc=n
> zerocopy is false here.
> 
> This is because of some limitation of zerocopy in localhost.
> Specifically, the subsection "Notification Latency" in the sendmsg
> zerocopy the paper.
> 
> In order to make "zc=y", we may need to update skb_orphan_frags_rx to
> the same as skb_orphan_frags, recompile the kernel, and run the test.
> 
> By OUT_OF_MEMORY I mean:
> 
> Each calling of sendmsg with zerocopy will allocate an skb with
> sock_omalloc. If users never recv the notifications but keep calling
> sendmsg with zerocopy. The send system call will finally return with
> -ENOMEM.
> 
> I hope this clarifies your confusion :)

It does, thanks!
diff mbox series

Patch

diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
index bdc03a2097e8..926556febc83 100644
--- a/tools/testing/selftests/net/msg_zerocopy.c
+++ b/tools/testing/selftests/net/msg_zerocopy.c
@@ -85,6 +85,7 @@  static bool cfg_rx;
 static int  cfg_runtime_ms	= 4200;
 static int  cfg_verbose;
 static int  cfg_waittime_ms	= 500;
+static int  cfg_notification_limit = 32;
 static bool cfg_zerocopy;
 
 static socklen_t cfg_alen;
@@ -95,6 +96,7 @@  static char payload[IP_MAXPACKET];
 static long packets, bytes, completions, expected_completions;
 static int  zerocopied = -1;
 static uint32_t next_completion;
+static uint32_t sends_since_notify;
 
 static unsigned long gettimeofday_ms(void)
 {
@@ -208,6 +210,7 @@  static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain)
 		error(1, errno, "send");
 	if (cfg_verbose && ret != len)
 		fprintf(stderr, "send: ret=%u != %u\n", ret, len);
+	sends_since_notify++;
 
 	if (len) {
 		packets++;
@@ -460,6 +463,7 @@  static bool do_recv_completion(int fd, int domain)
 static void do_recv_completions(int fd, int domain)
 {
 	while (do_recv_completion(fd, domain)) {}
+	sends_since_notify = 0;
 }
 
 /* Wait for all remaining completions on the errqueue */
@@ -549,6 +553,9 @@  static void do_tx(int domain, int type, int protocol)
 		else
 			do_sendmsg(fd, &msg, cfg_zerocopy, domain);
 
+		if (cfg_zerocopy && sends_since_notify >= cfg_notification_limit)
+			do_recv_completions(fd, domain);
+
 		while (!do_poll(fd, POLLOUT)) {
 			if (cfg_zerocopy)
 				do_recv_completions(fd, domain);
@@ -708,7 +715,7 @@  static void parse_opts(int argc, char **argv)
 
 	cfg_payload_len = max_payload_len;
 
-	while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vz")) != -1) {
+	while ((c = getopt(argc, argv, "46c:C:D:i:l:mp:rs:S:t:vz")) != -1) {
 		switch (c) {
 		case '4':
 			if (cfg_family != PF_UNSPEC)
@@ -736,6 +743,9 @@  static void parse_opts(int argc, char **argv)
 			if (cfg_ifindex == 0)
 				error(1, errno, "invalid iface: %s", optarg);
 			break;
+		case 'l':
+			cfg_notification_limit = strtoul(optarg, NULL, 0);
+			break;
 		case 'm':
 			cfg_cork_mixed = true;
 			break;