diff mbox series

[net-next,v2,1/2] udp: Allow GSO transmit from devices with no checksum offload

Message ID 20240626-linux-udpgso-v2-1-422dfcbd6b48@cloudflare.com (mailing list archive)
State Accepted
Commit 10154dbded6d6a2fecaebdfda206609de0f121a9
Delegated to: Netdev Maintainers
Headers show
Series Lift UDP_SEGMENT restriction for egress via device w/o csum offload | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 842 this patch: 842
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 849 this patch: 849
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 860 this patch: 860
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 32 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-06-28--06-00 (tests: 666)

Commit Message

Jakub Sitnicki June 26, 2024, 5:51 p.m. UTC
Today sending a UDP GSO packet from a TUN device results in an EIO error:

  import fcntl, os, struct
  from socket import *

  TUNSETIFF = 0x400454CA
  IFF_TUN = 0x0001
  IFF_NO_PI = 0x1000
  UDP_SEGMENT = 103

  tun_fd = os.open("/dev/net/tun", os.O_RDWR)
  ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI)
  fcntl.ioctl(tun_fd, TUNSETIFF, ifr)

  os.system("ip addr add 192.0.2.1/24 dev tun0")
  os.system("ip link set dev tun0 up")

  s = socket(AF_INET, SOCK_DGRAM)
  s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200)
  s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO

This is due to a check in the udp stack if the egress device offers
checksum offload. While TUN/TAP devices, by default, don't advertise this
capability because it requires support from the TUN/TAP reader.

However, the GSO stack has a software fallback for checksum calculation,
which we can use. This way we don't force UDP_SEGMENT users to handle the
EIO error and implement a segmentation fallback.

Lift the restriction so that UDP_SEGMENT can be used with any egress
device. We also need to adjust the UDP GSO code to match the GSO stack
expectation about ip_summed field, as set in commit 8d63bee643f1 ("net:
avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
the bad offload check.

Users should, however, expect a potential performance impact when
batch-sending packets with UDP_SEGMENT without checksum offload on the
egress device. In such case the packet payload is read twice: first during
the sendmsg syscall when copying data from user memory, and then in the GSO
stack for checksum computation. This double memory read can be less
efficient than a regular sendmsg where the checksum is calculated during
the initial data copy from user memory.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/udp.c         | 3 +--
 net/ipv4/udp_offload.c | 8 ++++++++
 net/ipv6/udp.c         | 3 +--
 3 files changed, 10 insertions(+), 4 deletions(-)

Comments

Willem de Bruijn June 28, 2024, 9:14 a.m. UTC | #1
Jakub Sitnicki wrote:
> Today sending a UDP GSO packet from a TUN device results in an EIO error:
> 
>   import fcntl, os, struct
>   from socket import *
> 
>   TUNSETIFF = 0x400454CA
>   IFF_TUN = 0x0001
>   IFF_NO_PI = 0x1000
>   UDP_SEGMENT = 103
> 
>   tun_fd = os.open("/dev/net/tun", os.O_RDWR)
>   ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI)
>   fcntl.ioctl(tun_fd, TUNSETIFF, ifr)
> 
>   os.system("ip addr add 192.0.2.1/24 dev tun0")
>   os.system("ip link set dev tun0 up")
> 
>   s = socket(AF_INET, SOCK_DGRAM)
>   s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200)
>   s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO
> 
> This is due to a check in the udp stack if the egress device offers
> checksum offload. While TUN/TAP devices, by default, don't advertise this
> capability because it requires support from the TUN/TAP reader.
> 
> However, the GSO stack has a software fallback for checksum calculation,
> which we can use. This way we don't force UDP_SEGMENT users to handle the
> EIO error and implement a segmentation fallback.
> 
> Lift the restriction so that UDP_SEGMENT can be used with any egress
> device. We also need to adjust the UDP GSO code to match the GSO stack
> expectation about ip_summed field, as set in commit 8d63bee643f1 ("net:
> avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
> the bad offload check.
> 
> Users should, however, expect a potential performance impact when
> batch-sending packets with UDP_SEGMENT without checksum offload on the
> egress device. In such case the packet payload is read twice: first during
> the sendmsg syscall when copying data from user memory, and then in the GSO
> stack for checksum computation. This double memory read can be less
> efficient than a regular sendmsg where the checksum is calculated during
> the initial data copy from user memory.
> 
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>
diff mbox series

Patch

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d08bf16d476d..ed97df6af14d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -938,8 +938,7 @@  static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
 			kfree_skb(skb);
 			return -EINVAL;
 		}
-		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
-		    dst_xfrm(skb_dst(skb))) {
+		if (is_udplite || dst_xfrm(skb_dst(skb))) {
 			kfree_skb(skb);
 			return -EIO;
 		}
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 59448a2dbf2c..aa2e0a28ca61 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -357,6 +357,14 @@  struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 	else
 		uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
 
+	/* On the TX path, CHECKSUM_NONE and CHECKSUM_UNNECESSARY have the same
+	 * meaning. However, check for bad offloads in the GSO stack expects the
+	 * latter, if the checksum was calculated in software. To vouch for the
+	 * segment skbs we actually need to set it on the gso_skb.
+	 */
+	if (gso_skb->ip_summed == CHECKSUM_NONE)
+		gso_skb->ip_summed = CHECKSUM_UNNECESSARY;
+
 	/* update refcount for the packet */
 	if (copy_dtor) {
 		int delta = sum_truesize - gso_skb->truesize;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index b56f0b9f4307..b5456394cc67 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1257,8 +1257,7 @@  static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
 			kfree_skb(skb);
 			return -EINVAL;
 		}
-		if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
-		    dst_xfrm(skb_dst(skb))) {
+		if (is_udplite || dst_xfrm(skb_dst(skb))) {
 			kfree_skb(skb);
 			return -EIO;
 		}