diff mbox series

[net,1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output

Message ID 20240725-udp-gso-egress-from-tunnel-v1-1-5e5530ead524@cloudflare.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Fix bad offload warning when sending UDP GSO from a tunnel device | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 273 this patch: 273
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 281 this patch: 281
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 292 this patch: 292
netdev/checkpatch warning WARNING: Possible repeated word: 'Google'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 3 this patch: 3
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-07-26--15-00 (tests: 707)

Commit Message

Jakub Sitnicki July 25, 2024, 9:55 a.m. UTC
In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
checksum offload") we have added a tweak in the UDP GSO code to mark GSO
packets being sent out as CHECKSUM_UNNECESSARY when the egress device
doesn't support checksum offload. This was done to satisfy the offload
checks in the gso stack.

However, when sending a UDP GSO packet from a tunnel device, we will go
through the TX path and the GSO offload twice. Once for the tunnel device,
which acts as a passthru for GSO packets, and once for the underlying
egress device.

Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
offload checks still happen on transmit from a tunnel device. So if the skb
is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
warning from the gso stack.

Today this can occur in two situations, which we check for in
__ip_append_data() and __ip6_append_data():

1) when the tunnel device does not advertise checksum offload, or
2) when there are IPv6 extension headers present.

Syzbot has triggered the second case, producing a warning as below:

  ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
  WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
  Modules linked in:
  CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
  RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
  [...]
  Call Trace:
   <TASK>
   __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
   skb_gso_segment include/net/gso.h:83 [inline]
   validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
   __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
   neigh_output include/net/neighbour.h:542 [inline]
   ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
   ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
   ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
   udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
   udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0xef/0x270 net/socket.c:745
   ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
   ___sys_sendmsg net/socket.c:2639 [inline]
   __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
   __do_sys_sendmmsg net/socket.c:2754 [inline]
   __se_sys_sendmmsg net/socket.c:2751 [inline]
   __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   [...]
   </TASK>

To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
path, when still in the udp layer, since we need to have ip_summed set up
correctly for GSO processing by tunnel devices.

Note that even if GSO packet gets marked as CHECKSUM_PARTIAL due to tunnel
advertising HW csum offload, it will not prevent software csum offload in
UDP GSO from kicking in if the underlying device doesn't offer csum
offload (for example, a TUN/TAP device with default config). This is
because we recheck device features in gso stack instead relying on the
ip_summed hint.

Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/udp.c         | 7 +++++++
 net/ipv4/udp_offload.c | 8 --------
 net/ipv6/udp.c         | 7 +++++++
 3 files changed, 14 insertions(+), 8 deletions(-)

Comments

Willem de Bruijn July 25, 2024, 2:21 p.m. UTC | #1
On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
> doesn't support checksum offload. This was done to satisfy the offload
> checks in the gso stack.
>
> However, when sending a UDP GSO packet from a tunnel device, we will go
> through the TX path and the GSO offload twice. Once for the tunnel device,
> which acts as a passthru for GSO packets, and once for the underlying
> egress device.
>
> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
> offload checks still happen on transmit from a tunnel device. So if the skb
> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
> warning from the gso stack.

I don't entirely understand. The check should not hit on pass through,
where segs == skb:

        if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
!IS_ERR(segs)))
                skb_warn_bad_offload(skb);

> Today this can occur in two situations, which we check for in
> __ip_append_data() and __ip6_append_data():
>
> 1) when the tunnel device does not advertise checksum offload, or
> 2) when there are IPv6 extension headers present.
>
> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
> path, when still in the udp layer, since we need to have ip_summed set up
> correctly for GSO processing by tunnel devices.

The previous patch converted segments post segmentation to
CHECKSUM_UNNECESSARY, which is fine as they had
already been checksummed in software, and CHECKSUM_NONE
packets on egress are common.

This creates GSO packets without CHECKSUM_PARTIAL.
Segmentation offload always requires checksum offload. So these
would be weird new packets. And having CHECKSUM_NONE (or
equivalent), but entering software checksumming is also confusing.

The crux is that I don't understand why the warning fires on tunnel
exit when no segmentation takes place there. Hopefully we can fix
in a way that does not introduce these weird GSO packets (but if
not, so be it).
Jakub Sitnicki July 26, 2024, 11:23 a.m. UTC | #2
On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> doesn't support checksum offload. This was done to satisfy the offload
>> checks in the gso stack.
>>
>> However, when sending a UDP GSO packet from a tunnel device, we will go
>> through the TX path and the GSO offload twice. Once for the tunnel device,
>> which acts as a passthru for GSO packets, and once for the underlying
>> egress device.
>>
>> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> offload checks still happen on transmit from a tunnel device. So if the skb
>> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> warning from the gso stack.
>
> I don't entirely understand. The check should not hit on pass through,
> where segs == skb:
>
>         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> !IS_ERR(segs)))
>                 skb_warn_bad_offload(skb);
>

That's something I should have explained better. Let me try to shed some
light on it now. We're hitting the skb_warn_bad_offload warning because
skb_mac_gso_segment doesn't return any segments (segs == NULL).

And that's because we bail out early out of __udp_gso_segment when we
detect that the tunnel device is capable of tx-udp-segmentation
(GSO_UDP_L4):

	if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
		/* Packet is from an untrusted source, reset gso_segs. */
		skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
							     mss);
		return NULL;
	}

It has not occurred to me before, but in the spirit of commit
8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
tighten the check to exclude cases when segs == NULL. I'm thinking of:

	if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
		skb_warn_bad_offload(skb);

That would be an alternative. Though I'm not sure I understand the
consequences of such change fully yet. Namely if we're wouldn't be
losing some diagnostics from the bad offload warning.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada

>> Today this can occur in two situations, which we check for in
>> __ip_append_data() and __ip6_append_data():
>>
>> 1) when the tunnel device does not advertise checksum offload, or
>> 2) when there are IPv6 extension headers present.
>>
>> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
>> path, when still in the udp layer, since we need to have ip_summed set up
>> correctly for GSO processing by tunnel devices.
>
> The previous patch converted segments post segmentation to
> CHECKSUM_UNNECESSARY, which is fine as they had
> already been checksummed in software, and CHECKSUM_NONE
> packets on egress are common.
>
> This creates GSO packets without CHECKSUM_PARTIAL.
> Segmentation offload always requires checksum offload. So these
> would be weird new packets. And having CHECKSUM_NONE (or
> equivalent), but entering software checksumming is also confusing.

I agree this is confusing to reason about. That is a GSO packet with
CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
offload in software.

Kind of related, I noticed that turning off tx-checksum-ip-generic with
ethtool doesn't disable tx-udp-segmentation. That looks like a bug.

> The crux is that I don't understand why the warning fires on tunnel
> exit when no segmentation takes place there. Hopefully we can fix
> in a way that does not introduce these weird GSO packets (but if
> not, so be it).

Attaching a self contained repro which I've been using to trace and
understand the GSO code:

---8<---

sh# cat repro-full.py
#!/bin/env python
#
# `modprobe ip6_tunnel` might be needed.
#

import os
import subprocess
import shutil
from socket import *

UDP_SEGMENT = 103

cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
script = b"""
link set dev lo up

link add name sink mtu 1540 type dummy
addr add dev sink fd11::2/48 nodad
link set dev sink up

tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
link set dev iptnl mtu 1500
addr add dev iptnl fd00::2/48 nodad
link set dev iptnl up
"""
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
proc.communicate(input=script)

os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")

# Alternatively to hopopts:
# os.system("ethtool -K iptnl tx-checksum-ip-generic off")

hopopts = b"\x00" * 8
s = socket(AF_INET6, SOCK_DGRAM)
s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
s.sendto(b"x" * 3000, ("fd00::1", 9))
sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 16)               |  __skb_gso_segment() {
 16)   0.288 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.172 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.184 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.161 us    |      __rcu_read_lock(); /* = 0x2 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.514 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.152 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        udp6_ufo_fragment() {
 16)   0.237 us    |          __udp_gso_segment(); /* = 0x0 */
 16)   0.727 us    |        } /* udp6_ufo_fragment = 0x0 */
 16)   3.049 us    |      } /* ipv6_gso_segment = 0x0 */
 16)   0.171 us    |      __rcu_read_unlock(); /* = 0x1 */
 16)   4.748 us    |    } /* skb_mac_gso_segment = 0x0 */
 16)               |    skb_warn_bad_offload() {
 [...]
 16) ! 785.215 us  |    } /* skb_warn_bad_offload = 0x0 */
 16) ! 800.986 us  |  } /* __skb_gso_segment = 0x0 */
 16)               |  __skb_gso_segment() {
 16)   0.394 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.181 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.182 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.178 us    |      __rcu_read_lock(); /* = 0x3 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.155 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.556 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        ip6ip6_gso_segment() {
 16) + 22.176 us   |          ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
 16) + 24.875 us   |        } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
 16) + 27.416 us   |      } /* ipv6_gso_segment = 0xffffa00c03018c00 */
 16)   0.230 us    |      __rcu_read_unlock(); /* = 0x2 */
 16) + 29.065 us   |    } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
 16) + 32.828 us   |  } /* __skb_gso_segment = 0xffffa00c03018c00 */
sh#
Willem de Bruijn July 26, 2024, 1:58 p.m. UTC | #3
On Fri, Jul 26, 2024 at 7:23 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> > On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> >> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
> >> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
> >> doesn't support checksum offload. This was done to satisfy the offload
> >> checks in the gso stack.
> >>
> >> However, when sending a UDP GSO packet from a tunnel device, we will go
> >> through the TX path and the GSO offload twice. Once for the tunnel device,
> >> which acts as a passthru for GSO packets, and once for the underlying
> >> egress device.
> >>
> >> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
> >> offload checks still happen on transmit from a tunnel device. So if the skb
> >> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
> >> warning from the gso stack.
> >
> > I don't entirely understand. The check should not hit on pass through,
> > where segs == skb:
> >
> >         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> > !IS_ERR(segs)))
> >                 skb_warn_bad_offload(skb);
> >
>
> That's something I should have explained better. Let me try to shed some
> light on it now. We're hitting the skb_warn_bad_offload warning because
> skb_mac_gso_segment doesn't return any segments (segs == NULL).
>
> And that's because we bail out early out of __udp_gso_segment when we
> detect that the tunnel device is capable of tx-udp-segmentation
> (GSO_UDP_L4):
>
>         if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
>                 /* Packet is from an untrusted source, reset gso_segs. */
>                 skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
>                                                              mss);
>                 return NULL;
>         }

Oh I see. Thanks.

> It has not occurred to me before, but in the spirit of commit
> 8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
> tighten the check to exclude cases when segs == NULL. I'm thinking of:
>
>         if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
>                 skb_warn_bad_offload(skb);

That looks sensible to me. And nicer than the ip_summed conversion in
udp_send_skb.

> That would be an alternative. Though I'm not sure I understand the
> consequences of such change fully yet. Namely if we're wouldn't be
> losing some diagnostics from the bad offload warning.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada
>
> >> Today this can occur in two situations, which we check for in
> >> __ip_append_data() and __ip6_append_data():
> >>
> >> 1) when the tunnel device does not advertise checksum offload, or
> >> 2) when there are IPv6 extension headers present.
> >>
> >> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
> >> path, when still in the udp layer, since we need to have ip_summed set up
> >> correctly for GSO processing by tunnel devices.
> >
> > The previous patch converted segments post segmentation to
> > CHECKSUM_UNNECESSARY, which is fine as they had
> > already been checksummed in software, and CHECKSUM_NONE
> > packets on egress are common.
> >
> > This creates GSO packets without CHECKSUM_PARTIAL.
> > Segmentation offload always requires checksum offload. So these
> > would be weird new packets. And having CHECKSUM_NONE (or
> > equivalent), but entering software checksumming is also confusing.
>
> I agree this is confusing to reason about. That is a GSO packet with
> CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
> offload in software.

I was mistaken earlier. Was looking at this code just yesterday too for

https://lore.kernel.org/netdev/20240726023359.879166-1-willemdebruijn.kernel@gmail.com/

We do set the GSO skb already skb CHECKSUM_NONE. So your suggestion is
not a significant change.

> Kind of related, I noticed that turning off tx-checksum-ip-generic with
> ethtool doesn't disable tx-udp-segmentation. That looks like a bug.

I saw the same :)

> > The crux is that I don't understand why the warning fires on tunnel
> > exit when no segmentation takes place there. Hopefully we can fix
> > in a way that does not introduce these weird GSO packets (but if
> > not, so be it).
>
> Attaching a self contained repro which I've been using to trace and
> understand the GSO code:
>
> ---8<---
>
> sh# cat repro-full.py
> #!/bin/env python
> #
> # `modprobe ip6_tunnel` might be needed.
> #
>
> import os
> import subprocess
> import shutil
> from socket import *
>
> UDP_SEGMENT = 103
>
> cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
> script = b"""
> link set dev lo up
>
> link add name sink mtu 1540 type dummy
> addr add dev sink fd11::2/48 nodad
> link set dev sink up
>
> tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
> link set dev iptnl mtu 1500
> addr add dev iptnl fd00::2/48 nodad
> link set dev iptnl up
> """
> proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
> proc.communicate(input=script)
>
> os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
> os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")
>
> # Alternatively to hopopts:
> # os.system("ethtool -K iptnl tx-checksum-ip-generic off")
>
> hopopts = b"\x00" * 8
> s = socket(AF_INET6, SOCK_DGRAM)
> s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
> s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
> s.sendto(b"x" * 3000, ("fd00::1", 9))
> sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
> # tracer: function_graph
> #
> # CPU  DURATION                  FUNCTION CALLS
> # |     |   |                     |   |   |   |
>  16)               |  __skb_gso_segment() {
>  16)   0.288 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
>  16)   0.172 us    |    idle_cpu(); /* = 0x0 */
>  16)               |    skb_mac_gso_segment() {
>  16)   0.184 us    |      skb_network_protocol(); /* = 0xdd86 */
>  16)   0.161 us    |      __rcu_read_lock(); /* = 0x2 */
>  16)               |      ipv6_gso_segment() {
>  16)               |        rcu_read_lock_held() {
>  16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.514 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.152 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        udp6_ufo_fragment() {
>  16)   0.237 us    |          __udp_gso_segment(); /* = 0x0 */
>  16)   0.727 us    |        } /* udp6_ufo_fragment = 0x0 */
>  16)   3.049 us    |      } /* ipv6_gso_segment = 0x0 */
>  16)   0.171 us    |      __rcu_read_unlock(); /* = 0x1 */
>  16)   4.748 us    |    } /* skb_mac_gso_segment = 0x0 */
>  16)               |    skb_warn_bad_offload() {
>  [...]
>  16) ! 785.215 us  |    } /* skb_warn_bad_offload = 0x0 */
>  16) ! 800.986 us  |  } /* __skb_gso_segment = 0x0 */
>  16)               |  __skb_gso_segment() {
>  16)   0.394 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
>  16)   0.181 us    |    idle_cpu(); /* = 0x0 */
>  16)               |    skb_mac_gso_segment() {
>  16)   0.182 us    |      skb_network_protocol(); /* = 0xdd86 */
>  16)   0.178 us    |      __rcu_read_lock(); /* = 0x3 */
>  16)               |      ipv6_gso_segment() {
>  16)               |        rcu_read_lock_held() {
>  16)   0.155 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.556 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        ip6ip6_gso_segment() {
>  16) + 22.176 us   |          ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
>  16) + 24.875 us   |        } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
>  16) + 27.416 us   |      } /* ipv6_gso_segment = 0xffffa00c03018c00 */
>  16)   0.230 us    |      __rcu_read_unlock(); /* = 0x2 */
>  16) + 29.065 us   |    } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
>  16) + 32.828 us   |  } /* __skb_gso_segment = 0xffffa00c03018c00 */
> sh#
Jakub Sitnicki July 29, 2024, 10:10 p.m. UTC | #4
On Fri, Jul 26, 2024 at 09:58 AM -04, Willem de Bruijn wrote:
> On Fri, Jul 26, 2024 at 7:23 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
>> > On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >>
>> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> >> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> >> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> >> doesn't support checksum offload. This was done to satisfy the offload
>> >> checks in the gso stack.
>> >>
>> >> However, when sending a UDP GSO packet from a tunnel device, we will go
>> >> through the TX path and the GSO offload twice. Once for the tunnel device,
>> >> which acts as a passthru for GSO packets, and once for the underlying
>> >> egress device.
>> >>
>> >> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> >> offload checks still happen on transmit from a tunnel device. So if the skb
>> >> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> >> warning from the gso stack.
>> >
>> > I don't entirely understand. The check should not hit on pass through,
>> > where segs == skb:
>> >
>> >         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
>> > !IS_ERR(segs)))
>> >                 skb_warn_bad_offload(skb);
>> >
>>
>> That's something I should have explained better. Let me try to shed some
>> light on it now. We're hitting the skb_warn_bad_offload warning because
>> skb_mac_gso_segment doesn't return any segments (segs == NULL).
>>
>> And that's because we bail out early out of __udp_gso_segment when we
>> detect that the tunnel device is capable of tx-udp-segmentation
>> (GSO_UDP_L4):
>>
>>         if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
>>                 /* Packet is from an untrusted source, reset gso_segs. */
>>                 skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
>>                                                              mss);
>>                 return NULL;
>>         }
>
> Oh I see. Thanks.
>
>> It has not occurred to me before, but in the spirit of commit
>> 8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
>> tighten the check to exclude cases when segs == NULL. I'm thinking of:
>>
>>         if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
>>                 skb_warn_bad_offload(skb);
>
> That looks sensible to me. And nicer than the ip_summed conversion in
> udp_send_skb.

I've audited all existing ->gso_segment callbacks. skb_mac_gso_segment()
returns no segments, that is segs == NULL, if the callback chain ends
with either of these:

… → udp[46]_ufo_fragment → __udp_gso_segment → skb_gso_ok == true
… → tcp[46]_gso_segment → tcp_gso_segment → skb_gso_ok == true
… → sctp_gso_segment → skb_gso_ok == true

IOW when the device advertises that it can handle the desired GSO kind
(skb_gso_ok() returns true).

Considering that a device offering HW GSO and no checksum offload at the
same time makes no sense, I also think that tweaking the bad offload
detection to exclude the !segs case doesn't deprive us of diagnostics.

I will change to that in v2.
diff mbox series

Patch

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 49c622e743e8..b7254b8a1e56 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -946,6 +946,13 @@  static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
 		}
 
 		if (datalen > cork->gso_size) {
+			/* On the TX path CHECKSUM_NONE and CHECKSUM_UNNECESSARY
+			 * have the same meaning. However, check for bad
+			 * offloads in the GSO stack expects the latter, if the
+			 * checksum can be calculated in software.
+			 */
+			if (skb->ip_summed == CHECKSUM_NONE)
+				skb->ip_summed = CHECKSUM_UNNECESSARY;
 			skb_shinfo(skb)->gso_size = cork->gso_size;
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
 			skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index aa2e0a28ca61..59448a2dbf2c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -357,14 +357,6 @@  struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 	else
 		uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
 
-	/* On the TX path, CHECKSUM_NONE and CHECKSUM_UNNECESSARY have the same
-	 * meaning. However, check for bad offloads in the GSO stack expects the
-	 * latter, if the checksum was calculated in software. To vouch for the
-	 * segment skbs we actually need to set it on the gso_skb.
-	 */
-	if (gso_skb->ip_summed == CHECKSUM_NONE)
-		gso_skb->ip_summed = CHECKSUM_UNNECESSARY;
-
 	/* update refcount for the packet */
 	if (copy_dtor) {
 		int delta = sum_truesize - gso_skb->truesize;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6602a2e9cdb5..360392fc2b68 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1262,6 +1262,13 @@  static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
 		}
 
 		if (datalen > cork->gso_size) {
+			/* On the TX path CHECKSUM_NONE and CHECKSUM_UNNECESSARY
+			 * have the same meaning. However, check for bad
+			 * offloads in the GSO stack expects the latter, if the
+			 * checksum can be calculated in software.
+			 */
+			if (skb->ip_summed == CHECKSUM_NONE)
+				skb->ip_summed = CHECKSUM_UNNECESSARY;
 			skb_shinfo(skb)->gso_size = cork->gso_size;
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
 			skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,