Message ID | 20240819111745.129190-1-amy.saq@antgroup.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: fix csum calculation for encapsulated packets | expand |
On 8/19/24 13:17, 沈安琪(凛玥) wrote: > This commit fixes the issue that when a packet is encapsulated, such as > sending through a UDP tunnel, the outer TCP/UDP checksum is not > correctly recalculated if (1) checksum has been offloaded to hardware > and (2) encapsulated packet has been NAT-ed again, which causes the > packet being dropped due to the invalid outer checksum. > > Previously, when an encapsulated packet met some NAT rules and its > src/dst ip and/or src/dst port has been modified, > inet_proto_csum_replace4 will be invoked to recalculated the outer > checksum. However, if the packet is under the following condition: (1) > checksum offloaded to hardware and (2) NAT rule has changed the src/dst > port, its outer checksum will not be recalculated, since (1) > skb->ip_summed is set to CHECKSUM_PARTIAL due to csum offload and (2) > pseudohdr is set to false since port number is not part of pseudo > header. I don't see where nat is calling inet_proto_csum_replace4() with pseudohdr == false: please include more detailed description of the relevant setup (ideally a self-test) or at least a backtrace leading to the issue. > This leads to the outer TCP/UDP checksum invalid since it does > not change along with the port number change. > > In this commit, another condition has been added to recalculate outer > checksum: if (1) the packet is encapsulated, (2) checksum has been > offloaded, (3) the encapsulated packet has been NAT-ed to change port > number and (4) outer checksum is needed, the outer checksum for > encapsulated packet will be recalculated to make sure it is valid. Please add a suitable fix tag. > Signed-off-by: Anqi Shen <amy.saq@antgroup.com> > --- > net/core/utils.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/core/utils.c b/net/core/utils.c > index c994e95172ac..d9de60e9b347 100644 > --- a/net/core/utils.c > +++ b/net/core/utils.c > @@ -435,6 +435,8 @@ void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb, > *sum = ~csum_fold(csum_add(csum_sub(csum_unfold(*sum), > (__force __wsum)from), > (__force __wsum)to)); > + else if (skb->encapsulation && !!(*sum)) > + csum_replace4(sum, from, to); This looks incorrect for a csum partial value, and AFAICS the nat caller has already checked for !!(*sum). Thanks, Paolo
在 8/22/24 下午6:05, Paolo Abeni 写道: > > > On 8/19/24 13:17, 沈安琪(凛玥) wrote: >> This commit fixes the issue that when a packet is encapsulated, such as >> sending through a UDP tunnel, the outer TCP/UDP checksum is not >> correctly recalculated if (1) checksum has been offloaded to hardware >> and (2) encapsulated packet has been NAT-ed again, which causes the >> packet being dropped due to the invalid outer checksum. >> >> Previously, when an encapsulated packet met some NAT rules and its >> src/dst ip and/or src/dst port has been modified, >> inet_proto_csum_replace4 will be invoked to recalculated the outer >> checksum. However, if the packet is under the following condition: (1) >> checksum offloaded to hardware and (2) NAT rule has changed the src/dst >> port, its outer checksum will not be recalculated, since (1) >> skb->ip_summed is set to CHECKSUM_PARTIAL due to csum offload and (2) >> pseudohdr is set to false since port number is not part of pseudo >> header. > > I don't see where nat is calling inet_proto_csum_replace4() with > pseudohdr == false: please include more detailed description of the > relevant setup (ideally a self-test) or at least a backtrace leading > to the issue. The relevant setup we found this issue was: 1. we setup a VXLAN tunnel and set a MASQUERADE rule with --random-fully enabled on client side (tx side). 2. we enabled. NIC checksum offload. We used the following bpftrace script to get the call trace: bpftrace -o trace.log -e '#include <linux/skbuff.h> #include <linux/net.h> #include <linux/ip.h> #include <linux/udp.h> #include <linux/if_ether.h> #include <linux/types.h> #include <net/sock.h> #include <net/route.h> #include <net/dst.h> kprobe:inet_proto_csum_replace4 { $skb = (struct sk_buff *)arg1; $ip_hdr = (struct iphdr *)($skb->head + $skb->network_header); $proto = $ip_hdr->protocol; if ($proto == 17) { $udp_hdr = (struct udphdr *)($skb->head + $skb->transport_header); printf("udp check: 0x%04x; \n", $udp_hdr->check); printf("skb ip_summed: %d, skb encapsulation: %d, skb encap_csum_hdr: %d; \n", $skb->ip_summed, $skb->encapsulation, $skb->encap_hdr_csum); printf("from: 0x%08x, to: 0x%08x; \n", arg2, arg3); printf("pseudohdr: %d; \n", arg4); printf("enter kprobe:inet_proto_csum_replace4 : %s\n\n", kstack); } } ' And get the following call trace: udp check: [...]; skb ip_summed: 3, skb encapsulation: 1, skb encap_csum_hdr: 0; from: ...[old src ip], to: ...[NAT-ed src ip]; pseudohdr: 1; enter kprobe:inet_proto_csum_replace4 : inet_proto_csum_replace4+1 l4proto_manip_pkt+1166 nf_nat_ipv4_manip_pkt+90 nf_nat_manip_pkt+141 nf_nat_ipv4_out+76 nf_hook_slow+57 ip_output+225 iptunnel_xmit+356 vxlan_xmit_one+3184 vxlan_xmit+823 xmit_one.constprop.0+149 dev_hard_start_xmit+80 __dev_queue_xmit+962 ip_finish_output2+417 ip_send_skb+56 udp_send_skb+337 udp_sendmsg+2404 sock_sendmsg+51 ____sys_sendmsg+487 ___sys_sendmsg+117 __sys_sendmsg+89 do_syscall_64+45 entry_SYSCALL_64_after_hwframe+68 udp check: [...]; skb ip_summed: 3, skb encapsulation: 1, skb encap_csum_hdr: 0; from: ...[old src port], to: ...[new src port]; pseudohdr: 0; enter kprobe:inet_proto_csum_replace4 : inet_proto_csum_replace4+1 l4proto_manip_pkt+1195 nf_nat_ipv4_manip_pkt+90 nf_nat_manip_pkt+141 nf_nat_ipv4_out+76 nf_hook_slow+57 ip_output+225 iptunnel_xmit+356 vxlan_xmit_one+3184 vxlan_xmit+823 xmit_one.constprop.0+149 dev_hard_start_xmit+80 __dev_queue_xmit+962 ip_finish_output2+417 ip_send_skb+56 udp_send_skb+337 udp_sendmsg+2404 sock_sendmsg+51 ____sys_sendmsg+487 ___sys_sendmsg+117 __sys_sendmsg+89 do_syscall_64+45 entry_SYSCALL_64_after_hwframe+68 We trace the current implementation and found in __udp_manip_pkt https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/netfilter/nf_nat_proto.c#n57 https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/include/net/checksum.h#n164 will invoke inet_proto_csum_replace4 with pseudohdr as false. > >> This leads to the outer TCP/UDP checksum invalid since it does >> not change along with the port number change. >> >> In this commit, another condition has been added to recalculate outer >> checksum: if (1) the packet is encapsulated, (2) checksum has been >> offloaded, (3) the encapsulated packet has been NAT-ed to change port >> number and (4) outer checksum is needed, the outer checksum for >> encapsulated packet will be recalculated to make sure it is valid. > > Please add a suitable fix tag. Thanks for notifying this. We will add it in the next version of this patch. > >> Signed-off-by: Anqi Shen <amy.saq@antgroup.com> >> --- >> net/core/utils.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/net/core/utils.c b/net/core/utils.c >> index c994e95172ac..d9de60e9b347 100644 >> --- a/net/core/utils.c >> +++ b/net/core/utils.c >> @@ -435,6 +435,8 @@ void inet_proto_csum_replace4(__sum16 *sum, >> struct sk_buff *skb, >> *sum = ~csum_fold(csum_add(csum_sub(csum_unfold(*sum), >> (__force __wsum)from), >> (__force __wsum)to)); >> + else if (skb->encapsulation && !!(*sum)) >> + csum_replace4(sum, from, to); > > This looks incorrect for a csum partial value, and AFAICS the nat > caller has already checked for !!(*sum). > > Thanks, > > Paolo
diff --git a/net/core/utils.c b/net/core/utils.c index c994e95172ac..d9de60e9b347 100644 --- a/net/core/utils.c +++ b/net/core/utils.c @@ -435,6 +435,8 @@ void inet_proto_csum_replace4(__sum16 *sum, struct sk_buff *skb, *sum = ~csum_fold(csum_add(csum_sub(csum_unfold(*sum), (__force __wsum)from), (__force __wsum)to)); + else if (skb->encapsulation && !!(*sum)) + csum_replace4(sum, from, to); } EXPORT_SYMBOL(inet_proto_csum_replace4);
This commit fixes the issue that when a packet is encapsulated, such as sending through a UDP tunnel, the outer TCP/UDP checksum is not correctly recalculated if (1) checksum has been offloaded to hardware and (2) encapsulated packet has been NAT-ed again, which causes the packet being dropped due to the invalid outer checksum. Previously, when an encapsulated packet met some NAT rules and its src/dst ip and/or src/dst port has been modified, inet_proto_csum_replace4 will be invoked to recalculated the outer checksum. However, if the packet is under the following condition: (1) checksum offloaded to hardware and (2) NAT rule has changed the src/dst port, its outer checksum will not be recalculated, since (1) skb->ip_summed is set to CHECKSUM_PARTIAL due to csum offload and (2) pseudohdr is set to false since port number is not part of pseudo header. This leads to the outer TCP/UDP checksum invalid since it does not change along with the port number change. In this commit, another condition has been added to recalculate outer checksum: if (1) the packet is encapsulated, (2) checksum has been offloaded, (3) the encapsulated packet has been NAT-ed to change port number and (4) outer checksum is needed, the outer checksum for encapsulated packet will be recalculated to make sure it is valid. Signed-off-by: Anqi Shen <amy.saq@antgroup.com> --- net/core/utils.c | 2 ++ 1 file changed, 2 insertions(+)