diff mbox series

xfrm: Add pre-encap fragmentation for packet offload

Message ID 20241124093531.3783434-1-ilia.lin@kernel.org (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series xfrm: Add pre-encap fragmentation for packet offload | expand

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest fail net-next-2024-11-24--12-00 (tests: 276)

Commit Message

Ilia Lin Nov. 24, 2024, 9:35 a.m. UTC
In packet offload mode the raw packets will be sent to the NiC,
and will not return to the Network Stack. In event of crossing
the MTU size after the encapsulation, the NiC HW may not be
able to fragment the final packet.
Adding mandatory pre-encapsulation fragmentation for both
IPv4 and IPv6, if tunnel mode with packet offload is configured
on the state.

Signed-off-by: Ilia Lin <ilia.lin@kernel.org>
---
 net/ipv4/xfrm4_output.c | 31 +++++++++++++++++++++++++++++--
 net/ipv6/xfrm6_output.c |  8 ++++++--
 2 files changed, 35 insertions(+), 4 deletions(-)

Comments

Leon Romanovsky Nov. 24, 2024, 12:04 p.m. UTC | #1
On Sun, Nov 24, 2024 at 11:35:31AM +0200, Ilia Lin wrote:
> In packet offload mode the raw packets will be sent to the NiC,
> and will not return to the Network Stack. In event of crossing
> the MTU size after the encapsulation, the NiC HW may not be
> able to fragment the final packet.

Yes, HW doesn't know how to handle these packets.

> Adding mandatory pre-encapsulation fragmentation for both
> IPv4 and IPv6, if tunnel mode with packet offload is configured
> on the state.

I was under impression is that xfrm_dev_offload_ok() is responsible to
prevent fragmentation.
https://elixir.bootlin.com/linux/v6.12/source/net/xfrm/xfrm_device.c#L410

Thanks
diff mbox series

Patch

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 3cff51ba72bb0..a4271e0dd51bb 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -14,17 +14,44 @@ 
 #include <net/xfrm.h>
 #include <net/icmp.h>
 
+static int __xfrm4_output_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
+{
+	return xfrm_output(sk, skb);
+}
+
 static int __xfrm4_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-#ifdef CONFIG_NETFILTER
-	struct xfrm_state *x = skb_dst(skb)->xfrm;
+	struct dst_entry *dst = skb_dst(skb);
+	struct xfrm_state *x = dst->xfrm;
+	unsigned int mtu;
+	bool toobig;
 
+#ifdef CONFIG_NETFILTER
 	if (!x) {
 		IPCB(skb)->flags |= IPSKB_REROUTED;
 		return dst_output(net, sk, skb);
 	}
 #endif
 
+	if (x->props.mode != XFRM_MODE_TUNNEL || x->xso.type != XFRM_DEV_OFFLOAD_PACKET)
+		goto skip_frag;
+
+	mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
+
+	toobig = skb->len > mtu && !skb_is_gso(skb);
+
+	if (!skb->ignore_df && toobig && skb->sk) {
+		xfrm_local_error(skb, mtu);
+		kfree_skb(skb);
+		return -EMSGSIZE;
+	}
+
+	if (toobig) {
+		IPCB(skb)->frag_max_size = mtu;
+		return ip_do_fragment(net, sk, skb, __xfrm4_output_finish);
+	}
+
+skip_frag:
 	return xfrm_output(sk, skb);
 }
 
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 5f7b1fdbffe62..fdd2f2f5adc71 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -75,10 +75,14 @@  static int __xfrm6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (x->props.mode != XFRM_MODE_TUNNEL)
 		goto skip_frag;
 
-	if (skb->protocol == htons(ETH_P_IPV6))
+	if (x->xso.type == XFRM_DEV_OFFLOAD_PACKET) {
+		mtu = xfrm_state_mtu(x, dst_mtu(skb_dst(skb)));
+		IP6CB(skb)->frag_max_size = mtu;
+	} else if (skb->protocol == htons(ETH_P_IPV6)) {
 		mtu = ip6_skb_dst_mtu(skb);
-	else
+	} else {
 		mtu = dst_mtu(skb_dst(skb));
+	}
 
 	toobig = skb->len > mtu && !skb_is_gso(skb);