[RFC,ipsec-next] xfrm: fix tunnel mode TX datapath in packet offload mode

Message ID: af1b9df0b22d7a9f208e093356412f8976cc1bc2.1738780166.git.leon@kernel.org
State: RFC
Delegated to: Netdev Maintainers
Series: [RFC,ipsec-next] xfrm: fix tunnel mode TX datapath in packet offload mode

Checks

Context                         Check    Description
netdev/series_format            warning  Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           fail     1 blamed authors not CCed: raeds@nvidia.com; 6 maintainers not CCed: herbert@gondor.apana.org.au raeds@nvidia.com edumazet@google.com horms@kernel.org kuba@kernel.org pabeni@redhat.com
netdev/build_clang              success  Errors and warnings before: 2 this patch: 2
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1 this patch: 1
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 56 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Leon Romanovsky Feb. 5, 2025, 6:41 p.m. UTC
From: Alexandre Cassen <acassen@corp.free.fr>

Packets that match the output xfrm policy are delivered to the netstack.
In IPsec packet offload mode with tunnel mode, the HW is responsible for
building the hard header and the outer IP header. In that case, the inner
header may refer to a network that is not directly reachable by the host,
so neighbor resolution fails and the packet is dropped. The xfrm policy
defines the netdevice to use for xmit, so packets can be sent directly to it.

This fix also improves performance in transport mode, since the stack does
not need to perform neighbor resolution when the HW is already configured
to handle it.
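
The failure mode described above can be illustrated with a small userspace
sketch. This is a toy model and not kernel code: every toy_* name is
hypothetical, and it assumes (for illustration only) that neighbor resolution
succeeds only for destinations inside the host's on-link prefix. It contrasts
an output path that depends on resolving the inner destination with one that
hands the packet straight to the offload device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: all toy_* names are hypothetical, not kernel APIs. */
enum toy_verdict { TOY_XMIT, TOY_DROP };

struct toy_host {
	uint32_t addr;	/* host IPv4 address, host byte order */
	uint32_t mask;	/* on-link prefix mask, host byte order */
};

/* Assumption: resolution works only for on-link destinations. */
static bool toy_neigh_resolve(const struct toy_host *h, uint32_t dst)
{
	return (dst & h->mask) == (h->addr & h->mask);
}

/* Path that relies on neighbor resolution of the inner destination. */
static enum toy_verdict toy_output_via_neigh(const struct toy_host *h,
					     uint32_t inner_dst)
{
	return toy_neigh_resolve(h, inner_dst) ? TOY_XMIT : TOY_DROP;
}

/* Direct path: the device is already known, and the HW is assumed to
 * build the hard header and outer IP header, so the inner destination
 * is never resolved.
 */
static enum toy_verdict toy_output_direct(const struct toy_host *h,
					  uint32_t inner_dst)
{
	(void)h;
	(void)inner_dst;
	return TOY_XMIT;
}
```

With a host at 192.168.1.10/24 (0xC0A8010A, mask 0xFFFFFF00) and an inner
destination of 10.0.0.5 (0x0A000005), toy_output_via_neigh() drops the packet
in this model, while toy_output_direct() transmits it.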

Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode")
Signed-off-by: Alexandre Cassen <acassen@corp.free.fr>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
Steffen,

I'm sending this patch as is to get feedback on whether this is the right approach.

Thanks
---
 net/xfrm/xfrm_output.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)
Patch

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 34c8e266641c..4ad83b9ea0e9 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -495,7 +495,7 @@  static int xfrm_output_one(struct sk_buff *skb, int err)
 	struct xfrm_state *x = dst->xfrm;
 	struct net *net = xs_net(x);
 
-	if (err <= 0 || x->xso.type == XFRM_DEV_OFFLOAD_PACKET)
+	if (err <= 0)
 		goto resume;
 
 	do {
@@ -612,6 +612,40 @@  int xfrm_output_resume(struct sock *sk, struct sk_buff *skb, int err)
 }
 EXPORT_SYMBOL_GPL(xfrm_output_resume);
 
+static int xfrm_dev_direct_output(struct sock *sk, struct xfrm_state *x,
+				  struct sk_buff *skb)
+{
+	struct dst_entry *dst = skb_dst(skb);
+	struct net *net = xs_net(x);
+	int err;
+
+	dst = skb_dst_pop(skb);
+	if (!dst) {
+		XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
+		kfree_skb(skb);
+		return -EHOSTUNREACH;
+	}
+	skb_dst_set(skb, dst);
+	nf_reset_ct(skb);
+
+	err = skb_dst(skb)->ops->local_out(net, sk, skb);
+	if (unlikely(err != 1)) {
+		kfree_skb(skb);
+		return err;
+	}
+
+	/* In transport mode, network destination is
+	 * directly reachable, while in tunnel mode,
+	 * inner packet network may not be. In packet
+	 * offload type, HW is responsible for hard
+	 * header packet mangling so directly xmit skb
+	 * to netdevice.
+	 */
+	skb->dev = x->xso.dev;
+	__skb_push(skb, skb->dev->hard_header_len);
+	return dev_queue_xmit(skb);
+}
+
 static int xfrm_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	return xfrm_output_resume(sk, skb, 1);
@@ -735,7 +769,7 @@  int xfrm_output(struct sock *sk, struct sk_buff *skb)
 			return -EHOSTUNREACH;
 		}
 
-		return xfrm_output_resume(sk, skb, 0);
+		return xfrm_dev_direct_output(sk, x, skb);
 	}
 
 	secpath_reset(skb);