diff mbox series

[net,v2] gso: do not skip outer ip header in case of ipip and net_failover

Message ID 20220216081041.70831-1-thomas.liu@ucloud.cn (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net,v2] gso: do not skip outer ip header in case of ipip and net_failover | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 18 this patch: 18
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch warning WARNING: line length of 86 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Tao Liu Feb. 16, 2022, 8:10 a.m. UTC
We encounter a tcp drop issue in our cloud environment. Packet GROed in
host forwards to a VM virtio_net nic with net_failover enabled. VM acts
as a IPVS LB with ipip encapsulation. The full path like:
host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
 -> ipip encap -> net_failover tx -> virtio_net tx

When net_failover transmits a ipip pkt (gso_type = 0x0103), there is no gso
did because it supports TSO and GSO_IPXIP4. But network_header points to
inner ip header.

Call Trace:
 tcp4_gso_segment        ------> return NULL
 inet_gso_segment        ------> inner iph, network_header points to
 ipip_gso_segment
 inet_gso_segment        ------> outer iph
 skb_mac_gso_segment

Afterwards virtio_net transmits the pkt, only inner ip header is modified.
And the outer one just keeps unchanged. The pkt will be dropped in remote
host. So we need to reset network header in this case.

Call Trace:
 inet_gso_segment        ------> inner iph, outer iph is skipped
 skb_mac_gso_segment
 __skb_gso_segment
 validate_xmit_skb
 validate_xmit_skb_list
 sch_direct_xmit
 __qdisc_run
 __dev_queue_xmit        ------> virtio_net
 dev_hard_start_xmit
 __dev_queue_xmit        ------> net_failover
 ip_finish_output2
 ip_output
 iptunnel_xmit
 ip_tunnel_xmit
 ipip_tunnel_xmit        ------> ipip
 dev_hard_start_xmit
 __dev_queue_xmit
 ip_finish_output2
 ip_output
 ip_forward
 ip_rcv
 __netif_receive_skb_one_core
 netif_receive_skb_internal
 napi_gro_receive
 receive_buf
 virtnet_poll
 net_rx_action

This patch also includes ipv6_gso_segment(), considering SIT, etc.

Fixes: cb32f511a70b ("ipip: add GSO/TSO support")
Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
---
 net/ipv4/af_inet.c     | 5 ++++-
 net/ipv6/ip6_offload.c | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)

Comments

Willem de Bruijn Feb. 16, 2022, 1:29 p.m. UTC | #1
On Wed, Feb 16, 2022 at 3:23 AM Tao Liu <thomas.liu@ucloud.cn> wrote:
>
> We encounter a tcp drop issue in our cloud environment. Packet GROed in
> host forwards to a VM virtio_net nic with net_failover enabled. VM acts
> as a IPVS LB with ipip encapsulation. The full path like:
> host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat
>  -> ipip encap -> net_failover tx -> virtio_net tx
>
> When net_failover transmits a ipip pkt (gso_type = 0x0103), there is no gso
> did because it supports TSO and GSO_IPXIP4. But network_header points to
> inner ip header.
>
> Call Trace:
>  tcp4_gso_segment        ------> return NULL
>  inet_gso_segment        ------> inner iph, network_header points to
>  ipip_gso_segment
>  inet_gso_segment        ------> outer iph
>  skb_mac_gso_segment
>
> Afterwards virtio_net transmits the pkt, only inner ip header is modified.
> And the outer one just keeps unchanged. The pkt will be dropped in remote
> host. So we need to reset network header in this case.
>
> Call Trace:
>  inet_gso_segment        ------> inner iph, outer iph is skipped
>  skb_mac_gso_segment
>  __skb_gso_segment
>  validate_xmit_skb
>  validate_xmit_skb_list
>  sch_direct_xmit
>  __qdisc_run
>  __dev_queue_xmit        ------> virtio_net
>  dev_hard_start_xmit
>  __dev_queue_xmit        ------> net_failover
>  ip_finish_output2
>  ip_output
>  iptunnel_xmit
>  ip_tunnel_xmit
>  ipip_tunnel_xmit        ------> ipip
>  dev_hard_start_xmit
>  __dev_queue_xmit
>  ip_finish_output2
>  ip_output
>  ip_forward
>  ip_rcv
>  __netif_receive_skb_one_core
>  netif_receive_skb_internal
>  napi_gro_receive
>  receive_buf
>  virtnet_poll
>  net_rx_action

I think the message could be rewritten to point out that the issue is
specific with the rare combination of SKB_GSO_DODGY and a tunnel
device that adds an SKB_GSO_ tunnel option.

> This patch also includes ipv6_gso_segment(), considering SIT, etc.
>
> Fixes: cb32f511a70b ("ipip: add GSO/TSO support")
> Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")

This is not a net_failover issue.

I'm not sure whether the issue existed at the time tunnel support was
added, or introduced later. It's reasonable to assume that it was
always there, but it might be worth a quick code inspection.

> Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
> ---
>  net/ipv4/af_inet.c     | 5 ++++-
>  net/ipv6/ip6_offload.c | 2 ++
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 9c465ba..72fde28 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1376,8 +1376,11 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>         }
>
>         ops = rcu_dereference(inet_offloads[proto]);
> -       if (likely(ops && ops->callbacks.gso_segment))
> +       if (likely(ops && ops->callbacks.gso_segment)) {
>                 segs = ops->callbacks.gso_segment(skb, features);
> +               if (!segs)
> +                       skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
> +       }
>
>         if (IS_ERR_OR_NULL(segs))
>                 goto out;

It's unfortunate that we have to add a branch in the common path. But
I also don't immediately see a cleaner option.
diff mbox series

Patch

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9c465ba..72fde28 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1376,8 +1376,11 @@  struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	}
 
 	ops = rcu_dereference(inet_offloads[proto]);
-	if (likely(ops && ops->callbacks.gso_segment))
+	if (likely(ops && ops->callbacks.gso_segment)) {
 		segs = ops->callbacks.gso_segment(skb, features);
+		if (!segs)
+			skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
+	}
 
 	if (IS_ERR_OR_NULL(segs))
 		goto out;
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index b29e9ba..5f577e2 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -114,6 +114,8 @@  static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	if (likely(ops && ops->callbacks.gso_segment)) {
 		skb_reset_transport_header(skb);
 		segs = ops->callbacks.gso_segment(skb, features);
+		if (!segs)
+			skb->network_header = skb_mac_header(skb) + nhoff - skb->head;
 	}
 
 	if (IS_ERR_OR_NULL(segs))