
[net,01/14] ipvs: align inner_mac_header for encapsulation

Message ID 20230621100731.68068-2-pablo@netfilter.org (mailing list archive)
State Accepted
Commit d7fce52fdf96663ddc2eb21afecff3775588612a
Delegated to: Netdev Maintainers
Series [net,01/14] ipvs: align inner_mac_header for encapsulation

Checks

Context Check Description
netdev/series_format success Pull request is its own cover letter
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers fail 3 blamed authors not CCed: ja@ssi.bg horms@verge.net.au hengqing.hu@gmail.com; 7 maintainers not CCed: hengqing.hu@gmail.com horms@verge.net.au fw@strlen.de coreteam@netfilter.org ja@ssi.bg kadlec@netfilter.org lvs-devel@vger.kernel.org
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 14 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Pablo Neira Ayuso June 21, 2023, 10:07 a.m. UTC
From: Terin Stock <terin@cloudflare.com>

When using encapsulation the original packet's headers are copied to the
inner headers. This preserves the space for an inner mac header, which
is not used by the inner payloads for the encapsulation types supported
by IPVS. If a packet is using GUE or GRE encapsulation and needs to be
segmented, it can be passed to __skb_udp_tunnel_segment(), which
calculates a negative tunnel header length. A negative tunnel header
length causes pskb_may_pull() to fail, dropping the packet.

This can be observed by attaching probes to ip_vs_in_hook(),
__dev_queue_xmit(), and __skb_udp_tunnel_segment():

    perf probe --add '__dev_queue_xmit skb->inner_mac_header \
    skb->inner_network_header skb->mac_header skb->network_header'
    perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen'
    perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \
    skb->inner_network_header skb->mac_header skb->network_header'

These probes record the header offsets and tunnel header length for packets which
traverse the IPVS encapsulation path. A TCP packet can be forced into
the segmentation path by being smaller than a calculated clamped MSS,
but larger than the advertised MSS.

    probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52
    probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
    probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
    probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2
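
The tnl_hlen=-2 value can be reproduced from the offsets in the trace. The
following is a minimal user-space sketch, not kernel code. It assumes that
the outer header is IPv4 without options (20 bytes), that tnl_hlen is the
inner mac header offset minus the transport header offset, and that the
length check takes an unsigned length. The model_* names and the 1500-byte
packet length used in the note below are hypothetical.

```c
#include <stdint.h>

/* Assumption: outer IPv4 header without options. */
#define ASSUMED_OUTER_IP_HLEN 20

/* Transport (UDP) header offset, given the outer network header offset. */
static int model_transport_header(uint16_t outer_network_header)
{
	return outer_network_header + ASSUMED_OUTER_IP_HLEN;
}

/* Models tnl_hlen = inner mac header offset - transport header offset. */
static int model_tnl_hlen(uint16_t inner_mac_header, int transport_header)
{
	return (int)inner_mac_header - transport_header;
}

/*
 * Simplified model of the length check: the length is unsigned, so a
 * negative tnl_hlen becomes a very large value and the check fails.
 */
static int model_may_pull(unsigned int len, unsigned int skb_len)
{
	return len <= skb_len;
}
```

With network_header=0x32 the transport header lands at 0x46, so
inner_mac_header=0x44 gives 0x44 - 0x46 = -2, matching the trace. If
inner_mac_header equalled inner_network_header (0x52), the result would be
12, which under the same assumptions is the size of a UDP header (8 bytes)
plus a 4-byte GUE header.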

When using veth-based encapsulation, the interfaces are set to be
mac-less, which does not preserve space for an inner mac header. This
prevents this issue from occurring.

In our real-world testing of sending a 32KB file we observed operation
time increasing from ~75ms for veth-based encapsulation to over 1.5s
using IPVS encapsulation due to retries from dropped packets.

This changeset modifies the packet on the encapsulation path in
ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac
header offset. This fixes UDP segmentation for both encapsulation types,
and corrects the inner headers for any IPIP flows that may use it.
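
The effect of the added call can be illustrated with a small user-space
model. This is a sketch under stated assumptions, not the kernel
implementation: struct skb_model and the model_* helpers are hypothetical
stand-ins that keep only the offsets involved, and they assume that
skb_inner_network_offset() returns the inner network header relative to
skb->data and that skb_set_inner_mac_header() stores the data offset plus
the offset it is given.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant sk_buff fields (offsets from head). */
struct skb_model {
	uint16_t data;                 /* offset of the data pointer */
	uint16_t inner_mac_header;
	uint16_t inner_network_header;
};

/* Models skb_inner_network_offset(): inner network header relative to data. */
static int model_inner_network_offset(const struct skb_model *skb)
{
	return (int)skb->inner_network_header - (int)skb->data;
}

/* Models skb_set_inner_mac_header(): data offset plus the given offset. */
static void model_set_inner_mac_header(struct skb_model *skb, int offset)
{
	skb->inner_mac_header = (uint16_t)(skb->data + offset);
}

/* The call added by the patch, expressed on the model. */
static void model_align_inner_mac_header(struct skb_model *skb)
{
	model_set_inner_mac_header(skb, model_inner_network_offset(skb));
}
```

Whatever the data offset is, the two terms cancel and inner_mac_header
ends up equal to inner_network_header. For the offsets in the trace above
that moves inner_mac_header from 0x44 to 0x52.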

Fixes: 84c0d5e96f3a ("ipvs: allow tunneling with gue encapsulation")
Signed-off-by: Terin Stock <terin@cloudflare.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

patchwork-bot+netdevbpf@kernel.org June 22, 2023, 2:30 p.m. UTC | #1
Hello:

This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:

On Wed, 21 Jun 2023 12:07:18 +0200 you wrote:
> From: Terin Stock <terin@cloudflare.com>
> 
> When using encapsulation the original packet's headers are copied to the
> inner headers. This preserves the space for an inner mac header, which
> is not used by the inner payloads for the encapsulation types supported
> by IPVS. If a packet is using GUE or GRE encapsulation and needs to be
> segmented, flow can be passed to __skb_udp_tunnel_segment() which
> calculates a negative tunnel header length. A negative tunnel header
> length causes pskb_may_pull() to fail, dropping the packet.
> 
> [...]

Here is the summary with links:
  - [net,01/14] ipvs: align inner_mac_header for encapsulation
    https://git.kernel.org/netdev/net/c/d7fce52fdf96
  - [net,02/14] netfilter: nf_tables: fix chain binding transaction logic
    https://git.kernel.org/netdev/net/c/4bedf9eee016
  - [net,03/14] netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
    https://git.kernel.org/netdev/net/c/26b5a5712eb8
  - [net,04/14] netfilter: nf_tables: drop map element references from preparation phase
    https://git.kernel.org/netdev/net/c/628bd3e49cba
  - [net,05/14] netfilter: nft_set_pipapo: .walk does not deal with generations
    https://git.kernel.org/netdev/net/c/2b84e215f874
  - [net,06/14] netfilter: nf_tables: fix underflow in object reference counter
    https://git.kernel.org/netdev/net/c/d6b478666ffa
  - [net,07/14] netfilter: nf_tables: disallow element updates of bound anonymous sets
    https://git.kernel.org/netdev/net/c/c88c535b592d
  - [net,08/14] netfilter: nf_tables: reject unbound anonymous set before commit phase
    https://git.kernel.org/netdev/net/c/938154b93be8
  - [net,09/14] netfilter: nf_tables: reject unbound chain set before commit phase
    https://git.kernel.org/netdev/net/c/62e1e94b246e
  - [net,10/14] netfilter: nf_tables: disallow updates of anonymous sets
    https://git.kernel.org/netdev/net/c/b770283c98e0
  - [net,11/14] netfilter: nf_tables: disallow timeout for anonymous sets
    https://git.kernel.org/netdev/net/c/e26d3009efda
  - [net,12/14] netfilter: nf_tables: drop module reference after updating chain
    https://git.kernel.org/netdev/net/c/043d2acf5722
  - [net,13/14] netfilter: nfnetlink_osf: fix module autoload
    https://git.kernel.org/netdev/net/c/62f9a68a36d4
  - [net,14/14] netfilter: nf_tables: Fix for deleting base chains with payload
    https://git.kernel.org/netdev/net/c/42e344f01688

You are awesome, thank you!

Patch

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index feb1d7fcb09f..a80b960223e1 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -1207,6 +1207,7 @@  ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	skb->transport_header = skb->network_header;
 
 	skb_set_inner_ipproto(skb, next_protocol);
+	skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
 
 	if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
 		bool check = false;
@@ -1349,6 +1350,7 @@  ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	skb->transport_header = skb->network_header;
 
 	skb_set_inner_ipproto(skb, next_protocol);
+	skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
 
 	if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
 		bool check = false;