diff mbox series

[net] vxlan: calculate correct header length for GPE

Message ID 0699747bc9bd7aaf7dc87efd33aa6b95de7d793e.1689677201.git.jbenc@redhat.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] vxlan: calculate correct header length for GPE | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1368 this patch: 1368
netdev/cc_maintainers fail 1 blamed authors not CCed: davem@davemloft.net; 7 maintainers not CCed: kuba@kernel.org razor@blackwall.org idosch@nvidia.com intel-wired-lan@lists.osuosl.org davem@davemloft.net pabeni@redhat.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1405 this patch: 1405
netdev/checkpatch warning WARNING: 'immediatelly' may be misspelled - perhaps 'immediately'? WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: e1e5314de08b ("vxlan: implement GPE")' WARNING: line length of 91 exceeds 80 columns WARNING: line length of 97 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jiri Benc July 18, 2023, 10:50 a.m. UTC
VXLAN-GPE does not add an extra inner Ethernet header. Take that into
account when calculating header length.

This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is
cached. If the VXLAN-GPE interface has MTU 1464 set (with the underlying
interface having the usual MTU of 1500), a TCP stream sent over the
tunnel is first segmented to 1514 byte frames only to be immediatelly
followed by a resend with 1500 bytes frames, before the other side even
has a chance to ack them.

Fixes: e1e5314de08ba ("vxlan: implement GPE")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 +-
 drivers/net/vxlan/vxlan_core.c                | 23 ++++++++-----------
 include/net/vxlan.h                           | 13 +++++++----
 3 files changed, 20 insertions(+), 18 deletions(-)

Comments

Jakub Kicinski July 20, 2023, 4:08 a.m. UTC | #1
On Tue, 18 Jul 2023 12:50:13 +0200 Jiri Benc wrote:
> This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is
> cached. If the VXLAN-GPE interface has MTU 1464 set (with the underlying
> interface having the usual MTU of 1500), a TCP stream sent over the
> tunnel is first segmented to 1514 byte frames only to be immediatelly
> followed by a resend with 1500 bytes frames, before the other side even
> has a chance to ack them.

Sounds like we are overly conservative, assuming the header will be
larger than it ends up being. But you're saying it leads to oversized,
not undersized packets?
Jiri Benc July 20, 2023, 8:17 a.m. UTC | #2
On Wed, 19 Jul 2023 21:08:28 -0700, Jakub Kicinski wrote:
> On Tue, 18 Jul 2023 12:50:13 +0200 Jiri Benc wrote:
> > This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is
> > cached. If the VXLAN-GPE interface has MTU 1464 set (with the underlying
> > interface having the usual MTU of 1500), a TCP stream sent over the
> > tunnel is first segmented to 1514 byte frames only to be immediatelly
> > followed by a resend with 1500 bytes frames, before the other side even
> > has a chance to ack them.  
> 
> Sounds like we are overly conservative, assuming the header will be
> larger than it ends up being. But you're saying it leads to oversized,
> not undersized packets?

Sorry for not providing enough details. The packets are actually
correctly sized, initially. Then a lower, incorrect PMTU is cached.

In the collect_md mode (which is the only mode that VXLAN-GPE
supports), there's no magic auto-setting of the tunnel interface MTU.
It can't be, since the destination and thus the underlying interface
may be different for each packet.

So, the administrator is responsible for setting the correct tunnel
interface MTU. Apparently, the administrators are capable enough to
calculate that the maximum MTU for VXLAN-GPE is (their_lower_MTU - 36).
They set the tunnel interface MTU to 1464. If you run a TCP stream over
such interface, it's then segmented according to the MTU 1464, i.e.
producing 1514 bytes frames. Which is okay, this still fits the lower
MTU.

However, skb_tunnel_check_pmtu (called from vxlan_xmit_one) uses 50 as
the header size and thus incorrectly calculates the frame size to be
1528. This leads to ICMP too big message being generated (locally),
PMTU of 1450 to be cached and the TCP stream to be resegmented.

The fix is to use the correct actual header size, especially for
skb_tunnel_check_pmtu calculation.

Should I resend with more detailed patch description?

Thanks,

 Jiri
Paolo Abeni July 20, 2023, 8:43 a.m. UTC | #3
On Thu, 2023-07-20 at 10:17 +0200, Jiri Benc wrote:
> On Wed, 19 Jul 2023 21:08:28 -0700, Jakub Kicinski wrote:
> > On Tue, 18 Jul 2023 12:50:13 +0200 Jiri Benc wrote:
> > > This causes problems in skb_tunnel_check_pmtu, where incorrect PMTU is
> > > cached. If the VXLAN-GPE interface has MTU 1464 set (with the underlying
> > > interface having the usual MTU of 1500), a TCP stream sent over the
> > > tunnel is first segmented to 1514 byte frames only to be immediatelly
> > > followed by a resend with 1500 bytes frames, before the other side even
> > > has a chance to ack them.  
> > 
> > Sounds like we are overly conservative, assuming the header will be
> > larger than it ends up being. But you're saying it leads to oversized,
> > not undersized packets?
> 
> Sorry for not providing enough details. The packets are actually
> correctly sized, initially. Then a lower, incorrect PMTU is cached.
> 
> In the collect_md mode (which is the only mode that VXLAN-GPE
> supports), there's no magic auto-setting of the tunnel interface MTU.
> It can't be, since the destination and thus the underlying interface
> may be different for each packet.
> 
> So, the administrator is responsible for setting the correct tunnel
> interface MTU. Apparently, the administrators are capable enough to
> calculate that the maximum MTU for VXLAN-GPE is (their_lower_MTU - 36).
> They set the tunnel interface MTU to 1464. If you run a TCP stream over
> such interface, it's then segmented according to the MTU 1464, i.e.
> producing 1514 bytes frames. Which is okay, this still fits the lower
> MTU.
> 
> However, skb_tunnel_check_pmtu (called from vxlan_xmit_one) uses 50 as
> the header size and thus incorrectly calculates the frame size to be
> 1528. This leads to ICMP too big message being generated (locally),
> PMTU of 1450 to be cached and the TCP stream to be resegmented.
> 
> The fix is to use the correct actual header size, especially for
> skb_tunnel_check_pmtu calculation.
> 
> Should I resend with more detailed patch description?

I guess there is not such a thing as a "too verbose commit message", so
I would say: yes please!

Thanks!

Paolo
Jiri Benc July 20, 2023, 8:50 a.m. UTC | #4
On Thu, 20 Jul 2023 11:43:05 +0300, Eyal Birger wrote:
> From looking at the geneve code it appears geneve would also have this
> problem when inner_proto_inherit=true as GENEVE_IPV4_HLEN includes
> ETH_HLEN.
> 
> Would you consider adding a fix to it as part of a series?

I can look into that. I wouldn't call it a series, though :-) Let's do
both separately.

 Jiri
Eyal Birger July 20, 2023, 9:02 a.m. UTC | #5
On Thu, Jul 20, 2023 at 11:50 AM Jiri Benc <jbenc@redhat.com> wrote:
>
> On Thu, 20 Jul 2023 11:43:05 +0300, Eyal Birger wrote:
> > From looking at the geneve code it appears geneve would also have this
> > problem when inner_proto_inherit=true as GENEVE_IPV4_HLEN includes
> > ETH_HLEN.
> >
> > Would you consider adding a fix to it as part of a series?
>
> I can look into that. I wouldn't call it a series, though :-) Let's do
> both separately.

SGTM. Thanks!
Eyal.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1726297f2e0d..8eb9839a3ca6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8479,7 +8479,7 @@  static void ixgbe_atr(struct ixgbe_ring *ring,
 		struct ixgbe_adapter *adapter = q_vector->adapter;
 
 		if (unlikely(skb_tail_pointer(skb) < hdr.network +
-			     VXLAN_HEADROOM))
+			     vxlan_headroom(0)))
 			return;
 
 		/* verify the port is recognized as VXLAN */
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index 78744549c1b3..42be8a26c171 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -2516,7 +2516,7 @@  void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		ndst = &rt->dst;
-		err = skb_tunnel_check_pmtu(skb, ndst, VXLAN_HEADROOM,
+		err = skb_tunnel_check_pmtu(skb, ndst, vxlan_headroom(flags & VXLAN_F_GPE),
 					    netif_is_any_bridge_port(dev));
 		if (err < 0) {
 			goto tx_error;
@@ -2577,7 +2577,8 @@  void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				goto out_unlock;
 		}
 
-		err = skb_tunnel_check_pmtu(skb, ndst, VXLAN6_HEADROOM,
+		err = skb_tunnel_check_pmtu(skb, ndst,
+					    vxlan_headroom((flags & VXLAN_F_GPE) | VXLAN_F_IPV6),
 					    netif_is_any_bridge_port(dev));
 		if (err < 0) {
 			goto tx_error;
@@ -2989,14 +2990,12 @@  static int vxlan_change_mtu(struct net_device *dev, int new_mtu)
 	struct vxlan_rdst *dst = &vxlan->default_dst;
 	struct net_device *lowerdev = __dev_get_by_index(vxlan->net,
 							 dst->remote_ifindex);
-	bool use_ipv6 = !!(vxlan->cfg.flags & VXLAN_F_IPV6);
 
 	/* This check is different than dev->max_mtu, because it looks at
 	 * the lowerdev->mtu, rather than the static dev->max_mtu
 	 */
 	if (lowerdev) {
-		int max_mtu = lowerdev->mtu -
-			      (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
+		int max_mtu = lowerdev->mtu - vxlan_headroom(vxlan->cfg.flags);
 		if (new_mtu > max_mtu)
 			return -EINVAL;
 	}
@@ -3644,11 +3643,11 @@  static void vxlan_config_apply(struct net_device *dev,
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_rdst *dst = &vxlan->default_dst;
 	unsigned short needed_headroom = ETH_HLEN;
-	bool use_ipv6 = !!(conf->flags & VXLAN_F_IPV6);
 	int max_mtu = ETH_MAX_MTU;
+	u32 flags = conf->flags;
 
 	if (!changelink) {
-		if (conf->flags & VXLAN_F_GPE)
+		if (flags & VXLAN_F_GPE)
 			vxlan_raw_setup(dev);
 		else
 			vxlan_ether_setup(dev);
@@ -3673,8 +3672,7 @@  static void vxlan_config_apply(struct net_device *dev,
 
 		dev->needed_tailroom = lowerdev->needed_tailroom;
 
-		max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
-					   VXLAN_HEADROOM);
+		max_mtu = lowerdev->mtu - vxlan_headroom(flags);
 		if (max_mtu < ETH_MIN_MTU)
 			max_mtu = ETH_MIN_MTU;
 
@@ -3685,10 +3683,9 @@  static void vxlan_config_apply(struct net_device *dev,
 	if (dev->mtu > max_mtu)
 		dev->mtu = max_mtu;
 
-	if (use_ipv6 || conf->flags & VXLAN_F_COLLECT_METADATA)
-		needed_headroom += VXLAN6_HEADROOM;
-	else
-		needed_headroom += VXLAN_HEADROOM;
+	if (flags & VXLAN_F_COLLECT_METADATA)
+		flags |= VXLAN_F_IPV6;
+	needed_headroom += vxlan_headroom(flags);
 	dev->needed_headroom = needed_headroom;
 
 	memcpy(&vxlan->cfg, conf, sizeof(*conf));
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 0be91ca78d3a..1648240c9668 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -386,10 +386,15 @@  static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 	return features;
 }
 
-/* IP header + UDP + VXLAN + Ethernet header */
-#define VXLAN_HEADROOM (20 + 8 + 8 + 14)
-/* IPv6 header + UDP + VXLAN + Ethernet header */
-#define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
+static inline int vxlan_headroom(u32 flags)
+{
+	/* VXLAN:     IP4/6 header + UDP + VXLAN + Ethernet header */
+	/* VXLAN-GPE: IP4/6 header + UDP + VXLAN */
+	return (flags & VXLAN_F_IPV6 ? sizeof(struct ipv6hdr) :
+				       sizeof(struct iphdr)) +
+	       sizeof(struct udphdr) + sizeof(struct vxlanhdr) +
+	       (flags & VXLAN_F_GPE ? 0 : ETH_HLEN);
+}
 
 static inline struct vxlanhdr *vxlan_hdr(struct sk_buff *skb)
 {