diff mbox series

[net-next,01/15] net: add netdev->tso_ipv6_max_size attribute

Message ID 20220203015140.3022854-2-eric.dumazet@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series tcp: BIG TCP implementation | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4834 this patch: 4834
netdev/cc_maintainers warning 1 maintainers not CCed: liuhangbin@gmail.com
netdev/build_clang success Errors and warnings before: 823 this patch: 823
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4987 this patch: 4987
netdev/checkpatch warning CHECK: Please don't use multiple blank lines WARNING: line length of 83 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Eric Dumazet Feb. 3, 2022, 1:51 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

Some NIC (or virtual devices) are LSOv2 compatible.

BIG TCP plans using the large LSOv2 feature for IPv6.

New netlink attribute IFLA_TSO_IPV6_MAX_SIZE is defined.

Drivers should use netif_set_tso_ipv6_max_size() to advertize their limit.

Unchanged drivers are not allowing big TSO packets to be sent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h          | 10 ++++++++++
 include/uapi/linux/if_link.h       |  1 +
 net/core/dev.c                     |  2 ++
 net/core/rtnetlink.c               |  3 +++
 tools/include/uapi/linux/if_link.h |  1 +
 5 files changed, 17 insertions(+)

Comments

Jakub Kicinski Feb. 3, 2022, 4:34 p.m. UTC | #1
On Wed,  2 Feb 2022 17:51:26 -0800 Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Some NIC (or virtual devices) are LSOv2 compatible.
> 
> BIG TCP plans using the large LSOv2 feature for IPv6.
> 
> New netlink attribute IFLA_TSO_IPV6_MAX_SIZE is defined.
> 
> Drivers should use netif_set_tso_ipv6_max_size() to advertize their limit.
> 
> Unchanged drivers are not allowing big TSO packets to be sent.

Many drivers will have a limit on how many buffer descriptors they
can chain, not the size of the super frame, I'd think. Is that not
the case? We can't assume all pages but the first and last are full,
right?

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e490b84732d1654bf067b30f2bb0b0825f88dea9..b1f68df2b37bc4b623f61cc2c6f0c02ba2afbe02 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1948,6 +1948,7 @@ enum netdev_ml_priv_type {
>   *	@dev_addr_shadow:	Copy of @dev_addr to catch direct writes.
>   *	@linkwatch_dev_tracker:	refcount tracker used by linkwatch.
>   *	@watchdog_dev_tracker:	refcount tracker used by watchdog.
> + *	@tso_ipv6_max_size:	Maximum size of IPv6 TSO packets (driver/NIC limit)
>   *
>   *	FIXME: cleanup struct net_device such that network protocol info
>   *	moves out.
> @@ -2282,6 +2283,7 @@ struct net_device {
>  	u8 dev_addr_shadow[MAX_ADDR_LEN];
>  	netdevice_tracker	linkwatch_dev_tracker;
>  	netdevice_tracker	watchdog_dev_tracker;
> +	unsigned int		tso_ipv6_max_size;
>  };
>  #define to_net_dev(d) container_of(d, struct net_device, dev)
>  
> @@ -4818,6 +4820,14 @@ static inline void netif_set_gro_max_size(struct net_device *dev,
>  	WRITE_ONCE(dev->gro_max_size, size);
>  }
>  
> +/* Used by drivers to give their hardware/firmware limit for LSOv2 packets */
> +static inline void netif_set_tso_ipv6_max_size(struct net_device *dev,
> +					       unsigned int size)
> +{
> +	dev->tso_ipv6_max_size = size;
> +}
> +
> +

nit: double new line
Eric Dumazet Feb. 3, 2022, 4:56 p.m. UTC | #2
On Thu, Feb 3, 2022 at 8:34 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed,  2 Feb 2022 17:51:26 -0800 Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Some NIC (or virtual devices) are LSOv2 compatible.
> >
> > BIG TCP plans using the large LSOv2 feature for IPv6.
> >
> > New netlink attribute IFLA_TSO_IPV6_MAX_SIZE is defined.
> >
> > Drivers should use netif_set_tso_ipv6_max_size() to advertize their limit.
> >
> > Unchanged drivers are not allowing big TSO packets to be sent.
>
> Many drivers will have a limit on how many buffer descriptors they
> can chain, not the size of the super frame, I'd think. Is that not
> the case? We can't assume all pages but the first and last are full,
> right?

In our case, we have a 100Gbit Google NIC which has these limits:

- TX descriptor has a 16bit field filled with skb->len
- No more than 21 frags per 'packet'

In order to support BIG TCP on it, we had to split the bigger TCP packets
into smaller chunks, to satisfy both constraints (even if the second
constraint is hardly hit once you chop to ~60KB packets, given our 4K
MTU)

ndo_features_check() might help to take care of small oddities.

For instance I will insert the following in the next version of the series:

commit 26644be08edc2f14f6ec79f650cc4a5d380df498
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Feb 2 23:22:01 2022 -0800

    net: typhoon: implement ndo_features_check method

    Instead of disabling TSO if MAX_SKB_FRAGS > 32, implement
    ndo_features_check() method for this driver.

    If skb has more than 32 frags, use the following heuristic:

    1) force GSO for gso packets
    2) Otherwise force linearization.

    Most locally generated TCP packets will use a small number of fragments
    anyway.

    Signed-off-by: Eric Dumazet <edumazet@google.com>

diff --git a/drivers/net/ethernet/3com/typhoon.c
b/drivers/net/ethernet/3com/typhoon.c
index 8aec5d9fbfef2803c181387537300502a937caf0..216e26a49e9c272ba7483bfa06941ff11ea40e3c
100644
--- a/drivers/net/ethernet/3com/typhoon.c
+++ b/drivers/net/ethernet/3com/typhoon.c
@@ -138,11 +138,6 @@ MODULE_PARM_DESC(use_mmio, "Use MMIO (1) or
PIO(0) to access the NIC. "
 module_param(rx_copybreak, int, 0);
 module_param(use_mmio, int, 0);

-#if defined(NETIF_F_TSO) && MAX_SKB_FRAGS > 32
-#warning Typhoon only supports 32 entries in its SG list for TSO, disabling TSO
-#undef NETIF_F_TSO
-#endif
-
 #if TXLO_ENTRIES <= (2 * MAX_SKB_FRAGS)
 #error TX ring too small!
 #endif
@@ -2261,9 +2256,23 @@ typhoon_test_mmio(struct pci_dev *pdev)
        return mode;
 }

+static netdev_features_t typhoon_features_check(struct sk_buff *skb,
+                                               struct net_device *dev,
+                                               netdev_features_t features)
+{
+       if (skb_shinfo(skb)->nr_frags > 32) {
+               if (skb_is_gso(skb))
+                       features &= ~NETIF_F_GSO_MASK;
+               else
+                       features &= ~NETIF_F_SG;
+       }
+       return features;
+}
+
 static const struct net_device_ops typhoon_netdev_ops = {
        .ndo_open               = typhoon_open,
        .ndo_stop               = typhoon_close,
+       .ndo_features_check     = typhoon_features_check,
        .ndo_start_xmit         = typhoon_start_tx,
        .ndo_set_rx_mode        = typhoon_set_rx_mode,
        .ndo_tx_timeout         = typhoon_tx_timeout,
Jakub Kicinski Feb. 3, 2022, 6:58 p.m. UTC | #3
On Thu, 3 Feb 2022 08:56:56 -0800 Eric Dumazet wrote:
> On Thu, Feb 3, 2022 at 8:34 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Wed,  2 Feb 2022 17:51:26 -0800 Eric Dumazet wrote:  
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > Some NIC (or virtual devices) are LSOv2 compatible.
> > >
> > > BIG TCP plans using the large LSOv2 feature for IPv6.
> > >
> > > New netlink attribute IFLA_TSO_IPV6_MAX_SIZE is defined.
> > >
> > > Drivers should use netif_set_tso_ipv6_max_size() to advertize their limit.
> > >
> > > Unchanged drivers are not allowing big TSO packets to be sent.  
> >
> > Many drivers will have a limit on how many buffer descriptors they
> > can chain, not the size of the super frame, I'd think. Is that not
> > the case? We can't assume all pages but the first and last are full,
> > right?  
> 
> In our case, we have a 100Gbit Google NIC which has these limits:
> 
> - TX descriptor has a 16bit field filled with skb->len
> - No more than 21 frags per 'packet'
> 
> In order to support BIG TCP on it, we had to split the bigger TCP packets
> into smaller chunks, to satisfy both constraints (even if the second
> constraint is hardly hit once you chop to ~60KB packets, given our 4K
> MTU)
> 
> ndo_features_check() might help to take care of small oddities.

Makes sense, I was curious if we can do more in the core so that fewer
changes are required in the drivers. Both so that drivers don't have to
strip the header and so that drivers with limitations can be served 
pre-cooked smaller skbs.

> For instance I will insert the following in the next version of the series:
> 
> commit 26644be08edc2f14f6ec79f650cc4a5d380df498
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Wed Feb 2 23:22:01 2022 -0800
> 
>     net: typhoon: implement ndo_features_check method
> 
>     Instead of disabling TSO if MAX_SKB_FRAGS > 32, implement
>     ndo_features_check() method for this driver.
> 
>     If skb has more than 32 frags, use the following heuristic:
> 
>     1) force GSO for gso packets
>     2) Otherwise force linearization.
> 
>     Most locally generated TCP packets will use a small number of fragments
>     anyway.
> 
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> diff --git a/drivers/net/ethernet/3com/typhoon.c
> b/drivers/net/ethernet/3com/typhoon.c
> index 8aec5d9fbfef2803c181387537300502a937caf0..216e26a49e9c272ba7483bfa06941ff11ea40e3c
> 100644
> --- a/drivers/net/ethernet/3com/typhoon.c
> +++ b/drivers/net/ethernet/3com/typhoon.c
> @@ -138,11 +138,6 @@ MODULE_PARM_DESC(use_mmio, "Use MMIO (1) or
> PIO(0) to access the NIC. "
>  module_param(rx_copybreak, int, 0);
>  module_param(use_mmio, int, 0);
> 
> -#if defined(NETIF_F_TSO) && MAX_SKB_FRAGS > 32
> -#warning Typhoon only supports 32 entries in its SG list for TSO, disabling TSO
> -#undef NETIF_F_TSO
> -#endif

I wonder how many drivers just assumed MAX_SKB_FRAGS will never 
change :S What do you think about a device-level check in the core 
for number of frags?
Eric Dumazet Feb. 3, 2022, 7:12 p.m. UTC | #4
On Thu, Feb 3, 2022 at 10:58 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 3 Feb 2022 08:56:56 -0800 Eric Dumazet wrote:
> > On Thu, Feb 3, 2022 at 8:34 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Wed,  2 Feb 2022 17:51:26 -0800 Eric Dumazet wrote:
> > > > From: Eric Dumazet <edumazet@google.com>
> > > >
> > > > Some NIC (or virtual devices) are LSOv2 compatible.
> > > >
> > > > BIG TCP plans using the large LSOv2 feature for IPv6.
> > > >
> > > > New netlink attribute IFLA_TSO_IPV6_MAX_SIZE is defined.
> > > >
> > > > Drivers should use netif_set_tso_ipv6_max_size() to advertize their limit.
> > > >
> > > > Unchanged drivers are not allowing big TSO packets to be sent.
> > >
> > > Many drivers will have a limit on how many buffer descriptors they
> > > can chain, not the size of the super frame, I'd think. Is that not
> > > the case? We can't assume all pages but the first and last are full,
> > > right?
> >
> > In our case, we have a 100Gbit Google NIC which has these limits:
> >
> > - TX descriptor has a 16bit field filled with skb->len
> > - No more than 21 frags per 'packet'
> >
> > In order to support BIG TCP on it, we had to split the bigger TCP packets
> > into smaller chunks, to satisfy both constraints (even if the second
> > constraint is hardly hit once you chop to ~60KB packets, given our 4K
> > MTU)
> >
> > ndo_features_check() might help to take care of small oddities.
>
> Makes sense, I was curious if we can do more in the core so that fewer
> changes are required in the drivers. Both so that drivers don't have to
> strip the header and so that drivers with limitations can be served
> pre-cooked smaller skbs.

I have on my plate to implement a helper to split 'big GRO/TSO' packets
into smaller chunks. I have avoided doing it in our Google NIC driver,
to avoid extra sk_buff/skb->head allocations for each BIG TCP packet.

Yes, core networking stack could use it.

> I wonder how many drivers just assumed MAX_SKB_FRAGS will never
> change :S What do you think about a device-level check in the core
> for number of frags?

I guess we could do this if the CONFIG_MAX_SKB_FRAGS > 17
diff mbox series

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e490b84732d1654bf067b30f2bb0b0825f88dea9..b1f68df2b37bc4b623f61cc2c6f0c02ba2afbe02 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1948,6 +1948,7 @@  enum netdev_ml_priv_type {
  *	@dev_addr_shadow:	Copy of @dev_addr to catch direct writes.
  *	@linkwatch_dev_tracker:	refcount tracker used by linkwatch.
  *	@watchdog_dev_tracker:	refcount tracker used by watchdog.
+ *	@tso_ipv6_max_size:	Maximum size of IPv6 TSO packets (driver/NIC limit)
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -2282,6 +2283,7 @@  struct net_device {
 	u8 dev_addr_shadow[MAX_ADDR_LEN];
 	netdevice_tracker	linkwatch_dev_tracker;
 	netdevice_tracker	watchdog_dev_tracker;
+	unsigned int		tso_ipv6_max_size;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -4818,6 +4820,14 @@  static inline void netif_set_gro_max_size(struct net_device *dev,
 	WRITE_ONCE(dev->gro_max_size, size);
 }
 
+/* Used by drivers to give their hardware/firmware limit for LSOv2 packets */
+static inline void netif_set_tso_ipv6_max_size(struct net_device *dev,
+					       unsigned int size)
+{
+	dev->tso_ipv6_max_size = size;
+}
+
+
 static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 					int pulled_hlen, u16 mac_offset,
 					int mac_len)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6218f93f5c1a92b5765bc19dfb9d7583c3b9369b..79b9d399cd297a1f79dca5ce89762800c38ed4a8 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -348,6 +348,7 @@  enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index 1baab07820f65f9bcf88a6d73e2c9ff741d33c18..b6ca3c348d41a097baf210f2a5d966b71308c69b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10188,6 +10188,8 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->gso_max_size = GSO_MAX_SIZE;
 	dev->gso_max_segs = GSO_MAX_SEGS;
 	dev->gro_max_size = GRO_MAX_SIZE;
+	dev->tso_ipv6_max_size = GSO_MAX_SIZE;
+
 	dev->upper_level = 1;
 	dev->lower_level = 1;
 #ifdef CONFIG_LOCKDEP
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e476403231f00053e1a261f31a8760325c75c941..4cefa07195ba3b67e7b724194b5d729d395ba466 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1027,6 +1027,7 @@  static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SEGS */
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_TSO_IPV6_MAX_SIZE */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
@@ -1730,6 +1731,7 @@  static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) ||
 	    nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) ||
 	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
+	    nla_put_u32(skb, IFLA_TSO_IPV6_MAX_SIZE, dev->tso_ipv6_max_size) ||
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
@@ -1883,6 +1885,7 @@  static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
 	[IFLA_GRO_MAX_SIZE]	= { .type = NLA_U32 },
+	[IFLA_TSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 6218f93f5c1a92b5765bc19dfb9d7583c3b9369b..79b9d399cd297a1f79dca5ce89762800c38ed4a8 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -348,6 +348,7 @@  enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };