Message ID | 20220509222149.1763877-14-eric.dumazet@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: BIG TCP implementation | expand |
On Mon, May 9, 2022 at 3:22 PM Eric Dumazet <eric.dumazet@gmail.com> wrote: > > From: Coco Li <lixiaoyan@google.com> > > mlx5 supports LSOv2. > > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header > with JUMBO TLV for big packets. > > We need to ignore/skip this HBH header when populating TX descriptor. > > Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet > layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only. > > v2: clear hopbyhop in mlx5e_tx_get_gso_ihs() > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y > > Signed-off-by: Coco Li <lixiaoyan@google.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com> > Cc: Saeed Mahameed <saeedm@nvidia.com> > Cc: Leon Romanovsky <leon@kernel.org> > --- > .../net/ethernet/mellanox/mlx5/core/en_main.c | 1 + > .../net/ethernet/mellanox/mlx5/core/en_tx.c | 84 +++++++++++++++---- > 2 files changed, 69 insertions(+), 16 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > index d27986869b8ba070d1a4f8bcdc7e14ab54ae984e..226825410a1aa55b5b7941a7389a78abdb800521 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c > @@ -4920,6 +4920,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) > > netdev->priv_flags |= IFF_UNICAST_FLT; > > + netif_set_tso_max_size(netdev, 512 * 1024); Apparently I forgot to amend this part on the final patch of the series. diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 226825410a1aa55b5b7941a7389a78abdb800521..bf3bca79e160124abd128ac1e9910cb2f39a39ff 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4920,7 +4920,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) netdev->priv_flags |= IFF_UNICAST_FLT; - netif_set_tso_max_size(netdev, 512 * 1024); + netif_set_tso_max_size(netdev, GSO_MAX_SIZE); mlx5e_set_netdev_dev_addr(netdev); mlx5e_ipsec_build_netdev(priv); mlx5e_ktls_build_netdev(priv);
On Mon, 9 May 2022 15:21:49 -0700 Eric Dumazet wrote: > From: Coco Li <lixiaoyan@google.com> > > mlx5 supports LSOv2. > > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header > with JUMBO TLV for big packets. > > We need to ignore/skip this HBH header when populating TX descriptor. > > Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet > layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only. > > v2: clear hopbyhop in mlx5e_tx_get_gso_ihs() > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y > > Signed-off-by: Coco Li <lixiaoyan@google.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com> > Cc: Saeed Mahameed <saeedm@nvidia.com> > Cc: Leon Romanovsky <leon@kernel.org> So we're leaving the warning for Kees to deal with? Kees is there some form of "I know what I'm doing" cast that you could sneak us under the table?
On Mon, May 9, 2022 at 6:38 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 9 May 2022 15:21:49 -0700 Eric Dumazet wrote: > > From: Coco Li <lixiaoyan@google.com> > > > > mlx5 supports LSOv2. > > > > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header > > with JUMBO TLV for big packets. > > > > We need to ignore/skip this HBH header when populating TX descriptor. > > > > Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet > > layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only. > > > > v2: clear hopbyhop in mlx5e_tx_get_gso_ihs() > > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y > > > > Signed-off-by: Coco Li <lixiaoyan@google.com> > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Reviewed-by: Tariq Toukan <tariqt@nvidia.com> > > Cc: Saeed Mahameed <saeedm@nvidia.com> > > Cc: Leon Romanovsky <leon@kernel.org> > > So we're leaving the warning for Kees to deal with? I think so. I do not see an easy way to escape this, unless perhaps add some extra obfuscation, so that gcc can not determine the memcpy() third argument at compile time. Alternative is to remove mlx5 patch from the upstream series. > > Kees is there some form of "I know what I'm doing" cast > that you could sneak us under the table?
On Mon, May 09, 2022 at 06:38:53PM -0700, Jakub Kicinski wrote: > On Mon, 9 May 2022 15:21:49 -0700 Eric Dumazet wrote: > > From: Coco Li <lixiaoyan@google.com> > > > > mlx5 supports LSOv2. > > > > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header > > with JUMBO TLV for big packets. > > > > We need to ignore/skip this HBH header when populating TX descriptor. > > > > Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet > > layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only. > > > > v2: clear hopbyhop in mlx5e_tx_get_gso_ihs() > > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y > > > > Signed-off-by: Coco Li <lixiaoyan@google.com> > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > Reviewed-by: Tariq Toukan <tariqt@nvidia.com> > > Cc: Saeed Mahameed <saeedm@nvidia.com> > > Cc: Leon Romanovsky <leon@kernel.org> > > So we're leaving the warning for Kees to deal with? > > Kees is there some form of "I know what I'm doing" cast > that you could sneak us under the table? Right now, it's switching that memcpy to __builtin_memcpy(), but I'll send a patch that'll create an unsafe_memcpy() macro that does the right things vs kasan, fortify, etc.
On Mon, May 09, 2022 at 06:38:53PM -0700, Jakub Kicinski wrote: > So we're leaving the warning for Kees to deal with? > > Kees is there some form of "I know what I'm doing" cast > that you could sneak us under the table? Okay, I've sent this[1] now. If that looks okay to you, I figure you'll land it via netdev for the coming merge window? -Kees [1] https://lore.kernel.org/netdev/20220511025301.3636666-1-keescook@chromium.org/
On Tue, 10 May 2022 19:55:16 -0700 Kees Cook wrote: > On Mon, May 09, 2022 at 06:38:53PM -0700, Jakub Kicinski wrote: > > So we're leaving the warning for Kees to deal with? > > > > Kees is there some form of "I know what I'm doing" cast > > that you could sneak us under the table? > > Okay, I've sent this[1] now. If that looks okay to you, I figure you'll > land it via netdev for the coming merge window? I was about to say "great!" but perhaps given we're adding an unsafe_ flavor of something a "it is what it is" would be a more appropriate reaction. Thank you!
On Wed, May 11, 2022 at 09:26:48AM -0700, Jakub Kicinski wrote: > On Tue, 10 May 2022 19:55:16 -0700 Kees Cook wrote: > > On Mon, May 09, 2022 at 06:38:53PM -0700, Jakub Kicinski wrote: > > > So we're leaving the warning for Kees to deal with? > > > > > > Kees is there some form of "I know what I'm doing" cast > > > that you could sneak us under the table? > > > > Okay, I've sent this[1] now. If that looks okay to you, I figure you'll > > land it via netdev for the coming merge window? > > I was about to say "great!" but perhaps given we're adding an unsafe_ > flavor of something a "it is what it is" would be a more appropriate > reaction. Heh, well, I think it's just calling a spade a spade: plain memcpy is already unsafe. The goal is for the kernel's (fortified) memcpy to be "provably" safe. :) But yeah, I get what you mean. I'm sad that I don't yet have a workable way to deal with this code pattern, but I'm getting close, I think. My random notes currently are: diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index 2dc48406cd08..595d0db4e97a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -386,6 +386,14 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb, stats->added_vlan_packets++; } else { eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs); +/* interface could take: + fas: wqe + dst: eth.inline_hdr.start + src: skb->data + bytes: attr->ihs + elements member: data + elements_count value: wqe_attr->ds_cnt_inl +*/ memcpy(eseg->inline_hdr.start, skb->data, attr->ihs); } dseg += wqe_attr->ds_cnt_inl; There's a similar case with how netlink constructs things (i.e. performing a memcpy across some of the trailing header members and then into the flex array) that may share this code pattern, and at least one patch to mlx5 I'd sent before could be refactored back into this to unsplit the memcpy there. Anyway, I'll continue to chip away at it.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index d27986869b8ba070d1a4f8bcdc7e14ab54ae984e..226825410a1aa55b5b7941a7389a78abdb800521 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4920,6 +4920,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev) netdev->priv_flags |= IFF_UNICAST_FLT; + netif_set_tso_max_size(netdev, 512 * 1024); mlx5e_set_netdev_dev_addr(netdev); mlx5e_ipsec_build_netdev(priv); mlx5e_ktls_build_netdev(priv); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index 2dc48406cd08d21ff94f665cd61ab9227f351215..b4fc45ba1b347fb9ad0f46b9c091cc45e4d3d84f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -40,6 +40,7 @@ #include "en_accel/en_accel.h" #include "en_accel/ipsec_rxtx.h" #include "en/ptp.h" +#include <net/ipv6.h> static void mlx5e_dma_unmap_wqe_err(struct mlx5e_txqsq *sq, u8 num_dma) { @@ -130,23 +131,32 @@ mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb, sq->stats->csum_none++; } +/* Returns the number of header bytes that we plan + * to inline later in the transmit descriptor + */ static inline u16 -mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb) +mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb, int *hopbyhop) { struct mlx5e_sq_stats *stats = sq->stats; u16 ihs; + *hopbyhop = 0; if (skb->encapsulation) { ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb); stats->tso_inner_packets++; stats->tso_inner_bytes += skb->len - ihs; } else { - if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) { ihs = skb_transport_offset(skb) + sizeof(struct udphdr); - else + } else { ihs = skb_transport_offset(skb) + tcp_hdrlen(skb); + if (ipv6_has_hopopt_jumbo(skb)) { + *hopbyhop = sizeof(struct hop_jumbo_hdr); + ihs -= sizeof(struct hop_jumbo_hdr); + } + } stats->tso_packets++; - stats->tso_bytes += skb->len - ihs; + stats->tso_bytes += skb->len - ihs - *hopbyhop; } return ihs; @@ -208,6 +218,7 @@ struct mlx5e_tx_attr { __be16 mss; u16 insz; u8 opcode; + u8 hopbyhop; }; struct mlx5e_tx_wqe_attr { @@ -244,14 +255,16 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb, struct mlx5e_sq_stats *stats = sq->stats; if (skb_is_gso(skb)) { - u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb); + int hopbyhop; + u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb, &hopbyhop); *attr = (struct mlx5e_tx_attr) { .opcode = MLX5_OPCODE_LSO, .mss = cpu_to_be16(skb_shinfo(skb)->gso_size), .ihs = ihs, .num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs, - .headlen = skb_headlen(skb) - ihs, + .headlen = skb_headlen(skb) - ihs - hopbyhop, + .hopbyhop = hopbyhop, }; stats->packets += skb_shinfo(skb)->gso_segs; @@ -365,7 +378,8 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb, struct mlx5_wqe_eth_seg *eseg; struct mlx5_wqe_data_seg *dseg; struct mlx5e_tx_wqe_info *wi; - + u16 ihs = attr->ihs; + struct ipv6hdr *h6; struct mlx5e_sq_stats *stats = sq->stats; int num_dma; @@ -379,15 +393,36 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb, eseg->mss = attr->mss; - if (attr->ihs) { - if (skb_vlan_tag_present(skb)) { - eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs + VLAN_HLEN); - mlx5e_insert_vlan(eseg->inline_hdr.start, skb, attr->ihs); + if (ihs) { + u8 *start = eseg->inline_hdr.start; + + if (unlikely(attr->hopbyhop)) { + /* remove the HBH header. + * Layout: [Ethernet header][IPv6 header][HBH][TCP header] + */ + if (skb_vlan_tag_present(skb)) { + mlx5e_insert_vlan(start, skb, ETH_HLEN + sizeof(*h6)); + ihs += VLAN_HLEN; + h6 = (struct ipv6hdr *)(start + sizeof(struct vlan_ethhdr)); + } else { + memcpy(start, skb->data, ETH_HLEN + sizeof(*h6)); + h6 = (struct ipv6hdr *)(start + ETH_HLEN); + } + h6->nexthdr = IPPROTO_TCP; + /* Copy the TCP header after the IPv6 one */ + memcpy(h6 + 1, + skb->data + ETH_HLEN + sizeof(*h6) + + sizeof(struct hop_jumbo_hdr), + tcp_hdrlen(skb)); + /* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */ + } else if (skb_vlan_tag_present(skb)) { + mlx5e_insert_vlan(start, skb, ihs); + ihs += VLAN_HLEN; stats->added_vlan_packets++; } else { - eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs); - memcpy(eseg->inline_hdr.start, skb->data, attr->ihs); + memcpy(start, skb->data, ihs); } + eseg->inline_hdr.sz |= cpu_to_be16(ihs); dseg += wqe_attr->ds_cnt_inl; } else if (skb_vlan_tag_present(skb)) { eseg->insert.type = cpu_to_be16(MLX5_ETH_WQE_INSERT_VLAN); @@ -398,7 +433,7 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb, } dseg += wqe_attr->ds_cnt_ids; - num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs, + num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs + attr->hopbyhop, attr->headlen, dseg); if (unlikely(num_dma < 0)) goto err_drop; @@ -918,12 +953,29 @@ void mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb, eseg->mss = attr.mss; if (attr.ihs) { - memcpy(eseg->inline_hdr.start, skb->data, attr.ihs); + if (unlikely(attr.hopbyhop)) { + struct ipv6hdr *h6; + + /* remove the HBH header. + * Layout: [Ethernet header][IPv6 header][HBH][TCP header] + */ + memcpy(eseg->inline_hdr.start, skb->data, ETH_HLEN + sizeof(*h6)); + h6 = (struct ipv6hdr *)((char *)eseg->inline_hdr.start + ETH_HLEN); + h6->nexthdr = IPPROTO_TCP; + /* Copy the TCP header after the IPv6 one */ + memcpy(h6 + 1, + skb->data + ETH_HLEN + sizeof(*h6) + + sizeof(struct hop_jumbo_hdr), + tcp_hdrlen(skb)); + /* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */ + } else { + memcpy(eseg->inline_hdr.start, skb->data, attr.ihs); + } eseg->inline_hdr.sz = cpu_to_be16(attr.ihs); dseg += wqe_attr.ds_cnt_inl; } - num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs, + num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs + attr.hopbyhop, attr.headlen, dseg); if (unlikely(num_dma < 0)) goto err_drop;