From patchwork Thu Mar 3 18:16:06 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Dumazet
X-Patchwork-Id: 12767899
X-Patchwork-Delegate: kuba@kernel.org
From: Eric Dumazet
To: "David S . Miller", Jakub Kicinski
Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck, Eric Dumazet, Tariq Toukan
Subject: [PATCH v2 net-next 13/14] mlx4: support BIG TCP packets
Date: Thu, 3 Mar 2022 10:16:06 -0800
Message-Id: <20220303181607.1094358-14-eric.dumazet@gmail.com>
X-Mailer: git-send-email 2.35.1.616.g0bdcbb4464-goog
In-Reply-To: <20220303181607.1094358-1-eric.dumazet@gmail.com>
References: <20220303181607.1094358-1-eric.dumazet@gmail.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Eric Dumazet

mlx4 supports LSOv2 just fine.

IPv6 stack inserts a temporary Hop-by-Hop header
with JUMBO TLV for big packets.

We need to ignore the HBH header when populating TX descriptor.

Tested:

Before: (not enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_ipv6_max_size 65536 gro_ipv6_max_size 65536

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   6591.45   0.86   1.34   62.490  97.446
262144 540000

After: (enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_ipv6_max_size 185000 gro_ipv6_max_size 185000

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   8383.95   0.95   1.01   54.432  57.584
262144 540000

Signed-off-by: Eric Dumazet
Cc: Tariq Toukan
Reviewed-by: Tariq Toukan
---
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 ++
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++++++++----
 2 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c61dc7ae0c056a4dbcf24297549f6b1b5cc25d92..76cb93f5e5240c54f6f4c57e39739376206b4f34 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -3417,6 +3417,9 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	dev->min_mtu = ETH_MIN_MTU;
 	dev->max_mtu = priv->max_mtu;
 
+	/* supports LSOv2 packets, 512KB limit has been tested. */
+	netif_set_tso_ipv6_max_size(dev, 512 * 1024);
+
 	mdev->pndev[port] = dev;
 	mdev->upper[port] = NULL;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 817f4154b86d599cd593876ec83529051d95fe2f..c89b3e8094e7d8cfb11aaa6cc4ad63bf3ad5934e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include
 
 #include "mlx4_en.h"
 
@@ -635,19 +636,28 @@ static int get_real_size(const struct sk_buff *skb,
 			 struct net_device *dev,
 			 int *lso_header_size,
 			 bool *inline_ok,
-			 void **pfrag)
+			 void **pfrag,
+			 int *hopbyhop)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int real_size;
 
 	if (shinfo->gso_size) {
 		*inline_ok = false;
-		if (skb->encapsulation)
+		*hopbyhop = 0;
+		if (skb->encapsulation) {
 			*lso_header_size = (skb_inner_transport_header(skb) - skb->data) + inner_tcp_hdrlen(skb);
-		else
+		} else {
+			/* Detects large IPV6 TCP packets and prepares for removal of
+			 * HBH header that has been pushed by ip6_xmit(),
+			 * mainly so that tcpdump can dissect them.
+			 */
+			if (ipv6_has_hopopt_jumbo(skb))
+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
 			*lso_header_size = skb_transport_offset(skb) + tcp_hdrlen(skb);
+		}
 		real_size = CTRL_SIZE + shinfo->nr_frags * DS_SIZE +
-			ALIGN(*lso_header_size + 4, DS_SIZE);
+			ALIGN(*lso_header_size - *hopbyhop + 4, DS_SIZE);
 		if (unlikely(*lso_header_size != skb_headlen(skb))) {
 			/* We add a segment for the skb linear buffer only if
 			 * it contains data */
@@ -874,6 +884,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	int desc_size;
 	int real_size;
 	u32 index, bf_index;
+	struct ipv6hdr *h6;
 	__be32 op_own;
 	int lso_header_size;
 	void *fragptr = NULL;
@@ -882,6 +893,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool stop_queue;
 	bool inline_ok;
 	u8 data_offset;
+	int hopbyhop;
 	bool bf_ok;
 
 	tx_ind = skb_get_queue_mapping(skb);
@@ -891,7 +903,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto tx_drop;
 
 	real_size = get_real_size(skb, shinfo, dev, &lso_header_size,
-				  &inline_ok, &fragptr);
+				  &inline_ok, &fragptr, &hopbyhop);
 	if (unlikely(!real_size))
 		goto tx_drop_count;
 
@@ -944,7 +956,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		data = &tx_desc->data;
 		data_offset = offsetof(struct mlx4_en_tx_desc, data);
 	} else {
-		int lso_align = ALIGN(lso_header_size + 4, DS_SIZE);
+		int lso_align = ALIGN(lso_header_size - hopbyhop + 4, DS_SIZE);
 
 		data = (void *)&tx_desc->lso + lso_align;
 		data_offset = offsetof(struct mlx4_en_tx_desc, lso) + lso_align;
@@ -1009,14 +1021,31 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 			((ring->prod & ring->size) ?
 				cpu_to_be32(MLX4_EN_BIT_DESC_OWN) : 0);
 
+		lso_header_size -= hopbyhop;
 		/* Fill in the LSO prefix */
 		tx_desc->lso.mss_hdr_size = cpu_to_be32(
 			shinfo->gso_size << 16 | lso_header_size);
 
-		/* Copy headers;
-		 * note that we already verified that it is linear */
-		memcpy(tx_desc->lso.header, skb->data, lso_header_size);
+		if (unlikely(hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			memcpy(tx_desc->lso.header, skb->data, ETH_HLEN + sizeof(*h6));
+			h6 = (struct ipv6hdr *)((char *)tx_desc->lso.header + ETH_HLEN);
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else {
+			/* Copy headers;
+			 * note that we already verified that it is linear
+			 */
+			memcpy(tx_desc->lso.header, skb->data, lso_header_size);
+		}
 		ring->tso_packets++;
 
 		i = shinfo->gso_segs;
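
Illustration only, not part of the patch: the header rewrite in the last hunk can be modelled in userspace as below. This is a minimal sketch under stated assumptions. It assumes an untagged Ethernet frame, an 8-byte Hop-by-Hop header (the size of struct hop_jumbo_hdr), and a 16-byte DS_SIZE. The names strip_hbh_jumbo(), lso_prefix_size(), ALIGN_UP, NEXTHDR_OFF and the *_LEN constants are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN      14 /* untagged Ethernet header */
#define IPV6_HLEN     40 /* fixed IPv6 header */
#define HBH_JUMBO_LEN  8 /* assumed sizeof(struct hop_jumbo_hdr) */
#define DS_SIZE       16 /* assumed descriptor segment size */
#define NEXTHDR_HOP    0 /* IPv6 Next Header value: Hop-by-Hop */
#define NEXTHDR_TCP    6 /* IPv6 Next Header value: TCP */

/* Offset of the IPv6 Next Header byte, counted from the frame start. */
#define NEXTHDR_OFF (ETH_HLEN + 6)

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/*
 * Hypothetical helper mirroring the "unlikely(hopbyhop)" branch:
 * src holds [Ethernet][IPv6][HBH][TCP], dst receives [Ethernet][IPv6][TCP].
 * Returns the number of header bytes written to dst.
 */
static size_t strip_hbh_jumbo(uint8_t *dst, const uint8_t *src,
			      size_t tcp_hlen)
{
	/* Ethernet and IPv6 headers are copied unchanged. */
	memcpy(dst, src, ETH_HLEN + IPV6_HLEN);
	/* The IPv6 header must now announce TCP instead of Hop-by-Hop. */
	dst[NEXTHDR_OFF] = NEXTHDR_TCP;
	/* The TCP header is read from just past the HBH header. */
	memcpy(dst + ETH_HLEN + IPV6_HLEN,
	       src + ETH_HLEN + IPV6_HLEN + HBH_JUMBO_LEN, tcp_hlen);
	return ETH_HLEN + IPV6_HLEN + tcp_hlen;
}

/* Hypothetical helper: space reserved for the LSO prefix and headers. */
static size_t lso_prefix_size(size_t lso_header_size, size_t hopbyhop)
{
	return ALIGN_UP(lso_header_size - hopbyhop + 4, DS_SIZE);
}
```

With a 20-byte TCP header, the skb carries 82 header bytes and the descriptor receives 74, so lso_prefix_size(82, 8) and lso_prefix_size(74, 0) both come out at 80 bytes. Subtracting hopbyhop in both get_real_size() and mlx4_en_xmit() keeps the two size computations in agreement.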