[v2,2/2] gro: optimise redundant parsing of packets

Message ID	20230222151236.GB12658@debian (mailing list archive)
State	Deferred
Delegated to:	Netdev Maintainers
Headers	Return-Path: <netdev-owner@vger.kernel.org>
	Date: Wed, 22 Feb 2023 16:12:38 +0100
	From: Richard Gobert <richardbgobert@gmail.com>
	To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	        pabeni@redhat.com, dsahern@kernel.org, alexanderduyck@fb.com,
	        lixiaoyan@google.com, steffen.klassert@secunet.com,
	        lucien.xin@gmail.com, ye.xingchen@zte.com.cn, iwienand@redhat.com,
	        leon@kernel.org, netdev@vger.kernel.org,
	        linux-kernel@vger.kernel.org
	Subject: [PATCH v2 2/2] gro: optimise redundant parsing of packets
	Message-ID: <20230222151236.GB12658@debian>
	References: <20230222145917.GA12590@debian>
	MIME-Version: 1.0
	Content-Type: text/plain; charset=us-ascii
	Content-Disposition: inline
	In-Reply-To: <20230222145917.GA12590@debian>
	User-Agent: Mutt/1.10.1 (2018-07-13)
	Precedence: bulk
Series	gro: optimise redundant parsing of packets ([v2,0/2] gro: optimise redundant parsing of packets; [v2,1/2] gro: decrease size of CB; [v2,2/2] gro: optimise redundant parsing of packets)

Context	Check	Description
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/subject_prefix	warning	Target tree name not specified in the subject
netdev/cover_letter	success	Series has a cover letter
netdev/patch_count	success	Link
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 20 this patch: 20
netdev/cc_maintainers	success	CCed 9 of 9 maintainers
netdev/build_clang	success	Errors and warnings before: 0 this patch: 0
netdev/module_param	success	Was 0 now: 0
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 20 this patch: 20
netdev/checkpatch	warning	WARNING: line length of 89 exceeds 80 columns
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/source_inline	success	Was 0 now: 0

Commit Message

Richard Gobert Feb. 22, 2023, 3:12 p.m. UTC

Currently the IPv6 extension headers are parsed twice: first in
ipv6_gro_receive, and then again in ipv6_gro_complete.

By using the new ->transport_proto field, and also storing the size of the
network header, we can avoid parsing extension headers a second time in
ipv6_gro_complete. This saves multiple memory dereferences and conditional
checks inside ipv6_exthdrs_len for IPv6 packets carrying any number of
extension headers.
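
As an illustration of the caching idea described above, here is a minimal
user-space sketch. It is not kernel code: struct gro_cache, walk_exthdrs(),
model_receive() and model_complete() are hypothetical names, and only three
extension header types are recognised. It assumes the standard IPv6 encoding,
where each extension header starts with a Next Header byte followed by a
length byte counted in 8-byte units beyond the first 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define IPV6_HDR_LEN 40
#define NH_HOPOPTS    0
#define NH_TCP        6
#define NH_ROUTING   43
#define NH_DSTOPTS   60

/* Stands in for the two IPv6-related fields the patch adds to napi_gro_cb. */
struct gro_cache {
	uint16_t network_len;     /* IPv6 header plus extension headers */
	uint8_t  transport_proto; /* first non-extension Next Header value */
};

static int is_exthdr(uint8_t nh)
{
	return nh == NH_HOPOPTS || nh == NH_ROUTING || nh == NH_DSTOPTS;
}

/*
 * Walk the extension header chain once. Returns the total network header
 * length, or 0 if the packet is truncated. Every iteration costs two loads
 * and a branch, which is the work that would otherwise be repeated.
 */
static size_t walk_exthdrs(const uint8_t *pkt, size_t len, uint8_t *proto)
{
	size_t off = IPV6_HDR_LEN;
	uint8_t nh;

	if (len < IPV6_HDR_LEN)
		return 0;
	nh = pkt[6]; /* Next Header field of the fixed IPv6 header */

	while (is_exthdr(nh)) {
		size_t hlen;

		if (off + 2 > len)
			return 0;
		/* Length byte counts 8-byte units beyond the first 8 bytes. */
		hlen = ((size_t)pkt[off + 1] + 1) * 8;
		if (off + hlen > len)
			return 0;
		nh = pkt[off];
		off += hlen;
	}
	*proto = nh;
	return off;
}

/* "Receive" stage: parse once and remember the result. */
static int model_receive(const uint8_t *pkt, size_t len, struct gro_cache *cb)
{
	uint8_t proto;
	size_t nlen = walk_exthdrs(pkt, len, &proto);

	if (!nlen)
		return -1;
	cb->network_len = (uint16_t)nlen;
	cb->transport_proto = proto;
	return 0;
}

/* "Complete" stage: advance the offset from the cache, with no second walk. */
static size_t model_complete(const struct gro_cache *cb, size_t nhoff)
{
	assert(cb->network_len >= IPV6_HDR_LEN);
	return nhoff + cb->network_len;
}
```

In this model the complete stage is a single addition regardless of how many
extension headers the packet carries, which is the saving the paragraph above
refers to.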

The implementation had to handle both inner and outer layers in case of
encapsulation (as they can't use the same field).
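
A rough sketch of why the inner and outer layers cannot share one cached
field follows. It is a simplified model, not the kernel implementation:
struct model_cb and the receive_*() helpers are hypothetical, and the only
thing borrowed from the patch is the idea of writing the cache while the
encapsulation mark is still clear.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one cache slot per packet, as in a per-skb CB. */
struct model_cb {
	int      encap_mark;  /* set once a tunnel header has been seen */
	uint16_t network_len; /* single slot shared by every layer */
};

/* Unguarded: every IPv6 layer writes the slot, so the inner layer wins. */
static void receive_unguarded(struct model_cb *cb, uint16_t nlen)
{
	cb->network_len = nlen;
}

/* Guarded: only a layer seen before any tunnel header writes the slot. */
static void receive_guarded(struct model_cb *cb, uint16_t nlen)
{
	if (!cb->encap_mark)
		cb->network_len = nlen;
}

/*
 * Simulate an outer IPv6 header, a tunnel, then an inner IPv6 header,
 * and return what the cache holds when the outer layer reads it back.
 */
static uint16_t run(void (*receive)(struct model_cb *, uint16_t),
		    uint16_t outer_len, uint16_t inner_len)
{
	struct model_cb cb = { 0 };

	receive(&cb, outer_len); /* outer header */
	cb.encap_mark = 1;       /* tunnel handler marks encapsulation */
	receive(&cb, inner_len); /* inner header */
	return cb.network_len;
}
```

With an outer length of 64 and an inner length of 40, the unguarded variant
leaves 40 in the slot, so the outer layer would read the wrong length; the
guarded variant keeps 64.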

Performance tests of a TCP stream over IPv6 with a varying number of extension
headers show a throughput improvement of ~0.7%.

In addition, I fixed a potential existing problem:
 - The call to skb_set_inner_network_header at the beginning of
   ipv6_gro_complete calculates inner_network_header based on skb->data,
   setting it to point to the beginning of the IP header.
 - If the packet is going to be handled by BIG TCP, the code block that follows
   shifts the packet header, and skb->data changes as well.

When the two flows are combined, inner_network_header will point to the wrong
place.

The fix is to place the whole encapsulation branch after the BIG TCP code block.
This way, inner_network_header is calculated with a correct value of skb->data.
Also, by arranging the code that way, the optimisation does not add an additional
branch.
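
The ordering problem described above can be sketched in user space. This is
an illustrative model only: struct model_skb and its helpers are hypothetical
stand-ins, the 40-byte IPv6 header and the starting offset of 64 are
arbitrary assumptions, and whether both paths can be hit for the same packet
in practice is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the few sk_buff fields involved. */
struct model_skb {
	uint8_t buf[256];
	size_t  data;                 /* start of packet data within buf */
	size_t  inner_network_header; /* buffer-relative, like skb offsets */
};

/* Record the header position relative to the current data pointer. */
static void set_inner_network_header(struct model_skb *skb, size_t nhoff)
{
	skb->inner_network_header = skb->data + nhoff;
}

/*
 * Model of the header shift: the headers move hoplen bytes towards the
 * start of the buffer and data moves with them.
 */
static void shift_headers(struct model_skb *skb, size_t hdr_bytes,
			  size_t hoplen)
{
	memmove(&skb->buf[skb->data - hoplen], &skb->buf[skb->data],
		hdr_bytes);
	skb->data -= hoplen;
}

/* Old order: record the offset, then shift. Returns the resulting error. */
static size_t offset_error_old_order(size_t nhoff, size_t hoplen)
{
	struct model_skb skb = { .data = 64 };

	set_inner_network_header(&skb, nhoff);
	shift_headers(&skb, nhoff + 40, hoplen);
	return skb.inner_network_header - (skb.data + nhoff);
}

/* New order: shift first, then record the offset from the updated data. */
static size_t offset_error_new_order(size_t nhoff, size_t hoplen)
{
	struct model_skb skb = { .data = 64 };

	shift_headers(&skb, nhoff + 40, hoplen);
	set_inner_network_header(&skb, nhoff);
	return skb.inner_network_header - (skb.data + nhoff);
}
```

In this model, with a 14-byte link header and an 8-byte shift, the old order
leaves the recorded offset 8 bytes past the real header, while the new order
leaves no error.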

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 include/net/gro.h      |  9 +++++++++
 net/ethernet/eth.c     | 14 +++++++++++---
 net/ipv6/ip6_offload.c | 20 +++++++++++++++-----
 3 files changed, 35 insertions(+), 8 deletions(-)

Comments

Willem de Bruijn Feb. 22, 2023, 3:27 p.m. UTC | #1

Richard Gobert wrote:
> Currently the IPv6 extension headers are parsed twice: first in
> ipv6_gro_receive, and then again in ipv6_gro_complete.
> 
> By using the new ->transport_proto field, and also storing the size of the
> network header, we can avoid parsing extension headers a second time in
> ipv6_gro_complete (which saves multiple memory dereferences and conditional
> checks inside ipv6_exthdrs_len for a varying amount of extension headers in IPv6
> packets).
> 
> The implementation had to handle both inner and outer layers in case of
> encapsulation (as they can't use the same field).
> 
> Performance tests for TCP stream over IPv6 with a varying amount of extension
> headers demonstrate throughput improvement of ~0.7%.
> 
> In addition, I fixed a potential existing problem:
>  - The call to skb_set_inner_network_header at the beginning of
>    ipv6_gro_complete calculates inner_network_header based on skb->data by
>    calling skb_set_inner_network_header, and setting it to point to the beginning
>    of the ip header.
>  - If a packet is going to be handled by BIG TCP, the following code block is
>    going to shift the packet header, and skb->data is going to be changed as
>    well. 
> 
> When the two flows are combined, inner_network_header will point to the wrong
> place.
> 
> The fix is to place the whole encapsulation branch after the BIG TCP code block.

This should be a separate fix patch?

> This way, inner_network_header is calculated with a correct value of skb->data.
> Also, by arranging the code that way, the optimisation does not add an additional
> branch.
> 
> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> ---
>  include/net/gro.h      |  9 +++++++++
>  net/ethernet/eth.c     | 14 +++++++++++---
>  net/ipv6/ip6_offload.c | 20 +++++++++++++++-----
>  3 files changed, 35 insertions(+), 8 deletions(-)
> 
> diff --git a/include/net/gro.h b/include/net/gro.h
> index 7b47dd6ce94f..35f60ea99f6c 100644
> --- a/include/net/gro.h
> +++ b/include/net/gro.h
> @@ -86,6 +86,15 @@ struct napi_gro_cb {
>  
>  	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
>  	__wsum	csum;
> +
> +	/* Used in ipv6_gro_receive() */
> +	u16	network_len;
> +
> +	/* Used in eth_gro_receive() */
> +	__be16	network_proto;
> +

Why also cache eth->h_proto? That is not mentioned in the commit message.

> +	/* Used in ipv6_gro_receive() */
> +	u8	transport_proto;

Eric Dumazet Feb. 22, 2023, 3:41 p.m. UTC | #2

On Wed, Feb 22, 2023 at 4:13 PM Richard Gobert <richardbgobert@gmail.com> wrote:
>
> Currently the IPv6 extension headers are parsed twice: first in
> ipv6_gro_receive, and then again in ipv6_gro_complete.
>
> By using the new ->transport_proto field, and also storing the size of the
> network header, we can avoid parsing extension headers a second time in
> ipv6_gro_complete (which saves multiple memory dereferences and conditional
> checks inside ipv6_exthdrs_len for a varying amount of extension headers in IPv6
> packets).
>
> The implementation had to handle both inner and outer layers in case of
> encapsulation (as they can't use the same field).
>
> Performance tests for TCP stream over IPv6 with a varying amount of extension
> headers demonstrate throughput improvement of ~0.7%.
>
> In addition, I fixed a potential existing problem:
>  - The call to skb_set_inner_network_header at the beginning of
>    ipv6_gro_complete calculates inner_network_header based on skb->data by
>    calling skb_set_inner_network_header, and setting it to point to the beginning
>    of the ip header.
>  - If a packet is going to be handled by BIG TCP, the following code block is
>    going to shift the packet header, and skb->data is going to be changed as
>    well.
>
> When the two flows are combined, inner_network_header will point to the wrong
> place.

net-next is closed.

If you think a fix is needed, please send a stand-alone and minimal
patch so that we can discuss its merit.

Note:

BIG TCP only supports native IPv6, not encapsulated traffic,
so we should not bother with inner_network_header yet.

Richard Gobert Feb. 23, 2023, 7:12 p.m. UTC | #3

> On Wed, Feb 22, 2023 at 4:13 PM Richard Gobert <richardbgobert@gmail.com> wrote:
> >
> > Currently the IPv6 extension headers are parsed twice: first in
> > ipv6_gro_receive, and then again in ipv6_gro_complete.
> >
> > By using the new ->transport_proto field, and also storing the size of the
> > network header, we can avoid parsing extension headers a second time in
> > ipv6_gro_complete (which saves multiple memory dereferences and conditional
> > checks inside ipv6_exthdrs_len for a varying amount of extension headers in IPv6
> > packets).
> >
> > The implementation had to handle both inner and outer layers in case of
> > encapsulation (as they can't use the same field).
> >
> > Performance tests for TCP stream over IPv6 with a varying amount of extension
> > headers demonstrate throughput improvement of ~0.7%.
> >
> > In addition, I fixed a potential existing problem:
> >  - The call to skb_set_inner_network_header at the beginning of
> >    ipv6_gro_complete calculates inner_network_header based on skb->data by
> >    calling skb_set_inner_network_header, and setting it to point to the beginning
> >    of the ip header.
> >  - If a packet is going to be handled by BIG TCP, the following code block is
> >    going to shift the packet header, and skb->data is going to be changed as
> >    well.
> >
> > When the two flows are combined, inner_network_header will point to the wrong
> > place.
> 
> net-next is closed.
> 
> If you think a fix is needed, please send a stand-alone and minimal
> patch so that we can discuss its merit.

I'll repost when net-next opens again.
Thanks.

> 
> Note :
> 
> BIG TCP only supports native IPv6, not encapsulated traffic,
> so we should not bother with inner_network_header yet.

diff --git a/include/net/gro.h b/include/net/gro.h
index 7b47dd6ce94f..35f60ea99f6c 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -86,6 +86,15 @@  struct napi_gro_cb {
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
+
+	/* Used in ipv6_gro_receive() */
+	u16	network_len;
+
+	/* Used in eth_gro_receive() */
+	__be16	network_proto;
+
+	/* Used in ipv6_gro_receive() */
+	u8	transport_proto;
 };
 
 #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 2edc8b796a4e..c2b77d9401e4 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -439,6 +439,9 @@  struct sk_buff *eth_gro_receive(struct list_head *head, struct sk_buff *skb)
 		goto out;
 	}
 
+	if (!NAPI_GRO_CB(skb)->encap_mark)
+		NAPI_GRO_CB(skb)->network_proto = type;
+
 	skb_gro_pull(skb, sizeof(*eh));
 	skb_gro_postpull_rcsum(skb, eh, sizeof(*eh));
 
@@ -455,13 +458,18 @@  EXPORT_SYMBOL(eth_gro_receive);
 
 int eth_gro_complete(struct sk_buff *skb, int nhoff)
 {
-	struct ethhdr *eh = (struct ethhdr *)(skb->data + nhoff);
-	__be16 type = eh->h_proto;
 	struct packet_offload *ptype;
+	struct ethhdr *eh;
 	int err = -ENOSYS;
+	__be16 type;
 
-	if (skb->encapsulation)
+	if (skb->encapsulation) {
+		eh = (struct ethhdr *)(skb->data + nhoff);
 		skb_set_inner_mac_header(skb, nhoff);
+		type = eh->h_proto;
+	} else {
+		type = NAPI_GRO_CB(skb)->network_proto;
+	}
 
 	ptype = gro_find_complete_by_type(type);
 	if (ptype != NULL)
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 00dc2e3b0184..6e3a923ad573 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -232,6 +232,11 @@  INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	flush--;
 	nlen = skb_network_header_len(skb);
 
+	if (!NAPI_GRO_CB(skb)->encap_mark) {
+		NAPI_GRO_CB(skb)->transport_proto = proto;
+		NAPI_GRO_CB(skb)->network_len = nlen;
+	}
+
 	list_for_each_entry(p, head, list) {
 		const struct ipv6hdr *iph2;
 		__be32 first_word; /* <Version:4><Traffic_Class:8><Flow_Label:20> */
@@ -324,10 +329,6 @@  INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 	int err = -ENOSYS;
 	u32 payload_len;
 
-	if (skb->encapsulation) {
-		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
-		skb_set_inner_network_header(skb, nhoff);
-	}
 
 	payload_len = skb->len - nhoff - sizeof(*iph);
 	if (unlikely(payload_len > IPV6_MAXPLEN)) {
@@ -341,6 +342,7 @@  INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 		skb->len += hoplen;
 		skb->mac_header -= hoplen;
 		skb->network_header -= hoplen;
+		NAPI_GRO_CB(skb)->network_len += hoplen;
 		iph = (struct ipv6hdr *)(skb->data + nhoff);
 		hop_jumbo = (struct hop_jumbo_hdr *)(iph + 1);
 
@@ -358,7 +360,15 @@  INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 		iph->payload_len = htons(payload_len);
 	}
 
-	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
+	if (skb->encapsulation) {
+		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
+		skb_set_inner_network_header(skb, nhoff);
+		nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
+	} else {
+		ops = rcu_dereference(inet6_offloads[NAPI_GRO_CB(skb)->transport_proto]);
+		nhoff += NAPI_GRO_CB(skb)->network_len;
+	}
+
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
 		goto out;
