[net-next] net: Make gro complete function to return void

Message ID 20230529134430.492879-1-parav@nvidia.com (mailing list archive)
State Accepted
Commit b1f2abcf817d82b54764d1474424649feda6fe1b
Delegated to: Netdev Maintainers
Series [net-next] net: Make gro complete function to return void

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 990 this patch: 990
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 127 this patch: 127
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 997 this patch: 997
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 42 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Parav Pandit May 29, 2023, 1:44 p.m. UTC
The tcp_gro_complete() function only updates the skb fields related to GRO,
and it always returns zero. None of the three drivers that use it
checks the return value either.

Change it to return void instead, which simplifies its callers as
error handling becomes unnecessary.

Signed-off-by: Parav Pandit <parav@nvidia.com>
---
 include/net/tcp.h        | 2 +-
 net/ipv4/tcp_offload.c   | 7 +++----
 net/ipv6/tcpv6_offload.c | 3 ++-
 3 files changed, 6 insertions(+), 6 deletions(-)
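
For illustration, a hypothetical driver-side call site; the foo_* names are invented for this sketch and are not one of the three in-tree callers. It shows why the change is safe: the return value was already being ignored, so dropping it does not alter any caller logic.

#include <linux/netdevice.h>
#include <net/tcp.h>

struct foo_rx_queue {			/* hypothetical per-queue context */
	struct napi_struct napi;
};

/* Hypothetical driver GRO completion path. tcp_gro_complete() only fills
 * in GRO bookkeeping on the skb, and its old int return was always 0,
 * so callers like this already treat it as a plain statement.
 */
static void foo_gro_tcp_complete(struct foo_rx_queue *rxq, struct sk_buff *skb)
{
	tcp_gro_complete(skb);
	napi_gro_receive(&rxq->napi, skb);
}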

Comments

Simon Horman May 30, 2023, 7:45 a.m. UTC | #1
On Mon, May 29, 2023 at 04:44:30PM +0300, Parav Pandit wrote:
> The tcp_gro_complete() function only updates the skb fields related to GRO,
> and it always returns zero. None of the three drivers that use it
> checks the return value either.
> 
> Change it to return void instead, which simplifies its callers as
> error handling becomes unnecessary.
> 
> Signed-off-by: Parav Pandit <parav@nvidia.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>
David Ahern May 30, 2023, 3:25 p.m. UTC | #2
On 5/29/23 7:44 AM, Parav Pandit wrote:
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 45dda7889387..88f9b0081ee7 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -296,7 +296,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
>  	return pp;
>  }
>  
> -int tcp_gro_complete(struct sk_buff *skb)
> +void tcp_gro_complete(struct sk_buff *skb)
>  {
>  	struct tcphdr *th = tcp_hdr(skb);
>  
> @@ -311,8 +311,6 @@ int tcp_gro_complete(struct sk_buff *skb)
>  
>  	if (skb->encapsulation)
>  		skb->inner_transport_header = skb->transport_header;
> -
> -	return 0;
>  }
>  EXPORT_SYMBOL(tcp_gro_complete);

tcp_gro_complete seems fairly trivial. Any reason not to make it an
inline and avoid another function call in the datapath?

>  
> @@ -342,7 +340,8 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>  	if (NAPI_GRO_CB(skb)->is_atomic)
>  		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
>  
> -	return tcp_gro_complete(skb);
> +	tcp_gro_complete(skb);
> +	return 0;
>  }
>
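
For reference, a minimal sketch of what the suggested static inline might look like, assuming the existing body of tcp_gro_complete() (as in net/ipv4/tcp_offload.c at the time of this thread) is moved verbatim into include/net/tcp.h. Note that NAPI_GRO_CB() is defined outside tcp.h, so the header dependency would also need to be resolved; this is a sketch, not a drop-in change.

/* Sketch only: body taken from net/ipv4/tcp_offload.c; making this a
 * static inline in tcp.h also requires NAPI_GRO_CB() to be visible to
 * every includer of the header.
 */
static inline void tcp_gro_complete(struct sk_buff *skb)
{
	struct tcphdr *th = tcp_hdr(skb);

	skb->csum_start = (unsigned char *)th - skb->head;
	skb->csum_offset = offsetof(struct tcphdr, check);
	skb->ip_summed = CHECKSUM_PARTIAL;

	skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;

	if (th->cwr)
		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;

	if (skb->encapsulation)
		skb->inner_transport_header = skb->transport_header;
}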
Parav Pandit May 30, 2023, 3:39 p.m. UTC | #3
> From: David Ahern <dsahern@kernel.org>
> Sent: Tuesday, May 30, 2023 11:26 AM
> 
> On 5/29/23 7:44 AM, Parav Pandit wrote:
> > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index
> > 45dda7889387..88f9b0081ee7 100644
> > --- a/net/ipv4/tcp_offload.c
> > +++ b/net/ipv4/tcp_offload.c
> > @@ -296,7 +296,7 @@ struct sk_buff *tcp_gro_receive(struct list_head
> *head, struct sk_buff *skb)
> >  	return pp;
> >  }
> >
> > -int tcp_gro_complete(struct sk_buff *skb)
> > +void tcp_gro_complete(struct sk_buff *skb)
> >  {
> >  	struct tcphdr *th = tcp_hdr(skb);
> >
> > @@ -311,8 +311,6 @@ int tcp_gro_complete(struct sk_buff *skb)
> >
> >  	if (skb->encapsulation)
> >  		skb->inner_transport_header = skb->transport_header;
> > -
> > -	return 0;
> >  }
> >  EXPORT_SYMBOL(tcp_gro_complete);
> 
> tcp_gro_complete seems fairly trivial. Any reason not to make it an inline and
> avoid another function call in the datapath?
> 
Sounds good to me.
Making it inline should mostly improve performance, but I do not have any of the three adapters that call this API, so I cannot show perf results.

Since it is a separate change that touches performance, I prefer to do it as a follow-up patch that the bnx owners can test.
Is that ok?
Eric Dumazet May 30, 2023, 3:48 p.m. UTC | #4
On Tue, May 30, 2023 at 5:25 PM David Ahern <dsahern@kernel.org> wrote:
>
> On 5/29/23 7:44 AM, Parav Pandit wrote:
> > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> > index 45dda7889387..88f9b0081ee7 100644
> > --- a/net/ipv4/tcp_offload.c
> > +++ b/net/ipv4/tcp_offload.c
> > @@ -296,7 +296,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
> >       return pp;
> >  }
> >
> > -int tcp_gro_complete(struct sk_buff *skb)
> > +void tcp_gro_complete(struct sk_buff *skb)
> >  {
> >       struct tcphdr *th = tcp_hdr(skb);
> >
> > @@ -311,8 +311,6 @@ int tcp_gro_complete(struct sk_buff *skb)
> >
> >       if (skb->encapsulation)
> >               skb->inner_transport_header = skb->transport_header;
> > -
> > -     return 0;
> >  }
> >  EXPORT_SYMBOL(tcp_gro_complete);
>
> tcp_gro_complete seems fairly trivial. Any reason not to make it an
> inline and avoid another function call in the datapath?

Probably, although it is a regular function call, not an indirect one.

In the grand total of driver rx napi + GRO cost, saving a few cycles
per GRO completed packet is quite small.
David Ahern May 30, 2023, 3:51 p.m. UTC | #5
On 5/30/23 9:39 AM, Parav Pandit wrote:
>> From: David Ahern <dsahern@kernel.org>
>> Sent: Tuesday, May 30, 2023 11:26 AM
>>
>> On 5/29/23 7:44 AM, Parav Pandit wrote:
>>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index
>>> 45dda7889387..88f9b0081ee7 100644
>>> --- a/net/ipv4/tcp_offload.c
>>> +++ b/net/ipv4/tcp_offload.c
>>> @@ -296,7 +296,7 @@ struct sk_buff *tcp_gro_receive(struct list_head
>> *head, struct sk_buff *skb)
>>>  	return pp;
>>>  }
>>>
>>> -int tcp_gro_complete(struct sk_buff *skb)
>>> +void tcp_gro_complete(struct sk_buff *skb)
>>>  {
>>>  	struct tcphdr *th = tcp_hdr(skb);
>>>
>>> @@ -311,8 +311,6 @@ int tcp_gro_complete(struct sk_buff *skb)
>>>
>>>  	if (skb->encapsulation)
>>>  		skb->inner_transport_header = skb->transport_header;
>>> -
>>> -	return 0;
>>>  }
>>>  EXPORT_SYMBOL(tcp_gro_complete);
>>
>> tcp_gro_complete seems fairly trivial. Any reason not to make it an inline and
>> avoid another function call in the datapath?
>>
> Sounds good to me.
> Making it inline should mostly improve performance, but I do not have any of the three adapters that call this API, so I cannot show perf results.
> 
> Since it is a separate change that touches performance, I prefer to do it as a follow-up patch that the bnx owners can test.
> Is that ok?

sure.
Jakub Kicinski May 30, 2023, 7:39 p.m. UTC | #6
On Tue, 30 May 2023 17:48:22 +0200 Eric Dumazet wrote:
> > tcp_gro_complete seems fairly trivial. Any reason not to make it an
> > inline and avoid another function call in the datapath?  
> 
> Probably, although it is a regular function call, not an indirect one.
> 
> In the grand total of driver rx napi + GRO cost, saving a few cycles
> per GRO completed packet is quite small.

IOW please make sure you include the performance analysis quantifying
the win, if you want to make this a static inline. Or let us know if
the patch is good as is, I'm keeping it in pw for now.
David Ahern May 30, 2023, 10:36 p.m. UTC | #7
On 5/30/23 1:39 PM, Jakub Kicinski wrote:
> On Tue, 30 May 2023 17:48:22 +0200 Eric Dumazet wrote:
>>> tcp_gro_complete seems fairly trivial. Any reason not to make it an
>>> inline and avoid another function call in the datapath?  
>>
>> Probably, although it is a regular function call, not an indirect one.
>>
>> In the grand total of driver rx napi + GRO cost, saving a few cycles
>> per GRO completed packet is quite small.
> 
> IOW please make sure you include the performance analysis quantifying
> the win, if you want to make this a static inline. Or let us know if
> the patch is good as is, I'm keeping it in pw for now.

I am not suggesting holding up this patch; just constantly looking for
these little savings here and there to keep lowering the overhead.

100G, 1500 MTU, line rate is 8.3M pps so GRO wise that would be ~180k
fewer function calls.
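
A rough derivation of that figure, assuming the usual ~64 KB GRO skb limit and a 1448-byte TCP payload per 1500-byte MTU frame (both are assumptions, not stated above):

  100 Gbit/s / (1500 B * 8 bit)       ~= 8.3M packets/s at line rate
  64 KB GRO skb / 1448 B per segment  ~= 45 segments per completed GRO skb
  8.3M / 45                           ~= 185k gro_complete calls per second

which lines up with the ~180k figure above.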
Eric Dumazet May 31, 2023, 4:38 a.m. UTC | #8
On Wed, May 31, 2023 at 12:36 AM David Ahern <dsahern@kernel.org> wrote:
>
> On 5/30/23 1:39 PM, Jakub Kicinski wrote:
> > On Tue, 30 May 2023 17:48:22 +0200 Eric Dumazet wrote:
> >>> tcp_gro_complete seems fairly trivial. Any reason not to make it an
> >>> inline and avoid another function call in the datapath?
> >>
> >> Probably, although it is a regular function call, not an indirect one.
> >>
> >> In the grand total of driver rx napi + GRO cost, saving a few cycles
> >> per GRO completed packet is quite small.
> >
> > IOW please make sure you include the performance analysis quantifying
> > the win, if you want to make this a static inline. Or let us know if
> > the patch is good as is, I'm keeping it in pw for now.
>
> I am not suggesting holding up this patch; just constantly looking for
> these little savings here and there to keep lowering the overhead.
>
> 100G, 1500 MTU, line rate is 8.3M pps so GRO wise that would be ~180k
> fewer function calls.

Here with 4K MTU, this is called 67k per second

An __skb_put() instead of skb_put() in a driver (eg mlx5e_build_linear_skb())
would have 45x more impact, and would still be noise.
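
For context, a simplified sketch of the distinction being drawn, under a hypothetical helper name modeled on (but not identical to) __skb_put() from include/linux/skbuff.h: skb_put() is an out-of-line exported function that also checks for overrun, while __skb_put() is a static inline, so it costs no extra function call per packet.

/* skb_put_inline_sketch(): simplified stand-in for __skb_put(); the real
 * helper lives in include/linux/skbuff.h. The point is that this expands
 * inline at the call site, whereas skb_put() is an exported out-of-line
 * function that additionally checks for running past the buffer end.
 */
static inline void *skb_put_inline_sketch(struct sk_buff *skb, unsigned int len)
{
	void *tail = skb_tail_pointer(skb);	/* current end of the data area */

	skb->tail += len;			/* caller must have reserved tailroom */
	skb->len  += len;
	return tail;
}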
patchwork-bot+netdevbpf@kernel.org May 31, 2023, 9:10 a.m. UTC | #9
Hello:

This patch was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon, 29 May 2023 16:44:30 +0300 you wrote:
> The tcp_gro_complete() function only updates the skb fields related to GRO,
> and it always returns zero. None of the three drivers that use it
> checks the return value either.
> 
> Change it to return void instead, which simplifies its callers as
> error handling becomes unnecessary.
> 
> [...]

Here is the summary with links:
  - [net-next] net: Make gro complete function to return void
    https://git.kernel.org/netdev/net-next/c/b1f2abcf817d

You are awesome, thank you!
Parav Pandit May 31, 2023, 5:07 p.m. UTC | #10
> From: Eric Dumazet <edumazet@google.com>
> Sent: Wednesday, May 31, 2023 12:39 AM
> 
> On Wed, May 31, 2023 at 12:36 AM David Ahern <dsahern@kernel.org> wrote:
> >
> > On 5/30/23 1:39 PM, Jakub Kicinski wrote:
> > > On Tue, 30 May 2023 17:48:22 +0200 Eric Dumazet wrote:
> > >>> tcp_gro_complete seems fairly trivial. Any reason not to make it
> > >>> an inline and avoid another function call in the datapath?
> > >>
> > >> Probably, although it is a regular function call, not an indirect one.
> > >>
> > >> In the grand total of driver rx napi + GRO cost, saving a few
> > >> cycles per GRO completed packet is quite small.
> > >
> > > IOW please make sure you include the performance analysis
> > > quantifying the win, if you want to make this a static inline. Or
> > > let us know if the patch is good as is, I'm keeping it in pw for now.
> >
> > I am not suggesting holding up this patch; just constantly looking for
> > these little savings here and there to keep lowering the overhead.
> >
> > 100G, 1500 MTU, line rate is 8.3M pps so GRO wise that would be ~180k
> > fewer function calls.
> 
> Here with 4K MTU, this is called 67k per second
> 
> An __skb_put() instead of skb_put() in a driver (eg mlx5e_build_linear_skb())
> would have 45x more impact, and would still be noise.

Thanks, Eric, for the suggestion; I will evaluate using __skb_put() with Tariq.

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0b755988e20c..14fa716cac50 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2041,7 +2041,7 @@  INDIRECT_CALLABLE_DECLARE(int tcp4_gro_complete(struct sk_buff *skb, int thoff))
 INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb));
 INDIRECT_CALLABLE_DECLARE(int tcp6_gro_complete(struct sk_buff *skb, int thoff));
 INDIRECT_CALLABLE_DECLARE(struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb));
-int tcp_gro_complete(struct sk_buff *skb);
+void tcp_gro_complete(struct sk_buff *skb);
 
 void __tcp_v4_send_check(struct sk_buff *skb, __be32 saddr, __be32 daddr);
 
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 45dda7889387..88f9b0081ee7 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -296,7 +296,7 @@  struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb)
 	return pp;
 }
 
-int tcp_gro_complete(struct sk_buff *skb)
+void tcp_gro_complete(struct sk_buff *skb)
 {
 	struct tcphdr *th = tcp_hdr(skb);
 
@@ -311,8 +311,6 @@  int tcp_gro_complete(struct sk_buff *skb)
 
 	if (skb->encapsulation)
 		skb->inner_transport_header = skb->transport_header;
-
-	return 0;
 }
 EXPORT_SYMBOL(tcp_gro_complete);
 
@@ -342,7 +340,8 @@  INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
 	if (NAPI_GRO_CB(skb)->is_atomic)
 		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
 
-	return tcp_gro_complete(skb);
+	tcp_gro_complete(skb);
+	return 0;
 }
 
 static const struct net_offload tcpv4_offload = {
diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c
index 39db5a226855..bf0c957e4b5e 100644
--- a/net/ipv6/tcpv6_offload.c
+++ b/net/ipv6/tcpv6_offload.c
@@ -36,7 +36,8 @@  INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff)
 				  &iph->daddr, 0);
 	skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV6;
 
-	return tcp_gro_complete(skb);
+	tcp_gro_complete(skb);
+	return 0;
 }
 
 static struct sk_buff *tcp6_gso_segment(struct sk_buff *skb,