diff mbox series

[net] net: gro: do not keep too many GRO packets in napi->rx_list

Message ID 20210204213146.4192368-1-eric.dumazet@gmail.com (mailing list archive)
State Accepted
Commit 8dc1c444df193701910f5e80b5d4caaf705a8fb0
Delegated to: Netdev Maintainers
Series [net] net: gro: do not keep too many GRO packets in napi->rx_list

Checks

Context Check Description
netdev/cover_letter success
netdev/fixes_present success
netdev/patch_count success
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success
netdev/cc_maintainers warning 7 maintainers not CCed: andriin@fb.com daniel@iogearbox.net alobakin@pm.me bjorn@kernel.org ap420073@gmail.com ast@kernel.org xiyou.wangcong@gmail.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 10 this patch: 10
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success
netdev/checkpatch warning WARNING: Non-standard signature: Bisected-by:; WARNING: line length of 83 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 10 this patch: 10
netdev/header_inline success
netdev/stable success Stable not CCed

Commit Message

Eric Dumazet Feb. 4, 2021, 9:31 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
listified RX cooperation") had the unfortunate effect of adding
latencies in common workloads.

Before the patch, GRO packets were immediately passed to
upper stacks.

After the patch, we can accumulate quite a lot of GRO
packets (depending on NAPI budget).

My fix is to count in napi->rx_count the number of segments
instead of the number of logical packets.

Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and listified RX cooperation")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: John Sperbeck <jsperbeck@google.com>
Tested-by: Jian Yang <jianyang@google.com>
Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
Cc: Alexander Lobakin <alobakin@dlink.ru>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Edward Cree <ecree@solarflare.com>
---
 net/core/dev.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

Comments

Saeed Mahameed Feb. 4, 2021, 10:14 p.m. UTC | #1
On Thu, 2021-02-04 at 13:31 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
> listified RX cooperation") had the unfortunate effect of adding
> latencies in common workloads.
> 
> Before the patch, GRO packets were immediately passed to
> upper stacks.
> 
> After the patch, we can accumulate quite a lot of GRO
> packets (depdending on NAPI budget).
> 

Why NAPI budget? Looking at the code, it seems to be more related to
MAX_GRO_SKBS * gro_normal_batch, since we are counting GRO SKBs as 1.

But maybe I am missing some information about the actual issue you are
hitting.

> My fix is counting in napi->rx_count number of segments
> instead of number of logical packets.
> 
> Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and
> listified RX cooperation")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Bisected-by: John Sperbeck <jsperbeck@google.com>
> Tested-by: Jian Yang <jianyang@google.com>
> Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
> Cc: Alexander Lobakin <alobakin@dlink.ru>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Cc: Edward Cree <ecree@solarflare.com>
> ---
>  net/core/dev.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index
> a979b86dbacda9dfe31dd8b269024f7f0f5a8ef1..449b45b843d40ece7dd1e2ed6a5
> 996ee1db9f591 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -5735,10 +5735,11 @@ static void gro_normal_list(struct
> napi_struct *napi)
>  /* Queue one GRO_NORMAL SKB up for list processing. If batch size
> exceeded,
>   * pass the whole batch up to the stack.
>   */
> -static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> *skb)
> +static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> *skb, int segs)
>  {
>         list_add_tail(&skb->list, &napi->rx_list);
> -       if (++napi->rx_count >= gro_normal_batch)
> +       napi->rx_count += segs;
> +       if (napi->rx_count >= gro_normal_batch)
>                 gro_normal_list(napi);
>  }
>  
> @@ -5777,7 +5778,7 @@ static int napi_gro_complete(struct napi_struct
> *napi, struct sk_buff *skb)
>         }
>  
>  out:
> -       gro_normal_one(napi, skb);
> +       gro_normal_one(napi, skb, NAPI_GRO_CB(skb)->count);

Seems correct to me,

Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Eric Dumazet Feb. 4, 2021, 10:44 p.m. UTC | #2
On Thu, Feb 4, 2021 at 11:14 PM Saeed Mahameed <saeed@kernel.org> wrote:
>
> On Thu, 2021-02-04 at 13:31 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
> > listified RX cooperation") had the unfortunate effect of adding
> > latencies in common workloads.
> >
> > Before the patch, GRO packets were immediately passed to
> > upper stacks.
> >
> > After the patch, we can accumulate quite a lot of GRO
> > packets (depdending on NAPI budget).
> >
>
> Why napi budget ? looking at the code it seems to be more related to
> MAX_GRO_SKBS * gro_normal_batch, since we are counting GRO SKBs as 1


Simply because we call gro_normal_list() from napi_poll().

So we flush the napi rx_list every 64 packets under stress (assuming the
NIC driver uses NAPI_POLL_WEIGHT), or more often via napi_complete_done()
if the budget was not exhausted.

GRO has always been able to keep MAX_GRO_SKBS in its layer, and no recent
patch has changed this part.


>
>
> but maybe i am missing some information about the actual issue you are
> hitting.


Well, the issue is precisely described in the changelog.

>
>
> > My fix is counting in napi->rx_count number of segments
> > instead of number of logical packets.
> >
> > Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and
> > listified RX cooperation")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Bisected-by: John Sperbeck <jsperbeck@google.com>
> > Tested-by: Jian Yang <jianyang@google.com>
> > Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
> > Cc: Alexander Lobakin <alobakin@dlink.ru>
> > Cc: Saeed Mahameed <saeedm@mellanox.com>
> > Cc: Edward Cree <ecree@solarflare.com>
> > ---
> >  net/core/dev.c | 11 ++++++-----
> >  1 file changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index
> > a979b86dbacda9dfe31dd8b269024f7f0f5a8ef1..449b45b843d40ece7dd1e2ed6a5
> > 996ee1db9f591 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -5735,10 +5735,11 @@ static void gro_normal_list(struct
> > napi_struct *napi)
> >  /* Queue one GRO_NORMAL SKB up for list processing. If batch size
> > exceeded,
> >   * pass the whole batch up to the stack.
> >   */
> > -static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> > *skb)
> > +static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> > *skb, int segs)
> >  {
> >         list_add_tail(&skb->list, &napi->rx_list);
> > -       if (++napi->rx_count >= gro_normal_batch)
> > +       napi->rx_count += segs;
> > +       if (napi->rx_count >= gro_normal_batch)
> >                 gro_normal_list(napi);
> >  }
> >
> > @@ -5777,7 +5778,7 @@ static int napi_gro_complete(struct napi_struct
> > *napi, struct sk_buff *skb)
> >         }
> >
> >  out:
> > -       gro_normal_one(napi, skb);
> > +       gro_normal_one(napi, skb, NAPI_GRO_CB(skb)->count);
>
> Seems correct to me,
>
> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
>
>
Edward Cree Feb. 5, 2021, 10:49 a.m. UTC | #3
On 04/02/2021 21:31, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
> listified RX cooperation") had the unfortunate effect of adding
> latencies in common workloads.
> 
> Before the patch, GRO packets were immediately passed to
> upper stacks.
> 
> After the patch, we can accumulate quite a lot of GRO
> packets (depdending on NAPI budget).
> 
> My fix is counting in napi->rx_count number of segments
> instead of number of logical packets.
> 
> Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and listified RX cooperation")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Bisected-by: John Sperbeck <jsperbeck@google.com>
> Tested-by: Jian Yang <jianyang@google.com>
> Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
> Cc: Alexander Lobakin <alobakin@dlink.ru>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Cc: Edward Cree <ecree@solarflare.com>

Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Alexander Lobakin Feb. 5, 2021, 1:03 p.m. UTC | #4
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 4 Feb 2021 23:44:17 +0100

> On Thu, Feb 4, 2021 at 11:14 PM Saeed Mahameed <saeed@kernel.org> wrote:
> >
> > On Thu, 2021-02-04 at 13:31 -0800, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
> > > listified RX cooperation") had the unfortunate effect of adding
> > > latencies in common workloads.
> > >
> > > Before the patch, GRO packets were immediately passed to
> > > upper stacks.
> > >
> > > After the patch, we can accumulate quite a lot of GRO
> > > packets (depdending on NAPI budget).
> > >
> >
> > Why napi budget ? looking at the code it seems to be more related to
> > MAX_GRO_SKBS * gro_normal_batch, since we are counting GRO SKBs as 1
>
>
> Simply because we call gro_normal_list() from napi_poll(),
>
> So we flush the napi rx_list every 64 packets under stress.(assuming
> NIC driver uses NAPI_POLL_WEIGHT),
> or more often if napi_complete_done() is called if the budget was not exhausted.

Saeed,

Eric means that if we have e.g. 8 GRO packets with 8 segs each, then
rx_list will be flushed only after 64 ingress frames have been processed.

> GRO always has been able to keep MAX_GRO_SKBS in its layer, but no recent patch
> has changed this part.
>
>
> >
> >
> > but maybe i am missing some information about the actual issue you are
> > hitting.
>
>
> Well, the issue is precisely described in the changelog.
>
> >
> >
> > > My fix is counting in napi->rx_count number of segments
> > > instead of number of logical packets.
> > >
> > > Fixes: c80794323e82 ("net: Fix packet reordering caused by GRO and
> > > listified RX cooperation")
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > Bisected-by: John Sperbeck <jsperbeck@google.com>
> > > Tested-by: Jian Yang <jianyang@google.com>
> > > Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
> > > Cc: Alexander Lobakin <alobakin@dlink.ru>

It's strange why mailmap didn't pick up my active email at pm.me.

Anyway, this fix is correct for me. It restores Edward's original
logic, but without spurious out-of-order deliveries.
Moreover, the pre-patch behaviour can easily be restored by increasing
net.core.gro_normal_batch if needed.

Thanks!

Reviewed-by: Alexander Lobakin <alobakin@pm.me>

> > > Cc: Saeed Mahameed <saeedm@mellanox.com>
> > > Cc: Edward Cree <ecree@solarflare.com>
> > > ---
> > >  net/core/dev.c | 11 ++++++-----
> > >  1 file changed, 6 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index
> > > a979b86dbacda9dfe31dd8b269024f7f0f5a8ef1..449b45b843d40ece7dd1e2ed6a5
> > > 996ee1db9f591 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -5735,10 +5735,11 @@ static void gro_normal_list(struct
> > > napi_struct *napi)
> > >  /* Queue one GRO_NORMAL SKB up for list processing. If batch size
> > > exceeded,
> > >   * pass the whole batch up to the stack.
> > >   */
> > > -static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> > > *skb)
> > > +static void gro_normal_one(struct napi_struct *napi, struct sk_buff
> > > *skb, int segs)
> > >  {
> > >         list_add_tail(&skb->list, &napi->rx_list);
> > > -       if (++napi->rx_count >= gro_normal_batch)
> > > +       napi->rx_count += segs;
> > > +       if (napi->rx_count >= gro_normal_batch)
> > >                 gro_normal_list(napi);
> > >  }
> > >
> > > @@ -5777,7 +5778,7 @@ static int napi_gro_complete(struct napi_struct
> > > *napi, struct sk_buff *skb)
> > >         }
> > >
> > >  out:
> > > -       gro_normal_one(napi, skb);
> > > +       gro_normal_one(napi, skb, NAPI_GRO_CB(skb)->count);
> >
> > Seems correct to me,
> >
> > Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>

Al
Eric Dumazet Feb. 5, 2021, 2:10 p.m. UTC | #5
On Fri, Feb 5, 2021 at 2:03 PM Alexander Lobakin <alobakin@pm.me> wrote:
>

>
> It's strange why mailmap didn't pick up my active email at pm.me.

I took the signatures from c80794323e82; I CCed all the people involved
in that recent patch.

It is very rare that I use scripts/get_maintainer.pl, since it tends to
be noisy.

>
> Anyways, this fix is correct for me. It restores the original Edward's
> logics, but without spurious out-of-order deliveries.
> Moreover, the pre-patch behaviour can easily be achieved by increasing
> net.core.gro_normal_batch if needed.
>
> Thanks!
>
> Reviewed-by: Alexander Lobakin <alobakin@pm.me>
>

Thanks.
patchwork-bot+netdevbpf@kernel.org Feb. 6, 2021, 3:30 a.m. UTC | #6
Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Thu,  4 Feb 2021 13:31:46 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Commit c80794323e82 ("net: Fix packet reordering caused by GRO and
> listified RX cooperation") had the unfortunate effect of adding
> latencies in common workloads.
> 
> Before the patch, GRO packets were immediately passed to
> upper stacks.
> 
> [...]

Here is the summary with links:
  - [net] net: gro: do not keep too many GRO packets in napi->rx_list
    https://git.kernel.org/netdev/net/c/8dc1c444df19

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index a979b86dbacda9dfe31dd8b269024f7f0f5a8ef1..449b45b843d40ece7dd1e2ed6a5996ee1db9f591 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5735,10 +5735,11 @@  static void gro_normal_list(struct napi_struct *napi)
 /* Queue one GRO_NORMAL SKB up for list processing. If batch size exceeded,
  * pass the whole batch up to the stack.
  */
-static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb)
+static void gro_normal_one(struct napi_struct *napi, struct sk_buff *skb, int segs)
 {
 	list_add_tail(&skb->list, &napi->rx_list);
-	if (++napi->rx_count >= gro_normal_batch)
+	napi->rx_count += segs;
+	if (napi->rx_count >= gro_normal_batch)
 		gro_normal_list(napi);
 }
 
@@ -5777,7 +5778,7 @@  static int napi_gro_complete(struct napi_struct *napi, struct sk_buff *skb)
 	}
 
 out:
-	gro_normal_one(napi, skb);
+	gro_normal_one(napi, skb, NAPI_GRO_CB(skb)->count);
 	return NET_RX_SUCCESS;
 }
 
@@ -6067,7 +6068,7 @@  static gro_result_t napi_skb_finish(struct napi_struct *napi,
 {
 	switch (ret) {
 	case GRO_NORMAL:
-		gro_normal_one(napi, skb);
+		gro_normal_one(napi, skb, 1);
 		break;
 
 	case GRO_DROP:
@@ -6155,7 +6156,7 @@  static gro_result_t napi_frags_finish(struct napi_struct *napi,
 		__skb_push(skb, ETH_HLEN);
 		skb->protocol = eth_type_trans(skb, skb->dev);
 		if (ret == GRO_NORMAL)
-			gro_normal_one(napi, skb);
+			gro_normal_one(napi, skb, 1);
 		break;
 
 	case GRO_DROP: