
af_packet: fix efficiency issues in packet_read_pending

Message ID 20220406114807.1803-1-lianglixue@greatwall.com.cn (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series af_packet: fix efficiency issues in packet_read_pending

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: lianglixue <lixue.liang5086@gmail.com>' != 'Signed-off-by: lianglixue <lianglixue@greatwall.com.cn>'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

lixue liang April 6, 2022, 11:48 a.m. UTC
In packet_read_pending, even if the pending_refcnt of the first CPU
is non-zero, the pending_refcnt of all CPUs is still traversed, and on
NUMA systems the long latency of cross-CPU accesses significantly
reduces packet-sending performance, especially in tpacket_destruct_skb.

With this change, the function returns as soon as a non-zero
pending_refcnt is found, without summing the pending packets of all
CPUs, which significantly reduces the traversal time.

Signed-off-by: lianglixue <lianglixue@greatwall.com.cn>
---
 net/packet/af_packet.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Eric Dumazet April 6, 2022, 3:20 p.m. UTC | #1
On Wed, Apr 6, 2022 at 4:49 AM lianglixue <lixue.liang5086@gmail.com> wrote:
>
> In packet_read_pending, even if the pending_refcnt of the first CPU
> is non-zero, the pending_refcnt of all CPUs is still traversed, and on
> NUMA systems the long latency of cross-CPU accesses significantly
> reduces packet-sending performance, especially in tpacket_destruct_skb.
>
> With this change, the function returns as soon as a non-zero
> pending_refcnt is found, without summing the pending packets of all
> CPUs, which significantly reduces the traversal time.
>

Can you describe the use case?

You are changing the slow path.

Perhaps you do not use the interface in the most efficient way.

Your patch is wrong: pending_refcnt increments and decrements can
happen on different CPUs.



> Signed-off-by: lianglixue <lianglixue@greatwall.com.cn>
> ---
>  net/packet/af_packet.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index c39c09899fd0..c04f49e44a33 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1210,17 +1210,18 @@ static void packet_dec_pending(struct packet_ring_buffer *rb)
>
>  static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
>  {
> -       unsigned int refcnt = 0;
>         int cpu;
>
>         /* We don't use pending refcount in rx_ring. */
>         if (rb->pending_refcnt == NULL)
>                 return 0;
>
> -       for_each_possible_cpu(cpu)
> -               refcnt += *per_cpu_ptr(rb->pending_refcnt, cpu);
> +       for_each_possible_cpu(cpu) {
> +               if (*per_cpu_ptr(rb->pending_refcnt, cpu) != 0)
> +                       return 1;
> +       }
>
> -       return refcnt;
> +       return 0;
>  }
>
>  static int packet_alloc_pending(struct packet_sock *po)
> --
> 2.27.0
>
Eric Dumazet April 7, 2022, 3:14 a.m. UTC | #2
On Wed, Apr 6, 2022 at 8:02 PM lixue liang <lixue.liang5086@gmail.com> wrote:
>
> Hi, Eric Dumazet. The pending_refcnt can indeed be incremented and decremented on different CPUs,
> but the callers of packet_read_pending in af_packet.c only need to know whether any packet is
> still pending, not the sum of pending packets across CPUs.
>
> On a NUMA machine with 128 CPUs, where the maximum node distance is 100 (numactl -H, the result is as shown below),
> even if the first CPU already has a pending packet, packet_read_pending called from tpacket_destruct_skb
> still traverses the remaining 127 CPUs, which consumes additional CPU time, and the refcnt it returns
> is a sum that the caller does not need.
>
> Please correct me, thanks.

Please do not send HTML messages, they won't be delivered to the list.

No, the intent of the code is to sum all cpu variables.

Your change basically will break many cases, as I explained.

If for some reason CPU 0 did a single packet_inc_pending() but the
corresponding packet_dec_pending() happened on CPU 1,
we do not want  packet_read_pending() to return 1, but 0.


If having per-cpu variables is too costly, we need to revisit the whole thing.

(i.e. possibly revert b013840810c221f2b0cf641d01531526052dc1fb,
"packet: use percpu mmap tx frame pending refcount")
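A minimal userspace sketch of the scenario Eric describes (illustrative
only, not kernel code): the increment lands on CPU 0, the matching
decrement lands on CPU 1 and wraps its unsigned counter, so both per-cpu
values are non-zero while their modulo-2^32 sum is still 0. Summing, as
the current code does, correctly reports nothing pending; the patched
early-return check would wrongly report 1.

#include <stdio.h>

int main(void)
{
	/* Two per-cpu counters, unsigned int like pending_refcnt. */
	unsigned int pending[2] = { 0, 0 };

	pending[0] += 1;	/* packet_inc_pending() runs on CPU 0 */
	pending[1] -= 1;	/* packet_dec_pending() runs on CPU 1, wraps to UINT_MAX */

	/* Current behaviour: sum all per-cpu values.
	 * 1 + UINT_MAX == 0 (mod 2^32), so nothing is reported pending. */
	unsigned int sum = pending[0] + pending[1];

	/* Patched behaviour: return 1 as soon as any per-cpu value is
	 * non-zero, reporting a packet that has already completed. */
	unsigned int early = (pending[0] != 0) || (pending[1] != 0);

	printf("sum=%u early=%u\n", sum, early);	/* prints: sum=0 early=1 */
	return 0;
}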


Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c39c09899fd0..c04f49e44a33 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1210,17 +1210,18 @@  static void packet_dec_pending(struct packet_ring_buffer *rb)
 
 static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
 {
-	unsigned int refcnt = 0;
 	int cpu;
 
 	/* We don't use pending refcount in rx_ring. */
 	if (rb->pending_refcnt == NULL)
 		return 0;
 
-	for_each_possible_cpu(cpu)
-		refcnt += *per_cpu_ptr(rb->pending_refcnt, cpu);
+	for_each_possible_cpu(cpu) {
+		if (*per_cpu_ptr(rb->pending_refcnt, cpu) != 0)
+			return 1;
+	}
 
-	return refcnt;
+	return 0;
 }
 
 static int packet_alloc_pending(struct packet_sock *po)
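For completeness, the revert Eric mentions would roughly mean going back
to a single shared counter instead of per-cpu variables. A hedged sketch
follows; the struct and helper names are illustrative only, not a copy of
the pre-b0138408 code. Cross-CPU inc/dec pairs then balance directly, at
the cost of a shared atomic operation in the transmit path.

#include <linux/atomic.h>

/* Sketch only: one shared atomic counter per ring. */
struct tx_pending {
	atomic_t refcnt;
};

static inline void tx_pending_inc(struct tx_pending *p)
{
	atomic_inc(&p->refcnt);
}

static inline void tx_pending_dec(struct tx_pending *p)
{
	atomic_dec(&p->refcnt);
}

static inline unsigned int tx_pending_read(const struct tx_pending *p)
{
	/* No per-cpu traversal: a single read reflects all CPUs. */
	return atomic_read(&p->refcnt);
}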