
af_packet: fix efficiency issues in packet_read_pending

Message ID 20220406114807.1803-1-lianglixue@greatwall.com.cn (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series af_packet: fix efficiency issues in packet_read_pending

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 2 this patch: 2
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: lianglixue <lixue.liang5086@gmail.com>' != 'Signed-off-by: lianglixue <lianglixue@greatwall.com.cn>'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

lixue liang April 6, 2022, 11:48 a.m. UTC
In packet_read_pending, even if the pending_refcnt of the first CPU
is non-zero, the pending_refcnt of all CPUs is still traversed, and on
NUMA systems the long latency of cross-CPU accesses significantly
reduces packet-sending performance, especially in tpacket_destruct_skb.

With this change, the function returns as soon as a non-zero
pending_refcnt is found, without summing the pending packets of all
CPUs, which significantly reduces the traversal time.

Signed-off-by: lianglixue <lianglixue@greatwall.com.cn>
---
 net/packet/af_packet.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Eric Dumazet April 6, 2022, 3:20 p.m. UTC | #1
On Wed, Apr 6, 2022 at 4:49 AM lianglixue <lixue.liang5086@gmail.com> wrote:
>
> In packet_read_pending, even if the pending_refcnt of the first CPU
> is non-zero, the pending_refcnt of all CPUs is still traversed, and on
> NUMA systems the long latency of cross-CPU accesses significantly
> reduces packet-sending performance, especially in tpacket_destruct_skb.
>
> With this change, the function returns as soon as a non-zero
> pending_refcnt is found, without summing the pending packets of all
> CPUs, which significantly reduces the traversal time.
>

Can you describe the use case?

You are changing the slow path.

Perhaps you do not use the interface in the most efficient way.

Your patch is wrong: pending_refcnt increments and decrements can
happen on different CPUs.



> Signed-off-by: lianglixue <lianglixue@greatwall.com.cn>
> ---
>  net/packet/af_packet.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index c39c09899fd0..c04f49e44a33 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1210,17 +1210,18 @@ static void packet_dec_pending(struct packet_ring_buffer *rb)
>
>  static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
>  {
> -       unsigned int refcnt = 0;
>         int cpu;
>
>         /* We don't use pending refcount in rx_ring. */
>         if (rb->pending_refcnt == NULL)
>                 return 0;
>
> -       for_each_possible_cpu(cpu)
> -               refcnt += *per_cpu_ptr(rb->pending_refcnt, cpu);
> +       for_each_possible_cpu(cpu) {
> +               if (*per_cpu_ptr(rb->pending_refcnt, cpu) != 0)
> +                       return 1;
> +       }
>
> -       return refcnt;
> +       return 0;
>  }
>
>  static int packet_alloc_pending(struct packet_sock *po)
> --
> 2.27.0
>
Eric Dumazet April 7, 2022, 3:14 a.m. UTC | #2
On Wed, Apr 6, 2022 at 8:02 PM lixue liang <lixue.liang5086@gmail.com> wrote:
>
> Hi, Eric Dumazet. The pending_refcnt can indeed be incremented and decremented on different CPUs,
> but the callers of packet_read_pending in af_packet.c only need to know whether any packet is
> still pending, not the sum of pending packets across CPUs.
>
> On a NUMA machine with 128 CPUs, where the maximum node distance is 100 (numactl -H, the result is as shown below),
> even if the first CPU already has a pending packet, packet_read_pending called from tpacket_destruct_skb
> still traverses the remaining 127 CPUs, which consumes additional CPU time, and the refcnt it returns
> is a sum that the caller does not need.
>
> Please correct me, thanks.

Please do not send HTML messages, they won't be delivered to the list.

No, the intent of the code is to sum all cpu variables.

Your change basically will break many cases, as I explained.

If for some reason CPU 0 did a single packet_inc_pending() but the
corresponding packet_dec_pending() happened on CPU 1,
we do not want  packet_read_pending() to return 1, but 0.


If having per-cpu variables is too costly, we need to revisit the whole thing.

(i.e. possibly revert b013840810c221f2b0cf641d01531526052dc1fb,
"packet: use percpu mmap tx frame pending refcount")
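A minimal userspace sketch of the scenario Eric describes (illustrative
only, not kernel code): the increment lands on CPU 0, the matching
decrement lands on CPU 1 and wraps its unsigned counter, so both per-cpu
values are non-zero while their modulo-2^32 sum is still 0. Summing, as
the current code does, correctly reports nothing pending; the patched
early-return check would wrongly report 1.

#include <stdio.h>

int main(void)
{
	/* Two per-cpu counters, unsigned int like pending_refcnt. */
	unsigned int pending[2] = { 0, 0 };

	pending[0] += 1;	/* packet_inc_pending() runs on CPU 0 */
	pending[1] -= 1;	/* packet_dec_pending() runs on CPU 1, wraps to UINT_MAX */

	/* Current behaviour: sum all per-cpu values.
	 * 1 + UINT_MAX == 0 (mod 2^32), so nothing is reported pending. */
	unsigned int sum = pending[0] + pending[1];

	/* Patched behaviour: return 1 as soon as any per-cpu value is
	 * non-zero, reporting a packet that has already completed. */
	unsigned int early = (pending[0] != 0) || (pending[1] != 0);

	printf("sum=%u early=%u\n", sum, early);	/* prints: sum=0 early=1 */
	return 0;
}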


Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c39c09899fd0..c04f49e44a33 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1210,17 +1210,18 @@  static void packet_dec_pending(struct packet_ring_buffer *rb)
 
 static unsigned int packet_read_pending(const struct packet_ring_buffer *rb)
 {
-	unsigned int refcnt = 0;
 	int cpu;
 
 	/* We don't use pending refcount in rx_ring. */
 	if (rb->pending_refcnt == NULL)
 		return 0;
 
-	for_each_possible_cpu(cpu)
-		refcnt += *per_cpu_ptr(rb->pending_refcnt, cpu);
+	for_each_possible_cpu(cpu) {
+		if (*per_cpu_ptr(rb->pending_refcnt, cpu) != 0)
+			return 1;
+	}
 
-	return refcnt;
+	return 0;
 }
 
 static int packet_alloc_pending(struct packet_sock *po)
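For completeness, the revert Eric mentions would roughly mean going back
to a single shared counter instead of per-cpu variables. A hedged sketch
follows; the struct and helper names are illustrative only, not a copy of
the pre-b0138408 code. Cross-CPU inc/dec pairs then balance directly, at
the cost of a shared atomic operation in the transmit path.

#include <linux/atomic.h>

/* Sketch only: one shared atomic counter per ring. */
struct tx_pending {
	atomic_t refcnt;
};

static inline void tx_pending_inc(struct tx_pending *p)
{
	atomic_inc(&p->refcnt);
}

static inline void tx_pending_dec(struct tx_pending *p)
{
	atomic_dec(&p->refcnt);
}

static inline unsigned int tx_pending_read(const struct tx_pending *p)
{
	/* No per-cpu traversal: a single read reflects all CPUs. */
	return atomic_read(&p->refcnt);
}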