[v5,net,2/3] net: report RCU QS on threaded NAPI repolling

Message ID 4c3b0d3f32d3b18949d75b18e5e1d9f13a24f025.1710877680.git.yan@cloudflare.com (mailing list archive)
State Accepted
Commit d6dbbb11247c71203785a2c9da474c36f4b19eae
Delegated to: Netdev Maintainers
Series Report RCU QS for busy network kthreads

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 946 this patch: 946
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           fail     1 blamed authors not CCed: hannes@stressinduktion.org; 1 maintainers not CCed: hannes@stressinduktion.org
netdev/build_clang              success  Errors and warnings before: 957 this patch: 957
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 963 this patch: 963
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 15 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-03-20--12-00 (tests: 910)

Commit Message

Yan Zhai March 19, 2024, 8:44 p.m. UTC
NAPI threads can keep polling packets under load. Currently the thread
only calls cond_resched() before repolling, which is not sufficient to
clear the RCU tasks holdout and so prevents BPF tracing programs from
detaching for a long period. This can be reproduced easily with the
following setup:

ip netns add test1
ip netns add test2

ip -n test1 link add veth1 type veth peer name veth2 netns test2

ip -n test1 link set veth1 up
ip -n test1 link set lo up
ip -n test2 link set veth2 up
ip -n test2 link set lo up

ip -n test1 addr add 192.168.1.2/31 dev veth1
ip -n test1 addr add 1.1.1.1/32 dev lo
ip -n test2 addr add 192.168.1.3/31 dev veth2
ip -n test2 addr add 2.2.2.2/31 dev lo

ip -n test1 route add default via 192.168.1.3
ip -n test2 route add default via 192.168.1.2

for i in `seq 10 210`; do
 for j in `seq 10 210`; do
    ip netns exec test2 iptables -I INPUT -s 3.3.$i.$j -p udp --dport 5201
 done
done

ip netns exec test2 ethtool -K veth2 gro on
ip netns exec test2 bash -c 'echo 1 > /sys/class/net/veth2/threaded'
ip netns exec test1 ethtool -K veth1 tso off

Then run an iperf3 client and server; a bpftrace script can then trigger it:

ip netns exec test2 iperf3 -s -B 2.2.2.2 >/dev/null&
ip netns exec test1 iperf3 -c 2.2.2.2 -B 1.1.1.1 -u -l 1500 -b 3g -t 100 >/dev/null&
bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}'

Reporting RCU quiescent states periodically resolves the issue.

Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
---
v2->v3: abstracted the work into an RCU helper
v1->v2: moved rcu_softirq_qs out from bh critical section, and only
raise it after a second of repolling. Added some brief perf test results.

v2: https://lore.kernel.org/bpf/ZeFPz4D121TgvCje@debian.debian/
v1: https://lore.kernel.org/lkml/Zd4DXTyCf17lcTfq@debian.debian/#t
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Paul E. McKenney March 19, 2024, 9:32 p.m. UTC | #1
On Tue, Mar 19, 2024 at 01:44:37PM -0700, Yan Zhai wrote:
> NAPI threads can keep polling packets under load. Currently it is only
> calling cond_resched() before repolling, but it is not sufficient to
> clear out the holdout of RCU tasks, which prevent BPF tracing programs
> from detaching for long period. This can be reproduced easily with
> following set up:
> 
> ip netns add test1
> ip netns add test2
> 
> ip -n test1 link add veth1 type veth peer name veth2 netns test2
> 
> ip -n test1 link set veth1 up
> ip -n test1 link set lo up
> ip -n test2 link set veth2 up
> ip -n test2 link set lo up
> 
> ip -n test1 addr add 192.168.1.2/31 dev veth1
> ip -n test1 addr add 1.1.1.1/32 dev lo
> ip -n test2 addr add 192.168.1.3/31 dev veth2
> ip -n test2 addr add 2.2.2.2/31 dev lo
> 
> ip -n test1 route add default via 192.168.1.3
> ip -n test2 route add default via 192.168.1.2
> 
> for i in `seq 10 210`; do
>  for j in `seq 10 210`; do
>     ip netns exec test2 iptables -I INPUT -s 3.3.$i.$j -p udp --dport 5201
>  done
> done
> 
> ip netns exec test2 ethtool -K veth2 gro on
> ip netns exec test2 bash -c 'echo 1 > /sys/class/net/veth2/threaded'
> ip netns exec test1 ethtool -K veth1 tso off
> 
> Then run an iperf3 client/server and a bpftrace script can trigger it:
> 
> ip netns exec test2 iperf3 -s -B 2.2.2.2 >/dev/null&
> ip netns exec test1 iperf3 -c 2.2.2.2 -B 1.1.1.1 -u -l 1500 -b 3g -t 100 >/dev/null&
> bpftrace -e 'kfunc:__napi_poll{@=count();} interval:s:1{exit();}'
> 
> Report RCU quiescent states periodically will resolve the issue.
> 
> Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org>
> Signed-off-by: Yan Zhai <yan@cloudflare.com>

Acked-by: Paul E. McKenney <paulmck@kernel.org>

> ---
> v2->v3: abstracted the work into a RCU helper
> v1->v2: moved rcu_softirq_qs out from bh critical section, and only
> raise it after a second of repolling. Added some brief perf test result.
> 
> v2: https://lore.kernel.org/bpf/ZeFPz4D121TgvCje@debian.debian/
> v1: https://lore.kernel.org/lkml/Zd4DXTyCf17lcTfq@debian.debian/#t
> ---
>  net/core/dev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 303a6ff46e4e..9a67003e49db 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6743,6 +6743,8 @@ static int napi_threaded_poll(void *data)
>  	void *have;
>  
>  	while (!napi_thread_wait(napi)) {
> +		unsigned long last_qs = jiffies;
> +
>  		for (;;) {
>  			bool repoll = false;
>  
> @@ -6767,6 +6769,7 @@ static int napi_threaded_poll(void *data)
>  			if (!repoll)
>  				break;
>  
> +			rcu_softirq_qs_periodic(last_qs);
>  			cond_resched();
>  		}
>  	}
> -- 
> 2.30.2
> 
>

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 303a6ff46e4e..9a67003e49db 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6743,6 +6743,8 @@  static int napi_threaded_poll(void *data)
 	void *have;
 
 	while (!napi_thread_wait(napi)) {
+		unsigned long last_qs = jiffies;
+
 		for (;;) {
 			bool repoll = false;
 
@@ -6767,6 +6769,7 @@  static int napi_threaded_poll(void *data)
 			if (!repoll)
 				break;
 
+			rcu_softirq_qs_periodic(last_qs);
 			cond_resched();
 		}
 	}
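
The effect of the two added lines can be modelled in userspace. The sketch below is not the kernel helper: every name in it (model_time_after, model_qs_periodic, model_busy_period, MODEL_HZ, MODEL_QS_INTERVAL) is hypothetical, the tick rate and reporting interval are assumed values, and the "report" is only counted rather than delivered to RCU. It only illustrates the pattern the patch relies on: take a timestamp when the thread wakes up, then report a quiescent state at most once per interval while the inner loop keeps repolling.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
#define MODEL_HZ          1000UL           /* assumed ticks per second */
#define MODEL_QS_INTERVAL (MODEL_HZ / 10)  /* assumed reporting interval */

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Report a quiescent state at most once per MODEL_QS_INTERVAL ticks.
 * Returns true when a report was made, and refreshes *last_qs in that case.
 */
static bool model_qs_periodic(unsigned long now, unsigned long *last_qs)
{
	if (!model_time_after(now, *last_qs + MODEL_QS_INTERVAL))
		return false;
	/* A real helper would notify RCU here; the model only records the time. */
	*last_qs = now;
	return true;
}

/*
 * Model one busy period: the thread wakes at tick `start` and repolls once
 * per tick for `ticks` iterations. Returns the number of reports made.
 */
static unsigned long model_busy_period(unsigned long start, unsigned long ticks)
{
	unsigned long last_qs = start;  /* mirrors "last_qs = jiffies" */
	unsigned long reports = 0;

	for (unsigned long i = 1; i <= ticks; i++) {
		unsigned long now = start + i;  /* may wrap; arithmetic is modular */

		if (model_qs_periodic(now, &last_qs))
			reports++;
		/* cond_resched() would follow here in the kernel loop. */
	}
	return reports;
}
```

With these assumed values, model_busy_period(0, 100) makes no report because the interval has not yet been exceeded, while a longer busy period produces one report each time more than MODEL_QS_INTERVAL ticks have passed since the previous one. Keeping the timestamp outside the inner loop, as the patch does with last_qs, is what lets the rate limit span many repoll iterations.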