Message ID | cover.1710877680.git.yan@cloudflare.com (mailing list archive)
---|---
Series | Report RCU QS for busy network kthreads
On 19/03/2024 21.44, Yan Zhai wrote:
> This changeset fixes a common problem for busy networking kthreads.
> These threads, e.g. NAPI threads, typically do:
>
> * poll a batch of packets
> * if there is more work, call cond_resched() to allow scheduling
> * continue polling while the rx queue is not empty
>
> We observed this being a problem in production, since it can block RCU
> tasks from making progress under heavy load. Investigation indicates
> that just calling cond_resched() is insufficient for RCU tasks to reach
> quiescent states. It also has the side effect of frequently clearing
> the TIF_NEED_RESCHED flag on voluntary preempt kernels. As a result,
> schedule() will not be called in these circumstances, even though
> schedule() in fact provides the required quiescent states. This at
> least affects NAPI threads, napi_busy_loop, and the cpumap kthread.
>
> By reporting RCU QSes in these kthreads periodically before
> cond_resched(), the blocked RCU waiters can correctly make progress.
> Instead of reporting a QS for RCU tasks only, this code shares the
> same concern as noted in commit d28139c4e967 ("rcu: Apply RCU-bh QSes
> to RCU-sched and RCU-preempt when safe"), so a consolidated QS is
> reported for safety.
>
> It is worth noting that, although this problem is reproducible in
> napi_busy_loop, it only shows up when the polling interval is set as
> high as 2ms, which is far larger than the 50us-100us recommended in
> the documentation. So napi_busy_loop is left untouched.
>
> Lastly, this does not affect RT kernels, which do not enter the
> scheduler through cond_resched(). Without the side effect mentioned
> above, schedule() will be called from time to time, clearing the RCU
> task holdouts.
>
> V4: https://lore.kernel.org/bpf/cover.1710525524.git.yan@cloudflare.com/
> V3: https://lore.kernel.org/lkml/20240314145459.7b3aedf1@kernel.org/t/
> V2: https://lore.kernel.org/bpf/ZeFPz4D121TgvCje@debian.debian/
> V1: https://lore.kernel.org/lkml/Zd4DXTyCf17lcTfq@debian.debian/#t
>
> Changes since v4:
> * polished comments and docs for the RCU helper, as Paul McKenney
>   suggested
>
> Changes since v3:
> * fixed kernel-doc errors
>
> Changes since v2:
> * created a helper in the rcu header to abstract the behavior
> * fixed the cpumap kthread in addition
>
> Changes since v1:
> * disabled preemption first, as Paul McKenney suggested
>
> Yan Zhai (3):
>   rcu: add a helper to report consolidated flavor QS
>   net: report RCU QS on threaded NAPI repolling
>   bpf: report RCU QS in cpumap kthread
>
>  include/linux/rcupdate.h | 31 +++++++++++++++++++++++++++++++
>  kernel/bpf/cpumap.c      |  3 +++
>  net/core/dev.c           |  3 +++
>  3 files changed, 37 insertions(+)

Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
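For context, here is a minimal sketch of the kthread loop shape the series targets, assuming the helper from patch 1/3 is the rcu_softirq_qs_periodic() macro referenced by the applied commits below; poll_one_batch() is a hypothetical stand-in for the real poll step (e.g. __napi_poll()), not code from the series:

```c
#include <linux/jiffies.h>
#include <linux/kthread.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

/* Hypothetical poll step: returns true if the rx queue still has work. */
static bool poll_one_batch(void *data);

static int busy_poll_kthread(void *data)
{
	unsigned long last_qs = jiffies;

	while (!kthread_should_stop()) {
		/* Poll a batch of packets; repoll means more work queued. */
		bool repoll = poll_one_batch(data);

		/*
		 * Periodically report a consolidated RCU quiescent state.
		 * On voluntary-preempt kernels the cond_resched() below may
		 * keep clearing TIF_NEED_RESCHED without ever entering
		 * schedule(), so without this the busy loop can stall RCU
		 * tasks grace periods indefinitely.
		 */
		rcu_softirq_qs_periodic(last_qs);

		cond_resched();

		if (!repoll)
			break;
	}
	return 0;
}
```

Per the v1 changelog above, the helper disables preemption around the QS report, and per the reference to commit d28139c4e967 the QS it reports is the consolidated flavor, so RCU, RCU-bh, and RCU-preempt waiters all observe progress.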
Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 19 Mar 2024 13:44:30 -0700 you wrote:
> This changeset fixes a common problem for busy networking kthreads.
> These threads, e.g. NAPI threads, typically do:
>
> * poll a batch of packets
> * if there is more work, call cond_resched() to allow scheduling
> * continue polling while the rx queue is not empty
>
> [...]

Here is the summary with links:
  - [v5,net,1/3] rcu: add a helper to report consolidated flavor QS
    https://git.kernel.org/netdev/net/c/1a77557d48cf
  - [v5,net,2/3] net: report RCU QS on threaded NAPI repolling
    https://git.kernel.org/netdev/net/c/d6dbbb11247c
  - [v5,net,3/3] bpf: report RCU QS in cpumap kthread
    https://git.kernel.org/netdev/net/c/00bf63122459

You are awesome, thank you!