Message ID | 20250219-netdevsim-v3-1-811e2b8abc4c@debian.org (mailing list archive) |
---|---|
State | Accepted |
Commit | bf3624cf1c3708284c53ed99a1c43f2e104dc2dd |
Delegated to: | Netdev Maintainers |
Series | [net-next,v3] netdevsim: call napi_schedule from a timer context |
On Wed, Feb 19, 2025 at 08:41:20AM -0800, Breno Leitao wrote:
> The netdevsim driver was experiencing NOHZ tick-stop errors during packet
> transmission due to pending softirq work when calling napi_schedule().
> This issue was observed when running the netconsole selftest, which
> triggered the following error message:
>
>   NOHZ tick-stop error: local softirq work is pending, handler #08!!!
>
> To fix this issue, introduce a timer that schedules napi_schedule()
> from a timer context instead of calling it directly from the TX path.
>
> Create an hrtimer for each queue and kick it from the TX path,
> which then schedules napi_schedule() from the timer context.
>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---

Looking at the tests, 3 of them are failing:

	https://netdev.bots.linux.dev/flakes.html

Two of the three passed when retried, and only one of them
(ip6gre-custom-multipath-hash-sh) also failed on the retry.

Looking at the flakes, I see that ip6gre-custom-multipath-hash-sh was
already flaky yesterday:

	https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=ip6gre-custom-multipath-hash-sh

I've tested it manually, and the test passes:

	# vng -v --run . --user root --cpus 4 -- make -C tools/testing/selftests TARGETS=net/forwarding TEST_PROGS=ip6gre_custom_multipath_hash.sh TEST_GEN_PROGS="" run_tests
	...
	ok 1 selftests: net/forwarding: ip6gre_custom_multipath_hash.sh

So, from a NIPA testing perspective, it seems the patch is good.
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 19 Feb 2025 08:41:20 -0800 you wrote:
> The netdevsim driver was experiencing NOHZ tick-stop errors during packet
> transmission due to pending softirq work when calling napi_schedule().
> This issue was observed when running the netconsole selftest, which
> triggered the following error message:
>
>   NOHZ tick-stop error: local softirq work is pending, handler #08!!!
>
> [...]

Here is the summary with links:
  - [net-next,v3] netdevsim: call napi_schedule from a timer context
    https://git.kernel.org/netdev/net-next/c/bf3624cf1c37

You are awesome, thank you!
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 9b394ddc5206a7a5ca5440341551aac50c43e20c..a41dc79e9c2e082367af156b10b61f04be8c41fb 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -87,7 +87,8 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
 		goto out_drop_cnt;
 
-	napi_schedule(&rq->napi);
+	if (!hrtimer_active(&rq->napi_timer))
+		hrtimer_start(&rq->napi_timer, us_to_ktime(5), HRTIMER_MODE_REL);
 
 	rcu_read_unlock();
 	u64_stats_update_begin(&ns->syncp);
@@ -426,6 +427,22 @@ static int nsim_init_napi(struct netdevsim *ns)
 	return err;
 }
 
+static enum hrtimer_restart nsim_napi_schedule(struct hrtimer *timer)
+{
+	struct nsim_rq *rq;
+
+	rq = container_of(timer, struct nsim_rq, napi_timer);
+	napi_schedule(&rq->napi);
+
+	return HRTIMER_NORESTART;
+}
+
+static void nsim_rq_timer_init(struct nsim_rq *rq)
+{
+	hrtimer_init(&rq->napi_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	rq->napi_timer.function = nsim_napi_schedule;
+}
+
 static void nsim_enable_napi(struct netdevsim *ns)
 {
 	struct net_device *dev = ns->netdev;
@@ -615,11 +632,13 @@ static struct nsim_rq *nsim_queue_alloc(void)
 		return NULL;
 
 	skb_queue_head_init(&rq->skb_queue);
+	nsim_rq_timer_init(rq);
 	return rq;
 }
 
 static void nsim_queue_free(struct nsim_rq *rq)
 {
+	hrtimer_cancel(&rq->napi_timer);
 	skb_queue_purge_reason(&rq->skb_queue, SKB_DROP_REASON_QUEUE_PURGE);
 	kfree(rq);
 }
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 96d54c08043d3a62b0731efd43bc6a313998bf01..e757f85ed8617bb13ed0bf0e367803e4ddbd8e95 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -97,6 +97,7 @@ struct nsim_rq {
 	struct napi_struct napi;
 	struct sk_buff_head skb_queue;
 	struct page_pool *page_pool;
+	struct hrtimer napi_timer;
 };
 
 struct netdevsim {
The netdevsim driver was experiencing NOHZ tick-stop errors during packet
transmission due to pending softirq work when calling napi_schedule().
This issue was observed when running the netconsole selftest, which
triggered the following error message:

  NOHZ tick-stop error: local softirq work is pending, handler #08!!!

To fix this issue, introduce a timer that schedules napi_schedule()
from a timer context instead of calling it directly from the TX path.

Create an hrtimer for each queue and kick it from the TX path,
which then schedules napi_schedule() from the timer context.

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
Changes in v3:
- Move the timer initialization and cancellation close to the queue
  allocation/free (Jakub)
- Link to v2: https://lore.kernel.org/r/20250217-netdevsim-v2-1-fc7fe177b98f@debian.org

Changes in v2:
- The approach implemented in v1 will not work, given that
  ndo_start_xmit() can be called with interrupts disabled, and calling
  local_bh_enable() inside that function has nasty side effects.
  Jakub suggested creating a timer and calling napi_schedule() from that
  timer.
- Link to v1: https://lore.kernel.org/r/20250212-netdevsim-v1-1-20ece94daae8@debian.org
---
 drivers/net/netdevsim/netdev.c    | 21 ++++++++++++++++++++-
 drivers/net/netdevsim/netdevsim.h |  1 +
 2 files changed, 21 insertions(+), 1 deletion(-)

---
base-commit: 0784d83df3bfc977c13252a0599be924f0afa68d
change-id: 20250212-netdevsim-258d2d628175

Best regards,
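
For readers who want to experiment with the deferral pattern outside the kernel, here is a minimal userspace sketch of the idea in the patch: the TX path only arms a one-shot timer when none is pending, and the timer callback is the only place that schedules polling. Every name below (model_timer, model_rq, model_xmit, model_run_timers, model_poll) is hypothetical and invented for illustration; the clock is a plain integer passed in by the caller, and none of this is the netdevsim or hrtimer code. The only detail taken from the patch is the 5 microsecond delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a one-shot timer; not the kernel's struct hrtimer. */
struct model_timer {
	bool active;          /* armed and not yet fired */
	uint64_t expires_us;  /* absolute expiry on the caller-supplied clock */
};

/* Hypothetical model of a receive queue with its per-queue timer. */
struct model_rq {
	struct model_timer napi_timer;
	unsigned int pending_pkts;    /* packets queued, waiting for a poll */
	unsigned int napi_scheduled;  /* how often the timer scheduled polling */
};

/*
 * TX path: queue the packet and arm the timer only if it is not already
 * pending. This mirrors the hrtimer_active()/hrtimer_start() pair in the
 * patch, so a burst of packets shares a single timer expiry.
 */
static void model_xmit(struct model_rq *rq, uint64_t now_us)
{
	assert(rq != NULL);

	rq->pending_pkts++;
	if (!rq->napi_timer.active) {
		rq->napi_timer.active = true;
		rq->napi_timer.expires_us = now_us + 5; /* 5 us, as in the patch */
	}
}

/*
 * Timer context: if the timer has expired, disarm it (one-shot, comparable
 * to returning HRTIMER_NORESTART) and record that polling was scheduled.
 */
static void model_run_timers(struct model_rq *rq, uint64_t now_us)
{
	if (!rq->napi_timer.active || now_us < rq->napi_timer.expires_us)
		return;

	rq->napi_timer.active = false;
	rq->napi_scheduled++;
}

/* Poll: drain everything that was queued and report how much was handled. */
static unsigned int model_poll(struct model_rq *rq)
{
	unsigned int handled = rq->pending_pkts;

	rq->pending_pkts = 0;
	return handled;
}
```

In this model, three calls to model_xmit() at times 0, 2 and 4 arm the timer once with an expiry of 5; a later model_run_timers() call at time 5 schedules polling a single time, and model_poll() then drains all three packets. A packet sent after that re-arms the timer relative to its own send time.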