Message ID | 20250212-netdevsim-v1-1-20ece94daae8@debian.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] netdevsim: disable local BH when scheduling NAPI |
On Wed, Feb 12, 2025 at 7:34 PM Breno Leitao <leitao@debian.org> wrote:
>
> The netdevsim driver was getting NOHZ tick-stop errors during packet
> transmission due to pending softirq work when calling napi_schedule().
>
> This is showing the following message when running netconsole selftest.
>
>   NOHZ tick-stop error: local softirq work is pending, handler #08!!!
>
> Add local_bh_disable()/enable() around the napi_schedule() call to
> prevent softirqs from being handled during this xmit.
>
> Cc: stable@vger.kernel.org
> Fixes: 3762ec05a9fb ("netdevsim: add NAPI support")
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  drivers/net/netdevsim/netdev.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
> index 42f247cbdceecbadf27f7090c030aa5bd240c18a..6aeb081b06da226ab91c49f53d08f465570877ae 100644
> --- a/drivers/net/netdevsim/netdev.c
> +++ b/drivers/net/netdevsim/netdev.c
> @@ -87,7 +87,9 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
>  		goto out_drop_cnt;
>
> +	local_bh_disable();
>  	napi_schedule(&rq->napi);
> +	local_bh_enable();
>

I thought all ndo_start_xmit() were done under local_bh_disable()

Could you give more details ?
On 02/12, Eric Dumazet wrote:
> On Wed, Feb 12, 2025 at 7:34 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > The netdevsim driver was getting NOHZ tick-stop errors during packet
> > transmission due to pending softirq work when calling napi_schedule().
> >
> > This is showing the following message when running netconsole selftest.
> >
> >   NOHZ tick-stop error: local softirq work is pending, handler #08!!!
> >
> > Add local_bh_disable()/enable() around the napi_schedule() call to
> > prevent softirqs from being handled during this xmit.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 3762ec05a9fb ("netdevsim: add NAPI support")
> > Suggested-by: Jakub Kicinski <kuba@kernel.org>
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> >  drivers/net/netdevsim/netdev.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
> > index 42f247cbdceecbadf27f7090c030aa5bd240c18a..6aeb081b06da226ab91c49f53d08f465570877ae 100644
> > --- a/drivers/net/netdevsim/netdev.c
> > +++ b/drivers/net/netdevsim/netdev.c
> > @@ -87,7 +87,9 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
> >  		goto out_drop_cnt;
> >
> > +	local_bh_disable();
> >  	napi_schedule(&rq->napi);
> > +	local_bh_enable();
> >
>
> I thought all ndo_start_xmit() were done under local_bh_disable()
>
> Could you give more details ?

Not 100% sure this patch is the culprit, but looks related:
https://netdev-3.bots.linux.dev/vmksft-net-drv-dbg/results/989901/5-netcons-fragmented-msg-sh/stderr

---
pw-bot: cr
Hello Eric,

On Wed, Feb 12, 2025 at 07:55:32PM +0100, Eric Dumazet wrote:
> On Wed, Feb 12, 2025 at 7:34 PM Breno Leitao <leitao@debian.org> wrote:
> >
> > --- a/drivers/net/netdevsim/netdev.c
> > +++ b/drivers/net/netdevsim/netdev.c
> > @@ -87,7 +87,9 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
> >  		goto out_drop_cnt;
> >
> > +	local_bh_disable();
> >  	napi_schedule(&rq->napi);
> > +	local_bh_enable();
> >
>
> I thought all ndo_start_xmit() were done under local_bh_disable()

I think it depends on the path?

> Could you give more details ?

There are several paths to ndo_start_xmit(), and please correct me if I
am reading the code wrongly here.

Common path:

	__dev_direct_xmit()
		local_bh_disable();
		netdev_start_xmit()
			__netdev_start_xmit()
				ops->ndo_start_xmit(skb, dev);

But, in some other cases, I see:

	netpoll_start_xmit()
		netdev_start_xmit()
			....

My reading is that not all cases have BH disabled (via
local_bh_disable()) before calling ndo_start_xmit().

Question: Must BH be disabled before calling ndo_start_xmit()? If so,
the problem might be in the netpoll code!? Also, is it worth adding a
DEBUG_NET_WARN_ON_ONCE()?

Note: Jakub gave another suggestion on how to fix this, so, I send a v2
with a different approach:

https://lore.kernel.org/all/20250213071426.01490615@kernel.org/

Thanks for the review!
--breno
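
To make the question in the reply above concrete, here is a userspace toy model of the two call paths. It is a sketch, not kernel code: every model_* name is hypothetical, the softirq machinery is reduced to a nesting counter and a pending mask, and the assumption that the netpoll path reaches ndo_start_xmit() with BH enabled is only the reading given in the email, which the thread does not confirm.

```c
#include <assert.h>

/* Userspace toy model, NOT kernel code. All model_* names are hypothetical. */

#define MODEL_NET_RX 3              /* NET_RX_SOFTIRQ is entry 3 of the kernel's softirq enum */

static int bh_disable_depth;        /* nesting depth of model_local_bh_disable() */
static unsigned int pending_mask;   /* bit N set: softirq N raised but not yet run */
static int softirq_runs;            /* number of times pending work was processed */

static void model_reset(void)
{
	bh_disable_depth = 0;
	pending_mask = 0;
	softirq_runs = 0;
}

static void model_local_bh_disable(void)
{
	bh_disable_depth++;
}

static void model_local_bh_enable(void)
{
	/* Only the outermost enable gives pending softirqs a chance to run. */
	if (--bh_disable_depth == 0 && pending_mask) {
		pending_mask = 0;
		softirq_runs++;
	}
}

/* Simplified napi_schedule(): marks NET_RX pending, runs nothing itself. */
static void model_napi_schedule(void)
{
	pending_mask |= 1u << MODEL_NET_RX;
}

/* nsim_start_xmit() before the patch. */
static void model_xmit(void)
{
	model_napi_schedule();
}

/* nsim_start_xmit() with the v1 patch applied. */
static void model_xmit_patched(void)
{
	model_local_bh_disable();
	model_napi_schedule();
	model_local_bh_enable();
}

/* Caller that disables BH first, like the __dev_direct_xmit() chain quoted above. */
static void model_direct_xmit(void (*xmit)(void))
{
	model_local_bh_disable();
	xmit();
	model_local_bh_enable();
}

/* Caller ASSUMED not to disable BH (the reading of the netpoll path above). */
static void model_netpoll_xmit(void (*xmit)(void))
{
	xmit();
}
```

In this model, model_netpoll_xmit(model_xmit) leaves bit 3 set in pending_mask with nothing left to run it, which is the situation the NOHZ warning complains about. Both the patched xmit and the BH-disabling caller clear it, and nesting the two is harmless because only the outermost enable processes the pending work.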
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 42f247cbdceecbadf27f7090c030aa5bd240c18a..6aeb081b06da226ab91c49f53d08f465570877ae 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -87,7 +87,9 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (unlikely(nsim_forward_skb(peer_dev, skb, rq) == NET_RX_DROP))
 		goto out_drop_cnt;
 
+	local_bh_disable();
 	napi_schedule(&rq->napi);
+	local_bh_enable();
 
 	rcu_read_unlock();
 	u64_stats_update_begin(&ns->syncp);
The netdevsim driver was getting NOHZ tick-stop errors during packet
transmission due to pending softirq work when calling napi_schedule().

This is showing the following message when running netconsole selftest.

  NOHZ tick-stop error: local softirq work is pending, handler #08!!!

Add local_bh_disable()/enable() around the napi_schedule() call to
prevent softirqs from being handled during this xmit.

Cc: stable@vger.kernel.org
Fixes: 3762ec05a9fb ("netdevsim: add NAPI support")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/net/netdevsim/netdev.c | 2 ++
 1 file changed, 2 insertions(+)

---
base-commit: cf33d96f50903214226b379b3f10d1f262dae018
change-id: 20250212-netdevsim-258d2d628175

Best regards,
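
A side note on the "handler #08" in the warning: as far as I can tell from kernel/time/tick-sched.c, that value is the pending-softirq bitmask printed in hex, not a handler index. The standalone sketch below decodes such a mask, assuming the softirq order from include/linux/interrupt.h; decode_pending() is a hypothetical helper written for this illustration and does not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Softirq names in the order of the kernel's softirq enum
 * (include/linux/interrupt.h); the order is assumed unchanged. */
static const char *const softirq_names[] = {
	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
	"IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};

/* Hypothetical helper: writes the space-separated names of all bits set
 * in mask into buf. len must be at least 1. */
static void decode_pending(unsigned int mask, char *buf, size_t len)
{
	size_t n = sizeof(softirq_names) / sizeof(softirq_names[0]);

	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (!(mask & (1u << i)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, softirq_names[i], len - strlen(buf) - 1);
	}
}
```

Under that assumption, a mask of 0x08 has only bit 3 set, which decodes to NET_RX. That is consistent with the softirq that napi_schedule() raises.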