Message ID | 20240405203204.82062-2-marex@denx.de (mailing list archive) |
---|---|
State | Accepted |
Commit | be0384bf599cf1eb8d337517feeb732d71f75a6f |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2,1/2] net: ks8851: Inline ks8851_rx_skb() | expand |
On Fri, Apr 05, 2024 at 10:30:40PM +0200, Marek Vasut wrote: > The ks8851_irq() thread may call ks8851_rx_pkts() in case there are > any packets in the MAC FIFO, which calls netif_rx(). This netif_rx() > implementation is guarded by local_bh_disable() and local_bh_enable(). > The local_bh_enable() may call do_softirq() to run softirqs in case > any are pending. One of the softirqs is net_rx_action, which ultimately > reaches the driver .start_xmit callback. If that happens, the system > hangs. The entire call chain is below: > > ks8851_start_xmit_par from netdev_start_xmit > netdev_start_xmit from dev_hard_start_xmit > dev_hard_start_xmit from sch_direct_xmit > sch_direct_xmit from __dev_queue_xmit > __dev_queue_xmit from __neigh_update > __neigh_update from neigh_update > neigh_update from arp_process.constprop.0 > arp_process.constprop.0 from __netif_receive_skb_one_core > __netif_receive_skb_one_core from process_backlog > process_backlog from __napi_poll.constprop.0 > __napi_poll.constprop.0 from net_rx_action > net_rx_action from __do_softirq > __do_softirq from call_with_stack > call_with_stack from do_softirq > do_softirq from __local_bh_enable_ip > __local_bh_enable_ip from netif_rx > netif_rx from ks8851_irq > ks8851_irq from irq_thread_fn > irq_thread_fn from irq_thread > irq_thread from kthread > kthread from ret_from_fork These lines are unneeded (in case you need a new version, you can drop them). > The hang happens because ks8851_irq() first locks a spinlock in > ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...) > and with that spinlock locked, calls netif_rx(). Once the execution > reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again > which attempts to claim the already locked spinlock again, and the > hang happens. > > Move the do_softirq() call outside of the spinlock protected section > of ks8851_irq() by disabling BHs around the entire spinlock protected > section of ks8851_irq() handler. Place local_bh_enable() outside of > the spinlock protected section, so that it can trigger do_softirq() > without the ks8851_par.c ks8851_lock_par() spinlock being held, and > safely call ks8851_start_xmit_par() without attempting to lock the > already locked spinlock. > > Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable() > now, replace netif_rx() with __netif_rx() which is not duplicating the > local_bh_disable()/local_bh_enable() calls.
On 4/8/24 4:52 PM, Andy Shevchenko wrote: > On Fri, Apr 05, 2024 at 10:30:40PM +0200, Marek Vasut wrote: >> The ks8851_irq() thread may call ks8851_rx_pkts() in case there are >> any packets in the MAC FIFO, which calls netif_rx(). This netif_rx() >> implementation is guarded by local_bh_disable() and local_bh_enable(). >> The local_bh_enable() may call do_softirq() to run softirqs in case >> any are pending. One of the softirqs is net_rx_action, which ultimately >> reaches the driver .start_xmit callback. If that happens, the system >> hangs. The entire call chain is below: >> >> ks8851_start_xmit_par from netdev_start_xmit >> netdev_start_xmit from dev_hard_start_xmit >> dev_hard_start_xmit from sch_direct_xmit >> sch_direct_xmit from __dev_queue_xmit >> __dev_queue_xmit from __neigh_update >> __neigh_update from neigh_update >> neigh_update from arp_process.constprop.0 >> arp_process.constprop.0 from __netif_receive_skb_one_core >> __netif_receive_skb_one_core from process_backlog >> process_backlog from __napi_poll.constprop.0 >> __napi_poll.constprop.0 from net_rx_action >> net_rx_action from __do_softirq >> __do_softirq from call_with_stack >> call_with_stack from do_softirq >> do_softirq from __local_bh_enable_ip >> __local_bh_enable_ip from netif_rx >> netif_rx from ks8851_irq >> ks8851_irq from irq_thread_fn > >> irq_thread_fn from irq_thread >> irq_thread from kthread >> kthread from ret_from_fork > > These lines are unneeded (in case you need a new version, you can drop them). I just got back and going through a mountain of email, I see Jakub already picked the V2, so, noted for next time. Thank you !
On Fri, 12 Apr 2024 13:29:04 +0200 Marek Vasut wrote: > >> irq_thread_fn from irq_thread > >> irq_thread from kthread > >> kthread from ret_from_fork > > > > These lines are unneeded (in case you need a new version, you can drop them). > > I just got back and going through a mountain of email, I see Jakub > already picked the V2, so, noted for next time. Thank you ! Whether the stack trace is for a hard IRQ or threaded IRQ is the first thing I looked for when reviewing. Change is about the calling context. So I figured while not strictly necessary, in this particular case, these lines may be helpful for people eyeballing the change... In general, yes, trimming the bottom of the stack is good hygiene.
On Fri, 12 Apr 2024 07:52:20 -0700 Jakub Kicinski wrote:
> In general, yes, trimming the bottom of the stack is good hygiene.
That sentence puts the words "bottom" and "hygiene" uncomfortably close
together. Oh, well, it's Friday..
On Fri, Apr 12, 2024 at 07:54:14AM -0700, Jakub Kicinski wrote: > On Fri, 12 Apr 2024 07:52:20 -0700 Jakub Kicinski wrote: > > In general, yes, trimming the bottom of the stack is good hygiene. > > That sentence puts the words "bottom" and "hygiene" uncomfortably close > together. Oh, well, it's Friday.. Oh, nice! Have a good one, and take your hygiene seriously even on Fridays!
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c index 896d43bb8883d..d4cdf3d4f5525 100644 --- a/drivers/net/ethernet/micrel/ks8851_common.c +++ b/drivers/net/ethernet/micrel/ks8851_common.c @@ -299,7 +299,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks) ks8851_dbg_dumpkkt(ks, rxpkt); skb->protocol = eth_type_trans(skb, ks->netdev); - netif_rx(skb); + __netif_rx(skb); ks->netdev->stats.rx_packets++; ks->netdev->stats.rx_bytes += rxlen; @@ -330,6 +330,8 @@ static irqreturn_t ks8851_irq(int irq, void *_ks) unsigned long flags; unsigned int status; + local_bh_disable(); + ks8851_lock(ks, &flags); status = ks8851_rdreg16(ks, KS_ISR); @@ -406,6 +408,8 @@ static irqreturn_t ks8851_irq(int irq, void *_ks) if (status & IRQ_LCI) mii_check_link(&ks->mii); + local_bh_enable(); + return IRQ_HANDLED; }
The ks8851_irq() thread may call ks8851_rx_pkts() in case there are any packets in the MAC FIFO, which calls netif_rx(). This netif_rx() implementation is guarded by local_bh_disable() and local_bh_enable(). The local_bh_enable() may call do_softirq() to run softirqs in case any are pending. One of the softirqs is net_rx_action, which ultimately reaches the driver .start_xmit callback. If that happens, the system hangs. The entire call chain is below: ks8851_start_xmit_par from netdev_start_xmit netdev_start_xmit from dev_hard_start_xmit dev_hard_start_xmit from sch_direct_xmit sch_direct_xmit from __dev_queue_xmit __dev_queue_xmit from __neigh_update __neigh_update from neigh_update neigh_update from arp_process.constprop.0 arp_process.constprop.0 from __netif_receive_skb_one_core __netif_receive_skb_one_core from process_backlog process_backlog from __napi_poll.constprop.0 __napi_poll.constprop.0 from net_rx_action net_rx_action from __do_softirq __do_softirq from call_with_stack call_with_stack from do_softirq do_softirq from __local_bh_enable_ip __local_bh_enable_ip from netif_rx netif_rx from ks8851_irq ks8851_irq from irq_thread_fn irq_thread_fn from irq_thread irq_thread from kthread kthread from ret_from_fork The hang happens because ks8851_irq() first locks a spinlock in ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...) and with that spinlock locked, calls netif_rx(). Once the execution reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again which attempts to claim the already locked spinlock again, and the hang happens. Move the do_softirq() call outside of the spinlock protected section of ks8851_irq() by disabling BHs around the entire spinlock protected section of ks8851_irq() handler. Place local_bh_enable() outside of the spinlock protected section, so that it can trigger do_softirq() without the ks8851_par.c ks8851_lock_par() spinlock being held, and safely call ks8851_start_xmit_par() without attempting to lock the already locked spinlock. Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable() now, replace netif_rx() with __netif_rx() which is not duplicating the local_bh_disable()/local_bh_enable() calls. Fixes: 797047f875b5 ("net: ks8851: Implement Parallel bus operations") Signed-off-by: Marek Vasut <marex@denx.de> --- Cc: "David S. Miller" <davem@davemloft.net> Cc: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Ratheesh Kannoth <rkannoth@marvell.com> Cc: Ronald Wahl <ronald.wahl@raritan.com> Cc: Simon Horman <horms@kernel.org> Cc: netdev@vger.kernel.org --- V2: - Drop the need_bh_off test, the test was always true - Fill in 'net' subject prefix --- drivers/net/ethernet/micrel/ks8851_common.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)