diff mbox series

[net,v2,2/2] net: ks8851: Handle softirqs at the end of IRQ thread to fix hang

Message ID 20240405203204.82062-2-marex@denx.de (mailing list archive)
State Accepted
Commit be0384bf599cf1eb8d337517feeb732d71f75a6f
Delegated to: Netdev Maintainers
Headers show
Series [net,v2,1/2] net: ks8851: Inline ks8851_rx_skb() | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 942 this patch: 942
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: andrew@lunn.ch; 1 maintainers not CCed: andrew@lunn.ch
netdev/build_clang success Errors and warnings before: 953 this patch: 953
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 953 this patch: 953
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 24 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 9 this patch: 9
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-04-07--00-00 (tests: 956)

Commit Message

Marek Vasut April 5, 2024, 8:30 p.m. UTC
The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
implementation is guarded by local_bh_disable() and local_bh_enable().
The local_bh_enable() may call do_softirq() to run softirqs in case
any are pending. One of the softirqs is net_rx_action, which ultimately
reaches the driver .start_xmit callback. If that happens, the system
hangs. The entire call chain is below:

ks8851_start_xmit_par from netdev_start_xmit
netdev_start_xmit from dev_hard_start_xmit
dev_hard_start_xmit from sch_direct_xmit
sch_direct_xmit from __dev_queue_xmit
__dev_queue_xmit from __neigh_update
__neigh_update from neigh_update
neigh_update from arp_process.constprop.0
arp_process.constprop.0 from __netif_receive_skb_one_core
__netif_receive_skb_one_core from process_backlog
process_backlog from __napi_poll.constprop.0
__napi_poll.constprop.0 from net_rx_action
net_rx_action from __do_softirq
__do_softirq from call_with_stack
call_with_stack from do_softirq
do_softirq from __local_bh_enable_ip
__local_bh_enable_ip from netif_rx
netif_rx from ks8851_irq
ks8851_irq from irq_thread_fn
irq_thread_fn from irq_thread
irq_thread from kthread
kthread from ret_from_fork

The hang happens because ks8851_irq() first locks a spinlock in
ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...)
and with that spinlock locked, calls netif_rx(). Once the execution
reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again
which attempts to claim the already locked spinlock again, and the
hang happens.

Move the do_softirq() call outside of the spinlock protected section
of ks8851_irq() by disabling BHs around the entire spinlock protected
section of ks8851_irq() handler. Place local_bh_enable() outside of
the spinlock protected section, so that it can trigger do_softirq()
without the ks8851_par.c ks8851_lock_par() spinlock being held, and
safely call ks8851_start_xmit_par() without attempting to lock the
already locked spinlock.

Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable()
now, replace netif_rx() with __netif_rx() which is not duplicating the
local_bh_disable()/local_bh_enable() calls.

Fixes: 797047f875b5 ("net: ks8851: Implement Parallel bus operations")
Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ratheesh Kannoth <rkannoth@marvell.com>
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: Simon Horman <horms@kernel.org>
Cc: netdev@vger.kernel.org
---
V2: - Drop the need_bh_off test, the test was always true
    - Fill in 'net' subject prefix
---
 drivers/net/ethernet/micrel/ks8851_common.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Andy Shevchenko April 8, 2024, 2:52 p.m. UTC | #1
On Fri, Apr 05, 2024 at 10:30:40PM +0200, Marek Vasut wrote:
> The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
> any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
> implementation is guarded by local_bh_disable() and local_bh_enable().
> The local_bh_enable() may call do_softirq() to run softirqs in case
> any are pending. One of the softirqs is net_rx_action, which ultimately
> reaches the driver .start_xmit callback. If that happens, the system
> hangs. The entire call chain is below:
> 
> ks8851_start_xmit_par from netdev_start_xmit
> netdev_start_xmit from dev_hard_start_xmit
> dev_hard_start_xmit from sch_direct_xmit
> sch_direct_xmit from __dev_queue_xmit
> __dev_queue_xmit from __neigh_update
> __neigh_update from neigh_update
> neigh_update from arp_process.constprop.0
> arp_process.constprop.0 from __netif_receive_skb_one_core
> __netif_receive_skb_one_core from process_backlog
> process_backlog from __napi_poll.constprop.0
> __napi_poll.constprop.0 from net_rx_action
> net_rx_action from __do_softirq
> __do_softirq from call_with_stack
> call_with_stack from do_softirq
> do_softirq from __local_bh_enable_ip
> __local_bh_enable_ip from netif_rx
> netif_rx from ks8851_irq
> ks8851_irq from irq_thread_fn

> irq_thread_fn from irq_thread
> irq_thread from kthread
> kthread from ret_from_fork

These lines are unneeded (in case you need a new version, you can drop them).

> The hang happens because ks8851_irq() first locks a spinlock in
> ks8851_par.c ks8851_lock_par() spin_lock_irqsave(&ksp->lock, ...)
> and with that spinlock locked, calls netif_rx(). Once the execution
> reaches ks8851_start_xmit_par(), it calls ks8851_lock_par() again
> which attempts to claim the already locked spinlock again, and the
> hang happens.
> 
> Move the do_softirq() call outside of the spinlock protected section
> of ks8851_irq() by disabling BHs around the entire spinlock protected
> section of ks8851_irq() handler. Place local_bh_enable() outside of
> the spinlock protected section, so that it can trigger do_softirq()
> without the ks8851_par.c ks8851_lock_par() spinlock being held, and
> safely call ks8851_start_xmit_par() without attempting to lock the
> already locked spinlock.
> 
> Since ks8851_irq() is protected by local_bh_disable()/local_bh_enable()
> now, replace netif_rx() with __netif_rx() which is not duplicating the
> local_bh_disable()/local_bh_enable() calls.
Marek Vasut April 12, 2024, 11:29 a.m. UTC | #2
On 4/8/24 4:52 PM, Andy Shevchenko wrote:
> On Fri, Apr 05, 2024 at 10:30:40PM +0200, Marek Vasut wrote:
>> The ks8851_irq() thread may call ks8851_rx_pkts() in case there are
>> any packets in the MAC FIFO, which calls netif_rx(). This netif_rx()
>> implementation is guarded by local_bh_disable() and local_bh_enable().
>> The local_bh_enable() may call do_softirq() to run softirqs in case
>> any are pending. One of the softirqs is net_rx_action, which ultimately
>> reaches the driver .start_xmit callback. If that happens, the system
>> hangs. The entire call chain is below:
>>
>> ks8851_start_xmit_par from netdev_start_xmit
>> netdev_start_xmit from dev_hard_start_xmit
>> dev_hard_start_xmit from sch_direct_xmit
>> sch_direct_xmit from __dev_queue_xmit
>> __dev_queue_xmit from __neigh_update
>> __neigh_update from neigh_update
>> neigh_update from arp_process.constprop.0
>> arp_process.constprop.0 from __netif_receive_skb_one_core
>> __netif_receive_skb_one_core from process_backlog
>> process_backlog from __napi_poll.constprop.0
>> __napi_poll.constprop.0 from net_rx_action
>> net_rx_action from __do_softirq
>> __do_softirq from call_with_stack
>> call_with_stack from do_softirq
>> do_softirq from __local_bh_enable_ip
>> __local_bh_enable_ip from netif_rx
>> netif_rx from ks8851_irq
>> ks8851_irq from irq_thread_fn
> 
>> irq_thread_fn from irq_thread
>> irq_thread from kthread
>> kthread from ret_from_fork
> 
> These lines are unneeded (in case you need a new version, you can drop them).

I just got back and going through a mountain of email, I see Jakub 
already picked the V2, so, noted for next time. Thank you !
Jakub Kicinski April 12, 2024, 2:52 p.m. UTC | #3
On Fri, 12 Apr 2024 13:29:04 +0200 Marek Vasut wrote:
> >> irq_thread_fn from irq_thread
> >> irq_thread from kthread
> >> kthread from ret_from_fork  
> > 
> > These lines are unneeded (in case you need a new version, you can drop them).  
> 
> I just got back and going through a mountain of email, I see Jakub 
> already picked the V2, so, noted for next time. Thank you !

Whether the stack trace is for a hard IRQ or threaded IRQ is the first
thing I looked for when reviewing. Change is about the calling context. 
So I figured while not strictly necessary, in this particular case,
these lines may be helpful for people eyeballing the change...
In general, yes, trimming the bottom of the stack is good hygiene.
Jakub Kicinski April 12, 2024, 2:54 p.m. UTC | #4
On Fri, 12 Apr 2024 07:52:20 -0700 Jakub Kicinski wrote:
> In general, yes, trimming the bottom of the stack is good hygiene.

That sentence puts the words "bottom" and "hygiene" uncomfortably close
together. Oh, well, it's Friday..
Andy Shevchenko April 12, 2024, 3:47 p.m. UTC | #5
On Fri, Apr 12, 2024 at 07:54:14AM -0700, Jakub Kicinski wrote:
> On Fri, 12 Apr 2024 07:52:20 -0700 Jakub Kicinski wrote:
> > In general, yes, trimming the bottom of the stack is good hygiene.
> 
> That sentence puts the words "bottom" and "hygiene" uncomfortably close
> together. Oh, well, it's Friday..

Oh, nice! Have a good one, and take your hygiene seriously even on Fridays!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 896d43bb8883d..d4cdf3d4f5525 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -299,7 +299,7 @@  static void ks8851_rx_pkts(struct ks8851_net *ks)
 					ks8851_dbg_dumpkkt(ks, rxpkt);
 
 				skb->protocol = eth_type_trans(skb, ks->netdev);
-				netif_rx(skb);
+				__netif_rx(skb);
 
 				ks->netdev->stats.rx_packets++;
 				ks->netdev->stats.rx_bytes += rxlen;
@@ -330,6 +330,8 @@  static irqreturn_t ks8851_irq(int irq, void *_ks)
 	unsigned long flags;
 	unsigned int status;
 
+	local_bh_disable();
+
 	ks8851_lock(ks, &flags);
 
 	status = ks8851_rdreg16(ks, KS_ISR);
@@ -406,6 +408,8 @@  static irqreturn_t ks8851_irq(int irq, void *_ks)
 	if (status & IRQ_LCI)
 		mii_check_link(&ks->mii);
 
+	local_bh_enable();
+
 	return IRQ_HANDLED;
 }