ks8851: Fix deadlock with the SPI chip variant

Message ID 20240703160053.9892-1-rwahl@gmx.de (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/series_format            warning  Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 839 this patch: 839
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           warning  1 maintainers not CCed: marex@denx.de
netdev/build_clang              success  Errors and warnings before: 846 this patch: 846
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 846 this patch: 846
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 19 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 9 this patch: 9
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-07-04--00-00 (tests: 663)

Commit Message

Ronald Wahl July 3, 2024, 4 p.m. UTC
From: Ronald Wahl <ronald.wahl@raritan.com>

When SMP is enabled and spinlocks are actually functional, there is
a deadlock on the 'statelock' spinlock between ks8851_start_xmit_spi
and ks8851_irq:

    watchdog: BUG: soft lockup - CPU#0 stuck for 27s!
    call trace:
      queued_spin_lock_slowpath+0x100/0x284
      do_raw_spin_lock+0x34/0x44
      ks8851_start_xmit_spi+0x30/0xb8
      ks8851_start_xmit+0x14/0x20
      netdev_start_xmit+0x40/0x6c
      dev_hard_start_xmit+0x6c/0xbc
      sch_direct_xmit+0xa4/0x22c
      __qdisc_run+0x138/0x3fc
      qdisc_run+0x24/0x3c
      net_tx_action+0xf8/0x130
      handle_softirqs+0x1ac/0x1f0
      __do_softirq+0x14/0x20
      ____do_softirq+0x10/0x1c
      call_on_irq_stack+0x3c/0x58
      do_softirq_own_stack+0x1c/0x28
      __irq_exit_rcu+0x54/0x9c
      irq_exit_rcu+0x10/0x1c
      el1_interrupt+0x38/0x50
      el1h_64_irq_handler+0x18/0x24
      el1h_64_irq+0x64/0x68
      __netif_schedule+0x6c/0x80
      netif_tx_wake_queue+0x38/0x48
      ks8851_irq+0xb8/0x2c8
      irq_thread_fn+0x2c/0x74
      irq_thread+0x10c/0x1b0
      kthread+0xc8/0xd8
      ret_from_fork+0x10/0x20

This issue was not identified earlier because testing was done on
a device with SMP disabled, where spinlocks are effectively NOPs.

This commit moves the netif_wake_queue call outside the
spinlock-protected section.
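
The re-entry described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: every name in it
(struct model, model_start_xmit, model_wake_queue, model_irq_before,
model_irq_after) is hypothetical and none of it is driver or kernel code.
It assumes the worst case seen in the trace, where the TX softirq runs on
the same CPU before the caller of the wake function gets to unlock.

```c
/* User-space model only; none of these functions are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct model {
	bool statelock_held;   /* stands in for ks->statelock */
	bool queue_stopped;    /* stands in for netif_queue_stopped() */
	bool deadlocked;       /* set where a real spinlock would spin forever */
	unsigned int tx_space; /* stands in for ks->tx_space */
};

/* Models the xmit path, which needs the lock before touching tx_space. */
static void model_start_xmit(struct model *m)
{
	if (m->statelock_held) {
		/* Same CPU already holds the lock: a real spinlock never returns. */
		m->deadlocked = true;
		return;
	}
	m->statelock_held = true;
	/* ... the real driver would account for tx_space here ... */
	m->statelock_held = false;
}

/* Models waking the queue when the TX softirq runs immediately (worst case). */
static void model_wake_queue(struct model *m)
{
	m->queue_stopped = false;
	model_start_xmit(m);
}

/* Ordering before the patch: wake the queue while holding the lock. */
static bool model_irq_before(struct model *m, unsigned int tx_space)
{
	m->statelock_held = true;
	m->tx_space = tx_space;
	if (m->queue_stopped)
		model_wake_queue(m);
	m->statelock_held = false;
	return !m->deadlocked;
}

/* Ordering after the patch: sample under the lock, wake after unlocking. */
static bool model_irq_after(struct model *m, unsigned int tx_space)
{
	bool need_wake_queue;

	m->statelock_held = true;
	m->tx_space = tx_space;
	need_wake_queue = m->queue_stopped;
	m->statelock_held = false;
	if (need_wake_queue)
		model_wake_queue(m);
	return !m->deadlocked;
}
```

With a stopped queue, model_irq_before() reports the deadlock while
model_irq_after() completes and leaves the queue awake.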

Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
---
 drivers/net/ethernet/micrel/ks8851_common.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--
2.45.2

Comments

Jakub Kicinski July 4, 2024, 2:44 p.m. UTC | #1
On Wed,  3 Jul 2024 18:00:53 +0200 Ronald Wahl wrote:
> +		bool need_wake_queue;
> 
>  		netif_dbg(ks, intr, ks->netdev,
>  			  "%s: txspace %d\n", __func__, tx_space);
> 
>  		spin_lock(&ks->statelock);
>  		ks->tx_space = tx_space;
> -		if (netif_queue_stopped(ks->netdev))
> -			netif_wake_queue(ks->netdev);
> +		need_wake_queue = netif_queue_stopped(ks->netdev);
>  		spin_unlock(&ks->statelock);
> +		if (need_wake_queue)
> +			netif_wake_queue(ks->netdev);

xmit runs in BH, so this is just one way you can hit this deadlock.
A better fix would be to make sure statelock is always taken
using spin_lock_bh().

Ronald Wahl July 4, 2024, 8:18 p.m. UTC | #2
Thanks, I made a v2.

I now also found another potential TX stall issue caused by improper
locking. In ks8851_tx_work we need to move

   last = skb_queue_empty(&ks->txq);

under the lock; otherwise we risk a TX stall if the queue was empty when
checked but has meanwhile been completely filled while we were waiting
for the lock. I need to double check this scenario first. If it
is indeed an issue, I will provide a separate patch later.

On 04.07.24 16:44, Jakub Kicinski wrote:
> On Wed,  3 Jul 2024 18:00:53 +0200 Ronald Wahl wrote:
>> +            bool need_wake_queue;
>>
>>              netif_dbg(ks, intr, ks->netdev,
>>                        "%s: txspace %d\n", __func__, tx_space);
>>
>>              spin_lock(&ks->statelock);
>>              ks->tx_space = tx_space;
>> -            if (netif_queue_stopped(ks->netdev))
>> -                    netif_wake_queue(ks->netdev);
>> +            need_wake_queue = netif_queue_stopped(ks->netdev);
>>              spin_unlock(&ks->statelock);
>> +            if (need_wake_queue)
>> +                    netif_wake_queue(ks->netdev);
>
> xmit runs in BH, this is just one way you can hit this deadlock
> better fix would be to make sure statelock is always taken
> using spin_lock_bh()
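
For illustration, the spin_lock_bh() suggestion quoted above can be
modelled in user space. This is a rough sketch, not the v2 patch: all
names (struct cpu_model and the model_* functions) are hypothetical, and
the model assumes the worst case in which a raised TX softirq runs on the
same CPU as soon as bottom halves are enabled. It relies only on the
documented behaviour that spin_lock_bh() disables bottom halves on the
local CPU before taking the lock, and that spin_unlock_bh() re-enables
them after releasing it.

```c
/* User-space model only; none of these functions are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct cpu_model {
	int  bh_disable_depth; /* models the local BH-disable nesting count */
	bool softirq_pending;  /* TX softirq raised but not yet run */
	bool statelock_held;   /* stands in for ks->statelock */
	bool deadlocked;       /* set where a real spinlock would spin forever */
	int  xmit_runs;        /* number of completed xmit calls */
};

/* Models the xmit path running in BH context and taking the lock. */
static void model_xmit(struct cpu_model *c)
{
	if (c->statelock_held) {
		c->deadlocked = true;
		return;
	}
	c->statelock_held = true;
	c->xmit_runs++;
	c->statelock_held = false;
}

/* Runs the pending softirq only while bottom halves are enabled. */
static void model_run_pending(struct cpu_model *c)
{
	if (c->softirq_pending && c->bh_disable_depth == 0) {
		c->softirq_pending = false;
		model_xmit(c);
	}
}

/* Models a wake-up that raises the TX softirq (worst case: runs at once). */
static void model_raise_tx_softirq(struct cpu_model *c)
{
	c->softirq_pending = true;
	model_run_pending(c);
}

/* Plain lock: bottom halves stay enabled while the lock is held. */
static bool model_irq_plain(struct cpu_model *c)
{
	c->statelock_held = true;
	model_raise_tx_softirq(c);
	c->statelock_held = false;
	return !c->deadlocked;
}

/* BH-disabling lock: the softirq is deferred until after the unlock. */
static bool model_irq_bh(struct cpu_model *c)
{
	c->bh_disable_depth++;       /* lock side: disable BH, then lock */
	c->statelock_held = true;
	model_raise_tx_softirq(c);   /* stays pending, BH is disabled */
	c->statelock_held = false;   /* unlock side: unlock, then enable BH */
	c->bh_disable_depth--;
	model_run_pending(c);
	return !c->deadlocked;
}
```

In this model the plain variant deadlocks, while the BH-disabling variant
defers the xmit call until the lock has been released, independent of
where the wake-up is issued.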
Patch

diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 6453c92f0fa7..60b959126b26 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -348,15 +348,17 @@  static irqreturn_t ks8851_irq(int irq, void *_ks)

 	if (status & IRQ_TXI) {
 		unsigned short tx_space = ks8851_rdreg16(ks, KS_TXMIR);
+		bool need_wake_queue;

 		netif_dbg(ks, intr, ks->netdev,
 			  "%s: txspace %d\n", __func__, tx_space);

 		spin_lock(&ks->statelock);
 		ks->tx_space = tx_space;
-		if (netif_queue_stopped(ks->netdev))
-			netif_wake_queue(ks->netdev);
+		need_wake_queue = netif_queue_stopped(ks->netdev);
 		spin_unlock(&ks->statelock);
+		if (need_wake_queue)
+			netif_wake_queue(ks->netdev);
 	}

 	if (status & IRQ_SPIBEI) {