[net,v3] net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs

Message ID: 20240502183436.117117-1-marex@denx.de (mailing list archive)
State: Accepted
Commit: e0863634bf9f7cf36291ebb5bfa2d16632f79c49
Delegated to: Netdev Maintainers
Series: [net,v3] net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 926 this patch: 926
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 6 of 6 maintainers
netdev/build_clang              success  Errors and warnings before: 937 this patch: 937
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 937 this patch: 937
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 54 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 9 this patch: 9
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-05-03--15-00 (tests: 1001)

Commit Message

Marek Vasut May 2, 2024, 6:32 p.m. UTC
Currently the driver uses local_bh_disable()/local_bh_enable() in its
IRQ handler to avoid triggering net_rx_action() softirq on exit from
netif_rx(). The net_rx_action() could trigger this driver .start_xmit
callback, which is protected by the same lock as the IRQ handler, so
calling the .start_xmit from netif_rx() from the IRQ handler critical
section protected by the lock could lead to an attempt to claim the
already claimed lock, and a hang.

The local_bh_disable()/local_bh_enable() approach works only in case
the IRQ handler is protected by a spinlock, but does not work if the
IRQ handler is protected by mutex, i.e. this works for KS8851 with
Parallel bus interface, but not for KS8851 with SPI bus interface.

Remove the BH manipulation and instead of calling netif_rx() inside
the IRQ handler code protected by the lock, queue all the received
SKBs in the IRQ handler into a queue first, and once the IRQ handler
exits the critical section protected by the lock, dequeue all the
queued SKBs and push them all into netif_rx(). At this point, it is
safe to trigger the net_rx_action() softirq, since the netif_rx()
call is outside of the lock that protects the IRQ handler.

Fixes: be0384bf599c ("net: ks8851: Handle softirqs at the end of IRQ thread to fix hang")
Tested-by: Ronald Wahl <ronald.wahl@raritan.com> # KS8851 SPI
Signed-off-by: Marek Vasut <marex@denx.de>
---
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: Simon Horman <horms@kernel.org>
Cc: netdev@vger.kernel.org
---
V2: - Add TB from Ronald
    - Operate private skb queue without locking as suggested by Eric
V3: - Put the RX queue on stack
    - Only set up the RX queue if there is RX IRQ
    - Update the netif_rx while loop per upstream feedback
---
Note: This is basically what Jakub originally suggested in
      https://patchwork.kernel.org/project/netdevbpf/patch/20240331142353.93792-2-marex@denx.de/#25785606
---
 drivers/net/ethernet/micrel/ks8851_common.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
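
Illustration (not part of the patch): the commit message describes a "collect under the lock, deliver after unlock" pattern. The following is a minimal userspace sketch of that idea, not the driver code. All names in it (struct pkt, struct pkt_queue, dev_lock(), deliver(), start_xmit(), irq_handler()) are hypothetical stand-ins for sk_buff, sk_buff_head, ks8851_lock(), netif_rx(), the .start_xmit callback and ks8851_irq(). A plain flag with an assert replaces the real mutex or spinlock, so a recursive lock attempt would abort here where the real driver would hang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: one received packet. */
struct pkt {
	int id;
	struct pkt *next;
};

/* Hypothetical stand-in for struct sk_buff_head. It lives on the
 * handler's stack, so it needs no locking of its own. */
struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
};

static void queue_init(struct pkt_queue *q)
{
	q->head = NULL;
	q->tail = NULL;
}

static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
}

static struct pkt *queue_pop(struct pkt_queue *q)
{
	struct pkt *p = q->head;

	if (p) {
		q->head = p->next;
		if (!q->head)
			q->tail = NULL;
	}
	return p;
}

static int lock_held;	/* assumed, simplified model of the device lock */
static int delivered;
static int last_id;
static int tx_count;

static void dev_lock(void)
{
	assert(!lock_held);	/* re-taking the lock is the bug being avoided */
	lock_held = 1;
}

static void dev_unlock(void)
{
	assert(lock_held);
	lock_held = 0;
}

/* Models .start_xmit: it needs the same lock as the IRQ handler. */
static void start_xmit(void)
{
	dev_lock();
	tx_count++;
	dev_unlock();
}

/* Models netif_rx(): delivery may immediately lead to a transmit. */
static void deliver(struct pkt *p)
{
	delivered++;
	last_id = p->id;
	start_xmit();
}

static void irq_handler(struct pkt *fifo, int n)
{
	struct pkt_queue rxq;
	struct pkt *p;
	int i;

	queue_init(&rxq);

	dev_lock();
	for (i = 0; i < n; i++)
		queue_tail(&rxq, &fifo[i]);	/* only queue while locked */
	dev_unlock();

	while ((p = queue_pop(&rxq)))	/* deliver with the lock dropped */
		deliver(p);
}
```

Calling irq_handler() with an array of three packets delivers all three in FIFO order, and each delivery can take the lock in start_xmit() because the handler has already released it. Moving the deliver() call into the locked loop would trip the assert in dev_lock(), which is the userspace analogue of the hang described above.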

Comments

Eric Dumazet May 3, 2024, 7:08 a.m. UTC | #1
On Thu, May 2, 2024 at 8:34 PM Marek Vasut <marex@denx.de> wrote:
>
> Currently the driver uses local_bh_disable()/local_bh_enable() in its
> IRQ handler to avoid triggering net_rx_action() softirq on exit from
> netif_rx(). The net_rx_action() could trigger this driver .start_xmit
> callback, which is protected by the same lock as the IRQ handler, so
> calling the .start_xmit from netif_rx() from the IRQ handler critical
> section protected by the lock could lead to an attempt to claim the
> already claimed lock, and a hang.
>
> The local_bh_disable()/local_bh_enable() approach works only in case
> the IRQ handler is protected by a spinlock, but does not work if the
> IRQ handler is protected by mutex, i.e. this works for KS8851 with
> Parallel bus interface, but not for KS8851 with SPI bus interface.
>
> Remove the BH manipulation and instead of calling netif_rx() inside
> the IRQ handler code protected by the lock, queue all the received
> SKBs in the IRQ handler into a queue first, and once the IRQ handler
> exits the critical section protected by the lock, dequeue all the
> queued SKBs and push them all into netif_rx(). At this point, it is
> safe to trigger the net_rx_action() softirq, since the netif_rx()
> call is outside of the lock that protects the IRQ handler.
>
> Fixes: be0384bf599c ("net: ks8851: Handle softirqs at the end of IRQ thread to fix hang")
> Tested-by: Ronald Wahl <ronald.wahl@raritan.com> # KS8851 SPI
> Signed-off-by: Marek Vasut <marex@denx.de>

Reviewed-by: Eric Dumazet <edumazet@google.com>

Marek Vasut May 3, 2024, 8:03 p.m. UTC | #2
On 5/3/24 9:08 AM, Eric Dumazet wrote:
> On Thu, May 2, 2024 at 8:34 PM Marek Vasut <marex@denx.de> wrote:
>>
>> Currently the driver uses local_bh_disable()/local_bh_enable() in its
>> IRQ handler to avoid triggering net_rx_action() softirq on exit from
>> netif_rx(). The net_rx_action() could trigger this driver .start_xmit
>> callback, which is protected by the same lock as the IRQ handler, so
>> calling the .start_xmit from netif_rx() from the IRQ handler critical
>> section protected by the lock could lead to an attempt to claim the
>> already claimed lock, and a hang.
>>
>> The local_bh_disable()/local_bh_enable() approach works only in case
>> the IRQ handler is protected by a spinlock, but does not work if the
>> IRQ handler is protected by mutex, i.e. this works for KS8851 with
>> Parallel bus interface, but not for KS8851 with SPI bus interface.
>>
>> Remove the BH manipulation and instead of calling netif_rx() inside
>> the IRQ handler code protected by the lock, queue all the received
>> SKBs in the IRQ handler into a queue first, and once the IRQ handler
>> exits the critical section protected by the lock, dequeue all the
>> queued SKBs and push them all into netif_rx(). At this point, it is
>> safe to trigger the net_rx_action() softirq, since the netif_rx()
>> call is outside of the lock that protects the IRQ handler.
>>
>> Fixes: be0384bf599c ("net: ks8851: Handle softirqs at the end of IRQ thread to fix hang")
>> Tested-by: Ronald Wahl <ronald.wahl@raritan.com> # KS8851 SPI
>> Signed-off-by: Marek Vasut <marex@denx.de>
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Thank you and Jakub for your help with this.

patchwork-bot+netdevbpf@kernel.org May 3, 2024, 10:30 p.m. UTC | #3
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu,  2 May 2024 20:32:59 +0200 you wrote:
> Currently the driver uses local_bh_disable()/local_bh_enable() in its
> IRQ handler to avoid triggering net_rx_action() softirq on exit from
> netif_rx(). The net_rx_action() could trigger this driver .start_xmit
> callback, which is protected by the same lock as the IRQ handler, so
> calling the .start_xmit from netif_rx() from the IRQ handler critical
> section protected by the lock could lead to an attempt to claim the
> already claimed lock, and a hang.
> 
> [...]

Here is the summary with links:
  - [net,v3] net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs
    https://git.kernel.org/netdev/net/c/e0863634bf9f

You are awesome, thank you!

Patch

diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index d4cdf3d4f5525..502518cdb4618 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -234,12 +234,13 @@  static void ks8851_dbg_dumpkkt(struct ks8851_net *ks, u8 *rxpkt)
 /**
  * ks8851_rx_pkts - receive packets from the host
  * @ks: The device information.
+ * @rxq: Queue of packets received in this function.
  *
  * This is called from the IRQ work queue when the system detects that there
  * are packets in the receive queue. Find out how many packets there are and
  * read them from the FIFO.
  */
-static void ks8851_rx_pkts(struct ks8851_net *ks)
+static void ks8851_rx_pkts(struct ks8851_net *ks, struct sk_buff_head *rxq)
 {
 	struct sk_buff *skb;
 	unsigned rxfc;
@@ -299,7 +300,7 @@  static void ks8851_rx_pkts(struct ks8851_net *ks)
 					ks8851_dbg_dumpkkt(ks, rxpkt);
 
 				skb->protocol = eth_type_trans(skb, ks->netdev);
-				__netif_rx(skb);
+				__skb_queue_tail(rxq, skb);
 
 				ks->netdev->stats.rx_packets++;
 				ks->netdev->stats.rx_bytes += rxlen;
@@ -326,11 +327,11 @@  static void ks8851_rx_pkts(struct ks8851_net *ks)
 static irqreturn_t ks8851_irq(int irq, void *_ks)
 {
 	struct ks8851_net *ks = _ks;
+	struct sk_buff_head rxq;
 	unsigned handled = 0;
 	unsigned long flags;
 	unsigned int status;
-
-	local_bh_disable();
+	struct sk_buff *skb;
 
 	ks8851_lock(ks, &flags);
 
@@ -384,7 +385,8 @@  static irqreturn_t ks8851_irq(int irq, void *_ks)
 		 * from the device so do not bother masking just the RX
 		 * from the device. */
 
-		ks8851_rx_pkts(ks);
+		__skb_queue_head_init(&rxq);
+		ks8851_rx_pkts(ks, &rxq);
 	}
 
 	/* if something stopped the rx process, probably due to wanting
@@ -408,7 +410,9 @@  static irqreturn_t ks8851_irq(int irq, void *_ks)
 	if (status & IRQ_LCI)
 		mii_check_link(&ks->mii);
 
-	local_bh_enable();
+	if (status & IRQ_RXI)
+		while ((skb = __skb_dequeue(&rxq)))
+			netif_rx(skb);
 
 	return IRQ_HANDLED;
 }