Message ID | 20250219181556.1020029-1-ralph.siemsen@linaro.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [RFC] usb: gadget: u_ether: prevent deadlock under RT | expand |
diff --git a/drivers/usb/gadget/function/u_ether.c b/drivers/usb/gadget/function/u_ether.c index 09e2838917e2..dc511c01b741 100644 --- a/drivers/usb/gadget/function/u_ether.c +++ b/drivers/usb/gadget/function/u_ether.c @@ -233,6 +233,7 @@ static void rx_complete(struct usb_ep *ep, struct usb_request *req) if (dev->unwrap) { unsigned long flags; + local_bh_disable(); spin_lock_irqsave(&dev->lock, flags); if (dev->port_usb) { status = dev->unwrap(dev->port_usb, @@ -243,6 +244,7 @@ static void rx_complete(struct usb_ep *ep, struct usb_request *req) status = -ENOTCONN; } spin_unlock_irqrestore(&dev->lock, flags); + local_bh_enable(); } else { skb_queue_tail(&dev->rx_frames, skb); }
A deadlock can be readily triggered when using NCM gadget with the Cadence cdns3 USB driver, under heavy traffic from "iperf3 --bidir". It has been observed under 6.1, 6.6 and 6.12 kernel versions, but only on PREEMPT_RT. Once deadlocked, even magicsysrq has no effect. However, there is periodic output from the rcu stall detector: [ 71.105941] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 71.105966] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P125/4:b..l [ 71.105992] rcu: (detected by 1, t=5252 jiffies, g=-515, q=7 ncpus=2) [ 71.106003] task:irq/507-s-f4000 state:D stack:0 pid:125 tgid:125 ppid:2 flags:0x00000008 [ 71.106018] Call trace: [ 71.106022] __switch_to+0xf4/0x158 [ 71.106046] __schedule+0x2b4/0x920 [ 71.106055] schedule_rtlock+0x24/0x50 [ 71.106064] rtlock_slowlock_locked+0x348/0xcb8 [ 71.106077] rt_spin_lock+0x88/0xb8 [ 71.106086] eth_start_xmit+0x30/0x1490 [u_ether] /*****/ [ 71.106112] ncm_tx_timeout+0x2c/0x50 [usb_f_ncm] [ 71.106131] __hrtimer_run_queues+0x180/0x378 [ 71.106143] hrtimer_run_softirq+0x90/0x100 [ 71.106151] handle_softirqs.isra.0+0x14c/0x360 [ 71.106165] __local_bh_enable_ip+0x104/0x118 [ 71.106177] __netdev_alloc_skb+0x1e0/0x210 [ 71.106192] ncm_unwrap_ntb+0x1ec/0x528 [usb_f_ncm] [ 71.106206] rx_complete+0x120/0x288 [u_ether] /*****/ [ 71.106221] usb_gadget_giveback_request+0x34/0xf8 [ 71.106236] cdns3_gadget_giveback+0xe4/0x2d0 [cdns3] [ 71.106286] cdns3_transfer_completed+0x3b0/0x630 [cdns3] [ 71.106320] cdns3_device_thread_irq_handler+0x8b8/0xd18 [cdns3] [ 71.106353] irq_thread_fn+0x34/0xb8 [ 71.106364] irq_thread+0x180/0x2f0 [ 71.106374] kthread+0x104/0x118 [ 71.106384] ret_from_fork+0x10/0x20 The deadlock occurs because eth_start_xmit() and rx_complete() both acquire the same spinlock in the same instance of struct eth_dev. The nested call occurs because rx_complete() calls __netdev_alloc_skb() which performs a brief local_bh_disable/enable() sequence. Prevent the deadlock by disabling softirq in rx_complete(), with the same scope as the spinlock. With this fix, no deadlocks are observed over many hours of continuous testing at USB 2.0 speed (480 Mbit/s). Signed-off-by: Ralph Siemsen <ralph.siemsen@linaro.org> --- Discussion items: 0) This is somewhat similar to the recent discussion https://lore.kernel.org/linux-rt-devel/20250212123302.0f620f23@gandalf.local.home/ and my solution is to "sprinkle local_bh_disable() around", which is clearly not optimal. Hence the RFC on this patch. 1) There are several more places using the same spinlock, for example the gether_suspend() and gether_resume() functions. I'm not using power management, but I wonder if there might be more deadlocks lurking? 2) By keeping softirq disabled for a longer duration, this patch could potentially increase the RT latency. I tried to quantify this using cyclictest, but I cannot get a baseline for comparison, since it deadlocks almost immediately when the USB is active. Other suggestions? 3) The fix is in generic u_ether.c code, to match the scope of the spinlock. Alternatively, it could be done in cdns3 specific code, such as in cdns3_device_thread_irq_handler(). I'm not sure if the same problem exists in other USB drivers? --- drivers/usb/gadget/function/u_ether.c | 2 ++ 1 file changed, 2 insertions(+)