Message ID | 20230423082403.49143-1-mirsad.todorovac@alu.unizg.hr (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Johannes Berg |
Series | [RFC,v1,1/1] net: mac80211: fortify the spinlock against deadlock in interrupt |
On Sun, 2023-04-23 at 10:24 +0200, Mirsad Goran Todorovac wrote:
> In the function ieee80211_tx_dequeue() there is a locking sequence:
>
> begin:
> 	spin_lock(&local->queue_stop_reason_lock);
> 	q_stopped = local->queue_stop_reasons[q];
> 	spin_unlock(&local->queue_stop_reason_lock);
>
> However small the chance (increased by ftracetest), an asynchronous
> interrupt can occur in between of spin_lock() and spin_unlock(),
> and the interrupt routine will attempt to lock the same
> &local->queue_stop_reason_lock again.
>
> This is the only remaining spin_lock() on local->queue_stop_reason_lock
> that did not disable interrupts and could have possibly caused the deadlock
> on the same CPU (core).
>
> This will cause a costly reset of the CPU and wifi device or an
> altogether hang in the single CPU and single core scenario.
>
> This is the probable reproduce of the deadlock:
>
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: Possible unsafe locking scenario:
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: CPU0
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: ----
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: <Interrupt>
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
>  *** DEADLOCK ***
>
> Fixes: 4444bc2116ae

That fixes tag is wrong, should be

Fixes: 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")

Otherwise seems fine to me, submit it properly?

johannes
On 24.4.2023. 19:27, Johannes Berg wrote:
> On Sun, 2023-04-23 at 10:24 +0200, Mirsad Goran Todorovac wrote:
>> In the function ieee80211_tx_dequeue() there is a locking sequence:
>>
>> begin:
>> 	spin_lock(&local->queue_stop_reason_lock);
>> 	q_stopped = local->queue_stop_reasons[q];
>> 	spin_unlock(&local->queue_stop_reason_lock);
>>
>> However small the chance (increased by ftracetest), an asynchronous
>> interrupt can occur in between of spin_lock() and spin_unlock(),
>> and the interrupt routine will attempt to lock the same
>> &local->queue_stop_reason_lock again.
>>
>> This is the only remaining spin_lock() on local->queue_stop_reason_lock
>> that did not disable interrupts and could have possibly caused the deadlock
>> on the same CPU (core).
>>
>> This will cause a costly reset of the CPU and wifi device or an
>> altogether hang in the single CPU and single core scenario.
>>
>> This is the probable reproduce of the deadlock:
>>
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: Possible unsafe locking scenario:
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: CPU0
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: ----
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: <Interrupt>
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
>>  *** DEADLOCK ***
>>
>> Fixes: 4444bc2116ae
>
> That fixes tag is wrong, should be
>
> Fixes: 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")
>
> Otherwise seems fine to me, submit it properly?
>
> johannes

Will do, Sir.

Do I have an Acked-by: ?

Thank you.

Mirsad
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 7699fb410670..45cb8e7bcc61 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3781,6 +3781,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 	ieee80211_tx_result r;
 	struct ieee80211_vif *vif = txq->vif;
 	int q = vif->hw_queue[txq->ac];
+	unsigned long flags;
 	bool q_stopped;
 
 	WARN_ON_ONCE(softirq_count() == 0);
@@ -3789,9 +3790,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 		return NULL;
 
 begin:
-	spin_lock(&local->queue_stop_reason_lock);
+	spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
 	q_stopped = local->queue_stop_reasons[q];
-	spin_unlock(&local->queue_stop_reason_lock);
+	spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
 
 	if (unlikely(q_stopped)) {
 		/* mark for waking later */
In the function ieee80211_tx_dequeue() there is a locking sequence:

begin:
	spin_lock(&local->queue_stop_reason_lock);
	q_stopped = local->queue_stop_reasons[q];
	spin_unlock(&local->queue_stop_reason_lock);

However small the chance (increased by ftracetest), an asynchronous
interrupt can occur between spin_lock() and spin_unlock(), and the
interrupt routine will then attempt to take the same
&local->queue_stop_reason_lock again.

This is the only remaining spin_lock() on local->queue_stop_reason_lock
that did not disable interrupts and could possibly have caused the deadlock
on the same CPU (core).

This will cause a costly reset of the CPU and wifi device, or a complete
hang in the single-CPU, single-core scenario.

This is the probable reproduction of the deadlock:

Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: Possible unsafe locking scenario:
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: CPU0
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: ----
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: <Interrupt>
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
 *** DEADLOCK ***

Fixes: 4444bc2116ae
Link: https://lore.kernel.org/all/1f58a0d1-d2b9-d851-73c3-93fcc607501c@alu.unizg.hr/
Cc: Alexander Wetzel <alexander@wetzel-home.de>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
---
 net/mac80211/tx.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)