Message ID | 20160817153253.6615-1-rmanohar@qti.qualcomm.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Kalle Valo |
Rajkumar Manoharan <rmanohar@qti.qualcomm.com> wrote:

> txqs_lock is interfering with wake_tx_queue submitting more frames.
> so queues don't get filled in and don't keep firmware/hardware busy
> enough. This change helps to reduce the txqs_lock contention and
> wake_tx_queue() blockage to being possible in txrx_unref().
>
> To reduce turn around time of wake_tx_queue ops and to maintain fairness
> among all txqs, the callback is updated to push first txq alone from
> pending list for every wake_tx_queue call. Remaining txqs will be
> processed later upon tx completion.
>
> Below improvements are observed in push-only mode and validated on
> IPQ4019 platform. With this change, in AP mode ~10Mbps increase is
> observed in downlink (AP -> STA) traffic and approx. 5-10% of CPU
> usage is reduced.
>
> Major improvement is observed in 1-hop Mesh mode topology in 11ACVHT80.
> Compared to Infra mode, CPU overhead is higher in Mesh mode due to path
> lookup and no fast-xmit support. So reducing spin lock contention is
> helping in Mesh.
>
>             TOT         +change
>             --------    --------
> TCP DL      545 Mbps    595 Mbps
> TCP UL      555 Mbps    585 Mbps
>
> Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>

Thanks, 1 patch applied to ath-next branch of ath.git:

83e164b7679d ath10k: improve wake_tx_queue ops performance
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index c7c7ceb9be73..93465ffec2eb 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4106,13 +4106,29 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
 {
 	struct ath10k *ar = hw->priv;
 	struct ath10k_txq *artxq = (void *)txq->drv_priv;
+	struct ieee80211_txq *f_txq;
+	struct ath10k_txq *f_artxq;
+	int ret = 0;
+	int max = 16;
 
 	spin_lock_bh(&ar->txqs_lock);
 	if (list_empty(&artxq->list))
 		list_add_tail(&artxq->list, &ar->txqs);
+
+	f_artxq = list_first_entry(&ar->txqs, struct ath10k_txq, list);
+	f_txq = container_of((void *)f_artxq, struct ieee80211_txq, drv_priv);
+	list_del_init(&f_artxq->list);
+
+	while (ath10k_mac_tx_can_push(hw, f_txq) && max--) {
+		ret = ath10k_mac_tx_push_txq(hw, f_txq);
+		if (ret)
+			break;
+	}
+	if (ret != -ENOENT)
+		list_add_tail(&f_artxq->list, &ar->txqs);
 	spin_unlock_bh(&ar->txqs_lock);
 
-	ath10k_mac_tx_push_pending(ar);
+	ath10k_htt_tx_txq_update(hw, f_txq);
 	ath10k_htt_tx_txq_update(hw, txq);
 }
txqs_lock is interfering with wake_tx_queue submitting more frames, so
queues don't get filled and don't keep firmware/hardware busy
enough. This change helps to reduce the txqs_lock contention and
wake_tx_queue() blockage to being possible in txrx_unref().

To reduce the turnaround time of wake_tx_queue ops and to maintain fairness
among all txqs, the callback is updated to push only the first txq from the
pending list for every wake_tx_queue call. The remaining txqs will be
processed later upon tx completion.

The improvements below are observed in push-only mode and validated on the
IPQ4019 platform. With this change, in AP mode a ~10 Mbps increase is
observed in downlink (AP -> STA) traffic and CPU usage is reduced by
approx. 5-10%.

A major improvement is observed in 1-hop Mesh mode topology in 11ACVHT80.
Compared to Infra mode, CPU overhead is higher in Mesh mode due to path
lookup and no fast-xmit support, so reducing spin lock contention
helps in Mesh.

            TOT         +change
            --------    --------
TCP DL      545 Mbps    595 Mbps
TCP UL      555 Mbps    585 Mbps

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/mac.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
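
The scheduling described above (service only the first txq on the pending list per wake_tx_queue call, with a budget of 16 frames, and re-queue it at the tail unless it ran empty) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: every name here (sim_dev, sim_txq, sim_wake_tx_queue, SIM_ENOENT, fw_room) is hypothetical, a ring buffer stands in for the kernel list, and a simple "free firmware slots" counter stands in for whatever ath10k_mac_tx_can_push() actually checks. Locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TXQS    8
#define PUSH_BUDGET 16   /* mirrors "int max = 16" in the patch */
#define SIM_ENOENT  2    /* hypothetical stand-in for ENOENT */

struct sim_txq {
	int backlog;     /* frames waiting in this txq */
	bool queued;     /* true while the txq sits on the pending list */
};

struct sim_dev {
	struct sim_txq txqs[MAX_TXQS];
	int pending[MAX_TXQS]; /* ring buffer of txq indices (models ar->txqs) */
	int head, count;
	int fw_room;           /* assumed model: free firmware tx slots */
};

static void sim_dev_init(struct sim_dev *dev, int fw_room)
{
	for (int i = 0; i < MAX_TXQS; i++) {
		dev->txqs[i].backlog = 0;
		dev->txqs[i].queued = false;
	}
	dev->head = 0;
	dev->count = 0;
	dev->fw_room = fw_room;
}

static void pending_add_tail(struct sim_dev *dev, int idx)
{
	assert(dev->count < MAX_TXQS);
	dev->pending[(dev->head + dev->count) % MAX_TXQS] = idx;
	dev->count++;
	dev->txqs[idx].queued = true;
}

static int pending_pop_head(struct sim_dev *dev)
{
	int idx = dev->pending[dev->head];

	dev->head = (dev->head + 1) % MAX_TXQS;
	dev->count--;
	dev->txqs[idx].queued = false;
	return idx;
}

/* Push one frame; returns -SIM_ENOENT when the txq has nothing left. */
static int sim_push_txq(struct sim_dev *dev, int idx)
{
	if (dev->txqs[idx].backlog == 0)
		return -SIM_ENOENT;
	dev->txqs[idx].backlog--;
	dev->fw_room--;
	return 0;
}

/* Models the patched callback; returns the number of frames pushed. */
static int sim_wake_tx_queue(struct sim_dev *dev, int idx)
{
	int ret = 0, max = PUSH_BUDGET, pushed = 0;
	int first;

	if (!dev->txqs[idx].queued)
		pending_add_tail(dev, idx);

	/* Service only the head of the pending list, not the woken txq. */
	first = pending_pop_head(dev);
	while (dev->fw_room > 0 && max--) {
		ret = sim_push_txq(dev, first);
		if (ret)
			break;
		pushed++;
	}
	/* Re-queue at the tail unless the txq turned out to be empty. */
	if (ret != -SIM_ENOENT)
		pending_add_tail(dev, first);
	return pushed;
}
```

In this model, waking a txq does not necessarily service that txq: if another txq is already at the head of the pending list, it gets the 16-frame budget first and then moves to the tail, which is the fairness property the commit message describes. A txq that still has backlog after its budget stays on the list, which corresponds to the remaining frames being handled on a later call.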