diff mbox series

[wireless] wifi: wlcore: fix wlcore AP mode

Message ID E1sBsK3-00E8Uo-Ab@rmk-PC.armlinux.org.uk (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show
Series [wireless] wifi: wlcore: fix wlcore AP mode | expand

Commit Message

Russell King (Oracle) May 28, 2024, 8:36 a.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

Using wl183x devices in AP mode with various firmwares is not stable.

The driver currently adds a station to firmware with basic rates when it
is first known to the stack using the CMD_ADD_PEER command. Once the
station has finished authorising, another CMD_ADD_PEER command is issued
to update the firmware with the rates the station can use.

However, after a random amount of time, the firmware ignores the power
management nullfunc frames from the station, and tries to send packets
while the station is asleep, resulting in lots of retries dropping down
in rate due to no response. This restricts the available bandwidth.

With this happening with several stations, the user visible effect is
the latency of interactive connections increases significantly, packets
get dropped, and in general the WiFi connections become unreliable and
unstable.

Eventually, the firmware transmit queue appears to get stuck - with
packets and blocks allocated that never clear.

TI have a couple of patches that address this, but they touch the
mac80211 core to disable NL80211_FEATURE_FULL_AP_CLIENT_STATE for *all*
wireless drivers, which has the effect of not adding the station to the
stack until later when the rates are known. This is a sledge hammer
approach to solving the problem.

The solution implemented here has the same effect, but without
impacting all drivers.

We delay adding the station to firmware until it has been authorised
in the driver, and correspondingly remove the station when unwinding
from authorised state. Adding the station to firmware allocates a hlid,
which will now happen later than the driver expects. Therefore, we need
to track when this happens so that we transmit using the correct hlid.

This patch is an equivalent fix to these two patches in TI's
wilink8-wlan repository:

https://git.ti.com/cgit/wilink8-wlan/build-utilites/tree/patches/kernel_patches/4.19.38/0004-mac80211-patch.patch?h=r8.9&id=a2ee50aa5190ed3b334373d6cd09b1bff56ffcf7
https://git.ti.com/cgit/wilink8-wlan/build-utilites/tree/patches/kernel_patches/4.19.38/0005-wlcore-patch.patch?h=r8.9&id=a2ee50aa5190ed3b334373d6cd09b1bff56ffcf7

Reported-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Co-developed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>"
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/net/wireless/ti/wlcore/cmd.c      |  7 -------
 drivers/net/wireless/ti/wlcore/main.c     | 17 ++++++++---------
 drivers/net/wireless/ti/wlcore/tx.c       |  7 ++-----
 drivers/net/wireless/ti/wlcore/wlcore_i.h |  6 ++++++
 4 files changed, 16 insertions(+), 21 deletions(-)

Comments

Russell King (Oracle) May 28, 2024, 8:50 a.m. UTC | #1
On Tue, May 28, 2024 at 09:36:43AM +0100, Russell King wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> Using wl183x devices in AP mode with various firmwares is not stable.
> 
> The driver currently adds a station to firmware with basic rates when it
> is first known to the stack using the CMD_ADD_PEER command. Once the
> station has finished authorising, another CMD_ADD_PEER command is issued
> to update the firmware with the rates the station can use.
> 
> However, after a random amount of time, the firmware ignores the power
> management nullfunc frames from the station, and tries to send packets
> while the station is asleep, resulting in lots of retries dropping down
> in rate due to no response. This restricts the available bandwidth.
> 
> With this happening with several stations, the user visible effect is
> the latency of interactive connections increases significantly, packets
> get dropped, and in general the WiFi connections become unreliable and
> unstable.
> 
> Eventually, the firmware transmit queue appears to get stuck - with
> packets and blocks allocated that never clear.
> 
> TI have a couple of patches that address this, but they touch the
> mac80211 core to disable NL80211_FEATURE_FULL_AP_CLIENT_STATE for *all*
> wireless drivers, which has the effect of not adding the station to the
> stack until later when the rates are known. This is a sledge hammer
> approach to solving the problem.
> 
> The solution implemented here has the same effect, but without
> impacting all drivers.
> 
> We delay adding the station to firmware until it has been authorised
> in the driver, and correspondingly remove the station when unwinding
> from authorised state. Adding the station to firmware allocates a hlid,
> which will now happen later than the driver expects. Therefore, we need
> to track when this happens so that we transmit using the correct hlid.
> 
> This patch is an equivalent fix to these two patches in TI's
> wilink8-wlan repository:
> 
> https://git.ti.com/cgit/wilink8-wlan/build-utilites/tree/patches/kernel_patches/4.19.38/0004-mac80211-patch.patch?h=r8.9&id=a2ee50aa5190ed3b334373d6cd09b1bff56ffcf7
> https://git.ti.com/cgit/wilink8-wlan/build-utilites/tree/patches/kernel_patches/4.19.38/0005-wlcore-patch.patch?h=r8.9&id=a2ee50aa5190ed3b334373d6cd09b1bff56ffcf7
> 
> Reported-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Co-developed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>"
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Please note that this patch fixes just one of the issues with the
driver. There remains other firmware bugs that make AP mode
unreliable. For example:

When a station, e.g. a phone, moves out of range of the AP, and the
station is in power saving mode, packets become stuck in the transmit
queue. With additional debugging added to the driver:

Unable to flush all frames for station xx:xx:xx:ee:11:fe for hlid 3
FW time: 1675524181
 Frame 0: expires 1394140264 MAC xx:xx:xx:ee:11:fe FC 17032
 Frame 1: expires 1394264633 MAC xx:xx:xx:ee:11:fe FC 17032

These packets get removed by the firmware when the peer is removed.
However, if the broadcast hlid was in power saving at the time, then
it appears the broadcast hlid gets similarly stuck, leading to the
entire network eventually falling over due to the AP effectively
blocking broadcasted ARP requests.

I can find no way around this - and I suspect there is some kind of
refcounting bug in the firmware when told to remove a peer which has
queued packets.

My best workaround for this at the moment is to monitor the state of
the driver via debugfs, and when this problem presents, to take the
AP down and bring it back up, restarting the firmware (but has the
effect of kicking all connected devices off the network.)

Another workaround for is to turn wifi off on the phone before moving
it out of range!

I will attempt to get captures of the network at some point - both
from the packets at the AP network interface, but also the radio
side as well.
Johannes Berg May 28, 2024, 8:54 a.m. UTC | #2
On Tue, 2024-05-28 at 09:36 +0100, Russell King wrote:
> From: Johannes Berg <johannes.berg@intel.com>

Honestly, while I did write a good chunk of the patch itself, all the
commit log, actually making it compile/work etc. was all you, so I'd
suggest to reverse our roles here in the authorship/credits.

johannes
Russell King (Oracle) May 30, 2024, 8:23 p.m. UTC | #3
On Tue, May 28, 2024 at 10:54:31AM +0200, Johannes Berg wrote:
> On Tue, 2024-05-28 at 09:36 +0100, Russell King wrote:
> > From: Johannes Berg <johannes.berg@intel.com>
> 
> Honestly, while I did write a good chunk of the patch itself, all the
> commit log, actually making it compile/work etc. was all you, so I'd
> suggest to reverse our roles here in the authorship/credits.

Sent v2 with this changed. Thanks.

---
pw-bot: cr
diff mbox series

Patch

diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c
index a939fd89a7f5..92fc2d456c2c 100644
--- a/drivers/net/wireless/ti/wlcore/cmd.c
+++ b/drivers/net/wireless/ti/wlcore/cmd.c
@@ -1566,13 +1566,6 @@  int wl12xx_cmd_add_peer(struct wl1271 *wl, struct wl12xx_vif *wlvif,
 		cpu_to_le32(wl1271_tx_enabled_rates_get(wl, sta_rates,
 							wlvif->band));
 
-	if (!cmd->supported_rates) {
-		wl1271_debug(DEBUG_CMD,
-			     "peer has no supported rates yet, configuring basic rates: 0x%x",
-			     wlvif->basic_rate_set);
-		cmd->supported_rates = cpu_to_le32(wlvif->basic_rate_set);
-	}
-
 	wl1271_debug(DEBUG_CMD, "new peer rates=0x%x queues=0x%x",
 		     cmd->supported_rates, sta->uapsd_queues);
 
diff --git a/drivers/net/wireless/ti/wlcore/main.c b/drivers/net/wireless/ti/wlcore/main.c
index ef12169f8044..492cd7aef44f 100644
--- a/drivers/net/wireless/ti/wlcore/main.c
+++ b/drivers/net/wireless/ti/wlcore/main.c
@@ -5139,19 +5139,23 @@  static int wl12xx_update_sta_state(struct wl1271 *wl,
 
 	/* Add station (AP mode) */
 	if (is_ap &&
-	    old_state == IEEE80211_STA_NOTEXIST &&
-	    new_state == IEEE80211_STA_NONE) {
+	    old_state == IEEE80211_STA_AUTH &&
+	    new_state == IEEE80211_STA_ASSOC) {
 		ret = wl12xx_sta_add(wl, wlvif, sta);
 		if (ret)
 			return ret;
 
+		wl_sta->fw_added = true;
+
 		wlcore_update_inconn_sta(wl, wlvif, wl_sta, true);
 	}
 
 	/* Remove station (AP mode) */
 	if (is_ap &&
-	    old_state == IEEE80211_STA_NONE &&
-	    new_state == IEEE80211_STA_NOTEXIST) {
+	    old_state == IEEE80211_STA_ASSOC &&
+	    new_state == IEEE80211_STA_AUTH) {
+		wl_sta->fw_added = false;
+
 		/* must not fail */
 		wl12xx_sta_remove(wl, wlvif, sta);
 
@@ -5165,11 +5169,6 @@  static int wl12xx_update_sta_state(struct wl1271 *wl,
 		if (ret < 0)
 			return ret;
 
-		/* reconfigure rates */
-		ret = wl12xx_cmd_add_peer(wl, wlvif, sta, wl_sta->hlid);
-		if (ret < 0)
-			return ret;
-
 		ret = wl1271_acx_set_ht_capabilities(wl, &sta->deflink.ht_cap,
 						     true,
 						     wl_sta->hlid);
diff --git a/drivers/net/wireless/ti/wlcore/tx.c b/drivers/net/wireless/ti/wlcore/tx.c
index 7bd3ce2f0804..464587d16ab2 100644
--- a/drivers/net/wireless/ti/wlcore/tx.c
+++ b/drivers/net/wireless/ti/wlcore/tx.c
@@ -140,11 +140,8 @@  EXPORT_SYMBOL(wl12xx_is_dummy_packet);
 static u8 wl12xx_tx_get_hlid_ap(struct wl1271 *wl, struct wl12xx_vif *wlvif,
 				struct sk_buff *skb, struct ieee80211_sta *sta)
 {
-	if (sta) {
-		struct wl1271_station *wl_sta;
-
-		wl_sta = (struct wl1271_station *)sta->drv_priv;
-		return wl_sta->hlid;
+	if (sta && wl1271_station(sta)->fw_added) {
+		return wl1271_station(sta)->hlid;
 	} else {
 		struct ieee80211_hdr *hdr;
 
diff --git a/drivers/net/wireless/ti/wlcore/wlcore_i.h b/drivers/net/wireless/ti/wlcore/wlcore_i.h
index eefae3f867b9..817a8a61cac6 100644
--- a/drivers/net/wireless/ti/wlcore/wlcore_i.h
+++ b/drivers/net/wireless/ti/wlcore/wlcore_i.h
@@ -324,6 +324,7 @@  struct wl12xx_rx_filter {
 
 struct wl1271_station {
 	u8 hlid;
+	bool fw_added;
 	bool in_connection;
 
 	/*
@@ -335,6 +336,11 @@  struct wl1271_station {
 	u64 total_freed_pkts;
 };
 
+static inline struct wl1271_station *wl1271_station(struct ieee80211_sta *sta)
+{
+	return (struct wl1271_station *)sta->drv_priv;
+}
+
 struct wl12xx_vif {
 	struct wl1271 *wl;
 	struct list_head list;