diff mbox series

802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot

Message ID m34l02mh71.fsf@t19.piap.pl (mailing list archive)
State Superseded
Delegated to: Johannes Berg
Headers show
Series 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot | expand

Commit Message

Krzysztof Hałasa Oct. 21, 2019, 12:11 p.m. UTC
Johannes,

it seems I've encountered a bug in mac80211 RX aggregation handler.
The hw is a pair of stations using AR9580 (PCI ID 168c:0033) PCIe
adapters. Linux 5.4-rc4.
The driver shows the chip is Atheros AR9300 Rev:4.
I'm using (on both ends):
	iw wlan0 set type ibss
	ip link set wlan0 up
	iw dev wlan0 ibss join $ESSID $FREQ HT20

The problem manifests itself after one of the stations is restarted
(or the ath9k driver is reloaded, or a station is out of range for
some time etc).
It appears that the mac80211 RX aggregation code sets a new aggregation
"session" at the remote station's request, but the head_seq_num
(the sequence number the receiver expects to receive) isn't reset.
I've added some debugging code to ___ieee80211_start_rx_ba_session()
and ieee80211_sta_manage_reorder_buf() and it produced the following:

Both stations boot and join the IBSS, packets get through:
[   61.123131] AGG RX OK: ssn 1
[   61.125346] SEQ OK: 1 vs 1
[   61.125484] SEQ OK: 2 vs 2
[   62.100841] SEQ OK: 3 vs 3
...
[  180.124210] SEQ OK: 130 vs 130
[  181.123888] SEQ OK: 131 vs 131
[  182.126046] SEQ OK: 132 vs 132

Now I'm rebooting the remote station. It joins IBSS, packets can be seen
on mon0 monitoring interface (on the local station), but they aren't
arriving on wlan0:

[  192.131102] SEQ BAD: 0 vs 133
[  192.151243] AGG RX no change - OK: ssn 1
[  192.242760] SEQ BAD: 1 vs 133
[  193.133819] SEQ BAD: 2 vs 133
[  193.272802] SEQ BAD: 3 vs 133
...
[  421.272374] SEQ BAD: 130 vs 133
[  421.303630] SEQ BAD: 131 vs 133
[  422.327924] SEQ BAD: 132 vs 133

Then the sequence number catches up and the communication is
reestablished:
[  423.167023] SEQ OK: 133 vs 133
[  423.169061] SEQ OK: 134 vs 134
[  423.351618] SEQ OK: 135 vs 135

I'll attach a patch in a separate mail but I'm not sure if it's
the optimal fix - one packet (the "SEQ BAD: 0 vs 133) is still dropped,
and I guess it won't work if the sender decides to not request
aggregation anymore.

Comments?

The debugging code:

Comments

Nicolas Cavallari Dec. 9, 2019, 10:28 a.m. UTC | #1
I encountered the same issue in an IBSS-RSN network, where quick reboot
of a station would cause issues with aggregation because the kernel is
not aware of the reboot.

I figured out that since wpa_supplicant already detect reboots, the
simplest way to fix it would be for wpa_supplicant to reset the entire
state of the station in the kernel, instead of just resetting keys and
port.

This means extending NL80211_CMD_DEL_STATION to work in IBSS mode too.
My current wpa_supplicant patch tries this new API but fall back to the
old methods if an error is returned.

The changes required for drivers would be as follow:
- mac80211: will just work
- wil6210: does not seem to support IBSS mode
- ath6kl: need a patch
- brcmfmac: need a patch
- mwifiex: need a patch
- qtnfmac: no IBSS support
- staging/wilc1000: no IBSS support
- staging/rtl8723bs: already returns EINVAL in this case.

I made patches for ath6kl, brcmfmac and mwifiex that plainly reject
the request. If there is a better way, i'll glad to hear it.
Krzysztof Hałasa Dec. 11, 2019, 6:58 a.m. UTC | #2
Hi Nicolas,

Nicolas Cavallari <nicolas.cavallari@green-communications.fr> writes:

> I encountered the same issue in an IBSS-RSN network, where quick reboot
> of a station would cause issues with aggregation because the kernel is
> not aware of the reboot.
>
> I figured out that since wpa_supplicant already detect reboots, the
> simplest way to fix it would be for wpa_supplicant to reset the entire
> state of the station in the kernel, instead of just resetting keys and
> port.

Just to make sure everybody is aware that it would only be a partial fix
- unencrypted ad hoc mode (the simplest thing available) doesn't need
any userspace and thus must be fixed in the kernel. Alternatively this
functionality may be moved to wpa_supplicant and only then userspace fix
could really fix it completely.
Nicolas Cavallari Dec. 11, 2019, 10:31 a.m. UTC | #3
Hi Krzysztof,

On 11/12/2019 07:58, Krzysztof Hałasa wrote:
> Hi Nicolas,
> 
> Just to make sure everybody is aware that it would only be a partial fix
> - unencrypted ad hoc mode (the simplest thing available) doesn't need
> any userspace and thus must be fixed in the kernel. Alternatively this
> functionality may be moved to wpa_supplicant and only then userspace fix
> could really fix it completely.

A partial fix is better than no fix.

In any case, the IBSS-RSN and unencrypted IBSS cannot be fixed the same way.
Having the kernel automatically reset a station while wpa_supplicant may have
started a 4 way handshake with it had already lead to problems in the past.
Which is why the kernel no longer reset stations (52874a5e).

In any case, the unencrypted ibss case is probably much harder to fix, since
even in a linux-only world, there are probably fullmac drivers for hardware that
don't reply to open system auth in an open ibss. Anyway, this patch allows
userspace to fix it, using whatever policy the user chooses to detect it.
diff mbox series

Patch

--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -354,9 +354,10 @@  void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
 			 */
 			rcu_read_lock();
 			tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
-			if (tid_rx && tid_rx->timeout == timeout)
+			if (tid_rx && tid_rx->timeout == timeout) {
 				status = WLAN_STATUS_SUCCESS;
-			else
+				printk(KERN_DEBUG "AGG RX no change - OK: ssn %u\n", start_seq_num);
+			} else
 				status = WLAN_STATUS_REQUEST_DECLINED;
 			rcu_read_unlock();
 			goto end;
@@ -434,6 +437,7 @@  void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
 	tid_agg_rx->tid = tid;
 	tid_agg_rx->sta = sta;
 	status = WLAN_STATUS_SUCCESS;
+	printk(KERN_DEBUG "AGG RX OK: ssn %u\n", start_seq_num);
 
 	/* activate it for RX */
 	rcu_assign_pointer(sta->ampdu_mlme.tid_rx[tid], tid_agg_rx);
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1298,9 +1298,11 @@  static bool ieee80211_sta_manage_reorder_buf(struct ieee80211_sub_if_data *sdata
 
 	/* frame with out of date sequence number */
 	if (ieee80211_sn_less(mpdu_seq_num, head_seq_num)) {
+		printk(KERN_DEBUG "SEQ BAD: %u vs %u\n", mpdu_seq_num, head_seq_num);
 		dev_kfree_skb(skb);
 		goto out;
-	}
+	} else
+		printk(KERN_DEBUG "SEQ OK: %u vs %u\n", mpdu_seq_num, head_seq_num);
 
 	/*
 	 * If frame the sequence number exceeds our buffering window