Message ID | 20150223224305.GA30228@w1.fi (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
On Mon, Feb 23, 2015 at 2:43 PM, Jouni Malinen <j@w1.fi> wrote: > > This did not get exactly supportive response when this was proposed last > time (Sep 2013). Anyway, for a quick test, this can be done with the > following one-liner: fwiw, that one-liner seems to work fine for me. Which I guess is not a huge surprise. Side note: I've done the "turn off wifi and turn it back on" several times to test that patch, and it has worked every time. BUT I also see this odd behavior where the logs show that it tries to authenticate twice: the first time it does that "send auth to 20:9f .." thing three times (looks like 100ms apart), and nothing happens so it does "authentication with 20:9f .. timed out". Then it waits three seconds and tries again, and now it succeeds on the first try. The only downside of that seems to be that it takes an extra 3s to connect to the network - but it does now seem to *reliably* connect - so it's not a big problem, but I wonder why it should be that repeatable. Is there some difference between the first and the second time it tries to authenticate? Anyway, even if people don't like that particular patch, it does seem like *something* like that should be done. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Feb 23, 2015 at 03:00:39PM -0800, Linus Torvalds wrote: > On Mon, Feb 23, 2015 at 2:43 PM, Jouni Malinen <j@w1.fi> wrote: > > > > This did not get exactly supportive response when this was proposed last > > time (Sep 2013). Anyway, for a quick test, this can be done with the > > following one-liner: > > fwiw, that one-liner seems to work fine for me. Good to know. That seems to confirm that the issue is most likely in some of the higher HT rates not working with that AP for some reason at least for EAPOL frames immediately following the association. This should really have worked since I'm pretty sure minstrel will add a lower MCS index as a fallback rate, but maybe there are some interop issues or just close enough to being below the required signal strength (though, I'd find that somewhat unlikely unless there is something messed up with the antennas etc. taken into account your description on how close the devices are and what's between them). > Side note: I've done the "turn off wifi and turn it back on" several > times to test that patch, and it has worked every time. BUT I also see > this odd behavior where the logs show that it tries to authenticate > twice: the first time it does that "send auth to 20:9f .." thing three > times (looks like 100ms apart), and nothing happens so it does > "authentication with 20:9f .. timed out". Then it waits three seconds > and tries again, and now it succeeds on the first try. > > The only downside of that seems to be that it takes an extra 3s to > connect to the network - but it does now seem to *reliably* connect - > so it's not a big problem, but I wonder why it should be that > repeatable. Is there some difference between the first and the second > time it tries to authenticate? There have been some issues in this area in the past, but it is a bit difficult to say what could have caused it here without a wireless capture from the operating channel of the AP. It is possible that those Authentication frames were not actually transmitted due to some issue or they could have even been sent out on incorrect channel, etc. That shouldn't really happen that frequently (i.e., others should be seeing and reporting this as well..), but it's possible something gets messed up in channel configuration. > Anyway, even if people don't like that particular patch, it does seem > like *something* like that should be done. Agreed. Just need to figure out a suitable compromise somewhere in the middle of the minimum and close-to-maximum rates. As far as ath9k is concerned, it would actually support four different rates for retry series where minstrel uses only three, so it would be easy even for that driver on its own to add a lower rate at the end if we cannot find consensus on something more generic in mac80211. That said, I'd rather see minstrel getting more conservative for EAPOL frames.
Jouni Malinen wrote: > Even I think that this goes a bit too far especially on 2.4 GHz band, > but I would actually consider limiting EAPOL frames to using non-HT/VHT > (e.g., 6 Mbps OFDM at least for EAPOL-Key frames) to avoid some > interoperability issues. I would say that the current minstrel_ht > behavior is somewhat excessive for EAPOL-Key frames in the other > direction. Using MCS 14 with fallback to something like MCS 5 for the > second Data frame after an association can certainly fail. > > Number of Linux drivers do already limit EAPOL frame TX rate, so this > specific item is mainly applicable only to driver that use minstrel from > mac80211 (e.g., ath9k). Though, that IEEE80211_TX_CTL_USE_MINRATE would > likely affect most mac80211 drivers. We had this in the ath9k RC, where EAPOL frames were sent out at min-rate on the VO queue. I think instead of re-introducing it in the driver, having it in minstrel is cleaner. Sujith -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
(sorry to send HTML the first time... oops) Over the weekend I found a bug in minstrel-ht that might well be implicated here. The last retransmit rate is meant to be a 'get the packet there reliably' rate; minstrel-ht doesn't do that right, and can pick a fairly flaky rate instead. Can't generate a proper patch right now, so this diff might not apply cleanly, but the fix is simply to change 75 to 99 in the two places below: @@ -437,7 +448,7 @@ minstrel_ht_set_best_prob_rate(struct mi (max_tp_group != MINSTREL_CCK_GROUP)) return; - if (mrs->prob_ewma > MINSTREL_FRAC(75, 100)) { + if (mrs->prob_ewma > MINSTREL_FRAC(99, 100)) { cur_tp_avg = minstrel_ht_get_tp_avg(mi, cur_group, cur_idx); if (cur_tp_avg > tmp_tp_avg) mi->max_prob_rate = index; @@ -526,7 +537,7 @@ minstrel_ht_prob_rate_reduce_streams(str * Rules for rate selection: * - max_prob_rate must use only one stream, as a tradeoff between delivery * probability and throughput during strong fluctuations - * - as long as the max prob rate has a probability of more than 75%, pick + * - as long as the max prob rate has a probability of more than 99%, pick * higher throughput rates, even if the probablity is a bit lower */ static void On Tue, Feb 24, 2015 at 11:29 AM, Sujith Manoharan <sujith@msujith.org> wrote: > Jouni Malinen wrote: > > > Even I think that this goes a bit too far especially on 2.4 GHz band, > > > but I would actually consider limiting EAPOL frames to using non-HT/VHT > > > (e.g., 6 Mbps OFDM at least for EAPOL-Key frames) to avoid some > > > interoperability issues. I would say that the current minstrel_ht > > > behavior is somewhat excessive for EAPOL-Key frames in the other > > > direction. Using MCS 14 with fallback to something like MCS 5 for the > > > second Data frame after an association can certainly fail. > > > > > > Number of Linux drivers do already limit EAPOL frame TX rate, so this > > > specific item is mainly applicable only to driver that use minstrel from > > > mac80211 (e.g., ath9k). Though, that IEEE80211_TX_CTL_USE_MINRATE would > > > likely affect most mac80211 drivers. > > > We had this in the ath9k RC, where EAPOL frames were sent out at min-rate > > on the VO queue. I think instead of re-introducing it in the driver, > > having it in minstrel is cleaner. > > > Sujith > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-wireless" in > > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 24, 2015 at 01:29:27PM +1100, Andrew McGregor wrote: > Over the weekend I found a bug in minstrel-ht that might well be > implicated here. > > The last retransmit rate is meant to be a 'get the packet there > reliably' rate; minstrel-ht doesn't do that right, and can pick a > fairly flaky rate instead. > > Can't generate a proper patch right now, so this diff might not apply > cleanly, but the fix is simply to change 75 to 99 in the two places > below: While this may indeed be helpful, I don't think it is sufficient for this EAPOL frame related issue. What I would like to see is minstrel_ht using a basic rate (something non-HT) at the end of the retry series for EAPOL frames. The current behavior looks very suspicious to me. The early EAPOL frames after association are being used to probe for higher rates. This results in the total number of retry attempts actually getting smaller than any other frame, i.e., minstrel_ht seems to be using significantly _less_ robust choices for the EAPOL frames than following "normal" data frames! This should really be the other way around.. As an example, I'm seeing this on 5 GHz band (with the 75 to 99 change in place, but behavior was more or less identical without it): - the first EAPOL frame (msg 2/4) getting one attempt at MCS 3, 2 attempts at MCS 0, 2 attempts at MCS 0 (yes, identical to the previous one) with total maximum of 5 attempts - the second EAPOL frame (msg 4/4) getting one attempt at MCS 9, 2 attempts at MCS 0, 2 attempts at MCS 0 with total maximum of 5 attempts - another data frame after this: 5 attempts at MCS 9, 5 attempts at MCS 3, 5 attempts at MCS 3 with total maximum of 15 attempts(!!) This cannot be the best approach here.. For the IEEE80211_TX_CTRL_PORT_CTRL_PROTO cases, there are identified issues where failing to deliver the frame results is significant issues either in getting connected in the first place or getting disconnected if rekeying fails. I'm not sure how this would be implemented cleanly in minstrel_ht or whether that is even the best place (i.e., rate.c could do this instead), but I'd like that third attempt for control port cases to be dropped to use a (lowish) basic rate and non-MCS at that since there may be some interop issues with HT MCS early during association. Alternatively with drivers like ath9k that support 4 rate values, it would also be fine to add this basic rate attempt (or well, I'd have multiple, say 4, such attempts) as an additional 4th entry which does not currently seem to get used with minstrel at all. The "(lowish) basic rate" here could be defined as 6 Mbps OFDM for 5 GHz band and either that or maybe even 2 Mbps or 5.5 Mbps on 2.4 GHz (if included by the AP in basic rate set).
On Tue, Feb 24, 2015 at 2:26 AM, Jouni Malinen <j@w1.fi> wrote: > On Tue, Feb 24, 2015 at 01:29:27PM +1100, Andrew McGregor wrote: >> Over the weekend I found a bug in minstrel-ht that might well be >> implicated here. >> >> The last retransmit rate is meant to be a 'get the packet there >> reliably' rate; minstrel-ht doesn't do that right, and can pick a >> fairly flaky rate instead. >> >> Can't generate a proper patch right now, so this diff might not apply >> cleanly, but the fix is simply to change 75 to 99 in the two places >> below: > > While this may indeed be helpful, I don't think it is sufficient for > this EAPOL frame related issue. What I would like to see is minstrel_ht > using a basic rate (something non-HT) at the end of the retry series for > EAPOL frames. > > The current behavior looks very suspicious to me. The early EAPOL frames > after association are being used to probe for higher rates. This results > in the total number of retry attempts actually getting smaller than any > other frame, i.e., minstrel_ht seems to be using significantly _less_ > robust choices for the EAPOL frames than following "normal" data frames! > This should really be the other way around.. > > As an example, I'm seeing this on 5 GHz band (with the 75 to 99 change > in place, but behavior was more or less identical without it): > - the first EAPOL frame (msg 2/4) getting one attempt at MCS 3, 2 > attempts at MCS 0, 2 attempts at MCS 0 (yes, identical to the previous > one) with total maximum of 5 attempts > - the second EAPOL frame (msg 4/4) getting one attempt at MCS 9, 2 > attempts at MCS 0, 2 attempts at MCS 0 with total maximum of 5 > attempts > - another data frame after this: 5 attempts at MCS 9, 5 attempts at MCS > 3, 5 attempts at MCS 3 with total maximum of 15 attempts(!!) I would in general prefer that the excessive retries in the present driver layers in wifi be dramatically reduced, the packet dropped and the problem punted to higher layers. > This cannot be the best approach here.. Falling back faster to the lowest possible rate with minimum retries, and then giving up sooner would be better. 15 attempts? jeeze.... > For the > IEEE80211_TX_CTRL_PORT_CTRL_PROTO cases, there are identified issues > where failing to deliver the frame results is significant issues either > in getting connected in the first place or getting disconnected if > rekeying fails. > > I'm not sure how this would be implemented cleanly in minstrel_ht or > whether that is even the best place (i.e., rate.c could do this > instead), but I'd like that third attempt for control port cases to be > dropped to use a (lowish) basic rate and non-MCS at that since there may > be some interop issues with HT MCS early during association. > Alternatively with drivers like ath9k that support 4 rate values, it > would also be fine to add this basic rate attempt (or well, I'd have > multiple, say 4, such attempts) as an additional 4th entry which does > not currently seem to get used with minstrel at all. > > The "(lowish) basic rate" here could be defined as 6 Mbps OFDM for 5 GHz > band and either that or maybe even 2 Mbps or 5.5 Mbps on 2.4 GHz (if > included by the AP in basic rate set). > > -- > Jouni Malinen PGP id EFC895FA > _______________________________________________ > ath9k-devel mailing list > ath9k-devel@lists.ath9k.org > https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Hi Jouni, Currently Minstrel_HT just skips EAPOL packets for its rate sampling on non-mrr chips by testing: (info->control.flags & IEEE80211_TX_CTRL_PORT_CTRL_PROTO) On mrr hardware it uses them for probing. But the general MRR-chain should look like this for ath5k and ath9k chips that support 4 mrr chains: mrr[0]:= max_tp_rate[0] mrr[1]:= max_tp_rate[1] mrr[2]:= max_prob_rate mrr[3]:= basic_rate So for Minstrel Sampling Packets as well as for data packets, the 4th mrr stage should use the slowest rate in case all other 3 mrr stages failed with their retry attempts. I do see two possible options for control frames like EAPOL to be send out in a more robust fashion: - exclude those frames from AMPDU aggragates - change their mrr setup to be more conservative What do you think about this ? Greetings Thomas > Am 24.02.2015 um 11:26 schrieb Jouni Malinen <j@w1.fi>: > > On Tue, Feb 24, 2015 at 01:29:27PM +1100, Andrew McGregor wrote: >> Over the weekend I found a bug in minstrel-ht that might well be >> implicated here. >> >> The last retransmit rate is meant to be a 'get the packet there >> reliably' rate; minstrel-ht doesn't do that right, and can pick a >> fairly flaky rate instead. >> >> Can't generate a proper patch right now, so this diff might not apply >> cleanly, but the fix is simply to change 75 to 99 in the two places >> below: > > While this may indeed be helpful, I don't think it is sufficient for > this EAPOL frame related issue. What I would like to see is minstrel_ht > using a basic rate (something non-HT) at the end of the retry series for > EAPOL frames. > > The current behavior looks very suspicious to me. The early EAPOL frames > after association are being used to probe for higher rates. This results > in the total number of retry attempts actually getting smaller than any > other frame, i.e., minstrel_ht seems to be using significantly _less_ > robust choices for the EAPOL frames than following "normal" data frames! > This should really be the other way around.. > > As an example, I'm seeing this on 5 GHz band (with the 75 to 99 change > in place, but behavior was more or less identical without it): > - the first EAPOL frame (msg 2/4) getting one attempt at MCS 3, 2 > attempts at MCS 0, 2 attempts at MCS 0 (yes, identical to the previous > one) with total maximum of 5 attempts > - the second EAPOL frame (msg 4/4) getting one attempt at MCS 9, 2 > attempts at MCS 0, 2 attempts at MCS 0 with total maximum of 5 > attempts > - another data frame after this: 5 attempts at MCS 9, 5 attempts at MCS > 3, 5 attempts at MCS 3 with total maximum of 15 attempts(!!) > > This cannot be the best approach here.. For the > IEEE80211_TX_CTRL_PORT_CTRL_PROTO cases, there are identified issues > where failing to deliver the frame results is significant issues either > in getting connected in the first place or getting disconnected if > rekeying fails. > > I'm not sure how this would be implemented cleanly in minstrel_ht or > whether that is even the best place (i.e., rate.c could do this > instead), but I'd like that third attempt for control port cases to be > dropped to use a (lowish) basic rate and non-MCS at that since there may > be some interop issues with HT MCS early during association. > Alternatively with drivers like ath9k that support 4 rate values, it > would also be fine to add this basic rate attempt (or well, I'd have > multiple, say 4, such attempts) as an additional 4th entry which does > not currently seem to get used with minstrel at all. > > The "(lowish) basic rate" here could be defined as 6 Mbps OFDM for 5 GHz > band and either that or maybe even 2 Mbps or 5.5 Mbps on 2.4 GHz (if > included by the AP in basic rate set). > > -- > Jouni Malinen PGP id EFC895FA > _______________________________________________ > ath9k-devel mailing list > ath9k-devel@lists.ath9k.org > https://lists.ath9k.org/mailman/listinfo/ath9k-devel -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 24, 2015 at 06:54:47PM +0100, Thomas Hühn wrote: > Currently Minstrel_HT just skips EAPOL packets for its rate sampling on non-mrr chips by testing: (info->control.flags & IEEE80211_TX_CTRL_PORT_CTRL_PROTO) Yeah, I noticed that when going through the implementation, but it was indeed only for cases other than ath9k-like drivers. > On mrr hardware it uses them for probing. > But the general MRR-chain should look like this for ath5k and ath9k chips that support 4 mrr chains: > > mrr[0]:= max_tp_rate[0] > mrr[1]:= max_tp_rate[1] > mrr[2]:= max_prob_rate > mrr[3]:= basic_rate Where is that mrr[3] part implemented? I did not find it when reviewing the design (hw->max_rates >= 3 is used, but not >= 4) and this does not match my experiments either when printing out all four values from ath9k. In every single case I observed, the last entry was unused (idx = -1) and only MCS values were used (i.e., not even a single case of basic rate visible; basic rates being 6, 12, 24 Mbps OFDM in this specific case with the AP that I used in the tests). > So for Minstrel Sampling Packets as well as for data packets, the 4th mrr stage should use the slowest rate in case all other 3 mrr stages failed with their retry attempts. > > I do see two possible options for control frames like EAPOL to be send out in a more robust fashion: > - exclude those frames from AMPDU aggragates ath9k does that for IEEE80211_TX_CTL_RATE_CTRL_PROBE which seemed to get set for the initial EAPOL frames. I guess this could be done more generically for all EAPOL frames. > - change their mrr setup to be more conservative That mrr[3]:= basic_rate is the part I was really asking for as far as EAPOL frames are concerned.
Hi Jouni, > Where is that mrr[3] part implemented? I did not find it when reviewing > the design (hw->max_rates >= 3 is used, but not >= 4) and this does not > match my experiments either when printing out all four values from > ath9k. In every single case I observed, the last entry was unused (idx = > -1) and only MCS values were used (i.e., not even a single case of basic > rate visible; basic rates being 6, 12, 24 Mbps OFDM in this specific > case with the AP that I used in the tests). Minstrel_HT does only set mrr[0..2] and does not touch the fourth mrr[3], assumed chips with 4 mrr stages. In function minstrel_ht_update_rates() the rate table struct ieee80211_sta_rates is set for the first 3 out of 4 rates and the fouth rate (mrr[3]) is „-1“. In case of ath9k, the driver in xmit.c allocates in function ath_tx_fill_desc() a struct ath_tx_info info by memset(&info, 0, sizeof(info)) .. so all info->rates[xy].Rate == 0. Than function ath_buf_set_rate() sets all info->rates[xy].Rate to those values specified in the rate table (struct ieee8021_sta_rate) IF (!rates[i].count || (rates[i].idx < 0)) … … so info->rates[3].Rate is untouched and still „0“. I just added a printk() to xmit.c in function ath_tx_fill_desc() just before ath9k_hw_set_txdesc() is called. From the log I get, e.g.: [ 169.543554] mrr setup: mrr[0]:133 mrr[1]:132 mrr[2]:134 mrr[3]:0 I have not check yet if info->rates[3].Rate == 0 is the lowest possible rate… but I would guess so. Greetings Thomas -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, I thought about doing this for rate probing with FreeBSD's sample rate algorithm, but after actually having to use the damned thing in noisy environments I realised that it just wasn't worth the effort to optimise rate control selection whilst doing EAPOL frames. If we did more useful software retransmission at lower rates then sure, one could just be much more conservative about the rates and retry a few times at a lower rate. However, since that isn't the case and we only get three or so attempts at EAPOL exchange - once a second - before an AP decides we're no good, I figured I'd just skip anything other than the lowest basic rates for the frame exchange and make it associate quicker. I had the same problem with DHCP - initial DHCP leases were failing because the rate control code started high/medium and didn't degrade quickly enough before the next round of DHCP attempts. DHCP backs off even more aggressively, so failing that initial DHCP was quite expensive. -adrian -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2015-02-25 07:14, Jouni Malinen wrote: > On Tue, Feb 24, 2015 at 06:54:47PM +0100, Thomas Hühn wrote: >> Currently Minstrel_HT just skips EAPOL packets for its rate sampling on non-mrr chips by testing: (info->control.flags & IEEE80211_TX_CTRL_PORT_CTRL_PROTO) > > Yeah, I noticed that when going through the implementation, but it was > indeed only for cases other than ath9k-like drivers. > >> On mrr hardware it uses them for probing. >> But the general MRR-chain should look like this for ath5k and ath9k chips that support 4 mrr chains: >> >> mrr[0]:= max_tp_rate[0] >> mrr[1]:= max_tp_rate[1] >> mrr[2]:= max_prob_rate >> mrr[3]:= basic_rate > > Where is that mrr[3] part implemented? I did not find it when reviewing > the design (hw->max_rates >= 3 is used, but not >= 4) and this does not > match my experiments either when printing out all four values from > ath9k. In every single case I observed, the last entry was unused (idx = > -1) and only MCS values were used (i.e., not even a single case of basic > rate visible; basic rates being 6, 12, 24 Mbps OFDM in this specific > case with the AP that I used in the tests). Minstrel_ht does *NOT* use mrr[3], nor should it. For normal data packets, a little packet loss under tough conditions is good, otherwise we risk lots of wasted airtime and bufferbloat. >> So for Minstrel Sampling Packets as well as for data packets, the 4th mrr stage should use the slowest rate in case all other 3 mrr stages failed with their retry attempts. >> >> I do see two possible options for control frames like EAPOL to be send out in a more robust fashion: >> - exclude those frames from AMPDU aggragates > > ath9k does that for IEEE80211_TX_CTL_RATE_CTRL_PROBE which seemed to > get set for the initial EAPOL frames. I guess this could be done more > generically for all EAPOL frames. I agree. >> - change their mrr setup to be more conservative > > That mrr[3]:= basic_rate is the part I was really asking for as far as > EAPOL frames are concerned. I don't think we need that. If we just exclude EAPOL from both probing and aggregation, it should be safe. While it's connecting, that leaves in low rates in the retry chain anyway. If it still fails often enough to be noticeable under normal conditions, there must be something seriously wrong outside of rate control, and we should not paper over it with a crude band-aid workaround. - Felix -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 25, 2015 at 06:00:08PM +1300, Felix Fietkau wrote: > Minstrel_ht does *NOT* use mrr[3], nor should it. For normal data > packets, a little packet loss under tough conditions is good, otherwise > we risk lots of wasted airtime and bufferbloat. I agree for normal data packets, but EAPOL frames are not really normal data packet even though they happen to be transmitted as Data frames. EAPOL frames are rare enough to not wast much airtime (and recovery from an issue would anyway use way more airtime) and bufferbloat is irrelevant for EAPOL frames. For EAPOL frames, little packet loss is not good. Especially for EAPOL-Key msg 4/4, the only recovery option in many cases is to reassociate with the AP and start from scratch. > > That mrr[3]:= basic_rate is the part I was really asking for as far as > > EAPOL frames are concerned. > I don't think we need that. If we just exclude EAPOL from both probing > and aggregation, it should be safe. While it's connecting, that leaves > in low rates in the retry chain anyway. Not low enough IMHO. EAPOL is a special case and needs to be addressed as such. It is special for at least two reasons: being very early in the association (well, the very _first_ Data frames using rate control) and being critical for maintaining the connection (AP will disconnect if it does not reply response). What happens now is way too optimistic: - one try at MCS 2 followed by four tries at MCS 0 for EAPOL-Key msg 2/4 - one try at MCS 9 (or so) followed by four tries at MCS 0 for EAPOL-Key msg 4/4 (this being the most critical frame in the connection sequence due to not having a good recovery mechanism) Dropping probing from these would allow one more attempt at the first rate and I guess it would also drop the first rate to somewhat lower. I'm fine with using these MCS rates as the first option, but I do think that we have to add one more rate to the end (or change the 3rd rate if that is easier for implementation) to be non-MCS and I think one of the basic rates (say, 6 Mbps on 5 GHz and maybe 2 or 5.5 Mbps on 2.4 GHz) with number of tries (say, 4). There have been way too many cases reported where "strange issues" with 4-way exchange (those EAPOL-Key frames) result in connection failing. While not all of these can be explained with the TX rate, I'm pretty sure large portion of these issues are indeed caused by too optimistic TX rate selection. Such policies may be acceptable for other Data frames, but not for EAPOL. > If it still fails often enough to be noticeable under normal conditions, > there must be something seriously wrong outside of rate control, and we > should not paper over it with a crude band-aid workaround. There may be something else wrong (say, some kind of interference), but there is no way we can assume normal users to be able to fix such issues. If we make EAPOL frames go through more robustly, the connection can be established in more cases and this can result in relatively functional network connection and rate control can handle the less critical data frames through whatever means to get optimal throughput from the network. As such, I do think we do need to "paper over" this for EAPOL frames.
On Tue, Feb 24, 2015 at 11:38:02PM +0100, Thomas Hühn wrote: > Minstrel_HT does only set mrr[0..2] and does not touch the fourth mrr[3], assumed chips with 4 mrr stages. > In function minstrel_ht_update_rates() the rate table struct ieee80211_sta_rates is set for the first 3 out of 4 rates and the fouth rate (mrr[3]) is „-1“. Agreed. > In case of ath9k, the driver in xmit.c allocates in function ath_tx_fill_desc() a struct ath_tx_info info by memset(&info, 0, sizeof(info)) .. so all info->rates[xy].Rate == 0. > Than function ath_buf_set_rate() sets all info->rates[xy].Rate to those values specified in the rate table (struct ieee8021_sta_rate) IF (!rates[i].count || (rates[i].idx < 0)) … > … so info->rates[3].Rate is untouched and still „0“. Sure, it can be 0 and so is the number of tries at that rate.. > I just added a printk() to xmit.c in function ath_tx_fill_desc() just before ath9k_hw_set_txdesc() is called. > From the log I get, e.g.: > > [ 169.543554] mrr setup: mrr[0]:133 mrr[1]:132 mrr[2]:134 mrr[3]:0 > > I have not check yet if info->rates[3].Rate == 0 is the lowest possible rate… but I would guess so. It is not and even if it were, it does not matter since this 4th item is used for _0_ tries. I've verified the exact behavior with a sniffer for a case where the target device does not ACK frames. ath9k ends up sending at exactly the three different rates indicated in the first three values and nothing else. With RC probing (which happens to occur for the initial EAPOL frames, this results in only one attempt at MCS(>0) and two + two attempts at MCS0. No non-MCS rates are tried. As pointed out previously, this is likely fine for normal data frames, but not for EAPOL.
On Wed, Feb 25, 2015 at 6:47 AM, Jouni Malinen <j@w1.fi> wrote: > > There may be something else wrong (say, some kind of interference), but > there is no way we can assume normal users to be able to fix such > issues. If we make EAPOL frames go through more robustly, the connection > can be established in more cases and this can result in relatively > functional network connection and rate control can handle the less > critical data frames through whatever means to get optimal throughput > from the network. As such, I do think we do need to "paper over" this > for EAPOL frames. While I realize that people may disagree about the exact details of how to fix this in the long run, may I suggest that in the meantime we at least get the two workaround patches applied? I'm talking about the two from Jouni - the "don't encrypt EAPOL frames" one, and the one-liner that makes all EAPOL frames go at the lowest data rate. Even if "lowest data rate" is ridiculously low, and even if that might disturb other things going on on the same channel at the same time, those authentication packets shouldn't be so common as to be a problem. No? Jouni has a few packet dumps for me, and he's stumped as to what exactly is going on, but those two patches (well, the one-liner "low data rate EAPOL" in particular, it seems) do seem to make my connections go through reliably. And it seems that other drivers already are working around the EAPOL issue in similar ways, judging by the comments about iwlwifi. Last time I had connection issues with this laptop, nothing ended up happening in the end, and I had people pipe up saying they had had similar problems. I'd hate for the same "nothing" to happen this time just because people aren't 100% sure what the final right thing is to do. So I'd really like people to apply the simple workarounds for now because clearly something is badly wrong, and *if* there is some better resolution later, that's fine. I'll happily test patches. It seems to be pretty repeatable for me, even if that "pretty repeatable" seems to be very much about the laptop being in one very particular place (it's right next to another AP, there's random other electronics around, since it's on my messy desk etc). So I wouldn't be at all surprised by horribly interference. And the AP is supposed to be ceiling- or wall-mounted, but because I'm just testing things out it's just sitting on a table in the next room, so for all I know it's in the *exact* worst position for the antennas etc etc. So I'm sure I can improve reception of my laptop, but that's not the point. The point is that bad wireless networks aren't so unusual, and right now things clearly don't work as well as they could. Does anybody hate Jouni's two patches *so* much that they can articulate *why* it would be wrong to apply them as interim patches? And if so, do you have better patches for me to try? Because if not.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Linus Torvalds wrote: > Last time I had connection issues with this laptop, nothing ended up > happening in the end, and I had people pipe up saying they had had > similar problems. I'd hate for the same "nothing" to happen this time > just because people aren't 100% sure what the final right thing is to > do. So I'd really like people to apply the simple workarounds for now > because clearly something is badly wrong, and *if* there is some > better resolution later, that's fine. .. > So I'm sure I can improve reception of my laptop, but that's not the > point. The point is that bad wireless networks aren't so unusual, and > right now things clearly don't work as well as they could. Those two patches (the one-liner in particular) should have gone in already 5-10 years ago. This has been an embarrassing problem for many years. I'm glad that people jumped this time around. //Peter -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 25 February 2015 at 10:14, Linus Torvalds <torvalds@linux-foundation.org> wrote: > While I realize that people may disagree about the exact details of > how to fix this in the long run, may I suggest that in the meantime we > at least get the two workaround patches applied? > > I'm talking about the two from Jouni - the "don't encrypt EAPOL > frames" one, and the one-liner that makes all EAPOL frames go at the > lowest data rate. > > Even if "lowest data rate" is ridiculously low, and even if that might > disturb other things going on on the same channel at the same time, > those authentication packets shouldn't be so common as to be a > problem. No? > > Jouni has a few packet dumps for me, and he's stumped as to what > exactly is going on, but those two patches (well, the one-liner "low > data rate EAPOL" in particular, it seems) do seem to make my > connections go through reliably. > > And it seems that other drivers already are working around the EAPOL > issue in similar ways, judging by the comments about iwlwifi. [snip] > So I'm sure I can improve reception of my laptop, but that's not the > point. The point is that bad wireless networks aren't so unusual, and > right now things clearly don't work as well as they could. > > Does anybody hate Jouni's two patches *so* much that they can > articulate *why* it would be wrong to apply them as interim patches? > And if so, do you have better patches for me to try? Because if not.. I agree with you. I think you should just have EAPOL frames go out at the lowest rate for now and then worry about experimenting with more interesting ways to make EAPOL / DHCP frames cheaper. It fixes a lot of problems in noisy areas. That hack was hiding around in various commercial drivers I've seen, and it's been in FreeBSD for a while. Same with DHCP traffic too - it's the second set of data frames that the rate control code sees, and it's the primary reason I dropped the initial sample rate down in FreeBSD so the DHCP exchange would have a better chance of succeeding after association. -adrian -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Jouni, >> In case of ath9k, the driver in xmit.c allocates in function ath_tx_fill_desc() a struct ath_tx_info info by memset(&info, 0, sizeof(info)) .. so all info->rates[xy].Rate == 0. >> Than function ath_buf_set_rate() sets all info->rates[xy].Rate to those values specified in the rate table (struct ieee8021_sta_rate) IF (!rates[i].count || (rates[i].idx < 0)) … >> … so info->rates[3].Rate is untouched and still „0“. > > Sure, it can be 0 and so is the number of tries at that rate.. You are absolutely right … as the tries are 0 as well, the 4th mrr[3] is not used. I missed that obvious point. >> I just added a printk() to xmit.c in function ath_tx_fill_desc() just before ath9k_hw_set_txdesc() is called. >> From the log I get, e.g.: >> >> [ 169.543554] mrr setup: mrr[0]:133 mrr[1]:132 mrr[2]:134 mrr[3]:0 >> >> I have not check yet if info->rates[3].Rate == 0 is the lowest possible rate… but I would guess so. > > It is not and even if it were, it does not matter since this 4th item is > used for _0_ tries. I've verified the exact behavior with a sniffer for > a case where the target device does not ACK frames. ath9k ends up > sending at exactly the three different rates indicated in the first > three values and nothing else. With RC probing (which happens to occur > for the initial EAPOL frames, this results in only one attempt at > MCS(>0) and two + two attempts at MCS0. No non-MCS rates are tried. As > pointed out previously, this is likely fine for normal data frames, but > not for EAPOL. I do agree in changing the way to assign a (mrr) rate set for EAPOL frames in the most robust fashion. Greetings Thomas > > -- > Jouni Malinen PGP id EFC895FA > _______________________________________________ > ath9k-devel mailing list > ath9k-devel@lists.ath9k.org > https://lists.ath9k.org/mailman/listinfo/ath9k-devel -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Feb 26, 2015 at 7:22 AM, Adrian Chadd <adrian@freebsd.org> wrote: > On 25 February 2015 at 10:14, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > >> While I realize that people may disagree about the exact details of >> how to fix this in the long run, may I suggest that in the meantime we >> at least get the two workaround patches applied? >> >> I'm talking about the two from Jouni - the "don't encrypt EAPOL >> frames" one, and the one-liner that makes all EAPOL frames go at the >> lowest data rate. >> >> Even if "lowest data rate" is ridiculously low, and even if that might >> disturb other things going on on the same channel at the same time, >> those authentication packets shouldn't be so common as to be a >> problem. No? >> >> Jouni has a few packet dumps for me, and he's stumped as to what >> exactly is going on, but those two patches (well, the one-liner "low >> data rate EAPOL" in particular, it seems) do seem to make my >> connections go through reliably. >> >> And it seems that other drivers already are working around the EAPOL >> issue in similar ways, judging by the comments about iwlwifi. > > [snip] > >> So I'm sure I can improve reception of my laptop, but that's not the >> point. The point is that bad wireless networks aren't so unusual, and >> right now things clearly don't work as well as they could. >> >> Does anybody hate Jouni's two patches *so* much that they can >> articulate *why* it would be wrong to apply them as interim patches? >> And if so, do you have better patches for me to try? Because if not.. > > I agree with you. I think you should just have EAPOL frames go out at > the lowest rate for now and then worry about experimenting with more > interesting ways to make EAPOL / DHCP frames cheaper. It fixes a lot > of problems in noisy areas. That hack was hiding around in various > commercial drivers I've seen, and it's been in FreeBSD for a while. > > Same with DHCP traffic too - it's the second set of data frames that > the rate control code sees, and it's the primary reason I dropped the > initial sample rate down in FreeBSD so the DHCP exchange would have a > better chance of succeeding after association. > > > > -adrian +1 Really, who cares about efficiency here; these are rare control packets. -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 25, 2015 at 10:14 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > I'm talking about the two from Jouni - the "don't encrypt EAPOL > frames" one, and the one-liner that makes all EAPOL frames go at the > lowest data rate. So I just found out and confirmed that this is not Atheros-specific in any way - it looks like it's simply the UniFi AP that does not like high-data-rate authentification frames at all. Because it looks like the brcmsmac driver has *exactly* the same behavior with this AP (in an Apple Macbook air). I assume brcmsmac uses the net/80211/tx.c logic too. And Jouni's one-liner fixes that one too, although as usual, maybe there is some testing noise, and I screwed something up. This time I only did the one-liner, so that's the critical one. It's interesting to note how nothing else has been unhappy with that network (admittedly it's been mainly android devices and a HP printer that I've tested), so it looks like everybody else does low-rate authentication packets anyway. So this actually looks like a Ubiquiti UniFi AP bug to me, but it also looks like presumably everybody else does low-rate initial packets, and our kernel 802.11 layer should just follow suit. The whole robustness principle and "be conservative in what you send, and liberal in what you accept" etc. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 02/26/15 06:55, Linus Torvalds wrote: > Because it looks like the brcmsmac driver has*exactly* the same > behavior with this AP (in an Apple Macbook air). I assume brcmsmac > uses the net/80211/tx.c logic too. That makes sense as brcmsmac is a mac80211 driver that uses the minstrel-ht rate control algorithm. iwlwifi is also a mac80211 driver but it uses a different rate control algorithm so if you have that lying around it would be interesting to test that (if you did not already do that). Regards, Arend -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 25, 2015 at 10:14:45AM -0800, Linus Torvalds wrote: > While I realize that people may disagree about the exact details of > how to fix this in the long run, may I suggest that in the meantime we > at least get the two workaround patches applied? > Does anybody hate Jouni's two patches *so* much that they can > articulate *why* it would be wrong to apply them as interim patches? > And if so, do you have better patches for me to try? Because if not.. Of all people, I do actually have some hatred on the one-liner to force minimum rate for all EAPOL TX attempts. That is punishing the vast majority of cases where the AP is perfectly fine with higher MCS rates being used (and MCS 0 being sufficient fallback option) for EAPOL. Being able to use higher TX rates as the initial attempt is a nice feature and even though this may be limited to number of upstream Linux drivers today, that part of the feature is an improvement, IMHO. This can even be more robust in some environments especially when going through long EAP exchange with certain types of interference. That said, it is clear to me that we need to modify the current behavior to be more conservative with some of the EAPOL frame retries. I'm not familiar enough with the current minstrel implementation to prepare a clean patch doing the approach I think that should be used here (replace the last rate with a low basic rate and add couple of tries with it). I have reason to believe that Felix might be able to look at this in couple of weeks, though. In other words, I think I'll send out a more formal version of the patches so that they can be applied now if desired while keeping in mind that we may want to replace the minrate-for-EAPOL with something else later. As far as the EAPOL-Key msg 4/4 retries without encryption patch is concerned, I think I have enough concern on the couple of corner cases that I know it does not address (and may break), so it may take some more time before I'm ready to suggest it (or a bit modified version of it) to be applied, though.
Jouni Malinen wrote: > punishing the vast majority of cases where the AP is perfectly fine > with higher MCS rates First make it work (everywhere), then make it fast (where possible). //Peter -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 26 February 2015 at 02:20, Jouni Malinen <j@w1.fi> wrote: > On Wed, Feb 25, 2015 at 10:14:45AM -0800, Linus Torvalds wrote: >> While I realize that people may disagree about the exact details of >> how to fix this in the long run, may I suggest that in the meantime we >> at least get the two workaround patches applied? > >> Does anybody hate Jouni's two patches *so* much that they can >> articulate *why* it would be wrong to apply them as interim patches? >> And if so, do you have better patches for me to try? Because if not.. > > Of all people, I do actually have some hatred on the one-liner to force > minimum rate for all EAPOL TX attempts. That is punishing the vast > majority of cases where the AP is perfectly fine with higher MCS rates > being used (and MCS 0 being sufficient fallback option) for EAPOL. Being > able to use higher TX rates as the initial attempt is a nice feature and > even though this may be limited to number of upstream Linux drivers > today, that part of the feature is an improvement, IMHO. This can even > be more robust in some environments especially when going through long > EAP exchange with certain types of interference. I remember running the math - well, the "airtime" math, and realised it was almost always cheaper to do the single, non-aggregated EAPOL frame exchange at the lowest rate than to try higher rates and fall back to the lowest rates. -adrian -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index c314c59..b45038f 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -566,6 +566,7 @@ ieee80211_tx_h_check_control_port_protocol(struct ieee80211_tx_data *tx) if (tx->sdata->control_port_no_encrypt) info->flags |= IEEE80211_TX_INTFL_DONT_ENCRYPT; info->control.flags |= IEEE80211_TX_CTRL_PORT_CTRL_PROTO; + info->flags |= IEEE80211_TX_CTL_USE_MINRATE; } return TX_CONTINUE;