diff mbox

AR9462 problems connecting again..

Message ID 20150223171700.GA29730@w1.fi (mailing list archive)
State Not Applicable
Headers show

Commit Message

Jouni Malinen Feb. 23, 2015, 5:17 p.m. UTC
On Sun, Feb 22, 2015 at 05:41:25PM -0800, Linus Torvalds wrote:
> Nope, everything else I have seems to be intel wireless. I think one
> of the kids machines is a Mac Mini with an ath5k thing, but I'm hoping
> the wpa_supplicant.log is sufficient to give somebody an idea.

It looks like there are two issues here: 1) EAPOL-Key message 4/4 (i.e.,
the second Data frame sent by the station during association) is somehow
not seen or accepted by the AP, 2) recovery from that msg 4/4 getting
lost does not work in the intended way.

For (1), one would likely need to see a wireless capture from a separate
WLAN radio to say something certain about what exactly happened.
ath5k-compatible radios would not be sufficient since this would need to
be able to see HT frames which ath9k is mostly like using here. I
haven't used iwlwifi as a sniffer, so I do not know whether that would
be a workable option for this.

In my tests, I can see the rate control algorithm (minstrel_ht) using a
pretty high rate (even MCS14 with 2-stream device, which is one short of
maximum) which is quite a bit higher than I would myself have selected
for an EAPOL frame (especially for EAPOL-Key 4/4 which has these lovely
issues with retransmissions) more or less immediately after association.
Anyway, that frame is supposed to get additional fall-back TX rates for
link layer retransmissions and those should make it much more likely for
this to be received by the AP. Sniffer trace would confirm that.


For (2), wpa_supplicant debug log gives a pretty clear idea on what is
happening and based on that, I can easily reproduce this part. In fact,
I now have a fully automated test script for verifying this with
mac80211_hwsim.

Some alternative means of improving this was discussed in this thread.
I'm not completely happy with this, but the following mac80211 changes
do fix this retransmission case and will likely make the issue you are
seeing disappear since it allows any of the four EAPOL-Key msg 4/4
transmissions to be received by the AP to avoid the disconnection. This
doesn't fix the initial trigger behind the issue, but with those EAPOL
retransmissions working, the likelihood of all four attempts failing
(with each getting multiple link-layer retransmissions) is quite small.


mac80211: Do not encrypt EAPOL frames before peer has used the key

The 4-way handshake design is a bit inconvenient for the case where msg
3/4 needs to be transmitted (e.g., AP not receiving first transmission
of 4/4). The supplicant side has already configured the pairwise key at
that point in time and while we allow unencrypted msg 3/4 to be
received, we were sending out 4/4 encrypted which will result in it
getting dropped. User space would be aware of when the EAPOL frame
should not be encrypted, but we do not have convenient means of telling
mac80211 that. For now, use a mac80211-specific hack to avoid EAPOL
frame encryption to allow retransmission of 4-way handshake messages 3/4
and 4/4 to work in a way that the authenticator side can process 4/4.

---
 net/mac80211/key.h |  2 ++
 net/mac80211/rx.c  | 11 +++++++++++
 net/mac80211/tx.c  | 13 +++++++++++++
 3 files changed, 26 insertions(+)

Comments

Emmanuel Grumbach Feb. 23, 2015, 6 p.m. UTC | #1
On Mon, Feb 23, 2015 at 7:17 PM, Jouni Malinen <j@w1.fi> wrote:
> On Sun, Feb 22, 2015 at 05:41:25PM -0800, Linus Torvalds wrote:
>> Nope, everything else I have seems to be intel wireless. I think one
>> of the kids machines is a Mac Mini with an ath5k thing, but I'm hoping
>> the wpa_supplicant.log is sufficient to give somebody an idea.
>
> It looks like there are two issues here: 1) EAPOL-Key message 4/4 (i.e.,
> the second Data frame sent by the station during association) is somehow
> not seen or accepted by the AP, 2) recovery from that msg 4/4 getting
> lost does not work in the intended way.
>
> For (1), one would likely need to see a wireless capture from a separate
> WLAN radio to say something certain about what exactly happened.
> ath5k-compatible radios would not be sufficient since this would need to
> be able to see HT frames which ath9k is mostly like using here. I
> haven't used iwlwifi as a sniffer, so I do not know whether that would
> be a workable option for this.

iwlwifi as a sniffer is good enough for that.

>
> In my tests, I can see the rate control algorithm (minstrel_ht) using a
> pretty high rate (even MCS14 with 2-stream device, which is one short of
> maximum) which is quite a bit higher than I would myself have selected
> for an EAPOL frame (especially for EAPOL-Key 4/4 which has these lovely
> issues with retransmissions) more or less immediately after association.
> Anyway, that frame is supposed to get additional fall-back TX rates for
> link layer retransmissions and those should make it much more likely for
> this to be received by the AP. Sniffer trace would confirm that.
>

IIRC we had lots of issues with that and iwlwifi's rate control doesn't jump to
high rates immediately.

>
> For (2), wpa_supplicant debug log gives a pretty clear idea on what is
> happening and based on that, I can easily reproduce this part. In fact,
> I now have a fully automated test script for verifying this with
> mac80211_hwsim.
>
> Some alternative means of improving this was discussed in this thread.
> I'm not completely happy with this, but the following mac80211 changes
> do fix this retransmission case and will likely make the issue you are
> seeing disappear since it allows any of the four EAPOL-Key msg 4/4
> transmissions to be received by the AP to avoid the disconnection. This
> doesn't fix the initial trigger behind the issue, but with those EAPOL
> retransmissions working, the likelihood of all four attempts failing
> (with each getting multiple link-layer retransmissions) is quite small.
>
>
> mac80211: Do not encrypt EAPOL frames before peer has used the key
>
> The 4-way handshake design is a bit inconvenient for the case where msg
> 3/4 needs to be transmitted (e.g., AP not receiving first transmission

needs to *re*transmitted I guess.

> of 4/4). The supplicant side has already configured the pairwise key at
> that point in time and while we allow unencrypted msg 3/4 to be
> received, we were sending out 4/4 encrypted which will result in it
> getting dropped. User space would be aware of when the EAPOL frame
> should not be encrypted, but we do not have convenient means of telling
> mac80211 that. For now, use a mac80211-specific hack to avoid EAPOL
> frame encryption to allow retransmission of 4-way handshake messages 3/4
> and 4/4 to work in a way that the authenticator side can process 4/4.
>
> ---
>  net/mac80211/key.h |  2 ++
>  net/mac80211/rx.c  | 11 +++++++++++
>  net/mac80211/tx.c  | 13 +++++++++++++
>  3 files changed, 26 insertions(+)
>
> diff --git a/net/mac80211/key.h b/net/mac80211/key.h
> index d57a9915..3e23276 100644
> --- a/net/mac80211/key.h
> +++ b/net/mac80211/key.h
> @@ -120,6 +120,8 @@ struct ieee80211_key {
>
>         /* number of times this key has been used */
>         int tx_rx_count;
> +       /* whether a valid RX decryption has happened with this key */
> +       bool valid_rx_seen;
>
>  #ifdef CONFIG_MAC80211_DEBUGFS
>         struct {
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index 1101563..8f3f86c 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -1691,6 +1691,16 @@ ieee80211_rx_h_decrypt(struct ieee80211_rx_data *rx)
>         return result;
>  }
>
> +static ieee80211_rx_result debug_noinline
> +ieee80211_rx_h_check_key_use(struct ieee80211_rx_data *rx)
> +{
> +       struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(rx->skb);
> +
> +       if ((status->flag & RX_FLAG_DECRYPTED) && rx->key)
> +               rx->key->valid_rx_seen = true;
> +       return RX_CONTINUE;
> +}
> +
>  static inline struct ieee80211_fragment_entry *
>  ieee80211_reassemble_add(struct ieee80211_sub_if_data *sdata,
>                          unsigned int frag, unsigned int seq, int rx_queue,
> @@ -3139,6 +3149,7 @@ static void ieee80211_rx_handlers(struct ieee80211_rx_data *rx,
>                 CALL_RXH(ieee80211_rx_h_uapsd_and_pspoll)
>                 CALL_RXH(ieee80211_rx_h_sta_process)
>                 CALL_RXH(ieee80211_rx_h_decrypt)
> +               CALL_RXH(ieee80211_rx_h_check_key_use)
>                 CALL_RXH(ieee80211_rx_h_defragment)
>                 CALL_RXH(ieee80211_rx_h_michael_mic_verify)
>                 /* must be after MMIC verify so header is counted in MPDU mic */
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 88a18ff..c314c59 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -612,6 +612,19 @@ ieee80211_tx_h_select_key(struct ieee80211_tx_data *tx)
>                 return TX_DROP;
>         }
>
> +       if (tx->key &&
> +           unlikely(info->control.flags & IEEE80211_TX_CTRL_PORT_CTRL_PROTO) &&
> +           !tx->key->valid_rx_seen) {
> +               /* Do not encrypt EAPOL frames before peer has used the key */
> +               /* FIX: This is not really complete.. It would be at least
> +                * theoretically possible for the peer to never send a Data
> +                * frame and if we were to initiate reauthentication or
> +                * rekeying, we might need to encrypt the initiating EAPOL
> +                * frame.
> +                */
> +               tx->key = NULL;
> +       }
> +
>         if (tx->key) {
>                 bool skip_hw = false;
>
> --
> Jouni Malinen                                            PGP id EFC895FA
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Feb. 23, 2015, 8:06 p.m. UTC | #2
On Mon, Feb 23, 2015 at 9:17 AM, Jouni Malinen <j@w1.fi> wrote:
>
> mac80211: Do not encrypt EAPOL frames before peer has used the key

Hmm. This patch does not seem to make a difference. I thought it did
at first, but then removed the wpa_supplicant debugging, and got the
same failures.

On Sun, Feb 22, 2015 at 10:01 PM, Sujith Manoharan <sujith@msujith.org> wrote:
>
> Or 'iw dev wlp1s0 set bitrates ht-mcs-2.4 0' to choose the lowest
> HT rate.

This *might* have worked. But it's a bit hard to really make sure,
since it sometimes does succeed even without debugging when I do
nothing at all, but it did work twice in a row after doing that.

> Sporadic association problems could be a problem with the chosen rates.
> This would show the rates for the EAPOL frames:
>
> iw dev wlp1s0 interface add mon0 type monitor
> ifconfig mon0 up

Hmm. I don't actually see a "mon0" interface after the "iw dev
interface add" thing. Yes, "iw" sees it when I do "iw dev", but
"ifconfig" does not.

This machine has a fairly minimal kernel config. Does that "type
monitor" interface perhaps need some debug infrastructure that I
haven't added?

> tshark -i mon0 -Y eapol -T fields -e radiotap.datarate -e wlan -e eapol -e wlan.sa -e wlan.da

.. and then this fails, presumably for similar reasons.

                        Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Feb. 23, 2015, 8:11 p.m. UTC | #3
On Mon, Feb 23, 2015 at 12:06 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This machine has a fairly minimal kernel config. Does that "type
> monitor" interface perhaps need some debug infrastructure that I
> haven't added?

Nope. Same behavior with a F21 kernel (which means that the touchpad
doesn't work, but that's a separate and known issue with this
platform).

                       Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jouni Malinen Feb. 23, 2015, 9:30 p.m. UTC | #4
On Mon, Feb 23, 2015 at 12:06:09PM -0800, Linus Torvalds wrote:
> On Mon, Feb 23, 2015 at 9:17 AM, Jouni Malinen <j@w1.fi> wrote:
> > mac80211: Do not encrypt EAPOL frames before peer has used the key
> 
> Hmm. This patch does not seem to make a difference. I thought it did
> at first, but then removed the wpa_supplicant debugging, and got the
> same failures.

Interesting. That would seem to imply that this AP does not like
something about the EAPOL-Key msg 4/4 from the station during the faster
timing (no wpa_supplicant debug) and would even be unable to accept the
responses during retransmissions.. I'm not sure what could cause that.
Is there anything that the AP could provide as far as logging is
concerned?

How far is the station from the AP? Would it be possible to see whether
the behavior changes if you were within, say, five meters or so? This
should allow the higher transmit rate proposed by minstrel_ht to work
pretty reliably and that could confirm if this is indeed related to too
high a rate being used here and then for some reason being unable to
fall back to a sufficiently low rate to work at higher distance.

It would be useful if you can capture the 802.11 frame exchange from a
failed connection case with an external wireless sniffer. It sounds
like this should be doable with iwlwifi and while I haven't tested this
myself, I'd assume something like this would do the trick (this is
assuming that the interface is not being used at the time, e.g., with
wpa_supplicant):

sudo iw dev wlan0 set type monitor
sudo ip link set wlan0 up
sudo iw dev wlan0 set freq 2462 HT20
sudo dumpcap -i wlan0 -w /tmp/wlan0.pcap

# test connection and stop dumpcap after the failure
# and "sudo ip link set wlan0 down; sudo iw dev wlan0 set type station"
# to restore normal station mode
Linus Torvalds Feb. 23, 2015, 9:53 p.m. UTC | #5
On Mon, Feb 23, 2015 at 1:30 PM, Jouni Malinen <j@w1.fi> wrote:
>
> How far is the station from the AP? Would it be possible to see whether
> the behavior changes if you were within, say, five meters or so?

Well, it was pretty much within five meters already, but there was a
thin wall in between (and the old AP was right next to the laptop,
which might add some noise even if they are on different channels).
Going closer does seem to help, but again, it's not like this is 100%
reproducible to begin with.

So the theory that the driver starts at too high a transmit rate, and
then does not handle failures well, might be true. Of course, "not
handle failures well" is something of an understatement.

> It would be useful if you can capture the 802.11 frame exchange from a
> failed connection case with an external wireless sniffer.

I will try with my (much more reliable) iwlwifi laptop. At least the
merge window is over, so I should have some time. Knock wood.

              Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Adrian Chadd Feb. 23, 2015, 10:22 p.m. UTC | #6
On 23 February 2015 at 13:53, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Feb 23, 2015 at 1:30 PM, Jouni Malinen <j@w1.fi> wrote:
>>
>> How far is the station from the AP? Would it be possible to see whether
>> the behavior changes if you were within, say, five meters or so?
>
> Well, it was pretty much within five meters already, but there was a
> thin wall in between (and the old AP was right next to the laptop,
> which might add some noise even if they are on different channels).
> Going closer does seem to help, but again, it's not like this is 100%
> reproducible to begin with.
>
> So the theory that the driver starts at too high a transmit rate, and
> then does not handle failures well, might be true. Of course, "not
> handle failures well" is something of an understatement.
>
>> It would be useful if you can capture the 802.11 frame exchange from a
>> failed connection case with an external wireless sniffer.
>
> I will try with my (much more reliable) iwlwifi laptop. At least the
> merge window is over, so I should have some time. Knock wood.

Hm, can we just hack mac80211/ath9k to set /all/ EAPOL frames to the
lowest negotiated basic rate and test? That way we don't have to worry
about things resetting fixed rates or whatnot.

I've done that with FreeBSD's atheros/intel drivers and net80211 stack
to fix exactly these, although I'm thinking about adding a patch based
on Jouni's note about EAPOL encrypting the final response.

Does ath9k have an easy interface for dumping out the TX descriptors
and response? That way we can see (a) what it's transmitting as, and
(b) whether the hardware indicated it got an ACK for the particular
frame.


-adrian
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/mac80211/key.h b/net/mac80211/key.h
index d57a9915..3e23276 100644
--- a/net/mac80211/key.h
+++ b/net/mac80211/key.h
@@ -120,6 +120,8 @@  struct ieee80211_key {
 
 	/* number of times this key has been used */
 	int tx_rx_count;
+	/* whether a valid RX decryption has happened with this key */
+	bool valid_rx_seen;
 
 #ifdef CONFIG_MAC80211_DEBUGFS
 	struct {
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 1101563..8f3f86c 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1691,6 +1691,16 @@  ieee80211_rx_h_decrypt(struct ieee80211_rx_data *rx)
 	return result;
 }
 
+static ieee80211_rx_result debug_noinline
+ieee80211_rx_h_check_key_use(struct ieee80211_rx_data *rx)
+{
+	struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(rx->skb);
+
+	if ((status->flag & RX_FLAG_DECRYPTED) && rx->key)
+		rx->key->valid_rx_seen = true;
+	return RX_CONTINUE;
+}
+
 static inline struct ieee80211_fragment_entry *
 ieee80211_reassemble_add(struct ieee80211_sub_if_data *sdata,
 			 unsigned int frag, unsigned int seq, int rx_queue,
@@ -3139,6 +3149,7 @@  static void ieee80211_rx_handlers(struct ieee80211_rx_data *rx,
 		CALL_RXH(ieee80211_rx_h_uapsd_and_pspoll)
 		CALL_RXH(ieee80211_rx_h_sta_process)
 		CALL_RXH(ieee80211_rx_h_decrypt)
+		CALL_RXH(ieee80211_rx_h_check_key_use)
 		CALL_RXH(ieee80211_rx_h_defragment)
 		CALL_RXH(ieee80211_rx_h_michael_mic_verify)
 		/* must be after MMIC verify so header is counted in MPDU mic */
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 88a18ff..c314c59 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -612,6 +612,19 @@  ieee80211_tx_h_select_key(struct ieee80211_tx_data *tx)
 		return TX_DROP;
 	}
 
+	if (tx->key &&
+	    unlikely(info->control.flags & IEEE80211_TX_CTRL_PORT_CTRL_PROTO) &&
+	    !tx->key->valid_rx_seen) {
+		/* Do not encrypt EAPOL frames before peer has used the key */
+		/* FIX: This is not really complete.. It would be at least
+		 * theoretically possible for the peer to never send a Data
+		 * frame and if we were to initiate reauthentication or
+		 * rekeying, we might need to encrypt the initiating EAPOL
+		 * frame.
+		 */
+		tx->key = NULL;
+	}
+
 	if (tx->key) {
 		bool skip_hw = false;