Message ID | 20180515102202.2021-1-alexander.wetzel@web.de (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Johannes Berg |
Headers | show |
On Tue, 2018-05-15 at 12:22 +0200, Alexander Wetzel wrote: > Rekeying a pairwise key with encryption offload and only keyid 0 has two > potential races which can freeze the wlan conection till rekeyed again: > > 1) For incomming packets: > If the local STA installs the key prior to the remote STA we still > have the old key active in the hardware for a short time after > mac80211 switched to the new key. > The card can still hand over packets decoded with the old key to > mac80211, bumping the new PN (IV) value to an incorrect high number and > tricking the local replay detection to drop all packets really sent > with the new key. > > 2) For outgoing packets: > If mac80211 is providing the PN (IV) and hands over the cleartext > packets for encryption to the hardware immediately prior to a key > change the driver/card may process the queued packets after > switching to the new key. > This will immediatelly bump the PN (IV) value on the remote STA to > an incorrect high number, also freezing the connection. > > Both issues can be prevented by first replacing the key in the HW and > makeing sure no aggregation sessions are running during the rekey. Getting back to this, am I understanding correctly that in the latter (outgoing) case this would cause Also, I think you should probably describe better why the aggregation session stuff is needed. I'm already thinking there times about it again ... > + ieee80211_sta_tear_down_BA_sessions( > + sta, AGG_STOP_LOCAL_REQUEST); minor indentation issue here > + ieee80211_flush_queues(key->local, key->sdata, false); > + ieee80211_key_disable_hw_accel(key); I'm not sure all drivers implement drv_flush() [correctly], what happens if they don't? I guess some packets end up being transmitted in clear text or a dummy key, unless the hardware/firmware knows about this and drops them? Perhaps that means we should make this whole thing opt-in, and leave it up to driver authors to first validate that they handle the flushing correctly? Regarding the code: the whole dance you do with ieee80211_key_link() and ieee80211_key_replace() seems to be a little pointless because you still add the key to debugfs and then free it, on errors that is? johannes
> On Tue, 2018-05-15 at 12:22 +0200, Alexander Wetzel wrote: >> Rekeying a pairwise key with encryption offload and only keyid 0 has two >> potential races which can freeze the wlan conection till rekeyed again: >> >> 1) For incomming packets: >> If the local STA installs the key prior to the remote STA we still >> have the old key active in the hardware for a short time after >> mac80211 switched to the new key. >> The card can still hand over packets decoded with the old key to >> mac80211, bumping the new PN (IV) value to an incorrect high number and >> tricking the local replay detection to drop all packets really sent >> with the new key. >> >> 2) For outgoing packets: >> If mac80211 is providing the PN (IV) and hands over the cleartext >> packets for encryption to the hardware immediately prior to a key >> change the driver/card may process the queued packets after >> switching to the new key. >> This will immediatelly bump the PN (IV) value on the remote STA to >> an incorrect high number, also freezing the connection. >> >> Both issues can be prevented by first replacing the key in the HW and >> makeing sure no aggregation sessions are running during the rekey. > > Getting back to this, am I understanding correctly that in the latter > (outgoing) case this would cause I don't get the point here...Shall I flesh out the description of the exiting races in the code a bit more? > > Also, I think you should probably describe better why the aggregation > session stuff is needed. I'm already thinking there times about it again > ... I'll rework the description and will go into more details. Here a first quick draft what I plan amend to the commit message. But this got kind of long: ... Both issues can be prevented by first replacing the key in the HW and making sure no aggregation sessions are running during the rekey. The "core" of the fix is to change the old unicast key install sequence - atomic switch over to the new key in mac80211 - remove the old key in the HW and mac80211 - add new key in the hw to - mark the old key as tainted to drop TX packets trying to use it - remove the old key in the HW and mac80211 - add new key to the HW - atomic switch over to the new key in mac80211 Since the new key is not marked as tainted the last step will also enable TX again. With the new procedure the HW will be unable to decode packets still encrypted with the old key prior to switching to the new key in mac80211. Locking down TX during the rekey makes sure that there are no outgoing packets mac80211 prepared for the old key the hw can encrypt with the new key. We can now decrypt and get incoming packets while mac80211 is still on the old key, but the worst (and most likely) case is now, that the replay detection will drop those packets till mac80211 also switch over. Instead of checking PN from old key packets against the new key (and bump the PN for the new key, killing our connection) we are now checking the PN from new key packets against the old key (and drop a few packets during rekey only). When using TX aggregation we get an additional race which can poison the remote station: When a TX aggregation session is running during the key replacement it's possible that one of the frames send out prior of the TX lock down (old key) got lost and will be repeated after the TX lock down is lifted, using the old skb prepared for the old key with the new installed key in the hardware. This would bump the PN counter of our peer to an incorrect value and breaking the RX for the remote station. Stopping all running aggregation sessions and declining any new ones till we have switched over to the new key side steps that problem. /end commit draft It would be possible to keep the aggregation sessions running, but that seems to need some quite invasive changes and checks. > >> + ieee80211_sta_tear_down_BA_sessions( >> + sta, AGG_STOP_LOCAL_REQUEST); > > minor indentation issue here I'll go for a 81 char long single line here. > >> + ieee80211_flush_queues(key->local, key->sdata, false); >> + ieee80211_key_disable_hw_accel(key); > > I'm not sure all drivers implement drv_flush() [correctly], what happens > if they don't? I guess some packets end up being transmitted in clear > text or a dummy key, unless the hardware/firmware knows about this and > drops them? Flushing the queues is not needed for to the logic and only a workaround for drivers like ath9k which can still send out packets for a key which was already deleted. When the driver guarantees that after the key deletion function returns to mac80211 that there will be no more packets send out prepared for the old key it can be removed. The driver could simply wait for whatever the may queue time may be or detect and drop these packets. (In fact it can even send out the packet again, it just must make sure to still use the old and official deleted key. May be handy for drivers like iwlwifi.) > > Perhaps that means we should make this whole thing opt-in, and leave it > up to driver authors to first validate that they handle the flushing > correctly? I tried to minimize the changes, but if we can consider an API change I would ask the drivers to provide a new function to switch directly from one key to another, without the need to delete it first. And to guarantee there, that after the function returns no packets with prepared for the old key can be sent out. Including retransmissions. That way drivers like ath9k should be able to directly program the new key and drivers for cards where this is not saving time can simply call delete and add internally. As fallback I would still want to use the current code with a flush, well knowing that not all drivers implement it and only remove it once all drivers have switched to the new API for rekeys. > > > Regarding the code: the whole dance you do with ieee80211_key_link() and > ieee80211_key_replace() seems to be a little pointless because you still > add the key to debugfs and then free it, on errors that is? I hope the more verbose description above explains that part. The intend of the dance is to finalize the switch to the new key in the hardware prior to doing it in mac80211. It's probably possible to bypass more of the code, but I did not touch the logic besides what I needed, understand or verified to be harmless. Alexander
Hi, > > > 2) For outgoing packets: > > > If mac80211 is providing the PN (IV) and hands over the cleartext > > > packets for encryption to the hardware immediately prior to a key > > > change the driver/card may process the queued packets after > > > switching to the new key. > > > This will immediatelly bump the PN (IV) value on the remote STA to > > > an incorrect high number, also freezing the connection. > > > > > > Both issues can be prevented by first replacing the key in the HW and > > > makeing sure no aggregation sessions are running during the rekey. > > > > Getting back to this, am I understanding correctly that in the latter > > (outgoing) case this would cause > > I don't get the point here...Shall I flesh out the description of the > exiting races in the code a bit more? You couldn't, I didn't finish this sentence for some reason ... or wrote and then deleted it by accident? I meant to say: Am I understanding correctly that in the latter (outgoing) case this might cause unencrypted packets to be transmitted? I talked about that more below. > > Also, I think you should probably describe better why the aggregation > > session stuff is needed. I'm already thinking there times about it again > > ... > > I'll rework the description and will go into more details. > Here a first quick draft what I plan amend to the commit message. But > this got kind of long: > ... > Both issues can be prevented by first replacing the key in the HW and > making sure no aggregation sessions are running during the rekey. > > The "core" of the fix is to change the old unicast key install sequence > - atomic switch over to the new key in mac80211 > - remove the old key in the HW and mac80211 > - add new key in the hw > to > - mark the old key as tainted to drop TX packets trying to use it > - remove the old key in the HW and mac80211 > - add new key to the HW > - atomic switch over to the new key in mac80211 > > Since the new key is not marked as tainted the last step will also > enable TX again. > > With the new procedure the HW will be unable to decode packets still > encrypted with the old key prior to switching to the new key in > mac80211. Locking down TX during the rekey makes sure that there are no > outgoing packets mac80211 prepared for the old key the hw can encrypt > with the new key. > We can now decrypt and get incoming packets while mac80211 is still on > the old key, but the worst (and most likely) case is now, that the > replay detection will drop those packets till mac80211 also switch over. This actually is somewhat problematic, at least for TKIP it could cause countermeasures. Should we exclude TKIP here somehow? I don't think we can disable countermeasures because then we could be attacked in this time window without starting them, and we have a really hard time distinguishing attacks and "fake" replays. > Instead of checking PN from old key packets against the new key (and > bump the PN for the new key, killing our connection) we are now checking > the PN from new key packets against the old key (and drop a few packets > during rekey only). > > When using TX aggregation we get an additional race which can poison the > remote station: > When a TX aggregation session is running during the key replacement it's > possible that one of the frames send out prior of the TX lock down (old > key) got lost and will be repeated after the TX lock down is lifted, > using the old skb prepared for the old key with the new installed key in > the hardware. This would bump the PN counter of our peer to an incorrect > value and breaking the RX for the remote station. Right, this is what I had forgotten about already. > > > + ieee80211_flush_queues(key->local, key->sdata, false); > > > + ieee80211_key_disable_hw_accel(key); > > > > I'm not sure all drivers implement drv_flush() [correctly], what happens > > if they don't? I guess some packets end up being transmitted in clear > > text or a dummy key, unless the hardware/firmware knows about this and > > drops them? > > Flushing the queues is not needed for to the logic and only a workaround > for drivers like ath9k which can still send out packets for a key which > was already deleted. When the driver guarantees that after the key > deletion function returns to mac80211 that there will be no more packets > send out prepared for the old key it can be removed. What will happen for other kinds of packets though? For iwlwifi, for example, the key material is added into the key. It thus doesn't have this race in the first place, but it will definitely send out packets after Btw - perhaps that means we should avoid the complicated mechanisms like TX aggregation shutdown for keys that the driver marks as being safe? Clearly for iwlwifi (at least CCMP and before, not with the longer keys in GCMP-256) the race can't possibly happen. In other drivers though, I worry that removing the key will *not* mean that there are no more packets transmitted with it. If you have a key cache in hardware then it might just be that marking the cache entry as invalid means the encryption will be skipped when encountering a packet to be transmitted with the key from a given (in the packet) key cache index, and then the frame goes out unprotected? > I tried to minimize the changes, but if we can consider an API change I > would ask the drivers to provide a new function to switch directly from > one key to another, without the need to delete it first. And to > guarantee there, that after the function returns no packets with > prepared for the old key can be sent out. Including retransmissions. This would be pretty tricky for most drivers though. > That way drivers like ath9k should be able to directly program the new > key and drivers for cards where this is not saving time can simply call > delete and add internally. I don't think that's really all that much savings, time-wise? But it might address the "unprotected TX" issue I worry about above. > > Regarding the code: the whole dance you do with ieee80211_key_link() and > > ieee80211_key_replace() seems to be a little pointless because you still > > add the key to debugfs and then free it, on errors that is? > > I hope the more verbose description above explains that part. > The intend of the dance is to finalize the switch to the new key in the > hardware prior to doing it in mac80211. It's probably possible to bypass > more of the code, but I did not touch the logic besides what I needed, > understand or verified to be harmless. I'm just saying that err = add_to_hardware(); add_to_debugfs(); if (err) remove_from_debugfs(); is pointless :-) johannes
> Am I understanding correctly that in the latter (outgoing) case this > might cause unencrypted packets to be transmitted? Yes, if we have a driver handling the keys similar to ath9k which is not implementing drv_flush (correctly) this is a possibility. This ties somewhat into the discussion what to handle in mac80211 and what delegate to the drivers. mac80211 itself should drop any packets trying to use the old outgoing key till the new key is rolled out correctly with the patch. >> We can now decrypt and get incoming packets while mac80211 is still on >> the old key, but the worst (and most likely) case is now, that the >> replay detection will drop those packets till mac80211 also switch over. > > This actually is somewhat problematic, at least for TKIP it could cause > countermeasures. Should we exclude TKIP here somehow? > > I don't think we can disable countermeasures because then we could be > attacked in this time window without starting them, and we have a really > hard time distinguishing attacks and "fake" replays. > Yes, that is a definitely a possibility. But excluding TKIP seems to be counterproductive for my understanding: TKIP is due to the weaker cipher more likely to have unicast rekeying enabled. So far I was hesitant to add new per packet check for that, but I guess we have to... I'll add that in the next version, so we can discuss that with some code. >> Flushing the queues is not needed for to the logic and only a workaround >> for drivers like ath9k which can still send out packets for a key which >> was already deleted. When the driver guarantees that after the key >> deletion function returns to mac80211 that there will be no more packets >> send out prepared for the old key it can be removed. > > What will happen for other kinds of packets though? > > For iwlwifi, for example, the key material is added into the key. It > thus doesn't have this race in the first place, but it will definitely > send out packets after That case is fine. A non-broken remote STA will not be able to decode the packet if it has already completed the switch to the new key and not hand over the packet to mac80211 or the equivalent. > > Btw - perhaps that means we should avoid the complicated mechanisms like > TX aggregation shutdown for keys that the driver marks as being safe? > Clearly for iwlwifi (at least CCMP and before, not with the longer keys > in GCMP-256) the race can't possibly happen. Sounds like that should work and I think I'll just try it out. We'll lose the side benefit that shutting down TX aggregation will reduce the risk that a unpatched remote sta freezes the connection, though. And since I started out to first patch ath9k to drop packets for an outgoing key: That looks to become either be ugly (delayed work to complete the key clean up after a given time) or need some API change. Remember we'll still have to shut down RX aggregation or drop all frames of a session running during the rekey. We are not able to tell which key was used to encrypt the frames and which have "old" PNs we can't allow mac80211 to process after switching to the new key. So I'm not sure that keeping TX up and running is worth it. That said I have another working patch which is delaying mac80211 key deletion from my first misguided attempts to at this issue. I could repurpose that and keep RX aggregation also up and running, allowing us to first check if the packet would have been accepted by the old key and only switch to the new PN counter once it's not. But that seems to be kind of invasive and overkill. > > In other drivers though, I worry that removing the key will *not* mean > that there are no more packets transmitted with it. If you have a key > cache in hardware then it might just be that marking the cache entry as > invalid means the encryption will be skipped when encountering a packet > to be transmitted with the key from a given (in the packet) key cache > index, and then the frame goes out unprotected? > Yes, that is part of the reason I had to add the call to drv_flush for ath9k. I've experimented with an additional "retire_key" command on top of the existing to add and delete keys but came to the conclusion that "replace_key" would make more sense for ath9k at least. But in order of my preference I see this options: 1) removing a key in HW must also grantee that any packets queued for this key has either been send or will be dropped after the call returns to mac80211. (Simply sleeping the max queue and retransmit time would be an ugly but simple way to implement that) 2) we add a new function/call like "replace_key" for this special case. But in that case we have to sort out how to handle drivers not implementing it. Thy only secure way would be to disconnect if someone tries to rekey... 3) Is what I implemented. We try what we can with the existing API and any driver not working with that has to be considered buggy. >> I tried to minimize the changes, but if we can consider an API change I >> would ask the drivers to provide a new function to switch directly from >> one key to another, without the need to delete it first. And to >> guarantee there, that after the function returns no packets with >> prepared for the old key can be sent out. Including retransmissions. > > This would be pretty tricky for most drivers though. I do not see any real alternative. Sleeping some reasonable time should be acceptable here, to allow the hw to flush out queued packets. (100ms should probably acceptable here, especially compared to complete connection loss.. Only other idea I have is to lock down TX for all but EAPOL packets sooner, e.g. during the key handshake. But that's more or less only a variant of the sleep above. > >> That way drivers like ath9k should be able to directly program the new >> key and drivers for cards where this is not saving time can simply call >> delete and add internally. > > I don't think that's really all that much savings, time-wise? But it > might address the "unprotected TX" issue I worry about above. > Fully agree. > > I'm just saying that > > err = add_to_hardware(); > > add_to_debugfs(); > > if (err) > remove_from_debugfs(); > > is pointless :-) Agree, that is backwards compatibility a step too far:-) I'll bypass that in the next patch version. Planning to work on v3 on the weekend. Alexander
Hi, On Tue, 2018-06-19 at 22:12 +0200, Alexander Wetzel wrote: > > Am I understanding correctly that in the latter (outgoing) case this > > might cause unencrypted packets to be transmitted? > > Yes, if we have a driver handling the keys similar to ath9k which is not > implementing drv_flush (correctly) this is a possibility. Sadly, I think many drivers fall into this category, so I'm not sure we should "fix the bug" for them at the expense of a potential security risk? So maybe we need to have an opt-in flag for drivers. > This ties somewhat into the discussion what to handle in mac80211 and > what delegate to the drivers. mac80211 itself should drop any packets > trying to use the old outgoing key till the new key is rolled out > correctly with the patch. Yeah, but it can't do it for packets already queued to the driver/hardware, and it can't know how the hardware behaves wrt. encryption: old-iwlwifi style (key material in descriptor) vs. ath9k style (key material in on-chip table). > > > We can now decrypt and get incoming packets while mac80211 is still on > > > the old key, but the worst (and most likely) case is now, that the > > > replay detection will drop those packets till mac80211 also switch over. > > > > This actually is somewhat problematic, at least for TKIP it could cause > > countermeasures. Should we exclude TKIP here somehow? > > > > I don't think we can disable countermeasures because then we could be > > attacked in this time window without starting them, and we have a really > > hard time distinguishing attacks and "fake" replays. > > > > Yes, that is a definitely a possibility. But excluding TKIP seems to be > counterproductive for my understanding: TKIP is due to the weaker cipher > more likely to have unicast rekeying enabled. > So far I was hesitant to add new per packet check for that, but I guess > we have to... I'll add that in the next version, so we can discuss that > with some code. Ok. > > Btw - perhaps that means we should avoid the complicated mechanisms like > > TX aggregation shutdown for keys that the driver marks as being safe? > > Clearly for iwlwifi (at least CCMP and before, not with the longer keys > > in GCMP-256) the race can't possibly happen. > > Sounds like that should work and I think I'll just try it out. We'll > lose the side benefit that shutting down TX aggregation will reduce the > risk that a unpatched remote sta freezes the connection, though. Fair point. > And since I started out to first patch ath9k to drop packets for an > outgoing key: That looks to become either be ugly (delayed work to > complete the key clean up after a given time) or need some API change. Ok. > Remember we'll still have to shut down RX aggregation or drop all frames > of a session running during the rekey. We are not able to tell which key > was used to encrypt the frames and which have "old" PNs we can't allow > mac80211 to process after switching to the new key. So I'm not sure that > keeping TX up and running is worth it. Yeah, good point, and TX is less interesting because we control when/how it's set up. > That said I have another working patch which is delaying mac80211 key > deletion from my first misguided attempts to at this issue. > I could repurpose that and keep RX aggregation also up and running, > allowing us to first check if the packet would have been accepted by the > old key and only switch to the new PN counter once it's not. But that > seems to be kind of invasive and overkill. Yeah, let's not go there for now. Also if HW crypto is involved it usually won't work unless you (a) get the packet and (b) re-encrypt the packet with the wrong key, and then decrypt it with the correct key ... > Yes, that is part of the reason I had to add the call to drv_flush for > ath9k. I've experimented with an additional "retire_key" command on top > of the existing to add and delete keys but came to the conclusion that > "replace_key" would make more sense for ath9k at least. Ok. > But in order of my preference I see this options: > 1) removing a key in HW must also grantee that any packets queued for > this key has either been send or will be dropped after the call returns > to mac80211. (Simply sleeping the max queue and retransmit time would be > an ugly but simple way to implement that) This works but we have to consider a lot of drivers. I guess we're back to some form of "opt-in" scheme then? > 2) we add a new function/call like "replace_key" for this special case. > But in that case we have to sort out how to handle drivers not > implementing it. Thy only secure way would be to disconnect if someone > tries to rekey... Maybe that's not so bad. > 3) Is what I implemented. We try what we can with the existing API and > any driver not working with that has to be considered buggy. I don't think we can really do this though. We break - in a potentially security-relevant way - the older drivers. We can't just say that's driver bugs, IMHO. > > > I tried to minimize the changes, but if we can consider an API change I > > > would ask the drivers to provide a new function to switch directly from > > > one key to another, without the need to delete it first. And to > > > guarantee there, that after the function returns no packets with > > > prepared for the old key can be sent out. Including retransmissions. > > > > This would be pretty tricky for most drivers though. > > I do not see any real alternative. Sleeping some reasonable time should > be acceptable here, to allow the hw to flush out queued packets. (100ms > should probably acceptable here, especially compared to complete > connection loss.. You have no guarantee that even 100ms is enough to get all packets out. They could be sitting on a BK queue and waiting for all BE/VI traffic in the vicinity ... basically forever. > Only other idea I have is to lock down TX for all but EAPOL packets > sooner, e.g. during the key handshake. But that's more or less only a > variant of the sleep above. Yeah, doesn't really fully address this issue anyway, since EAPOL are on VO and others might get way more delayed on HW queues due to contention. I think our best bet is some form of opt-in scheme. Perhaps (2), which drivers can implement also to get like the "best" behaviour. They could also implement there the flushing if they can, for example. Others would get a disconnect, but they probably effectively do already? Obviously if SW crypto is used none of this is a concern, so that's another factor to take into account in this decision logic of whether to disconnect or not? IOW - I'd rather get bugs that we now force a disconnect (if anyone even notices), rather than potentially having a bug that causes unencrypted packets to be sent. johannes
Hello, >>> Am I understanding correctly that in the latter (outgoing) case this >>> might cause unencrypted packets to be transmitted? >> >> Yes, if we have a driver handling the keys similar to ath9k which is not >> implementing drv_flush (correctly) this is a possibility. > > Sadly, I think many drivers fall into this category, so I'm not sure we > should "fix the bug" for them at the expense of a potential security > risk? So maybe we need to have an opt-in flag for drivers. Makes sense. > >> This ties somewhat into the discussion what to handle in mac80211 and >> what delegate to the drivers. mac80211 itself should drop any packets >> trying to use the old outgoing key till the new key is rolled out >> correctly with the patch. > > Yeah, but it can't do it for packets already queued to the > driver/hardware, and it can't know how the hardware behaves wrt. > encryption: old-iwlwifi style (key material in descriptor) vs. ath9k > style (key material in on-chip table). > >>>> We can now decrypt and get incoming packets while mac80211 is still on >>>> the old key, but the worst (and most likely) case is now, that the >>>> replay detection will drop those packets till mac80211 also switch over. >>> >>> This actually is somewhat problematic, at least for TKIP it could cause >>> countermeasures. Should we exclude TKIP here somehow? >>> >>> I don't think we can disable countermeasures because then we could be >>> attacked in this time window without starting them, and we have a really >>> hard time distinguishing attacks and "fake" replays. >>> >> >> Yes, that is a definitely a possibility. But excluding TKIP seems to be >> counterproductive for my understanding: TKIP is due to the weaker cipher >> more likely to have unicast rekeying enabled. >> So far I was hesitant to add new per packet check for that, but I guess >> we have to... I'll add that in the next version, so we can discuss that >> with some code. > > Ok. I was wrong here, this is not an issue. Tkip is simply dropping frames when the IV is too small and never verifies the MIC. And since only MIC errors can trigger counter measures we are fine as it is... Nevertheless I also gave TKIP a quick test. While it's really slow (2.6MB/s instead of 6.5MB/s due to no aggregation) it worked fine with the patch and my sole client downloading. (Again ath9k as AP and iwlwifi for STA) Quick reference to the code: ieee80211_rx_h_decrypt will return "RX_DROP_UNUSABLE" when IV is too small and we bypass ieee80211_rx_h_michael_mic_verify in ieee80211_rx_handlers_result due to the packet drop. > >>> Btw - perhaps that means we should avoid the complicated mechanisms like >>> TX aggregation shutdown for keys that the driver marks as being safe? >>> Clearly for iwlwifi (at least CCMP and before, not with the longer keys >>> in GCMP-256) the race can't possibly happen. >> >> Sounds like that should work and I think I'll just try it out. We'll >> lose the side benefit that shutting down TX aggregation will reduce the >> risk that a unpatched remote sta freezes the connection, though. > > Fair point. > >> And since I started out to first patch ath9k to drop packets for an >> outgoing key: That looks to become either be ugly (delayed work to >> complete the key clean up after a given time) or need some API change. > > Ok. > >> Remember we'll still have to shut down RX aggregation or drop all frames >> of a session running during the rekey. We are not able to tell which key >> was used to encrypt the frames and which have "old" PNs we can't allow >> mac80211 to process after switching to the new key. So I'm not sure that >> keeping TX up and running is worth it. > > Yeah, good point, and TX is less interesting because we control when/how > it's set up. > I'll keep the current aggregation tear down in the patch for now, then. The idea with testing keeping TX aggregation up and running would require an AP not stopping TX aggregation. >> That said I have another working patch which is delaying mac80211 key >> deletion from my first misguided attempts to at this issue. >> I could repurpose that and keep RX aggregation also up and running, >> allowing us to first check if the packet would have been accepted by the >> old key and only switch to the new PN counter once it's not. But that >> seems to be kind of invasive and overkill. > > Yeah, let's not go there for now. Also if HW crypto is involved it > usually won't work unless you (a) get the packet and (b) re-encrypt the > packet with the wrong key, and then decrypt it with the correct key ... I don't get which case this is... But let's not go into that till we have to:-) > >> Yes, that is part of the reason I had to add the call to drv_flush for >> ath9k. I've experimented with an additional "retire_key" command on top >> of the existing to add and delete keys but came to the conclusion that >> "replace_key" would make more sense for ath9k at least. > > Ok. > >> But in order of my preference I see this options: >> 1) removing a key in HW must also grantee that any packets queued for >> this key has either been send or will be dropped after the call returns >> to mac80211. (Simply sleeping the max queue and retransmit time would be >> an ugly but simple way to implement that) > > This works but we have to consider a lot of drivers. I guess we're back > to some form of "opt-in" scheme then? > >> 2) we add a new function/call like "replace_key" for this special case. >> But in that case we have to sort out how to handle drivers not >> implementing it. Thy only secure way would be to disconnect if someone >> tries to rekey... > > Maybe that's not so bad. Agree. After all this is what most of my windows drivers were doing accidentally. >> 3) Is what I implemented. We try what we can with the existing API and >> any driver not working with that has to be considered buggy. > > I don't think we can really do this though. We break - in a potentially > security-relevant way - the older drivers. We can't just say that's > driver bugs, IMHO. We would not break it, only not fix it for all drivers. The current code is already leaking cleartext packets for at least ath9k and most likely many others when the PTK is rekeyed. The patch would improve that, but due to more working rekeys it could leak more packets in specific scenarios, I assume... >>>> I tried to minimize the changes, but if we can consider an API change I >>>> would ask the drivers to provide a new function to switch directly from >>>> one key to another, without the need to delete it first. And to >>>> guarantee there, that after the function returns no packets with >>>> prepared for the old key can be sent out. Including retransmissions. >>> >>> This would be pretty tricky for most drivers though. >> >> I do not see any real alternative. Sleeping some reasonable time should >> be acceptable here, to allow the hw to flush out queued packets. (100ms >> should probably acceptable here, especially compared to complete >> connection loss.. > > You have no guarantee that even 100ms is enough to get all packets out. > They could be sitting on a BK queue and waiting for all BE/VI traffic in > the vicinity ... basically forever. > >> Only other idea I have is to lock down TX for all but EAPOL packets >> sooner, e.g. during the key handshake. But that's more or less only a >> variant of the sleep above. > > Yeah, doesn't really fully address this issue anyway, since EAPOL are on > VO and others might get way more delayed on HW queues due to contention. > > I think our best bet is some form of opt-in scheme. Perhaps (2), which > drivers can implement also to get like the "best" behaviour. They could > also implement there the flushing if they can, for example. Others would > get a disconnect, but they probably effectively do already? > I'll give (2) a shot, then. > Obviously if SW crypto is used none of this is a concern, so that's > another factor to take into account in this decision logic of whether to > disconnect or not? I did not check the SW crypto code but I'm pretty sure that indeed works with the current code. I assume the packets will only be decrypted after an RX aggregation is completed. That would filter out all packets send with the old key, since they simply can't be decrypted any more. Shall we bypass stopping aggregation sessions if we are on SW crypto? We'll again lose the benefit that we prevent a broken remote STA to try a TX Agg. (I tend to still stop them, but do not have a real opinion here...) > > IOW - I'd rather get bugs that we now force a disconnect (if anyone even > notices), rather than potentially having a bug that causes unencrypted > packets to be sent. Any suggestions how to trick hostap/wpa_supplicant into dropping the connection? For me it looks like we can just report an error on key install and expect the wpa_supplicant/hostapd to handle that. Will try that for the next version of the patch, with the other discussed improvements: If driver is not signalling "PTK0 rekey support" we'll simply not accept key installs when we already have a old PTK key for the connection. Alexander
On Fri, 2018-06-29 at 23:14 +0200, Alexander Wetzel wrote: > I was wrong here, this is not an issue. Tkip is simply dropping frames > when the IV is too small and never verifies the MIC. And since only MIC > errors can trigger counter measures we are fine as it is... Err, yes, of course. My bad. > > > 3) Is what I implemented. We try what we can with the existing API and > > > any driver not working with that has to be considered buggy. > > > > I don't think we can really do this though. We break - in a potentially > > security-relevant way - the older drivers. We can't just say that's > > driver bugs, IMHO. > > We would not break it, only not fix it for all drivers. The current code > is already leaking cleartext packets for at least ath9k and most likely > many others when the PTK is rekeyed. The patch would improve that, but > due to more working rekeys it could leak more packets in specific > scenarios, I assume... Yeah, ok, fair point. I don't really know. > > Obviously if SW crypto is used none of this is a concern, so that's > > another factor to take into account in this decision logic of whether to > > disconnect or not? > > I did not check the SW crypto code but I'm pretty sure that indeed works > with the current code. Me too. > I assume the packets will only be decrypted after an RX aggregation is > completed. That would filter out all packets send with the old key, > since they simply can't be decrypted any more. Oh, good point, but that's true - reordering happens before software decryption. > Shall we bypass stopping aggregation sessions if we are on SW crypto? > We'll again lose the benefit that we prevent a broken remote STA to try > a TX Agg. (I tend to still stop them, but do not have a real opinion > here...) I don't really know. > > IOW - I'd rather get bugs that we now force a disconnect (if anyone even > > notices), rather than potentially having a bug that causes unencrypted > > packets to be sent. > > Any suggestions how to trick hostap/wpa_supplicant into dropping the > connection? For me it looks like we can just report an error on key > install and expect the wpa_supplicant/hostapd to handle that. I think easier would be to just disconnect ourselves? At least if we're in managed mode... > Will try that for the next version of the patch, with the other > discussed improvements: If driver is not signalling "PTK0 rekey support" > we'll simply not accept key installs when we already have a old PTK key > for the connection. Since I see you have a new patch - how did that work out? :) johannes
Hello, > >> I was wrong here, this is not an issue. Tkip is simply dropping frames >> when the IV is too small and never verifies the MIC. And since only MIC >> errors can trigger counter measures we are fine as it is... > > Err, yes, of course. My bad. > >>>> 3) Is what I implemented. We try what we can with the existing API and >>>> any driver not working with that has to be considered buggy. >>> >>> I don't think we can really do this though. We break - in a potentially >>> security-relevant way - the older drivers. We can't just say that's >>> driver bugs, IMHO. >> >> We would not break it, only not fix it for all drivers. The current code >> is already leaking cleartext packets for at least ath9k and most likely >> many others when the PTK is rekeyed. The patch would improve that, but >> due to more working rekeys it could leak more packets in specific >> scenarios, I assume... > > Yeah, ok, fair point. I don't really know. > >>> Obviously if SW crypto is used none of this is a concern, so that's >>> another factor to take into account in this decision logic of whether to >>> disconnect or not? >> >> I did not check the SW crypto code but I'm pretty sure that indeed works >> with the current code. > > Me too. > >> I assume the packets will only be decrypted after an RX aggregation is >> completed. That would filter out all packets send with the old key, >> since they simply can't be decrypted any more. > > Oh, good point, but that's true - reordering happens before software > decryption. > >> Shall we bypass stopping aggregation sessions if we are on SW crypto? >> We'll again lose the benefit that we prevent a broken remote STA to try >> a TX Agg. (I tend to still stop them, but do not have a real opinion >> here...) > > I don't really know. > >>> IOW - I'd rather get bugs that we now force a disconnect (if anyone even >>> notices), rather than potentially having a bug that causes unencrypted >>> packets to be sent. >> >> Any suggestions how to trick hostap/wpa_supplicant into dropping the >> connection? For me it looks like we can just report an error on key >> install and expect the wpa_supplicant/hostapd to handle that. > > I think easier would be to just disconnect ourselves? At least if we're > in managed mode... > I still have much to learn about 802.11, but so far I did not see way to directly disconnect a STA. (Maybe spoofing a "signal lost" event or something like that, but I fear complications by losing the sync with the remote STA.) Is there any call/signal you have in mind I could test? hostapd or wpa_supplicant are "ordering" mac80211 to install a new key and are implementing the state machine and are in a good position to handle the fallout... at least theoretically. >> Will try that for the next version of the patch, with the other >> discussed improvements: If driver is not signalling "PTK0 rekey support" >> we'll simply not accept key installs when we already have a old PTK key >> for the connection. > > Since I see you have a new patch - how did that work out? :) So far I've only tested it with iwlwifi client (dvm, so no AP) on my normal "desktop". When implementing replace_key it seems to work fine. (No OTA captures done, just connection tests with the AP running still using the previous patch.) Using a unpatched iwlwifi does not reconnect automatically as expected. Instead I get a pop up asking for the PSK. Entering it reconnects normally. Cancel the prompt disconnect till a manual reconnect. I suspect NetworkManager is handling the rekey like the initial key install and then assumes the PSK is wrong. Hardly surprising but also highly visible to the users. But then only to those using the now broken rekey... Using wpa_supplicant directly reconnects after ~15s. It also assumes the key is wrong and seems to rate limit the connection attempts. Here a log with wpa_supplicat running in the console and dmesg -wT output on top of that: wlp3s0: WPA: Failed to set PTK to the driver (alg=3 keylen=16 bssid=12:6f:3f:0e:33:3c) [Tue Jul 3 21:13:17 2018] wlp3s0: Driver is not supporting save PTK key replacement. Insecure rekey attempt for STA 12:6f:3f:0e:33:3c denied. [Tue Jul 3 21:13:17 2018] wlp3s0: deauthenticating from 12:6f:3f:0e:33:3c by local choice (Reason: 1=UNSPECIFIED) wlp3s0: CTRL-EVENT-DISCONNECTED bssid=12:6f:3f:0e:33:3c reason=1 locally_generated=1 wlp3s0: WPA: 4-Way Handshake failed - pre-shared key may be incorrect wlp3s0: CTRL-EVENT-SSID-TEMP-DISABLED id=0 ssid="mordor-g" auth_failures=1 duration=10 reason=WRONG_KEY wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD [Tue Jul 3 21:13:17 2018] wlp3s0: delba from 12:6f:3f:0e:33:3c (initiator) tid 0 reason code 37 [Tue Jul 3 21:13:17 2018] wlp3s0: Rx BA session stop requested for 12:6f:3f:0e:33:3c tid 0 initiator reason: 0 [Tue Jul 3 21:13:17 2018] wlp3s0: moving STA 12:6f:3f:0e:33:3c to state 3 [Tue Jul 3 21:13:17 2018] wlp3s0: moving STA 12:6f:3f:0e:33:3c to state 2 [Tue Jul 3 21:13:17 2018] wlp3s0: moving STA 12:6f:3f:0e:33:3c to state 1 [Tue Jul 3 21:13:17 2018] wlp3s0: Removed STA 12:6f:3f:0e:33:3c [Tue Jul 3 21:13:17 2018] wlp3s0: Destroyed STA 12:6f:3f:0e:33:3c wlp3s0: CTRL-EVENT-SSID-REENABLED id=0 ssid="mordor-g" wlp3s0: SME: Trying to authenticate with 12:6f:3f:0e:33:3c (SSID='mordor-g' freq=2432 MHz) wlp3s0: Trying to associate with 12:6f:3f:0e:33:3c (SSID='mordor-g' freq=2432 MHz) [Tue Jul 3 21:13:31 2018] wlp3s0: authenticate with 12:6f:3f:0e:33:3c [Tue Jul 3 21:13:31 2018] wlp3s0: Allocated STA 12:6f:3f:0e:33:3c [Tue Jul 3 21:13:31 2018] wlp3s0: Inserted STA 12:6f:3f:0e:33:3c [Tue Jul 3 21:13:31 2018] wlp3s0: send auth to 12:6f:3f:0e:33:3c (try 1/3) [Tue Jul 3 21:13:31 2018] wlp3s0: authenticated [Tue Jul 3 21:13:31 2018] wlp3s0: moving STA 12:6f:3f:0e:33:3c to state 2 [Tue Jul 3 21:13:31 2018] wlp3s0: associate with 12:6f:3f:0e:33:3c (try 1/3) wlp3s0: Associated with 12:6f:3f:0e:33:3c wlp3s0: CTRL-EVENT-SUBNET-STATUS-UPDATE status=0 wlp3s0: CTRL-EVENT-REGDOM-CHANGE init=COUNTRY_IE type=COUNTRY alpha2=DE [Tue Jul 3 21:13:31 2018] wlp3s0: RX AssocResp from 12:6f:3f:0e:33:3c (capab=0x431 status=0 aid=1) [Tue Jul 3 21:13:31 2018] wlp3s0: moving STA 12:6f:3f:0e:33:3c to state 3 [Tue Jul 3 21:13:31 2018] wlp3s0: WMM AC=0 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47 uapsd=0, downgraded=0 [Tue Jul 3 21:13:31 2018] wlp3s0: WMM AC=1 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94 uapsd=0, downgraded=0 [Tue Jul 3 21:13:31 2018] wlp3s0: WMM AC=2 acm=0 aifs=3 cWmin=15 cWmax=1023 txop=0 uapsd=0, downgraded=0 [Tue Jul 3 21:13:31 2018] wlp3s0: WMM AC=3 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0 uapsd=0, downgraded=0 [Tue Jul 3 21:13:31 2018] wlp3s0: associated wlp3s0: WPA: Key negotiation completed with 12:6f:3f:0e:33:3c [PTK=CCMP GTK=CCMP] wlp3s0: CTRL-EVENT-CONNECTED - Connection to 12:6f:3f:0e:33:3c completed [id=0 id_str=] [Tue Jul 3 21:13:31 2018] wlp3s0: moving STA 12:6f:3f:0e:33:3c to state 4 [Tue Jul 3 21:13:31 2018] wlp3s0: AddBA Req buf_size=64 for 12:6f:3f:0e:33:3c [Tue Jul 3 21:13:31 2018] wlp3s0: Rx A-MPDU request on 12:6f:3f:0e:33:3c tid 0 result 0 A test with the code on a AP is still pending. (I'll probably try that on the weekend.) Alexander
On Tue, 2018-07-03 at 21:54 +0200, Alexander Wetzel wrote: > > I think easier would be to just disconnect ourselves? At least if we're > > in managed mode... > > > > I still have much to learn about 802.11, but so far I did not see way to > directly disconnect a STA. (Maybe spoofing a "signal lost" event or > something like that, but I fear complications by losing the sync with > the remote STA.) Is there any call/signal you have in mind I could test? ieee80211_set_disassoc(), this can also send a frame out to indicate to the AP that we're disconnecting instead. > hostapd or wpa_supplicant are "ordering" mac80211 to install a new key > and are implementing the state machine and are in a good position to > handle the fallout... at least theoretically. Ideally it would even know beforehand that we don't want to handle the PTK rekeying, and then could reconnect instead of going through the handshake. > Instead I get a pop up asking for the PSK. Entering it reconnects > normally. Cancel the prompt disconnect till a manual reconnect. > I suspect NetworkManager is handling the rekey like the initial key > install and then assumes the PSK is wrong. Hardly surprising but also > highly visible to the users. That's pretty awkward. > But then only to those using the now broken rekey... Yeah, but you don't necessarily control that - i.e. client device and AP might have different owners :-) > Using wpa_supplicant directly reconnects after ~15s. > It also assumes the key is wrong and seems to rate limit the connection > attempts. Here a log with wpa_supplicat running in the console and dmesg > -wT output on top of that: So I think we're probably better off accepting the set_key but not actually using it, and instead disconnecting... even if that's awkward and should come with a big comment :-) johannes
>>> I think easier would be to just disconnect ourselves? At least if we're >>> in managed mode... >>> >> >> I still have much to learn about 802.11, but so far I did not see way to >> directly disconnect a STA. (Maybe spoofing a "signal lost" event or >> something like that, but I fear complications by losing the sync with >> the remote STA.) Is there any call/signal you have in mind I could test? > > ieee80211_set_disassoc(), this can also send a frame out to indicate to > the AP that we're disconnecting instead. > I'll try that, but will probably take another week. My main main work station got severe file system corruption, forcing me to reinstall it from scratch. I suspect it was something in the wireless testing kernel 4.18-rc1 (944ae08d4e71) related to either btrfs or the ssd disk but since I needed the system just started over and avoid to run 4.18 if I do not have a full backup... >> hostapd or wpa_supplicant are "ordering" mac80211 to install a new key >> and are implementing the state machine and are in a good position to >> handle the fallout... at least theoretically. > > Ideally it would even know beforehand that we don't want to handle the > PTK rekeying, and then could reconnect instead of going through the > handshake. > Don't see how we could do that in the kernel, all the relevant information is handled in the state machine. I guess an API extension telling hostap/supplicant if we can handle rekeys or not would tbe he only way to avoid that. >> Instead I get a pop up asking for the PSK. Entering it reconnects >> normally. Cancel the prompt disconnect till a manual reconnect. >> I suspect NetworkManager is handling the rekey like the initial key >> install and then assumes the PSK is wrong. Hardly surprising but also >> highly visible to the users. > > That's pretty awkward. > >> But then only to those using the now broken rekey... > > Yeah, but you don't necessarily control that - i.e. client device and AP > might have different owners :-) > >> Using wpa_supplicant directly reconnects after ~15s. >> It also assumes the key is wrong and seems to rate limit the connection >> attempts. Here a log with wpa_supplicat running in the console and dmesg >> -wT output on top of that: > > So I think we're probably better off accepting the set_key but not > actually using it, and instead disconnecting... even if that's awkward > and should come with a big comment :-) Instead of returning an error I'll change the code to accept the rekey but do nothing with it. (Basically delete the new key and keep the old active). And of course calling ieee80211_set_disassoc() prior to return "success". Let's see how the supplicant will react on a disassoc while doing a rekey... Alexander
Hi, > I'll try that, but will probably take another week. My main main work > station got severe file system corruption, forcing me to reinstall it > from scratch. I suspect it was something in the wireless testing kernel > 4.18-rc1 (944ae08d4e71) related to either btrfs or the ssd disk but > since I needed the system just started over and avoid to run 4.18 if I > do not have a full backup... Ouch. FWIW, it's possible to run inside a VM with PCI(e) devices outside, at least on some machines. > > > hostapd or wpa_supplicant are "ordering" mac80211 to install a new key > > > and are implementing the state machine and are in a good position to > > > handle the fallout... at least theoretically. > > > > Ideally it would even know beforehand that we don't want to handle the > > PTK rekeying, and then could reconnect instead of going through the > > handshake. > > > > Don't see how we could do that in the kernel, all the relevant > information is handled in the state machine. I guess an API extension > telling hostap/supplicant if we can handle rekeys or not would tbe he > only way to avoid that. Right. Not really much point for now I guess. > > So I think we're probably better off accepting the set_key but not > > actually using it, and instead disconnecting... even if that's awkward > > and should come with a big comment :-) > > Instead of returning an error I'll change the code to accept the rekey > but do nothing with it. (Basically delete the new key and keep the old > active). > And of course calling ieee80211_set_disassoc() prior to return "success". Right. Did you handle/consider modes other than BSS/P2P client btw? > Let's see how the supplicant will react on a disassoc while doing a rekey... Shouldn't matter to it, I'd think. johannes
Hi Alexander, >>> hostapd or wpa_supplicant are "ordering" mac80211 to install a new key >>> and are implementing the state machine and are in a good position to >>> handle the fallout... at least theoretically. >> >> Ideally it would even know beforehand that we don't want to handle the >> PTK rekeying, and then could reconnect instead of going through the >> handshake. >> > > Don't see how we could do that in the kernel, all the relevant > information is handled in the state machine. I guess an API extension > telling hostap/supplicant if we can handle rekeys or not would tbe he > only way to avoid that. > Can the kernel / driver provide some sort of hint to user space that PTK rekey isn’t supported? We could then have user space deauthenticate with a big warning about what/why this is happening and try to re-connect to the last used BSS. >> So I think we're probably better off accepting the set_key but not >> actually using it, and instead disconnecting... even if that's awkward >> and should come with a big comment :-) > > Instead of returning an error I'll change the code to accept the rekey > but do nothing with it. (Basically delete the new key and keep the old > active). > And of course calling ieee80211_set_disassoc() prior to return "success". > > Let's see how the supplicant will react on a disassoc while doing a rekey... > This sounds pretty awful actually. Now that wpa_s is not the only game in town, can we stop resorting to these tactics? Regards, -Denis
Hi, >> I'll try that, but will probably take another week. My main main work >> station got severe file system corruption, forcing me to reinstall it >> from scratch. I suspect it was something in the wireless testing kernel >> 4.18-rc1 (944ae08d4e71) related to either btrfs or the ssd disk but >> since I needed the system just started over and avoid to run 4.18 if I >> do not have a full backup... > > Ouch. FWIW, it's possible to run inside a VM with PCI(e) devices > outside, at least on some machines. > >>>> hostapd or wpa_supplicant are "ordering" mac80211 to install a new key >>>> and are implementing the state machine and are in a good position to >>>> handle the fallout... at least theoretically. >>> >>> Ideally it would even know beforehand that we don't want to handle the >>> PTK rekeying, and then could reconnect instead of going through the >>> handshake. >>> >> >> Don't see how we could do that in the kernel, all the relevant >> information is handled in the state machine. I guess an API extension >> telling hostap/supplicant if we can handle rekeys or not would tbe he >> only way to avoid that. > > Right. Not really much point for now I guess. > >>> So I think we're probably better off accepting the set_key but not >>> actually using it, and instead disconnecting... even if that's awkward >>> and should come with a big comment :-) >> >> Instead of returning an error I'll change the code to accept the rekey >> but do nothing with it. (Basically delete the new key and keep the old >> active). >> And of course calling ieee80211_set_disassoc() prior to return "success". > > Right. Did you handle/consider modes other than BSS/P2P client btw? I still have huge gaps in my wlan knowledge, I only drilled into the rekey issue so far and never used anything besides AP/STA... That said we only should have mesh and IBSS left to consider, right? (I think I remember some new mode allowing STAs to diretly talk to each other while beeing associated to an AP, but can't find it again and don't see how this could work with PTK keys without somehow making a mesh out of it.) So far I have avoided non-generic changes. The current patch should work with all modes, shouldn't it?. My initial impression of ieee80211_set_disassoc() was much the same, but you seem to imply it's not usable in some modes? That would again be awkward... As for IBSS: I have no idea how a mesh would handle rekeys. If it can but won't work with ieee80211_set_disassoc() it will be quite some time till I've can propose a new patch. I normally only have a few hours per week for the forseeable future, and some weeks not even those... On the plus side i got my hands on a AP using ath10k and can look at that from yet another angle. I'll devinitelly continue here, but I suspect I'll be slow with patches for a while... > >> Let's see how the supplicant will react on a disassoc while doing a rekey... > > Shouldn't matter to it, I'd think. Alexander
Hi Denis, > Hi Alexander, > >>>> hostapd or wpa_supplicant are "ordering" mac80211 to install a new key >>>> and are implementing the state machine and are in a good position to >>>> handle the fallout... at least theoretically. >>> >>> Ideally it would even know beforehand that we don't want to handle the >>> PTK rekeying, and then could reconnect instead of going through the >>> handshake. >>> >> >> Don't see how we could do that in the kernel, all the relevant >> information is handled in the state machine. I guess an API extension >> telling hostap/supplicant if we can handle rekeys or not would tbe he >> only way to avoid that. >> > > Can the kernel / driver provide some sort of hint to user space that PTK > rekey isn’t supported? We could then have user space deauthenticate > with a big warning about what/why this is happening and try to > re-connect to the last used BSS. > Sure. In fact the latest patch is already doing that by returning an error when set_key is called for PTK and it's not an initial call. Tests with wpa_supplicant shows that this is is then handled like the initial key set is failing. Networkmanager prompts for the password and wpa_supplicant running without seems to blacklist a reconnect for 15s. I kind of liked that solution, but with existing implematations out this is indeed awkward to find a "correct" solution. The main problem for me currently is to find a correct and still acceptable solution. This turned from "let's fix this nasty wlan connection freezes" to a projet spanning the complete wlan stack: From hardware up to and including the userspace... It's fun to learn how that interacts (if not very fast), I'm stuggling finding the best way forward here. Whatever we do has undesired consequences. Maybe I'm missing something, but here the high level options we have in my opinion: 1) Keep it as it is and solve that in a indefinite future when we and the world implement the ieee802.11 2012 addition, to use key 0+1 for PTK and 2+3 for GTK - rekey has a extrem high probability of freezing connections and leaking a few clear text packets for years (decades?) to come + The issue is fixed at the core 2) Make it worse, like some (most) Windows systems/cards seem to handle it by encrypting EAPOL #4 with the NEW key, breaking the handshake and forcing a reconnect. - break something more to fix a problem sounds like a insane approach + This seems to be quite common and therefore well "tested" (based on my very limited data on that) 3) Fix what we can in mac80211 but keep the API stable - Without driver actions still many drivers will be "undefined" and even if they are not freezing leak packets + This will reduce the problems to a fraction of what is is today with only a mac80211 update 4) Redesign the mac80211 rekey handling and interaction with drivers to only rekey if it is save and decline when not. + We only have to touch the kernel - any supplicant (whatever runs the wpa state machine) may get errors where the programmes did not expect them, leading to unexpected side effects. 5) The full-stack solution: Update the API for the userpace + We do not have to "trick" the wpa state machine to disconnect, the programmers of it have to code it. - Well, it must be suppurted from the wpa state machine. If not we still have to handle the rekey somehow or we accept freezes/cleartext leaks... The last two solutions will also need some "fallback" when a secure rekey is not possible and/or the user is runing an old state machine not knowing about the new way... >>> So I think we're probably better off accepting the set_key but not >>> actually using it, and instead disconnecting... even if that's awkward >>> and should come with a big comment :-) >> >> Instead of returning an error I'll change the code to accept the rekey >> but do nothing with it. (Basically delete the new key and keep the old >> active). >> And of course calling ieee80211_set_disassoc() prior to return "success". >> >> Let's see how the supplicant will react on a disassoc while doing a >> rekey... >> > > This sounds pretty awful actually. Now that wpa_s is not the only game > in town, can we stop resorting to these tactics? Nothing of it is wpa_supplicat/hostap specifiy. Only my current "test" environment is using it, simply due to the fact that I tracked the issue down in that environment. Everything besides ath9k as an AP running hostapd and a iwldvm card running wpa_supplicant is mostly untested. And even there I have some areas marked for follow up after we find a solution acceptable for the kernel... Do you have any other software you think I should add to my prelimitary tests? If possible I'll happy to extend the test of the patches with those. Alexander
Hi Alexander, On 07/11/2018 12:08 PM, Alexander Wetzel wrote: > Hi Denis, > >> Hi Alexander, >> >>>>> hostapd or wpa_supplicant are "ordering" mac80211 to install a new key >>>>> and are implementing the state machine and are in a good position to >>>>> handle the fallout... at least theoretically. >>>> >>>> Ideally it would even know beforehand that we don't want to handle the >>>> PTK rekeying, and then could reconnect instead of going through the >>>> handshake. >>>> >>> >>> Don't see how we could do that in the kernel, all the relevant >>> information is handled in the state machine. I guess an API extension >>> telling hostap/supplicant if we can handle rekeys or not would tbe he >>> only way to avoid that. >>> >> >> Can the kernel / driver provide some sort of hint to user space that PTK >> rekey isn’t supported? We could then have user space deauthenticate >> with a big warning about what/why this is happening and try to >> re-connect to the last used BSS. >> > > Sure. In fact the latest patch is already doing that by returning an > error when set_key is called for PTK and it's not an initial call. > Tests with wpa_supplicant shows that this is is then handled like the > initial key set is failing. Networkmanager prompts for the password and > wpa_supplicant running without seems to blacklist a reconnect for 15s. Ideally we shouldn't even get this far. We really need some kind of capability bit on the phy telling userspace whether PTK rekey is supported or not. Then userspace can take proper action based on this information. E.g. if PTK rekey isn't safe, then we can simply issue a CMD_DISCONNECT and re-connect to the last BSS. The kernel doesn't need to play any 'tricks'. The fact that current userspace implementations are broken is regrettable and needs to be fixed. > > I kind of liked that solution, but with existing implematations out this > is indeed awkward to find a "correct" solution. > > The main problem for me currently is to find a correct and still > acceptable solution. This turned from "let's fix this nasty wlan > connection freezes" to a projet spanning the complete wlan stack: From > hardware up to and including the userspace... Right. The problem is that this PTK rekey likely 'works' (for some definition thereof) in a vast majority of cases, e.g. the link isn't broken, so the user doesn't notice. So, if the kernel starts to unilaterally issue disconnects, you will have a lot of grumbling users. Just to clarify, I'm not arguing against this necessarily. I can see why issuing a disconnect is a good idea for many reasons (e.g. security, etc.) But, I would expect a lot of user backlash if this is done, and given that this has been an issue for many years, I wonder if its the right way of handling this? > > It's fun to learn how that interacts (if not very fast), I'm stuggling > finding the best way forward here. Whatever we do has undesired > consequences. > Maybe I'm missing something, but here the high level options we have in > my opinion: > > 1) Keep it as it is and solve that in a indefinite future when we and > the world implement the ieee802.11 2012 addition, to use key 0+1 for PTK > and 2+3 for GTK > - rekey has a extrem high probability of freezing connections and > leaking a few clear text packets for years (decades?) to come > + The issue is fixed at the core It would seem to me that 0+1 rekey is a separate issue that needs to be supported in both kernel and userspace anyway. > > 2) Make it worse, like some (most) Windows systems/cards seem to handle > it by encrypting EAPOL #4 with the NEW key, breaking the handshake and > forcing a reconnect. > - break something more to fix a problem sounds like a insane approach > + This seems to be quite common and therefore well "tested" (based on my > very limited data on that) This seems awful. And then if you're unlucky someone will come in and tell you that the kernel has to maintain this 'legacy' behavior forever. So things can't ever be fixed. Plus, as I already mentioned above, some users 'think' that PTK rekey already works just fine. > > 3) Fix what we can in mac80211 but keep the API stable > - Without driver actions still many drivers will be "undefined" and even > if they are not freezing leak packets > + This will reduce the problems to a fraction of what is is today with > only a mac80211 update > > 4) Redesign the mac80211 rekey handling and interaction with drivers to > only rekey if it is save and decline when not. > + We only have to touch the kernel > - any supplicant (whatever runs the wpa state machine) may get errors > where the programmes did not expect them, leading to unexpected side > effects. > > 5) The full-stack solution: Update the API for the userpace > + We do not have to "trick" the wpa state machine to disconnect, the > programmers of it have to code it. > - Well, it must be suppurted from the wpa state machine. If not we still > have to handle the rekey somehow or we accept freezes/cleartext leaks... > > The last two solutions will also need some "fallback" when a secure > rekey is not possible and/or the user is runing an old state machine not > knowing about the new way... > My vote is for something along the lines of 5. I realize there are no good solutions here, but this really should be fixed by a combination of userspace and kernel working properly together. If this isn't possible, then perhaps the whole rekey should be done in the kernel and userspace should be given no chance to make a bad decision. > >>>> So I think we're probably better off accepting the set_key but not >>>> actually using it, and instead disconnecting... even if that's awkward >>>> and should come with a big comment :-) >>> >>> Instead of returning an error I'll change the code to accept the rekey >>> but do nothing with it. (Basically delete the new key and keep the old >>> active). >>> And of course calling ieee80211_set_disassoc() prior to return "success". >>> >>> Let's see how the supplicant will react on a disassoc while doing a >>> rekey... >>> >> >> This sounds pretty awful actually. Now that wpa_s is not the only game >> in town, can we stop resorting to these tactics? > > Nothing of it is wpa_supplicat/hostap specifiy. Only my current "test" > environment is using it, simply due to the fact that I tracked the issue > down in that environment. > Everything besides ath9k as an AP running hostapd and a iwldvm card > running wpa_supplicant is mostly untested. And even there I have some > areas marked for follow up after we find a solution acceptable for the > kernel... > > Do you have any other software you think I should add to my prelimitary > tests? If possible I'll happy to extend the test of the patches with those. > You can try: https://git.kernel.org/pub/scm/network/wireless/iwd.git/ We would be happy to implement whatever 'proper' userspace behavior / kernel api that this discussion leads to. Regards, -Denis
Hi, >>> So I think we're probably better off accepting the set_key but not >>> actually using it, and instead disconnecting... even if that's awkward >>> and should come with a big comment :-) >> >> Instead of returning an error I'll change the code to accept the rekey >> but do nothing with it. (Basically delete the new key and keep the old >> active). >> And of course calling ieee80211_set_disassoc() prior to return "success". > > Right. Did you handle/consider modes other than BSS/P2P client btw? I've tested that on a client only and it's already not working. Problem is, that with ieee80211_set_disassoc() we just dump the association in the kernel and are not informing the userspace, so the state machine is stuck in "connected" but the STA is unable to communicate. I also tried ieee80211_mgd_deauth(): While better this is basically the same behaviour as returning an error on key replace. By setting the error code to inactivity at least wpa_supplicant was actually starting to reconnect, unfortunatelly set_key then failes since we purged the assosication in the kernel. (Or that's my best interpretation from the logs). Networkmanager then again prompted for the password... I started experimenting with just signals to the userspace, but then paused... Trying to control the state machine with spoofed errors trying to trigger an "desireable" action can't be the way forward, can it? Even if we get that working changes or different implementations in userspace may well break it. I now think we only have two reasonable ways forward: 1) The user friendly one: If the userspace does not know about the new API just print a error message and do the insecure key replace. With potential clear text packet leaks and connection freezes. 2) The secure way: Report an error on key install and accept that users will encounter new problems when they are using rekeys with the old API with "old" userspace software. Since we have this issue in the kernel at least as long as we have mac80211 I would vote for 1) here. Fix mac80211 and add a new way for the userspace to to securely replace the keys and detect when this is not possible. Then get the userspace software updated to act accordingly, finally preventing the clear text packet leaks. After some time we can then decide to reject insecure rekeys, burning only those who use kernels much newer than the userspace. Does this sound like an reasonable approch or should I go back figuring out how to trick the userspace to reconnect without user notification/intervention? Alexander
diff --git a/net/mac80211/key.c b/net/mac80211/key.c index ee0d0cc8dc3b..bb498f2b6c44 100644 --- a/net/mac80211/key.c +++ b/net/mac80211/key.c @@ -248,6 +248,7 @@ static void ieee80211_key_disable_hw_accel(struct ieee80211_key *key) (key->conf.flags & IEEE80211_KEY_FLAG_RESERVE_TAILROOM))) increment_tailroom_need_count(sdata); + key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; ret = drv_set_key(key->local, DISABLE_KEY, sdata, sta ? &sta->sta : NULL, &key->conf); @@ -257,7 +258,26 @@ static void ieee80211_key_disable_hw_accel(struct ieee80211_key *key) key->conf.keyidx, sta ? sta->sta.addr : bcast_addr, ret); - key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; +} + +static void ieee80211_key_retire(struct ieee80211_key *key) +{ + struct ieee80211_local *local = key->local; + struct sta_info *sta = key->sta; + + assert_key_lock(key->local); + + /* Stop TX till we are on the new key */ + key->flags |= KEY_FLAG_TAINTED; + ieee80211_clear_fast_xmit(sta); + + if (ieee80211_hw_check(&local->hw, AMPDU_AGGREGATION)) { + set_sta_flag(sta, WLAN_STA_BLOCK_BA); + ieee80211_sta_tear_down_BA_sessions( + sta, AGG_STOP_LOCAL_REQUEST); + } + ieee80211_flush_queues(key->local, key->sdata, false); + ieee80211_key_disable_hw_accel(key); } static void __ieee80211_set_default_key(struct ieee80211_sub_if_data *sdata, @@ -316,34 +336,45 @@ void ieee80211_set_default_mgmt_key(struct ieee80211_sub_if_data *sdata, } -static void ieee80211_key_replace(struct ieee80211_sub_if_data *sdata, +static int ieee80211_key_replace(struct ieee80211_sub_if_data *sdata, struct sta_info *sta, bool pairwise, struct ieee80211_key *old, struct ieee80211_key *new) { int idx; + int ret; bool defunikey, defmultikey, defmgmtkey; /* caller must provide at least one old/new */ if (WARN_ON(!new && !old)) - return; + return 0; if (new) list_add_tail_rcu(&new->list, &sdata->key_list); WARN_ON(new && old && new->conf.keyidx != old->conf.keyidx); - if (old) + if (old) { idx = old->conf.keyidx; - else + if(new && sta && pairwise) + ieee80211_key_retire(old); + } else { idx = new->conf.keyidx; + } + + if (new && !new->local->wowlan) { + ret = ieee80211_key_enable_hw_accel(new); + if (ret) + return ret; + } if (sta) { if (pairwise) { rcu_assign_pointer(sta->ptk[idx], new); sta->ptk_idx = idx; ieee80211_check_fast_xmit(sta); + clear_sta_flag(sta, WLAN_STA_BLOCK_BA); } else { rcu_assign_pointer(sta->gtk[idx], new); } @@ -380,6 +411,7 @@ static void ieee80211_key_replace(struct ieee80211_sub_if_data *sdata, if (old) list_del_rcu(&old->list); + return 0; } struct ieee80211_key * @@ -654,7 +686,6 @@ int ieee80211_key_link(struct ieee80211_key *key, struct ieee80211_sub_if_data *sdata, struct sta_info *sta) { - struct ieee80211_local *local = sdata->local; struct ieee80211_key *old_key; int idx, ret; bool pairwise; @@ -687,18 +718,13 @@ int ieee80211_key_link(struct ieee80211_key *key, increment_tailroom_need_count(sdata); - ieee80211_key_replace(sdata, sta, pairwise, old_key, key); + ret = ieee80211_key_replace(sdata, sta, pairwise, old_key, key); ieee80211_key_destroy(old_key, true); ieee80211_debugfs_key_add(key); - if (!local->wowlan) { - ret = ieee80211_key_enable_hw_accel(key); - if (ret) - ieee80211_key_free(key, true); - } else { - ret = 0; - } + if (ret) + ieee80211_key_free(key, true); out: mutex_unlock(&sdata->local->key_mtx);
Rekeying a pairwise key with encryption offload and only keyid 0 has two potential races which can freeze the wlan conection till rekeyed again: 1) For incomming packets: If the local STA installs the key prior to the remote STA we still have the old key active in the hardware for a short time after mac80211 switched to the new key. The card can still hand over packets decoded with the old key to mac80211, bumping the new PN (IV) value to an incorrect high number and tricking the local replay detection to drop all packets really sent with the new key. 2) For outgoing packets: If mac80211 is providing the PN (IV) and hands over the cleartext packets for encryption to the hardware immediately prior to a key change the driver/card may process the queued packets after switching to the new key. This will immediatelly bump the PN (IV) value on the remote STA to an incorrect high number, also freezing the connection. Both issues can be prevented by first replacing the key in the HW and makeing sure no aggregation sessions are running during the rekey. Signed-off-by: Alexander Wetzel <alexander.wetzel@web.de> --- net/mac80211/key.c | 54 ++++++++++++++++++++++++++++++++++------------ 1 file changed, 40 insertions(+), 14 deletions(-)