Message ID | 20180814104255.4183-4-alexander@wetzel-home.de (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Johannes Berg |
Headers | show |
Series | Fix PTK rekey freezes and cleartext leaks | expand |
On Tue, 2018-08-14 at 12:42 +0200, Alexander Wetzel wrote: > > + /* PTK only using key ID 0 needs special handling on rekey */ > + if (new_key && sta && ptk0rekey) { > + local = old_key->local; > + sdata = old_key->sdata; > + > + /* Stop TX till we are on the new key */ > + old_key->flags |= KEY_FLAG_TAINTED; > + ieee80211_clear_fast_xmit(sta); > + > + /* Aggregation sessions during rekey are complicated due to > + * the reorder buffer. Side step that by blocking aggregation > + * and tear down running connections. > + */ > + if (ieee80211_hw_check(&local->hw, AMPDU_AGGREGATION)) { > + set_sta_flag(sta, WLAN_STA_BLOCK_BA); > + ieee80211_sta_tear_down_BA_sessions(sta, > + AGG_STOP_LOCAL_REQUEST); > + } > + > + if (new_key->local->ops->replace_key) { > + ret = drv_replace_key(old_key->local, sdata, > + &sta->sta, &old_key->conf, > + &new_key->conf); > + if (!ret) > + new_key->flags |= KEY_FLAG_UPLOADED_TO_HARDWARE; > + else > + sdata_err(sdata, > + "failed to replace key (%d) for " \ > + "STA (%pM) in hardware: ret=(%d)\n", > + old_key->conf.keyidx, > + sta->sta.addr, > + ret); > + > + old_key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; > + } else { > + sdata_info(sdata, > + "Userspace requested a PTK rekey for STA " \ > + "%pM while feature not supported! " \ > + "This may leak clear text packets or " \ > + "freeze the connection.", > + sta->sta.addr); This seems a bit weird - we know a likely dangerous thing is happening and only print an info message? Why not just prevent this in the first place? johannes
Am 28.08.18 um 10:48 schrieb Johannes Berg: > On Tue, 2018-08-14 at 12:42 +0200, Alexander Wetzel wrote: >> >> + /* PTK only using key ID 0 needs special handling on rekey */ >> + if (new_key && sta && ptk0rekey) { >> + local = old_key->local; >> + sdata = old_key->sdata; >> + >> + /* Stop TX till we are on the new key */ >> + old_key->flags |= KEY_FLAG_TAINTED; >> + ieee80211_clear_fast_xmit(sta); >> + >> + /* Aggregation sessions during rekey are complicated due to >> + * the reorder buffer. Side step that by blocking aggregation >> + * and tear down running connections. >> + */ >> + if (ieee80211_hw_check(&local->hw, AMPDU_AGGREGATION)) { >> + set_sta_flag(sta, WLAN_STA_BLOCK_BA); >> + ieee80211_sta_tear_down_BA_sessions(sta, >> + AGG_STOP_LOCAL_REQUEST); >> + } >> + >> + if (new_key->local->ops->replace_key) { >> + ret = drv_replace_key(old_key->local, sdata, >> + &sta->sta, &old_key->conf, >> + &new_key->conf); >> + if (!ret) >> + new_key->flags |= KEY_FLAG_UPLOADED_TO_HARDWARE; >> + else >> + sdata_err(sdata, >> + "failed to replace key (%d) for " \ >> + "STA (%pM) in hardware: ret=(%d)\n", >> + old_key->conf.keyidx, >> + sta->sta.addr, >> + ret); >> + >> + old_key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; >> + } else { >> + sdata_info(sdata, >> + "Userspace requested a PTK rekey for STA " \ >> + "%pM while feature not supported! " \ >> + "This may leak clear text packets or " \ >> + "freeze the connection.", >> + sta->sta.addr); > > This seems a bit weird - we know a likely dangerous thing is happening > and only print an info message? Why not just prevent this in the first > place? The next version will upgrade that to a warning. Rejecting it outright will break things for users with card/drivers where rekey currently is working. There is no error error message for "please re-associate as quick as you can" and tricking userspace to do that across versions and implementations would need code at I do not see belonging into the kernel. Here what I wrote in the cover letter of the v4 version of the patch about this decision: > I do not see an acceptable way from the kernel to trigger a fast > reconnect. wpa_supplicant e.g. has some special code handling errors > during a 4-way-handshake differently. I assume we would have to add > code detecting if we are running as AP, Station Infrastructure, Ad-Hoc > or Mesh and then find and spoof an error which allows the userspace to > reconnect fast. For multiple different userspace implementations and > the different versions of them. > And we'll have that mess in the kernel for basically forever... > Seems to be a clear no go, correct? > > So this patch (series) is giving up on a quick fix and will allow > broken rekeys to continue. It is instead providing an updated API for > both the userspace and the drivers to also do it correctly. Users > running a userspace not aware of the new API will get some > improvements but may > still leak cleartext packets and have connection freezes till we get > an updated userspace rolled out. On the plus side users running > updated userspace won't be able to rekey connections any longer, > avoiding the issues even with unpatched kernels. Of course I may miss something and there may be a good way to get that working anyhow. But for me it looks like we either have to accept something which looks like a regression to users or accept that some drivers may not be fixed with this patch alone and will need either an updated userspace or drivers. Alexander
On Tue, 2018-08-28 at 18:27 +0200, Alexander Wetzel wrote: > > This seems a bit weird - we know a likely dangerous thing is happening > > and only print an info message? Why not just prevent this in the first > > place? > > The next version will upgrade that to a warning. Maybe make it rate limited, and certainly not a WARN_ON() though. > Rejecting it outright will break things for users with card/drivers > where rekey currently is working. There is no error error message for > "please re-associate as quick as you can" and tricking userspace to do > that across versions and implementations would need code at I do not see > belonging into the kernel. Here what I wrote in the cover letter of the > v4 version of the patch about this decision: > > > I do not see an acceptable way from the kernel to trigger a fast > > reconnect. wpa_supplicant e.g. has some special code handling errors > > during a 4-way-handshake differently. I assume we would have to add > > code detecting if we are running as AP, Station Infrastructure, Ad-Hoc > > or Mesh and then find and spoof an error which allows the userspace to > > reconnect fast. For multiple different userspace implementations and > > the different versions of them. > > And we'll have that mess in the kernel for basically forever... > > Seems to be a clear no go, correct? > > > > So this patch (series) is giving up on a quick fix and will allow > > broken rekeys to continue. It is instead providing an updated API for > > both the userspace and the drivers to also do it correctly. Users > > running a userspace not aware of the new API will get some > > improvements but may > > still leak cleartext packets and have connection freezes till we get > > an updated userspace rolled out. On the plus side users running > > updated userspace won't be able to rekey connections any longer, > > avoiding the issues even with unpatched kernels. > > Of course I may miss something and there may be a good way to get that > working anyhow. But for me it looks like we either have to accept > something which looks like a regression to users or accept that some > drivers may not be fixed with this patch alone and will need either an > updated userspace or drivers. Ok, fair enough. I may be willing to accept something as a regression, but at the same time perhaps it doesn't really belong into this patch anyway, so we can always do it as a later one. johannes
diff --git a/net/mac80211/key.c b/net/mac80211/key.c index c054ac85793c..c77f32fe0473 100644 --- a/net/mac80211/key.c +++ b/net/mac80211/key.c @@ -248,6 +248,7 @@ static void ieee80211_key_disable_hw_accel(struct ieee80211_key *key) (key->conf.flags & IEEE80211_KEY_FLAG_RESERVE_TAILROOM))) increment_tailroom_need_count(sdata); + key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; ret = drv_set_key(key->local, DISABLE_KEY, sdata, sta ? &sta->sta : NULL, &key->conf); @@ -256,8 +257,86 @@ static void ieee80211_key_disable_hw_accel(struct ieee80211_key *key) "failed to remove key (%d, %pM) from hardware (%d)\n", key->conf.keyidx, sta ? sta->sta.addr : bcast_addr, ret); +} - key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; +static int ieee80211_hw_key_replace(struct ieee80211_key *old_key, + struct ieee80211_key *new_key, + bool ptk0rekey) +{ + struct ieee80211_sub_if_data *sdata; + struct ieee80211_local *local; + struct sta_info *sta; + int ret; + + /* Aggregation sessions are also ok when running on SW crypto. + * A broken remote STA may have issues not observed with HW + * crypto, though. Nevertheless bypass the complete procedure. + */ + if (!(old_key->flags & KEY_FLAG_UPLOADED_TO_HARDWARE)) + return 0; + + assert_key_lock(old_key->local); + sta = old_key->sta; + + /* PTK only using key ID 0 needs special handling on rekey */ + if (new_key && sta && ptk0rekey) { + local = old_key->local; + sdata = old_key->sdata; + + /* Stop TX till we are on the new key */ + old_key->flags |= KEY_FLAG_TAINTED; + ieee80211_clear_fast_xmit(sta); + + /* Aggregation sessions during rekey are complicated due to + * the reorder buffer. Side step that by blocking aggregation + * and tear down running connections. + */ + if (ieee80211_hw_check(&local->hw, AMPDU_AGGREGATION)) { + set_sta_flag(sta, WLAN_STA_BLOCK_BA); + ieee80211_sta_tear_down_BA_sessions(sta, + AGG_STOP_LOCAL_REQUEST); + } + + if (new_key->local->ops->replace_key) { + ret = drv_replace_key(old_key->local, sdata, + &sta->sta, &old_key->conf, + &new_key->conf); + if (!ret) + new_key->flags |= KEY_FLAG_UPLOADED_TO_HARDWARE; + else + sdata_err(sdata, + "failed to replace key (%d) for " \ + "STA (%pM) in hardware: ret=(%d)\n", + old_key->conf.keyidx, + sta->sta.addr, + ret); + + old_key->flags &= ~KEY_FLAG_UPLOADED_TO_HARDWARE; + } else { + sdata_info(sdata, + "Userspace requested a PTK rekey for STA " \ + "%pM while feature not supported! " \ + "This may leak clear text packets or " \ + "freeze the connection.", + sta->sta.addr); + /* Flushing the driver queues *may* prevent cleartext + * leaks and freezes if supported by the driver. + */ + ieee80211_flush_queues(old_key->local, + old_key->sdata, + false); + ieee80211_key_disable_hw_accel(old_key); + ret = ieee80211_key_enable_hw_accel(new_key); + } + } else { + ieee80211_key_disable_hw_accel(old_key); + if (new_key) + ret = ieee80211_key_enable_hw_accel(new_key); + else + ret = 0; + } + + return ret; } static void __ieee80211_set_default_key(struct ieee80211_sub_if_data *sdata, @@ -316,38 +395,56 @@ void ieee80211_set_default_mgmt_key(struct ieee80211_sub_if_data *sdata, } -static void ieee80211_key_replace(struct ieee80211_sub_if_data *sdata, +static int ieee80211_key_replace(struct ieee80211_sub_if_data *sdata, struct sta_info *sta, bool pairwise, struct ieee80211_key *old, struct ieee80211_key *new) { int idx; + int ret; bool defunikey, defmultikey, defmgmtkey; /* caller must provide at least one old/new */ if (WARN_ON(!new && !old)) - return; + return 0; if (new) list_add_tail_rcu(&new->list, &sdata->key_list); WARN_ON(new && old && new->conf.keyidx != old->conf.keyidx); - if (old) + if (old) { idx = old->conf.keyidx; - else + /* TODO: proper implement and test "Extended Key ID for + * Individually Addressed Frames" from IEEE 802.11-2016. + * Till then always assume only key ID 0 is used for + * pairwise keys.*/ + ret = ieee80211_hw_key_replace(old, new, pairwise); + } else { idx = new->conf.keyidx; + if (new && !new->local->wowlan) + ret = ieee80211_key_enable_hw_accel(new); + else + ret = 0; + } + + if (ret) + return ret; if (sta) { if (pairwise) { rcu_assign_pointer(sta->ptk[idx], new); sta->ptk_idx = idx; - ieee80211_check_fast_xmit(sta); + if (new) { + clear_sta_flag(sta, WLAN_STA_BLOCK_BA); + ieee80211_check_fast_xmit(sta); + } } else { rcu_assign_pointer(sta->gtk[idx], new); } - ieee80211_check_fast_rx(sta); + if (new) + ieee80211_check_fast_rx(sta); } else { defunikey = old && old == key_mtx_dereference(sdata->local, @@ -380,6 +477,8 @@ static void ieee80211_key_replace(struct ieee80211_sub_if_data *sdata, if (old) list_del_rcu(&old->list); + + return 0; } struct ieee80211_key * @@ -575,9 +674,6 @@ static void ieee80211_key_free_common(struct ieee80211_key *key) static void __ieee80211_key_destroy(struct ieee80211_key *key, bool delay_tailroom) { - if (key->local) - ieee80211_key_disable_hw_accel(key); - if (key->local) { struct ieee80211_sub_if_data *sdata = key->sdata; @@ -654,7 +750,6 @@ int ieee80211_key_link(struct ieee80211_key *key, struct ieee80211_sub_if_data *sdata, struct sta_info *sta) { - struct ieee80211_local *local = sdata->local; struct ieee80211_key *old_key; int idx = key->conf.keyidx; bool pairwise = key->conf.flags & IEEE80211_KEY_FLAG_PAIRWISE; @@ -691,17 +786,13 @@ int ieee80211_key_link(struct ieee80211_key *key, increment_tailroom_need_count(sdata); - ieee80211_key_replace(sdata, sta, pairwise, old_key, key); - ieee80211_key_destroy(old_key, delay_tailroom); - - ieee80211_debugfs_key_add(key); + ret = ieee80211_key_replace(sdata, sta, pairwise, old_key, key); - if (!local->wowlan) { - ret = ieee80211_key_enable_hw_accel(key); - if (ret) - ieee80211_key_free(key, delay_tailroom); + if (!ret) { + ieee80211_debugfs_key_add(key); + ieee80211_key_destroy(old_key, delay_tailroom); } else { - ret = 0; + ieee80211_key_free(key, delay_tailroom); } out: diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index cd332e3e1134..5c79b3c2a7e1 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -2951,6 +2951,10 @@ void ieee80211_check_fast_xmit(struct sta_info *sta) if (!(build.key->flags & KEY_FLAG_UPLOADED_TO_HARDWARE)) goto out; + /* Key is being removed */ + if (build.key->flags & KEY_FLAG_TAINTED) + goto out; + switch (build.key->conf.cipher) { case WLAN_CIPHER_SUITE_CCMP: case WLAN_CIPHER_SUITE_CCMP_256:
Rekeying PTK keys without "Extended Key ID for Individually Addressed Frames" did use a procedure not suitable to replace in-use keys and could caused the following issues: 1) Freeze caused by incoming packets: If the local STA installed the key prior to the remote STA we still had the old key active in the hardware with mac80211 already operating on the new key. Therefore there was a window where the card could still hand over packets decoded with the old key to mac80211, bumping the new PN (IV) value to an incorrect high number and tricking the local replay detection to drop all packets sent with the new key. 2) Freeze caused by outgoing packets: If mac80211 was providing the PN (IV) and handed over a cleartext packets for encryption to the hardware prior to a key change the driver/card could have processed the queued packets after switching to the new key. This immediately bumped the PN value on the remote STA to an incorrect high number, which then discarded all packets using correct PNs we sent after that. 3) Freeze caused by RX aggregation reorder buffer: An aggregation session started with the old key and ending after the switch to the new key also bumped the PN to an incorrect high number, freezing the connection quite similar to 1). 4) Freeze caused by repeating lost frames in an aggregation session: A driver could repeat a lost frame and encrypt it with the new key while in a TX aggregation session without updating the PN for the new key. This also could freeze connections similar to 2). 5) Clear text leak: Removing encryption offload from the card cleared the encryption offload flag only after the card had removed the key. The driver/card could therefore get unencrypted packets from mac80211 while no longer be instructed to encrypt them. To prevent those issues the key install logic has been overhauled. We now stop queuing packets to the driver while replacing the key, replace the key first in the HW and stop/block new aggregation sessions during the rekey. This will only work correctly with drivers implementing replace_key or with a userspace honoring NL80211_EXT_FEATURE_ATOMIC_KEY_REPLACE. If the driver is not implementing replace_key all three issues can still occur and the userspace must not try to rekey the PTK key. If asked to do that anyhow mac80211 will fall back to an best-we-can-do approach which should avoid the issues for at least some drivers and inform the user about the potential issues via a kernel message. The core of the fix changes the key install procedure from: - atomic switch over to the new key in mac80211 - remove the old key in the HW (stops encryption offloading, fall back to SW encryption with a potential clear text packet leak in between) - delete the inactive old key in mac80211 - enable HW for encryption offloading (ending software encryption) to: - if it's a PTK mark the old key as tainted to drop TX packets with the outgoing key - replace the key in HW with the new one, using the new driver callback "replace_key" if it's availabe or by simulating it with existing calls if not - atomic switch over to the new key in mac80211 (allow TX with the key again) - delete the inactive old key in mac80211 The logic behind the updated key install sequence: With the new sequence the HW will be unable to decode packets encrypted with the old key prior to switching to the new key in mac80211. Locking down TX during the rekey makes sure that there are no outgoing packets while the driver and card are switching to the new key. The driver is allowed to hand over packets decrypted with either the new or the old key till "replace_key" returns. But all packets queued prior to calling the callback must be either still be send out encrypted with the old key or be dropped. Even with that a RX aggregation session started prior to the rekey and completed after can still dump frames received with the old key at mac80211 after it switched over to the new key. This is side stepped by stopping all aggregation sessions when we replace a PTK key and are using key offloading. (The only alternative seems be to discard all frames of the aggregation session running during rekey. Due to retransmits and reorder we can't differenciate between packets send with the old or the new key without deploying extreme efforts.) Stopping TX aggregation sessions - instead of only RX - also avoides the need to get the PNs (IVs) updated in frames prepared for the old key and retransmitted after the switch to the new key. And it improves the combatibility when the remote STA is not handling rekeys as it should. When using SW crypto aggregation sessions are not stopped. Mac80211 won't be able to decode the dangerous packets and discard them. At least as long as the remote STA is not also messing up and indeed updating the PN. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> --- net/mac80211/key.c | 131 ++++++++++++++++++++++++++++++++++++++------- net/mac80211/tx.c | 4 ++ 2 files changed, 115 insertions(+), 20 deletions(-)