diff mbox series

iwlwifi: add NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 support

Message ID 20200918171301.6942-1-alexander@wetzel-home.de (mailing list archive)
State Rejected
Delegated to: Luca Coelho
Headers show
Series iwlwifi: add NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 support | expand

Commit Message

Alexander Wetzel Sept. 18, 2020, 5:13 p.m. UTC
Avoid a not needed warning and queue flush from mac80211 when deleting
PTK keys.

The flush is only needed when queued MPDUs depend on the key table
instead carrying the needed key material directly in the MPDU info.

Set NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 and flush the queues only when
needed.

Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
---
 .../net/wireless/intel/iwlwifi/dvm/mac80211.c  |  1 +
 .../net/wireless/intel/iwlwifi/mvm/mac80211.c  | 18 ++++++++++++++++++
 drivers/net/wireless/intel/iwlwifi/mvm/tx.c    |  6 ------
 3 files changed, 19 insertions(+), 6 deletions(-)

Comments

Johannes Berg Sept. 18, 2020, 5:34 p.m. UTC | #1
On Fri, 2020-09-18 at 19:13 +0200, Alexander Wetzel wrote:
> 
> +		/* GCMP and 256 bit CCMP keys the key can't be copied into the
> +		 * MPDU struct ieee80211_tx_info. We therefore must flush the
> +		 * queues to ensure there are no MPDUs left which are referring
> +		 * to the outgoing key.
> +		 */
> +		if (key->flags & IEEE80211_KEY_FLAG_PAIRWISE &&
> +		    (key->cipher == WLAN_CIPHER_SUITE_GCMP ||
> +		     key->cipher == WLAN_CIPHER_SUITE_GCMP_256 ||
> +		     key->cipher == WLAN_CIPHER_SUITE_CCMP_256)) {
> +			ieee80211_stop_queues(hw);
> +			iwl_mvm_mac_flush(hw, vif, 0, true);
> +			ieee80211_wake_queues(hw);
> +		}

Shouldn't the wake be after installing the new key? Otherwise new frames
can race and be there again?

johannes
Alexander Wetzel Sept. 20, 2020, 6:37 a.m. UTC | #2
>>
 >> +		/* GCMP and 256 bit CCMP keys the key can't be copied into the
 >> +		 * MPDU struct ieee80211_tx_info. We therefore must flush the
 >> +		 * queues to ensure there are no MPDUs left which are referring
 >> +		 * to the outgoing key.
 >> +		 */
 >> +		if (key->flags & IEEE80211_KEY_FLAG_PAIRWISE &&
 >> +		    (key->cipher == WLAN_CIPHER_SUITE_GCMP ||
 >> +		     key->cipher == WLAN_CIPHER_SUITE_GCMP_256 ||
 >> +		     key->cipher == WLAN_CIPHER_SUITE_CCMP_256)) {
 >> +			ieee80211_stop_queues(hw);
 >> +			iwl_mvm_mac_flush(hw, vif, 0, true);
 >> +			ieee80211_wake_queues(hw);
 >> +		}
 >
 > Shouldn't the wake be after installing the new key? Otherwise new frames
 > can race and be there again?

mac80211 is taking care of that and has already stopped queuing new 
MPDUs which would use the key by setting KEY_FLAG_TAINTED on it. So for 
a PTK0 rekey we are fine, mac80211 won't queue frames with the key under 
deletion.

But I think we should start setting KEY_FLAG_TAINTED for *any* PTK 
deletion to make sure we are not sending out MPDUs with a zero key or 
something like that. This is a special kind of a rekey case we missed so 
far but should be trival cover that, too.

That is then basically taking care about the generic Kr00k vulnerability 
  some months ago.

Now I'm not aware of we have any mitigations in the code for that but 
when we set KEY_FLAG_TAINTED for key deletions any driver which is able 
to rekey PTK0 correctly can't be affected by Kr00k any longer.

I'll look into that in the next days and prepare a patch for mac8011 to 
discuss that.

Alexander
Alexander Wetzel Sept. 22, 2020, 6:45 p.m. UTC | #3
Am 20.09.20 um 08:37 schrieb Alexander Wetzel:
> 
>  >>
>  >> +        /* GCMP and 256 bit CCMP keys the key can't be copied into the
>  >> +         * MPDU struct ieee80211_tx_info. We therefore must flush the
>  >> +         * queues to ensure there are no MPDUs left which are 
> referring
>  >> +         * to the outgoing key.
>  >> +         */
>  >> +        if (key->flags & IEEE80211_KEY_FLAG_PAIRWISE &&
>  >> +            (key->cipher == WLAN_CIPHER_SUITE_GCMP ||
>  >> +             key->cipher == WLAN_CIPHER_SUITE_GCMP_256 ||
>  >> +             key->cipher == WLAN_CIPHER_SUITE_CCMP_256)) {
>  >> +            ieee80211_stop_queues(hw);
>  >> +            iwl_mvm_mac_flush(hw, vif, 0, true);
>  >> +            ieee80211_wake_queues(hw);
>  >> +        }
>  >
>  > Shouldn't the wake be after installing the new key? Otherwise new frames
>  > can race and be there again?
> 
> mac80211 is taking care of that and has already stopped queuing new 
> MPDUs which would use the key by setting KEY_FLAG_TAINTED on it. So for 
> a PTK0 rekey we are fine, mac80211 won't queue frames with the key under 
> deletion.
> 

Ok, forget this patch, the mvm part is pointless.
The maximum we have to do is pausing the queues when we delete a key, no 
flush required at all... I'll look into that again and send an updated 
version:

The hw_key and the keyid is set (again) when we dequeue. So we either 
get a valid key or ieee80211_tx_dequeue() drops the MPDU itself.
Therefore it's irrelevant if we lookup the key from a table or store it 
in the MPDU info struct. Now we may still have a race within the driver 
but simply pausing dequeuing during key deletion should do the trick.

> But I think we should start setting KEY_FLAG_TAINTED for *any* PTK 
> deletion to make sure we are not sending out MPDUs with a zero key or 
> something like that. This is a special kind of a rekey case we missed so 
> far but should be trival cover that, too.
> 
> That is then basically taking care about the generic Kr00k vulnerability 
>   some months ago.
> 
> Now I'm not aware of we have any mitigations in the code for that but 
> when we set KEY_FLAG_TAINTED for key deletions any driver which is able 
> to rekey PTK0 correctly can't be affected by Kr00k any longer.
> 

Looked at the current code and turns out we have at least two 
mitigations for that now:
1) commit ce2e1ca70307 "(mac80211: Check port authorization in the 
ieee80211_tx_dequeue() case)" and

2) commit a0761a301746 "(mac80211: drop data frames without key on 
encrypted links)"

> I'll look into that in the next days and prepare a patch for mac8011 to 
> discuss that.

I've the patch basically ready, but there are already the two 
mitigations mentioned above in place.

The only thing the new patch seems to improves is deleting an active PTK 
key without de-authentication which seems to have no valid use cases.
And it makes KEY_FLAG_TAINTED more logical by also stopping Tx for an 
outgoing key and not only when we replace it.

I'll probably still send it as an RFC patch to sort out if we still want 
that or not.

Alexander
Johannes Berg Sept. 21, 2021, 7:04 p.m. UTC | #4
Hi Alexander,

On Tue, 2020-09-22 at 20:45 +0200, Alexander Wetzel wrote:
> 
> Ok, forget this patch, the mvm part is pointless.
> The maximum we have to do is pausing the queues when we delete a key, no 
> flush required at all... I'll look into that again and send an updated 
> version:

I know it's been (almost exactly) a year, but I was wondering about this
scenario recently
(due to e.g. https://bugzilla.kernel.org/show_bug.cgi?id=213059, though
I'm not sure that report even makes sense).

Did you ever send another patch? I can't seem to find any.


But basically, in mvm, we support two scenarios these days:

 1) PN assigned by the driver, in iwl_mvm_set_tx_cmd_crypto(), with two 
    sub-cases:
    a) key material embedded into the TX command (CCMP, TKIP, WEP)
    b) key material taken from firmware key offset (CCMP-256, GCMP)
 2) PN assigned by the device per the station, via the "new TX API"
    selected in iwl_mvm_set_tx_params().


Am I wrong in thinking that both scenarios 1a) and 2) are completely
acceptable for CAN_REPLACE_PTK0, since there's never any possibility of
sending a frame with a mismatch between the PN assignment and key
material?

However, it seems that scenario 1b) is what this patch attempted to
handle, by flushing the TX queues when the new key is installed, and I'm
not sure why you said it wasn't necessary - if the driver installs new
key material in the device while there are frames that already have a PN
set, then the old PN _might_ be used with a new key, leading to
problems.

There's a probabilistic defense against this, in that we attempt to
reuse key offsets (the hw_key ID that goes into the TX command) as
rarely as possible, so that if we put a frame with key offset 0 into the
queue, and then reinstall the PTK, it would go to key offset 2 (1 being
used by the GTK), and 0 would stay unless we did another rekeying or so.

I guess in theory we could arrange -- on the hardware where case 1b) is
even relevant, i.e. only 9260 since previous don't support GCMP, and
later use case 2) -- to never put the rekeyed PTK into the same slot,
that way, even if we do have to reuse the slot, it'd be with a different
key? But in theory that might leak data to the wrong station or
something ... functionally, it would lead to a rejection of the frame
and it being dropped at the receiver, but security-wise it'd be a
problem.


Anyway, I'm not really sure why you thought the flush wasn't needed, it
seems to me it is still needed in the case 1b).

Theoretically, it seems we could rely on the "no key slot (offset)
reuse" if we put some kind of barrier into the TX queues whenever we
stop using a key slot, that way we'd know if it might still get used or
not. If yes, we flush, but that basically never happens since there are
a relatively large number of slots and typical use cases don't have so
many keys.


The other question I had was concerning the documented requirements for
NL80211_EXT_FEATURE_CAN_REPLACE_PTK0, aren't those too strict? For
reference, this is what it says now:

> * Mac80211 drivers should set the @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 flag
> * when they are able to replace in-use PTK keys according to the following
> * requirements:
> * 1) They do not hand over frames decrypted with the old key to
>      mac80211 once the call to set_key() with command %DISABLE_KEY has been
>      completed when also setting @IEEE80211_KEY_FLAG_GENERATE_IV for any key,

This is I think a bit misleading, how's the RX-side related to
GENERATE_IV? It seems to me that this requirement is to ensure we don't
get a bad replay counter on RX, but then that's unrelated to
GENERATE_IV?

>   2) either drop or continue to use the old key for any outgoing frames queued
>      at the time of the key deletion (including re-transmits),

That's mostly true for iwlwifi, apart from the case 1b) key offset reuse
I was explaining above.

>   3) never send out a frame queued prior to the set_key() %SET_KEY command
>      encrypted with the new key and

This I don't really know why - I think maybe *this* was meant to have
the "when also setting GENERATE_IV"?

I don't think this is necessary if you ensure that the PN is assigned
from the correct key? That is, it doesn't really apply in case 2) I
mentioned?


>   4) never send out a frame unencrypted when it should be encrypted.

Obviously :)


Thanks,
johannes
Alexander Wetzel Sept. 23, 2021, 10:04 p.m. UTC | #5
Hi Johannes,

On 21.09.21 21:04, Johannes Berg wrote:
> Hi Alexander,
> 
> On Tue, 2020-09-22 at 20:45 +0200, Alexander Wetzel wrote:
>>
>> Ok, forget this patch, the mvm part is pointless.
>> The maximum we have to do is pausing the queues when we delete a key, no
>> flush required at all... I'll look into that again and send an updated
>> version:
> 
> I know it's been (almost exactly) a year, but I was wondering about this
> scenario recently
> (due to e.g. https://bugzilla.kernel.org/show_bug.cgi?id=213059, though
> I'm not sure that report even makes sense).

I had a quick look in the ticket. It may just be the "other end" which 
is not able to re-key correctly. After all the tests I did with mvm I 
should have noticed rekey issues.

But there is one thing sticking out in the problem description:
kernel: wlan0: disassociated from 4c:a6:4d:54:32:6b (Reason: 
15=4WAY_HANDSHAKE_TIMEOUT)

--> with that it's can't be a PTK0 rekey bug as we defined it so far.
With the 4way handshake incomplete the new key has never been install on 
the device the log is from.

So this is probably a case where the other end - should be the STA and 
not the AP - activated the new key and then encrypts EAPOL#4 with the 
new key... When the other end is not using control port and the driver 
has no special workaround this can easily happen..

Problem here is, that when wpa_supplicant hands over EAPOL #4 to the 
kernel this only guarantees that one frame is send out. When there are 
others in the QDISC that won't be the EAPOL#4 frame. See
https://patchwork.ozlabs.org/project/hostap/patch/20190912192002.6105-1-alexander@wetzel-home.de/


> 
> Did you ever send another patch? I can't seem to find any.
> 

No, I never send a patch. To be able to detect incorrect key usage I 
ended up implementing key color for mac80211 only to discover that this 
in not adding anything useful and never submitted it. I ended up with 
the understanding that we do not need the flush but have now problems to 
reproduce the reasoning for that. But let's see what I can together here.

> 
> But basically, in mvm, we support two scenarios these days:
> 
>   1) PN assigned by the driver, in iwl_mvm_set_tx_cmd_crypto(), with two
>      sub-cases:
>      a) key material embedded into the TX command (CCMP, TKIP, WEP)
>      b) key material taken from firmware key offset (CCMP-256, GCMP)
>   2) PN assigned by the device per the station, via the "new TX AP >      selected in iwl_mvm_set_tx_params().
> 
> 
> Am I wrong in thinking that both scenarios 1a) and 2) are completely
> acceptable for CAN_REPLACE_PTK0, since there's never any possibility of
> sending a frame with a mismatch between the PN assignment and key
> material?

  CAN_REPLACE_PTK0 only has to confirms that the driver is handling all 
the races PTK0 rekeying has. Which is possible in all of the cases above.
In a nutshell the driver and cards must never send out a frame where the 
PN is not matching to the PTK key used for encryption AND never send out 
a frame - when we are set up for encryption - with a deleted keys

> 
> However, it seems that scenario 1b) is what this patch attempted to
> handle, by flushing the TX queues when the new key is installed, and I'm
> not sure why you said it wasn't necessary - if the driver installs new
> key material in the device while there are frames that already have a PN
> set, then the old PN _might_ be used with a new key, leading to
> problems.

You are probably right but it depends how the HW is handling that in detail:

mac80211/iwlwifi/mwm together make sure that when a skb is send to the 
HW the HW also has a valid key for it. At that very moment the frame can 
be send. Of course chances are it will be queued there again...
Now when the key is only de-referenced on dequeue we may have an issue 
and need a flush or another mitigation.

Regardless how the HW is handling that here are some indirect mitigations.
mac80211 sets first KEY_FLAG_TAINTED and then waits for all rcu 
sections. After that the HW can't get any skb depending on the outgoing 
key. Then the old key is deleted from the HW and when it still has 
frames queued needing the deleted one it may go out with a null key or 
so. Even more time and waiting on running RCU sections will pass till 
the new key is installed and we can freeze the connection instead "only" 
having a cleartext leak.

When A-MPDU is active on rekey stopping it will also flush the 
problematic frames for my understanding.


> 
> There's a probabilistic defense against this, in that we attempt to
> reuse key offsets (the hw_key ID that goes into the TX command) as
> rarely as possible, so that if we put a frame with key offset 0 into the
> queue, and then reinstall the PTK, it would go to key offset 2 (1 being
> used by the GTK), and 0 would stay unless we did another rekeying or so.

The HW must then also be able to detect when a frame refers to a free 
hw_key ID and drop those frames.

> I guess in theory we could arrange -- on the hardware where case 1b) is
> even relevant, i.e. only 9260 since previous don't support GCMP, and
> later use case 2) -- to never put the rekeyed PTK into the same slot,
> that way, even if we do have to reuse the slot, it'd be with a different
> key? But in theory that might leak data to the wrong station or
> something ... functionally, it would lead to a rejection of the frame
> and it being dropped at the receiver, but security-wise it'd be a
> problem.

Maybe keeping the hw_key_id but setting and checking a key color would 
be better?
Not sure what the HW is capable of here but adding a single bit to the 
key information and compare it to the color of the frame can hopefully 
be done in the firmware.
We then would have to make sure the hw_key_id is not changing on rekey 
to be sure concurrent rekeys are not swapping the hw_key_ids somehow.

> 
> 
> Anyway, I'm not really sure why you thought the flush wasn't needed, it
> seems to me it is still needed in the case 1b).
> 

I was never able to get my mvm cards to mix up the key or send cleartext 
frames. (I think I first played around with CCMP-256 and later hacked 
CCMP to also use the key mapping.)

But I may have mixed up mac80211 queues with HW queues:
While mac80211 queues are safe the HW queues may still need the flush.
We don't have to flush the mac80211 queues. Cycling or draining all 
frames handed off to the HW must be sufficient.


> Theoretically, it seems we could rely on the "no key slot (offset)
> reuse" if we put some kind of barrier into the TX queues whenever we
> stop using a key slot, that way we'd know if it might still get used or
> not. If yes, we flush, but that basically never happens since there are
> a relatively large number of slots and typical use cases don't have so
> many keys.
> 
One of my attempts to fix the PTK0 issues tried to use something like 
that for the ath9k driver. It turned out that we still have to stop TX 
for the outgoing key and can only resume once the new key is active in 
the HW, making key slot cycling unnecessary.
That said it should work here, but only when the HW is able to drop 
frames pointing to an cleared hw key.

As I see it mvm only can get and hand over frames to the HW when the 
correct key is still installed in the HW.

> 
> The other question I had was concerning the documented requirements for
> NL80211_EXT_FEATURE_CAN_REPLACE_PTK0, aren't those too strict? For
> reference, this is what it says now:
> 
>> * Mac80211 drivers should set the @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 flag
>> * when they are able to replace in-use PTK keys according to the following
>> * requirements:
>> * 1) They do not hand over frames decrypted with the old key to
>>       mac80211 once the call to set_key() with command %DISABLE_KEY has been
>>       completed when also setting @IEEE80211_KEY_FLAG_GENERATE_IV for any key,
> 
> This is I think a bit misleading, how's the RX-side related to
> GENERATE_IV? It seems to me that this requirement is to ensure we don't
> get a bad replay counter on RX, but then that's unrelated to
> GENERATE_IV?

Yes, looks like that ended up in the wrong sections with all the 
rewording I've done here.
The idea is to not bump the PN of the new key with a frame decrypted by 
a now deleted key only.

> 
>>    2) either drop or continue to use the old key for any outgoing frames queued
>>       at the time of the key deletion (including re-transmits),
> 
> That's mostly true for iwlwifi, apart from the case 1b) key offset reuse
> I was explaining above.
> 
>>    3) never send out a frame queued prior to the set_key() %SET_KEY command
>>       encrypted with the new key and
> 
> This I don't really know why - I think maybe *this* was meant to have
> the "when also setting GENERATE_IV"?

Now I'm sure you are right:
Any frame queued prior to %SET_KEY with @IEEE80211_KEY_FLAG_GENERATE_IV 
can bump the new PN to a invalid value. But when the hw or driver is 
handling that it's ok.

> 
> I don't think this is necessary if you ensure that the PN is assigned
> from the correct key? That is, it doesn't really apply in case 2) I
> mentioned?

exactly.

> 
> 
>>    4) never send out a frame unencrypted when it should be encrypted.
> 
> Obviously :)
> 
> 
> Thanks,
> johannes
>
Johannes Berg Sept. 24, 2021, 7:36 a.m. UTC | #6
[dropping now dead linuxwifi@intel.com]

> 
> > I know it's been (almost exactly) a year, but I was wondering about this
> > scenario recently
> > (due to e.g. https://bugzilla.kernel.org/show_bug.cgi?id=213059, though
> > I'm not sure that report even makes sense).
> 
> I had a quick look in the ticket. It may just be the "other end" which 
> is not able to re-key correctly. After all the tests I did with mvm I 
> should have noticed rekey issues.

Maybe. But we see this also on Cisco gear, which I'd hope wasn't so bad
:)

Also, I wasn't sure what NIC you had actually tested - there are three
different scenarios depending on the NIC/encryption, as I outlined in my
previous email (1a, 1b, 2).

> But there is one thing sticking out in the problem description:
> kernel: wlan0: disassociated from 4c:a6:4d:54:32:6b (Reason: 
> 15=4WAY_HANDSHAKE_TIMEOUT)
> 
> --> with that it's can't be a PTK0 rekey bug as we defined it so far.
> With the 4way handshake incomplete the new key has never been install on 
> the device the log is from.

Hmm. Good point. But we did also see the message saying

	Rekeying PTK for STA ... but driver can't safely do that.

so it would appear that some rekeying happened successfully.

Oh. So maybe in this particular case some rekeying happened
successfully, but then another rekeying later got stuck?

> So this is probably a case where the other end - should be the STA and 
> not the AP - activated the new key and then encrypts EAPOL#4 with the 
> new key... When the other end is not using control port and the driver 
> has no special workaround this can easily happen..
> 
> Problem here is, that when wpa_supplicant hands over EAPOL #4 to the 
> kernel this only guarantees that one frame is send out. When there are 
> others in the QDISC that won't be the EAPOL#4 frame. See
> https://patchwork.ozlabs.org/project/hostap/patch/20190912192002.6105-1-alexander@wetzel-home.de/

Well, we don't even know that the other end is using mac80211/hostapd,
but I get your point.

> > Did you ever send another patch? I can't seem to find any.
> > 
> 
> No, I never send a patch. To be able to detect incorrect key usage I 
> ended up implementing key color for mac80211 only to discover that this 
> in not adding anything useful and never submitted it. I ended up with 
> the understanding that we do not need the flush but have now problems to 
> reproduce the reasoning for that. But let's see what I can together here.

OK.

> > 
> > But basically, in mvm, we support two scenarios these days:
> > 
> >   1) PN assigned by the driver, in iwl_mvm_set_tx_cmd_crypto(), with two
> >      sub-cases:
> >      a) key material embedded into the TX command (CCMP, TKIP, WEP)
> >      b) key material taken from firmware key offset (CCMP-256, GCMP)
> >   2) PN assigned by the device per the station, via the "new TX AP >      selected in iwl_mvm_set_tx_params().
> > 
> > 
> > Am I wrong in thinking that both scenarios 1a) and 2) are completely
> > acceptable for CAN_REPLACE_PTK0, since there's never any possibility of
> > sending a frame with a mismatch between the PN assignment and key
> > material?
> 
>   CAN_REPLACE_PTK0 only has to confirms that the driver is handling all 
> the races PTK0 rekeying has. Which is possible in all of the cases above.
> In a nutshell the driver and cards must never send out a frame where the 
> PN is not matching to the PTK key used for encryption AND never send out 
> a frame - when we are set up for encryption - with a deleted keys

Right. Mostly, anyway. In 1b, we do have a race, because the PN is
assigned by driver and the key material can be programmed while frames
(with old PN) are on the TX queue(s), which can be handled by flushing
all the (affected) queues (i.e. for this station).

However, the race is often benign, because what happens is that we have
32 (I think? maybe 16) key "slots" and reuse them as late as possible.
Thus, if we do a pretty standard rekeying, the reference to the old key,
when dereferenced in the firmware, will actually still be to the correct
old key. Only if we had enough new keys installed to override that
reference, would we have any issues.

The bigger problem here happens if we do have enough key slots for this
(e.g. because we're the AP) and then reuse it ... potentially with a key
for a *different* station/connection. Then when the firmware
dereferences it, it'd be (theoretically) readable to another station
that it wasn't intended for, though the good thing is that the receiver
won't be able to decrypt it (anyway even if it's with the old key).

Only if we roll over enough keys that the new PTK0 ends up in the same
slots would we actually end up in the "stuck connection" case.

Either way though, if we roll over the key (regardless with which key),
we have a problem.

> > However, it seems that scenario 1b) is what this patch attempted to
> > handle, by flushing the TX queues when the new key is installed, and I'm
> > not sure why you said it wasn't necessary - if the driver installs new
> > key material in the device while there are frames that already have a PN
> > set, then the old PN _might_ be used with a new key, leading to
> > problems.
> 
> You are probably right but it depends how the HW is handling that in detail:
> 
> mac80211/iwlwifi/mwm together make sure that when a skb is send to the 
> HW the HW also has a valid key for it. At that very moment the frame can 
> be send. Of course chances are it will be queued there again...
> Now when the key is only de-referenced on dequeue we may have an issue 
> and need a flush or another mitigation.

Yes, this is what happens in 1b.

> Regardless how the HW is handling that here are some indirect mitigations.
> mac80211 sets first KEY_FLAG_TAINTED and then waits for all rcu 
> sections. After that the HW can't get any skb depending on the outgoing 
> key. Then the old key is deleted from the HW and when it still has 
> frames queued needing the deleted one it may go out with a null key or 
> so. Even more time and waiting on running RCU sections will pass till 
> the new key is installed and we can freeze the connection instead "only" 
> having a cleartext leak.

(Note the above description about key slot rollover)

> When A-MPDU is active on rekey stopping it will also flush the 
> problematic frames for my understanding.

Yeah. I think so.

Actually, I think this whole business of stopping aggregation sessions
is another potential issue - though particularly *RX* sessions, because
we don't know if the AP will always re-establish them.

> > There's a probabilistic defense against this, in that we attempt to
> > reuse key offsets (the hw_key ID that goes into the TX command) as
> > rarely as possible, so that if we put a frame with key offset 0 into the
> > queue, and then reinstall the PTK, it would go to key offset 2 (1 being
> > used by the GTK), and 0 would stay unless we did another rekeying or so.
> 
> The HW must then also be able to detect when a frame refers to a free 
> hw_key ID and drop those frames.

No, the hw_key ID isn't marked free, it just keeps the old key material,
unless we reuse that slot. If it's not overwritten, this isn't really a
big deal since the receiver won't be able to decrypt it any more, but in
any case it's still protected well.

The complication just comes in if we reuse that slot, as described
above.
> 
> Maybe keeping the hw_key_id but setting and checking a key color would 
> be better?

That would require firmware support.

> Not sure what the HW is capable of here but adding a single bit to the 
> key information and compare it to the color of the frame can hopefully 
> be done in the firmware.

It could be, yes. And I could even implement it, but it wouldn't really
roll out well, and it's a really old device. I don't think it's worth
the effort.

> We then would have to make sure the hw_key_id is not changing on rekey 
> to be sure concurrent rekeys are not swapping the hw_key_ids somehow.

Yeah we could ensure that.


> I was never able to get my mvm cards to mix up the key or send cleartext 
> frames. (I think I first played around with CCMP-256 and later hacked 
> CCMP to also use the key mapping.)

CCMP-256 was never supported, I think, but GCMP/GCMP-256.

But yes, I'm not surprised you weren't able to, given that key slot
reuse would have to occur.

> But I may have mixed up mac80211 queues with HW queues:
> While mac80211 queues are safe the HW queues may still need the flush.
> We don't have to flush the mac80211 queues. Cycling or draining all 
> frames handed off to the HW must be sufficient.

Right.

> 
> One of my attempts to fix the PTK0 issues tried to use something like 
> that for the ath9k driver. It turned out that we still have to stop TX 
> for the outgoing key and can only resume once the new key is active in 
> the HW, making key slot cycling unnecessary.
> That said it should work here, but only when the HW is able to drop 
> frames pointing to an cleared hw key.
> 
> As I see it mvm only can get and hand over frames to the HW when the 
> correct key is still installed in the HW.

It's not HW, it's firmware, but yeah. The thing though is that the
driver has assigned the PN - so key slot reuse ...
> 

> > The other question I had was concerning the documented requirements for
> > NL80211_EXT_FEATURE_CAN_REPLACE_PTK0, aren't those too strict? For
> > reference, this is what it says now:
> > 
> > > * Mac80211 drivers should set the @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 flag
> > > * when they are able to replace in-use PTK keys according to the following
> > > * requirements:
> > > * 1) They do not hand over frames decrypted with the old key to
> > >       mac80211 once the call to set_key() with command %DISABLE_KEY has been
> > >       completed when also setting @IEEE80211_KEY_FLAG_GENERATE_IV for any key,
> > 
> > This is I think a bit misleading, how's the RX-side related to
> > GENERATE_IV? It seems to me that this requirement is to ensure we don't
> > get a bad replay counter on RX, but then that's unrelated to
> > GENERATE_IV?
> 
> Yes, looks like that ended up in the wrong sections with all the 
> rewording I've done here.

OK, thanks. Want to send a patch? Or I can fix it too.

> The idea is to not bump the PN of the new key with a frame decrypted by 
> a now deleted key only.

Yeah. I'd phrase it TX-side and say "the idea is to never do PN
assignment and encryption from two different keys" :)

> > 
> > >    2) either drop or continue to use the old key for any outgoing frames queued
> > >       at the time of the key deletion (including re-transmits),
> > 
> > That's mostly true for iwlwifi, apart from the case 1b) key offset reuse
> > I was explaining above.
> > 
> > >    3) never send out a frame queued prior to the set_key() %SET_KEY command
> > >       encrypted with the new key and
> > 
> > This I don't really know why - I think maybe *this* was meant to have
> > the "when also setting GENERATE_IV"?
> 
> Now I'm sure you are right:
> Any frame queued prior to %SET_KEY with @IEEE80211_KEY_FLAG_GENERATE_IV 
> can bump the new PN to a invalid value. But when the hw or driver is 
> handling that it's ok.

Right.

Thanks a lot!

johannes
Alexander Wetzel Sept. 24, 2021, 8:33 p.m. UTC | #7
On 24.09.21 09:36, Johannes Berg wrote:
> [dropping now dead linuxwifi@intel.com]
> 
>>
>>> I know it's been (almost exactly) a year, but I was wondering about this
>>> scenario recently
>>> (due to e.g. https://bugzilla.kernel.org/show_bug.cgi?id=213059, though
>>> I'm not sure that report even makes sense).
>>
>> I had a quick look in the ticket. It may just be the "other end" which
>> is not able to re-key correctly. After all the tests I did with mvm I
>> should have noticed rekey issues.
> 
> Maybe. But we see this also on Cisco gear, which I'd hope wasn't so bad
> :)
> 

I no longer have any expectations here. In the end it's probably just 
luck if they test for it and/or the developers notice it.
After all even the authors of ieee892.11 did miss the implications of 
what they put in the standard. A few warning words or better something 
like Extended Key ID would have avoided most of the issues quite easily 
back then.

> Also, I wasn't sure what NIC you had actually tested - there are three
> different scenarios depending on the NIC/encryption, as I outlined in my
> previous email (1a, 1b, 2).
AC 9260 and AC 9560

> 
>> But there is one thing sticking out in the problem description:
>> kernel: wlan0: disassociated from 4c:a6:4d:54:32:6b (Reason:
>> 15=4WAY_HANDSHAKE_TIMEOUT)
>>
>> --> with that it's can't be a PTK0 rekey bug as we defined it so far.
>> With the 4way handshake incomplete the new key has never been install on
>> the device the log is from.
> 
> Hmm. Good point. But we did also see the message saying
> 
> 	Rekeying PTK for STA ... but driver can't safely do that.
> 
> so it would appear that some rekeying happened successfully.
> 
> Oh. So maybe in this particular case some rekeying happened
> successfully, but then another rekeying later got stuck?
> 
>> So this is probably a case where the other end - should be the STA and
>> not the AP - activated the new key and then encrypts EAPOL#4 with the
>> new key... When the other end is not using control port and the driver
>> has no special workaround this can easily happen..
>>
>> Problem here is, that when wpa_supplicant hands over EAPOL #4 to the
>> kernel this only guarantees that one frame is send out. When there are
>> others in the QDISC that won't be the EAPOL#4 frame. See
>> https://patchwork.ozlabs.org/project/hostap/patch/20190912192002.6105-1-alexander@wetzel-home.de/

I'll look at that closer. But only way I see is:
1) STA sends out EAPOL#4
2) AP is not reviving/able to decrypt it
3) And then disassociates the STA.

That seems to indicate that the mvm card did send out EAPOL#4 encrypted 
with the new key. Assuming the STA is modern enough to use control port 
that indicates some serious delay in mvm HW queues..

Now when the STA is not using control port the observation in the bug 
report would be more or less the "expected" behavior. We probably should 
ask for a wpa_supplicant debug log here.


> 
> Well, we don't even know that the other end is using mac80211/hostapd,
> but I get your point.
> 
>>> Did you ever send another patch? I can't seem to find any.
>>>
>>
>> No, I never send a patch. To be able to detect incorrect key usage I
>> ended up implementing key color for mac80211 only to discover that this
>> in not adding anything useful and never submitted it. I ended up with
>> the understanding that we do not need the flush but have now problems to
>> reproduce the reasoning for that. But let's see what I can together here.
> 
> OK.
> 
>>>
>>> But basically, in mvm, we support two scenarios these days:
>>>
>>>    1) PN assigned by the driver, in iwl_mvm_set_tx_cmd_crypto(), with two
>>>       sub-cases:
>>>       a) key material embedded into the TX command (CCMP, TKIP, WEP)
>>>       b) key material taken from firmware key offset (CCMP-256, GCMP)
>>>    2) PN assigned by the device per the station, via the "new TX AP >      selected in iwl_mvm_set_tx_params().
>>>
>>>
>>> Am I wrong in thinking that both scenarios 1a) and 2) are completely
>>> acceptable for CAN_REPLACE_PTK0, since there's never any possibility of
>>> sending a frame with a mismatch between the PN assignment and key
>>> material?
>>
>>    CAN_REPLACE_PTK0 only has to confirms that the driver is handling all
>> the races PTK0 rekeying has. Which is possible in all of the cases above.
>> In a nutshell the driver and cards must never send out a frame where the
>> PN is not matching to the PTK key used for encryption AND never send out
>> a frame - when we are set up for encryption - with a deleted keys
> 
> Right. Mostly, anyway. In 1b, we do have a race, because the PN is
> assigned by driver and the key material can be programmed while frames
> (with old PN) are on the TX queue(s), which can be handled by flushing
> all the (affected) queues (i.e. for this station).
> 
> However, the race is often benign, because what happens is that we have
> 32 (I think? maybe 16) key "slots" and reuse them as late as possible.
> Thus, if we do a pretty standard rekeying, the reference to the old key,
> when dereferenced in the firmware, will actually still be to the correct
> old key. Only if we had enough new keys installed to override that
> reference, would we have any issues.

You want to skip deleting the key in the firmware when mac80211 issues 
the DISABLE_KEY set_key() call for it? That should indeed work.
As long as the firmware plays along. It was acting a bit odd when having 
two STA keys installed when I tried Extended Key ID with it.

> 
> The bigger problem here happens if we do have enough key slots for this
> (e.g. because we're the AP) and then reuse it ... potentially with a key
> for a *different* station/connection. Then when the firmware
> dereferences it, it'd be (theoretically) readable to another station
> that it wasn't intended for, though the good thing is that the receiver
> won't be able to decrypt it (anyway even if it's with the old key).

When the key slot is reused it will have the new key installed for the 
current (new) station. It should be decrypted and be perfectly readable. 
Worse, if the PN was generated not by the HW it will bump the PN and 
make the connection unusable.

> Only if we roll over enough keys that the new PTK0 ends up in the same
> slots would we actually end up in the "stuck connection" case.
> 
> Either way though, if we roll over the key (regardless with which key),
> we have a problem.
> 
>>> However, it seems that scenario 1b) is what this patch attempted to
>>> handle, by flushing the TX queues when the new key is installed, and I'm
>>> not sure why you said it wasn't necessary - if the driver installs new
>>> key material in the device while there are frames that already have a PN
>>> set, then the old PN _might_ be used with a new key, leading to
>>> problems.
>>
>> You are probably right but it depends how the HW is handling that in detail:
>>
>> mac80211/iwlwifi/mwm together make sure that when a skb is send to the
>> HW the HW also has a valid key for it. At that very moment the frame can
>> be send. Of course chances are it will be queued there again...
>> Now when the key is only de-referenced on dequeue we may have an issue
>> and need a flush or another mitigation.
> 
> Yes, this is what happens in 1b.
> 
>> Regardless how the HW is handling that here are some indirect mitigations.
>> mac80211 sets first KEY_FLAG_TAINTED and then waits for all rcu
>> sections. After that the HW can't get any skb depending on the outgoing
>> key. Then the old key is deleted from the HW and when it still has
>> frames queued needing the deleted one it may go out with a null key or
>> so. Even more time and waiting on running RCU sections will pass till
>> the new key is installed and we can freeze the connection instead "only"
>> having a cleartext leak.
> 
> (Note the above description about key slot rollover)
> 
>> When A-MPDU is active on rekey stopping it will also flush the
>> problematic frames for my understanding.
> 
> Yeah. I think so.
> 
> Actually, I think this whole business of stopping aggregation sessions
> is another potential issue - though particularly *RX* sessions, because
> we don't know if the AP will always re-establish them.
> 
Works fine for my HW but I get your point.
Unfortunately exactly the RX sessions are the problem: Noting in the 
standard is clearly forbidding to mix old and new PTK key in one session 
and for sure it happens in reality all the time.

So when one A-MPDU group includes any frame with a PN from the old key 
it will bump the PN of the new key and freeze the connection.

To avoid that we have to identify and (probably) drop the frames after 
the MPDU reorder buffer. (I did not look into that at all so far.)
But looks like that needs a mac80211 API extensions and very likely 
firmware support from (many) cards. Or some really ugly heuristic 
potentially impacting security.

>>> There's a probabilistic defense against this, in that we attempt to
>>> reuse key offsets (the hw_key ID that goes into the TX command) as
>>> rarely as possible, so that if we put a frame with key offset 0 into the
>>> queue, and then reinstall the PTK, it would go to key offset 2 (1 being
>>> used by the GTK), and 0 would stay unless we did another rekeying or so.
>>
>> The HW must then also be able to detect when a frame refers to a free
>> hw_key ID and drop those frames.
> 
> No, the hw_key ID isn't marked free, it just keeps the old key material,
> unless we reuse that slot. If it's not overwritten, this isn't really a
> big deal since the receiver won't be able to decrypt it any more, but in
> any case it's still protected well.
> 
> The complication just comes in if we reuse that slot, as described
> above.

Got it now.

>>
>> Maybe keeping the hw_key_id but setting and checking a key color would
>> be better?
> 
> That would require firmware support.
> 
>> Not sure what the HW is capable of here but adding a single bit to the
>> key information and compare it to the color of the frame can hopefully
>> be done in the firmware.
> 
> It could be, yes. And I could even implement it, but it wouldn't really
> roll out well, and it's a really old device. I don't think it's worth
> the effort.
> 
>> We then would have to make sure the hw_key_id is not changing on rekey
>> to be sure concurrent rekeys are not swapping the hw_key_ids somehow.
> 
> Yeah we could ensure that.
> 
> 
>> I was never able to get my mvm cards to mix up the key or send cleartext
>> frames. (I think I first played around with CCMP-256 and later hacked
>> CCMP to also use the key mapping.)
> 
> CCMP-256 was never supported, I think, but GCMP/GCMP-256.
> 
> But yes, I'm not surprised you weren't able to, given that key slot
> reuse would have to occur.
> 
>> But I may have mixed up mac80211 queues with HW queues:
>> While mac80211 queues are safe the HW queues may still need the flush.
>> We don't have to flush the mac80211 queues. Cycling or draining all
>> frames handed off to the HW must be sufficient.
> 
> Right.
> 

So we have two solutions: Flushing the queues when we rekey 
GCMP/GCMP-256 or cycle the key slots. Flushing is quite simple but 
cycling the key ids is in >>99% getting the job done without a flush.

>>
>> One of my attempts to fix the PTK0 issues tried to use something like
>> that for the ath9k driver. It turned out that we still have to stop TX
>> for the outgoing key and can only resume once the new key is active in
>> the HW, making key slot cycling unnecessary.
>> That said it should work here, but only when the HW is able to drop
>> frames pointing to an cleared hw key.
>>
>> As I see it mvm only can get and hand over frames to the HW when the
>> correct key is still installed in the HW.
> 
> It's not HW, it's firmware, but yeah. The thing though is that the
> driver has assigned the PN - so key slot reuse ... (Both are after all using keyid 0.)

So we either flush, rotate the key slots or find something else we can 
do in the driver and does not need firmware updates.

>>
> 
>>> The other question I had was concerning the documented requirements for
>>> NL80211_EXT_FEATURE_CAN_REPLACE_PTK0, aren't those too strict? For
>>> reference, this is what it says now:
>>>
>>>> * Mac80211 drivers should set the @NL80211_EXT_FEATURE_CAN_REPLACE_PTK0 flag
>>>> * when they are able to replace in-use PTK keys according to the following
>>>> * requirements:
>>>> * 1) They do not hand over frames decrypted with the old key to
>>>>        mac80211 once the call to set_key() with command %DISABLE_KEY has been
>>>>        completed when also setting @IEEE80211_KEY_FLAG_GENERATE_IV for any key,
>>>
>>> This is I think a bit misleading, how's the RX-side related to
>>> GENERATE_IV? It seems to me that this requirement is to ensure we don't
>>> get a bad replay counter on RX, but then that's unrelated to
>>> GENERATE_IV?
>>
>> Yes, looks like that ended up in the wrong sections with all the
>> rewording I've done here.
> 
> OK, thanks. Want to send a patch? Or I can fix it too.
> 

No preference, I'm fine when you do it. But I've submitted a patch now 
so you hopeful just can merge it.

>> The idea is to not bump the PN of the new key with a frame decrypted by
>> a now deleted key only.
> 
> Yeah. I'd phrase it TX-side and say "the idea is to never do PN
> assignment and encryption from two different keys" :)
> 
Ups, should have read that prior to submitting the patch. Just change 
whatever you want or make your own patch.
I just moved the wrong statement to the correct block.

>>>
>>>>     2) either drop or continue to use the old key for any outgoing frames queued
>>>>        at the time of the key deletion (including re-transmits),
>>>
>>> That's mostly true for iwlwifi, apart from the case 1b) key offset reuse
>>> I was explaining above.
>>>
>>>>     3) never send out a frame queued prior to the set_key() %SET_KEY command
>>>>        encrypted with the new key and
>>>
>>> This I don't really know why - I think maybe *this* was meant to have
>>> the "when also setting GENERATE_IV"?
>>
>> Now I'm sure you are right:
>> Any frame queued prior to %SET_KEY with @IEEE80211_KEY_FLAG_GENERATE_IV
>> can bump the new PN to a invalid value. But when the hw or driver is
>> handling that it's ok.
> 
> Right.
> 
> Thanks a lot!
> 
> johannes
>
diff mbox series

Patch

diff --git a/drivers/net/wireless/intel/iwlwifi/dvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/dvm/mac80211.c
index 423d3c396b2d..8846203526d2 100644
--- a/drivers/net/wireless/intel/iwlwifi/dvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/dvm/mac80211.c
@@ -201,6 +201,7 @@  int iwlagn_mac_setup_register(struct iwl_priv *priv,
 
 	wiphy_ext_feature_set(hw->wiphy, NL80211_EXT_FEATURE_CQM_RSSI_LIST);
 	wiphy_ext_feature_set(hw->wiphy, NL80211_EXT_FEATURE_EXT_KEY_ID);
+	wiphy_ext_feature_set(hw->wiphy, NL80211_EXT_FEATURE_CAN_REPLACE_PTK0);
 
 	ret = ieee80211_register_hw(priv->hw);
 	if (ret) {
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
index 9374c85c5caf..40003de28556 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
@@ -83,6 +83,9 @@ 
 #include "iwl-prph.h"
 #include "iwl-nvm-parse.h"
 
+static void iwl_mvm_mac_flush(struct ieee80211_hw *hw,
+			      struct ieee80211_vif *vif, u32 queues, bool drop);
+
 static const struct ieee80211_iface_limit iwl_mvm_limits[] = {
 	{
 		.max = 1,
@@ -543,6 +546,7 @@  int iwl_mvm_mac_setup_register(struct iwl_mvm *mvm)
 
 	hw->wiphy->flags |= WIPHY_FLAG_IBSS_RSN;
 	wiphy_ext_feature_set(hw->wiphy, NL80211_EXT_FEATURE_VHT_IBSS);
+	wiphy_ext_feature_set(hw->wiphy, NL80211_EXT_FEATURE_CAN_REPLACE_PTK0);
 
 	/* The new Tx API does not allow to pass the key or keyid of a MPDU to
 	 * the hw, preventing us to control which key(id) to use per MPDU.
@@ -3540,6 +3544,20 @@  static int __iwl_mvm_mac_set_key(struct ieee80211_hw *hw,
 			break;
 		}
 
+		/* GCMP and 256 bit CCMP keys the key can't be copied into the
+		 * MPDU struct ieee80211_tx_info. We therefore must flush the
+		 * queues to ensure there are no MPDUs left which are referring
+		 * to the outgoing key.
+		 */
+		if (key->flags & IEEE80211_KEY_FLAG_PAIRWISE &&
+		    (key->cipher == WLAN_CIPHER_SUITE_GCMP ||
+		     key->cipher == WLAN_CIPHER_SUITE_GCMP_256 ||
+		     key->cipher == WLAN_CIPHER_SUITE_CCMP_256)) {
+			ieee80211_stop_queues(hw);
+			iwl_mvm_mac_flush(hw, vif, 0, true);
+			ieee80211_wake_queues(hw);
+		}
+
 		if (sta && iwl_mvm_has_new_rx_api(mvm) &&
 		    key->flags & IEEE80211_KEY_FLAG_PAIRWISE &&
 		    (key->cipher == WLAN_CIPHER_SUITE_CCMP ||
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
index 2f6484e0d726..9e0c55b75478 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
@@ -472,12 +472,6 @@  static void iwl_mvm_set_tx_cmd_crypto(struct iwl_mvm *mvm,
 		type = TX_CMD_SEC_GCMP;
 		/* Fall through */
 	case WLAN_CIPHER_SUITE_CCMP_256:
-		/* TODO: Taking the key from the table might introduce a race
-		 * when PTK rekeying is done, having an old packets with a PN
-		 * based on the old key but the message encrypted with a new
-		 * one.
-		 * Need to handle this.
-		 */
 		tx_cmd->sec_ctl |= type | TX_CMD_SEC_KEY_FROM_TABLE;
 		tx_cmd->key[0] = keyconf->hw_key_idx;
 		iwl_mvm_set_tx_cmd_pn(info, crypto_hdr);