diff mbox

[v2,2/2] mac80211: prevent possible crypto tx tailroom corruption

Message ID 1431508609-9841-2-git-send-email-michal.kazior@tieto.com (mailing list archive)
State Superseded
Delegated to: Johannes Berg
Headers show

Commit Message

Michal Kazior May 13, 2015, 9:16 a.m. UTC
There was a possible race between
ieee80211_reconfig() and
ieee80211_delayed_tailroom_dec(). This could
result in inability to transmit data if driver
crashed during roaming or rekeying and subsequent
skbs with insufficient tailroom appeared.

This race was probably never seen in the wild
because a device driver would have to crash AND
recover within 0.5s which is very unlikely.

I was able to prove this race exists after
changing the delay to 10s locally and crashing
ath10k via debugfs immediately after GTK
rekeying. In case of ath10k the counter went below
0. This was harmless but other drivers which
actually require tailroom (e.g. for WEP ICV or
MMIC) could end up with the counter at 0 instead
of >0 and introduce insufficient skb tailroom
failures because mac80211 would not resize skbs
appropriately anymore.

Fixes: 8d1f7ecd2af5 ("mac80211: defer tailroom counter manipulation when roaming")
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---

Notes:
    While doing PATCH v2 [1/2] I've noticed a subtle bug in the
    delayed tailroom counter logic. Since this touches the
    codepaths [1/2] does I'm posting this as a pair.

 net/mac80211/key.c  | 5 ++++-
 net/mac80211/main.c | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Johannes Berg May 20, 2015, 1:14 p.m. UTC | #1
On Wed, 2015-05-13 at 09:16 +0000, Michal Kazior wrote:
> There was a possible race between
> ieee80211_reconfig() and
> ieee80211_delayed_tailroom_dec(). This could
> result in inability to transmit data if driver
> crashed during roaming or rekeying and subsequent
> skbs with insufficient tailroom appeared.
> 
> This race was probably never seen in the wild
> because a device driver would have to crash AND
> recover within 0.5s which is very unlikely.
> 
> I was able to prove this race exists after
> changing the delay to 10s locally and crashing
> ath10k via debugfs immediately after GTK
> rekeying. In case of ath10k the counter went below
> 0. This was harmless but other drivers which
> actually require tailroom (e.g. for WEP ICV or
> MMIC) could end up with the counter at 0 instead
> of >0 and introduce insufficient skb tailroom
> failures because mac80211 would not resize skbs
> appropriately anymore.
> 
> Fixes: 8d1f7ecd2af5 ("mac80211: defer tailroom counter manipulation when roaming")
> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
> ---
> 
> Notes:
>     While doing PATCH v2 [1/2] I've noticed a subtle bug in the
>     delayed tailroom counter logic. Since this touches the
>     codepaths [1/2] does I'm posting this as a pair.
> 
>  net/mac80211/key.c  | 5 ++++-
>  net/mac80211/main.c | 3 +++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/mac80211/key.c b/net/mac80211/key.c
> index 577a11a13cdf..4c6f8c97d11a 100644
> --- a/net/mac80211/key.c
> +++ b/net/mac80211/key.c
> @@ -695,10 +695,13 @@ void ieee80211_reset_crypto_tx_tailroom(struct ieee80211_sub_if_data *sdata)
>  	mutex_lock(&sdata->local->key_mtx);
>  
>  	sdata->crypto_tx_tailroom_needed_cnt = 0;
> +	sdata->crypto_tx_tailroom_pending_dec = 0;
>  
>  	if (sdata->vif.type == NL80211_IFTYPE_AP) {
> -		list_for_each_entry(vlan, &sdata->u.ap.vlans, u.vlan.list)
> +		list_for_each_entry(vlan, &sdata->u.ap.vlans, u.vlan.list) {
>  			vlan->crypto_tx_tailroom_needed_cnt = 0;
> +			vlan->crypto_tx_tailroom_pending_dec = 0;
> +		}
>  	}
>  
>  	mutex_unlock(&sdata->local->key_mtx);
> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
> index 3c956c5f99b2..d8e1cbdcbc43 100644
> --- a/net/mac80211/main.c
> +++ b/net/mac80211/main.c
> @@ -246,6 +246,7 @@ static void ieee80211_restart_work(struct work_struct *work)
>  {
>  	struct ieee80211_local *local =
>  		container_of(work, struct ieee80211_local, restart_work);
> +	struct ieee80211_sub_if_data *sdata;
>  
>  	/* wait for scan work complete */
>  	flush_workqueue(local->workqueue);
> @@ -254,6 +255,8 @@ static void ieee80211_restart_work(struct work_struct *work)
>  	     "%s called with hardware scan in progress\n", __func__);
>  
>  	rtnl_lock();
> +	list_for_each_entry(sdata, &local->interfaces, list)
> +		cancel_delayed_work_sync(&sdata->dec_tailroom_needed_wk);

Would it make sense to just flush the work here? That way we don't have
to do all the other things.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kazior May 21, 2015, 8:16 a.m. UTC | #2
On 20 May 2015 at 15:14, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Wed, 2015-05-13 at 09:16 +0000, Michal Kazior wrote:
>> There was a possible race between
>> ieee80211_reconfig() and
>> ieee80211_delayed_tailroom_dec(). This could
>> result in inability to transmit data if driver
>> crashed during roaming or rekeying and subsequent
>> skbs with insufficient tailroom appeared.
>>
>> This race was probably never seen in the wild
>> because a device driver would have to crash AND
>> recover within 0.5s which is very unlikely.
>>
>> I was able to prove this race exists after
>> changing the delay to 10s locally and crashing
>> ath10k via debugfs immediately after GTK
>> rekeying. In case of ath10k the counter went below
>> 0. This was harmless but other drivers which
>> actually require tailroom (e.g. for WEP ICV or
>> MMIC) could end up with the counter at 0 instead
>> of >0 and introduce insufficient skb tailroom
>> failures because mac80211 would not resize skbs
>> appropriately anymore.
>>
>> Fixes: 8d1f7ecd2af5 ("mac80211: defer tailroom counter manipulation when roaming")
>> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
>> ---
>>
>> Notes:
>>     While doing PATCH v2 [1/2] I've noticed a subtle bug in the
>>     delayed tailroom counter logic. Since this touches the
>>     codepaths [1/2] does I'm posting this as a pair.
>>
>>  net/mac80211/key.c  | 5 ++++-
>>  net/mac80211/main.c | 3 +++
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/mac80211/key.c b/net/mac80211/key.c
>> index 577a11a13cdf..4c6f8c97d11a 100644
>> --- a/net/mac80211/key.c
>> +++ b/net/mac80211/key.c
>> @@ -695,10 +695,13 @@ void ieee80211_reset_crypto_tx_tailroom(struct ieee80211_sub_if_data *sdata)
>>       mutex_lock(&sdata->local->key_mtx);
>>
>>       sdata->crypto_tx_tailroom_needed_cnt = 0;
>> +     sdata->crypto_tx_tailroom_pending_dec = 0;
>>
>>       if (sdata->vif.type == NL80211_IFTYPE_AP) {
>> -             list_for_each_entry(vlan, &sdata->u.ap.vlans, u.vlan.list)
>> +             list_for_each_entry(vlan, &sdata->u.ap.vlans, u.vlan.list) {
>>                       vlan->crypto_tx_tailroom_needed_cnt = 0;
>> +                     vlan->crypto_tx_tailroom_pending_dec = 0;
>> +             }
>>       }
>>
>>       mutex_unlock(&sdata->local->key_mtx);
>> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
>> index 3c956c5f99b2..d8e1cbdcbc43 100644
>> --- a/net/mac80211/main.c
>> +++ b/net/mac80211/main.c
>> @@ -246,6 +246,7 @@ static void ieee80211_restart_work(struct work_struct *work)
>>  {
>>       struct ieee80211_local *local =
>>               container_of(work, struct ieee80211_local, restart_work);
>> +     struct ieee80211_sub_if_data *sdata;
>>
>>       /* wait for scan work complete */
>>       flush_workqueue(local->workqueue);
>> @@ -254,6 +255,8 @@ static void ieee80211_restart_work(struct work_struct *work)
>>            "%s called with hardware scan in progress\n", __func__);
>>
>>       rtnl_lock();
>> +     list_for_each_entry(sdata, &local->interfaces, list)
>> +             cancel_delayed_work_sync(&sdata->dec_tailroom_needed_wk);
>
> Would it make sense to just flush the work here? That way we don't have
> to do all the other things.

Hmm.. dec_tailroom_needed_wk is queued on system workqueue now so
there's no feasible way of flushing it (restart_work is on a system
workqueue as well). It'd need to be moved to local->workqueue. I guess
that would work too.


Micha?
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg May 21, 2015, 8:30 a.m. UTC | #3
On Thu, 2015-05-21 at 10:16 +0200, Michal Kazior wrote:

> >>       rtnl_lock();
> >> +     list_for_each_entry(sdata, &local->interfaces, list)
> >> +             cancel_delayed_work_sync(&sdata->dec_tailroom_needed_wk);
> >
> > Would it make sense to just flush the work here? That way we don't have
> > to do all the other things.
> 
> Hmm.. dec_tailroom_needed_wk is queued on system workqueue now so
> there's no feasible way of flushing it (restart_work is on a system
> workqueue as well). It'd need to be moved to local->workqueue. I guess
> that would work too.

flush_work()?

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kazior May 21, 2015, 8:44 a.m. UTC | #4
On 21 May 2015 at 10:30, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Thu, 2015-05-21 at 10:16 +0200, Michal Kazior wrote:
>
>> >>       rtnl_lock();
>> >> +     list_for_each_entry(sdata, &local->interfaces, list)
>> >> +             cancel_delayed_work_sync(&sdata->dec_tailroom_needed_wk);
>> >
>> > Would it make sense to just flush the work here? That way we don't have
>> > to do all the other things.
>>
>> Hmm.. dec_tailroom_needed_wk is queued on system workqueue now so
>> there's no feasible way of flushing it (restart_work is on a system
>> workqueue as well). It'd need to be moved to local->workqueue. I guess
>> that would work too.
>
> flush_work()?

Oh. I wasn't aware of this call.. Thanks for pointing out :-)


Micha?
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/mac80211/key.c b/net/mac80211/key.c
index 577a11a13cdf..4c6f8c97d11a 100644
--- a/net/mac80211/key.c
+++ b/net/mac80211/key.c
@@ -695,10 +695,13 @@  void ieee80211_reset_crypto_tx_tailroom(struct ieee80211_sub_if_data *sdata)
 	mutex_lock(&sdata->local->key_mtx);
 
 	sdata->crypto_tx_tailroom_needed_cnt = 0;
+	sdata->crypto_tx_tailroom_pending_dec = 0;
 
 	if (sdata->vif.type == NL80211_IFTYPE_AP) {
-		list_for_each_entry(vlan, &sdata->u.ap.vlans, u.vlan.list)
+		list_for_each_entry(vlan, &sdata->u.ap.vlans, u.vlan.list) {
 			vlan->crypto_tx_tailroom_needed_cnt = 0;
+			vlan->crypto_tx_tailroom_pending_dec = 0;
+		}
 	}
 
 	mutex_unlock(&sdata->local->key_mtx);
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 3c956c5f99b2..d8e1cbdcbc43 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -246,6 +246,7 @@  static void ieee80211_restart_work(struct work_struct *work)
 {
 	struct ieee80211_local *local =
 		container_of(work, struct ieee80211_local, restart_work);
+	struct ieee80211_sub_if_data *sdata;
 
 	/* wait for scan work complete */
 	flush_workqueue(local->workqueue);
@@ -254,6 +255,8 @@  static void ieee80211_restart_work(struct work_struct *work)
 	     "%s called with hardware scan in progress\n", __func__);
 
 	rtnl_lock();
+	list_for_each_entry(sdata, &local->interfaces, list)
+		cancel_delayed_work_sync(&sdata->dec_tailroom_needed_wk);
 	ieee80211_scan_cancel(local);
 	ieee80211_reconfig(local);
 	rtnl_unlock();