diff mbox

ath10k: prevent sta pointer rcu violation

Message ID 1484234070-7431-1-git-send-email-michal.kazior@tieto.com (mailing list archive)
State Accepted
Commit 0a744d927406389e00687560d9ce3c5ab0e58db9
Delegated to: Kalle Valo
Headers show

Commit Message

Michal Kazior Jan. 12, 2017, 3:14 p.m. UTC
Station pointers are RCU protected so driver must
be extra careful if it tries to store them
internally for later use outside of the RCU
section it obtained it in.

It was possible for station teardown to race with
some htt events. The possible outcome could be a
use-after-free and a crash.

Only peer-flow-control capable firmware was
affected (so hardware-wise qca99x0 and qca4019).

This could be done in sta_state() itself via
explicit synchronize_net() call but there's
already a convenient sta_pre_rcu_remove() op that
can be hooked up to avoid extra rcu stall.

The peer->sta pointer itself can't be set to
NULL/ERR_PTR because it is later used in
sta_state() for extra sanity checks.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
---
 drivers/net/wireless/ath/ath10k/core.h |  1 +
 drivers/net/wireless/ath/ath10k/mac.c  | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

Comments

Johannes Berg Jan. 12, 2017, 3:46 p.m. UTC | #1
On Thu, 2017-01-12 at 16:14 +0100, Michal Kazior wrote:
> Station pointers are RCU protected so driver must
> be extra careful if it tries to store them
> internally for later use outside of the RCU
> section it obtained it in.
> 
> It was possible for station teardown to race with
> some htt events. The possible outcome could be a
> use-after-free and a crash.
> 
> Only peer-flow-control capable firmware was
> affected (so hardware-wise qca99x0 and qca4019).
> 
> This could be done in sta_state() itself via
> explicit synchronize_net() call but there's
> already a convenient sta_pre_rcu_remove() op that
> can be hooked up to avoid extra rcu stall.

I don't think this makes sense. You're not using RCU-protected pointers
to the stations yourself, all accesses to them are locked under the
&ar->data_lock. As a consequence, you can't have any need for waiting
for a grace period. Since you also remove the pointer (under lock) when
the station gets removed, I don't think RCU can be the problem?

johannes
Michal Kazior Jan. 12, 2017, 4:18 p.m. UTC | #2
On 12 January 2017 at 16:46, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Thu, 2017-01-12 at 16:14 +0100, Michal Kazior wrote:
>> Station pointers are RCU protected so driver must
>> be extra careful if it tries to store them
>> internally for later use outside of the RCU
>> section it obtained it in.
>>
>> It was possible for station teardown to race with
>> some htt events. The possible outcome could be a
>> use-after-free and a crash.
>>
>> Only peer-flow-control capable firmware was
>> affected (so hardware-wise qca99x0 and qca4019).
>>
>> This could be done in sta_state() itself via
>> explicit synchronize_net() call but there's
>> already a convenient sta_pre_rcu_remove() op that
>> can be hooked up to avoid extra rcu stall.
>
> I don't think this makes sense. You're not using RCU-protected pointers
> to the stations yourself, all accesses to them are locked under the
> &ar->data_lock. As a consequence, you can't have any need for waiting
> for a grace period. Since you also remove the pointer (under lock) when
> the station gets removed, I don't think RCU can be the problem?

Unless you then continue to use that sta pointer after you release
data_lock. Consider this:

>          CPU0             CPU1
> 1   synchronize_net()
> 2    drv_sta_state()
> 3                      htt_fetch_ind(pid,tid) called
> 4                      rcu_read_lock()
> 5                      get(data_lock)
> 6                      txq=peers[pid]->sta->txq[tid]
> 7                      put(data_lock)
> 8    get(data_lock)
> 9     peer->sta=0
> 10   put(data_lock)
> 11     kfree(sta)
> 12                     ieee80211_tx_dequeue(txq)

Even though there's no code like (9) per se you can think of it as
anything that tries to "remove" the peer--sta association (ath10k_peer
is removed implicitly via wmi peer delete command and waiting for htt
event completion).

Holding data_lock for the entire duration of handling fetch indication
isn't really good for performance so it's better to fix RCU handling.


Michał
Johannes Berg Jan. 13, 2017, 7:24 a.m. UTC | #3
> Unless you then continue to use that sta pointer after you release
> data_lock.

Ouch, ok. That's rather strangely hidden though.

> Consider this:
> 
> >          CPU0             CPU1
> > 1   synchronize_net()
> > 2    drv_sta_state()
> > 3                      htt_fetch_ind(pid,tid) called
> > 4                      rcu_read_lock()
> > 5                      get(data_lock)
> > 6                      txq=peers[pid]->sta->txq[tid]
> > 7                      put(data_lock)
> > 8    get(data_lock)
> > 9     peer->sta=0
> > 10   put(data_lock)
> > 11     kfree(sta)
> > 12                     ieee80211_tx_dequeue(txq)
> 
> Even though there's no code like (9) per se you can think of it as
> anything that tries to "remove" the peer--sta association
> (ath10k_peer
> is removed implicitly via wmi peer delete command and waiting for htt
> event completion).
> 
> Holding data_lock for the entire duration of handling fetch
> indication isn't really good for performance so it's better to fix
> RCU handling.

Yeah, I see it now - also the comment where this happens. You probably
should mark some things __rcu though and actually use RCU primitives,
so the code is actually understandable :)

johannes
Michal Kazior Jan. 13, 2017, 8:32 a.m. UTC | #4
On 13 January 2017 at 08:24, Johannes Berg <johannes@sipsolutions.net> wrote:
>
>> Unless you then continue to use that sta pointer after you release
>> data_lock.
>
> Ouch, ok. That's rather strangely hidden though.
>
>> Consider this:
>>
>> >          CPU0             CPU1
>> > 1   synchronize_net()
>> > 2    drv_sta_state()
>> > 3                      htt_fetch_ind(pid,tid) called
>> > 4                      rcu_read_lock()
>> > 5                      get(data_lock)
>> > 6                      txq=peers[pid]->sta->txq[tid]
>> > 7                      put(data_lock)
>> > 8    get(data_lock)
>> > 9     peer->sta=0
>> > 10   put(data_lock)
>> > 11     kfree(sta)
>> > 12                     ieee80211_tx_dequeue(txq)
>>
>> Even though there's no code like (9) per se you can think of it as
>> anything that tries to "remove" the peer--sta association
>> (ath10k_peer
>> is removed implicitly via wmi peer delete command and waiting for htt
>> event completion).
>>
>> Holding data_lock for the entire duration of handling fetch
>> indication isn't really good for performance so it's better to fix
>> RCU handling.
>
> Yeah, I see it now - also the comment where this happens. You probably
> should mark some things __rcu though and actually use RCU primitives,
> so the code is actually understandable :)

Good point. I'll do that in another patch.


Michał
Kalle Valo Jan. 19, 2017, 1:18 p.m. UTC | #5
Michal Kazior <michal.kazior@tieto.com> wrote:
> Station pointers are RCU protected so driver must
> be extra careful if it tries to store them
> internally for later use outside of the RCU
> section it obtained it in.
> 
> It was possible for station teardown to race with
> some htt events. The possible outcome could be a
> use-after-free and a crash.
> 
> Only peer-flow-control capable firmware was
> affected (so hardware-wise qca99x0 and qca4019).
> 
> This could be done in sta_state() itself via
> explicit synchronize_net() call but there's
> already a convenient sta_pre_rcu_remove() op that
> can be hooked up to avoid extra rcu stall.
> 
> The peer->sta pointer itself can't be set to
> NULL/ERR_PTR because it is later used in
> sta_state() for extra sanity checks.
> 
> Signed-off-by: Michal Kazior <michal.kazior@tieto.com>

Patch applied to ath-next branch of ath.git, thanks.

0a744d927406 ath10k: prevent sta pointer rcu violation
diff mbox

Patch

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index c7664d6569fa..1ab589296dff 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -314,6 +314,7 @@  struct ath10k_peer {
 	struct ieee80211_vif *vif;
 	struct ieee80211_sta *sta;
 
+	bool removed;
 	int vdev_id;
 	u8 addr[ETH_ALEN];
 	DECLARE_BITMAP(peer_ids, ATH10K_MAX_NUM_PEER_IDS);
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index d1b7edba5e49..aa91f55b35a4 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3774,6 +3774,9 @@  struct ieee80211_txq *ath10k_mac_txq_lookup(struct ath10k *ar,
 	if (!peer)
 		return NULL;
 
+	if (peer->removed)
+		return NULL;
+
 	if (peer->sta)
 		return peer->sta->txq[tid];
 	else if (peer->vif)
@@ -7476,6 +7479,20 @@  ath10k_mac_op_switch_vif_chanctx(struct ieee80211_hw *hw,
 	return 0;
 }
 
+static void ath10k_mac_op_sta_pre_rcu_remove(struct ieee80211_hw *hw,
+					     struct ieee80211_vif *vif,
+					     struct ieee80211_sta *sta)
+{
+	struct ath10k *ar;
+	struct ath10k_peer *peer;
+
+	ar = hw->priv;
+
+	list_for_each_entry(peer, &ar->peers, list)
+		if (peer->sta == sta)
+			peer->removed = true;
+}
+
 static const struct ieee80211_ops ath10k_ops = {
 	.tx				= ath10k_mac_op_tx,
 	.wake_tx_queue			= ath10k_mac_op_wake_tx_queue,
@@ -7516,6 +7533,7 @@  static const struct ieee80211_ops ath10k_ops = {
 	.assign_vif_chanctx		= ath10k_mac_op_assign_vif_chanctx,
 	.unassign_vif_chanctx		= ath10k_mac_op_unassign_vif_chanctx,
 	.switch_vif_chanctx		= ath10k_mac_op_switch_vif_chanctx,
+	.sta_pre_rcu_remove		= ath10k_mac_op_sta_pre_rcu_remove,
 
 	CFG80211_TESTMODE_CMD(ath10k_tm_cmd)