diff mbox

[RFC,v2,RFC] mac80211: lock rate control

Message ID 1425575237-5556-1-git-send-email-johannes@sipsolutions.net (mailing list archive)
State RFC
Headers show

Commit Message

Johannes Berg March 5, 2015, 5:07 p.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

Both minstrel (reported by Sven Eckelmann) and the iwlwifi rate
control aren't properly taking concurrency into account. It's
likely that the same is true for other rate control algorithms.

In the case of minstrel this manifests itself in crashes when an
update and other data access are run concurrently, for example
when the stations change bandwidth or similar. In iwlwifi, this
can cause firmware crashes.

Since fixing all rate control algorithms will be very difficult,
just provide locking for invocations. This protects the internal
data structures the algorithms maintain.

I've manipulated hostapd to test this, by having it change its
advertised bandwidth roughly ever 150ms. At the same time, I'm
running a flood ping between the client and the AP, which causes
this race of update vs. get_rate/status to easily happen on the
client. With this change, the system survives this test.

Change-Id: I77383401a67a1ec380cec65b34802ab879357e80
Reported-by: Sven Eckelmann <sven@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/rate.c     |  8 +++++++-
 net/mac80211/rate.h     | 12 +++++++++---
 net/mac80211/sta_info.c |  2 +-
 net/mac80211/sta_info.h |  1 +
 4 files changed, 18 insertions(+), 5 deletions(-)

Comments

Sven Eckelmann March 6, 2015, 10:47 a.m. UTC | #1
On Thursday 05 March 2015 18:07:17 Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> Both minstrel (reported by Sven Eckelmann) and the iwlwifi rate
> control aren't properly taking concurrency into account. It's
> likely that the same is true for other rate control algorithms.
> 
> In the case of minstrel this manifests itself in crashes when an
> update and other data access are run concurrently, for example
> when the stations change bandwidth or similar. In iwlwifi, this
> can cause firmware crashes.
> 
> Since fixing all rate control algorithms will be very difficult,
> just provide locking for invocations. This protects the internal
> data structures the algorithms maintain.
> 
> I've manipulated hostapd to test this, by having it change its
> advertised bandwidth roughly ever 150ms. At the same time, I'm
> running a flood ping between the client and the AP, which causes
> this race of update vs. get_rate/status to easily happen on the
> client. With this change, the system survives this test.
> 
> Change-Id: I77383401a67a1ec380cec65b34802ab879357e80
> Reported-by: Sven Eckelmann <sven@open-mesh.com>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---

Thanks for the second patch. I got a message today that the patched wifi 
driver was tested at least on one network and it still crashes. You can see 
the attached logs but they are mostly the same as before - just with some 
addresses changed due to the patch.

I've looked through the code and didn't find a situation where the tx_status 
function pointer is called without the lock. Same for the rate_update. But 
there is something in minstrel_ht which you most likely don't have in the 
iwlwifi code:

 minstrel_ht_update_caps is called by rate_update and rate_init

My hypothesis would be that rate_init is called for a sta during tx_status and 
this causes one of the crashes. This sounds a little bit weird but at least 
one thing that can be checked. I am not sure in which state the device is but 
it is most likely running AP vifs and one IBSS vif.

The function rate_control_rate_init is called by the IBSS code (in my OpenWrt 
compat-wireless 2014-11-04 snapshot) in ieee80211_ibss_finish_sta and 
ieee80211_rx_bss_info after an rates_updated. At least the latter one sounds 
suspicious.

I will just add following modification to your patch and ask for a new test. 
But feel free to point out any problem in my hypothesis.


@@ -69,8 +71,10 @@ static inline void rate_control_rate_ini
 
 	sband = local->hw.wiphy->bands[chanctx_conf->def.chan->band];
 
+	spin_lock_bh(&sta->rate_ctrl_lock);
 	ref->ops->rate_init(ref->priv, sband, &chanctx_conf->def, ista,
 			    priv_sta);
+	spin_unlock_bh(&sta->rate_ctrl_lock);
 	rcu_read_unlock();
 	set_sta_flag(sta, WLAN_STA_RATE_CONTROL);
 }

Kind regards,
	Sven
Before applying the patch


[ 4161.700000] CPU 0 Unable to handle kernel paging request at virtual address 00000048, epc == 83b93c24, ra == 83b81fa8
[ 4161.710000] Oops[#1]:
[ 4161.710000] Cpu 0
[ 4161.710000] $ 0   : 00000000 00000000 ffffffff 00000012
[ 4161.710000] $ 4   : 83b5cac0 00000000 82eff580 82eff592
[ 4161.710000] $ 8   : 00000000 83bcda00 00000008 0000000f
[ 4161.710000] $12   : 00004000 00000001 00000000 00000001
[ 4161.710000] $16   : 83248000 83b5cac0 83394840 8212b030
[ 4161.710000] $20   : 00000001 00000001 00000800 83394843
[ 4161.710000] $24   : 00000000 00000000                  
[ 4161.710000] $28   : 82d5c000 82d5d920 04208060 83b81fa8
[ 4161.710000] Hi    : 00000000
[ 4161.710000] Lo    : 00000000
[ 4161.710000] epc   : 83b93c24 rate_control_set_rates+0x8/0x120 [mac80211]
[ 4161.710000]     Tainted: P           O
[ 4161.710000] ra    : 83b81fa8 ieee80211_tx_status+0x400/0xde8 [mac80211]
[ 4161.710000] Status: 1100f403    KERNEL EXL IE 
[ 4161.710000] Cause : 00802008
[ 4161.710000] BadVA : 00000048
[ 4161.710000] PrId  : 0001974c (MIPS 74Kc)
[ 4161.710000] Modules linked in: sch_teql sch_tbf sch_sfq sch_red sch_prio sch_netem sch_htb sch_gred sch_dsmark em_text em_nbyte em_meta em_cmp cls_basic act_police act_ipt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_fq_codel sch_codel sch_ingress tmp421 ath10k_pci(O) ath10k_core(O) UDSMARK(O) udsmac(O) i2c_dev tc_classid_mapper(PO) filter_group(O) ledtrig_netdev classifier_dumper(PO) classifier_dns(PO) classifier_netblock(PO) classifier_bittorrent(PO) classifier_rtmp(PO) classifier_ssl(PO) classifier_content(PO) classifier_skype(PO) kernel_classifier(O) batman_adv(O) ip6t_REJECT ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah ip6t_NPT ip6t_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_raw ip6table_mangle ip6table_filter ip6_tables nf_conntrack_ipv6 nf_defrag_ipv6 xt_IMQ imq nf_nat_tftp nf_conntrack_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_h323 nf_conntrack_h323 nf_nat_proto_gre nf_conntrack_proto_gre nf_nat_amanda nf_conntrack_amanda nf_conntrack_broadcast nf_nat_irc nf_nat_ftp nf_conntrack_irc nf_conntrack_ftp xt_HL xt_hl xt_ecn ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_mark xt_length xt_DSCP xt_dscp xt_string xt_layer7 xt_quota xt_pkttype xt_physdev xt_owner xt_addrtype xt_REDIRECT xt_NETMAP ipt_MASQUERADE iptable_nat xt_nat nf_nat_ipv4 nf_nat xt_recent xt_helper xt_connmark xt_connbytes xt_conntrack xt_CT iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ipt_REJECT xt_TCPMSS xt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables hwmon ifb tun button_hotplug(O) ath9k(O) ath9k_common(O) ath9k_hw(O) ath(O) mac80211(O) ts_fsm ts_bm ts_kmp libcrc32c crc16 crc_ccitt input_polldev cfg80211(O) compat(O) input_core ag71xx arc4 crc32c crypto_blkcipher aead crypto_hash ledtrig_timer ledtrig_default_on leds_gpio gpio_button_hotplug(O)
[ 4161.710000] Process kworker/u:2 (pid: 15261, threadinfo=82d5c000, task=822ba140, tls=00000000)
[ 4161.710000] Stack : 83248000 83248000 83b5cac0 83394840 8212b030 83b81fa8 00000001 80000000
        82d5da08 00000001 83394840 82d5d964 8029ee8c 80000000 8032b714 83394480
        8032beb8 00000000 00000000 00000001 8388212c 83093042 8392b800 82d5da20
        803b0000 80340000 83b5d6a0 82d5d9c0 83b7ef30 83b5d6a0 83882010 00100100
        00200200 803a0000 04208060 83b48728 00000000 00000000 83b5e0a8 83b5e068
        ...
[ 4161.710000] Call Trace:
[ 4161.710000] [<83b93c24>] rate_control_set_rates+0x8/0x120 [mac80211]
[ 4161.710000] [<83b81fa8>] ieee80211_tx_status+0x400/0xde8 [mac80211]
[ 4161.710000] [<83b48728>] ath_txq_unlock_complete+0xa8/0xc0 [ath9k]
[ 4161.710000] [<83b4baec>] ath_tx_edma_tasklet+0x288/0x2b8 [ath9k]
[ 4161.710000] [<83b447f0>] ath9k_tasklet+0x264/0x2c8 [ath9k]
[ 4161.710000] [<8007e31c>] tasklet_action+0x78/0xc8
[ 4161.710000] [<8007db6c>] __do_softirq+0xb0/0x184
[ 4161.710000] [<8007dcf0>] do_softirq+0x48/0x68
[ 4161.710000] [<8007df0c>] irq_exit+0x4c/0x80
[ 4161.710000] [<8006082c>] ret_from_irq+0x0/0x4
[ 4161.710000] [<80064800>] __bzero+0x44/0x164
[ 4161.710000] [<83bc8208>] minstrel_remove_sta_debugfs+0x11ec/0x1aa0 [mac80211]
[ 4161.710000] 
[ 4161.710000] 
Code: 27bd0020  27bdffe8  afbf0014 <8ca40048> 10800003  aca60048  0c02c21a  00002821  8fbf0014 
[ 4162.100000] ---[ end trace 0873425f4a654b6a ]---



--------------------------------------------------------------------------------

After applying the patch

CPU 0 Unable to handle kernel paging request at virtual address 00000048, epc == 83b93c84, ra == 83b81fb4
[  846.030000] Oops[#1]:
[  846.030000] Cpu 0
[  846.030000] $ 0   : 00000000 00000000 ffffffff 00000012
[  846.030000] $ 4   : 83b68ac0 00000000 81d99200 81d99212
[  846.030000] $ 8   : 00000000 83bcdb20 00000008 0000000f
[  846.030000] $12   : 00004000 00000001 00000000 00000000
[  846.030000] $16   : 83a66000 83b68ac0 83b3f9c0 82e50040
[  846.030000] $20   : 00000000 00000001 0000b000 83b3f9c0
[  846.030000] $24   : 00000000 00000000                  
[  846.030000] $28   : 83838000 83839920 83bdb000 83b81fb4
[  846.030000] Hi    : 00000000
[  846.030000] Lo    : 00000000
[  846.030000] epc   : 83b93c84 rate_control_set_rates+0x8/0x120 [mac80211]
[  846.030000]     Tainted: P           O
[  846.030000] ra    : 83b81fb4 ieee80211_tx_status+0x40c/0xe00 [mac80211]
[  846.030000] Status: 1100f403    KERNEL EXL IE 
[  846.030000] Cause : 00800008
[  846.030000] BadVA : 00000048
[  846.030000] PrId  : 0001974c (MIPS 74Kc)
[  846.030000] Modules linked in: sch_teql sch_tbf sch_sfq sch_red sch_prio sch_netem sch_htb sch_gred sch_dsmark em_text em_nbyte em_meta em_cmp cls_basic act_police act_ipt act_skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_fq_codel sch_codel sch_ingress tmp421 ath10k_pci(O) ath10k_core(O) UDSMARK(O) udsmac(O) i2c_dev tc_classid_mapper(PO) filter_group(O) ledtrig_netdev classifier_dumper(PO) classifier_dns(PO) classifier_netblock(PO) classifier_bittorrent(PO) classifier_rtmp(PO) classifier_ssl(PO) classifier_content(PO) classifier_skype(PO) kernel_classifier(O) batman_adv(O) ip6t_REJECT ip6t_rt ip6t_hbh ip6t_mh ip6t_ipv6header ip6t_frag ip6t_eui64 ip6t_ah ip6t_NPT ip6t_MASQUERADE ip6table_nat nf_nat_ipv6 ip6table_raw ip6table_mangle ip6table_filter ip6_tables nf_conntrack_ipv6 nf_defrag_ipv6 xt_IMQ imq nf_nat_tftp nf_conntrack_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_h323 nf_conntrack_h323 nf_nat_proto_gre nf_conntrack_proto_gre nf_nat_amanda nf_conntrack_amanda nf_conntrack_broadcast nf_nat_irc nf_nat_ftp nf_conntrack_irc nf_conntrack_ftp xt_HL xt_hl xt_ecn ipt_ECN xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_mark xt_length xt_DSCP xt_dscp xt_string xt_layer7 xt_quota xt_pkttype xt_physdev xt_owner xt_addrtype xt_REDIRECT xt_NETMAP ipt_MASQUERADE iptable_nat xt_nat nf_nat_ipv4 nf_nat xt_recent xt_helper xt_connmark xt_connbytes xt_conntrack xt_CT iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ipt_REJECT xt_TCPMSS xt_LOG xt_comment xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables hwmon ifb tun button_hotplug(O) ath9k(O) ath9k_common(O) ath9k_hw(O) ath(O) mac80211(O) ts_fsm ts_bm ts_kmp libcrc32c crc16 crc_ccitt input_polldev cfg80211(O) compat(O) input_core ag71xx arc4 crc32c crypto_blkcipher aead crypto_hash ledtrig_timer ledtrig_default_on leds_gpio gpio_button_hotplug(O)
[  846.030000] Process kworker/u:0 (pid: 6, threadinfo=83838000, task=8381d4c8, tls=00000000)
[  846.030000] Stack : 83a66000 83a66000 83b68ac0 83b3f9c0 82e50040 83b81fb4 822d62c0 801e947c
        83839a08 00000001 83b3f9c0 00000000 838399a0 83b947a0 822d62c0 00000000
        83886000 00000000 00000000 00000000 8388212c 00000000 00000089 00001002
        022d630e 00000000 83b696a0 838399c0 83b65250 83b696a0 83882010 00100100
        00200200 803a0000 04208060 83b48728 00000000 00000000 83b6a170 83b6a150
        ...
[  846.030000] Call Trace:
[  846.030000] [<83b93c84>] rate_control_set_rates+0x8/0x120 [mac80211]
[  846.030000] [<83b81fb4>] ieee80211_tx_status+0x40c/0xe00 [mac80211]
[  846.030000] [<83b48728>] ath_txq_unlock_complete+0xa8/0xc0 [ath9k]
[  846.030000] [<83b4baec>] ath_tx_edma_tasklet+0x288/0x2b8 [ath9k]
[  846.030000] [<83b447f0>] ath9k_tasklet+0x264/0x2c8 [ath9k]
[  846.030000] [<8007e31c>] tasklet_action+0x78/0xc8
[  846.030000] [<8007db6c>] __do_softirq+0xb0/0x184
[  846.030000] [<8007dcf0>] do_softirq+0x48/0x68
[  846.030000] [<8007df0c>] irq_exit+0x4c/0x80
[  846.030000] [<8006082c>] ret_from_irq+0x0/0x4
[  846.030000] [<80064804>] __bzero+0x48/0x164
[  846.030000] [<83bc8328>] minstrel_remove_sta_debugfs+0x11ec/0x1aa0 [mac80211]
[  846.030000] 
[  846.030000] 
Code: 27bd0020  27bdffe8  afbf0014 <8ca40048> 10800003  aca60048  0c02c21a  00002821  8fbf0014 
[  846.410000] ---[ end trace 36e2ac14823a5880 ]---
Johannes Berg March 6, 2015, 10:55 a.m. UTC | #2
On Fri, 2015-03-06 at 11:47 +0100, Sven Eckelmann wrote:

> The function rate_control_rate_init is called by the IBSS code (in my OpenWrt 
> compat-wireless 2014-11-04 snapshot) in ieee80211_ibss_finish_sta and 
> ieee80211_rx_bss_info after an rates_updated. At least the latter one sounds 
> suspicious.

Yeah, I forgot IBSS is doing strange things here.

> I will just add following modification to your patch and ask for a new test. 

Let me know how that goes :)

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sven Eckelmann March 9, 2015, 8:32 a.m. UTC | #3
On Friday 06 March 2015 11:55:48 Johannes Berg wrote:
> > I will just add following modification to your patch and ask for a new
> > test.
>
> Let me know how that goes :)

I got a reply and they informed me that the test looked promising. The test 
setup didn't crash and was running stable for hours. It looks like the extra 
locking in rate_control_rate_init around ref->ops->rate_init was the missing 
part in your patch that was necessary fix the problem for them.

Kind regards,
	Sven
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg March 9, 2015, 8:35 a.m. UTC | #4
On Mon, 2015-03-09 at 09:32 +0100, Sven Eckelmann wrote:
> On Friday 06 March 2015 11:55:48 Johannes Berg wrote:
> > > I will just add following modification to your patch and ask for a new
> > > test.
> >
> > Let me know how that goes :)
> 
> I got a reply and they informed me that the test looked promising. The test 
> setup didn't crash and was running stable for hours. It looks like the extra 
> locking in rate_control_rate_init around ref->ops->rate_init was the missing 
> part in your patch that was necessary fix the problem for them.

Great, thanks for testing.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
index bc5270ed26bd..68bed2cad6f3 100644
--- a/net/mac80211/rate.c
+++ b/net/mac80211/rate.c
@@ -683,7 +683,13 @@  void rate_control_get_rate(struct ieee80211_sub_if_data *sdata,
 	if (sdata->local->hw.flags & IEEE80211_HW_HAS_RATE_CONTROL)
 		return;
 
-	ref->ops->get_rate(ref->priv, ista, priv_sta, txrc);
+	if (ista) {
+		spin_lock_bh(&sta->rate_ctrl_lock);
+		ref->ops->get_rate(ref->priv, ista, priv_sta, txrc);
+		spin_unlock_bh(&sta->rate_ctrl_lock);
+	} else {
+		ref->ops->get_rate(ref->priv, NULL, NULL, txrc);
+	}
 
 	if (sdata->local->hw.flags & IEEE80211_HW_SUPPORTS_RC_TABLE)
 		return;
diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
index a7d5439322d7..d0ecc8952fa0 100644
--- a/net/mac80211/rate.h
+++ b/net/mac80211/rate.h
@@ -42,10 +42,12 @@  static inline void rate_control_tx_status(struct ieee80211_local *local,
 	if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
 		return;
 
+	spin_lock_bh(&sta->rate_ctrl_lock);
 	if (ref->ops->tx_status)
 		ref->ops->tx_status(ref->priv, sband, ista, priv_sta, skb);
 	else
 		ref->ops->tx_status_noskb(ref->priv, sband, ista, priv_sta, info);
+	spin_unlock_bh(&sta->rate_ctrl_lock);
 }
 
 static inline void
@@ -64,7 +66,9 @@  rate_control_tx_status_noskb(struct ieee80211_local *local,
 	if (WARN_ON_ONCE(!ref->ops->tx_status_noskb))
 		return;
 
+	spin_lock_bh(&sta->rate_ctrl_lock);
 	ref->ops->tx_status_noskb(ref->priv, sband, ista, priv_sta, info);
+	spin_unlock_bh(&sta->rate_ctrl_lock);
 }
 
 static inline void rate_control_rate_init(struct sta_info *sta)
@@ -115,18 +119,20 @@  static inline void rate_control_rate_update(struct ieee80211_local *local,
 			return;
 		}
 
+		spin_lock_bh(&sta->rate_ctrl_lock);
 		ref->ops->rate_update(ref->priv, sband, &chanctx_conf->def,
 				      ista, priv_sta, changed);
+		spin_unlock_bh(&sta->rate_ctrl_lock);
 		rcu_read_unlock();
 	}
 	drv_sta_rc_update(local, sta->sdata, &sta->sta, changed);
 }
 
 static inline void *rate_control_alloc_sta(struct rate_control_ref *ref,
-					   struct ieee80211_sta *sta,
-					   gfp_t gfp)
+					   struct sta_info *sta, gfp_t gfp)
 {
-	return ref->ops->alloc_sta(ref->priv, sta, gfp);
+	spin_lock_init(&sta->rate_ctrl_lock);
+	return ref->ops->alloc_sta(ref->priv, &sta->sta, gfp);
 }
 
 static inline void rate_control_free_sta(struct sta_info *sta)
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 11c1e71c833d..eab8d8afab13 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -292,7 +292,7 @@  static int sta_prepare_rate_control(struct ieee80211_local *local,
 
 	sta->rate_ctrl = local->rate_ctrl;
 	sta->rate_ctrl_priv = rate_control_alloc_sta(sta->rate_ctrl,
-						     &sta->sta, gfp);
+						     sta, gfp);
 	if (!sta->rate_ctrl_priv)
 		return -ENOMEM;
 
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 36a14a0be6dd..b53c6ca3179a 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -396,6 +396,7 @@  struct sta_info {
 	u8 ptk_idx;
 	struct rate_control_ref *rate_ctrl;
 	void *rate_ctrl_priv;
+	spinlock_t rate_ctrl_lock;
 	spinlock_t lock;
 
 	struct work_struct drv_deliver_wk;