diff mbox series

[RFC] wifi: cfg80211: fix cqm_config access race

Message ID 20230813151828.ef56f5624c62.I1a1bb102329fc88e4712eaf394cba3025ada0dc7@changeid (mailing list archive)
State RFC
Delegated to: Johannes Berg
Headers show
Series [RFC] wifi: cfg80211: fix cqm_config access race | expand

Commit Message

Johannes Berg Aug. 13, 2023, 1:18 p.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

Max Schulze reports crashes with brcmfmac. The reason seems
to be a race between userspace removing the CQM config and
the driver calling cfg80211_cqm_rssi_notify(), where if the
data is freed while cfg80211_cqm_rssi_notify() runs it will
crash since it assumes wdev->cqm_config is set. This can't
be fixed with a simple non-NULL check since there's nothing
we can do for locking easily, so use RCU instead to protect
the pointer.

Since we need to change the free anyway, also change it to
go back to the old settings if changing the settings fails.

Fixes: 4a4b8169501b ("cfg80211: Accept multiple RSSI thresholds for CQM")
Closes: https://lore.kernel.org/r/ac96309a-8d8d-4435-36e6-6d152eb31876@online.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 include/net/cfg80211.h |  2 +-
 net/wireless/core.c    | 11 ++++-----
 net/wireless/core.h    |  3 +--
 net/wireless/nl80211.c | 52 ++++++++++++++++++++++++------------------
 4 files changed, 36 insertions(+), 32 deletions(-)

Comments

Max Schulze Aug. 15, 2023, 10:56 a.m. UTC | #1
Hello Johannes,

thanks for your patch.

While it works well in my lab setting, it crashes within minutes in the field. 

While the crashes look slightly different ("Unable to handle kernel pagign request"... descendant of is_swiotlb_active...) I think the notice beforehand is much more interesting: do you understand it?

: ------------[ cut here ]------------
: Voluntary context switch within RCU read-side critical section!
: WARNING: CPU: 0 PID: 319 at kernel/rcu/tree_plugin.h:320 rcu_note_context_switch+0x484/0x550
: Modules linked in: ov9281 rtc_pcf85063 regmap_i2c brcmfmac brcmutil v3d gpu_sched drm_shmem_helper cfg80211 gpio_keys i2c_mux_pinctrl i2c_mux vc4 rfkill raspberrypi_hwmon bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_unicam snd_soc_hdmi_codec v4l2_dv_timings bcm2835_mmal_vchiq(C) drm_display_helper rpivid_hevc(C) v4l2_fwnode v4l2_mem2mem videobuf2_vmalloc cec v4l2_async videobuf2_dma_contig i2c_brcmstb videobuf2_memops drm_dma_helper videobuf2_v4l2 i2c_bcm2835 drm_kms_helper snd_soc_core videobuf2_common videodev vc_sm_cma(C) snd_bcm2835(C) snd_compress snd_pcm_dmaengine snd_pcm snd_timer mc snd syscopyarea sysfillrect sysimgblt fb_sys_fops uio_pdrv_genirq nvmem_rmem uio drm drm_panel_orientation_quirks fuse backlight ip_tables x_tables ipv6
: CPU: 0 PID: 319 Comm: kworker/0:3 Tainted: G         C 6.1.45-v8-ged6bf58272ec #2
: Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
: Workqueue: events brcmf_fweh_event_worker [brcmfmac]
: pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
: pc : rcu_note_context_switch+0x484/0x550
: lr : rcu_note_context_switch+0x484/0x550
: sp : ffffffc008d337c0
: x29: ffffffc008d337c0 x28: ffffff804765be00 x27: 000000000000001b
: x26: 0000000000000001 x25: 0000000000000000 x24: ffffff804765be00
: x23: ffffff804765be00 x22: 0000000000000000 x21: 0000000000000000
: x20: ffffffdcc58f4f00 x19: ffffff807fb5ff00 x18: 00000000fffffffc
: x17: 2e2e2e2e2e202020 x16: ffffffdcc53b9bb0 x15: 0000000000000020
: x14: ffffffdcc5c65358 x13: 216e6f6974636573 x12: 206c616369746972
: x11: 00000000000021c8 x10: 0000000000002180 x9 : ffffffdcc48fc5c4
: x8 : ffffffdcc5c0c788 x7 : ffffffdcc5c64788 x6 : 0000000000005538
: x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000027
: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff804765be00
: Call trace:
:  rcu_note_context_switch+0x484/0x550
:  __schedule+0xc0/0xa10
:  schedule+0x60/0x100
:  schedule_timeout+0xa0/0x1c0
:  brcmf_sdio_bus_txctl+0xcc/0x1f4 [brcmfmac]
:  brcmf_proto_bcdc_msg+0xd4/0xf0 [brcmfmac]
:  brcmf_proto_bcdc_set_dcmd+0x88/0x124 [brcmfmac]
:  brcmf_fil_cmd_data+0x84/0x180 [brcmfmac]
:  brcmf_fil_iovar_data_set+0x11c/0x160 [brcmfmac]
:  brcmf_cfg80211_set_cqm_rssi_range_config+0xe4/0x130 [brcmfmac]
:  cfg80211_cqm_rssi_update+0x120/0x3f0 [cfg80211]
:  cfg80211_cqm_rssi_notify+0x78/0x1b4 [cfg80211]
:  brcmf_notify_rssi+0x10c/0x1b0 [brcmfmac]
:  brcmf_fweh_call_event_handler+0x40/0xa0 [brcmfmac]
:  brcmf_fweh_event_worker+0x1e4/0x4e0 [brcmfmac]
:  process_one_work+0x1dc/0x450
:  worker_thread+0x154/0x450
:  kthread+0x104/0x110
:  ret_from_fork+0x10/0x20
: ---[ end trace 0000000000000000 ]---
Johannes Berg Aug. 15, 2023, 11:02 a.m. UTC | #2
On Tue, 2023-08-15 at 12:56 +0200, Max Schulze wrote:
> Hello Johannes,
> 
> thanks for your patch.
> 
> While it works well in my lab setting, it crashes within minutes in the field. 
> 
> While the crashes look slightly different ("Unable to handle kernel pagign request"... descendant of is_swiotlb_active...) I think the notice beforehand is much more interesting: do you understand it?
> 
> : ------------[ cut here ]------------
> : Voluntary context switch within RCU read-side critical section!
[...]
> :  brcmf_sdio_bus_txctl+0xcc/0x1f4 [brcmfmac]
> :  brcmf_proto_bcdc_msg+0xd4/0xf0 [brcmfmac]
> :  brcmf_proto_bcdc_set_dcmd+0x88/0x124 [brcmfmac]
> :  brcmf_fil_cmd_data+0x84/0x180 [brcmfmac]
> :  brcmf_fil_iovar_data_set+0x11c/0x160 [brcmfmac]
> :  brcmf_cfg80211_set_cqm_rssi_range_config+0xe4/0x130 [brcmfmac]
> :  cfg80211_cqm_rssi_update+0x120/0x3f0 [cfg80211]
> :  cfg80211_cqm_rssi_notify+0x78/0x1b4 [cfg80211]
[...]

Oh, yeah, stupid me.

I did RCU protection around cfg80211_cqm_rssi_update() to have that
protected, but failed to realize that this will call back into the
driver too, which then promptly assumes it can sleep.

Well, OK, so this isn't how we can fix this.

That's really bad for multiple reasons though, because it also means we
call back into the driver from a driver call, which is generally not a
good idea since it can easily cause deadlocks.

Anyway, I guess I have to come up with something else. Thanks for
testing, and sorry I didn't realize that before.

johannes
diff mbox series

Patch

diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index d6fa7c8767ad..8de6e5c8c85e 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -6014,7 +6014,7 @@  struct wireless_dev {
 	} wext;
 #endif
 
-	struct cfg80211_cqm_config *cqm_config;
+	struct cfg80211_cqm_config __rcu *cqm_config;
 
 	struct list_head pmsr_list;
 	spinlock_t pmsr_lock;
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 25bc2e50a061..598da9451f0e 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1181,16 +1181,11 @@  void wiphy_rfkill_set_hw_state_reason(struct wiphy *wiphy, bool blocked,
 }
 EXPORT_SYMBOL(wiphy_rfkill_set_hw_state_reason);
 
-void cfg80211_cqm_config_free(struct wireless_dev *wdev)
-{
-	kfree(wdev->cqm_config);
-	wdev->cqm_config = NULL;
-}
-
 static void _cfg80211_unregister_wdev(struct wireless_dev *wdev,
 				      bool unregister_netdev)
 {
 	struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy);
+	struct cfg80211_cqm_config *cqm_config;
 	unsigned int link_id;
 
 	ASSERT_RTNL();
@@ -1227,7 +1222,9 @@  static void _cfg80211_unregister_wdev(struct wireless_dev *wdev,
 	kfree_sensitive(wdev->wext.keys);
 	wdev->wext.keys = NULL;
 #endif
-	cfg80211_cqm_config_free(wdev);
+	/* deleted from the list, so can't be found from nl80211 any more */
+	cqm_config = rcu_access_pointer(wdev->cqm_config);
+	kfree_rcu(cqm_config, rcu_head);
 
 	/*
 	 * Ensure that all events have been processed and
diff --git a/net/wireless/core.h b/net/wireless/core.h
index 8a807b609ef7..62a91aa694a7 100644
--- a/net/wireless/core.h
+++ b/net/wireless/core.h
@@ -295,6 +295,7 @@  struct cfg80211_beacon_registration {
 };
 
 struct cfg80211_cqm_config {
+	struct rcu_head rcu_head;
 	u32 rssi_hyst;
 	s32 last_rssi_event_value;
 	int n_rssi_thresholds;
@@ -566,8 +567,6 @@  cfg80211_bss_update(struct cfg80211_registered_device *rdev,
 #define CFG80211_DEV_WARN_ON(cond)	({bool __r = (cond); __r; })
 #endif
 
-void cfg80211_cqm_config_free(struct wireless_dev *wdev);
-
 void cfg80211_release_pmsr(struct wireless_dev *wdev, u32 portid);
 void cfg80211_pmsr_wdev_down(struct wireless_dev *wdev);
 void cfg80211_pmsr_free_wk(struct work_struct *work);
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 8bcf8e293308..4f79c3543aed 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -12796,7 +12796,8 @@  static int nl80211_set_cqm_txe(struct genl_info *info,
 }
 
 static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
-				    struct net_device *dev)
+				    struct net_device *dev,
+				    struct cfg80211_cqm_config *cqm_config)
 {
 	struct wireless_dev *wdev = dev->ieee80211_ptr;
 	s32 last, low, high;
@@ -12805,7 +12806,7 @@  static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
 	int err;
 
 	/* RSSI reporting disabled? */
-	if (!wdev->cqm_config)
+	if (!cqm_config)
 		return rdev_set_cqm_rssi_range_config(rdev, dev, 0, 0);
 
 	/*
@@ -12814,7 +12815,7 @@  static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
 	 * connection is established and enough beacons received to calculate
 	 * the average.
 	 */
-	if (!wdev->cqm_config->last_rssi_event_value &&
+	if (!cqm_config->last_rssi_event_value &&
 	    wdev->links[0].client.current_bss &&
 	    rdev->ops->get_station) {
 		struct station_info sinfo = {};
@@ -12828,30 +12829,30 @@  static int cfg80211_cqm_rssi_update(struct cfg80211_registered_device *rdev,
 
 		cfg80211_sinfo_release_content(&sinfo);
 		if (sinfo.filled & BIT_ULL(NL80211_STA_INFO_BEACON_SIGNAL_AVG))
-			wdev->cqm_config->last_rssi_event_value =
+			cqm_config->last_rssi_event_value =
 				(s8) sinfo.rx_beacon_signal_avg;
 	}
 
-	last = wdev->cqm_config->last_rssi_event_value;
-	hyst = wdev->cqm_config->rssi_hyst;
-	n = wdev->cqm_config->n_rssi_thresholds;
+	last = cqm_config->last_rssi_event_value;
+	hyst = cqm_config->rssi_hyst;
+	n = cqm_config->n_rssi_thresholds;
 
 	for (i = 0; i < n; i++) {
 		i = array_index_nospec(i, n);
-		if (last < wdev->cqm_config->rssi_thresholds[i])
+		if (last < cqm_config->rssi_thresholds[i])
 			break;
 	}
 
 	low_index = i - 1;
 	if (low_index >= 0) {
 		low_index = array_index_nospec(low_index, n);
-		low = wdev->cqm_config->rssi_thresholds[low_index] - hyst;
+		low = cqm_config->rssi_thresholds[low_index] - hyst;
 	} else {
 		low = S32_MIN;
 	}
 	if (i < n) {
 		i = array_index_nospec(i, n);
-		high = wdev->cqm_config->rssi_thresholds[i] + hyst - 1;
+		high = cqm_config->rssi_thresholds[i] + hyst - 1;
 	} else {
 		high = S32_MAX;
 	}
@@ -12864,6 +12865,7 @@  static int nl80211_set_cqm_rssi(struct genl_info *info,
 				u32 hysteresis)
 {
 	struct cfg80211_registered_device *rdev = info->user_ptr[0];
+	struct cfg80211_cqm_config *cqm_config = NULL, *old;
 	struct net_device *dev = info->user_ptr[1];
 	struct wireless_dev *wdev = dev->ieee80211_ptr;
 	int i, err;
@@ -12881,10 +12883,6 @@  static int nl80211_set_cqm_rssi(struct genl_info *info,
 	    wdev->iftype != NL80211_IFTYPE_P2P_CLIENT)
 		return -EOPNOTSUPP;
 
-	wdev_lock(wdev);
-	cfg80211_cqm_config_free(wdev);
-	wdev_unlock(wdev);
-
 	if (n_thresholds <= 1 && rdev->ops->set_cqm_rssi_config) {
 		if (n_thresholds == 0 || thresholds[0] == 0) /* Disabling */
 			return rdev_set_cqm_rssi_config(rdev, dev, 0, 0);
@@ -12901,9 +12899,9 @@  static int nl80211_set_cqm_rssi(struct genl_info *info,
 		n_thresholds = 0;
 
 	wdev_lock(wdev);
+	old = rcu_dereference_protected(wdev->cqm_config,
+					lockdep_is_held(&wdev->mtx));
 	if (n_thresholds) {
-		struct cfg80211_cqm_config *cqm_config;
-
 		cqm_config = kzalloc(struct_size(cqm_config, rssi_thresholds,
 						 n_thresholds),
 				     GFP_KERNEL);
@@ -12918,10 +12916,16 @@  static int nl80211_set_cqm_rssi(struct genl_info *info,
 		       flex_array_size(cqm_config, rssi_thresholds,
 				       n_thresholds));
 
-		wdev->cqm_config = cqm_config;
+		rcu_assign_pointer(wdev->cqm_config, cqm_config);
 	}
 
-	err = cfg80211_cqm_rssi_update(rdev, dev);
+	err = cfg80211_cqm_rssi_update(rdev, dev, cqm_config);
+	if (err) {
+		rcu_assign_pointer(wdev->cqm_config, old);
+		kfree(cqm_config);
+	} else {
+		kfree_rcu(old, rcu_head);
+	}
 
 unlock:
 	wdev_unlock(wdev);
@@ -19076,6 +19080,7 @@  void cfg80211_cqm_rssi_notify(struct net_device *dev,
 	struct sk_buff *msg;
 	struct wireless_dev *wdev = dev->ieee80211_ptr;
 	struct cfg80211_registered_device *rdev = wiphy_to_rdev(wdev->wiphy);
+	struct cfg80211_cqm_config *cqm_config;
 
 	trace_cfg80211_cqm_rssi_notify(dev, rssi_event, rssi_level);
 
@@ -19083,14 +19088,17 @@  void cfg80211_cqm_rssi_notify(struct net_device *dev,
 		    rssi_event != NL80211_CQM_RSSI_THRESHOLD_EVENT_HIGH))
 		return;
 
-	if (wdev->cqm_config) {
-		wdev->cqm_config->last_rssi_event_value = rssi_level;
+	rcu_read_lock();
+	cqm_config = rcu_dereference(wdev->cqm_config);
+	if (cqm_config) {
+		cqm_config->last_rssi_event_value = rssi_level;
 
-		cfg80211_cqm_rssi_update(rdev, dev);
+		cfg80211_cqm_rssi_update(rdev, dev, cqm_config);
 
 		if (rssi_level == 0)
-			rssi_level = wdev->cqm_config->last_rssi_event_value;
+			rssi_level = cqm_config->last_rssi_event_value;
 	}
+	rcu_read_unlock();
 
 	msg = cfg80211_prepare_cqm(dev, NULL, gfp);
 	if (!msg)