Message ID | 1508237298.10607.76.camel@sipsolutions.net (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Johannes Berg |
Headers | show |
On Tue, Oct 17, 2017 at 6:48 PM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Tue, 2017-10-17 at 18:18 +0800, Jesse Sung wrote: > >> > Does mwifiex treat this -EALREADY as *keeping* an old connection, >> > or tearing it down entirely? >> >> From the call trace: > > Well, the call trace can't really answer that :-) > Does mwifiex firmware stay connected? Sorry I don't know how to tell firmware's state. @@ >> 139.451318: nl80211_get_valid_chan <-nl80211_connect >> 139.451321: cfg80211_connect <-nl80211_connect >> 139.451322: cfg80211_oper_and_ht_capa <-cfg80211_connect >> 139.451323: mwifiex_cfg80211_connect <-cfg80211_connect >> 139.451337: nl80211_post_doit <-genl_family_rcv_msg >> 139.451423: nl80211_pre_doit <-genl_family_rcv_msg >> 139.451425: nl80211_disconnect <-genl_family_rcv_msg >> 139.451426: cfg80211_disconnect <-nl80211_disconnect >> 139.451430: mwifiex_cfg80211_disconnect <-cfg80211_disconnect >> >> mwifiex_cfg80211_disconnect() would be called after >> mwifiex_cfg80211_connect(), though I'm not sure if it's triggered by >> the error returned. > > I think so - it's probably wpa_supplicant trying to get back to a well- > known state (of being disconnected). > >> > I think your fix is invalid because we then reset ssid_len and >> > still >> > keep an old connection (current_bss) which will lead to strange >> > nl80211 >> > behaviour when getting interface data etc. >> >> Since this is how it works before commit 0711d638 (use current_bss >> instead of ssid_len), so I'm guessing this would still work. But I >> agree that this may not be a proper fix... > > It would probably work, but we get data inconsistencies, and at the > very least you get no SSID data back when you query the current state. > > I don't see anything in nl80211 or so that would say we should accept a > connect() while already connected, so how about this? > > diff --git a/net/wireless/sme.c b/net/wireless/sme.c > index b347e63d7aaa..fe0037ad1f5e 100644 > --- a/net/wireless/sme.c > +++ b/net/wireless/sme.c > @@ -1042,6 +1042,9 @@ int cfg80211_connect(struct cfg80211_registered_device *rdev, > > ASSERT_WDEV_LOCK(wdev); > > + if (wdev->current_bss) > + return -EALREADY; > + > if (WARN_ON(wdev->connect_keys)) { > kzfree(wdev->connect_keys); > wdev->connect_keys = NULL; > > Not really quite sure about it yet, but that should address the issue? A very rough test by issuing reassociate in wpa_cli: mwifiex works after reassociate - looks good > reassociate OK <3>CTRL-EVENT-SCAN-STARTED <3>CTRL-EVENT-SCAN-RESULTS <3>Trying to associate with <BSSID1> (SSID=<SSID> freq=<freq1>) <3>Association request to the driver failed <3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD <3>CTRL-EVENT-SCAN-STARTED <3>CTRL-EVENT-SCAN-RESULTS <3>WPS-AP-AVAILABLE <3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>) <3>Associated with <BSSID2> <3>WPA: Key negotiation completed with <BSSID2> [PTK=CCMP GTK=CCMP] <3>CTRL-EVENT-CONNECTED - Connection to <BSSID2> completed [id=0 id_str=] <3>CTRL-EVENT-SCAN-STARTED > reassociate OK <3>CTRL-EVENT-SCAN-RESULTS <3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>) <3>Associated with <BSSID2> <3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD <3>WPA: Key negotiation completed with <BSSID2> [PTK=CCMP GTK=CCMP] <3>CTRL-EVENT-CONNECTED - Connection to <BSSID2> completed [id=0 id_str=] > reassociate OK <3>CTRL-EVENT-SCAN-STARTED > <3>CTRL-EVENT-SCAN-RESULTS <3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>) <3>Association request to the driver failed <3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD <3>CTRL-EVENT-SCAN-STARTED <3>CTRL-EVENT-SCAN-RESULTS <3>WPS-AP-AVAILABLE <3>Trying to associate with <BSSID3> (SSID=<SSID> freq=<freq3>) <3>Associated with <BSSID3> <3>WPA: Key negotiation completed with <BSSID3> [PTK=CCMP GTK=CCMP] <3>CTRL-EVENT-CONNECTED - Connection to <BSSID3> completed [id=0 id_str=] > reassociate OK <3>CTRL-EVENT-SCAN-STARTED <3>CTRL-EVENT-SCAN-RESULTS <3>Trying to associate with <BSSID3> (SSID=<SSID> freq=<freq3>) <3>Association request to the driver failed <3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD <3>CTRL-EVENT-SCAN-STARTED <3>CTRL-EVENT-SCAN-RESULTS <3>WPS-AP-AVAILABLE <3>Trying to associate with <BSSID1> (SSID=<SSID> freq=<freq1>) <3>Associated with <BSSID1> <3>WPA: Key negotiation completed with <BSSID1> [PTK=CCMP GTK=CCMP] <3>CTRL-EVENT-CONNECTED - Connection to <BSSID1> completed [id=0 id_str=] <3>CTRL-EVENT-SCAN-STARTED <3>CTRL-EVENT-SCAN-RESULTS <3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>) <3>Association request to the driver failed <3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD <3>CTRL-EVENT-SCAN-STARTED <3>CTRL-EVENT-SCAN-RESULTS <3>WPS-AP-AVAILABLE <3>Trying to associate with <BSSID1> (SSID=<SSID> freq=<freq1>) <3>Associated with <BSSID1> <3>WPA: Key negotiation completed with <BSSID1> [PTK=CCMP GTK=CCMP] <3>CTRL-EVENT-CONNECTED - Connection to <BSSID1> completed [id=0 id_str=] Thanks, Jesse
On Tue, 2017-10-17 at 21:07 +0800, Jesse Sung wrote: > > > Not really quite sure about it yet, but that should address the > > issue? > > A very rough test by issuing reassociate in wpa_cli: > mwifiex works after reassociate - looks good Ok. Discussing this with Ilan, we realized that this was buggy, and came up with this patch instead: https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt Can you also give that a spin? johannes
On Tue, Oct 17, 2017 at 9:13 PM, Johannes Berg <johannes@sipsolutions.net> wrote: > > On Tue, 2017-10-17 at 21:07 +0800, Jesse Sung wrote: > > > > > Not really quite sure about it yet, but that should address the > > > issue? > > > > A very rough test by issuing reassociate in wpa_cli: > > mwifiex works after reassociate - looks good > > Ok. Discussing this with Ilan, we realized that this was buggy, and > came up with this patch instead: > > https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt > > Can you also give that a spin? Issued four reassociate, the network interface is still alive. Also with this patch it stays with BSSID1 and I don't see any "Association request to the driver failed" in the process. Thanks, Jesse
On Tue, Oct 17, 2017 at 10:08 PM, Jesse Sung <jesse.sung@canonical.com> wrote: > On Tue, Oct 17, 2017 at 9:13 PM, Johannes Berg > <johannes@sipsolutions.net> wrote: >> >> On Tue, 2017-10-17 at 21:07 +0800, Jesse Sung wrote: >> > >> > > Not really quite sure about it yet, but that should address the >> > > issue? >> > >> > A very rough test by issuing reassociate in wpa_cli: >> > mwifiex works after reassociate - looks good >> >> Ok. Discussing this with Ilan, we realized that this was buggy, and >> came up with this patch instead: >> >> https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt >> >> Can you also give that a spin? > > Issued four reassociate, the network interface is still alive. Also with this > patch it stays with BSSID1 and I don't see any "Association request to > the driver failed" in the process. Correction: I can still see "Association request to the driver failed" and switching between BSSIDs with more reassociates but I think that shouldn't be an issue?
Hi, > > > https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt > > > > > > Can you also give that a spin? Thanks for testing! > > Issued four reassociate, the network interface is still alive. Also > > with this patch it stays with BSSID1 and I don't see any > > "Association request to the driver failed" in the process. > > Correction: I can still see "Association request to the driver > failed" and switching between BSSIDs with more reassociates but I > think that shouldn't be an issue? Not sure what you mean by "with more reassociates"? In any case, we suspect that there's another underlying issue in mwifiex, which happens to tickle this problem. Could you record a full trace with -e cfg80211 for me (and send it privately, please compress trace.dat e.g. with xz, it compresses really well) Thanks, johannes
On Tue, Oct 17, 2017 at 11:10 PM, Johannes Berg <johannes@sipsolutions.net> wrote: > Hi, > >> > > https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt >> > > >> > > Can you also give that a spin? > > Thanks for testing! > >> > Issued four reassociate, the network interface is still alive. Also >> > with this patch it stays with BSSID1 and I don't see any >> > "Association request to the driver failed" in the process. >> >> Correction: I can still see "Association request to the driver >> failed" and switching between BSSIDs with more reassociates but I >> think that shouldn't be an issue? > > Not sure what you mean by "with more reassociates"? I mean do more "reassociate" in the wpa_cli. :) Thanks, Jesse
Hi, I'm not sure I've followed all the problems here, but I wanted to point some things out: On Tue, Oct 17, 2017 at 12:48:18PM +0200, Johannes Berg wrote: > On Tue, 2017-10-17 at 18:18 +0800, Jesse Sung wrote: > > > > Does mwifiex treat this -EALREADY as *keeping* an old connection, > > > or tearing it down entirely? > > > > From the call trace: > > Well, the call trace can't really answer that :-) > Does mwifiex firmware stay connected? IIUC, mwifiex hasn't told the firmware to do anything at this point -- the -EALREADY check is practically the first thing it does within connect(). So it just quits the connect() request and tries to carry on as usual. It will only do something different if the upper layers tell it to do so afterward (e.g., calling disconnect()). > > 139.451318: nl80211_get_valid_chan <-nl80211_connect > > 139.451321: cfg80211_connect <-nl80211_connect > > 139.451322: cfg80211_oper_and_ht_capa <-cfg80211_connect > > 139.451323: mwifiex_cfg80211_connect <-cfg80211_connect > > 139.451337: nl80211_post_doit <-genl_family_rcv_msg > > 139.451423: nl80211_pre_doit <-genl_family_rcv_msg > > 139.451425: nl80211_disconnect <-genl_family_rcv_msg > > 139.451426: cfg80211_disconnect <-nl80211_disconnect > > 139.451430: mwifiex_cfg80211_disconnect <-cfg80211_disconnect > > > > mwifiex_cfg80211_disconnect() would be called after > > mwifiex_cfg80211_connect(), though I'm not sure if it's triggered by > > the error returned. > > I think so - it's probably wpa_supplicant trying to get back to a well- > known state (of being disconnected). Yes, that's definitely what's happening. And it's explicitly called out in the supplicant's nl80211 driver that this is intentional: static int wpa_driver_nl80211_connect( struct wpa_driver_nl80211_data *drv, struct wpa_driver_associate_params *params) { ... ret = wpa_driver_nl80211_try_connect(drv, params); if (ret == -EALREADY) { /* * cfg80211 does not currently accept new connections if * we are already connected. As a workaround, force * disconnection and try again. */ wpa_printf(MSG_DEBUG, "nl80211: Explicitly " "disconnecting before reassociation " "attempt"); if (wpa_driver_nl80211_disconnect( drv, WLAN_REASON_PREV_AUTH_NOT_VALID)) return -1; ret = wpa_driver_nl80211_try_connect(drv, params); } return ret; } This is the main code path for supplicant commands like "Reattach", which boil down to (for non SME drivers): wpas_request_connection() ... -> wpa_supplicant_connect() -> wpa_supplicant_associate() -> wpas_start_assoc_cb() -> wpa_drv_associate() -> wpa_driver_nl80211_associate() -> wpa_driver_nl80211_connect() Now for the part I'm not so familiar with: is this really the *expected* flow for full-MAC drivers in reattach, reassociate, and roaming flows? All of those seem to boil down to this same connect() (and fallback to disconnect()+connect() if -EALREADY) flow. But it doesn't seem like all full-MAC drivers do the same thing. Some seem to just blaze ahead with a connect attempt (maybe some firmwares automatically interpret this for us?) and never return -EALREADY at all. Sorry if this is slightly off-topic, but I'm trying to understand what the general expectations are here, based on my relatively narrow experience with a few drivers. Brian
Hi, > IIUC, mwifiex hasn't told the firmware to do anything at this point -- > the -EALREADY check is practically the first thing it does within > connect(). So it just quits the connect() request and tries to carry on > as usual. It will only do something different if the upper layers tell > it to do so afterward (e.g., calling disconnect()). Yeah, that makes sense. > Yes, that's definitely what's happening. And it's explicitly called out > in the supplicant's nl80211 driver that this is intentional: > > [...] Right. > This is the main code path for supplicant commands like "Reattach", > which boil down to (for non SME drivers): > > wpas_request_connection() > ... > -> wpa_supplicant_connect() > -> wpa_supplicant_associate() > -> wpas_start_assoc_cb() > -> wpa_drv_associate() > -> wpa_driver_nl80211_associate() > -> wpa_driver_nl80211_connect() > > Now for the part I'm not so familiar with: is this really the *expected* > flow for full-MAC drivers in reattach, reassociate, and roaming flows? > All of those seem to boil down to this same connect() (and fallback to > disconnect()+connect() if -EALREADY) flow. We never implemented a "ROAM" command, so there's not all that much choice. > But it doesn't seem like all full-MAC drivers do the same thing. Some > seem to just blaze ahead with a connect attempt (maybe some firmwares > automatically interpret this for us?) and never return -EALREADY at all. Agree, some seem to just do some form of roaming on this, which really just means they disconnect internally - or perhaps they even try to roam if it's the same network? > Sorry if this is slightly off-topic, but I'm trying to understand what > the general expectations are here, based on my relatively narrow > experience with a few drivers. I don't really know either! There aren't that many direct cfg80211 drivers that don't use mac80211, so there's not that much experience. You're probably well positioned to say what the behaviour _should_ be though? :-) I tend to think we should actually do the EALREADY from cfg80211, so that wpa_s will always be forced to go through this roundabout code path, but also finally implement a ROAM command, to let drivers have that option? Note also that we've always sort of envisioned that drivers implementing CONNECT would do BSS selection themselves, but I think there's no automatic way of doing that with wpa_s, and it may not even be desirable in many cases (unless you really need the power saving advantage) since it gets us into a situation where we have all these different algorithms etc. johannes
On 27-10-17 22:10, Johannes Berg wrote: > Hi, > >> IIUC, mwifiex hasn't told the firmware to do anything at this point -- >> the -EALREADY check is practically the first thing it does within >> connect(). So it just quits the connect() request and tries to carry on >> as usual. It will only do something different if the upper layers tell >> it to do so afterward (e.g., calling disconnect()). > > Yeah, that makes sense. > >> Yes, that's definitely what's happening. And it's explicitly called out >> in the supplicant's nl80211 driver that this is intentional: >> >> [...] > > Right. > >> This is the main code path for supplicant commands like "Reattach", >> which boil down to (for non SME drivers): >> >> wpas_request_connection() >> ... >> -> wpa_supplicant_connect() >> -> wpa_supplicant_associate() >> -> wpas_start_assoc_cb() >> -> wpa_drv_associate() >> -> wpa_driver_nl80211_associate() >> -> wpa_driver_nl80211_connect() >> >> Now for the part I'm not so familiar with: is this really the *expected* >> flow for full-MAC drivers in reattach, reassociate, and roaming flows? >> All of those seem to boil down to this same connect() (and fallback to >> disconnect()+connect() if -EALREADY) flow. > > We never implemented a "ROAM" command, so there's not all that much > choice. > >> But it doesn't seem like all full-MAC drivers do the same thing. Some >> seem to just blaze ahead with a connect attempt (maybe some firmwares >> automatically interpret this for us?) and never return -EALREADY at all. > > Agree, some seem to just do some form of roaming on this, which really > just means they disconnect internally - or perhaps they even try to > roam if it's the same network? > >> Sorry if this is slightly off-topic, but I'm trying to understand what >> the general expectations are here, based on my relatively narrow >> experience with a few drivers. > > I don't really know either! There aren't that many direct cfg80211 > drivers that don't use mac80211, so there's not that much experience. > > You're probably well positioned to say what the behaviour _should_ be > though? :-) > > I tend to think we should actually do the EALREADY from cfg80211, so > that wpa_s will always be forced to go through this roundabout code > path, but also finally implement a ROAM command, to let drivers have > that option? > > Note also that we've always sort of envisioned that drivers > implementing CONNECT would do BSS selection themselves, but I think > there's no automatic way of doing that with wpa_s, and it may not even > be desirable in many cases (unless you really need the power saving > advantage) since it gets us into a situation where we have all these > different algorithms etc. wpa_s can issue a CONNECT without bssid (nor bssid_hint) leaving it up to the driver to select the BSS. I am pretty sure I hit that scenario although not sure what exact criteria are for wpa_s. Maybe it depended on WIPHY_FLAG_SUPPORTS_FW_ROAM. Regarding BSS selection I added bss_select feature. An effort which actually was triggered by a chromebook project. Guess the number of drivers implementing that can be counted on one ...finger ;-) Regards, Arend
diff --git a/net/wireless/sme.c b/net/wireless/sme.c index b347e63d7aaa..fe0037ad1f5e 100644 --- a/net/wireless/sme.c +++ b/net/wireless/sme.c @@ -1042,6 +1042,9 @@ int cfg80211_connect(struct cfg80211_registered_device *rdev, ASSERT_WDEV_LOCK(wdev); + if (wdev->current_bss) + return -EALREADY; + if (WARN_ON(wdev->connect_keys)) { kzfree(wdev->connect_keys); wdev->connect_keys = NULL;