diff mbox

Commit 0711d638 breaks mwifiex

Message ID 1508237298.10607.76.camel@sipsolutions.net (mailing list archive)
State Not Applicable
Delegated to: Johannes Berg
Headers show

Commit Message

Johannes Berg Oct. 17, 2017, 10:48 a.m. UTC
On Tue, 2017-10-17 at 18:18 +0800, Jesse Sung wrote:

> > Does mwifiex treat this -EALREADY as *keeping* an old connection,
> > or tearing it down entirely?
> 
> From the call trace:

Well, the call trace can't really answer that :-)
Does mwifiex firmware stay connected?


> 139.451318: nl80211_get_valid_chan <-nl80211_connect
> 139.451321: cfg80211_connect <-nl80211_connect
> 139.451322: cfg80211_oper_and_ht_capa <-cfg80211_connect
> 139.451323: mwifiex_cfg80211_connect <-cfg80211_connect
> 139.451337: nl80211_post_doit <-genl_family_rcv_msg
> 139.451423: nl80211_pre_doit <-genl_family_rcv_msg
> 139.451425: nl80211_disconnect <-genl_family_rcv_msg
> 139.451426: cfg80211_disconnect <-nl80211_disconnect
> 139.451430: mwifiex_cfg80211_disconnect <-cfg80211_disconnect
> 
> mwifiex_cfg80211_disconnect() would be called after
> mwifiex_cfg80211_connect(), though I'm not sure if it's triggered by
> the error returned.

I think so - it's probably wpa_supplicant trying to get back to a well-
known state (of being disconnected).

> > I think your fix is invalid because we then reset ssid_len and
> > still
> > keep an old connection (current_bss) which will lead to strange
> > nl80211
> > behaviour when getting interface data etc.
> 
> Since this is how it works before commit 0711d638 (use current_bss
> instead of ssid_len), so I'm guessing this would still work. But I
> agree that this may not be a proper fix...

It would probably work, but we get data inconsistencies, and at the
very least you get no SSID data back when you query the current state.

I don't see anything in nl80211 or so that would say we should accept a
connect() while already connected, so how about this?


Not really quite sure about it yet, but that should address the issue?

johannes

Comments

Wen-chien Jesse Sung Oct. 17, 2017, 1:07 p.m. UTC | #1
On Tue, Oct 17, 2017 at 6:48 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Tue, 2017-10-17 at 18:18 +0800, Jesse Sung wrote:
>
>> > Does mwifiex treat this -EALREADY as *keeping* an old connection,
>> > or tearing it down entirely?
>>
>> From the call trace:
>
> Well, the call trace can't really answer that :-)
> Does mwifiex firmware stay connected?

Sorry I don't know how to tell firmware's state.  @@

>> 139.451318: nl80211_get_valid_chan <-nl80211_connect
>> 139.451321: cfg80211_connect <-nl80211_connect
>> 139.451322: cfg80211_oper_and_ht_capa <-cfg80211_connect
>> 139.451323: mwifiex_cfg80211_connect <-cfg80211_connect
>> 139.451337: nl80211_post_doit <-genl_family_rcv_msg
>> 139.451423: nl80211_pre_doit <-genl_family_rcv_msg
>> 139.451425: nl80211_disconnect <-genl_family_rcv_msg
>> 139.451426: cfg80211_disconnect <-nl80211_disconnect
>> 139.451430: mwifiex_cfg80211_disconnect <-cfg80211_disconnect
>>
>> mwifiex_cfg80211_disconnect() would be called after
>> mwifiex_cfg80211_connect(), though I'm not sure if it's triggered by
>> the error returned.
>
> I think so - it's probably wpa_supplicant trying to get back to a well-
> known state (of being disconnected).
>
>> > I think your fix is invalid because we then reset ssid_len and
>> > still
>> > keep an old connection (current_bss) which will lead to strange
>> > nl80211
>> > behaviour when getting interface data etc.
>>
>> Since this is how it works before commit 0711d638 (use current_bss
>> instead of ssid_len), so I'm guessing this would still work. But I
>> agree that this may not be a proper fix...
>
> It would probably work, but we get data inconsistencies, and at the
> very least you get no SSID data back when you query the current state.
>
> I don't see anything in nl80211 or so that would say we should accept a
> connect() while already connected, so how about this?
>
> diff --git a/net/wireless/sme.c b/net/wireless/sme.c
> index b347e63d7aaa..fe0037ad1f5e 100644
> --- a/net/wireless/sme.c
> +++ b/net/wireless/sme.c
> @@ -1042,6 +1042,9 @@ int cfg80211_connect(struct cfg80211_registered_device *rdev,
>
>         ASSERT_WDEV_LOCK(wdev);
>
> +       if (wdev->current_bss)
> +               return -EALREADY;
> +
>         if (WARN_ON(wdev->connect_keys)) {
>                 kzfree(wdev->connect_keys);
>                 wdev->connect_keys = NULL;
>
> Not really quite sure about it yet, but that should address the issue?

A very rough test by issuing reassociate in wpa_cli:
mwifiex works after reassociate - looks good

> reassociate
OK
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID1> (SSID=<SSID> freq=<freq1>)
<3>Association request to the driver failed
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>WPS-AP-AVAILABLE
<3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>)
<3>Associated with <BSSID2>
<3>WPA: Key negotiation completed with <BSSID2> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID2> completed [id=0 id_str=]
<3>CTRL-EVENT-SCAN-STARTED
> reassociate
OK
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>)
<3>Associated with <BSSID2>
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>WPA: Key negotiation completed with <BSSID2> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID2> completed [id=0 id_str=]
> reassociate
OK
<3>CTRL-EVENT-SCAN-STARTED
>
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>)
<3>Association request to the driver failed
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>WPS-AP-AVAILABLE
<3>Trying to associate with <BSSID3> (SSID=<SSID> freq=<freq3>)
<3>Associated with <BSSID3>
<3>WPA: Key negotiation completed with <BSSID3> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID3> completed [id=0 id_str=]
> reassociate
OK
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID3> (SSID=<SSID> freq=<freq3>)
<3>Association request to the driver failed
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>WPS-AP-AVAILABLE
<3>Trying to associate with <BSSID1> (SSID=<SSID> freq=<freq1>)
<3>Associated with <BSSID1>
<3>WPA: Key negotiation completed with <BSSID1> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID1> completed [id=0 id_str=]
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>)
<3>Association request to the driver failed
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>WPS-AP-AVAILABLE
<3>Trying to associate with <BSSID1> (SSID=<SSID> freq=<freq1>)
<3>Associated with <BSSID1>
<3>WPA: Key negotiation completed with <BSSID1> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID1> completed [id=0 id_str=]


Thanks,
Jesse
Johannes Berg Oct. 17, 2017, 1:13 p.m. UTC | #2
On Tue, 2017-10-17 at 21:07 +0800, Jesse Sung wrote:
> 
> > Not really quite sure about it yet, but that should address the
> > issue?
> 
> A very rough test by issuing reassociate in wpa_cli:
> mwifiex works after reassociate - looks good

Ok. Discussing this with Ilan, we realized that this was buggy, and
came up with this patch instead:

https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt

Can you also give that a spin?

johannes
Wen-chien Jesse Sung Oct. 17, 2017, 2:08 p.m. UTC | #3
On Tue, Oct 17, 2017 at 9:13 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
>
> On Tue, 2017-10-17 at 21:07 +0800, Jesse Sung wrote:
> >
> > > Not really quite sure about it yet, but that should address the
> > > issue?
> >
> > A very rough test by issuing reassociate in wpa_cli:
> > mwifiex works after reassociate - looks good
>
> Ok. Discussing this with Ilan, we realized that this was buggy, and
> came up with this patch instead:
>
> https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt
>
> Can you also give that a spin?

Issued four reassociate, the network interface is still alive. Also with this
patch it stays with BSSID1 and I don't see any "Association request to
the driver failed" in the process.

Thanks,
Jesse
Wen-chien Jesse Sung Oct. 17, 2017, 2:10 p.m. UTC | #4
On Tue, Oct 17, 2017 at 10:08 PM, Jesse Sung <jesse.sung@canonical.com> wrote:
> On Tue, Oct 17, 2017 at 9:13 PM, Johannes Berg
> <johannes@sipsolutions.net> wrote:
>>
>> On Tue, 2017-10-17 at 21:07 +0800, Jesse Sung wrote:
>> >
>> > > Not really quite sure about it yet, but that should address the
>> > > issue?
>> >
>> > A very rough test by issuing reassociate in wpa_cli:
>> > mwifiex works after reassociate - looks good
>>
>> Ok. Discussing this with Ilan, we realized that this was buggy, and
>> came up with this patch instead:
>>
>> https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt
>>
>> Can you also give that a spin?
>
> Issued four reassociate, the network interface is still alive. Also with this
> patch it stays with BSSID1 and I don't see any "Association request to
> the driver failed" in the process.

Correction: I can still see "Association request to the driver failed" and
switching between BSSIDs with more reassociates but I think that shouldn't
be an issue?
Johannes Berg Oct. 17, 2017, 3:10 p.m. UTC | #5
Hi,

> > > https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt
> > > 
> > > Can you also give that a spin?

Thanks for testing!

> > Issued four reassociate, the network interface is still alive. Also
> > with this patch it stays with BSSID1 and I don't see any
> > "Association request to the driver failed" in the process.
> 
> Correction: I can still see "Association request to the driver
> failed" and switching between BSSIDs with more reassociates but I
> think that shouldn't be an issue?

Not sure what you mean by "with more reassociates"?

In any case, we suspect that there's another underlying issue in
mwifiex, which happens to tickle this problem.

Could you record a full trace with -e cfg80211 for me (and send it
privately, please compress trace.dat e.g. with xz, it compresses really
well)

Thanks,
johannes
Wen-chien Jesse Sung Oct. 17, 2017, 3:25 p.m. UTC | #6
On Tue, Oct 17, 2017 at 11:10 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> Hi,
>
>> > > https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt
>> > >
>> > > Can you also give that a spin?
>
> Thanks for testing!
>
>> > Issued four reassociate, the network interface is still alive. Also
>> > with this patch it stays with BSSID1 and I don't see any
>> > "Association request to the driver failed" in the process.
>>
>> Correction: I can still see "Association request to the driver
>> failed" and switching between BSSIDs with more reassociates but I
>> think that shouldn't be an issue?
>
> Not sure what you mean by "with more reassociates"?

I mean do more "reassociate" in the wpa_cli. :)

Thanks,
Jesse
Brian Norris Oct. 26, 2017, 9:13 p.m. UTC | #7
Hi,

I'm not sure I've followed all the problems here, but I wanted to point
some things out:

On Tue, Oct 17, 2017 at 12:48:18PM +0200, Johannes Berg wrote:
> On Tue, 2017-10-17 at 18:18 +0800, Jesse Sung wrote:
> 
> > > Does mwifiex treat this -EALREADY as *keeping* an old connection,
> > > or tearing it down entirely?
> > 
> > From the call trace:
> 
> Well, the call trace can't really answer that :-)
> Does mwifiex firmware stay connected?

IIUC, mwifiex hasn't told the firmware to do anything at this point --
the -EALREADY check is practically the first thing it does within
connect(). So it just quits the connect() request and tries to carry on
as usual. It will only do something different if the upper layers tell
it to do so afterward (e.g., calling disconnect()).

> > 139.451318: nl80211_get_valid_chan <-nl80211_connect
> > 139.451321: cfg80211_connect <-nl80211_connect
> > 139.451322: cfg80211_oper_and_ht_capa <-cfg80211_connect
> > 139.451323: mwifiex_cfg80211_connect <-cfg80211_connect
> > 139.451337: nl80211_post_doit <-genl_family_rcv_msg
> > 139.451423: nl80211_pre_doit <-genl_family_rcv_msg
> > 139.451425: nl80211_disconnect <-genl_family_rcv_msg
> > 139.451426: cfg80211_disconnect <-nl80211_disconnect
> > 139.451430: mwifiex_cfg80211_disconnect <-cfg80211_disconnect
> > 
> > mwifiex_cfg80211_disconnect() would be called after
> > mwifiex_cfg80211_connect(), though I'm not sure if it's triggered by
> > the error returned.
> 
> I think so - it's probably wpa_supplicant trying to get back to a well-
> known state (of being disconnected).

Yes, that's definitely what's happening. And it's explicitly called out
in the supplicant's nl80211 driver that this is intentional:

static int wpa_driver_nl80211_connect(
	struct wpa_driver_nl80211_data *drv,
	struct wpa_driver_associate_params *params)
{
...
	ret = wpa_driver_nl80211_try_connect(drv, params);
	if (ret == -EALREADY) {
		/*
		 * cfg80211 does not currently accept new connections if
		 * we are already connected. As a workaround, force
		 * disconnection and try again.
		 */
		wpa_printf(MSG_DEBUG, "nl80211: Explicitly "
			   "disconnecting before reassociation "
			   "attempt");
		if (wpa_driver_nl80211_disconnect(
			    drv, WLAN_REASON_PREV_AUTH_NOT_VALID))
			return -1;
		ret = wpa_driver_nl80211_try_connect(drv, params);
	}
	return ret;
}

This is the main code path for supplicant commands like "Reattach",
which boil down to (for non SME drivers):

wpas_request_connection()
  ...
 -> wpa_supplicant_connect()
   -> wpa_supplicant_associate()
     -> wpas_start_assoc_cb()
       -> wpa_drv_associate()
         -> wpa_driver_nl80211_associate()
           -> wpa_driver_nl80211_connect()

Now for the part I'm not so familiar with: is this really the *expected*
flow for full-MAC drivers in reattach, reassociate, and roaming flows?
All of those seem to boil down to this same connect() (and fallback to
disconnect()+connect() if -EALREADY) flow.

But it doesn't seem like all full-MAC drivers do the same thing. Some
seem to just blaze ahead with a connect attempt (maybe some firmwares
automatically interpret this for us?) and never return -EALREADY at all.

Sorry if this is slightly off-topic, but I'm trying to understand what
the general expectations are here, based on my relatively narrow
experience with a few drivers.

Brian
Johannes Berg Oct. 27, 2017, 8:10 p.m. UTC | #8
Hi,

> IIUC, mwifiex hasn't told the firmware to do anything at this point --
> the -EALREADY check is practically the first thing it does within
> connect(). So it just quits the connect() request and tries to carry on
> as usual. It will only do something different if the upper layers tell
> it to do so afterward (e.g., calling disconnect()).

Yeah, that makes sense.

> Yes, that's definitely what's happening. And it's explicitly called out
> in the supplicant's nl80211 driver that this is intentional:
> 
> [...]

Right.

> This is the main code path for supplicant commands like "Reattach",
> which boil down to (for non SME drivers):
> 
> wpas_request_connection()
>   ...
>  -> wpa_supplicant_connect()
>    -> wpa_supplicant_associate()
>      -> wpas_start_assoc_cb()
>        -> wpa_drv_associate()
>          -> wpa_driver_nl80211_associate()
>            -> wpa_driver_nl80211_connect()
> 
> Now for the part I'm not so familiar with: is this really the *expected*
> flow for full-MAC drivers in reattach, reassociate, and roaming flows?
> All of those seem to boil down to this same connect() (and fallback to
> disconnect()+connect() if -EALREADY) flow.

We never implemented a "ROAM" command, so there's not all that much
choice.

> But it doesn't seem like all full-MAC drivers do the same thing. Some
> seem to just blaze ahead with a connect attempt (maybe some firmwares
> automatically interpret this for us?) and never return -EALREADY at all.

Agree, some seem to just do some form of roaming on this, which really
just means they disconnect internally - or perhaps they even try to
roam if it's the same network?

> Sorry if this is slightly off-topic, but I'm trying to understand what
> the general expectations are here, based on my relatively narrow
> experience with a few drivers.

I don't really know either! There aren't that many direct cfg80211
drivers that don't use mac80211, so there's not that much experience.

You're probably well positioned to say what the behaviour _should_ be
though? :-)

I tend to think we should actually do the EALREADY from cfg80211, so
that wpa_s will always be forced to go through this roundabout code
path, but also finally implement a ROAM command, to let drivers have
that option?

Note also that we've always sort of envisioned that drivers
implementing CONNECT would do BSS selection themselves, but I think
there's no automatic way of doing that with wpa_s, and it may not even
be desirable in many cases (unless you really need the power saving
advantage) since it gets us into a situation where we have all these
different algorithms etc.

johannes
Arend van Spriel Oct. 28, 2017, 9:32 p.m. UTC | #9
On 27-10-17 22:10, Johannes Berg wrote:
> Hi,
> 
>> IIUC, mwifiex hasn't told the firmware to do anything at this point --
>> the -EALREADY check is practically the first thing it does within
>> connect(). So it just quits the connect() request and tries to carry on
>> as usual. It will only do something different if the upper layers tell
>> it to do so afterward (e.g., calling disconnect()).
> 
> Yeah, that makes sense.
> 
>> Yes, that's definitely what's happening. And it's explicitly called out
>> in the supplicant's nl80211 driver that this is intentional:
>>
>> [...]
> 
> Right.
> 
>> This is the main code path for supplicant commands like "Reattach",
>> which boil down to (for non SME drivers):
>>
>> wpas_request_connection()
>>    ...
>>   -> wpa_supplicant_connect()
>>     -> wpa_supplicant_associate()
>>       -> wpas_start_assoc_cb()
>>         -> wpa_drv_associate()
>>           -> wpa_driver_nl80211_associate()
>>             -> wpa_driver_nl80211_connect()
>>
>> Now for the part I'm not so familiar with: is this really the *expected*
>> flow for full-MAC drivers in reattach, reassociate, and roaming flows?
>> All of those seem to boil down to this same connect() (and fallback to
>> disconnect()+connect() if -EALREADY) flow.
> 
> We never implemented a "ROAM" command, so there's not all that much
> choice.
> 
>> But it doesn't seem like all full-MAC drivers do the same thing. Some
>> seem to just blaze ahead with a connect attempt (maybe some firmwares
>> automatically interpret this for us?) and never return -EALREADY at all.
> 
> Agree, some seem to just do some form of roaming on this, which really
> just means they disconnect internally - or perhaps they even try to
> roam if it's the same network?
> 
>> Sorry if this is slightly off-topic, but I'm trying to understand what
>> the general expectations are here, based on my relatively narrow
>> experience with a few drivers.
> 
> I don't really know either! There aren't that many direct cfg80211
> drivers that don't use mac80211, so there's not that much experience.
> 
> You're probably well positioned to say what the behaviour _should_ be
> though? :-)
> 
> I tend to think we should actually do the EALREADY from cfg80211, so
> that wpa_s will always be forced to go through this roundabout code
> path, but also finally implement a ROAM command, to let drivers have
> that option?
> 
> Note also that we've always sort of envisioned that drivers
> implementing CONNECT would do BSS selection themselves, but I think
> there's no automatic way of doing that with wpa_s, and it may not even
> be desirable in many cases (unless you really need the power saving
> advantage) since it gets us into a situation where we have all these
> different algorithms etc.

wpa_s can issue a CONNECT without bssid (nor bssid_hint) leaving it up 
to the driver to select the BSS. I am pretty sure I hit that scenario 
although not sure what exact criteria are for wpa_s. Maybe it depended 
on WIPHY_FLAG_SUPPORTS_FW_ROAM.

Regarding BSS selection I added bss_select feature. An effort which 
actually was triggered by a chromebook project. Guess the number of 
drivers implementing that can be counted on one ...finger ;-)

Regards,
Arend
diff mbox

Patch

diff --git a/net/wireless/sme.c b/net/wireless/sme.c
index b347e63d7aaa..fe0037ad1f5e 100644
--- a/net/wireless/sme.c
+++ b/net/wireless/sme.c
@@ -1042,6 +1042,9 @@  int cfg80211_connect(struct cfg80211_registered_device *rdev,
 
 	ASSERT_WDEV_LOCK(wdev);
 
+	if (wdev->current_bss)
+		return -EALREADY;
+
 	if (WARN_ON(wdev->connect_keys)) {
 		kzfree(wdev->connect_keys);
 		wdev->connect_keys = NULL;