diff mbox

Problems with "cfg80211: fix SME connect" commit

Message ID 1253859759.3868.569.camel@johannes.local (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Johannes Berg Sept. 25, 2009, 6:22 a.m. UTC
Thanks for your analysis.

> This seems to look like/relate to a little problem I have for the last
> few days - lately I have authentication failure on first try and have
> to click on NM a 2nd time for it to go through; blow away
> compat-wireless & revert to as-shipped distro modules gives me the
> older/smooth behavior  of NM just associating without me
> clicking/asking.
> 
> v2.6.31-38294-ged3ac87 + 'cfg80211: don't set privacy w/o key' doesn't
> improve my situation.
> 
> wpa_supplicant log:
> --------- distro modules:
> Trying to associate with <id> (SSID='ID' freq=2437 MHz)
> Associated with <id>
> CTRL-EVENT-CONNECTED - Connection to <id> completed (auth) [id=0 id_str=]
> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> 
> -------- compat-wireless
> Trying to associate with <id> (SSID='ID' freq=2437 MHz)
> Authentication with 00:00:00:00:00:00 timed out.
> Trying to associate with <id> (SSID='ID' freq=2437 MHz)
> Associated with <id>
> CTRL-EVENT-CONNECTED - Connection to <id> completed (auth) [id=0 id_str=]
> 
> ------ dmesg distro modules
> wlan2: direct probe to AP <id> try 1
> wlan2 direct probe responded
> wlan2: authenticate with AP <id>
> wlan2: authenticated
> wlan2: associate with AP <id>
> wlan2: RX AssocResp from <id> (capab=0x431 status=0 aid=1)
> wlan2: associated
> 
> ------ compat-wireless, note the extra deauth at the beginning, and
> all those 'try 1''s.
> wlan2: deauthenticating by local choice (reason=3)
> wlan2: direct probe to AP <id> (try 1)
> wlan2 direct probe responded
> wlan2: authenticate with AP <id> (try 1)
> wlan2: authenticated
> wlan2: associate with AP <id> (try 1)
> wlan2: RX AssocResp from <id> (capab=0x431 status=0 aid=1)
> wlan2: associated

I've analysed this, and now know the reason for the extra deauth, but it
shouldn't hurt since we never send a wireless extensions event. The
reason is that once wpa_supplicant sets the SSID we already start to
connect with the new changes, but then setting the BSSID might require
restarting the process. This could be optimised, but I would prefer not
having to.

I can see a problem with the code and it trying to scan once more again
etc. Below patch seems to help for me. However, I only once managed to
reproduce the problem you were seeing with the authentication timeout in
wpa_supplicant.

Can you try this? The last hunk is most important, but the other stuff
helps debugging.

johannes



--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Hin-Tak Leung Sept. 25, 2009, 3:54 p.m. UTC | #1
On Fri, Sep 25, 2009 at 7:22 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> Thanks for your analysis.
>
>> This seems to look like/relate to a little problem I have for the last
>> few days - lately I have authentication failure on first try and have
>> to click on NM a 2nd time for it to go through; blow away
>> compat-wireless & revert to as-shipped distro modules gives me the
>> older/smooth behavior  of NM just associating without me
>> clicking/asking.
>>
>> v2.6.31-38294-ged3ac87 + 'cfg80211: don't set privacy w/o key' doesn't
>> improve my situation.
>>
>> wpa_supplicant log:
>> --------- distro modules:
>> Trying to associate with <id> (SSID='ID' freq=2437 MHz)
>> Associated with <id>
>> CTRL-EVENT-CONNECTED - Connection to <id> completed (auth) [id=0 id_str=]
>> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
>>
>> -------- compat-wireless
>> Trying to associate with <id> (SSID='ID' freq=2437 MHz)
>> Authentication with 00:00:00:00:00:00 timed out.
>> Trying to associate with <id> (SSID='ID' freq=2437 MHz)
>> Associated with <id>
>> CTRL-EVENT-CONNECTED - Connection to <id> completed (auth) [id=0 id_str=]
>>
>> ------ dmesg distro modules
>> wlan2: direct probe to AP <id> try 1
>> wlan2 direct probe responded
>> wlan2: authenticate with AP <id>
>> wlan2: authenticated
>> wlan2: associate with AP <id>
>> wlan2: RX AssocResp from <id> (capab=0x431 status=0 aid=1)
>> wlan2: associated
>>
>> ------ compat-wireless, note the extra deauth at the beginning, and
>> all those 'try 1''s.
>> wlan2: deauthenticating by local choice (reason=3)
>> wlan2: direct probe to AP <id> (try 1)
>> wlan2 direct probe responded
>> wlan2: authenticate with AP <id> (try 1)
>> wlan2: authenticated
>> wlan2: associate with AP <id> (try 1)
>> wlan2: RX AssocResp from <id> (capab=0x431 status=0 aid=1)
>> wlan2: associated
>
> I've analysed this, and now know the reason for the extra deauth, but it
> shouldn't hurt since we never send a wireless extensions event. The
> reason is that once wpa_supplicant sets the SSID we already start to
> connect with the new changes, but then setting the BSSID might require
> restarting the process. This could be optimised, but I would prefer not
> having to.
>
> I can see a problem with the code and it trying to scan once more again
> etc. Below patch seems to help for me. However, I only once managed to
> reproduce the problem you were seeing with the authentication timeout in
> wpa_supplicant.
>
> Can you try this? The last hunk is most important, but the other stuff
> helps debugging.

Great. The extra timeout in wap_spplicant.log is gone, so it is back
to NM does it all by itself.

Here is the dmesg from this patch on top of everything else so far:

wlan2: starting authentication with _id_
wlan2: deauthenticating from _id_ by local choice (reason=3)
wlan2: starting authentication with _id_
wlan2: direct probe to AP _id_ (try 1)
wlan2: direct probe responded
wlan2: authenticate with AP _id_ (try 1)
wlan2: authenticated
wlan2: starting association with _id_
wlan2: associate with AP _id_ (try 1)
wlan2: RX AssocResp from _id_ (capab=0x431 status=0 aid=1)
wlan2: associated

There is still the extra deauth at the beginning, but I guess I can
live with it, it doesn't require user action to deal with (unlike
without this latest patch) I suppose there might be more tuning before
commit?  Otherwise Tested-by:

Hmm, slightly side-tracked - was the original poster using NM on top
on wpa_supplicant, just curious?

Cheers,
Hin-Tak
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg Sept. 26, 2009, 11:39 a.m. UTC | #2
On Fri, 2009-09-25 at 16:54 +0100, Hin-Tak Leung wrote:

> > Can you try this? The last hunk is most important, but the other stuff
> > helps debugging.
> 
> Great. The extra timeout in wap_spplicant.log is gone, so it is back
> to NM does it all by itself.

Interesting, thanks for the test. I'll go submit the patch.

> Here is the dmesg from this patch on top of everything else so far:
> 
> wlan2: starting authentication with _id_
> wlan2: deauthenticating from _id_ by local choice (reason=3)
> wlan2: starting authentication with _id_
> wlan2: direct probe to AP _id_ (try 1)
> wlan2: direct probe responded
> wlan2: authenticate with AP _id_ (try 1)
> wlan2: authenticated
> wlan2: starting association with _id_
> wlan2: associate with AP _id_ (try 1)
> wlan2: RX AssocResp from _id_ (capab=0x431 status=0 aid=1)
> wlan2: associated
> 
> There is still the extra deauth at the beginning, but I guess I can
> live with it, it doesn't require user action to deal with (unlike
> without this latest patch) I suppose there might be more tuning before
> commit? 

I think I'll just remove some of the printks again, but leave the ones
moving. Actually probably better to split it up into a mac80211 and a
cfg80211 patch.

The extra deauth is because cfg80211 already starts the auth with the
BSS before wpa_supplicant set the BSSID, and then when setting the BSSID
it asks for deauth, but before we ever actually did anything... I think
we'll just have to live with that, since it's hard to fix in the layered
design we have now.

>  Otherwise Tested-by:
> 
> Hmm, slightly side-tracked - was the original poster using NM on top
> on wpa_supplicant, just curious?

You mean Albert? I don't know.

johannes
Hin-Tak Leung Sept. 26, 2009, 11:57 p.m. UTC | #3
On Sat, Sep 26, 2009 at 12:39 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Fri, 2009-09-25 at 16:54 +0100, Hin-Tak Leung wrote:
>
>> > Can you try this? The last hunk is most important, but the other stuff
>> > helps debugging.
>>
>> Great. The extra timeout in wap_spplicant.log is gone, so it is back
>> to NM does it all by itself.
>
> Interesting, thanks for the test. I'll go submit the patch.
>
>> Here is the dmesg from this patch on top of everything else so far:
>>
>> wlan2: starting authentication with _id_
>> wlan2: deauthenticating from _id_ by local choice (reason=3)
>> wlan2: starting authentication with _id_
>> wlan2: direct probe to AP _id_ (try 1)
>> wlan2: direct probe responded
>> wlan2: authenticate with AP _id_ (try 1)
>> wlan2: authenticated
>> wlan2: starting association with _id_
>> wlan2: associate with AP _id_ (try 1)
>> wlan2: RX AssocResp from _id_ (capab=0x431 status=0 aid=1)
>> wlan2: associated
>>
>> There is still the extra deauth at the beginning, but I guess I can
>> live with it, it doesn't require user action to deal with (unlike
>> without this latest patch) I suppose there might be more tuning before
>> commit?
>
> I think I'll just remove some of the printks again, but leave the ones
> moving. Actually probably better to split it up into a mac80211 and a
> cfg80211 patch.
>
> The extra deauth is because cfg80211 already starts the auth with the
> BSS before wpa_supplicant set the BSSID, and then when setting the BSSID
> it asks for deauth, but before we ever actually did anything... I think
> we'll just have to live with that, since it's hard to fix in the layered
> design we have now.

I suppose (together with some of the newly added printk you mentioned
could be removed in the final version) the dmesg messages are somewhat
confusing, because as a user, I would rather have a deauth message
that's actually associated with a user action (e.g. if I switch AP or
rfkill). Is it possible to distinguish situation where a user action
is involved versus one that isn't? or is the distinction between any
consequence of 'user-action' vs wpa_supplicant doing-it-on-its-own too
much buried down in the layers?

>>  Otherwise Tested-by:
>>
>> Hmm, slightly side-tracked - was the original poster using NM on top
>> on wpa_supplicant, just curious?
>
> You mean Albert? I don't know.

Yes - I meant Albert - wpa_spplicant runs directly probably behaves a
little differently from NM starting/stopping it.

>
> johannes
>
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg Sept. 28, 2009, 2:41 p.m. UTC | #4
On Sun, 2009-09-27 at 00:57 +0100, Hin-Tak Leung wrote:

> I suppose (together with some of the newly added printk you mentioned
> could be removed in the final version) the dmesg messages are somewhat
> confusing, because as a user, I would rather have a deauth message
> that's actually associated with a user action (e.g. if I switch AP or
> rfkill). Is it possible to distinguish situation where a user action
> is involved versus one that isn't? or is the distinction between any
> consequence of 'user-action' vs wpa_supplicant doing-it-on-its-own too
> much buried down in the layers?

Yeah, it'd be nice to avoid that completely. Or even just avoid telling
the driver, maybe with some delay akin iwcommit. Alas, I haven't looked
at it yet and right now it seems to just be a message (and possibly a
deauth frame)

johannes
diff mbox

Patch

--- wireless-testing.orig/net/mac80211/mlme.c	2009-09-25 07:38:46.000000000 +0200
+++ wireless-testing/net/mac80211/mlme.c	2009-09-25 08:12:26.000000000 +0200
@@ -1675,7 +1675,7 @@  static void ieee80211_rx_mgmt_probe_resp
 
 	/* direct probe may be part of the association flow */
 	if (wk && wk->state == IEEE80211_MGD_STATE_PROBE) {
-		printk(KERN_DEBUG "%s direct probe responded\n",
+		printk(KERN_DEBUG "%s: direct probe responded\n",
 		       sdata->dev->name);
 		wk->tries = 0;
 		wk->state = IEEE80211_MGD_STATE_AUTH;
@@ -2411,6 +2411,9 @@  int ieee80211_mgd_auth(struct ieee80211_
 	list_add(&wk->list, &sdata->u.mgd.work_list);
 	mutex_unlock(&ifmgd->mtx);
 
+	printk(KERN_DEBUG "%s: starting authentication with %pM\n",
+	       sdata->dev->name, req->bss->bssid);
+
 	ieee80211_queue_work(&sdata->local->hw, &sdata->u.mgd.work);
 	return 0;
 }
@@ -2485,6 +2488,9 @@  int ieee80211_mgd_assoc(struct ieee80211
 	else
 		ifmgd->flags &= ~IEEE80211_STA_CONTROL_PORT;
 
+	printk(KERN_DEBUG "%s: starting association with %pM\n",
+	       sdata->dev->name, req->bss->bssid);
+
 	ieee80211_queue_work(&sdata->local->hw, &sdata->u.mgd.work);
 
 	err = 0;
@@ -2502,9 +2508,6 @@  int ieee80211_mgd_deauth(struct ieee8021
 	struct ieee80211_mgd_work *wk;
 	const u8 *bssid = NULL;
 
-	printk(KERN_DEBUG "%s: deauthenticating by local choice (reason=%d)\n",
-	       sdata->dev->name, req->reason_code);
-
 	mutex_lock(&ifmgd->mtx);
 
 	if (ifmgd->associated && &ifmgd->associated->cbss == req->bss) {
@@ -2532,6 +2535,9 @@  int ieee80211_mgd_deauth(struct ieee8021
 
 	mutex_unlock(&ifmgd->mtx);
 
+	printk(KERN_DEBUG "%s: deauthenticating from %pM by local choice (reason=%d)\n",
+	       sdata->dev->name, bssid, req->reason_code);
+
 	ieee80211_send_deauth_disassoc(sdata, bssid,
 			IEEE80211_STYPE_DEAUTH, req->reason_code,
 			cookie);
@@ -2545,9 +2551,6 @@  int ieee80211_mgd_disassoc(struct ieee80
 {
 	struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
 
-	printk(KERN_DEBUG "%s: disassociating by local choice (reason=%d)\n",
-	       sdata->dev->name, req->reason_code);
-
 	mutex_lock(&ifmgd->mtx);
 
 	/*
@@ -2561,6 +2564,9 @@  int ieee80211_mgd_disassoc(struct ieee80
 		return -ENOLINK;
 	}
 
+	printk(KERN_DEBUG "%s: disassociating from %pM by local choice (reason=%d)\n",
+	       sdata->dev->name, req->bss->bssid, req->reason_code);
+
 	ieee80211_set_disassoc(sdata, false);
 
 	mutex_unlock(&ifmgd->mtx);
--- wireless-testing.orig/net/wireless/sme.c	2009-09-25 08:05:20.000000000 +0200
+++ wireless-testing/net/wireless/sme.c	2009-09-25 08:13:42.000000000 +0200
@@ -762,9 +762,8 @@  int __cfg80211_connect(struct cfg80211_r
 		wdev->conn->params.ssid = wdev->ssid;
 		wdev->conn->params.ssid_len = connect->ssid_len;
 
-		/* don't care about result -- but fill bssid & channel */
-		if (!wdev->conn->params.bssid || !wdev->conn->params.channel)
-			bss = cfg80211_get_conn_bss(wdev);
+		/* see if we have the bss already */
+		bss = cfg80211_get_conn_bss(wdev);
 
 		wdev->sme_state = CFG80211_SME_CONNECTING;
 		wdev->connect_keys = connkeys;