diff mbox

sta_find_ibss (active_ibss=1) in a loop?

Message ID 1307370716.3894.10.camel@jlt3.sipsolutions.net (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Johannes Berg June 6, 2011, 2:31 p.m. UTC
On Mon, 2011-06-06 at 13:04 +0200, Ignacy Gawedzki wrote:

> After looking more in details at the code, I think I'm getting closer to a
> proper diagnostic.

Thanks for digging into this.

> The state is changed to MLME_SEARCH when ieee80211_ibss_join() is called from
> ieee80211_join_ibss() in cfg.c.  I suppose this is what's called when an
> application asks the kernel to make the interface join an IBSS.  Since in that
> state no new STA can be added to local->sta_list, if there were no STA at all,
> there's no last_rx to update and consequently ieee80211_sta_active_ibss()
> returns 0 and ieee80211_sta_find_ibss() may proceed.
> 
> But apparently, if there are still some remaining STAs in local->sta_list,
> their last_rx is updated upon reception of a frame from them, so if they are
> transmitting frames often enough, ieee80211_sta_active_ibss() won't ever
> return 0 and ieee80211_sta_find_ibss() won't be able to proceed at all, which
> will keep the interface in the MLME_SEARCH state and hence the loop.
> 
> So how comes there are remaining STAs in local->sta_list?  When
> ieee80211_ibss_leave() is called from ieee80211_leave_ibss() in cfg.c,
> sta_info_flush() is called and there are no remaining STAs.  But what if
> ieee80211_ibss_join() is called without prior call to ieee80211_ibss_leave()?
> I suppose this is when it goes wrong.

I don't think this can happen.

> Is flushing the local->sta_list in ieee80211_ibss_join() a good solution?

I think we should see how this situation happens. cfg80211 shouldn't be
allowing a join while an IBSS is already joined, so the above situation
you describe shouldn't be possible. Also, I don't see that it is
possible that a station is added while in search state or not joined
(BSSID should be zeroes).

Then again, it looks like it could be a race. A station could be added
while _leave() is in the middle of the flush. This is because
ieee80211_ibss_add_sta() can't hold the mutex.

Thus, the proper fix might be this:



johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ignacy Gawedzki June 6, 2011, 4:01 p.m. UTC | #1
On Mon, Jun 06, 2011 at 04:31:56PM +0200, thus spake Johannes Berg:
> On Mon, 2011-06-06 at 13:04 +0200, Ignacy Gawedzki wrote:
> > Is flushing the local->sta_list in ieee80211_ibss_join() a good solution?
> 
> I think we should see how this situation happens. cfg80211 shouldn't be
> allowing a join while an IBSS is already joined, so the above situation
> you describe shouldn't be possible. Also, I don't see that it is
> possible that a station is added while in search state or not joined
> (BSSID should be zeroes).
> 
> Then again, it looks like it could be a race.

Given how difficult it is to reproduce, it smells like a race to me.

>                                               A station could be added
> while _leave() is in the middle of the flush. This is because
> ieee80211_ibss_add_sta() can't hold the mutex.

I see.

> Thus, the proper fix might be this:
> 
> --- wireless-testing.orig/net/mac80211/ibss.c	2011-06-06 12:25:06.000000000 +0200
> +++ wireless-testing/net/mac80211/ibss.c	2011-06-06 16:31:26.000000000 +0200
> @@ -965,6 +965,9 @@ int ieee80211_ibss_leave(struct ieee8021
>  
>  	mutex_lock(&sdata->u.ibss.mtx);
>  
> +	memset(sdata->u.ibss.bssid, 0, ETH_ALEN);
> +	sdata->u.ibss.state = IEEE80211_IBSS_MLME_SEARCH;
> +
>  	active_ibss = ieee80211_sta_active_ibss(sdata);
>  
>  	if (!active_ibss && !is_zero_ether_addr(ifibss->bssid)) {
> @@ -999,7 +1002,6 @@ int ieee80211_ibss_leave(struct ieee8021
>  	kfree_skb(skb);
>  
>  	skb_queue_purge(&sdata->skb_queue);
> -	memset(sdata->u.ibss.bssid, 0, ETH_ALEN);
>  	sdata->u.ibss.ssid_len = 0;

Shouldn't that second line also move up with the memset?

>  
>  	del_timer_sync(&sdata->u.ibss.timer);

So this fix relies on the fact that reading sdata->i.ibss.state is an atomic
operation, right?

Thanks for the patch, I'll give it a try, though it may be some time until I
report further on this, since the problem is difficult to reproduce.
Johannes Berg June 6, 2011, 4:11 p.m. UTC | #2
On Mon, 2011-06-06 at 18:01 +0200, Ignacy Gawedzki wrote:

> > @@ -999,7 +1002,6 @@ int ieee80211_ibss_leave(struct ieee8021
> >  	kfree_skb(skb);
> >  
> >  	skb_queue_purge(&sdata->skb_queue);
> > -	memset(sdata->u.ibss.bssid, 0, ETH_ALEN);
> >  	sdata->u.ibss.ssid_len = 0;
> 
> Shouldn't that second line also move up with the memset?

Not really necessary -- that's not used in the RX path. But yeah we can
do that for consistency. Also maybe the skb queue purge.

> >  	del_timer_sync(&sdata->u.ibss.timer);
> 
> So this fix relies on the fact that reading sdata->i.ibss.state is an atomic
> operation, right?
> 
> Thanks for the patch, I'll give it a try, though it may be some time until I
> report further on this, since the problem is difficult to reproduce.

Oh ok. Well, I think we should put these changes in anyway since I see
nothing preventing the race now.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ignacy Gawedzki June 8, 2011, 11:13 a.m. UTC | #3
On Mon, Jun 06, 2011 at 06:11:32PM +0200, thus spake Johannes Berg:
> Oh ok. Well, I think we should put these changes in anyway since I see
> nothing preventing the race now.

After having applied this patch, I haven't seen the problem appear for two
days (and two nights) of testing.  I suppose this sort of confirms the fix.

Thanks again.

Ignacy
Johannes Berg June 8, 2011, 11:26 a.m. UTC | #4
On Wed, 2011-06-08 at 13:13 +0200, Ignacy Gawedzki wrote:
> On Mon, Jun 06, 2011 at 06:11:32PM +0200, thus spake Johannes Berg:
> > Oh ok. Well, I think we should put these changes in anyway since I see
> > nothing preventing the race now.
> 
> After having applied this patch, I haven't seen the problem appear for two
> days (and two nights) of testing.  I suppose this sort of confirms the fix.

Thanks. I just sent out the patch officially.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- wireless-testing.orig/net/mac80211/ibss.c	2011-06-06 12:25:06.000000000 +0200
+++ wireless-testing/net/mac80211/ibss.c	2011-06-06 16:31:26.000000000 +0200
@@ -965,6 +965,9 @@  int ieee80211_ibss_leave(struct ieee8021
 
 	mutex_lock(&sdata->u.ibss.mtx);
 
+	memset(sdata->u.ibss.bssid, 0, ETH_ALEN);
+	sdata->u.ibss.state = IEEE80211_IBSS_MLME_SEARCH;
+
 	active_ibss = ieee80211_sta_active_ibss(sdata);
 
 	if (!active_ibss && !is_zero_ether_addr(ifibss->bssid)) {
@@ -999,7 +1002,6 @@  int ieee80211_ibss_leave(struct ieee8021
 	kfree_skb(skb);
 
 	skb_queue_purge(&sdata->skb_queue);
-	memset(sdata->u.ibss.bssid, 0, ETH_ALEN);
 	sdata->u.ibss.ssid_len = 0;
 
 	del_timer_sync(&sdata->u.ibss.timer);