Message ID | 1425902001-3163-1-git-send-email-eliad@wizery.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Johannes Berg |
Headers | show |
On Mon, 2015-03-09 at 13:53 +0200, Eliad Peller wrote: > Fix it by cancelling hw_roc_done on roc cancel. > Since hw_roc_done takes local->mtx, temporarily > release it (before re-acquiring it and starting > the next roc if needed). I can't say I like this, and I'm not even sure it doesn't leave a race? After all, the work will not take the RTNL. Also, I think the whole concept is racy because you acquire "found" before the unlock, but use it after the unlock, so if the work actually ran (_sync, perhaps it already started) it may have freed the "found" pointer. I think the only way to really fix that is to make the driver return the roc pointer or so and then we can either queue a work struct per roc struct, or at least mark the roc struct as driver-destroyed and double-check that in the work function? Or perhaps we could flush the work before we even take the lock, but then it might still race with the driver trying to cancel just after we flushed I guess. johannes -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Mar 9, 2015 at 2:52 PM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Mon, 2015-03-09 at 13:53 +0200, Eliad Peller wrote: > >> Fix it by cancelling hw_roc_done on roc cancel. >> Since hw_roc_done takes local->mtx, temporarily >> release it (before re-acquiring it and starting >> the next roc if needed). > > I can't say I like this, and I'm not even sure it doesn't leave a race? > After all, the work will not take the RTNL. > the other way (cancel while hw_roc_done) is protected, since ieee80211_cancel_roc() looks for a specific roc cookie. > Also, I think the whole concept is racy because you acquire "found" > before the unlock, but use it after the unlock, so if the work actually > ran (_sync, perhaps it already started) it may have freed the "found" > pointer. > hw_roc_done work bails out in case of no started roc, so i think we're safe here. > I think the only way to really fix that is to make the driver return the > roc pointer or so and then we can either queue a work struct per roc > struct, or at least mark the roc struct as driver-destroyed and > double-check that in the work function? > that was my initial implementation. but then i thought it will be nicer to implement it in mac80211 and avoid changing all the drivers (with the need to save the cookie, etc.) do you prefer adding such cookie/pointer param? > Or perhaps we could flush the work before we even take the lock, but > then it might still race with the driver trying to cancel just after we > flushed I guess. right. Eliad. -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
To revive an old thread - sorry... > >> Fix it by cancelling hw_roc_done on roc cancel. > >> Since hw_roc_done takes local->mtx, temporarily > >> release it (before re-acquiring it and starting > >> the next roc if needed). > > > > I can't say I like this, and I'm not even sure it doesn't leave a race? > > After all, the work will not take the RTNL. > > > the other way (cancel while hw_roc_done) is protected, since > ieee80211_cancel_roc() looks for a specific roc cookie. But you're saying that we rely on RTNL to not modify the list, and the hw_roc_done work doesn't use the RTNL? Hmm. Are you saying now that it also checks that it was started? But then why do we even have this issue? Or I can see why we have this issue, but perhaps we could fix it by flushing the work *after* we cancel, so in case the driver was cancelling at the same time we'd not yet have started the next one? > > I think the only way to really fix that is to make the driver return the > > roc pointer or so and then we can either queue a work struct per roc > > struct, or at least mark the roc struct as driver-destroyed and > > double-check that in the work function? > > > that was my initial implementation. but then i thought it will be > nicer to implement it in mac80211 and avoid changing all the drivers > (with the need to save the cookie, etc.) > do you prefer adding such cookie/pointer param? How difficult was it for the driver(s)? It seems it would make the code far easier to reason about, which can only be a good thing. We'd save the whole race discussion above :) johannes -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 6, 2015 at 4:15 PM, Johannes Berg <johannes@sipsolutions.net> wrote: > To revive an old thread - sorry... > >> >> Fix it by cancelling hw_roc_done on roc cancel. >> >> Since hw_roc_done takes local->mtx, temporarily >> >> release it (before re-acquiring it and starting >> >> the next roc if needed). >> > >> > I can't say I like this, and I'm not even sure it doesn't leave a race? >> > After all, the work will not take the RTNL. >> > >> the other way (cancel while hw_roc_done) is protected, since >> ieee80211_cancel_roc() looks for a specific roc cookie. > > But you're saying that we rely on RTNL to not modify the list, and the > hw_roc_done work doesn't use the RTNL? Hmm. Are you saying now that it > also checks that it was started? But then why do we even have this > issue? > sorry, i'll have to recall the exact details... > Or I can see why we have this issue, but perhaps we could fix it by > flushing the work *after* we cancel, so in case the driver was > cancelling at the same time we'd not yet have started the next one? > that's basically what i did - canceled the work after drv_cancel_remain_on_channel() was called. >> > I think the only way to really fix that is to make the driver return the >> > roc pointer or so and then we can either queue a work struct per roc >> > struct, or at least mark the roc struct as driver-destroyed and >> > double-check that in the work function? >> > >> that was my initial implementation. but then i thought it will be >> nicer to implement it in mac80211 and avoid changing all the drivers >> (with the need to save the cookie, etc.) >> do you prefer adding such cookie/pointer param? > > How difficult was it for the driver(s)? It seems it would make the code > far easier to reason about, which can only be a good thing. We'd save > the whole race discussion above :) > sure. let's forget about this discussion and go with the clearer solution :) Eliad. -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index dd4ff36..a80f7b3 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -2782,7 +2782,22 @@ static int ieee80211_cancel_roc(struct ieee80211_local *local, } list_del(&found->list); + mutex_unlock(&local->mtx); + + /* + * The driver might have already expired the ROC (by + * calling ieee80211_remain_on_channel_expired()), so + * cancel the pending roc_done work (otherwise, it might + * cancel the upcoming roc). + */ + cancel_work_sync(&local->hw_roc_done); + mutex_lock(&local->mtx); + /* + * We hold rtnl, so nothing should have been able to + * manipulate the roc_list during the temporary + * lock release. + */ if (found->started) ieee80211_start_next_roc(local); mutex_unlock(&local->mtx);
mac80211 handles roc expirations (indicated by ieee80211_remain_on_channel_expired()) with a separate work (hw_roc_done). If userspace asks to cancel the ongoing roc before hw_roc_done was scheduled, mac80211 will cancel the ongoing roc, while leaving the hw_roc_done work pending. This might end up in the next roc being cancelled by the stale work, causing mac80211 and the low-level driver to go out of sync. Fix it by cancelling hw_roc_done on roc cancel. Since hw_roc_done takes local->mtx, temporarily release it (before re-acquiring it and starting the next roc if needed). Signed-off-by: Eliad Peller <eliad@wizery.com> --- net/mac80211/cfg.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)