Message ID | 20220701143052.1267509-20-miquel.raynal@bootlin.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: ieee802154: Support scanning/beaconing | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
Hi, On Fri, Jul 1, 2022 at 10:37 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > There is no need to ensure the rtnl is locked when changing a driver's > channel. This cause issues when scanning and this is the only driver > relying on it. Just drop this dependency because it does not seem > legitimate. > switching channels relies on changing pib attributes, pib attributes are protected by rtnl. If you experience issues here then it's probably because you do something wrong. All drivers assuming here that rtnl lock is held. - Alex
Hi, On Tue, Jul 5, 2022 at 9:23 PM Alexander Aring <aahringo@redhat.com> wrote: > > Hi, > > On Fri, Jul 1, 2022 at 10:37 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > There is no need to ensure the rtnl is locked when changing a driver's > > channel. This cause issues when scanning and this is the only driver > > relying on it. Just drop this dependency because it does not seem > > legitimate. > > > > switching channels relies on changing pib attributes, pib attributes > are protected by rtnl. If you experience issues here then it's > probably because you do something wrong. All drivers assuming here > that rtnl lock is held. especially this change could end in invalid free. Maybe we can solve this problem in a different way, what exactly is the problem by helding rtnl lock? - Alex
Hi Alexander, aahringo@redhat.com wrote on Tue, 5 Jul 2022 21:23:21 -0400: > Hi, > > On Fri, Jul 1, 2022 at 10:37 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > There is no need to ensure the rtnl is locked when changing a driver's > > channel. This cause issues when scanning and this is the only driver > > relying on it. Just drop this dependency because it does not seem > > legitimate. > > > > switching channels relies on changing pib attributes, pib attributes > are protected by rtnl. If you experience issues here then it's > probably because you do something wrong. All drivers assuming here > that rtnl lock is held. ---8<--- > especially this change could end in invalid free. Maybe we can solve > this problem in a different way, what exactly is the problem by > helding rtnl lock? --->8--- During a scan we need to change channels. So when the background job kicks-in, we first acquire scan_lock, then we check the internal parameters of the structure protected by this lock, like the next channel to use and the sdata pointer. A channel change must be performed, preceded by an rtnl_lock(). This will again trigger a possible circular lockdep dependency warning because the triggering path acquires the rtnl (as part of the netlink layer) before the scan lock. One possible solution would be to do the following: scan_work() { acquire(scan_lock); // do some config release(scan_lock); rtnl_lock(); perform_channel_change(); rtnl_unlock(); acquire(scan_lock); // reinit the scan struct ptr and the sdata ptr // do some more things release(scan_lock); } It looks highly non-elegant IMHO. Otherwise I need to stop verifying in the drivers that the rtnl is taken. Any third option here? BTW I don't see where the invalid free situation you mentioned can happen here, can you explain? Thanks, Miquèl
Hi Alexander, miquel.raynal@bootlin.com wrote on Fri, 19 Aug 2022 19:09:44 +0200: > Hi Alexander, > > aahringo@redhat.com wrote on Tue, 5 Jul 2022 21:23:21 -0400: > > > Hi, > > > > On Fri, Jul 1, 2022 at 10:37 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > > > > > There is no need to ensure the rtnl is locked when changing a driver's > > > channel. This cause issues when scanning and this is the only driver > > > relying on it. Just drop this dependency because it does not seem > > > legitimate. > > > > > > > switching channels relies on changing pib attributes, pib attributes > > are protected by rtnl. If you experience issues here then it's > > probably because you do something wrong. All drivers assuming here > > that rtnl lock is held. > > ---8<--- > > especially this change could end in invalid free. Maybe we can solve > > this problem in a different way, what exactly is the problem by > > helding rtnl lock? > --->8--- > > During a scan we need to change channels. So when the background job > kicks-in, we first acquire scan_lock, then we check the internal > parameters of the structure protected by this lock, like the next > channel to use and the sdata pointer. A channel change must be > performed, preceded by an rtnl_lock(). This will again trigger a > possible circular lockdep dependency warning because the triggering path > acquires the rtnl (as part of the netlink layer) before the scan lock. > > One possible solution would be to do the following: > scan_work() { > acquire(scan_lock); > // do some config > release(scan_lock); > rtnl_lock(); > perform_channel_change(); > rtnl_unlock(); > acquire(scan_lock); > // reinit the scan struct ptr and the sdata ptr > // do some more things > release(scan_lock); > } > > It looks highly non-elegant IMHO. Otherwise I need to stop verifying in > the drivers that the rtnl is taken. Any third option here? I've tried two other solutions. A/ Enforcing the dependency rtnl -> scan_lock This means always acquiring the rtnl before scan_lock, and in terms of code requires to take the rtnl in the scan worker. Of course enclosing the drv_change_chan() call would mean releasing the scan_lock in the middle and re-taking it after all, which would defeat the protection of the scan_req structure which the lock is supposed to enforce. So I went for acquiring the lock at the top, before acquiring scan_lock, of course. This does not work because we need to acquire the rtnl in the worker, while at the same time there are places like ->slave_close which need to acquire the worker lock (during flush_workqueue()) and this can only happen under rtnl. Lockdep then complains about a possible circular dependency. B/ Avoiding the rtnl in scan operations and allowing the reverse dependency, which is scan_lock -> rtnl I've drafted this solution because I think the scan operation do not really need the rtnl. This idea got reinforced when I found this wireless change: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver"). But unfortunately I get the same issue again, with the ->close() implementation which needs to acquire the worker lock to flush, this makes a rtnl -> worker_lock dependency which is incompatible with a worker_lock -> scan_lock -> rtnl chain (this is is typically what should happen when changing the channel during a scan). So I looked at reducing the scope of scan_lock, in order to avoid taking it for too long and avoid the scan_lock -> rtnl or rtnl -> scan_lock dependency in the worker, but I think in the end it is a truly bad idea. Finally, I decided I could use another workqueue for the mac related commands which is not the one for the data. We don't care about flushing it because we _need_ the beacons/scan workers to be stopped, which is handled in their dedicated helpers. Doing so removes a rtnl -> worker_lock dependency, which allows to acquire the rtnl from the worker. I've mostly implemented it, I'll clean all this up and send a v2 tomorrow. Thanks, Miquèl
diff --git a/drivers/net/ieee802154/mac802154_hwsim.c b/drivers/net/ieee802154/mac802154_hwsim.c index 38c217bd7c82..a5b9fc2fb64c 100644 --- a/drivers/net/ieee802154/mac802154_hwsim.c +++ b/drivers/net/ieee802154/mac802154_hwsim.c @@ -100,7 +100,7 @@ static int hwsim_hw_channel(struct ieee802154_hw *hw, u8 page, u8 channel) pib->page = page; pib->channel = channel; - pib_old = rtnl_dereference(phy->pib); + rcu_assign_pointer(pib_old, phy->pib); rcu_assign_pointer(phy->pib, pib); kfree_rcu(pib_old, rcu); return 0;
There is no need to ensure the rtnl is locked when changing a driver's channel. This cause issues when scanning and this is the only driver relying on it. Just drop this dependency because it does not seem legitimate. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> --- drivers/net/ieee802154/mac802154_hwsim.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)