Message ID | 20190206090721.8001-1-johannes@sipsolutions.net (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Johannes Berg |
Headers | show |
Series | [v2] rhashtable: make walk safe from softirq context | expand |
On Wed, Feb 06, 2019 at 10:07:21AM +0100, Johannes Berg wrote: > From: Johannes Berg <johannes.berg@intel.com> > > When an rhashtable walk is done from softirq context, we rightfully > get a lockdep complaint saying that we could get a softirq in the > middle of a rehash, and thus deadlock on &ht->lock. This happened > e.g. in mac80211 as it does a walk in softirq context. > > Fix this by using spin_lock_bh() wherever we use the &ht->lock. > > Initially, I thought it would be sufficient to do this only in the > rehash (rhashtable_rehash_table), but I changed my mind: > * the caller doesn't really need to disable softirqs across all > of the rhashtable_walk_* functions, only those parts that they > actually do within the lock need it > * maybe more importantly, it would still lead to massive lockdep > complaints - false positives, but hard to fix - because lockdep > wouldn't know about different ht->lock instances, and thus one > user of the code doing a walk w/o any locking (when it only ever > uses process context this is fine) vs. another user like in wifi > where we noticed this problem would still cause it to complain. > > Cc: stable@vger.kernel.org > Reported-by: Jouni Malinen <j@w1.fi> > Signed-off-by: Johannes Berg <johannes.berg@intel.com> This interface wasn't designed for use in softirq contexts. Could you please show me who is doing this so I can review that to see whether it's a legitimate use of this API? Thanks,
> This interface wasn't designed for use in softirq contexts. Well, it clearly was used there. You even gave it a gfp_t argument in rhashtable_walk_init(), so you can't really claim it wasn't designed for this. I see now that it's ignored, but still? > Could you please show me who is doing this so I can review that > to see whether it's a legitimate use of this API? I'm sure you'll say it's not legitimate, but it still exists ;-) mesh_plink_broken() gets called from the TX status path, via ieee80211s_update_metric(). johannes
On Thu, Feb 07, 2019 at 02:50:54PM +0100, Johannes Berg wrote: > > > This interface wasn't designed for use in softirq contexts. > > Well, it clearly was used there. You even gave it a gfp_t argument in > rhashtable_walk_init(), so you can't really claim it wasn't designed for > this. I see now that it's ignored, but still? I see. This was added behind my back so I wasn't aware of it. > > Could you please show me who is doing this so I can review that > > to see whether it's a legitimate use of this API? > > I'm sure you'll say it's not legitimate, but it still exists ;-) > > mesh_plink_broken() gets called from the TX status path, via > ieee80211s_update_metric(). Let me take a look. Thanks!
On Fri, 2019-02-08 at 05:48 +0800, Herbert Xu wrote: > On Thu, Feb 07, 2019 at 02:50:54PM +0100, Johannes Berg wrote: > > > > > This interface wasn't designed for use in softirq contexts. > > > > Well, it clearly was used there. You even gave it a gfp_t argument in > > rhashtable_walk_init(), so you can't really claim it wasn't designed for > > this. I see now that it's ignored, but still? > > I see. This was added behind my back so I wasn't aware of it. It's not used and actually I was wrong anyway: this would have also allowed doing the walk while holding a spinlock or with softirqs disabled, rather than from IRQ/softirq context. In any case, it's clearly working to iterate from this context, and doing a spinlock_bh vs. a spinlock in the rhashtable code isn't really that big a deal? Not sure I really understand your objection. johannes
On Fri, Feb 08, 2019 at 05:48:34AM +0800, Herbert Xu wrote: > > > > Could you please show me who is doing this so I can review that > > > to see whether it's a legitimate use of this API? > > > > I'm sure you'll say it's not legitimate, but it still exists ;-) > > > > mesh_plink_broken() gets called from the TX status path, via > > ieee80211s_update_metric(). > > Let me take a look. OK I think it is wrong but you already knew that :) First of all our current walk interface does not guarantee that you will see all elements because a parallel remove can cause you to miss elements. Although Neil Brown was trying to fix that it is still not in the tree yet. But more fundamentally, iterating over an unbounded list in a sotirq context is *broken*! Either you don't really need a hash table at all because your list is short enough to start with, or you need to add more hash tables (or other data structures) to efficiently look things up using these alternative keys so that you don't end up hogging the softirq. So this needs to be fixed in mesh. Once that is gone hopefully we can remove the long obsolete init API that carries the GFP flag. Cheers,
From: Johannes Berg <johannes@sipsolutions.net> Date: Wed, 6 Feb 2019 10:07:21 +0100 > From: Johannes Berg <johannes.berg@intel.com> > > When an rhashtable walk is done from softirq context, we rightfully > get a lockdep complaint saying that we could get a softirq in the > middle of a rehash, and thus deadlock on &ht->lock. This happened > e.g. in mac80211 as it does a walk in softirq context. > > Fix this by using spin_lock_bh() wherever we use the &ht->lock. > > Initially, I thought it would be sufficient to do this only in the > rehash (rhashtable_rehash_table), but I changed my mind: > * the caller doesn't really need to disable softirqs across all > of the rhashtable_walk_* functions, only those parts that they > actually do within the lock need it > * maybe more importantly, it would still lead to massive lockdep > complaints - false positives, but hard to fix - because lockdep > wouldn't know about different ht->lock instances, and thus one > user of the code doing a walk w/o any locking (when it only ever > uses process context this is fine) vs. another user like in wifi > where we noticed this problem would still cause it to complain. > > Cc: stable@vger.kernel.org > Reported-by: Jouni Malinen <j@w1.fi> > Signed-off-by: Johannes Berg <johannes.berg@intel.com> Herbert and Johannes, I need some guidance. It seems Herbert wants the softirq usage of rhashtables removed, but since things have been like this for so long that's not the most reasonable requirement if we can fix it more simply with Johannes's patch especially for -stable. Thanks.
On Tue, 2019-02-12 at 10:43 -0800, David Miller wrote: > Herbert and Johannes, I need some guidance. > > It seems Herbert wants the softirq usage of rhashtables removed, Well, specifically of rhashtable walkers. I can only concede that he's right in that a hashtable walk during softirq (or even with softirqs disabled) was maybe a bad idea. At the same time, it's likely going to be pretty deep surgery in this code, and I'm not sure I can do that right now. Maybe Bob has some thoughts if it can be achieved more easily, but I think it'd require adding a new list to each station that tracks which mesh paths it is the next_hop for, and making sure that's maintained correctly, which feels tricky but maybe it's not (I could be more familiar with mesh ...) Evidently this goes back to commit 60854fd94573f0d3b80b55b40cf0140a0430f3ab Author: Bob Copeland <me@bobcopeland.com> Date: Wed Mar 2 10:09:20 2016 -0500 mac80211: mesh: convert path table to rhashtable which is kinda old. Not sure why this didn't surface before, because the spinlock was introduced *before*, otherwise certainly the mutex would've caused us to not be able to do this code to start with (commit c6ff5268293 - rhashtable: Fix walker list corruption). That commit also just converted an existing hashtable walk to rhashtable, so not sure that counts as having introduced the problem :-) I guess that's not really guidance. If it were my call I'd apply the patch and issue a stern warning to myself to remove this ASAP ;-) But sadly, mesh isn't exactly a priority to most, so not sure when that "P" would be. But I guess we should also ask Bob first: 1) do you think it'd be easy to maintain a separate list or avoid the iteration in some otherway, and make that a small enough patch to be applicable for stable? 2) or do you think maybe the mesh_plink_broken() call could just be lifted into a workqueue instead? johannes
On Tue, Feb 12, 2019 at 08:03:17PM +0100, Johannes Berg wrote: > commit 60854fd94573f0d3b80b55b40cf0140a0430f3ab > Author: Bob Copeland <me@bobcopeland.com> > Date: Wed Mar 2 10:09:20 2016 -0500 > > mac80211: mesh: convert path table to rhashtable > > which is kinda old. Not sure why this didn't surface before, because the > spinlock was introduced *before*, otherwise certainly the mutex would've > caused us to not be able to do this code to start with (commit > c6ff5268293 - rhashtable: Fix walker list corruption). > > That commit also just converted an existing hashtable walk to > rhashtable, so not sure that counts as having introduced the problem :-) Yeah, I didn't change any of the contexts in which the iterations were called, except making it possible to replace the previous mesh-specific RCU hashtable with rhashtable by introducing this. That said, it would certainly be nice to not have to walk the table to find paths that use a specific station as nexthop. > But I guess we should also ask Bob first: > 1) do you think it'd be easy to maintain a separate list or avoid the > iteration in some otherway, and make that a small enough patch to be > applicable for stable? I'm not sure, it's been a while. I guess most of the difficulties would be around station removal? > 2) or do you think maybe the mesh_plink_broken() call could just be > lifted into a workqueue instead? As far as I know, no reason this can't wait until later; we might send a few frames to the wrong destination but that can happen anyway. But there are a couple more walks in there like mesh_path_flush_by_nexthop().
On Tue, Feb 12, 2019 at 10:43:39AM -0800, David Miller wrote: > > Herbert and Johannes, I need some guidance. > > It seems Herbert wants the softirq usage of rhashtables removed, but > since things have been like this for so long that's not the most > reasonable requirement if we can fix it more simply with Johannes's > patch especially for -stable. I agree that we should provide a fix simple enough to backport. However, I think the simplest fix is to simply use a linked list for the walk. I can prepare a patch for this if everybody else agrees with this approach. Cheers,
diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 852ffa5160f1..30d14f8d9985 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -327,10 +327,10 @@ static int rhashtable_rehash_table(struct rhashtable *ht) /* Publish the new table pointer. */ rcu_assign_pointer(ht->tbl, new_tbl); - spin_lock(&ht->lock); + spin_lock_bh(&ht->lock); list_for_each_entry(walker, &old_tbl->walkers, list) walker->tbl = NULL; - spin_unlock(&ht->lock); + spin_unlock_bh(&ht->lock); /* Wait for readers. All new readers will see the new * table, and thus no references to the old table will @@ -670,11 +670,11 @@ void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter) iter->skip = 0; iter->end_of_table = 0; - spin_lock(&ht->lock); + spin_lock_bh(&ht->lock); iter->walker.tbl = rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock)); list_add(&iter->walker.list, &iter->walker.tbl->walkers); - spin_unlock(&ht->lock); + spin_unlock_bh(&ht->lock); } EXPORT_SYMBOL_GPL(rhashtable_walk_enter); @@ -686,10 +686,10 @@ EXPORT_SYMBOL_GPL(rhashtable_walk_enter); */ void rhashtable_walk_exit(struct rhashtable_iter *iter) { - spin_lock(&iter->ht->lock); + spin_lock_bh(&iter->ht->lock); if (iter->walker.tbl) list_del(&iter->walker.list); - spin_unlock(&iter->ht->lock); + spin_unlock_bh(&iter->ht->lock); } EXPORT_SYMBOL_GPL(rhashtable_walk_exit); @@ -719,10 +719,10 @@ int rhashtable_walk_start_check(struct rhashtable_iter *iter) rcu_read_lock(); - spin_lock(&ht->lock); + spin_lock_bh(&ht->lock); if (iter->walker.tbl) list_del(&iter->walker.list); - spin_unlock(&ht->lock); + spin_unlock_bh(&ht->lock); if (iter->end_of_table) return 0; @@ -938,12 +938,12 @@ void rhashtable_walk_stop(struct rhashtable_iter *iter) ht = iter->ht; - spin_lock(&ht->lock); + spin_lock_bh(&ht->lock); if (tbl->rehash < tbl->size) list_add(&iter->walker.list, &tbl->walkers); else iter->walker.tbl = NULL; - spin_unlock(&ht->lock); + spin_unlock_bh(&ht->lock); out: rcu_read_unlock();