diff mbox series

[v2] rhashtable: make walk safe from softirq context

Message ID 20190206090721.8001-1-johannes@sipsolutions.net (mailing list archive)
State Not Applicable
Delegated to: Johannes Berg
Headers show
Series [v2] rhashtable: make walk safe from softirq context | expand

Commit Message

Johannes Berg Feb. 6, 2019, 9:07 a.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

When an rhashtable walk is done from softirq context, we rightfully
get a lockdep complaint saying that we could get a softirq in the
middle of a rehash, and thus deadlock on &ht->lock. This happened
e.g. in mac80211 as it does a walk in softirq context.

Fix this by using spin_lock_bh() wherever we use the &ht->lock.

Initially, I thought it would be sufficient to do this only in the
rehash (rhashtable_rehash_table), but I changed my mind:
 * the caller doesn't really need to disable softirqs across all
   of the rhashtable_walk_* functions, only those parts that they
   actually do within the lock need it
 * maybe more importantly, it would still lead to massive lockdep
   complaints - false positives, but hard to fix - because lockdep
   wouldn't know about different ht->lock instances, and thus one
   user of the code doing a walk w/o any locking (when it only ever
   uses process context this is fine) vs. another user like in wifi
   where we noticed this problem would still cause it to complain.

Cc: stable@vger.kernel.org
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 lib/rhashtable.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Comments

Herbert Xu Feb. 7, 2019, 1:40 p.m. UTC | #1
On Wed, Feb 06, 2019 at 10:07:21AM +0100, Johannes Berg wrote:
> From: Johannes Berg <johannes.berg@intel.com>
> 
> When an rhashtable walk is done from softirq context, we rightfully
> get a lockdep complaint saying that we could get a softirq in the
> middle of a rehash, and thus deadlock on &ht->lock. This happened
> e.g. in mac80211 as it does a walk in softirq context.
> 
> Fix this by using spin_lock_bh() wherever we use the &ht->lock.
> 
> Initially, I thought it would be sufficient to do this only in the
> rehash (rhashtable_rehash_table), but I changed my mind:
>  * the caller doesn't really need to disable softirqs across all
>    of the rhashtable_walk_* functions, only those parts that they
>    actually do within the lock need it
>  * maybe more importantly, it would still lead to massive lockdep
>    complaints - false positives, but hard to fix - because lockdep
>    wouldn't know about different ht->lock instances, and thus one
>    user of the code doing a walk w/o any locking (when it only ever
>    uses process context this is fine) vs. another user like in wifi
>    where we noticed this problem would still cause it to complain.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Jouni Malinen <j@w1.fi>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

This interface wasn't designed for use in softirq contexts.

Could you please show me who is doing this so I can review that
to see whether it's a legitimate use of this API?

Thanks,
Johannes Berg Feb. 7, 2019, 1:50 p.m. UTC | #2
> This interface wasn't designed for use in softirq contexts.

Well, it clearly was used there. You even gave it a gfp_t argument in
rhashtable_walk_init(), so you can't really claim it wasn't designed for
this. I see now that it's ignored, but still?

> Could you please show me who is doing this so I can review that
> to see whether it's a legitimate use of this API?

I'm sure you'll say it's not legitimate, but it still exists ;-)

mesh_plink_broken() gets called from the TX status path, via
ieee80211s_update_metric().

johannes
Herbert Xu Feb. 7, 2019, 9:48 p.m. UTC | #3
On Thu, Feb 07, 2019 at 02:50:54PM +0100, Johannes Berg wrote:
> 
> > This interface wasn't designed for use in softirq contexts.
> 
> Well, it clearly was used there. You even gave it a gfp_t argument in
> rhashtable_walk_init(), so you can't really claim it wasn't designed for
> this. I see now that it's ignored, but still?

I see.  This was added behind my back so I wasn't aware of it.

> > Could you please show me who is doing this so I can review that
> > to see whether it's a legitimate use of this API?
> 
> I'm sure you'll say it's not legitimate, but it still exists ;-)
> 
> mesh_plink_broken() gets called from the TX status path, via
> ieee80211s_update_metric().

Let me take a look.

Thanks!
Johannes Berg Feb. 7, 2019, 9:56 p.m. UTC | #4
On Fri, 2019-02-08 at 05:48 +0800, Herbert Xu wrote:
> On Thu, Feb 07, 2019 at 02:50:54PM +0100, Johannes Berg wrote:
> > 
> > > This interface wasn't designed for use in softirq contexts.
> > 
> > Well, it clearly was used there. You even gave it a gfp_t argument in
> > rhashtable_walk_init(), so you can't really claim it wasn't designed for
> > this. I see now that it's ignored, but still?
> 
> I see.  This was added behind my back so I wasn't aware of it.

It's not used and actually I was wrong anyway: this would have also
allowed doing the walk while holding a spinlock or with softirqs
disabled, rather than from IRQ/softirq context.

In any case, it's clearly working to iterate from this context, and
doing a spinlock_bh vs. a spinlock in the rhashtable code isn't really
that big a deal? Not sure I really understand your objection.

johannes
Herbert Xu Feb. 7, 2019, 10:02 p.m. UTC | #5
On Fri, Feb 08, 2019 at 05:48:34AM +0800, Herbert Xu wrote:
>
> > > Could you please show me who is doing this so I can review that
> > > to see whether it's a legitimate use of this API?
> > 
> > I'm sure you'll say it's not legitimate, but it still exists ;-)
> > 
> > mesh_plink_broken() gets called from the TX status path, via
> > ieee80211s_update_metric().
> 
> Let me take a look.

OK I think it is wrong but you already knew that :)

First of all our current walk interface does not guarantee that you
will see all elements because a parallel remove can cause you to miss
elements.  Although Neil Brown was trying to fix that it is still
not in the tree yet.

But more fundamentally, iterating over an unbounded list in a sotirq
context is *broken*!

Either you don't really need a hash table at all because your list
is short enough to start with, or you need to add more hash tables
(or other data structures) to efficiently look things up using these
alternative keys so that you don't end up hogging the softirq.

So this needs to be fixed in mesh.

Once that is gone hopefully we can remove the long obsolete init
API that carries the GFP flag.

Cheers,
David Miller Feb. 12, 2019, 6:43 p.m. UTC | #6
From: Johannes Berg <johannes@sipsolutions.net>
Date: Wed,  6 Feb 2019 10:07:21 +0100

> From: Johannes Berg <johannes.berg@intel.com>
> 
> When an rhashtable walk is done from softirq context, we rightfully
> get a lockdep complaint saying that we could get a softirq in the
> middle of a rehash, and thus deadlock on &ht->lock. This happened
> e.g. in mac80211 as it does a walk in softirq context.
> 
> Fix this by using spin_lock_bh() wherever we use the &ht->lock.
> 
> Initially, I thought it would be sufficient to do this only in the
> rehash (rhashtable_rehash_table), but I changed my mind:
>  * the caller doesn't really need to disable softirqs across all
>    of the rhashtable_walk_* functions, only those parts that they
>    actually do within the lock need it
>  * maybe more importantly, it would still lead to massive lockdep
>    complaints - false positives, but hard to fix - because lockdep
>    wouldn't know about different ht->lock instances, and thus one
>    user of the code doing a walk w/o any locking (when it only ever
>    uses process context this is fine) vs. another user like in wifi
>    where we noticed this problem would still cause it to complain.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Jouni Malinen <j@w1.fi>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>

Herbert and Johannes, I need some guidance.

It seems Herbert wants the softirq usage of rhashtables removed, but
since things have been like this for so long that's not the most
reasonable requirement if we can fix it more simply with Johannes's
patch especially for -stable.

Thanks.
Johannes Berg Feb. 12, 2019, 7:03 p.m. UTC | #7
On Tue, 2019-02-12 at 10:43 -0800, David Miller wrote:

> Herbert and Johannes, I need some guidance.
> 
> It seems Herbert wants the softirq usage of rhashtables removed, 

Well, specifically of rhashtable walkers. I can only concede that he's
right in that a hashtable walk during softirq (or even with softirqs
disabled) was maybe a bad idea.

At the same time, it's likely going to be pretty deep surgery in this
code, and I'm not sure I can do that right now. Maybe Bob has some
thoughts if it can be achieved more easily, but I think it'd require
adding a new list to each station that tracks which mesh paths it is the
next_hop for, and making sure that's maintained correctly, which feels
tricky but maybe it's not (I could be more familiar with mesh ...)


Evidently this goes back to 

commit 60854fd94573f0d3b80b55b40cf0140a0430f3ab
Author: Bob Copeland <me@bobcopeland.com>
Date:   Wed Mar 2 10:09:20 2016 -0500

    mac80211: mesh: convert path table to rhashtable

which is kinda old. Not sure why this didn't surface before, because the
spinlock was introduced *before*, otherwise certainly the mutex would've
caused us to not be able to do this code to start with (commit
c6ff5268293 - rhashtable: Fix walker list corruption).

That commit also just converted an existing hashtable walk to
rhashtable, so not sure that counts as having introduced the problem :-)


I guess that's not really guidance. If it were my call I'd apply the
patch and issue a stern warning to myself to remove this ASAP ;-) But
sadly, mesh isn't exactly a priority to most, so not sure when that "P"
would be.


But I guess we should also ask Bob first:
1) do you think it'd be easy to maintain a separate list or avoid the
   iteration in some otherway, and make that a small enough patch to be
   applicable for stable?
2) or do you think maybe the mesh_plink_broken() call could just be
   lifted into a workqueue instead?

johannes
Bob Copeland Feb. 12, 2019, 9:54 p.m. UTC | #8
On Tue, Feb 12, 2019 at 08:03:17PM +0100, Johannes Berg wrote:
> commit 60854fd94573f0d3b80b55b40cf0140a0430f3ab
> Author: Bob Copeland <me@bobcopeland.com>
> Date:   Wed Mar 2 10:09:20 2016 -0500
> 
>     mac80211: mesh: convert path table to rhashtable
> 
> which is kinda old. Not sure why this didn't surface before, because the
> spinlock was introduced *before*, otherwise certainly the mutex would've
> caused us to not be able to do this code to start with (commit
> c6ff5268293 - rhashtable: Fix walker list corruption).
> 
> That commit also just converted an existing hashtable walk to
> rhashtable, so not sure that counts as having introduced the problem :-)

Yeah, I didn't change any of the contexts in which the iterations were
called, except making it possible to replace the previous mesh-specific
RCU hashtable with rhashtable by introducing this.  That said, it would
certainly be nice to not have to walk the table to find paths that
use a specific station as nexthop.

> But I guess we should also ask Bob first:
> 1) do you think it'd be easy to maintain a separate list or avoid the
>    iteration in some otherway, and make that a small enough patch to be
>    applicable for stable?

I'm not sure, it's been a while.  I guess most of the difficulties would
be around station removal?

> 2) or do you think maybe the mesh_plink_broken() call could just be
>    lifted into a workqueue instead?

As far as I know, no reason this can't wait until later; we might send
a few frames to the wrong destination but that can happen anyway.  But
there are a couple more walks in there like mesh_path_flush_by_nexthop().
Herbert Xu Feb. 13, 2019, 4:35 a.m. UTC | #9
On Tue, Feb 12, 2019 at 10:43:39AM -0800, David Miller wrote:
>
> Herbert and Johannes, I need some guidance.
> 
> It seems Herbert wants the softirq usage of rhashtables removed, but
> since things have been like this for so long that's not the most
> reasonable requirement if we can fix it more simply with Johannes's
> patch especially for -stable.

I agree that we should provide a fix simple enough to backport.
However, I think the simplest fix is to simply use a linked list
for the walk.

I can prepare a patch for this if everybody else agrees with this
approach.

Cheers,
diff mbox series

Patch

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 852ffa5160f1..30d14f8d9985 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -327,10 +327,10 @@  static int rhashtable_rehash_table(struct rhashtable *ht)
 	/* Publish the new table pointer. */
 	rcu_assign_pointer(ht->tbl, new_tbl);
 
-	spin_lock(&ht->lock);
+	spin_lock_bh(&ht->lock);
 	list_for_each_entry(walker, &old_tbl->walkers, list)
 		walker->tbl = NULL;
-	spin_unlock(&ht->lock);
+	spin_unlock_bh(&ht->lock);
 
 	/* Wait for readers. All new readers will see the new
 	 * table, and thus no references to the old table will
@@ -670,11 +670,11 @@  void rhashtable_walk_enter(struct rhashtable *ht, struct rhashtable_iter *iter)
 	iter->skip = 0;
 	iter->end_of_table = 0;
 
-	spin_lock(&ht->lock);
+	spin_lock_bh(&ht->lock);
 	iter->walker.tbl =
 		rcu_dereference_protected(ht->tbl, lockdep_is_held(&ht->lock));
 	list_add(&iter->walker.list, &iter->walker.tbl->walkers);
-	spin_unlock(&ht->lock);
+	spin_unlock_bh(&ht->lock);
 }
 EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
 
@@ -686,10 +686,10 @@  EXPORT_SYMBOL_GPL(rhashtable_walk_enter);
  */
 void rhashtable_walk_exit(struct rhashtable_iter *iter)
 {
-	spin_lock(&iter->ht->lock);
+	spin_lock_bh(&iter->ht->lock);
 	if (iter->walker.tbl)
 		list_del(&iter->walker.list);
-	spin_unlock(&iter->ht->lock);
+	spin_unlock_bh(&iter->ht->lock);
 }
 EXPORT_SYMBOL_GPL(rhashtable_walk_exit);
 
@@ -719,10 +719,10 @@  int rhashtable_walk_start_check(struct rhashtable_iter *iter)
 
 	rcu_read_lock();
 
-	spin_lock(&ht->lock);
+	spin_lock_bh(&ht->lock);
 	if (iter->walker.tbl)
 		list_del(&iter->walker.list);
-	spin_unlock(&ht->lock);
+	spin_unlock_bh(&ht->lock);
 
 	if (iter->end_of_table)
 		return 0;
@@ -938,12 +938,12 @@  void rhashtable_walk_stop(struct rhashtable_iter *iter)
 
 	ht = iter->ht;
 
-	spin_lock(&ht->lock);
+	spin_lock_bh(&ht->lock);
 	if (tbl->rehash < tbl->size)
 		list_add(&iter->walker.list, &tbl->walkers);
 	else
 		iter->walker.tbl = NULL;
-	spin_unlock(&ht->lock);
+	spin_unlock_bh(&ht->lock);
 
 out:
 	rcu_read_unlock();