Message ID | Z4FXs8vAtitHIJyl@gondor.apana.org.au (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] rhashtable: Fix rhashtable_try_insert test | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
From: Herbert Xu <herbert@gondor.apana.org.au> Sent: Friday, January 10, 2025 9:24 AM > > On Fri, Jan 10, 2025 at 04:59:28PM +0000, Michael Kelley wrote: > > > > This patch fixes the problem I saw with VMs in the Azure cloud. Thanks! > > Sorry, but the test on data is needed after all (it was just > buggy). Otherwise we will break rhlist. So please test this > patch instead. > > ---8<--- > The test on whether rhashtable_insert_one did an insertion relies > on the value returned by rhashtable_lookup_one. Unfortunately that > value is overwritten after rhashtable_insert_one returns. Fix this > by saving the old value. > > Also simplify the test as only data == NULL matters. > > Reported-by: Michael Kelley <mhklinux@outlook.com> > Fixes: e1d3422c95f0 ("rhashtable: Fix potential deadlock by moving schedule_work > outside lock") > Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> > > diff --git a/lib/rhashtable.c b/lib/rhashtable.c > index bf956b85455a..e36b36f3146d 100644 > --- a/lib/rhashtable.c > +++ b/lib/rhashtable.c > @@ -611,17 +611,20 @@ static void *rhashtable_try_insert(struct rhashtable *ht, > const void *key, > new_tbl = rht_dereference_rcu(tbl->future_tbl, ht); > data = ERR_PTR(-EAGAIN); > } else { > + void *odata; > + > flags = rht_lock(tbl, bkt); > data = rhashtable_lookup_one(ht, bkt, tbl, > hash, key, obj); > new_tbl = rhashtable_insert_one(ht, bkt, tbl, > hash, obj, data); > + odata = data; > if (PTR_ERR(new_tbl) != -EEXIST) > data = ERR_CAST(new_tbl); > > rht_unlock(tbl, bkt, flags); > > - if (PTR_ERR(data) == -ENOENT && !new_tbl) { > + if (odata && !new_tbl) { > atomic_inc(&ht->nelems); > if (rht_grow_above_75(ht, tbl)) > schedule_work(&ht->run_work); This patch passes my tests. I'm doing a narrow test to verify that the boot failure when opening the Mellanox NIC is no longer occurring. I also unloaded/reloaded the mlx5 driver a couple of times. For good measure, I then did a full Linux kernel build, and all is good. My testing does not broadly verify correct operation of rhashtable except as it gets exercised implicitly by these basic tests. Michael
diff --git a/lib/rhashtable.c b/lib/rhashtable.c index bf956b85455a..e36b36f3146d 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -611,17 +611,20 @@ static void *rhashtable_try_insert(struct rhashtable *ht, const void *key, new_tbl = rht_dereference_rcu(tbl->future_tbl, ht); data = ERR_PTR(-EAGAIN); } else { + void *odata; + flags = rht_lock(tbl, bkt); data = rhashtable_lookup_one(ht, bkt, tbl, hash, key, obj); new_tbl = rhashtable_insert_one(ht, bkt, tbl, hash, obj, data); + odata = data; if (PTR_ERR(new_tbl) != -EEXIST) data = ERR_CAST(new_tbl); rht_unlock(tbl, bkt, flags); - if (PTR_ERR(data) == -ENOENT && !new_tbl) { + if (odata && !new_tbl) { atomic_inc(&ht->nelems); if (rht_grow_above_75(ht, tbl)) schedule_work(&ht->run_work);