| Message ID | 20230322194456.2331527-3-frederic@kernel.org (mailing list archive) |
| --- | --- |
| State | Superseded |
| Series | rcu/nocb: Shrinker related boring fixes |
On Wed, Mar 22, 2023 at 08:44:54PM +0100, Frederic Weisbecker wrote:
> The shrinker resets the lazy callbacks counter in order to trigger the
> pending lazy queue flush through the rcuog kthread. The counter reset is
> protected by the ->nocb_lock against concurrent accesses... except for
> one of them. Here is a list of the existing synchronized readers/writer:
>
> 1) The first lazy enqueuer (incrementing ->lazy_len to 1) does so under
>    ->nocb_lock and ->nocb_bypass_lock.
>
> 2) The further lazy enqueuers (incrementing ->lazy_len above 1) do so
>    under ->nocb_bypass_lock _only_.
>
> 3) The lazy flush checks and resets to 0 under ->nocb_lock and
>    ->nocb_bypass_lock.
>
> The shrinker protects its ->lazy_len reset against cases 1) and 3) but
> not against 2). As such, setting ->lazy_len to 0 under the ->nocb_lock
> may be cancelled right away by an overwrite from an enqueuer, leading
> rcuog to ignore the flush.
>
> To avoid that, use the proper bypass flush API, which takes care of all
> those details.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Again, good catch, and this one looks good to me.  So what am I missing?  ;-)

							Thanx, Paul

> ---
>  kernel/rcu/tree_nocb.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index dd9b655ae533..cb57e8312231 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1356,7 +1356,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
>  			continue;
>
>  		rcu_nocb_lock_irqsave(rdp, flags);
> -		WRITE_ONCE(rdp->lazy_len, 0);
> +		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
>  		rcu_nocb_unlock_irqrestore(rdp, flags);
>  		wake_nocb_gp(rdp, false);
>  		sc->nr_to_scan -= _count;
> --
> 2.34.1
>
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index dd9b655ae533..cb57e8312231 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1356,7 +1356,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 			continue;

 		rcu_nocb_lock_irqsave(rdp, flags);
-		WRITE_ONCE(rdp->lazy_len, 0);
+		WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false));
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 		wake_nocb_gp(rdp, false);
 		sc->nr_to_scan -= _count;
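For reference, the replacement call resets ->lazy_len only after taking ->nocb_bypass_lock on top of the already-held ->nocb_lock, matching case 3) of the commit message below. A minimal userspace sketch of that locking shape, with pthread mutexes standing in for the kernel's raw spinlocks and all names illustrative:

```c
#include <pthread.h>

static pthread_mutex_t nocb_lock = PTHREAD_MUTEX_INITIALIZER;   /* models ->nocb_lock */
static pthread_mutex_t bypass_lock = PTHREAD_MUTEX_INITIALIZER; /* models ->nocb_bypass_lock */
static long lazy_len;                                           /* models ->lazy_len */

/*
 * Sketch of the fixed flush path: the caller already holds nocb_lock,
 * as at the shrinker call site in the diff above.
 */
static void flush_bypass_locked(void)
{
	/*
	 * Taking the bypass lock as well excludes the bypass-only
	 * enqueuers of case 2), so the reset below cannot be
	 * overwritten concurrently.
	 */
	pthread_mutex_lock(&bypass_lock);
	/* ... the real code moves bypass callbacks to the main list here ... */
	lazy_len = 0;
	pthread_mutex_unlock(&bypass_lock);
}
```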
The shrinker resets the lazy callbacks counter in order to trigger the
pending lazy queue flush through the rcuog kthread. The counter reset is
protected by the ->nocb_lock against concurrent accesses... except for
one of them. Here is a list of the existing synchronized readers/writer:

1) The first lazy enqueuer (incrementing ->lazy_len to 1) does so under
   ->nocb_lock and ->nocb_bypass_lock.

2) The further lazy enqueuers (incrementing ->lazy_len above 1) do so
   under ->nocb_bypass_lock _only_.

3) The lazy flush checks and resets to 0 under ->nocb_lock and
   ->nocb_bypass_lock.

The shrinker protects its ->lazy_len reset against cases 1) and 3) but
not against 2). As such, setting ->lazy_len to 0 under the ->nocb_lock
may be cancelled right away by an overwrite from an enqueuer, leading
rcuog to ignore the flush.

To avoid that, use the proper bypass flush API, which takes care of all
those details.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_nocb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
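To make the lost update concrete, here is a small standalone program modeling the buggy pattern this patch removes. It is illustrative only: pthread mutexes stand in for the kernel's raw spinlocks, and the enqueuer/shrinker functions are simplified stand-ins for the case 2) enqueue path and lazy_rcu_shrink_scan():

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t nocb_lock = PTHREAD_MUTEX_INITIALIZER;   /* models ->nocb_lock */
static pthread_mutex_t bypass_lock = PTHREAD_MUTEX_INITIALIZER; /* models ->nocb_bypass_lock */
static long lazy_len = 1;	/* models ->lazy_len: one lazy callback already queued */

/* Case 2): a further lazy enqueuer increments under the bypass lock only. */
static void *enqueuer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&bypass_lock);
	lazy_len++;	/* may land after, and thus overwrite, the reset */
	pthread_mutex_unlock(&bypass_lock);
	return NULL;
}

/* The buggy shrinker pattern: reset under nocb_lock only. */
static void *shrinker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&nocb_lock);
	lazy_len = 0;	/* nothing orders this against the enqueuer above */
	pthread_mutex_unlock(&nocb_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, enqueuer, NULL);
	pthread_create(&t2, NULL, shrinker, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	/*
	 * Depending on the interleaving this prints 0 or 2: when the
	 * increment lands last, the shrinker's reset is lost and rcuog
	 * sees a non-zero count it was never asked to flush.
	 */
	printf("lazy_len = %ld\n", lazy_len);
	return 0;
}
```

Because the two paths take different locks, the two stores to lazy_len are unordered, which is exactly the case 2) race the commit message describes; the fixed flush path above closes it by holding both locks across the reset.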