Message ID | 20240515125332.9306-3-frederic@kernel.org (mailing list archive)
---|---
State | Accepted
Commit | c71a9c8588197f2491187c2acf4563d1913351e0
Series | rcu: Remove several redundant memory barriers
On 15/05/24 14:53, Frederic Weisbecker wrote:
> When the grace period kthread checks the extended quiescent state
> counter of a CPU, full ordering is necessary to ensure that either:
>
> * If the GP kthread observes the remote target in an extended quiescent
>   state, then that target must observe all accesses prior to the current
>   grace period, including the current grace period sequence number, once
>   it exits that extended quiescent state.
>
> or:
>
> * If the GP kthread observes the remote target NOT in an extended
>   quiescent state, then the target further entering in an extended
>   quiescent state must observe all accesses prior to the current
>   grace period, including the current grace period sequence number, once
>   it enters that extended quiescent state.
>
> This ordering is enforced through a full memory barrier placed right
> before taking the first EQS snapshot. However this is superfluous
> because the snapshot is taken while holding the target's rnp lock which
> provides the necessary ordering through its chain of
> smp_mb__after_unlock_lock().
>
> Remove the needless explicit barrier before the snapshot and put a
> comment about the implicit barrier newly relied upon here.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  .../Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst | 6 +++---
>  kernel/rcu/tree.c                                       | 7 ++++++-
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> index 5750f125361b..728b1e690c64 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> @@ -149,9 +149,9 @@ This case is handled by calls to the strongly ordered
>  ``atomic_add_return()`` read-modify-write atomic operation that
>  is invoked within ``rcu_dynticks_eqs_enter()`` at idle-entry
>  time and within ``rcu_dynticks_eqs_exit()`` at idle-exit time.
> -The grace-period kthread invokes ``rcu_dynticks_snap()`` and
> -``rcu_dynticks_in_eqs_since()`` (both of which invoke
> -an ``atomic_add_return()`` of zero) to detect idle CPUs.
> +The grace-period kthread invokes first ``ct_dynticks_cpu_acquire()``
> +(preceded by a full memory barrier) and ``rcu_dynticks_in_eqs_since()``
> +(both of which rely on acquire semantics) to detect idle CPUs.
>
>  +-----------------------------------------------------------------------+
>  | **Quick Quiz**:                                                       |
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 58415cdc54f8..f5354de5644b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -773,7 +773,12 @@ static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp)
>   */
>  static int dyntick_save_progress_counter(struct rcu_data *rdp)
>  {
> -	rdp->dynticks_snap = rcu_dynticks_snap(rdp->cpu);

So for PPC, which gets the smp_mb() at the lock acquisition, this is an
"obvious" redundant smp_mb().

For the other archs, per the definition of smp_mb__after_unlock_lock() it
seems implied that UNLOCK+LOCK is a full memory barrier, but I wanted to
see it explicitly stated somewhere.
From a bit of spelunking below I still think it's the case, but is there
a "better" source of truth?

01352fb81658 ("locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier")
"""
The Linux kernel has traditionally required that an UNLOCK+LOCK pair act as a
full memory barrier when either (1) that UNLOCK+LOCK pair was executed by the
same CPU or task, or (2) the same lock variable was used for the UNLOCK and
LOCK.
"""

and

https://lore.kernel.org/all/1436789704-10086-1-git-send-email-will.deacon@arm.com/
"""
This ordering guarantee is already provided without the barrier on
all architectures apart from PowerPC
"""

> +	/*
> +	 * Full ordering against accesses prior current GP and also against
                                           ^^^^^
                                           prior to
> +	 * current GP sequence number is enforced by current rnp locking
> +	 * with chained smp_mb__after_unlock_lock().
> +	 */
> +	rdp->dynticks_snap = ct_dynticks_cpu_acquire(rdp->cpu);
>  	if (rcu_dynticks_in_eqs(rdp->dynticks_snap)) {
>  		trace_rcu_fqs(rcu_state.name, rdp->gp_seq, rdp->cpu, TPS("dti"));
>  		rcu_gpnum_ovf(rdp->mynode, rdp);
> --
> 2.44.0
On Thu, May 16, 2024 at 05:31:40PM +0200, Valentin Schneider wrote:
> On 15/05/24 14:53, Frederic Weisbecker wrote:
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 58415cdc54f8..f5354de5644b 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -773,7 +773,12 @@ static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp)
> >   */
> >  static int dyntick_save_progress_counter(struct rcu_data *rdp)
> >  {
> > -	rdp->dynticks_snap = rcu_dynticks_snap(rdp->cpu);
>
> So for PPC, which gets the smp_mb() at the lock acquisition, this is an
> "obvious" redundant smp_mb().
>
> For the other archs, per the definition of smp_mb__after_unlock_lock() it
> seems implied that UNLOCK+LOCK is a full memory barrier, but I wanted to
> see it explicitly stated somewhere. From a bit of spelunking below I still
> think it's the case, but is there a "better" source of truth?
>
> 01352fb81658 ("locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier")
> """
> The Linux kernel has traditionally required that an UNLOCK+LOCK pair act as a
> full memory barrier when either (1) that UNLOCK+LOCK pair was executed by the
> same CPU or task, or (2) the same lock variable was used for the UNLOCK and
> LOCK.
> """
>
> and
>
> https://lore.kernel.org/all/1436789704-10086-1-git-send-email-will.deacon@arm.com/
> """
> This ordering guarantee is already provided without the barrier on
> all architectures apart from PowerPC
> """

You seem to have found the accurate information! But I must admit it is
hard to find and it would be welcome to document that properly, for
example in Documentation/memory-barriers.txt

I think the reason is that it's not supposed to be used outside RCU,
perhaps because its semantics are too fragile to use for general
purpose? Even that could be stated along in
Documentation/memory-barriers.txt

Another thing is that its semantics are similar to smp_mb__after_spinlock()
(itself badly documented), although slightly different.
I'm not even completely sure how. I assume that smp_mb__after_spinlock()
can be just used once to produce the required ordering and subsequent
locks on that spinlock don't need to repeat the barrier to propagate the
ordering against what is before the smp_mb__after_spinlock(). However
IIUC smp_mb__after_unlock_lock() has to be chained/repeated on all
subsequent locking of the spinlock...

Thanks.

> > +	/*
> > +	 * Full ordering against accesses prior current GP and also against
>                                          ^^^^^
>                                          prior to
>
> > +	 * current GP sequence number is enforced by current rnp locking
> > +	 * with chained smp_mb__after_unlock_lock().
> > +	 */
> > +	rdp->dynticks_snap = ct_dynticks_cpu_acquire(rdp->cpu);
> >  	if (rcu_dynticks_in_eqs(rdp->dynticks_snap)) {
> >  		trace_rcu_fqs(rcu_state.name, rdp->gp_seq, rdp->cpu, TPS("dti"));
> >  		rcu_gpnum_ovf(rdp->mynode, rdp);
> > --
> > 2.44.0
>
On 16/05/24 18:08, Frederic Weisbecker wrote:
> On Thu, May 16, 2024 at 05:31:40PM +0200, Valentin Schneider wrote:
>> On 15/05/24 14:53, Frederic Weisbecker wrote:
>> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> > index 58415cdc54f8..f5354de5644b 100644
>> > --- a/kernel/rcu/tree.c
>> > +++ b/kernel/rcu/tree.c
>> > @@ -773,7 +773,12 @@ static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp)
>> >   */
>> >  static int dyntick_save_progress_counter(struct rcu_data *rdp)
>> >  {
>> > -	rdp->dynticks_snap = rcu_dynticks_snap(rdp->cpu);
>>
>> So for PPC, which gets the smp_mb() at the lock acquisition, this is an
>> "obvious" redundant smp_mb().
>>
>> For the other archs, per the definition of smp_mb__after_unlock_lock() it
>> seems implied that UNLOCK+LOCK is a full memory barrier, but I wanted to
>> see it explicitly stated somewhere. From a bit of spelunking below I still
>> think it's the case, but is there a "better" source of truth?
>>
>> 01352fb81658 ("locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier")
>> """
>> The Linux kernel has traditionally required that an UNLOCK+LOCK pair act as a
>> full memory barrier when either (1) that UNLOCK+LOCK pair was executed by the
>> same CPU or task, or (2) the same lock variable was used for the UNLOCK and
>> LOCK.
>> """
>>
>> and
>>
>> https://lore.kernel.org/all/1436789704-10086-1-git-send-email-will.deacon@arm.com/
>> """
>> This ordering guarantee is already provided without the barrier on
>> all architectures apart from PowerPC
>> """
>
> You seem to have found the accurate informations! But I must admit
> they are hard to find and it would be welcome to document that properly, for example
> in Documentation/memory-barriers.txt
>
> I think the reason is that it's not supposed to be used outside RCU, perhaps
> because its semantics are too fragile to use for general purpose?
> Even that
> could be stated along in Documentation/memory-barriers.txt
>

That's also what I suspected when I stumbled on

12d560f4ea87 ("rcu,locking: Privatize smp_mb__after_unlock_lock()")

which removed the references to it from Documentation/memory-barriers.txt

> Another thing is that its semantics are similar to smp_mb__after_spinlock()
> (itself badly documented), although slightly different. I'm not even completely
> sure how. I assume that smp_mb__after_spinlock() can be just used once to
> produce the required ordering and subsequent lock on that spinlock don't need
> to repeat the barrier to propagate the ordering against what is before the
> smp_mb__after_spinlock. However IUUC smp_mb__after_unlock_lock() has to be
> chained/repeated on all subsequent locking of the spinlock...

IIUC (big if) the chaining is a requirement of RCU itself, per:

2a67e741bbbc ("rcu: Create transitive rnp->lock acquisition functions")

 * Because the rcu_nodes form a tree, the tree traversal locking will observe
 * different lock values, this in turn means that an UNLOCK of one level
 * followed by a LOCK of another level does not imply a full memory barrier;
 * and most importantly transitivity is lost.
 *
 * In order to restore full ordering between tree levels, augment the regular
 * lock acquire functions with smp_mb__after_unlock_lock().
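[Editorial note, not part of the thread: the chained acquisition that
commit comment describes boils down to a lock wrapper of roughly the
following shape. This is a paraphrased sketch of the rnp lock helpers
introduced by 2a67e741bbbc, not the verbatim in-tree macro (which also
goes through ACCESS_PRIVATE()).]

```
/*
 * Sketch: every rnp->lock acquisition site goes through a wrapper like
 * this, so smp_mb__after_unlock_lock() is repeated (chained) on each
 * LOCK of the next tree level, restoring transitivity across the
 * UNLOCK-of-one-level/LOCK-of-another sequence.
 */
#define raw_spin_lock_rcu_node(rnp)					\
do {									\
	raw_spin_lock(&(rnp)->lock);					\
	smp_mb__after_unlock_lock();					\
} while (0)
```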
> >> > @@ -773,7 +773,12 @@ static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp)
> >> >   */
> >> >  static int dyntick_save_progress_counter(struct rcu_data *rdp)
> >> >  {
> >> > -	rdp->dynticks_snap = rcu_dynticks_snap(rdp->cpu);
> >>
> >> So for PPC, which gets the smp_mb() at the lock acquisition, this is an
> >> "obvious" redundant smp_mb().
> >>
> >> For the other archs, per the definition of smp_mb__after_unlock_lock() it
> >> seems implied that UNLOCK+LOCK is a full memory barrier, but I wanted to
> >> see it explicitly stated somewhere. From a bit of spelunking below I still
> >> think it's the case, but is there a "better" source of truth?
> >>
> >> 01352fb81658 ("locking: Add an smp_mb__after_unlock_lock() for UNLOCK+BLOCK barrier")
> >> """
> >> The Linux kernel has traditionally required that an UNLOCK+LOCK pair act as a
> >> full memory barrier when either (1) that UNLOCK+LOCK pair was executed by the
> >> same CPU or task, or (2) the same lock variable was used for the UNLOCK and
> >> LOCK.
> >> """
> >>
> >> and
> >>
> >> https://lore.kernel.org/all/1436789704-10086-1-git-send-email-will.deacon@arm.com/
> >> """
> >> This ordering guarantee is already provided without the barrier on
> >> all architectures apart from PowerPC
> >> """
> >
> > You seem to have found the accurate informations! But I must admit
> > they are hard to find and it would be welcome to document that properly, for example
> > in Documentation/memory-barriers.txt
> >
> > I think the reason is that it's not supposed to be used outside RCU, perhaps
> > because its semantics are too fragile to use for general purpose?
> > Even that
> > could be stated along in Documentation/memory-barriers.txt
> >
>
> That's also what I suspected when I stumbled on
>
> 12d560f4ea87 ("rcu,locking: Privatize smp_mb__after_unlock_lock()")
>
> which removed the references to it from Documentation/memory-barriers.txt
>
> > Another thing is that its semantics are similar to smp_mb__after_spinlock()
> > (itself badly documented), although slightly different. I'm not even completely
> > sure how. I assume that smp_mb__after_spinlock() can be just used once to
> > produce the required ordering and subsequent lock on that spinlock don't need
> > to repeat the barrier to propagate the ordering against what is before the
> > smp_mb__after_spinlock. However IUUC smp_mb__after_unlock_lock() has to be
> > chained/repeated on all subsequent locking of the spinlock...
>
> IIUC (big if) the chaining is a requirement of RCU itself, per:
>
> 2a67e741bbbc ("rcu: Create transitive rnp->lock acquisition functions")
>
>  * Because the rcu_nodes form a tree, the tree traversal locking will observe
>  * different lock values, this in turn means that an UNLOCK of one level
>  * followed by a LOCK of another level does not imply a full memory barrier;
>  * and most importantly transitivity is lost.
>  *
>  * In order to restore full ordering between tree levels, augment the regular
>  * lock acquire functions with smp_mb__after_unlock_lock().
>

I know my remark may seem a little biased, ;-) but the semantics of
smp_mb__after_unlock_lock() and smp_mb__after_spinlock() have been
somehow/formally documented in the LKMM. This means, in particular,
that one can write "litmus tests" with the barriers at stake and then
"run"/check such tests against the _current model.
For example, (based on inline comments in include/linux/spinlock.h)

$ cat after_spinlock.litmus
C after_spinlock

{ }

P0(int *x, spinlock_t *s)
{
	spin_lock(s);
	WRITE_ONCE(*x, 1);
	spin_unlock(s);
}

P1(int *x, int *y, spinlock_t *s)
{
	int r0;

	spin_lock(s);
	smp_mb__after_spinlock();
	r0 = READ_ONCE(*x);
	WRITE_ONCE(*y, 1);
	spin_unlock(s);
}

P2(int *x, int *y)
{
	int r1;
	int r2;

	r1 = READ_ONCE(*y);
	smp_rmb();
	r2 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0)

$ herd7 -conf linux-kernel.cfg after_spinlock.litmus
Test after_spinlock Allowed
States 7
1:r0=0; 2:r1=0; 2:r2=0;
1:r0=0; 2:r1=0; 2:r2=1;
1:r0=0; 2:r1=1; 2:r2=0;
1:r0=0; 2:r1=1; 2:r2=1;
1:r0=1; 2:r1=0; 2:r2=0;
1:r0=1; 2:r1=0; 2:r2=1;
1:r0=1; 2:r1=1; 2:r2=1;
No
Witnesses
Positive: 0 Negative: 7
Condition exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0)
Observation after_spinlock Never 0 7
Time after_spinlock 0.01
Hash=b377bde8fe3565fcdd0eb2bdfaf3351e

Notice that, according to the current model at least, the state in
the above "exists" clause remains forbidden _after removal of the
smp_mb__after_spinlock() barrier. In this sense, if you want, the
inline comment (I contributed to) is misleading/incomplete. :-/

  Andrea
On Fri, May 17, 2024 at 09:29:14AM +0200, Andrea Parri wrote:
> I know my remark may seem a little biased, ;-) but the semantics of
> smp_mb__after_unlock_lock() and smp_mb__after_spinlock() have been
> somehow/formally documented in the LKMM. This means, in particular,
> that one can write "litmus tests" with the barriers at stake and then
> "run"/check such tests against the _current model.
>
> For example, (based on inline comments in include/linux/spinlock.h)
>
> $ cat after_spinlock.litmus
> C after_spinlock
>
> { }
>
> P0(int *x, spinlock_t *s)
> {
> 	spin_lock(s);
> 	WRITE_ONCE(*x, 1);
> 	spin_unlock(s);
> }
>
> P1(int *x, int *y, spinlock_t *s)
> {
> 	int r0;
>
> 	spin_lock(s);
> 	smp_mb__after_spinlock();
> 	r0 = READ_ONCE(*x);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(s);
> }
>
> P2(int *x, int *y)
> {
> 	int r1;
> 	int r2;
>
> 	r1 = READ_ONCE(*y);
> 	smp_rmb();
> 	r2 = READ_ONCE(*x);
> }
>
> exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0)
>
> $ herd7 -conf linux-kernel.cfg after_spinlock.litmus
> Test after_spinlock Allowed
> States 7
> 1:r0=0; 2:r1=0; 2:r2=0;
> 1:r0=0; 2:r1=0; 2:r2=1;
> 1:r0=0; 2:r1=1; 2:r2=0;
> 1:r0=0; 2:r1=1; 2:r2=1;
> 1:r0=1; 2:r1=0; 2:r2=0;
> 1:r0=1; 2:r1=0; 2:r2=1;
> 1:r0=1; 2:r1=1; 2:r2=1;
> No
> Witnesses
> Positive: 0 Negative: 7
> Condition exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0)
> Observation after_spinlock Never 0 7
> Time after_spinlock 0.01
> Hash=b377bde8fe3565fcdd0eb2bdfaf3351e
>
> Notice that, according to the current model at least, the state in
> the above "exists" clause remains forbidden _after removal of the
> smp_mb__after_spinlock() barrier. In this sense, if you want, the
> inline comment (I contributed to) is misleading/incomplete. :-/

Z6.0+pooncelock+poonceLock+pombonce.litmus shows an example of
how full ordering is subtly incomplete without smp_mb__after_spinlock().
But still, smp_mb__after_unlock_lock() is supposed to be weaker than
smp_mb__after_spinlock() and yet I'm failing to produce a litmus test
that is successful with the latter and fails with the former.

For example, and assuming smp_mb__after_unlock_lock() is expected to be
chained across locking, here is a litmus test inspired by
Z6.0+pooncelock+poonceLock+pombonce.litmus that never observes the
condition even though I would expect it should, as opposed to using
smp_mb__after_spinlock():

C smp_mb__after_unlock_lock

{}

P0(int *w, int *x, spinlock_t *mylock)
{
	spin_lock(mylock);
	WRITE_ONCE(*w, 1);
	WRITE_ONCE(*x, 1);
	spin_unlock(mylock);
}

P1(int *x, int *y, spinlock_t *mylock)
{
	int r0;

	spin_lock(mylock);
	smp_mb__after_unlock_lock();
	r0 = READ_ONCE(*x);
	WRITE_ONCE(*y, 1);
	spin_unlock(mylock);
}

P2(int *y, int *z, spinlock_t *mylock)
{
	int r0;

	spin_lock(mylock);
	r0 = READ_ONCE(*y);
	WRITE_ONCE(*z, 1);
	spin_unlock(mylock);
}

P3(int *w, int *z)
{
	int r1;

	WRITE_ONCE(*z, 2);
	smp_mb();
	r1 = READ_ONCE(*w);
}

exists (1:r0=1 /\ 2:r0=1 /\ z=2 /\ 3:r1=0)
> Z6.0+pooncelock+poonceLock+pombonce.litmus shows an example of
> how full ordering is subtly incomplete without smp_mb__after_spinlock().
>
> But still, smp_mb__after_unlock_lock() is supposed to be weaker than
> smp_mb__after_spinlock() and yet I'm failing to produce a litmus test
> that is successful with the latter and fails with the former.

smp_mb__after_unlock_lock() is a nop without a matching unlock-lock;
smp_mb__after_spinlock() not quite...

C after_spinlock__vs__after_unlock_lock

{ }

P0(int *x, int *y, spinlock_t *s)
{
	int r0;

	WRITE_ONCE(*x, 1);
	spin_lock(s);
	smp_mb__after_spinlock();
	r0 = READ_ONCE(*y);
	spin_unlock(s);
}

P1(int *x, int *y)
{
	int r1;

	WRITE_ONCE(*y, 1);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r1=0)

> For example, and assuming smp_mb__after_unlock_lock() is expected to be
> chained across locking, here is a litmus test inspired by
> Z6.0+pooncelock+poonceLock+pombonce.litmus that never observes the condition
> even though I would expect it should, as opposed to using
> smp_mb__after_spinlock():
>
> C smp_mb__after_unlock_lock
>
> {}
>
> P0(int *w, int *x, spinlock_t *mylock)
> {
> 	spin_lock(mylock);
> 	WRITE_ONCE(*w, 1);
> 	WRITE_ONCE(*x, 1);
> 	spin_unlock(mylock);
> }
>
> P1(int *x, int *y, spinlock_t *mylock)
> {
> 	int r0;
>
> 	spin_lock(mylock);
> 	smp_mb__after_unlock_lock();
> 	r0 = READ_ONCE(*x);
> 	WRITE_ONCE(*y, 1);
> 	spin_unlock(mylock);
> }
>
> P2(int *y, int *z, spinlock_t *mylock)
> {
> 	int r0;
>
> 	spin_lock(mylock);
> 	r0 = READ_ONCE(*y);
> 	WRITE_ONCE(*z, 1);
> 	spin_unlock(mylock);
> }
>
> P3(int *w, int *z)
> {
> 	int r1;
>
> 	WRITE_ONCE(*z, 2);
> 	smp_mb();
> 	r1 = READ_ONCE(*w);
> }
>
> exists (1:r0=1 /\ 2:r0=1 /\ z=2 /\ 3:r1=0)

Here's an informal argument to explain this outcome. It is not the only
one according to the LKMM, but the first that came to my mind. And this
is longer than I wished. TL;DR: Full barriers are strong, really strong.
Remark full memory barriers share the following "strong-fence property":

	A ->full-barrier B

implies

	(SFP) A propagates (aka, is visible) to _every_ CPU before B executes

(cf. tools/memory-model/Documentation/explanation.txt for details about
the concepts of "propagation" and "execution").

For example, in the snippet above,

	P0:WRITE_ONCE(*w, 1) ->full-barrier P1:spin_unlock(mylock)

since

	P0:spin_unlock(mylock) ->reads-from P1:spin_lock(mylock)
		->program-order P1:smp_mb__after_unlock_lock()

acts as a full memory barrier.

(1:r0=1 and 2:r0=1 together determine the so called critical-sections'
order (CSO).)

By contradiction,

1) P0:WRITE_ONCE(*w, 1) propagates to P3 before P1:spin_unlock(mylock)
   executes  (SFP)

2) P1:spin_unlock(mylock) executes before P2:spin_lock(mylock) executes
   (CSO)

3) P2:spin_lock(mylock) executes before P2:WRITE_ONCE(*z, 1) executes
   (P2:spin_lock() is an ACQUIRE op)

4) P2:WRITE_ONCE(*z, 1) executes before P2:WRITE_ONCE(*z, 1) propagates
   to P3  (intuitively, a store is visible to the local CPU before being
   visible to a remote CPU)

5) P2:WRITE_ONCE(*z, 1) propagates to P3 before P3:WRITE_ONCE(*z, 2)
   executes  (z=2)

6) P3:WRITE_ONCE(*z, 2) executes before P3:WRITE_ONCE(*z, 2) propagates
   to P0  (a store is visible to the local CPU before being visible to a
   remote CPU)

7) P3:WRITE_ONCE(*z, 2) propagates to P0 before P3:READ_ONCE(*w)
   executes  (SFP)

8) P3:READ_ONCE(*w) executes before P0:WRITE_ONCE(*w, 1) propagates to
   P3  (3:r1=0)

Put together, (1-8) gives:

	P0:WRITE_ONCE(*w, 1) propagates to P3 before
	P0:WRITE_ONCE(*w, 1) propagates to P3

an absurdity.

  Andrea
diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
index 5750f125361b..728b1e690c64 100644
--- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
+++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
@@ -149,9 +149,9 @@ This case is handled by calls to the strongly ordered
 ``atomic_add_return()`` read-modify-write atomic operation that
 is invoked within ``rcu_dynticks_eqs_enter()`` at idle-entry
 time and within ``rcu_dynticks_eqs_exit()`` at idle-exit time.
-The grace-period kthread invokes ``rcu_dynticks_snap()`` and
-``rcu_dynticks_in_eqs_since()`` (both of which invoke
-an ``atomic_add_return()`` of zero) to detect idle CPUs.
+The grace-period kthread invokes first ``ct_dynticks_cpu_acquire()``
+(preceded by a full memory barrier) and ``rcu_dynticks_in_eqs_since()``
+(both of which rely on acquire semantics) to detect idle CPUs.
 
 +-----------------------------------------------------------------------+
 | **Quick Quiz**:                                                       |
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 58415cdc54f8..f5354de5644b 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -773,7 +773,12 @@ static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp)
  */
 static int dyntick_save_progress_counter(struct rcu_data *rdp)
 {
-	rdp->dynticks_snap = rcu_dynticks_snap(rdp->cpu);
+	/*
+	 * Full ordering against accesses prior current GP and also against
+	 * current GP sequence number is enforced by current rnp locking
+	 * with chained smp_mb__after_unlock_lock().
+	 */
+	rdp->dynticks_snap = ct_dynticks_cpu_acquire(rdp->cpu);
 	if (rcu_dynticks_in_eqs(rdp->dynticks_snap)) {
 		trace_rcu_fqs(rcu_state.name, rdp->gp_seq, rdp->cpu, TPS("dti"));
 		rcu_gpnum_ovf(rdp->mynode, rdp);
When the grace period kthread checks the extended quiescent state
counter of a CPU, full ordering is necessary to ensure that either:

* If the GP kthread observes the remote target in an extended quiescent
  state, then that target must observe all accesses prior to the current
  grace period, including the current grace period sequence number, once
  it exits that extended quiescent state.

or:

* If the GP kthread observes the remote target NOT in an extended
  quiescent state, then the target further entering in an extended
  quiescent state must observe all accesses prior to the current
  grace period, including the current grace period sequence number, once
  it enters that extended quiescent state.

This ordering is enforced through a full memory barrier placed right
before taking the first EQS snapshot. However this is superfluous
because the snapshot is taken while holding the target's rnp lock which
provides the necessary ordering through its chain of
smp_mb__after_unlock_lock().

Remove the needless explicit barrier before the snapshot and put a
comment about the implicit barrier newly relied upon here.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 .../Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst | 6 +++---
 kernel/rcu/tree.c                                       | 7 ++++++-
 2 files changed, 9 insertions(+), 4 deletions(-)