Message ID | 20241104151230.3107133-1-zilinguan811@gmail.com (mailing list archive) |
---|---|
State | New |
Series | rcu: Use READ_ONCE() for rdp->gpwrap access in __note_gp_changes() |
On Mon, Nov 04, 2024 at 03:12:30PM +0000, Zilin Guan wrote:
> In function __note_gp_changes(), rdp->gpwrap is read using READ_ONCE()
> in line 1307:
>
> 1307    if (IS_ENABLED(CONFIG_PROVE_RCU) && READ_ONCE(rdp->gpwrap))
> 1308            WRITE_ONCE(rdp->last_sched_clock, jiffies);
>
> while read directly in line 1305:
>
> 1305    if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) ||
> rdp->gpwrap)
> 1306            WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
>
> In the same environment, reads in two places should have the same
> protection.
>
> Signed-off-by: Zilin Guan <zilinguan811@gmail.com>

Good eyes!!!

But did you find this with KCSAN, or by visual inspection?

The reason that I ask is that the __note_gp_changes() should be
invoked with the leaf rnp->lock held, which should exclude writes to
the rdp->gpwrap fields for all CPUs corresponding to that leaf rcu_node
structure.

Note the raw_lockdep_assert_held_rcu_node(rnp) call at the beginning of
this function.

So I believe that the proper fix is to *remove* READ_ONCE() from accesses
to rdp->gpwrap in this function.

Or am I missing something here?

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b1f883fcd918..d3e2b420dce5 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1302,7 +1302,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
>  		zero_cpu_stall_ticks(rdp);
>  	}
>  	rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
> -	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
> +	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || READ_ONCE(rdp->gpwrap))
>  		WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
>  	if (IS_ENABLED(CONFIG_PROVE_RCU) && READ_ONCE(rdp->gpwrap))
>  		WRITE_ONCE(rdp->last_sched_clock, jiffies);
> --
> 2.34.1
>

On Wed, Nov 06, 2024 at 12:18:25PM -0800, Paul E. McKenney wrote:
> Good eyes!!!
>
> But did you find this with KCSAN, or by visual inspection?
>
> The reason that I ask is that the __note_gp_changes() should be
> invoked with the leaf rnp->lock held, which should exclude writes to
> the rdp->gpwrap fields for all CPUs corresponding to that leaf rcu_node
> structure.
>
> Note the raw_lockdep_assert_held_rcu_node(rnp) call at the beginning of
> this function.
>
> So I believe that the proper fix is to *remove* READ_ONCE() from accesses
> to rdp->gpwrap in this function.
>
> Or am I missing something here?
>
> Thanx, Paul

I found this by visual inspection.

When reviewing the function __note_gp_changes(), I noticed that other
accesses to rdp->gpwrap are protected with either READ_ONCE() or
WRITE_ONCE(), which led me to suspect a potential data race at line 1305.

However, I am not certain whether holding rnp->lock protects access to
rdp->gpwrap in this case. If it indeed ensures that no concurrent writes
can occur, then I agree that the correct approach would be to remove
READ_ONCE() from those accesses.

Thanks,
Zilin

On Thu, Nov 07, 2024 at 02:01:17PM +0000, Zilin Guan wrote:
> On Wed, Nov 06, 2024 at 12:18:25PM -0800, Paul E. McKenney wrote:
> > Good eyes!!!
> >
> > But did you find this with KCSAN, or by visual inspection?
> >
> > The reason that I ask is that the __note_gp_changes() should be
> > invoked with the leaf rnp->lock held, which should exclude writes to
> > the rdp->gpwrap fields for all CPUs corresponding to that leaf rcu_node
> > structure.
> >
> > Note the raw_lockdep_assert_held_rcu_node(rnp) call at the beginning of
> > this function.
> >
> > So I believe that the proper fix is to *remove* READ_ONCE() from accesses
> > to rdp->gpwrap in this function.
> >
> > Or am I missing something here?
> >
> > Thanx, Paul
>
> I found this by visual inspection.

Good eyes!  ;-)

> When reviewing the function __note_gp_changes(), I noticed that other
> accesses to rdp->gpwrap are protected with either READ_ONCE() or
> WRITE_ONCE(), which led me to suspect a potential data race at line 1305.
>
> However, I am not certain whether holding rnp->lock protects access to
> rdp->gpwrap in this case. If it indeed ensures that no concurrent writes
> can occur, then I agree that the correct approach would be to remove
> READ_ONCE() from those accesses.

One way to check this is via inspection of all the updates to the
->gpwrap field.

Another approach is to run KCSAN, for example, from the top-level
directory of the Linux-kernel source tree on a system with qemu/KVM
enabled:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 30m --configs "4*TREE03" --kconfigs "CONFIG_NR_CPUS=4" --kcsan --trust-make

This particular command is set up for my 16-CPU laptop.  You can of
course adjust the "4*" and the "=4" to match your hardware.  For example,
on a 64-CPU system you might instead do this:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 30m --configs "8*TREE03" --kconfigs "CONFIG_NR_CPUS=8" --kcsan --trust-make

Please see Documentation/dev-tools/kcsan.rst for information on how to
interpret KCSAN reports.  This will find false positives in the non-RCU
portions of the kernel, so you should look for reports involving
__note_gp_changes() and/or its callers (inlining and all that).

So why not try it?  ;-)

							Thanx, Paul
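
The locking argument in this thread can be illustrated with a small userspace C sketch. Everything in it is hypothetical: `demo_node`, `demo_data`, and the `demo_*()` helpers are illustration-only stand-ins for `rcu_node`, `rcu_data`, and the kernel code, the `READ_ONCE()`/`WRITE_ONCE()` macros are simplified volatile-cast versions rather than the kernel's definitions, and the `held` flag is a crude substitute for the lockdep check. The sketch also assumes, as the thread does without having verified it, that every writer of `gpwrap` holds the node lock. Under that assumption a reader that also holds the lock can use a plain read, while a reader that does not hold the lock needs `READ_ONCE()` and the writers need `WRITE_ONCE()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified userspace stand-ins for the kernel macros (assumption:
 * a volatile access is enough for this illustration). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical miniature of a leaf rcu_node. */
struct demo_node {
	pthread_mutex_t lock;	/* plays the role of rnp->lock */
	bool held;		/* crude lockdep substitute, valid only for the holder */
};

/* Hypothetical miniature of a per-CPU rcu_data. */
struct demo_data {
	bool gpwrap;
	unsigned long gp_seq_needed;
};

static void demo_lock(struct demo_node *np)
{
	pthread_mutex_lock(&np->lock);
	np->held = true;
}

static void demo_unlock(struct demo_node *np)
{
	np->held = false;
	pthread_mutex_unlock(&np->lock);
}

/* Writer: holds the lock, but still marks the store because
 * lockless readers may exist elsewhere. */
static void demo_set_gpwrap(struct demo_node *np, struct demo_data *dp, bool v)
{
	assert(np->held);
	WRITE_ONCE(dp->gpwrap, v);
}

/* Locked reader, loosely modeled on the check at line 1305.  All writers
 * are excluded by the lock, so the plain read of dp->gpwrap cannot race.
 * Simplification: a plain "<" replaces the wrap-safe comparison. */
static bool demo_note_changes(struct demo_node *np, struct demo_data *dp,
			      unsigned long needed)
{
	assert(np->held);
	if (dp->gp_seq_needed < needed || dp->gpwrap) {
		WRITE_ONCE(dp->gp_seq_needed, needed);
		return true;
	}
	return false;
}

/* Lockless reader: may run concurrently with a writer, so the load is marked. */
static bool demo_peek_gpwrap(struct demo_data *dp)
{
	return READ_ONCE(dp->gpwrap);
}
```

A caller would bracket `demo_set_gpwrap()` and `demo_note_changes()` with `demo_lock()`/`demo_unlock()`, and call `demo_peek_gpwrap()` without the lock. Whether the real `->gpwrap` writers all hold the leaf `rnp->lock` is exactly what the inspection or the KCSAN run suggested above would have to confirm.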

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index b1f883fcd918..d3e2b420dce5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1302,7 +1302,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
 		zero_cpu_stall_ticks(rdp);
 	}
 	rdp->gp_seq = rnp->gp_seq;  /* Remember new grace-period state. */
-	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
+	if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || READ_ONCE(rdp->gpwrap))
 		WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);
 	if (IS_ENABLED(CONFIG_PROVE_RCU) && READ_ONCE(rdp->gpwrap))
 		WRITE_ONCE(rdp->last_sched_clock, jiffies);

In function __note_gp_changes(), rdp->gpwrap is read using READ_ONCE()
in line 1307:

1307    if (IS_ENABLED(CONFIG_PROVE_RCU) && READ_ONCE(rdp->gpwrap))
1308            WRITE_ONCE(rdp->last_sched_clock, jiffies);

while read directly in line 1305:

1305    if (ULONG_CMP_LT(rdp->gp_seq_needed, rnp->gp_seq_needed) || rdp->gpwrap)
1306            WRITE_ONCE(rdp->gp_seq_needed, rnp->gp_seq_needed);

In the same environment, reads in two places should have the same
protection.

Signed-off-by: Zilin Guan <zilinguan811@gmail.com>
---
 kernel/rcu/tree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
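
For readers unfamiliar with the other half of the condition at line 1305, the sketch below shows how a wrap-tolerant comparison such as ULONG_CMP_LT() behaves. The two macros are reproduced as I understand them from the kernel's include/linux/rcupdate.h, so treat them as an assumption and check your tree. The helper `demo_needs_update()` is hypothetical and exists only for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Wrap-tolerant comparisons on unsigned long sequence numbers
 * (assumed to match the kernel's definitions). */
#define ULONG_CMP_GE(a, b)	(ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b)	(ULONG_MAX / 2 < (a) - (b))

/* Hypothetical helper: true if "mine" is behind "theirs", even when
 * "theirs" has wrapped past zero while "mine" has not. */
static bool demo_needs_update(unsigned long mine, unsigned long theirs)
{
	return ULONG_CMP_LT(mine, theirs);
}
```

The unsigned subtraction wraps modulo the word size, so a value just below ULONG_MAX compares as less than a small value just past zero. The comparison is only meaningful while the two values are within half the counter range of each other.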