
[25/48] rcu: Mark writes to rcu_sync ->gp_count field

Message ID 20240507093530.3043-26-urezki@gmail.com (mailing list archive)
State New
Series RCU changes for v6.10

Commit Message

Uladzislau Rezki May 7, 2024, 9:35 a.m. UTC
From: "Paul E. McKenney" <paulmck@kernel.org>

The rcu_sync structure's ->gp_count field is updated under the protection
of ->rss_lock, but read locklessly, and KCSAN noted the data race.
This commit therefore uses WRITE_ONCE() to do this update to clearly
document its racy nature.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/sync.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Oleg Nesterov May 7, 2024, 2:54 p.m. UTC | #1
Hello,

I feel I don't really like this patch but I am travelling without my working
laptop, can't read the source code ;) Quite possibly I am wrong, I'll return
to this when I get back on May 10.

Oleg.

On 05/07, Uladzislau Rezki (Sony) wrote:
>
> From: "Paul E. McKenney" <paulmck@kernel.org>
>
> The rcu_sync structure's ->gp_count field is updated under the protection
> of ->rss_lock, but read locklessly, and KCSAN noted the data race.
> This commit therefore uses WRITE_ONCE() to do this update to clearly
> document its racy nature.
>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
>  kernel/rcu/sync.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
> index 86df878a2fee..6c2bd9001adc 100644
> --- a/kernel/rcu/sync.c
> +++ b/kernel/rcu/sync.c
> @@ -122,7 +122,7 @@ void rcu_sync_enter(struct rcu_sync *rsp)
>  		 * we are called at early boot time but this shouldn't happen.
>  		 */
>  	}
> -	rsp->gp_count++;
> +	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
>  	spin_unlock_irq(&rsp->rss_lock);
>
>  	if (gp_state == GP_IDLE) {
> @@ -151,11 +151,15 @@ void rcu_sync_enter(struct rcu_sync *rsp)
>   */
>  void rcu_sync_exit(struct rcu_sync *rsp)
>  {
> +	int gpc;
> +
>  	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
>  	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
>
>  	spin_lock_irq(&rsp->rss_lock);
> -	if (!--rsp->gp_count) {
> +	gpc = rsp->gp_count - 1;
> +	WRITE_ONCE(rsp->gp_count, gpc);
> +	if (!gpc) {
>  		if (rsp->gp_state == GP_PASSED) {
>  			WRITE_ONCE(rsp->gp_state, GP_EXIT);
>  			rcu_sync_call(rsp);
> --
> 2.39.2
>
Paul E. McKenney May 7, 2024, 5:44 p.m. UTC | #2
On Tue, May 07, 2024 at 10:54:41AM -0400, Oleg Nesterov wrote:
> Hello,
> 
> I feel I don't really like this patch but I am travelling without my working
> laptop, can't read the source code ;) Quite possibly I am wrong, I'll return
> to this when I get back on May 10.

By the stricter data-race rules used in RCU code [1], this is a bug that
needs to be fixed.  This code is updating ->gp_count, which is read
locklessly, which in turn results in a data race.  The fix is to mark
the updates (as below) with WRITE_ONCE().

Or is there something in one or the other of these updates to ->gp_count
that excludes lockless readers?  (I am not seeing it, but you know this
code way better than I do!)

							Thanx, Paul

[1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing

> Oleg.
> 
> On 05/07, Uladzislau Rezki (Sony) wrote:
> > [...]
Oleg Nesterov May 9, 2024, 3:13 p.m. UTC | #3
On 05/07, Paul E. McKenney wrote:
>
> On Tue, May 07, 2024 at 10:54:41AM -0400, Oleg Nesterov wrote:
> > Hello,
> >
> > I feel I don't really like this patch but I am travelling without my working
> > laptop, can't read the source code ;) Quite possibly I am wrong, I'll return
> > to this when I get back on May 10.
>
> By the stricter data-race rules used in RCU code [1], this is a bug that
> needs to be fixed.

Now that I can read the code... Sorry, still can't understand.

> which is read locklessly,

Where???

OK, OK, we have

	// rcu_sync_exit()
	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0)

and

	// rcu_sync_dtor()
	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));

other than that ->gp_count is always accessed under ->rss_lock.

And yes, at least WARN_ON_ONCE() in rcu_sync_exit() can obviously race with
rcu_sync_enter/exit which update gp_count. I think this is fine correctness-wise.

But OK, we need to please KCSAN (or is there another problem I missed ???)

We can move these WARN_ON()'s into the ->rss_lock protected section.

Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
that READ_ONCE() should be enough...

Or we can simply kill these WARN_ON_ONCE()'s.

I don't understand why should we add more WRITE_ONCE()'s into the critical
section protected by ->rss_lock.

Help! ;)

Oleg.


> which in turn results in a data race.  The fix is to mark
> the updates (as below) with WRITE_ONCE().
>
> Or is there something in one or the other of these updates to ->gp_count
> that excludes lockless readers?  (I am not seeing it, but you know this
> code way better than I do!)
>
> 							Thanx, Paul
>
> [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
>
> > Oleg.
> >
> > On 05/07, Uladzislau Rezki (Sony) wrote:
> > > [...]
>
Paul E. McKenney May 10, 2024, 3:59 a.m. UTC | #4
On Thu, May 09, 2024 at 05:13:12PM +0200, Oleg Nesterov wrote:
> On 05/07, Paul E. McKenney wrote:
> >
> > On Tue, May 07, 2024 at 10:54:41AM -0400, Oleg Nesterov wrote:
> > > Hello,
> > >
> > > I feel I don't really like this patch but I am travelling without my working
> > > laptop, can't read the source code ;) Quite possibly I am wrong, I'll return
> > > to this when I get back on May 10.
> >
> > By the stricter data-race rules used in RCU code [1], this is a bug that
> > needs to be fixed.
> 
> Now that I can read the code... Sorry, still can't understand.
> 
> > which is read locklessly,
> 
> Where???
> 
> OK, OK, we have
> 
> 	// rcu_sync_exit()
> 	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0)
> 
> and
> 
> 	// rcu_sync_dtor()
> 	WARN_ON_ONCE(READ_ONCE(rsp->gp_count));
> 
> other than that ->gp_count is always accessed under ->rss_lock.
> 
> And yes, at least WARN_ON_ONCE() in rcu_sync_exit() can obviously race with
> rcu_sync_enter/exit which update gp_count. I think this is fine correctness-wise.
> 
> But OK, we need to please KCSAN (or is there another problem I missed ???)
> 
> We can move these WARN_ON()'s into the ->rss_lock protected section.
> 
> Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
> that READ_ONCE() should be enough...
> 
> Or we can simply kill these WARN_ON_ONCE()'s.

Or we could move those WARN_ON_ONCE() under the lock.  If this would
be a lock-contention issue, we could condition them with something like
IS_ENABLED(CONFIG_PROVE_RCU).  Then all accesses to those variables would
always be protected by the lock, and the WRITE_ONCE() and READ_ONCE()
calls could be dropped.  (Or am I missing another lockless access?)

Which would have the further advantage that if anyone accessed these
without holding the lock, KCSAN would complain.

> I don't understand why should we add more WRITE_ONCE()'s into the critical
> section protected by ->rss_lock.

There are indeed several ways to fix this.  Which would you prefer?

> Help! ;)

;-) ;-) ;-)

							Thanx, Paul

> Oleg.
> 
> 
> > which in turn results in a data race.  The fix is to mark
> > the updates (as below) with WRITE_ONCE().
> >
> > Or is there something in one or the other of these updates to ->gp_count
> > that excludes lockless readers?  (I am not seeing it, but you know this
> > code way better than I do!)
> >
> > 							Thanx, Paul
> >
> > [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
> >
> > > Oleg.
> > >
> > > On 05/07, Uladzislau Rezki (Sony) wrote:
> > > > [...]
Oleg Nesterov May 10, 2024, 11:31 a.m. UTC | #5
On 05/09, Paul E. McKenney wrote:
>
> On Thu, May 09, 2024 at 05:13:12PM +0200, Oleg Nesterov wrote:
> >
> > We can move these WARN_ON()'s into the ->rss_lock protected section.
> >
> > Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
> > that READ_ONCE() should be enough...
> >
> > Or we can simply kill these WARN_ON_ONCE()'s.
>
> Or we could move those WARN_ON_ONCE() under the lock.

Sure, see above.

But could you help me to understand this magic? I naively thought
that READ_ONCE() is always "safe"...

So, unless I am totally confused it turns out that, say,

	CPU 0			CPU 1
	-----			-----

	spin_lock(LOCK);
	++X;			READ_ONCE(X); // data race
	spin_unlock(LOCK);

is data-racy according to KCSAN, while

	CPU 0			CPU 1
	-----			-----

	spin_lock(LOCK);
	WRITE_ONCE(X, X+1);	READ_ONCE(X); // no data race
	spin_unlock(LOCK);

is not.

Why is that?

Trying to read Documentation/dev-tools/kcsan.rst... it says

	KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``,

	...

	if all accesses to a variable that is accessed concurrently are properly
	marked, KCSAN will never trigger a watchpoint

but how can KCSAN detect that all accesses to X are properly marked? I see nothing
KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().

And what does the "all accesses" above actually mean? The 2nd version does

	WRITE_ONCE(X, X+1);

but "X + 1" is the plain/unmarked access?

Thanks,

Oleg.
Oleg Nesterov May 10, 2024, 1:18 p.m. UTC | #6
On 05/07, Paul E. McKenney wrote:
>
> By the stricter data-race rules used in RCU code [1],
...
> [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing

I am getting more and more confused...

Does this mean that KCSAN/etc treats the files in kernel/rcu/
differently than the "Rest of Kernel"? Or what?

And how is it enforced?

Oleg.
Oleg Nesterov May 10, 2024, 1:50 p.m. UTC | #7
On 05/10, Oleg Nesterov wrote:
>
> On 05/07, Paul E. McKenney wrote:
> >
> > By the stricter data-race rules used in RCU code [1],
> ...
> > [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
>
> I am getting more and more confused...
>
> Does this mean that KCSAN/etc treats the files in kernel/rcu/
> differently than the "Rest of Kernel"? Or what?
>
> And how is it enforced?

I can only find the strnstr(buf, "rcu") checks in skip_report(),
but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...

Oleg.
Paul E. McKenney May 10, 2024, 1:58 p.m. UTC | #8
On Fri, May 10, 2024 at 03:18:50PM +0200, Oleg Nesterov wrote:
> On 05/07, Paul E. McKenney wrote:
> >
> > By the stricter data-race rules used in RCU code [1],
> ...
> > [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
> 
> I am getting more and more confused...
> 
> Does this mean that KCSAN/etc treats the files in kernel/rcu/
> differently than the "Rest of Kernel"? Or what?

Yes.

> And how is it enforced?

By me running rcutorture with KCSAN with the Kconfig options listed in
that document.

							Thanx, Paul
Alan Huang May 10, 2024, 2 p.m. UTC | #9
> On May 10, 2024, at 19:31, Oleg Nesterov <oleg@redhat.com> wrote:
> 
> On 05/09, Paul E. McKenney wrote:
>> 
>> On Thu, May 09, 2024 at 05:13:12PM +0200, Oleg Nesterov wrote:
>>> 
>>> We can move these WARN_ON()'s into the ->rss_lock protected section.
>>> 
>>> Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
>>> that READ_ONCE() should be enough...
>>> 
>>> Or we can simply kill these WARN_ON_ONCE()'s.
>> 
>> Or we could move those WARN_ON_ONCE() under the lock.
> 
> Sure, see above.
> 
> But could you help me to understand this magic? I naively thought
> that READ_ONCE() is always "safe"...
> 
> So, unless I am totally confused it turns out that, say,
> 
> 	CPU 0			CPU 1
> 	-----			-----
> 
> 	spin_lock(LOCK);
> 	++X;			READ_ONCE(X); // data race
> 	spin_unlock(LOCK);
> 
> is data-racy according to KCSAN, while
> 
> 	CPU 0			CPU 1
> 	-----			-----
> 
> 	spin_lock(LOCK);
> 	WRITE_ONCE(X, X+1);	READ_ONCE(X); // no data race
> 	spin_unlock(LOCK);
> 
> is not.
> 
> Why is that?
> 
> Trying to read Documentation/dev-tools/kcsan.rst... it says
> 
> 	KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``,
> 
> ...
> 
> if all accesses to a variable that is accessed concurrently are properly
> marked, KCSAN will never trigger a watchpoint
> 
> but how can KCSAN detect that all accesses to X are properly marked? I see nothing
> KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().
> 
> And what does the "all accesses" above actually mean? The 2nd version does
> 
> 	WRITE_ONCE(X, X+1);
> 
> but "X + 1" is the plain/unmarked access?

X + 1 and READ_ONCE(X) are two reads.

> 
> Thanks,
> 
> Oleg.
> 
>
Paul E. McKenney May 10, 2024, 2:04 p.m. UTC | #10
On Fri, May 10, 2024 at 01:31:49PM +0200, Oleg Nesterov wrote:
> On 05/09, Paul E. McKenney wrote:
> >
> > On Thu, May 09, 2024 at 05:13:12PM +0200, Oleg Nesterov wrote:
> > >
> > > We can move these WARN_ON()'s into the ->rss_lock protected section.
> > >
> > > Or perhaps we can use data_race(rsp->gp_count) ? To be honest I thought
> > > that READ_ONCE() should be enough...
> > >
> > > Or we can simply kill these WARN_ON_ONCE()'s.
> >
> > Or we could move those WARN_ON_ONCE() under the lock.
> 
> Sure, see above.
> 
> But could you help me to understand this magic? I naively thought
> that READ_ONCE() is always "safe"...
> 
> So, unless I am totally confused it turns out that, say,
> 
> 	CPU 0			CPU 1
> 	-----			-----
> 
> 	spin_lock(LOCK);
> 	++X;			READ_ONCE(X); // data race
> 	spin_unlock(LOCK);
> 
> is data-racy according to KCSAN, while
> 
> 	CPU 0			CPU 1
> 	-----			-----
> 
> 	spin_lock(LOCK);
> 	WRITE_ONCE(X, X+1);	READ_ONCE(X); // no data race
> 	spin_unlock(LOCK);
> 
> is not.

Agreed, in RCU code.

> Why is that?

Because I run KCSAN on RCU using Kconfig options that cause KCSAN
to be more strict.

> Trying to read Documentation/dev-tools/kcsan.rst... it says
> 
> 	KCSAN is aware of *marked atomic operations* (``READ_ONCE``, ``WRITE_ONCE``,
> 
> 	...
> 
> 	if all accesses to a variable that is accessed concurrently are properly
> 	marked, KCSAN will never trigger a watchpoint
> 
> but how can KCSAN detect that all accesses to X are properly marked? I see nothing
> KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().

The trick is that KCSAN sees the volatile cast that both READ_ONCE()
and WRITE_ONCE() use.

> And what does the "all accesses" above actually mean? The 2nd version does
> 
> 	WRITE_ONCE(X, X+1);
> 
> but "X + 1" is the plain/unmarked access?

That would be the correct usage in RCU code if there were lockless
accesses to X, which would use READ_ONCE(), but a lock was held across
that WRITE_ONCE() such that there would be no concurrent updates of X.
In that case, the "X+1" cannot be involved in a data race, so KCSAN
won't complain.

But if all accesses to X were protected by an exclusive lock, then there
would be no data races involving X, and thus no marking of any accesses
to X.  Which would allow KCSAN to detect buggy lockless accesses to X.

							Thanx, Paul
Paul E. McKenney May 10, 2024, 2:11 p.m. UTC | #11
On Fri, May 10, 2024 at 03:50:57PM +0200, Oleg Nesterov wrote:
> On 05/10, Oleg Nesterov wrote:
> >
> > On 05/07, Paul E. McKenney wrote:
> > >
> > > By the stricter data-race rules used in RCU code [1],
> > ...
> > > [1] https://docs.google.com/document/d/1FwZaXSg3A55ivVoWffA9iMuhJ3_Gmj_E494dLYjjyLQ/edit?usp=sharing
> >
> > I am getting more and more confused...
> >
> > Does this mean that KCSAN/etc treats the files in kernel/rcu/
> > differently than the "Rest of Kernel"? Or what?
> >
> > And how is it enforced?
> 
> I can only find the strnstr(buf, "rcu") checks in skip_report(),
> but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...

Huh, new one on me!  When I run KCSAN, I set CONFIG_KCSAN_STRICT=y,
which implies CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n, which should
prevent skip_report() from even being invoked.

Which suggests that in the rest of the kernel, including "rcu_"
in your function name gets you stricter KCSAN checking.  ;-)

							Thanx, Paul
Oleg Nesterov May 12, 2024, 10:41 a.m. UTC | #12
Sorry for another delay...

On 05/10, Paul E. McKenney wrote:
>
> On Fri, May 10, 2024 at 03:50:57PM +0200, Oleg Nesterov wrote:
> >
> > I can only find the strnstr(buf, "rcu") checks in skip_report(),
> > but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...
>
> Huh, new one on me!  When I run KCSAN, I set CONFIG_KCSAN_STRICT=y,
> which implies CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n, which should
> prevent skip_report() from even being invoked.
>
> Which suggests that in the rest of the kernel, including "rcu_"
> in your function name gets you stricter KCSAN checking.  ;-)

Yes.

And that is why I was very confused. I misinterpreted the "stricter
data-race rules used in RCU code" as if there must be more "rcu-only"
hacks in the kernel/kcsan/ code which I can't find ;)

Oleg.
Oleg Nesterov May 12, 2024, 10:53 a.m. UTC | #13
On 05/10, Paul E. McKenney wrote:
>
> On Fri, May 10, 2024 at 01:31:49PM +0200, Oleg Nesterov wrote:
>
> > Why is that?
>
> Because I run KCSAN on RCU using Kconfig options that cause KCSAN
> to be more strict.

Yes, I see now.

> > but how can KCSAN detect that all accesses to X are properly marked? I see nothing
> > KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().
>
> The trick is that KCSAN sees the volatile cast that both READ_ONCE()
> and WRITE_ONCE() use.

Hmm. grep-grep-grep... I seem to understand, DEFINE_TSAN_VOLATILE_READ_WRITE.
So __tsan_volatile_readX() will use KCSAN_ACCESS_ATOMIC.

Thanks!
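
Paraphrasing that mechanism as a sketch (simplified; the real definitions live in include/asm-generic/rwonce.h and kernel/kcsan/core.c, and the version below elides their size and alignment handling):

```c
/* READ_ONCE() reads through a volatile-qualified pointer. */
#define READ_ONCE(x)  (*(const volatile __typeof__(x) *)&(x))

/* Compiler instrumentation turns volatile accesses into calls to
 * __tsan_volatile_read<size>(); KCSAN's implementations of those hooks
 * check the access with KCSAN_ACCESS_ATOMIC set, so it counts as a
 * marked access rather than a plain one. */
void __tsan_volatile_read4(void *ptr)
{
	check_access(ptr, 4, KCSAN_ACCESS_ATOMIC);
}
```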

> > And what does the "all accesses" above actually mean? The 2nd version does
> >
> > 	WRITE_ONCE(X, X+1);
> >
> > but "X + 1" is the plain/unmarked access?
>
> ...
>
> In that case, the "X+1" cannot be involved in a data race, so KCSAN
> won't complain.

Yes, yes, I understand now.

Paul, thanks for your explanations! and sorry for wasting your time by
provoking the unnecessarily long discussion.

I am going to send the trivial patch which moves these WARN_ON()'s under
spin_lock(), this looks more clean to me. But I won't argue if you prefer
your original patch.

Oleg.
Paul E. McKenney May 12, 2024, 2:57 p.m. UTC | #14
On Sun, May 12, 2024 at 12:53:06PM +0200, Oleg Nesterov wrote:
> On 05/10, Paul E. McKenney wrote:
> >
> > On Fri, May 10, 2024 at 01:31:49PM +0200, Oleg Nesterov wrote:
> >
> > > Why is that?
> >
> > Because I run KCSAN on RCU using Kconfig options that cause KCSAN
> > to be more strict.
> 
> Yes, I see now.
> 
> > > but how can KCSAN detect that all accesses to X are properly marked? I see nothing
> > > KCSAN-related in the definition of WRITE_ONCE() or READ_ONCE().
> >
> > The trick is that KCSAN sees the volatile cast that both READ_ONCE()
> > and WRITE_ONCE() use.
> 
> Hmm. grep-grep-grep... I seem to understand, DEFINE_TSAN_VOLATILE_READ_WRITE.
> So __tsan_volatile_readX() will use KCSAN_ACCESS_ATOMIC.
> 
> Thanks!

You got it!!!

> > > And what does the "all accesses" above actually mean? The 2nd version does
> > >
> > > 	WRITE_ONCE(X, X+1);
> > >
> > > but "X + 1" is the plain/unmarked access?
> >
> > ...
> >
> > In that case, the "X+1" cannot be involved in a data race, so KCSAN
> > won't complain.
> 
> Yes, yes, I understand now.
> 
> Paul, thanks for your explanations! and sorry for wasting your time by
> provoking the unnecessarily long discussion.

Not a problem and absolutely no need to apologize!  Of course, please do
pass this information on to anyone else needing it.

> I am going to send the trivial patch which moves these WARN_ON()'s under
> spin_lock(), this looks more clean to me. But I won't argue if you prefer
> your original patch.

Actually, I like your patch quite a bit better than I do my original.
In fact, I feel quite foolish that I did not think of this to begin with.
With your way, we have strict locking for that field and can therefore
just use plain C-language accesses for all accesses to it.  KCSAN will
then warn us of any buggy lockless access to that field, even if that
buggy access uses READ_ONCE().  Much much better your way!!!

So thank you for persisting on this!

							Thanx, Paul
Marco Elver May 13, 2024, 2:13 p.m. UTC | #15
On Fri, 10 May 2024 at 16:11, Paul E. McKenney <paulmck@kernel.org> wrote:
[...]
> > > Does this mean that KCSAN/etc treats the files in kernel/rcu/
> > > differently than the "Rest of Kernel"? Or what?
> > >
> > > And how is it enforced?
> >
> > I can only find the strnstr(buf, "rcu") checks in skip_report(),
> > but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...
>
> Huh, new one on me!  When I run KCSAN, I set CONFIG_KCSAN_STRICT=y,
> which implies CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n, which should
> prevent skip_report() from even being invoked.

The strnstr hack goes back to the first version of KCSAN released in
v5.8 [1]. It was added in response to Paul wanting the "stricter"
treatment for RCU even while KCSAN was still in development, and back
then syzbot was the first test system using KCSAN. Shortly after Paul
added a KCSAN config for rcutorture with a laundry list of options to
make KCSAN "strict" (before we eventually added CONFIG_KCSAN_STRICT
which greatly simplified that).

While the strnstr(.., "rcu") rules are redundant when using the
stricter rules (at least CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n is
set), we're keeping the "rcu" special case around because there are
test robots and fuzzers that use the default KCSAN config (unlike
rcutorture). And because we know that RCU likes the stricter rules,
the "value change only" filter is ignored in all KCSAN configs for
rcu-related functions.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/kcsan/report.c?id=v5.8

Back then syzbot occasionally reported RCU data races, but these days
rcutorture probably finds all of them before syzbot ever gets its
hands on new code.

Thanks,
-- Marco
Paul E. McKenney May 14, 2024, 4:54 p.m. UTC | #16
On Mon, May 13, 2024 at 04:13:35PM +0200, Marco Elver wrote:
> On Fri, 10 May 2024 at 16:11, Paul E. McKenney <paulmck@kernel.org> wrote:
> [...]
> > > > Does this mean that KCSAN/etc treats the files in kernel/rcu/
> > > > differently than the "Rest of Kernel"? Or what?
> > > >
> > > > And how is it enforced?
> > >
> > > I can only find the strnstr(buf, "rcu") checks in skip_report(),
> > > but they only cover the KCSAN_REPORT_VALUE_CHANGE_ONLY case...
> >
> > Huh, new one on me!  When I run KCSAN, I set CONFIG_KCSAN_STRICT=y,
> > which implies CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n, which should
> > prevent skip_report() from even being invoked.
> 
> The strnstr hack goes back to the first version of KCSAN released in
> v5.8 [1]. It was added in response to Paul wanting the "stricter"
> treatment for RCU even while KCSAN was still in development, and back
> then syzbot was the first test system using KCSAN. Shortly after Paul
> added a KCSAN config for rcutorture with a laundry list of options to
> make KCSAN "strict" (before we eventually added CONFIG_KCSAN_STRICT
> which greatly simplified that).
> 
> While the strnstr(.., "rcu") rules are redundant when using the
> stricter rules (at least CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n is
> set), we're keeping the "rcu" special case around because there are
> test robots and fuzzers that use the default KCSAN config (unlike
> rcutorture). And because we know that RCU likes the stricter rules,
> the "value change only" filter is ignored in all KCSAN configs for
> rcu-related functions.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/kcsan/report.c?id=v5.8

Thank you for the background information!

> Back then syzbot occasionally reported RCU data races, but these days
> rcutorture probably finds all of them before syzbot ever gets its
> hands on new code.

I do try my best.  ;-)

							Thanx, Paul

Patch

diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index 86df878a2fee..6c2bd9001adc 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -122,7 +122,7 @@  void rcu_sync_enter(struct rcu_sync *rsp)
 		 * we are called at early boot time but this shouldn't happen.
 		 */
 	}
-	rsp->gp_count++;
+	WRITE_ONCE(rsp->gp_count, rsp->gp_count + 1);
 	spin_unlock_irq(&rsp->rss_lock);
 
 	if (gp_state == GP_IDLE) {
@@ -151,11 +151,15 @@  void rcu_sync_enter(struct rcu_sync *rsp)
  */
 void rcu_sync_exit(struct rcu_sync *rsp)
 {
+	int gpc;
+
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_state) == GP_IDLE);
 	WARN_ON_ONCE(READ_ONCE(rsp->gp_count) == 0);
 
 	spin_lock_irq(&rsp->rss_lock);
-	if (!--rsp->gp_count) {
+	gpc = rsp->gp_count - 1;
+	WRITE_ONCE(rsp->gp_count, gpc);
+	if (!gpc) {
 		if (rsp->gp_state == GP_PASSED) {
 			WRITE_ONCE(rsp->gp_state, GP_EXIT);
 			rcu_sync_call(rsp);