
srcu: Fix a rare race issue in __srcu_read_(un)lock()

Message ID 20221103131313.41536-1-kernelfans@gmail.com (mailing list archive)
State New, archived
Series srcu: Fix a rare race issue in __srcu_read_(un)lock()

Commit Message

Pingfan Liu Nov. 3, 2022, 1:13 p.m. UTC
To clarify up front:
  This issue was found purely by code inspection; it has not been
observed in practice.

Scenario:

  __srcu_read_(un)lock() uses the per-CPU counters srcu_(un)lock_count[2].
Normally, per-CPU access avoids the non-atomic RMW problem, but in some
rare cases it cannot.

Suppose that __srcu_read_lock() runs on cpuX. The statement
    this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
can be decomposed into two sub-steps:
    -1. get the address of this_cpu_ptr(ssp->sda)->srcu_lock_count[idx],
        denoted as addressX, and let unsigned long *pX = addressX;
    -2. *pX = *pX + 1;

Now assume two tasks, taskA and taskB, and three CPUs, cpuX, cpuY and
cpuZ. Let both taskA and taskB finish the first step above on cpuX, but
migrate to cpuY and cpuZ respectively just before the second step. Both
then continue with the second step concurrently, which is a classic
non-atomic RMW race.
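
For illustration only, the unprotected decomposition would look like the
sketch below. naive_percpu_inc() is a made-up helper; the real
this_cpu_inc() must not be splittable like this (see the discussion
below):

	/*
	 * Hypothetical sketch of the two sub-steps as separate operations.
	 * If the task migrates between them, pX still points at cpuX's
	 * counter and the increment becomes a cross-CPU, non-atomic RMW.
	 */
	static inline void naive_percpu_inc(struct srcu_struct *ssp, int idx)
	{
		unsigned long *pX;

		/* step 1: resolve the per-CPU address on the current CPU (cpuX) */
		pX = &this_cpu_ptr(ssp->sda)->srcu_lock_count[idx];

		/* <-- a migration to cpuY or cpuZ here leaves pX stale */

		/* step 2: plain, non-atomic read-modify-write */
		*pX = *pX + 1;
	}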

Solution:

  This issue can be addressed by disabling preemption around the two
sub-steps.
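
In its simplest form the fix reads as the sketch below (illustration
only; the actual patch further down open-codes the equivalent with
__preempt_count_inc()/barrier()):

	/* Pin the task to one CPU across both sub-steps of the increment. */
	preempt_disable();
	this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
	preempt_enable();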

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: rcu@vger.kernel.org
---
 kernel/rcu/srcutree.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Frederic Weisbecker Nov. 3, 2022, 1:36 p.m. UTC | #1
On Thu, Nov 03, 2022 at 09:13:13PM +0800, Pingfan Liu wrote:
> To clarify up front:
>   This issue was found purely by code inspection; it has not been
> observed in practice.
>
> Scenario:
>
>   __srcu_read_(un)lock() uses the per-CPU counters srcu_(un)lock_count[2].
> Normally, per-CPU access avoids the non-atomic RMW problem, but in some
> rare cases it cannot.
>
> Suppose that __srcu_read_lock() runs on cpuX. The statement
>     this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
> can be decomposed into two sub-steps:
>     -1. get the address of this_cpu_ptr(ssp->sda)->srcu_lock_count[idx],
>         denoted as addressX, and let unsigned long *pX = addressX;
>     -2. *pX = *pX + 1;

It's not supposed to happen:

* The weak (generic) version of this_cpu_inc() disables interrupts around
  the whole operation.
* x86 adds directly to gs/fs-based per-CPU memory with a single instruction.
* arm64, loongarch and s390 disable preemption.

This has to be a fundamental constraint of the this_cpu_*() ops implementation.
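
For reference, the generic (weak) fallback in the per-CPU headers is
roughly of the following shape (a simplified sketch, not the exact
macro text); both the address resolution and the RMW run with IRQs
disabled, so neither migration nor an interrupt can split them:

	#define this_cpu_generic_to_op(pcp, val, op)			\
	do {								\
		unsigned long __flags;					\
		raw_local_irq_save(__flags);				\
		raw_cpu_generic_to_op(pcp, val, op);			\
		raw_local_irq_restore(__flags);				\
	} while (0)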

Thanks.



> 
> Now assume two tasks, taskA and taskB, and three CPUs, cpuX, cpuY and
> cpuZ. Let both taskA and taskB finish the first step above on cpuX, but
> migrate to cpuY and cpuZ respectively just before the second step. Both
> then continue with the second step concurrently, which is a classic
> non-atomic RMW race.
>
> Solution:
>
>   This issue can be addressed by disabling preemption around the two
> sub-steps.
> 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> To: rcu@vger.kernel.org
> ---
>  kernel/rcu/srcutree.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 1c304fec89c0..a38a3779dc01 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -636,7 +636,11 @@ int __srcu_read_lock(struct srcu_struct *ssp)
>  	int idx;
>  
>  	idx = READ_ONCE(ssp->srcu_idx) & 0x1;
> +	__preempt_count_inc();
> +	barrier();
>  	this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
> +	barrier();
> +	__preempt_count_dec();
>  	smp_mb(); /* B */  /* Avoid leaking the critical section. */
>  	return idx;
>  }
> @@ -650,7 +654,11 @@ EXPORT_SYMBOL_GPL(__srcu_read_lock);
>  void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
>  {
>  	smp_mb(); /* C */  /* Avoid leaking the critical section. */
> +	__preempt_count_inc();
> +	barrier();
>  	this_cpu_inc(ssp->sda->srcu_unlock_count[idx]);
> +	barrier();
> +	__preempt_count_dec();
>  }
>  EXPORT_SYMBOL_GPL(__srcu_read_unlock);
>  
> -- 
> 2.31.1
>
Mathieu Desnoyers Nov. 3, 2022, 3:05 p.m. UTC | #2
On 2022-11-03 09:36, Frederic Weisbecker wrote:
> On Thu, Nov 03, 2022 at 09:13:13PM +0800, Pingfan Liu wrote:
>> To clarify up front:
>>    This issue was found purely by code inspection; it has not been
>> observed in practice.
>>
>> Scenario:
>>
>>    __srcu_read_(un)lock() uses the per-CPU counters srcu_(un)lock_count[2].
>> Normally, per-CPU access avoids the non-atomic RMW problem, but in some
>> rare cases it cannot.
>>
>> Suppose that __srcu_read_lock() runs on cpuX. The statement
>>      this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
>> can be decomposed into two sub-steps:
>>      -1. get the address of this_cpu_ptr(ssp->sda)->srcu_lock_count[idx],
>>          denoted as addressX, and let unsigned long *pX = addressX;
>>      -2. *pX = *pX + 1;
> 
> It's not supposed to happen:
>
> * The weak (generic) version of this_cpu_inc() disables interrupts around
>   the whole operation.
> * x86 adds directly to gs/fs-based per-CPU memory with a single instruction.
> * arm64, loongarch and s390 disable preemption.
>
> This has to be a fundamental constraint of the this_cpu_*() ops
> implementation.

I concur with Frederic; this is guaranteed by the this_cpu_*() API.
There is no issue there.

Thanks,

Mathieu
Pingfan Liu Nov. 4, 2022, 9:38 a.m. UTC | #3
On Thu, Nov 3, 2022 at 9:36 PM Frederic Weisbecker <frederic@kernel.org> wrote:
>
> On Thu, Nov 03, 2022 at 09:13:13PM +0800, Pingfan Liu wrote:
> > To clarify up front:
> >   This issue was found purely by code inspection; it has not been
> > observed in practice.
> >
> > Scenario:
> >
> >   __srcu_read_(un)lock() uses the per-CPU counters srcu_(un)lock_count[2].
> > Normally, per-CPU access avoids the non-atomic RMW problem, but in some
> > rare cases it cannot.
> >
> > Suppose that __srcu_read_lock() runs on cpuX. The statement
> >     this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
> > can be decomposed into two sub-steps:
> >     -1. get the address of this_cpu_ptr(ssp->sda)->srcu_lock_count[idx],
> >         denoted as addressX, and let unsigned long *pX = addressX;
> >     -2. *pX = *pX + 1;
>
> It's not supposed to happen:
>
> * The weak (generic) version of this_cpu_inc() disables interrupts around
>   the whole operation.
> * x86 adds directly to gs/fs-based per-CPU memory with a single instruction.
> * arm64, loongarch and s390 disable preemption.
>
> This has to be a fundamental constraint of the this_cpu_*() ops
> implementation.
>

Thanks! I found the exact implementation in the code, and there are also
notes on the this_cpu_*() API which say that these operations are safe
against preemption and interrupts.

Regards,

    Pingfan

Patch

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 1c304fec89c0..a38a3779dc01 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -636,7 +636,11 @@  int __srcu_read_lock(struct srcu_struct *ssp)
 	int idx;
 
 	idx = READ_ONCE(ssp->srcu_idx) & 0x1;
+	__preempt_count_inc();
+	barrier();
 	this_cpu_inc(ssp->sda->srcu_lock_count[idx]);
+	barrier();
+	__preempt_count_dec();
 	smp_mb(); /* B */  /* Avoid leaking the critical section. */
 	return idx;
 }
@@ -650,7 +654,11 @@  EXPORT_SYMBOL_GPL(__srcu_read_lock);
 void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
 {
 	smp_mb(); /* C */  /* Avoid leaking the critical section. */
+	__preempt_count_inc();
+	barrier();
 	this_cpu_inc(ssp->sda->srcu_unlock_count[idx]);
+	barrier();
+	__preempt_count_dec();
 }
 EXPORT_SYMBOL_GPL(__srcu_read_unlock);