diff mbox series

[v2] rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug

Message ID 20221128143428.1703744-1-qiang1.zhang@intel.com (mailing list archive)
State Superseded
Headers show
Series [v2] rcu-tasks: Make rude RCU-Tasks work well with CPU hotplug | expand

Commit Message

Zqiang Nov. 28, 2022, 2:34 p.m. UTC
Currently, rcu_tasks_rude_wait_gp(), which waits for one rude
RCU-tasks grace period, returns immediately if __num_online_cpus == 1,
indicating the end of the rude RCU-tasks grace period. Suppose the
system has two CPUs, and consider the following scenario:

	CPU0                                  CPU1 (going offline)
	                                      migration/1 task:
	                                        cpu_stopper_thread
	                                          -> take_cpu_down
	                                            -> _cpu_disable
	                                               (dec __num_online_cpus)
	                                            -> cpuhp_invoke_callback
	                                                 preempt_disable
	                                                 access old_data0
	task1
	del old_data0                                    .....
	synchronize_rcu_tasks_rude()
	task1 schedule out
	....
	task2 schedule in
	rcu_tasks_rude_wait_gp()
	  -> __num_online_cpus == 1
	     -> return
	....
	task1 schedule in
	-> free old_data0
	                                                 preempt_enable

When CPU1 decrements __num_online_cpus so that its value becomes one,
CPU1 has not yet finished going offline: the stop_machine task
(migration/1) is still running on CPU1 and may still be accessing
'old_data0', but 'old_data0' has already been freed on CPU0.

This commit adds cpus_read_lock()/cpus_read_unlock() protection around
the access to __num_online_cpus, which ensures that any CPU partway
through the offline process has fully completed going offline before
the check is made.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
 kernel/rcu/tasks.h | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Comments

Paul E. McKenney Nov. 29, 2022, 1:03 a.m. UTC | #1
On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> [ . . . ]

First, good eyes and good catch!!!

The purpose of that check for num_online_cpus() is not performance
on single-CPU systems, but rather correct operation during early boot.
So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
for example, as follows:

	if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
	    num_online_cpus() <= 1)
		return;	// Early boot fastpath for only one CPU.

This works because rcu_scheduler_active is set to RCU_SCHEDULER_RUNNING
long before it is possible to offline CPUs.

Yes, schedule_on_each_cpu() does do cpus_read_lock(), again, good eyes,
and it also unnecessarily does the schedule_work_on() on the current CPU,
but the code calling synchronize_rcu_tasks_rude() is on high-overhead
code paths, so this overhead is down in the noise.

Until further notice, anyway.
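
For reference, schedule_on_each_cpu() is structured roughly as follows
(a simplified sketch of the kernel/workqueue.c implementation, not the
verbatim source; details vary by kernel version).  Note both the
cpus_read_lock() and the work item queued on the current CPU:

	int schedule_on_each_cpu(work_func_t func)
	{
		int cpu;
		struct work_struct __percpu *works;

		works = alloc_percpu(struct work_struct);
		if (!works)
			return -ENOMEM;

		cpus_read_lock();  /* Excludes CPU-hotplug operations. */

		/* Queue one work item on each online CPU, including this one. */
		for_each_online_cpu(cpu) {
			struct work_struct *work = per_cpu_ptr(works, cpu);

			INIT_WORK(work, func);
			schedule_work_on(cpu, work);
		}

		/* Wait for each queued work item to complete. */
		for_each_online_cpu(cpu)
			flush_work(per_cpu_ptr(works, cpu));

		cpus_read_unlock();
		free_percpu(works);
		return 0;
	}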

So simplicity is much more important than performance in this code.
So just adding the check for RCU_SCHEDULER_RUNNING should fix this,
unless I am missing something (always possible!).

							Thanx, Paul

> ---
>  kernel/rcu/tasks.h | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index 4a991311be9b..08e72c6462d8 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -1033,14 +1033,30 @@ static void rcu_tasks_be_rude(struct work_struct *work)
>  {
>  }
>  
> +static DEFINE_PER_CPU(struct work_struct, rude_work);
> +
>  // Wait for one rude RCU-tasks grace period.
>  static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
>  {
> +	int cpu;
> +	struct work_struct *work;
> +
> +	cpus_read_lock();
>  	if (num_online_cpus() <= 1)
> -		return;	// Fastpath for only one CPU.
> +		goto end;// Fastpath for only one CPU.
>  
>  	rtp->n_ipis += cpumask_weight(cpu_online_mask);
> -	schedule_on_each_cpu(rcu_tasks_be_rude);
> +	for_each_online_cpu(cpu) {
> +		work = per_cpu_ptr(&rude_work, cpu);
> +		INIT_WORK(work, rcu_tasks_be_rude);
> +		schedule_work_on(cpu, work);
> +	}
> +
> +	for_each_online_cpu(cpu)
> +		flush_work(per_cpu_ptr(&rude_work, cpu));
> +
> +end:
> +	cpus_read_unlock();
>  }
>  
>  void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);
> -- 
> 2.25.1
>
Zqiang Nov. 29, 2022, 1:44 a.m. UTC | #2
On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> [ . . . ]
>
>First, good eyes and good catch!!!
>
>The purpose of that check for num_online_cpus() is not performance
>on single-CPU systems, but rather correct operation during early boot.
>So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
>for example, as follows:
>
>	if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
>	    num_online_cpus() <= 1)
>		return;	// Early boot fastpath for only one CPU.
>
>This works because rcu_scheduler_active is set to RCU_SCHEDULER_RUNNING
>long before it is possible to offline CPUs.
>
>Yes, schedule_on_each_cpu() does do cpus_read_lock(), again, good eyes,
>and it also unnecessarily does the schedule_work_on() on the current CPU,
>but the code calling synchronize_rcu_tasks_rude() is on high-overhead
>code paths, so this overhead is down in the noise.
>
>Until further notice, anyway.
>
>So simplicity is much more important than performance in this code.
>So just adding the check for RCU_SCHEDULER_RUNNING should fix this,

Agreed.
Zqiang Nov. 29, 2022, 4:54 a.m. UTC | #3
On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> [ . . . ]
>
>First, good eyes and good catch!!!
>
>The purpose of that check for num_online_cpus() is not performance
>on single-CPU systems, but rather correct operation during early boot.
>So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
>for example, as follows:
>
>	if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
>	    num_online_cpus() <= 1)
>		return;	// Early boot fastpath for only one CPU.

Hi Paul

During system startup, RCU_SCHEDULER_RUNNING is set only after the
other CPUs have been started, so the following can happen:

        CPU0                                    CPU1

  if (rcu_scheduler_active !=
      RCU_SCHEDULER_RUNNING &&
      __num_online_cpus == 1)
          return;                               inc __num_online_cpus
                                                (__num_online_cpus == 2)

CPU0 does not notice CPU1's update of __num_online_cpus in time.
Can we move rcu_set_runtime_mode() to before smp_init()?
Any thoughts?
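
For reference, the relevant boot ordering is roughly the following
(a simplified sketch based on init/main.c and kernel/rcu/update.c,
with unrelated steps elided; details vary by kernel version):

	/* init/main.c -- simplified, unrelated calls omitted. */
	static noinline void __init kernel_init_freeable(void)
	{
		/* ... */
		smp_init();        /* Brings the non-boot CPUs online. */
		sched_init_smp();
		/* ... */
		do_basic_setup();  /* Runs initcalls, including the one below. */
		/* ... */
	}

	/* kernel/rcu/update.c -- simplified. */
	static int __init rcu_set_runtime_mode(void)
	{
		rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
		/* ... */
		return 0;
	}
	core_initcall(rcu_set_runtime_mode);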

Thanks
Zqiang

> [ . . . ]
Joel Fernandes Nov. 29, 2022, 5:59 a.m. UTC | #4
> On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> 
> On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
>> [ . . . ]
>> 
>> First, good eyes and good catch!!!
>> 
>> The purpose of that check for num_online_cpus() is not performance
>> on single-CPU systems, but rather correct operation during early boot.
>> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
>> for example, as follows:
>> 
>>    if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
>>        num_online_cpus() <= 1)
>>        return;    // Early boot fastpath for only one CPU.
> 
> Hi Paul
> 
> During system startup, RCU_SCHEDULER_RUNNING is set only after the
> other CPUs have been started, so the following can happen:
> 
>         CPU0                                    CPU1
> 
>   if (rcu_scheduler_active !=
>       RCU_SCHEDULER_RUNNING &&
>       __num_online_cpus == 1)
>           return;                               inc __num_online_cpus
>                                                 (__num_online_cpus == 2)
> 
> CPU0 does not notice CPU1's update of __num_online_cpus in time.
> Can we move rcu_set_runtime_mode() to before smp_init()?
> Any thoughts?

Is anyone expected to do RCU-tasks operations before the scheduler is running? Typically this requires tasks to context-switch, which is a scheduler operation.

If the scheduler is not yet running, then I don’t think a missed update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.

Or did I miss something?

Thanks.



> [ . . . ]
Zqiang Nov. 29, 2022, 6:25 a.m. UTC | #5
> On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> 
> On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
>> [ . . . ]
>> 
>> First, good eyes and good catch!!!
>> 
>> The purpose of that check for num_online_cpus() is not performance
>> on single-CPU systems, but rather correct operation during early boot.
>> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
>> for example, as follows:
>> 
>>    if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
>>        num_online_cpus() <= 1)
>>        return;    // Early boot fastpath for only one CPU.
> 
> [ . . . ]
>
>Is anyone expected to do RCU-tasks operations before the scheduler is running? 

Not sure if such a scenario exists.

>Typically this requires tasks to context-switch, which is a scheduler operation.
>
>If the scheduler is not yet running, then I don’t think a missed update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.

Hi Joel

After the kernel_init task starts running, but before it calls smp_init()
to start the other CPUs, the scheduler has already been initialized, so
task context switches can occur.

Thanks
Zqiang

>
>Or did I miss something?
>
>Thanks.
> [ . . . ]
Paul E. McKenney Nov. 29, 2022, 3:18 p.m. UTC | #6
On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
> > On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> > 
> > On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> >> [ . . . ]
> >> 
> >> First, good eyes and good catch!!!
> >> 
> >> The purpose of that check for num_online_cpus() is not performance
> >> on single-CPU systems, but rather correct operation during early boot.
> >> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
> >> for example, as follows:
> >> 
> >>    if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
> >>        num_online_cpus() <= 1)
> >>        return;    // Early boot fastpath for only one CPU.
> > 
> > [ . . . ]
> >
> >Is anyone expected to do RCU-tasks operations before the scheduler is running? 
> 
> Not sure if such a scenario exists.
> 
> >Typically this requires tasks to context-switch, which is a scheduler operation.
> >
> >If the scheduler is not yet running, then I don’t think a missed update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.
> 
> Hi Joel
> 
> > After the kernel_init task starts running, but before it calls smp_init()
> > to start the other CPUs, the scheduler has already been initialized, so
> > task context switches can occur.

Good catch, thank you both.  For some reason, I was thinking that the
additional CPUs did not come online until later.

So how about this?

	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
		return;    // Early boot fastpath.

If this condition is true, there is only one CPU and no scheduler,
thus no preemption.
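
Applied to rcu_tasks_rude_wait_gp(), the result would look something
like this (a sketch, assuming the new test simply replaces the current
num_online_cpus() check):

	// Wait for one rude RCU-tasks grace period.
	static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
	{
		if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
			return;  // Early boot fastpath: one CPU, no preemption.
		rtp->n_ipis += cpumask_weight(cpu_online_mask);
		schedule_on_each_cpu(rcu_tasks_be_rude);  // Takes cpus_read_lock().
	}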

						Thanx, Paul

> [ . . . ]
Joel Fernandes Nov. 29, 2022, 4 p.m. UTC | #7
> On Nov 29, 2022, at 10:18 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
>>>> On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
>>> 
>>> On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
>>>> [ . . . ]
>>>> 
>>>> First, good eyes and good catch!!!
>>>> 
>>>> The purpose of that check for num_online_cpus() is not performance
>>>> on single-CPU systems, but rather correct operation during early boot.
>>>> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
>>>> for example, as follows:
>>>> 
>>>>   if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
>>>>       num_online_cpus() <= 1)
>>>>       return;    // Early boot fastpath for only one CPU.
>>> 
>>> [ . . . ]
>>> 
>>> Is anyone expected to do RCU-tasks operations before the scheduler is running? 
>> 
>> Not sure if such a scenario exists.
>> 
>>> Typically this requires tasks to context-switch, which is a scheduler operation.
>>> 
>>> If the scheduler is not yet running, then I don’t think a missed update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.
>> 
>> Hi Joel
>> 
>> After the kernel_init task starts running, but before it calls smp_init()
>> to start the other CPUs, the scheduler has already been initialized, so
>> task context switches can occur.
> 
> Good catch, thank you both.  For some reason, I was thinking that the
> additional CPUs did not come online until later.
> 
> So how about this?
> 
>    if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>        return;    // Early boot fastpath.
> 
> If this condition is true, there is only one CPU and no scheduler,
> thus no preemption.

Agreed. I was going to suggest exactly this :)

Ack.
(Replying by phone, but feel free to add my Reviewed-by tag.)

- Joel


> [ . . . ]
Paul E. McKenney Nov. 29, 2022, 7:18 p.m. UTC | #8
On Tue, Nov 29, 2022 at 11:00:05AM -0500, Joel Fernandes wrote:
> 
> 
> > On Nov 29, 2022, at 10:18 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
> >>>> On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> >>> 
> >>> On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> >>>> [ . . . ]
> >>>> 
> >>>> First, good eyes and good catch!!!
> >>>> 
> >>>> The purpose of that check for num_online_cpus() is not performance
> >>>> on single-CPU systems, but rather correct operation during early boot.
> >>>> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
> >>>> for example, as follows:
> >>>> 
> >>>>   if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
> >>>>       num_online_cpus() <= 1)
> >>>>       return;    // Early boot fastpath for only one CPU.
> >>> 
> >>> [ . . . ]
> >>> 
> >>> Is anyone expected to do RCU-tasks operations before the scheduler is running? 
> >> 
> >> Not sure if such a scenario exists.
> >> 
> >>> Typically this requires tasks to context-switch, which is a scheduler operation.
> >>> 
> >>> If the scheduler is not yet running, then I don’t think a missed update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.
> >> 
> >> Hi Joel
> >> 
> >> After the kernel_init task starts running, but before it calls smp_init()
> >> to start the other CPUs, the scheduler has already been initialized, so
> >> task context switches can occur.
> > 
> > Good catch, thank you both.  For some reason, I was thinking that the
> > additional CPUs did not come online until later.
> > 
> > So how about this?
> > 
> >    if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> >        return;    // Early boot fastpath.
> > 
> > If this condition is true, there is only one CPU and no scheduler,
> > thus no preemption.
> 
> Agreed. I was going to suggest exactly this :)
> 
> Ack.
> (Replying by phone, but feel free to add my Reviewed-by tag.)

I should add that the downside of this approach is that there is a short
time between the scheduler initializing and workqueues fully initializing
where a critical-path call to synchronize_rcu_tasks() will hang the
system.  I do -not- consider this to be a real problem because RCU had
some hundreds of calls to synchronize_rcu() before this became an issue.

So this should be fine, but please recall this for when/if someone does
stick a synchronize_rcu_tasks() into that short time.  ;-)

							Thanx, Paul

> [ . . . ]
Joel Fernandes Nov. 29, 2022, 8:01 p.m. UTC | #9
> On Nov 29, 2022, at 2:18 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Tue, Nov 29, 2022 at 11:00:05AM -0500, Joel Fernandes wrote:
>> 
>> 
>>>> On Nov 29, 2022, at 10:18 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>> 
>>> On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
>>>>>> On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
>>>>> 
>>>>> On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
>>> [ . . . ]
>>> 
>>> Good catch, thank you both.  For some reason, I was thinking that the
>>> additional CPUs did not come online until later.
>>> 
>>> So how about this?
>>> 
>>>   if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>>>       return;    // Early boot fastpath.
>>> 
>>> If this condition is true, there is only one CPU and no scheduler,
>>> thus no preemption.
>> 
>> Agreed. I was going to suggest exactly this :)
>> 
>> Ack.
>> (Replying by phone, but feel free to add my Reviewed-by tag.)
> 
> I should add that the downside of this approach is that there is a short
> time between the scheduler initializing and workqueues fully initializing
> where a critical-path call to synchronize_rcu_tasks() will hang the
> system.  I do -not- consider this to be a real problem because RCU had
> some hundreds of calls to synchronize_rcu() before this became an issue.
> 
> So this should be fine, but please recall this for when/if someone does
> stick a synchronize_rcu_tasks() into that short time.  ;-)

Thanks Paul, but why would anyone want to do a tasks-RCU synchronize before the scheduler is fully initialized? 
Maybe we can add a warning in the early-return path to make sure no such usage slips in. Then we can look into it if someone ever starts using it that way.

Thanks,

- Joel

> [ . . . ]
Paul E. McKenney Nov. 29, 2022, 8:13 p.m. UTC | #10
On Tue, Nov 29, 2022 at 03:01:12PM -0500, Joel Fernandes wrote:
> > On Nov 29, 2022, at 2:18 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > On Tue, Nov 29, 2022 at 11:00:05AM -0500, Joel Fernandes wrote:
> >>>> On Nov 29, 2022, at 10:18 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
> >>> On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
> >>>>>> On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> >>>>> 
> >>>>> On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> > [ . . . ]
> > 
> > I should add that the downside of this approach is that there is a short
> > time between the scheduler initializing and workqueues fully initializing
> > where a critical-path call to synchronize_rcu_tasks() will hang the
> > system.  I do -not- consider this to be a real problem because RCU had
> > some hundreds of calls to synchronize_rcu() before this became an issue.
> > 
> > So this should be fine, but please recall this for when/if someone does
> > stick a synchronize_rcu_tasks() into that short time.  ;-)
> 
> Thanks Paul, but why would anyone want to do sync rcu tasks before
> the scheduler is fully initialized?

I could ask that same question of a number of other RCU API members.  ;-)

> Maybe we can add a warning here in the if-early-return path, to make
> sure no such usage slips in. Then we can look into it if someone ever
> starts using it that way.

I expect that it would be more work to code and maintain any such warning
than it would to diagnose the hang, so let's leave it as is.
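
For illustration, such a warning would presumably need to catch the
window where the scheduler is up but workqueues are not yet fully
operational, perhaps something like this untested sketch (assuming
that workqueue.h's wq_online flag is even the right thing to gate on):

	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
		return;    // Early boot fastpath.
	WARN_ON_ONCE(!wq_online);  // Scheduler up, but workqueues not yet.

And that sort of reach into workqueue internals is exactly the kind of
thing I would rather not have to maintain.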

							Thanx, Paul

> Thanks,
> 
> - Joel
> 
> > 
> >                            Thanx, Paul
> > 
> >> - Joel
> >> 
> >> 
> >>> 
> >>>                       Thanx, Paul
> >>> 
> >>>> Thanks
> >>>> Zqiang
> >>>> 
> >>>>> 
> >>>>> Or did I miss something?
> >>>>> 
> >>>>> Thanks.
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> Thanks
> >>>>> Zqiang
> >>>>> 
> >>>>>> 
> >>>>>> This works because rcu_scheduler_active is set to RCU_SCHEDULER_RUNNING
> >>>>>> long before it is possible to offline CPUs.
> >>>>>> 
> >>>>>> Yes, schedule_on_each_cpu() does do cpus_read_lock(), again, good eyes,
> >>>>>> and it also unnecessarily does the schedule_work_on() on the current CPU,
> >>>>>> but the code calling synchronize_rcu_tasks_rude() is on high-overhead
> >>>>>> code paths, so this overhead is down in the noise.
> >>>>>> 
> >>>>>> Until further notice, anyway.
> >>>>>> 
> >>>>>> So simplicity is much more important than performance in this code.
> >>>>>> So just adding the check for RCU_SCHEDULER_RUNNING should fix this,
> >>>>>> unless I am missing something (always possible!).
> >>>>>> 
> >>>>>>                          Thanx, Paul
> >>>>>> 
> >>>>>> ---
> >>>>>> kernel/rcu/tasks.h | 20 ++++++++++++++++++--
> >>>>>> 1 file changed, 18 insertions(+), 2 deletions(-)
> >>>>>> 
> >>>>>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> >>>>>> index 4a991311be9b..08e72c6462d8 100644
> >>>>>> --- a/kernel/rcu/tasks.h
> >>>>>> +++ b/kernel/rcu/tasks.h
> >>>>>> @@ -1033,14 +1033,30 @@ static void rcu_tasks_be_rude(struct work_struct *work)
> >>>>>> {
> >>>>>> }
> >>>>>> 
> >>>>>> +static DEFINE_PER_CPU(struct work_struct, rude_work);
> >>>>>> +
> >>>>>> // Wait for one rude RCU-tasks grace period.
> >>>>>> static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
> >>>>>> {
> >>>>>> +    int cpu;
> >>>>>> +    struct work_struct *work;
> >>>>>> +
> >>>>>> +    cpus_read_lock();
> >>>>>>  if (num_online_cpus() <= 1)
> >>>>>> -        return;    // Fastpath for only one CPU.
> >>>>>> +        goto end;// Fastpath for only one CPU.
> >>>>>> 
> >>>>>>  rtp->n_ipis += cpumask_weight(cpu_online_mask);
> >>>>>> -    schedule_on_each_cpu(rcu_tasks_be_rude);
> >>>>>> +    for_each_online_cpu(cpu) {
> >>>>>> +        work = per_cpu_ptr(&rude_work, cpu);
> >>>>>> +        INIT_WORK(work, rcu_tasks_be_rude);
> >>>>>> +        schedule_work_on(cpu, work);
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    for_each_online_cpu(cpu)
> >>>>>> +        flush_work(per_cpu_ptr(&rude_work, cpu));
> >>>>>> +
> >>>>>> +end:
> >>>>>> +    cpus_read_unlock();
> >>>>>> }
> >>>>>> 
> >>>>>> void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);
> >>>>>> -- 
> >>>>>> 2.25.1
> >>>>>>
Joel Fernandes Nov. 29, 2022, 11:30 p.m. UTC | #11
> On Nov 29, 2022, at 3:13 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Tue, Nov 29, 2022 at 03:01:12PM -0500, Joel Fernandes wrote:
>>>> On Nov 29, 2022, at 2:18 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>> On Tue, Nov 29, 2022 at 11:00:05AM -0500, Joel Fernandes wrote:
>>>>>> On Nov 29, 2022, at 10:18 AM, Paul E. McKenney <paulmck@kernel.org> wrote:
>>>>> On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
>>>>>>>> On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
>>>>>>> 
>>>>>>> On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
>>>>>>>> Currently, rcu_tasks_rude_wait_gp() is invoked to wait for one rude
>>>>>>>> RCU-Tasks grace period; if __num_online_cpus == 1, it returns
>>>>>>>> directly, indicating the end of the rude RCU-Tasks grace period.
>>>>>>>> Suppose the system has two CPUs and consider the following scenario:
>>>>>>>> 
>>>>>>>> CPU0                                   CPU1 (going offline)
>>>>>>>>                       migration/1 task:
>>>>>>>>                                   cpu_stopper_thread
>>>>>>>>                                    -> take_cpu_down
>>>>>>>>                                       -> _cpu_disable
>>>>>>>>                            (dec __num_online_cpus)
>>>>>>>>                                       ->cpuhp_invoke_callback
>>>>>>>>                                             preempt_disable
>>>>>>>>                     access old_data0
>>>>>>>>        task1
>>>>>>>> del old_data0                                  .....
>>>>>>>> synchronize_rcu_tasks_rude()
>>>>>>>> task1 schedule out
>>>>>>>> ....
>>>>>>>> task2 schedule in
>>>>>>>> rcu_tasks_rude_wait_gp()
>>>>>>>>  ->__num_online_cpus == 1
>>>>>>>>    ->return
>>>>>>>> ....
>>>>>>>> task1 schedule in
>>>>>>>> ->free old_data0
>>>>>>>>                                             preempt_enable
>>>>>>>> 
>>>>>>>> When CPU1 decrements __num_online_cpus so that it equals one,
>>>>>>>> CPU1 has not yet finished going offline: the stop_machine task
>>>>>>>> (migration/1) may still be running on CPU1, possibly still accessing
>>>>>>>> 'old_data0', but 'old_data0' has already been freed on CPU0.
>>>>>>>> 
>>>>>>>> This commit adds cpus_read_lock()/cpus_read_unlock() protection around
>>>>>>>> accesses to the __num_online_cpus variable, to ensure that a CPU
>>>>>>>> partway through going offline has fully completed the offline process.
>>>>>>>> 
>>>>>>>> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
>>>>>>>> 
>>>>>>>> First, good eyes and good catch!!!
>>>>>>>> 
>>>>>>>> The purpose of that check for num_online_cpus() is not performance
>>>>>>>> on single-CPU systems, but rather correct operation during early boot.
>>>>>>>> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
>>>>>>>> for example, as follows:
>>>>>>>> 
>>>>>>>> if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
>>>>>>>>     num_online_cpus() <= 1)
>>>>>>>>     return;    // Early boot fastpath for only one CPU.
>>>>>>> 
>>>>>>> Hi Paul
>>>>>>> 
>>>>>>> During system startup, RCU_SCHEDULER_RUNNING is set only after the other CPUs have been started:
>>>>>>> 
>>>>>>>           CPU0                                                                       CPU1                                                                 
>>>>>>> 
>>>>>>> if (rcu_scheduler_active !=                                    
>>>>>>> RCU_SCHEDULER_RUNNING &&
>>>>>>>        __num_online_cpus  == 1)                                               
>>>>>>> return;                                                                         inc  __num_online_cpus
>>>>>>>                         (__num_online_cpus == 2)
>>>>>>> 
>>>>>>> CPU0 might not notice CPU1's update of the __num_online_cpus variable in time.
>>>>>>> Can we move rcu_set_runtime_mode() before smp_init()?
>>>>>>> Any thoughts?
>>>>>>> 
>>>>>>> Is anyone expected to do an rcu-tasks operation before the scheduler is running? 
>>>>>> 
>>>>>> Not sure if such a scenario exists.
>>>>>> 
>>>>>>> Typically this requires the tasks to context switch, which is a scheduler operation.
>>>>>>> 
>>>>>>> If the scheduler is not yet running, then I don’t think missing an update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.
>>>>>> 
>>>>>> Hi Joel
>>>>>> 
>>>>>> After the kernel_init task runs, but before smp_init() starts the other CPUs,
>>>>>> the scheduler has already been initialized, so task context switching can occur.
>>>>> 
>>>>> Good catch, thank you both.  For some reason, I was thinking that the
>>>>> additional CPUs did not come online until later.
>>>>> 
>>>>> So how about this?
>>>>> 
>>>>>  if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>>>>>      return;    // Early boot fastpath.
>>>>> 
>>>>> If this condition is true, there is only one CPU and no scheduler,
>>>>> thus no preemption.
>>>> 
>>>> Agreed. I was going to suggest exactly this :)
>>>> 
>>>> Ack.
>>>> (Replying by phone but feel free to add my reviewed by tag).
>>> 
>>> I should add that the downside of this approach is that there is a short
>>> time between the scheduler initializing and workqueues fully initializing
>>> where a critical-path call to synchronize_rcu_tasks() will hang the
>>> system.  I do -not- consider this to be a real problem because RCU had
>>> some hundreds of calls to synchronize_rcu() before this became an issue.
>>> 
>>> So this should be fine, but please recall this for when/if someone does
>>> stick a synchronize_rcu_tasks() into that short time.  ;-)
>> 
>> Thanks Paul, but why would anyone want to do sync rcu tasks before
>> the scheduler is fully initialized?
> 
> I could ask that same question of a number of other RCU API members.  ;-)
> 
>> Maybe we can add a warning here in the if-early-return path, to make
>> sure no such usage slips in. Then we can look into it if someone ever
>> starts using it that way.
> 
> I expect that it would be more work to code and maintain any such warning
> than it would to diagnose the hang, so let's leave it as is.

Makes sense.  Works for me, and over to you, Qiang! ;-)

Thanks,

  - Joel 

> 
>                            Thanx, Paul
> 
>> Thanks,
>> 
>> - Joel
>> 
>>> 
>>>                           Thanx, Paul
>>> 
>>>> - Joel
>>>> 
>>>> 
>>>>> 
>>>>>                      Thanx, Paul
>>>>> 
>>>>>> Thanks
>>>>>> Zqiang
>>>>>> 
>>>>>>> 
>>>>>>> Or did I miss something?
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Zqiang
>>>>>>> 
>>>>>>>> 
>>>>>>>> This works because rcu_scheduler_active is set to RCU_SCHEDULER_RUNNING
>>>>>>>> long before it is possible to offline CPUs.
>>>>>>>> 
>>>>>>>> Yes, schedule_on_each_cpu() does do cpus_read_lock(), again, good eyes,
>>>>>>>> and it also unnecessarily does the schedule_work_on() on the current CPU,
>>>>>>>> but the code calling synchronize_rcu_tasks_rude() is on high-overhead
>>>>>>>> code paths, so this overhead is down in the noise.
>>>>>>>> 
>>>>>>>> Until further notice, anyway.
>>>>>>>> 
>>>>>>>> So simplicity is much more important than performance in this code.
>>>>>>>> So just adding the check for RCU_SCHEDULER_RUNNING should fix this,
>>>>>>>> unless I am missing something (always possible!).
>>>>>>>> 
>>>>>>>>                         Thanx, Paul
>>>>>>>> 
>>>>>>>> ---
>>>>>>>> kernel/rcu/tasks.h | 20 ++++++++++++++++++--
>>>>>>>> 1 file changed, 18 insertions(+), 2 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
>>>>>>>> index 4a991311be9b..08e72c6462d8 100644
>>>>>>>> --- a/kernel/rcu/tasks.h
>>>>>>>> +++ b/kernel/rcu/tasks.h
>>>>>>>> @@ -1033,14 +1033,30 @@ static void rcu_tasks_be_rude(struct work_struct *work)
>>>>>>>> {
>>>>>>>> }
>>>>>>>> 
>>>>>>>> +static DEFINE_PER_CPU(struct work_struct, rude_work);
>>>>>>>> +
>>>>>>>> // Wait for one rude RCU-tasks grace period.
>>>>>>>> static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
>>>>>>>> {
>>>>>>>> +    int cpu;
>>>>>>>> +    struct work_struct *work;
>>>>>>>> +
>>>>>>>> +    cpus_read_lock();
>>>>>>>> if (num_online_cpus() <= 1)
>>>>>>>> -        return;    // Fastpath for only one CPU.
>>>>>>>> +        goto end;// Fastpath for only one CPU.
>>>>>>>> 
>>>>>>>> rtp->n_ipis += cpumask_weight(cpu_online_mask);
>>>>>>>> -    schedule_on_each_cpu(rcu_tasks_be_rude);
>>>>>>>> +    for_each_online_cpu(cpu) {
>>>>>>>> +        work = per_cpu_ptr(&rude_work, cpu);
>>>>>>>> +        INIT_WORK(work, rcu_tasks_be_rude);
>>>>>>>> +        schedule_work_on(cpu, work);
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    for_each_online_cpu(cpu)
>>>>>>>> +        flush_work(per_cpu_ptr(&rude_work, cpu));
>>>>>>>> +
>>>>>>>> +end:
>>>>>>>> +    cpus_read_unlock();
>>>>>>>> }
>>>>>>>> 
>>>>>>>> void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);
>>>>>>>> -- 
>>>>>>>> 2.25.1
>>>>>>>>
Zqiang Nov. 30, 2022, 12:26 a.m. UTC | #12
On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
> > On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> > 
> > On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> >> Currently, rcu_tasks_rude_wait_gp() is invoked to wait for one rude
> >> RCU-Tasks grace period; if __num_online_cpus == 1, it returns
> >> directly, indicating the end of the rude RCU-Tasks grace period.
> >> Suppose the system has two CPUs and consider the following scenario:
> >> 
> >>    CPU0                                   CPU1 (going offline)
> >>                          migration/1 task:
> >>                                      cpu_stopper_thread
> >>                                       -> take_cpu_down
> >>                                          -> _cpu_disable
> >>                               (dec __num_online_cpus)
> >>                                          ->cpuhp_invoke_callback
> >>                                                preempt_disable
> >>                        access old_data0
> >>           task1
> >> del old_data0                                  .....
> >> synchronize_rcu_tasks_rude()
> >> task1 schedule out
> >> ....
> >> task2 schedule in
> >> rcu_tasks_rude_wait_gp()
> >>     ->__num_online_cpus == 1
> >>       ->return
> >> ....
> >> task1 schedule in
> >> ->free old_data0
> >>                                                preempt_enable
> >> 
> >> When CPU1 decrements __num_online_cpus so that it equals one,
> >> CPU1 has not yet finished going offline: the stop_machine task
> >> (migration/1) may still be running on CPU1, possibly still accessing
> >> 'old_data0', but 'old_data0' has already been freed on CPU0.
> >> 
> >> This commit adds cpus_read_lock()/cpus_read_unlock() protection around
> >> accesses to the __num_online_cpus variable, to ensure that a CPU
> >> partway through going offline has fully completed the offline process.
> >> 
> >> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> >> 
> >> First, good eyes and good catch!!!
> >> 
> >> The purpose of that check for num_online_cpus() is not performance
> >> on single-CPU systems, but rather correct operation during early boot.
> >> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
> >> for example, as follows:
> >> 
> >>    if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
> >>        num_online_cpus() <= 1)
> >>        return;    // Early boot fastpath for only one CPU.
> > 
> > Hi Paul
> > 
> > During system startup, RCU_SCHEDULER_RUNNING is set only after the other CPUs have been started:
> > 
> >              CPU0                                                                       CPU1                                                                 
> > 
> > if (rcu_scheduler_active !=                                    
> >    RCU_SCHEDULER_RUNNING &&
> >           __num_online_cpus  == 1)                                               
> >    return;                                                                         inc  __num_online_cpus
> >                            (__num_online_cpus == 2)
> > 
> > CPU0 might not notice CPU1's update of the __num_online_cpus variable in time.
> > Can we move rcu_set_runtime_mode() before smp_init()?
> > Any thoughts?
> >
> >Is anyone expected to do an rcu-tasks operation before the scheduler is running? 
> 
> Not sure if such a scenario exists.
> 
> >Typically this requires the tasks to context switch, which is a scheduler operation.
> >
> >If the scheduler is not yet running, then I don’t think missing an update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.
> 
> Hi Joel
> 
> After the kernel_init task runs, but before smp_init() starts the other CPUs,
> the scheduler has already been initialized, so task context switching can occur.
>
>Good catch, thank you both.  For some reason, I was thinking that the
>additional CPUs did not come online until later.
>
>So how about this?
>
>	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
>		return;    // Early boot fastpath.

If we use RCU_SCHEDULER_INACTIVE for the check, can we make the following changes?

--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -562,8 +562,8 @@ static int __noreturn rcu_tasks_kthread(void *arg)
 static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
 {
        /* Complain if the scheduler has not started.  */
-       WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
-                        "synchronize_rcu_tasks called too soon");
+       if(WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE))
+               return;

        // If the grace-period kthread is running, use it.
        if (READ_ONCE(rtp->kthread_ptr)) {
@@ -1066,9 +1066,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
 // Wait for one rude RCU-tasks grace period.
 static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
 {
-       if (num_online_cpus() <= 1)
-               return; // Fastpath for only one CPU.
-
        rtp->n_ipis += cpumask_weight(cpu_online_mask);
        schedule_on_each_cpu(rcu_tasks_be_rude);
 }
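
(The idea being that, with the early-boot case rejected up front in
synchronize_rcu_tasks_generic(), rcu_tasks_rude_wait_gp() no longer
consults __num_online_cpus at all, and the cpus_read_lock() already
taken inside schedule_on_each_cpu() closes the hotplug race.)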

Thanks
Zqiang

>
>If this condition is true, there is only one CPU and no scheduler,
>thus no preemption.
>
>						Thanx, Paul

> Thanks
> Zqiang
> 
> >
> >Or did I miss something?
> >
> >Thanks.
> >
> >
> >
> > 
> > Thanks
> > Zqiang
> > 
> >> 
> >> This works because rcu_scheduler_active is set to RCU_SCHEDULER_RUNNING
> >> long before it is possible to offline CPUs.
> >> 
> >> Yes, schedule_on_each_cpu() does do cpus_read_lock(), again, good eyes,
> >> and it also unnecessarily does the schedule_work_on() on the current CPU,
> >> but the code calling synchronize_rcu_tasks_rude() is on high-overhead
> >> code paths, so this overhead is down in the noise.
> >> 
> >> Until further notice, anyway.
> >> 
> >> So simplicity is much more important than performance in this code.
> >> So just adding the check for RCU_SCHEDULER_RUNNING should fix this,
> >> unless I am missing something (always possible!).
> >> 
> >>                            Thanx, Paul
> >> 
> >> ---
> >> kernel/rcu/tasks.h | 20 ++++++++++++++++++--
> >> 1 file changed, 18 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> >> index 4a991311be9b..08e72c6462d8 100644
> >> --- a/kernel/rcu/tasks.h
> >> +++ b/kernel/rcu/tasks.h
> >> @@ -1033,14 +1033,30 @@ static void rcu_tasks_be_rude(struct work_struct *work)
> >> {
> >> }
> >> 
> >> +static DEFINE_PER_CPU(struct work_struct, rude_work);
> >> +
> >> // Wait for one rude RCU-tasks grace period.
> >> static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
> >> {
> >> +    int cpu;
> >> +    struct work_struct *work;
> >> +
> >> +    cpus_read_lock();
> >>    if (num_online_cpus() <= 1)
> >> -        return;    // Fastpath for only one CPU.
> >> +        goto end;// Fastpath for only one CPU.
> >> 
> >>    rtp->n_ipis += cpumask_weight(cpu_online_mask);
> >> -    schedule_on_each_cpu(rcu_tasks_be_rude);
> >> +    for_each_online_cpu(cpu) {
> >> +        work = per_cpu_ptr(&rude_work, cpu);
> >> +        INIT_WORK(work, rcu_tasks_be_rude);
> >> +        schedule_work_on(cpu, work);
> >> +    }
> >> +
> >> +    for_each_online_cpu(cpu)
> >> +        flush_work(per_cpu_ptr(&rude_work, cpu));
> >> +
> >> +end:
> >> +    cpus_read_unlock();
> >> }
> >> 
> >> void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);
> >> -- 
> >> 2.25.1
> >>
Paul E. McKenney Nov. 30, 2022, 12:39 a.m. UTC | #13
On Wed, Nov 30, 2022 at 12:26:37AM +0000, Zhang, Qiang1 wrote:
> On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
> > > On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> > > 
> > > On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> > >> Currently, rcu_tasks_rude_wait_gp() is invoked to wait for one rude
> > >> RCU-Tasks grace period; if __num_online_cpus == 1, it returns
> > >> directly, indicating the end of the rude RCU-Tasks grace period.
> > >> Suppose the system has two CPUs and consider the following scenario:
> > >> 
> > >>    CPU0                                   CPU1 (going offline)
> > >>                          migration/1 task:
> > >>                                      cpu_stopper_thread
> > >>                                       -> take_cpu_down
> > >>                                          -> _cpu_disable
> > >>                               (dec __num_online_cpus)
> > >>                                          ->cpuhp_invoke_callback
> > >>                                                preempt_disable
> > >>                        access old_data0
> > >>           task1
> > >> del old_data0                                  .....
> > >> synchronize_rcu_tasks_rude()
> > >> task1 schedule out
> > >> ....
> > >> task2 schedule in
> > >> rcu_tasks_rude_wait_gp()
> > >>     ->__num_online_cpus == 1
> > >>       ->return
> > >> ....
> > >> task1 schedule in
> > >> ->free old_data0
> > >>                                                preempt_enable
> > >> 
> > >> When CPU1 decrements __num_online_cpus so that it equals one,
> > >> CPU1 has not yet finished going offline: the stop_machine task
> > >> (migration/1) may still be running on CPU1, possibly still accessing
> > >> 'old_data0', but 'old_data0' has already been freed on CPU0.
> > >> 
> > >> This commit adds cpus_read_lock()/cpus_read_unlock() protection around
> > >> accesses to the __num_online_cpus variable, to ensure that a CPU
> > >> partway through going offline has fully completed the offline process.
> > >> 
> > >> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > >> 
> > >> First, good eyes and good catch!!!
> > >> 
> > >> The purpose of that check for num_online_cpus() is not performance
> > >> on single-CPU systems, but rather correct operation during early boot.
> > >> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
> > >> for example, as follows:
> > >> 
> > >>    if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
> > >>        num_online_cpus() <= 1)
> > >>        return;    // Early boot fastpath for only one CPU.
> > > 
> > > Hi Paul
> > > 
> > > During system startup, RCU_SCHEDULER_RUNNING is set only after the other CPUs have been started:
> > > 
> > >              CPU0                                                                       CPU1                                                                 
> > > 
> > > if (rcu_scheduler_active !=                                    
> > >    RCU_SCHEDULER_RUNNING &&
> > >           __num_online_cpus  == 1)                                               
> > >    return;                                                                         inc  __num_online_cpus
> > >                            (__num_online_cpus == 2)
> > > 
> > > CPU0 might not notice CPU1's update of the __num_online_cpus variable in time.
> > > Can we move rcu_set_runtime_mode() before smp_init()?
> > > Any thoughts?
> > >
> > >Is anyone expected to do an rcu-tasks operation before the scheduler is running? 
> > 
> > Not sure if such a scenario exists.
> > 
> > >Typically this requires the tasks to context switch, which is a scheduler operation.
> > >
> > >If the scheduler is not yet running, then I don’t think missing an update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.
> > 
> > Hi Joel
> > 
> > After the kernel_init task runs, but before smp_init() starts the other CPUs,
> > the scheduler has already been initialized, so task context switching can occur.
> >
> >Good catch, thank you both.  For some reason, I was thinking that the
> >additional CPUs did not come online until later.
> >
> >So how about this?
> >
> >	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> >		return;    // Early boot fastpath.
> 
> If we use RCU_SCHEDULER_INACTIVE for the check, can we make the following changes?

You will need s/WARN_ONCE/WARN_ON_ONCE/ (or supply the added arguments),
but yes, this looks good.
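
That is, either of the following forms should do the trick (untested):

	if (WARN_ON_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE))
		return;

or:

	if (WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
		      "synchronize_rcu_tasks called too soon"))
		return;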

And thank you for digging down the extra level!

							Thanx, Paul

> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -562,8 +562,8 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
>  {
>         /* Complain if the scheduler has not started.  */
> -       WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
> -                        "synchronize_rcu_tasks called too soon");
> +       if(WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE))
> +               return;
> 
>         // If the grace-period kthread is running, use it.
>         if (READ_ONCE(rtp->kthread_ptr)) {
> @@ -1066,9 +1066,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
>  // Wait for one rude RCU-tasks grace period.
>  static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
>  {
> -       if (num_online_cpus() <= 1)
> -               return; // Fastpath for only one CPU.
> -
>         rtp->n_ipis += cpumask_weight(cpu_online_mask);
>         schedule_on_each_cpu(rcu_tasks_be_rude);
>  }
> 
> Thanks
> Zqiang
> 
> >
> >If this condition is true, there is only one CPU and no scheduler,
> >thus no preemption.
> >
> >						Thanx, Paul
> 
> > Thanks
> > Zqiang
> > 
> > >
> > >Or did I miss something?
> > >
> > >Thanks.
> > >
> > >
> > >
> > > 
> > > Thanks
> > > Zqiang
> > > 
> > >> 
> > >> This works because rcu_scheduler_active is set to RCU_SCHEDULER_RUNNING
> > >> long before it is possible to offline CPUs.
> > >> 
> > >> Yes, schedule_on_each_cpu() does do cpus_read_lock(), again, good eyes,
> > >> and it also unnecessarily does the schedule_work_on() on the current CPU,
> > >> but the code calling synchronize_rcu_tasks_rude() is on high-overhead
> > >> code paths, so this overhead is down in the noise.
> > >> 
> > >> Until further notice, anyway.
> > >> 
> > >> So simplicity is much more important than performance in this code.
> > >> So just adding the check for RCU_SCHEDULER_RUNNING should fix this,
> > >> unless I am missing something (always possible!).
> > >> 
> > >>                            Thanx, Paul
> > >> 
> > >> ---
> > >> kernel/rcu/tasks.h | 20 ++++++++++++++++++--
> > >> 1 file changed, 18 insertions(+), 2 deletions(-)
> > >> 
> > >> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > >> index 4a991311be9b..08e72c6462d8 100644
> > >> --- a/kernel/rcu/tasks.h
> > >> +++ b/kernel/rcu/tasks.h
> > >> @@ -1033,14 +1033,30 @@ static void rcu_tasks_be_rude(struct work_struct *work)
> > >> {
> > >> }
> > >> 
> > >> +static DEFINE_PER_CPU(struct work_struct, rude_work);
> > >> +
> > >> // Wait for one rude RCU-tasks grace period.
> > >> static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
> > >> {
> > >> +    int cpu;
> > >> +    struct work_struct *work;
> > >> +
> > >> +    cpus_read_lock();
> > >>    if (num_online_cpus() <= 1)
> > >> -        return;    // Fastpath for only one CPU.
> > >> +        goto end;// Fastpath for only one CPU.
> > >> 
> > >>    rtp->n_ipis += cpumask_weight(cpu_online_mask);
> > >> -    schedule_on_each_cpu(rcu_tasks_be_rude);
> > >> +    for_each_online_cpu(cpu) {
> > >> +        work = per_cpu_ptr(&rude_work, cpu);
> > >> +        INIT_WORK(work, rcu_tasks_be_rude);
> > >> +        schedule_work_on(cpu, work);
> > >> +    }
> > >> +
> > >> +    for_each_online_cpu(cpu)
> > >> +        flush_work(per_cpu_ptr(&rude_work, cpu));
> > >> +
> > >> +end:
> > >> +    cpus_read_unlock();
> > >> }
> > >> 
> > >> void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);
> > >> -- 
> > >> 2.25.1
> > >>
Zqiang Nov. 30, 2022, 12:50 a.m. UTC | #14
On Wed, Nov 30, 2022 at 12:26:37AM +0000, Zhang, Qiang1 wrote:
> On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
> > > On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> > > 
> > > On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> > >> Currently, rcu_tasks_rude_wait_gp() is invoked to wait for one rude
> > >> RCU-Tasks grace period; if __num_online_cpus == 1, it returns
> > >> directly, indicating the end of the rude RCU-Tasks grace period.
> > >> Suppose the system has two CPUs and consider the following scenario:
> > >> 
> > >>    CPU0                                   CPU1 (going offline)
> > >>                          migration/1 task:
> > >>                                      cpu_stopper_thread
> > >>                                       -> take_cpu_down
> > >>                                          -> _cpu_disable
> > >>                               (dec __num_online_cpus)
> > >>                                          ->cpuhp_invoke_callback
> > >>                                                preempt_disable
> > >>                        access old_data0
> > >>           task1
> > >> del old_data0                                  .....
> > >> synchronize_rcu_tasks_rude()
> > >> task1 schedule out
> > >> ....
> > >> task2 schedule in
> > >> rcu_tasks_rude_wait_gp()
> > >>     ->__num_online_cpus == 1
> > >>       ->return
> > >> ....
> > >> task1 schedule in
> > >> ->free old_data0
> > >>                                                preempt_enable
> > >> 
> > >> When CPU1 decrements __num_online_cpus so that it equals one,
> > >> CPU1 has not yet finished going offline: the stop_machine task
> > >> (migration/1) may still be running on CPU1, possibly still accessing
> > >> 'old_data0', but 'old_data0' has already been freed on CPU0.
> > >> 
> > >> This commit adds cpus_read_lock()/cpus_read_unlock() protection around
> > >> accesses to the __num_online_cpus variable, to ensure that a CPU
> > >> partway through going offline has fully completed the offline process.
> > >> 
> > >> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > >> 
> > >> First, good eyes and good catch!!!
> > >> 
> > >> The purpose of that check for num_online_cpus() is not performance
> > >> on single-CPU systems, but rather correct operation during early boot.
> > >> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
> > >> for example, as follows:
> > >> 
> > >>    if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
> > >>        num_online_cpus() <= 1)
> > >>        return;    // Early boot fastpath for only one CPU.
> > > 
> > > Hi Paul
> > > 
> > > During system startup, RCU_SCHEDULER_RUNNING is set only after the other CPUs have been started:
> > > 
> > >              CPU0                                                                       CPU1                                                                 
> > > 
> > > if (rcu_scheduler_active !=                                    
> > >    RCU_SCHEDULER_RUNNING &&
> > >           __num_online_cpus  == 1)                                               
> > >    return;                                                                         inc  __num_online_cpus
> > >                            (__num_online_cpus == 2)
> > > 
> > > CPU0 might not notice CPU1's update of the __num_online_cpus variable in time.
> > > Can we move rcu_set_runtime_mode() before smp_init()?
> > > Any thoughts?
> > >
> > >Is anyone expected to do an rcu-tasks operation before the scheduler is running? 
> > 
> > Not sure if such a scenario exists.
> > 
> > >Typically this requires the tasks to context switch, which is a scheduler operation.
> > >
> > >If the scheduler is not yet running, then I don’t think missing an update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.
> > 
> > Hi Joel
> > 
> > After the kernel_init task runs, but before smp_init() starts the other CPUs,
> > the scheduler has already been initialized, so task context switching can occur.
> >
> >Good catch, thank you both.  For some reason, I was thinking that the
> >additional CPUs did not come online until later.
> >
> >So how about this?
> >
> >	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> >		return;    // Early boot fastpath.
> 
> If we use RCU_SCHEDULER_INACTIVE for the check, can we make the following changes?

>
>You will need s/WARN_ONCE/WARN_ON_ONCE/ (or supply the added arguments),
>but yes, this looks good.
>
>
>And thank you for digging down the extra level!

Can I send a v3 modified as follows?

Thanks
Zqiang

>
>							Thanx, Paul

> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -562,8 +562,8 @@ static int __noreturn rcu_tasks_kthread(void *arg)
>  static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
>  {
>         /* Complain if the scheduler has not started.  */
> -       WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
> -                        "synchronize_rcu_tasks called too soon");
> +       if(WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE))
> +               return;
> 
>         // If the grace-period kthread is running, use it.
>         if (READ_ONCE(rtp->kthread_ptr)) {
> @@ -1066,9 +1066,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
>  // Wait for one rude RCU-tasks grace period.
>  static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
>  {
> -       if (num_online_cpus() <= 1)
> -               return; // Fastpath for only one CPU.
> -
>         rtp->n_ipis += cpumask_weight(cpu_online_mask);
>         schedule_on_each_cpu(rcu_tasks_be_rude);
>  }
> 
> Thanks
> Zqiang
> 
> >
> >If this condition is true, there is only one CPU and no scheduler,
> >thus no preemption.
> >
> >						Thanx, Paul
> 
> > Thanks
> > Zqiang
> > 
> > >
> > >Or did I miss something?
> > >
> > >Thanks.
> > >
> > >
> > >
> > > 
> > > Thanks
> > > Zqiang
> > > 
> > >> 
> > >> This works because rcu_scheduler_active is set to RCU_SCHEDULER_RUNNING
> > >> long before it is possible to offline CPUs.
> > >> 
> > >> Yes, schedule_on_each_cpu() does do cpus_read_lock(), again, good eyes,
> > >> and it also unnecessarily does the schedule_work_on() on the current CPU,
> > >> but the code calling synchronize_rcu_tasks_rude() is on high-overhead
> > >> code paths, so this overhead is down in the noise.
> > >> 
> > >> Until further notice, anyway.
> > >> 
> > >> So simplicity is much more important than performance in this code.
> > >> So just adding the check for RCU_SCHEDULER_RUNNING should fix this,
> > >> unless I am missing something (always possible!).
> > >> 
> > >>                            Thanx, Paul
> > >> 
> > >> ---
> > >> kernel/rcu/tasks.h | 20 ++++++++++++++++++--
> > >> 1 file changed, 18 insertions(+), 2 deletions(-)
> > >> 
> > >> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > >> index 4a991311be9b..08e72c6462d8 100644
> > >> --- a/kernel/rcu/tasks.h
> > >> +++ b/kernel/rcu/tasks.h
> > >> @@ -1033,14 +1033,30 @@ static void rcu_tasks_be_rude(struct work_struct *work)
> > >> {
> > >> }
> > >> 
> > >> +static DEFINE_PER_CPU(struct work_struct, rude_work);
> > >> +
> > >> // Wait for one rude RCU-tasks grace period.
> > >> static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
> > >> {
> > >> +    int cpu;
> > >> +    struct work_struct *work;
> > >> +
> > >> +    cpus_read_lock();
> > >>    if (num_online_cpus() <= 1)
> > >> -        return;    // Fastpath for only one CPU.
> > >> +        goto end;// Fastpath for only one CPU.
> > >> 
> > >>    rtp->n_ipis += cpumask_weight(cpu_online_mask);
> > >> -    schedule_on_each_cpu(rcu_tasks_be_rude);
> > >> +    for_each_online_cpu(cpu) {
> > >> +        work = per_cpu_ptr(&rude_work, cpu);
> > >> +        INIT_WORK(work, rcu_tasks_be_rude);
> > >> +        schedule_work_on(cpu, work);
> > >> +    }
> > >> +
> > >> +    for_each_online_cpu(cpu)
> > >> +        flush_work(per_cpu_ptr(&rude_work, cpu));
> > >> +
> > >> +end:
> > >> +    cpus_read_unlock();
> > >> }
> > >> 
> > >> void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);
> > >> -- 
> > >> 2.25.1
> > >>
Paul E. McKenney Nov. 30, 2022, 1:02 a.m. UTC | #15
On Wed, Nov 30, 2022 at 12:50:34AM +0000, Zhang, Qiang1 wrote:
> On Wed, Nov 30, 2022 at 12:26:37AM +0000, Zhang, Qiang1 wrote:
> > On Tue, Nov 29, 2022 at 06:25:04AM +0000, Zhang, Qiang1 wrote:
> > > > On Nov 28, 2022, at 11:54 PM, Zhang, Qiang1 <qiang1.zhang@intel.com> wrote:
> > > > 
> > > > On Mon, Nov 28, 2022 at 10:34:28PM +0800, Zqiang wrote:
> > > >> Currently, rcu_tasks_rude_wait_gp() is invoked to wait for one rude
> > > >> RCU-Tasks grace period; if __num_online_cpus == 1, it returns
> > > >> directly, indicating the end of the rude RCU-Tasks grace period.
> > > >> Suppose the system has two CPUs and consider the following scenario:
> > > >> 
> > > >>    CPU0                                   CPU1 (going offline)
> > > >>                          migration/1 task:
> > > >>                                      cpu_stopper_thread
> > > >>                                       -> take_cpu_down
> > > >>                                          -> _cpu_disable
> > > >>                               (dec __num_online_cpus)
> > > >>                                          ->cpuhp_invoke_callback
> > > >>                                                preempt_disable
> > > >>                        access old_data0
> > > >>           task1
> > > >> del old_data0                                  .....
> > > >> synchronize_rcu_tasks_rude()
> > > >> task1 schedule out
> > > >> ....
> > > >> task2 schedule in
> > > >> rcu_tasks_rude_wait_gp()
> > > >>     ->__num_online_cpus == 1
> > > >>       ->return
> > > >> ....
> > > >> task1 schedule in
> > > >> ->free old_data0
> > > >>                                                preempt_enable
> > > >> 
> > > >> When CPU1 decrements __num_online_cpus so that it equals one,
> > > >> CPU1 has not yet finished going offline: the stop_machine task
> > > >> (migration/1) may still be running on CPU1, possibly still accessing
> > > >> 'old_data0', but 'old_data0' has already been freed on CPU0.
> > > >> 
> > > >> This commit adds cpus_read_lock()/cpus_read_unlock() protection around
> > > >> accesses to the __num_online_cpus variable, to ensure that a CPU
> > > >> partway through going offline has fully completed the offline process.
> > > >> 
> > > >> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > >> 
> > > >> First, good eyes and good catch!!!
> > > >> 
> > > >> The purpose of that check for num_online_cpus() is not performance
> > > >> on single-CPU systems, but rather correct operation during early boot.
> > > >> So a simpler way to make that work is to check for RCU_SCHEDULER_RUNNING,
> > > >> for example, as follows:
> > > >> 
> > > >>    if (rcu_scheduler_active != RCU_SCHEDULER_RUNNING &&
> > > >>        num_online_cpus() <= 1)
> > > >>        return;    // Early boot fastpath for only one CPU.
> > > > 
> > > > Hi Paul
> > > > 
> > > > During system startup, RCU_SCHEDULER_RUNNING is set only after the other CPUs have been started:
> > > > 
> > > >              CPU0                                                                       CPU1                                                                 
> > > > 
> > > > if (rcu_scheduler_active !=                                    
> > > >    RCU_SCHEDULER_RUNNING &&
> > > >           __num_online_cpus  == 1)                                               
> > > >    return;                                                                         inc  __num_online_cpus
> > > >                            (__num_online_cpus == 2)
> > > > 
> > > > CPU0 might not notice CPU1's update of the __num_online_cpus variable in time.
> > > > Can we move rcu_set_runtime_mode() before smp_init()?
> > > > Any thoughts?
> > > >
> > > >Is anyone expected to do an rcu-tasks operation before the scheduler is running? 
> > > 
> > > Not sure if such a scenario exists.
> > > 
> > > >Typically this requires the tasks to context switch, which is a scheduler operation.
> > > >
> > > >If the scheduler is not yet running, then I don’t think missing an update to __num_online_cpus matters, since no one does a tasks-RCU synchronize.
> > > 
> > > Hi Joel
> > > 
> > > After the kernel_init task runs, but before smp_init() starts the other CPUs,
> > > the scheduler has already been initialized, so task context switching can occur.
> > >
> > >Good catch, thank you both.  For some reason, I was thinking that the
> > >additional CPUs did not come online until later.
> > >
> > >So how about this?
> > >
> > >	if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
> > >		return;    // Early boot fastpath.
> > 
> > If we use RCU_SCHEDULER_INACTIVE for the check, can we make the following changes?
> 
> >
> >You will need s/WARN_ONCE/WARN_ON_ONCE/ (or supply the added arguments),
> >but yes, this looks good.
> >
> >
> >And thank you for digging down the extra level!
> 
> Can I send a v3 modified as follows?

Your patch shown below is a good starting point.

You will need to fix up this line of code, though:

	if(WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE))

							Thanx, Paul

> Thanks
> Zqiang
> 
> >
> >							Thanx, Paul
> 
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -562,8 +562,8 @@ static int __noreturn rcu_tasks_kthread(void *arg)
> >  static void synchronize_rcu_tasks_generic(struct rcu_tasks *rtp)
> >  {
> >         /* Complain if the scheduler has not started.  */
> > -       WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE,
> > -                        "synchronize_rcu_tasks called too soon");
> > +       if(WARN_ONCE(rcu_scheduler_active == RCU_SCHEDULER_INACTIVE))
> > +               return;
> > 
> >         // If the grace-period kthread is running, use it.
> >         if (READ_ONCE(rtp->kthread_ptr)) {
> > @@ -1066,9 +1066,6 @@ static void rcu_tasks_be_rude(struct work_struct *work)
> >  // Wait for one rude RCU-tasks grace period.
> >  static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
> >  {
> > -       if (num_online_cpus() <= 1)
> > -               return; // Fastpath for only one CPU.
> > -
> >         rtp->n_ipis += cpumask_weight(cpu_online_mask);
> >         schedule_on_each_cpu(rcu_tasks_be_rude);
> >  }
> > 
> > Thanks
> > Zqiang
> > 
> > >
> > >If this condition is true, there is only one CPU and no scheduler,
> > >thus no preemption.
> > >
> > >						Thanx, Paul
> > 
> > > Thanks
> > > Zqiang
> > > 
> > > >
> > > >Or did I miss something?
> > > >
> > > >Thanks.
> > > >
> > > >
> > > >
> > > > 
> > > > Thanks
> > > > Zqiang
> > > > 
> > > >> 
> > > >> This works because rcu_scheduler_active is set to RCU_SCHEDULER_RUNNING
> > > >> long before it is possible to offline CPUs.
> > > >> 
> > > >> Yes, schedule_on_each_cpu() does do cpus_read_lock(), again, good eyes,
> > > >> and it also unnecessarily does the schedule_work_on() on the current CPU,
> > > >> but the code calling synchronize_rcu_tasks_rude() is on high-overhead
> > > >> code paths, so this overhead is down in the noise.
> > > >> 
> > > >> Until further notice, anyway.
> > > >> 
> > > >> So simplicity is much more important than performance in this code.
> > > >> So just adding the check for RCU_SCHEDULER_RUNNING should fix this,
> > > >> unless I am missing something (always possible!).
> > > >> 
> > > >>                            Thanx, Paul
> > > >> 
> > > >> ---
> > > >> kernel/rcu/tasks.h | 20 ++++++++++++++++++--
> > > >> 1 file changed, 18 insertions(+), 2 deletions(-)
> > > >> 
> > > >> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > >> index 4a991311be9b..08e72c6462d8 100644
> > > >> --- a/kernel/rcu/tasks.h
> > > >> +++ b/kernel/rcu/tasks.h
> > > >> @@ -1033,14 +1033,30 @@ static void rcu_tasks_be_rude(struct work_struct *work)
> > > >> {
> > > >> }
> > > >> 
> > > >> +static DEFINE_PER_CPU(struct work_struct, rude_work);
> > > >> +
> > > >> // Wait for one rude RCU-tasks grace period.
> > > >> static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
> > > >> {
> > > >> +    int cpu;
> > > >> +    struct work_struct *work;
> > > >> +
> > > >> +    cpus_read_lock();
> > > >>    if (num_online_cpus() <= 1)
> > > >> -        return;    // Fastpath for only one CPU.
> > > >> +        goto end;// Fastpath for only one CPU.
> > > >> 
> > > >>    rtp->n_ipis += cpumask_weight(cpu_online_mask);
> > > >> -    schedule_on_each_cpu(rcu_tasks_be_rude);
> > > >> +    for_each_online_cpu(cpu) {
> > > >> +        work = per_cpu_ptr(&rude_work, cpu);
> > > >> +        INIT_WORK(work, rcu_tasks_be_rude);
> > > >> +        schedule_work_on(cpu, work);
> > > >> +    }
> > > >> +
> > > >> +    for_each_online_cpu(cpu)
> > > >> +        flush_work(per_cpu_ptr(&rude_work, cpu));
> > > >> +
> > > >> +end:
> > > >> +    cpus_read_unlock();
> > > >> }
> > > >> 
> > > >> void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);
> > > >> -- 
> > > >> 2.25.1
> > > >>
diff mbox series

Patch

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index 4a991311be9b..08e72c6462d8 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -1033,14 +1033,30 @@  static void rcu_tasks_be_rude(struct work_struct *work)
 {
 }
 
+static DEFINE_PER_CPU(struct work_struct, rude_work);
+
 // Wait for one rude RCU-tasks grace period.
 static void rcu_tasks_rude_wait_gp(struct rcu_tasks *rtp)
 {
+	int cpu;
+	struct work_struct *work;
+
+	cpus_read_lock();
 	if (num_online_cpus() <= 1)
-		return;	// Fastpath for only one CPU.
+		goto end;// Fastpath for only one CPU.
 
 	rtp->n_ipis += cpumask_weight(cpu_online_mask);
-	schedule_on_each_cpu(rcu_tasks_be_rude);
+	for_each_online_cpu(cpu) {
+		work = per_cpu_ptr(&rude_work, cpu);
+		INIT_WORK(work, rcu_tasks_be_rude);
+		schedule_work_on(cpu, work);
+	}
+
+	for_each_online_cpu(cpu)
+		flush_work(per_cpu_ptr(&rude_work, cpu));
+
+end:
+	cpus_read_unlock();
 }
 
 void call_rcu_tasks_rude(struct rcu_head *rhp, rcu_callback_t func);