
arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat

Message ID 20240307160951.3607374-1-stefan.wiehler@nokia.com (mailing list archive)
State New, archived
Series arm: smp: Avoid false positive CPU hotplug Lockdep-RCU splat

Commit Message

Stefan Wiehler March 7, 2024, 4:09 p.m. UTC
With CONFIG_PROVE_RCU_LIST=y and by executing

  $ echo 0 > /sys/devices/system/cpu/cpu1/online

one can trigger the following Lockdep-RCU splat on ARM:

  =============================
  WARNING: suspicious RCU usage
  6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
  -----------------------------
  kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!

  other info that might help us debug this:

  RCU used illegally from offline CPU!
  rcu_scheduler_active = 2, debug_locks = 1
  no locks held by swapper/1/0.

  stack backtrace:
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
  Hardware name: Allwinner sun8i Family
   unwind_backtrace from show_stack+0x10/0x14
   show_stack from dump_stack_lvl+0x60/0x90
   dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
   lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
   __lock_acquire from lock_acquire+0x10c/0x348
   lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
   _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
   check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
   arch_cpu_idle_dead from do_idle+0xbc/0x138
   do_idle from cpu_startup_entry+0x28/0x2c
   cpu_startup_entry from secondary_start_kernel+0x11c/0x124
   secondary_start_kernel from 0x401018a0

The CPU is already reported as offline from the RCU perspective in
cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. The above
RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the
ASID spinlock.

Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as
online again while the spinlock is held.

Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
---
 arch/arm/kernel/smp.c | 7 +++++++
 1 file changed, 7 insertions(+)
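
For orientation, the ordering described above can be sketched as follows; this is
a condensed sketch reconstructed from the backtrace, not the literal kernel source
(see kernel/sched/idle.c and kernel/cpu.c for the real code):

  static void do_idle(void)
  {
  	unsigned int cpu = smp_processor_id();

  	while (!need_resched()) {
  		if (cpu_is_offline(cpu)) {
  			/* calls rcutree_report_cpu_dead():
  			 * RCU stops watching this CPU */
  			cpuhp_report_idle_dead();
  			/* -> idle_task_exit()
  			 *   -> check_and_switch_context()
  			 *     -> raw_spin_lock_irqsave(&cpu_asid_lock)
  			 *       -> lockdep's RCU check fires */
  			arch_cpu_idle_dead();
  		}
  		/* ... */
  	}
  }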

Comments

Paul E. McKenney March 7, 2024, 5:45 p.m. UTC | #1
On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote:
> With CONFIG_PROVE_RCU_LIST=y and by executing
> 
>   $ echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> one can trigger the following Lockdep-RCU splat on ARM:
> 
>   =============================
>   WARNING: suspicious RCU usage
>   6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
>   -----------------------------
>   kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
> 
>   other info that might help us debug this:
> 
>   RCU used illegally from offline CPU!
>   rcu_scheduler_active = 2, debug_locks = 1
>   no locks held by swapper/1/0.
> 
>   stack backtrace:
>   CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
>   Hardware name: Allwinner sun8i Family
>    unwind_backtrace from show_stack+0x10/0x14
>    show_stack from dump_stack_lvl+0x60/0x90
>    dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
>    lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
>    __lock_acquire from lock_acquire+0x10c/0x348
>    lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
>    _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
>    check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
>    arch_cpu_idle_dead from do_idle+0xbc/0x138
>    do_idle from cpu_startup_entry+0x28/0x2c
>    cpu_startup_entry from secondary_start_kernel+0x11c/0x124
>    secondary_start_kernel from 0x401018a0
> 
> The CPU is already reported as offline from RCU perspective in
> cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above
> RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the
> ASID spinlock.
> 
> Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as
> online again while the spinlock is held.
> 
> Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>

From an RCU perspective, this looks plausible.  One question
below.

						Thanx, Paul

> ---
>  arch/arm/kernel/smp.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 3431c0553f45..6875e2c5dd50 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void)
>  {
>  	unsigned int cpu = smp_processor_id();
>  
> +	/*
> +	 * Briefly report CPU as online again to avoid false positive
> +	 * Lockdep-RCU splat when check_and_switch_context() acquires ASID
> +	 * spinlock.
> +	 */
> +	rcutree_report_cpu_starting(cpu);
>  	idle_task_exit();
> +	rcutree_report_cpu_dead();
>  
>  	local_irq_disable();

Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain
bitterly via lockdep if interrupts are enabled.  And the call sites have
interrupts disabled.  So I don't understand what this local_irq_disable()
is needed for.
Paul E. McKenney March 7, 2024, 5:49 p.m. UTC | #2
On Thu, Mar 07, 2024 at 09:45:36AM -0800, Paul E. McKenney wrote:
> On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote:
> > With CONFIG_PROVE_RCU_LIST=y and by executing
> > 
> >   $ echo 0 > /sys/devices/system/cpu/cpu1/online
> > 
> > one can trigger the following Lockdep-RCU splat on ARM:
> > 
> >   =============================
> >   WARNING: suspicious RCU usage
> >   6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
> >   -----------------------------
> >   kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
> > 
> >   other info that might help us debug this:
> > 
> >   RCU used illegally from offline CPU!
> >   rcu_scheduler_active = 2, debug_locks = 1
> >   no locks held by swapper/1/0.
> > 
> >   stack backtrace:
> >   CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
> >   Hardware name: Allwinner sun8i Family
> >    unwind_backtrace from show_stack+0x10/0x14
> >    show_stack from dump_stack_lvl+0x60/0x90
> >    dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
> >    lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
> >    __lock_acquire from lock_acquire+0x10c/0x348
> >    lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
> >    _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
> >    check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
> >    arch_cpu_idle_dead from do_idle+0xbc/0x138
> >    do_idle from cpu_startup_entry+0x28/0x2c
> >    cpu_startup_entry from secondary_start_kernel+0x11c/0x124
> >    secondary_start_kernel from 0x401018a0
> > 
> > The CPU is already reported as offline from RCU perspective in
> > cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above
> > RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the
> > ASID spinlock.
> > 
> > Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as
> > online again while the spinlock is held.
> > 
> > Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
> 
> From an RCU perspective, this looks plausible.  One question
> below.

But one additional caution...  If execution is delayed during that call
to idle_task_exit(), RCU will stall and won't have a reasonable way of
motivating this CPU.  Such delays could be due to vCPU preemption or
due to firmware grabbing the CPU.

But this is only a caution, not opposition.  After all, you could have
the same problem with an online CPU that gets similarly delayed while
its interrupts are disabled.

						Thanx, Paul

> > ---
> >  arch/arm/kernel/smp.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > index 3431c0553f45..6875e2c5dd50 100644
> > --- a/arch/arm/kernel/smp.c
> > +++ b/arch/arm/kernel/smp.c
> > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void)
> >  {
> >  	unsigned int cpu = smp_processor_id();
> >  
> > +	/*
> > +	 * Briefly report CPU as online again to avoid false positive
> > +	 * Lockdep-RCU splat when check_and_switch_context() acquires ASID
> > +	 * spinlock.
> > +	 */
> > +	rcutree_report_cpu_starting(cpu);
> >  	idle_task_exit();
> > +	rcutree_report_cpu_dead();
> >  
> >  	local_irq_disable();
> 
> Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain
> bitterly via lockdep if interrupts are enabled.  And the call sites have
> interrupts disabled.  So I don't understand what this local_irq_disable()
> is needed for.
Boqun Feng March 7, 2024, 6:49 p.m. UTC | #3
[Cc Frederic]

On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote:
> With CONFIG_PROVE_RCU_LIST=y and by executing
> 
>   $ echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> one can trigger the following Lockdep-RCU splat on ARM:
> 
>   =============================
>   WARNING: suspicious RCU usage
>   6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
>   -----------------------------
>   kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
> 
>   other info that might help us debug this:
> 
>   RCU used illegally from offline CPU!
>   rcu_scheduler_active = 2, debug_locks = 1
>   no locks held by swapper/1/0.
> 
>   stack backtrace:
>   CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
>   Hardware name: Allwinner sun8i Family
>    unwind_backtrace from show_stack+0x10/0x14
>    show_stack from dump_stack_lvl+0x60/0x90
>    dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
>    lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
>    __lock_acquire from lock_acquire+0x10c/0x348
>    lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
>    _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
>    check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
>    arch_cpu_idle_dead from do_idle+0xbc/0x138
>    do_idle from cpu_startup_entry+0x28/0x2c
>    cpu_startup_entry from secondary_start_kernel+0x11c/0x124
>    secondary_start_kernel from 0x401018a0
> 
> The CPU is already reported as offline from RCU perspective in
> cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above
> RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the
> ASID spinlock.
> 
> Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as

I'm not sure this is a false positive: if you traverse an RCU list
without RCU watching the CPU, then a grace period may pass, the next
list entry can be garbage and following it can take you anywhere. Or am
I missing something, e.g. that a CPU hotplug operation can block a
grace period?

But code-wise, it looks good to me. I had hoped we could avoid calling
rcutree_report_cpu_{starting,dead}() here and use something simpler,
but after a few attempts I haven't found a better way.

Regards,
Boqun

> online again while the spinlock is held.
> 
> Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
> ---
>  arch/arm/kernel/smp.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 3431c0553f45..6875e2c5dd50 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void)
>  {
>  	unsigned int cpu = smp_processor_id();
>  
> +	/*
> +	 * Briefly report CPU as online again to avoid false positive
> +	 * Lockdep-RCU splat when check_and_switch_context() acquires ASID
> +	 * spinlock.
> +	 */
> +	rcutree_report_cpu_starting(cpu);
>  	idle_task_exit();
> +	rcutree_report_cpu_dead();
>  
>  	local_irq_disable();
>  
> -- 
> 2.42.0
>
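
A minimal, hypothetical illustration of the RCU-list hazard Boqun raises above;
the struct, list and function names are made up for illustration and do not come
from this thread:

  struct foo {
  	int val;
  	struct list_head node;
  	struct rcu_head rcu;
  };

  static LIST_HEAD(foo_list);

  static int sum_foo(void)
  {
  	struct foo *f;
  	int sum = 0;

  	/* rcu_read_lock() only helps while RCU is watching this CPU */
  	rcu_read_lock();
  	list_for_each_entry_rcu(f, &foo_list, node)
  		/* on a CPU that RCU considers offline, a grace period may
  		 * already have elapsed, so f can point at freed memory */
  		sum += f->val;
  	rcu_read_unlock();
  	return sum;
  }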
Russell King (Oracle) March 7, 2024, 6:54 p.m. UTC | #4
On Thu, Mar 07, 2024 at 09:45:36AM -0800, Paul E. McKenney wrote:
> > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > index 3431c0553f45..6875e2c5dd50 100644
> > --- a/arch/arm/kernel/smp.c
> > +++ b/arch/arm/kernel/smp.c
> > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void)
> >  {
> >  	unsigned int cpu = smp_processor_id();
> >  
> > +	/*
> > +	 * Briefly report CPU as online again to avoid false positive
> > +	 * Lockdep-RCU splat when check_and_switch_context() acquires ASID
> > +	 * spinlock.
> > +	 */
> > +	rcutree_report_cpu_starting(cpu);
> >  	idle_task_exit();
> > +	rcutree_report_cpu_dead();
> >  
> >  	local_irq_disable();
> 
> Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain
> bitterly via lockdep if interrupts are enabled.  And the call sites have
> interrupts disabled.  So I don't understand what this local_irq_disable()
> is needed for.

I think that's a question for this commit:

commit e78a7614f3876ac649b3df608789cb6ef74d0480
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Wed Jun 5 07:46:43 2019 -0700

Before this commit, arch_cpu_idle_dead() was called with IRQs enabled.
This commit moved the local_irq_disable() before calling
arch_cpu_idle_dead() but it seems no one looked at the various arch
implementations to clean those up. Quite how arch people are supposed
to spot this and clean up after such a commit, I'm not sure.

The local_irq_disable() that you're asking about has been there ever
since the inception of SMP on 32-bit ARM in this commit:

commit a054a811597a17ffbe92bc4db04a4dc2f1b1ea55
Author: Russell King <rmk@dyn-67.arm.linux.org.uk>
Date:   Wed Nov 2 22:24:33 2005 +0000

Where cpu_die() was later renamed to arch_cpu_idle_dead(). So it's
purely a case of a change being made to core code and arch code not
receiving any fixups for it.
Paul E. McKenney March 7, 2024, 7:08 p.m. UTC | #5
On Thu, Mar 07, 2024 at 06:54:38PM +0000, Russell King (Oracle) wrote:
> On Thu, Mar 07, 2024 at 09:45:36AM -0800, Paul E. McKenney wrote:
> > > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> > > index 3431c0553f45..6875e2c5dd50 100644
> > > --- a/arch/arm/kernel/smp.c
> > > +++ b/arch/arm/kernel/smp.c
> > > @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void)
> > >  {
> > >  	unsigned int cpu = smp_processor_id();
> > >  
> > > +	/*
> > > +	 * Briefly report CPU as online again to avoid false positive
> > > +	 * Lockdep-RCU splat when check_and_switch_context() acquires ASID
> > > +	 * spinlock.
> > > +	 */
> > > +	rcutree_report_cpu_starting(cpu);
> > >  	idle_task_exit();
> > > +	rcutree_report_cpu_dead();
> > >  
> > >  	local_irq_disable();
> > 
> > Both rcutree_report_cpu_starting() and rcutree_report_cpu_dead() complain
> > bitterly via lockdep if interrupts are enabled.  And the call sites have
> > interrupts disabled.  So I don't understand what this local_irq_disable()
> > is needed for.
> 
> I think that's a question for this commit:
> 
> commit e78a7614f3876ac649b3df608789cb6ef74d0480
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Wed Jun 5 07:46:43 2019 -0700
> 
> Before this commit, arch_cpu_idle_dead() was called with IRQs enabled.
> This commit moved the local_irq_disable() before calling
> arch_cpu_idle_dead() but it seems no one looked at the various arch
> implementations to clean those up. Quite how arch people are supposed
> to spot this and clean up after such a commit, I'm not sure.

Telepathy?  ;-)

> The local_irq_disable() that you're asking about has been there ever
> since the inception of SMP on 32-bit ARM in this commit:
> 
> commit a054a811597a17ffbe92bc4db04a4dc2f1b1ea55
> Author: Russell King <rmk@dyn-67.arm.linux.org.uk>
> Date:   Wed Nov 2 22:24:33 2005 +0000
> 
> Where cpu_die() was later renamed to arch_cpu_idle_dead(). So it's
> purely a case of a change being made to core code and arch code not
> receiving any fixups for it.

Thank you for the info!

							Thanx, Paul
Stefan Wiehler March 8, 2024, 10:20 a.m. UTC | #6
> I'm not sure this is a false-positive: if you traverse an RCU-list
> without RCU watching the CPU, then a grace period may pass, the next
> list can be garbage and you can go anywhere from the list. Or am I
> missing something that a CPU hotplug can block a grace period?
Thanks for the feedback! Yes, calling it a false positive is probably not
correct; I will remove that. However, as Russell has explained in a previous
email, the spinlock is fundamental to ASID handling and cannot be removed.

@Russell, I will submit to the patch tracking system then.
Joel Fernandes March 8, 2024, 2:57 p.m. UTC | #7
On 3/7/2024 11:09 AM, Stefan Wiehler wrote:
> With CONFIG_PROVE_RCU_LIST=y and by executing
> 
>   $ echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> one can trigger the following Lockdep-RCU splat on ARM:
> 
>   =============================
>   WARNING: suspicious RCU usage
>   6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
>   -----------------------------
>   kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
> 
>   other info that might help us debug this:
> 
>   RCU used illegally from offline CPU!
>   rcu_scheduler_active = 2, debug_locks = 1
>   no locks held by swapper/1/0.
> 
>   stack backtrace:
>   CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
>   Hardware name: Allwinner sun8i Family
>    unwind_backtrace from show_stack+0x10/0x14
>    show_stack from dump_stack_lvl+0x60/0x90
>    dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
>    lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
>    __lock_acquire from lock_acquire+0x10c/0x348
>    lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
>    _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
>    check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
>    arch_cpu_idle_dead from do_idle+0xbc/0x138
>    do_idle from cpu_startup_entry+0x28/0x2c
>    cpu_startup_entry from secondary_start_kernel+0x11c/0x124
>    secondary_start_kernel from 0x401018a0
> 
> The CPU is already reported as offline from RCU perspective in
> cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above
> RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the
> ASID spinlock.
> 
> Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as
> online again while the spinlock is held.
> 
> Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>
> ---
>  arch/arm/kernel/smp.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 3431c0553f45..6875e2c5dd50 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -319,7 +319,14 @@ void __noreturn arch_cpu_idle_dead(void)
>  {
>  	unsigned int cpu = smp_processor_id();
>  
> +	/*
> +	 * Briefly report CPU as online again to avoid false positive
> +	 * Lockdep-RCU splat when check_and_switch_context() acquires ASID
> +	 * spinlock.
> +	 */
> +	rcutree_report_cpu_starting(cpu);
>  	idle_task_exit();
> +	rcutree_report_cpu_dead();

I agree with the problem but disagree with the patch because it feels like a
terrible workaround.

Can we just use arch_spin_lock() for the cpu_asid_lock? This might require
acquiring the raw_lock within the raw_spinlock_t, but there is precedent:

arch/powerpc/kvm/book3s_hv_rm_mmu.c:245:
arch_spin_lock(&kvm->mmu_lock.rlock.raw_lock);

IMO, lockdep tracking of this lock is not necessary or possible considering the
hotplug situation.

Or is there a reason you need lockdep working for the cpu_asid_lock?

thanks,

 - Joel
Stefan Wiehler March 9, 2024, 7:45 a.m. UTC | #8
> I agree with the problem but disagree with the patch because it feels like a
> terrible workaround.
> 
> Can we just use arch_spin_lock() for the cpu_asid_lock? This might require
> acquiring the raw_lock within the raw_spinlock_t, but there is precedent:
> 
> arch/powerpc/kvm/book3s_hv_rm_mmu.c:245:
> arch_spin_lock(&kvm->mmu_lock.rlock.raw_lock);
> 
> IMO, lockdep tracking of this lock is not necessary or possible considering the
> hotplug situation.
> 
> Or is there a reason you need lockdep working for the cpu_asid_lock?

I was not aware of this way of bypassing lockdep tracking, but it seems
to work and indeed looks like less of a workaround:

diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index 4204ffa2d104..4fc2c559f1b6 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
            && atomic64_xchg(&per_cpu(active_asids, cpu), asid))
                goto switch_mm_fastpath;

-       raw_spin_lock_irqsave(&cpu_asid_lock, flags);
+       local_irq_save(flags);
+       arch_spin_lock(&cpu_asid_lock.raw_lock);
        /* Check that our ASID belongs to the current generation. */
        asid = atomic64_read(&mm->context.id);
        if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) {
@@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)

        atomic64_set(&per_cpu(active_asids, cpu), asid);
        cpumask_set_cpu(cpu, mm_cpumask(mm));
-       raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
+       arch_spin_unlock(&cpu_asid_lock.raw_lock);
+       local_irq_restore(flags);

 switch_mm_fastpath:
        cpu_switch_mm(mm->pgd, mm);

@Russell, what do you think?
Russell King (Oracle) March 9, 2024, 9:57 a.m. UTC | #9
On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote:
> > I agree with the problem but disagree with the patch because it feels like a
> > terrible workaround.
> > 
> > Can we just use arch_spin_lock() for the cpu_asid_lock? This might require
> > acquiring the raw_lock within the raw_spinlock_t, but there is precedent:
> > 
> > arch/powerpc/kvm/book3s_hv_rm_mmu.c:245:
> > arch_spin_lock(&kvm->mmu_lock.rlock.raw_lock);
> > 
> > IMO, lockdep tracking of this lock is not necessary or possible considering the
> > hotplug situation.
> > 
> > Or is there a reason you need lockdep working for the cpu_asid_lock?
> 
> I was not aware of this possibility to bypass lockdep tracing, but this seems
> to work and indeed looks like less of a workaround:
> 
> diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
> index 4204ffa2d104..4fc2c559f1b6 100644
> --- a/arch/arm/mm/context.c
> +++ b/arch/arm/mm/context.c
> @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
>             && atomic64_xchg(&per_cpu(active_asids, cpu), asid))
>                 goto switch_mm_fastpath;
> 
> -       raw_spin_lock_irqsave(&cpu_asid_lock, flags);
> +       local_irq_save(flags);
> +       arch_spin_lock(&cpu_asid_lock.raw_lock);
>         /* Check that our ASID belongs to the current generation. */
>         asid = atomic64_read(&mm->context.id);
>         if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) {
> @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> 
>         atomic64_set(&per_cpu(active_asids, cpu), asid);
>         cpumask_set_cpu(cpu, mm_cpumask(mm));
> -       raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
> +       arch_spin_unlock(&cpu_asid_lock.raw_lock);
> +       local_irq_restore(flags);
> 
>  switch_mm_fastpath:
>         cpu_switch_mm(mm->pgd, mm);
> 
> @Russell, what do you think?

I think this is Will Deacon's code, so we ought to hear from Will...
Russell King (Oracle) March 11, 2024, 4:08 p.m. UTC | #10
On Thu, Mar 07, 2024 at 05:09:51PM +0100, Stefan Wiehler wrote:
> With CONFIG_PROVE_RCU_LIST=y and by executing
> 
>   $ echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> one can trigger the following Lockdep-RCU splat on ARM:
> 
>   =============================
>   WARNING: suspicious RCU usage
>   6.8.0-rc7-00001-g0db1d0ed8958 #10 Not tainted
>   -----------------------------
>   kernel/locking/lockdep.c:3762 RCU-list traversed in non-reader section!!
> 
>   other info that might help us debug this:
> 
>   RCU used illegally from offline CPU!
>   rcu_scheduler_active = 2, debug_locks = 1
>   no locks held by swapper/1/0.
> 
>   stack backtrace:
>   CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.8.0-rc7-00001-g0db1d0ed8958 #10
>   Hardware name: Allwinner sun8i Family
>    unwind_backtrace from show_stack+0x10/0x14
>    show_stack from dump_stack_lvl+0x60/0x90
>    dump_stack_lvl from lockdep_rcu_suspicious+0x150/0x1a0
>    lockdep_rcu_suspicious from __lock_acquire+0x11fc/0x29f8
>    __lock_acquire from lock_acquire+0x10c/0x348
>    lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c
>    _raw_spin_lock_irqsave from check_and_switch_context+0x7c/0x4a8
>    check_and_switch_context from arch_cpu_idle_dead+0x10/0x7c
>    arch_cpu_idle_dead from do_idle+0xbc/0x138
>    do_idle from cpu_startup_entry+0x28/0x2c
>    cpu_startup_entry from secondary_start_kernel+0x11c/0x124
>    secondary_start_kernel from 0x401018a0
> 
> The CPU is already reported as offline from RCU perspective in
> cpuhp_report_idle_dead() before arch_cpu_idle_dead() is invoked. Above
> RCU-Lockdep splat is then triggered by check_and_switch_context() acquiring the
> ASID spinlock.
> 
> Avoid the false-positive Lockdep-RCU splat by briefly reporting the CPU as
> online again while the spinlock is held.
> 
> Signed-off-by: Stefan Wiehler <stefan.wiehler@nokia.com>

So what do I do about this? I see you submitted this to the patch system
on the 8th March, but proposed a different approach on the 9th March. I
don't see evidence that Paul is happy with the original approach either
and there's no ack/r-b's on anything here.
Stefan Wiehler March 11, 2024, 4:17 p.m. UTC | #11
> So what do I do about this? I see you submitted this to the patch system
> on the 8th March, but proposed a different approach on the 9th March. I
> don't see evidence that Paul is happy with the original approach either
> and there's no ack/r-b's on anything here.

I think we need to wait for feedback from Will Deacon regarding the new
arch_spin_lock() based approach. So please abandon the original proposal in the
patch system, at least for now…
Paul E. McKenney March 11, 2024, 7:01 p.m. UTC | #12
On Mon, Mar 11, 2024 at 05:17:43PM +0100, Stefan Wiehler wrote:
> > So what do I do about this? I see you submitted this to the patch system
> > on the 8th March, but proposed a different approach on the 9th March. I
> > don't see evidence that Paul is happy with the original approach either
> > and there's no ack/r-b's on anything here.
> 
> I think we need to wait for feedback from Will Deacon regarding the new
> arch_spin_lock() based approach. So please abandon the original proposal in the
> patch system, at least for now…

I prefer the arch_spin_lock() version, but the other approach also works.
I agree with Stefan that it would also be good to get Will's feedback.

The downside of the arch_spin_lock() approach is that, in theory, it
prevents lockdep from seeing deadlocks involving that lock acquisition.
But in practice, that is a problem only if additional locks can be
acquired while that lock is held, and there are no such additional
locks, right?

The downside of the original approach is that it further complicates
RCU's interaction with offline CPUs, especially should people copy that
approach in other pieces of offline code.

So, again, I prefer the arch_spin_lock() approach.

							Thanx, Paul
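
To make that trade-off concrete, a hypothetical sketch (the locks and functions
are illustrative, not from the kernel, and preemption/irq handling is omitted):
lockdep records the A -> B ordering on the first path, but the arch_spin_lock()
acquisition on the second path is invisible to it, so an inversion through a
lock taken that way goes unreported:

  static DEFINE_RAW_SPINLOCK(lock_a);
  static DEFINE_RAW_SPINLOCK(lock_b);

  static void path_one(void)
  {
  	raw_spin_lock(&lock_a);
  	raw_spin_lock(&lock_b);			/* lockdep records A -> B */
  	raw_spin_unlock(&lock_b);
  	raw_spin_unlock(&lock_a);
  }

  static void path_two(void)
  {
  	raw_spin_lock(&lock_b);
  	arch_spin_lock(&lock_a.raw_lock);	/* B -> A exists, but lockdep never sees it */
  	arch_spin_unlock(&lock_a.raw_lock);
  	raw_spin_unlock(&lock_b);
  }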
Will Deacon March 12, 2024, 10:14 p.m. UTC | #13
Hi Russell,

On Sat, Mar 09, 2024 at 09:57:04AM +0000, Russell King (Oracle) wrote:
> On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote:
> > diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
> > index 4204ffa2d104..4fc2c559f1b6 100644
> > --- a/arch/arm/mm/context.c
> > +++ b/arch/arm/mm/context.c
> > @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> >             && atomic64_xchg(&per_cpu(active_asids, cpu), asid))
> >                 goto switch_mm_fastpath;
> > 
> > -       raw_spin_lock_irqsave(&cpu_asid_lock, flags);
> > +       local_irq_save(flags);
> > +       arch_spin_lock(&cpu_asid_lock.raw_lock);
> >         /* Check that our ASID belongs to the current generation. */
> >         asid = atomic64_read(&mm->context.id);
> >         if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) {
> > @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> > 
> >         atomic64_set(&per_cpu(active_asids, cpu), asid);
> >         cpumask_set_cpu(cpu, mm_cpumask(mm));
> > -       raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
> > +       arch_spin_unlock(&cpu_asid_lock.raw_lock);
> > +       local_irq_restore(flags);
> > 
> >  switch_mm_fastpath:
> >         cpu_switch_mm(mm->pgd, mm);
> > 
> > @Russell, what do you think?
> 
> I think this is Will Deacon's code, so we ought to hear from Will...

Thanks for adding me in.

Using arch_spin_lock() really feels like a bodge to me. This code isn't
run only on the hot-unplug path, but rather this is part of switch_mm()
and we really should be able to have lockdep work properly there for
the usual case.

Now, do we actually need to worry about the ASID when switching to the
init_mm? I'd have thought that would be confined to global (kernel)
mappings, so I wonder whether we could avoid this slow path code
altogether like we do on arm64 in __switch_mm(). But I must confess that
I don't recall the details of the pre-LPAE MMU configuration...

Will
Russell King (Oracle) March 12, 2024, 10:39 p.m. UTC | #14
On Tue, Mar 12, 2024 at 10:14:40PM +0000, Will Deacon wrote:
> Hi Russell,
> 
> On Sat, Mar 09, 2024 at 09:57:04AM +0000, Russell King (Oracle) wrote:
> > On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote:
> > > diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
> > > index 4204ffa2d104..4fc2c559f1b6 100644
> > > --- a/arch/arm/mm/context.c
> > > +++ b/arch/arm/mm/context.c
> > > @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> > >             && atomic64_xchg(&per_cpu(active_asids, cpu), asid))
> > >                 goto switch_mm_fastpath;
> > > 
> > > -       raw_spin_lock_irqsave(&cpu_asid_lock, flags);
> > > +       local_irq_save(flags);
> > > +       arch_spin_lock(&cpu_asid_lock.raw_lock);
> > >         /* Check that our ASID belongs to the current generation. */
> > >         asid = atomic64_read(&mm->context.id);
> > >         if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) {
> > > @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> > > 
> > >         atomic64_set(&per_cpu(active_asids, cpu), asid);
> > >         cpumask_set_cpu(cpu, mm_cpumask(mm));
> > > -       raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
> > > +       arch_spin_unlock(&cpu_asid_lock.raw_lock);
> > > +       local_irq_restore(flags);
> > > 
> > >  switch_mm_fastpath:
> > >         cpu_switch_mm(mm->pgd, mm);
> > > 
> > > @Russell, what do you think?
> > 
> > I think this is Will Deacon's code, so we ought to hear from Will...
> 
> Thanks for adding me in.
> 
> Using arch_spin_lock() really feels like a bodge to me. This code isn't
> run only on the hot-unplug path, but rather this is part of switch_mm()
> and we really should be able to have lockdep work properly there for
> the usual case.
> 
> Now, do we actually need to worry about the ASID when switching to the
> init_mm? I'd have thought that would be confined to global (kernel)
> mappings, so I wonder whether we could avoid this slow path code
> altogether like we do on arm64 in __switch_mm(). But I must confess that
> I don't recall the details of the pre-LPAE MMU configuration...

As the init_mm shouldn't have any userspace mappings, isn't the ASID
entirely redundant? Couldn't check_and_switch_context() just simply
do the vmalloc seq check, set the reserved ASID, and then head to
switch_mm_fastpath to call the mm switch code?
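
A rough sketch of that shape, not a tested patch: cpu_switch_mm() and the
switch_mm_fastpath label are quoted earlier in this thread, while
check_vmalloc_seq() is assumed here to be the existing vmalloc-seq helper:

  void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
  {
  	if (mm == &init_mm) {
  		/* 1. keep the vmalloc seq check (helper name assumed) */
  		check_vmalloc_seq(mm);
  		/* 2. set the reserved (kernel) ASID here instead of
  		 *    allocating one from the generation machinery */
  		/* 3. equivalent of jumping to switch_mm_fastpath */
  		cpu_switch_mm(mm->pgd, mm);
  		return;
  	}

  	/* ... existing fast path / cpu_asid_lock slow path for user mms ... */
  }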
Will Deacon March 13, 2024, 12:32 a.m. UTC | #15
On Tue, Mar 12, 2024 at 10:39:30PM +0000, Russell King (Oracle) wrote:
> On Tue, Mar 12, 2024 at 10:14:40PM +0000, Will Deacon wrote:
> > On Sat, Mar 09, 2024 at 09:57:04AM +0000, Russell King (Oracle) wrote:
> > > On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote:
> > > > diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
> > > > index 4204ffa2d104..4fc2c559f1b6 100644
> > > > --- a/arch/arm/mm/context.c
> > > > +++ b/arch/arm/mm/context.c
> > > > @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> > > >             && atomic64_xchg(&per_cpu(active_asids, cpu), asid))
> > > >                 goto switch_mm_fastpath;
> > > > 
> > > > -       raw_spin_lock_irqsave(&cpu_asid_lock, flags);
> > > > +       local_irq_save(flags);
> > > > +       arch_spin_lock(&cpu_asid_lock.raw_lock);
> > > >         /* Check that our ASID belongs to the current generation. */
> > > >         asid = atomic64_read(&mm->context.id);
> > > >         if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) {
> > > > @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> > > > 
> > > >         atomic64_set(&per_cpu(active_asids, cpu), asid);
> > > >         cpumask_set_cpu(cpu, mm_cpumask(mm));
> > > > -       raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
> > > > +       arch_spin_unlock(&cpu_asid_lock.raw_lock);
> > > > +       local_irq_restore(flags);
> > > > 
> > > >  switch_mm_fastpath:
> > > >         cpu_switch_mm(mm->pgd, mm);
> > > > 
> > > > @Russell, what do you think?
> > > 
> > > I think this is Will Deacon's code, so we ought to hear from Will...
> > 
> > Thanks for adding me in.
> > 
> > Using arch_spin_lock() really feels like a bodge to me. This code isn't
> > run only on the hot-unplug path, but rather this is part of switch_mm()
> > and we really should be able to have lockdep work properly there for
> > the usual case.
> > 
> > Now, do we actually need to worry about the ASID when switching to the
> > init_mm? I'd have thought that would be confined to global (kernel)
> > mappings, so I wonder whether we could avoid this slow path code
> > altogether like we do on arm64 in __switch_mm(). But I must confess that
> > I don't recall the details of the pre-LPAE MMU configuration...
> 
> As the init_mm shouldn't have any userspace mappings, isn't the ASID
> entirely redundant? Couldn't check_and_switch_context() just simply
> do the vmalloc seq check, set the reserved ASID, and then head to
> switch_mm_fastpath to call the mm switch code?

Right, that's what I was thinking too, but I have some distant memories
of the module space causing potential issues in some configurations. Does
that ring a bell with you?

Will
Russell King (Oracle) March 13, 2024, 9:58 a.m. UTC | #16
On Wed, Mar 13, 2024 at 12:32:44AM +0000, Will Deacon wrote:
> On Tue, Mar 12, 2024 at 10:39:30PM +0000, Russell King (Oracle) wrote:
> > On Tue, Mar 12, 2024 at 10:14:40PM +0000, Will Deacon wrote:
> > > On Sat, Mar 09, 2024 at 09:57:04AM +0000, Russell King (Oracle) wrote:
> > > > On Sat, Mar 09, 2024 at 08:45:35AM +0100, Stefan Wiehler wrote:
> > > > > diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
> > > > > index 4204ffa2d104..4fc2c559f1b6 100644
> > > > > --- a/arch/arm/mm/context.c
> > > > > +++ b/arch/arm/mm/context.c
> > > > > @@ -254,7 +254,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> > > > >             && atomic64_xchg(&per_cpu(active_asids, cpu), asid))
> > > > >                 goto switch_mm_fastpath;
> > > > > 
> > > > > -       raw_spin_lock_irqsave(&cpu_asid_lock, flags);
> > > > > +       local_irq_save(flags);
> > > > > +       arch_spin_lock(&cpu_asid_lock.raw_lock);
> > > > >         /* Check that our ASID belongs to the current generation. */
> > > > >         asid = atomic64_read(&mm->context.id);
> > > > >         if ((asid ^ atomic64_read(&asid_generation)) >> ASID_BITS) {
> > > > > @@ -269,7 +270,8 @@ void check_and_switch_context(struct mm_struct *mm, struct task_struct *tsk)
> > > > > 
> > > > >         atomic64_set(&per_cpu(active_asids, cpu), asid);
> > > > >         cpumask_set_cpu(cpu, mm_cpumask(mm));
> > > > > -       raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
> > > > > +       arch_spin_unlock(&cpu_asid_lock.raw_lock);
> > > > > +       local_irq_restore(flags);
> > > > > 
> > > > >  switch_mm_fastpath:
> > > > >         cpu_switch_mm(mm->pgd, mm);
> > > > > 
> > > > > @Russell, what do you think?
> > > > 
> > > > I think this is Will Deacon's code, so we ought to hear from Will...
> > > 
> > > Thanks for adding me in.
> > > 
> > > Using arch_spin_lock() really feels like a bodge to me. This code isn't
> > > run only on the hot-unplug path, but rather this is part of switch_mm()
> > > and we really should be able to have lockdep work properly there for
> > > the usual case.
> > > 
> > > Now, do we actually need to worry about the ASID when switching to the
> > > init_mm? I'd have thought that would be confined to global (kernel)
> > > mappings, so I wonder whether we could avoid this slow path code
> > > altogether like we do on arm64 in __switch_mm(). But I must confess that
> > > I don't recall the details of the pre-LPAE MMU configuration...
> > 
> > As the init_mm shouldn't have any userspace mappings, isn't the ASID
> > entirely redundant? Couldn't check_and_switch_context() just simply
> > do the vmalloc seq check, set the reserved ASID, and then head to
> > switch_mm_fastpath to call the mm switch code?
> 
> Right, that's what I was thinking too, but I have some distant memories
> of the module space causing potential issues in some configurations. Does
> that ring a bell with you?

For 32-bit non-LPAE, I can't recall any issues, nor can I think of any
because module space is just another few entries in the L1 page tables
below the direct mapping (which isn't a problem because we don't use
anything in hardware to separate the kernel space from user space in
the page tables.) TTBCR is set to 0.

For LPAE, there may be issues there because TTBR0 and TTBR1 are both
used, and TTBCR.T1SZ is set non-zero to:

arch/arm/include/asm/pgtable-3level-hwdef.h:#define TTBR1_SIZE (((PAGE_OFFSET >> 30) - 1) << 16)

so I suspect that's where the problems may lie - but then module
mappings should also exist in init_mm (swapper_pg_dir) and should
be global.
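
Working through the quoted TTBR1_SIZE macro for the common 3G/1G split (assuming
PAGE_OFFSET = 0xC0000000; other splits shift the numbers accordingly):

  (0xC0000000 >> 30) - 1 = 3 - 1 = 2, so TTBR1_SIZE = 2 << 16,
  i.e. TTBCR.T1SZ = 2 and TTBR1 translates the top 2^(32 - 2) = 1 GiB
  (0xC0000000 and above), which is exactly the kernel's PAGE_OFFSET region
  for that split.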

Patch

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 3431c0553f45..6875e2c5dd50 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -319,7 +319,14 @@  void __noreturn arch_cpu_idle_dead(void)
 {
 	unsigned int cpu = smp_processor_id();
 
+	/*
+	 * Briefly report CPU as online again to avoid false positive
+	 * Lockdep-RCU splat when check_and_switch_context() acquires ASID
+	 * spinlock.
+	 */
+	rcutree_report_cpu_starting(cpu);
 	idle_task_exit();
+	rcutree_report_cpu_dead();
 
 	local_irq_disable();