mbox series

[v2,0/8] NMI-safe SRCU reader API

Message ID 20220929180714.GA2874192@paulmck-ThinkPad-P17-Gen-1 (mailing list archive)
Headers show
Series NMI-safe SRCU reader API | expand

Message

Paul E. McKenney Sept. 29, 2022, 6:07 p.m. UTC
Hello!

This RFC series provides the second version of an NMI-safe SRCU reader API
in the guise of srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().
A given srcu_struct structure must use either the traditional
srcu_read_lock() and srcu_read_unlock() API or the new _nmisafe() API:
Mixing and matching is not permitted.  So much so that kernels built
with CONFIG_PROVE_RCU=y will complain if you try it.

The reason for this restriction is that I have yet to find a use case
that is not a accident waiting to happen.  And if free intermixing
were permitted, it is pretty much a given that someone somewhere will
get confused and use srcu_read_lock_nmisafe() within NMI handlers and
srcu_read_lock() elsewhere, which will not (repeat, NOT) provide NMI
safety.

I do not expect to push this into the v6.1 merge window.  However, if
the printk() series that needs it goes in, then I will push it as a fix
for the resulting regression.

The series is as follows:

1.	Convert ->srcu_lock_count and ->srcu_unlock_count to atomic.

2.	Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().

3.	Check for consistent per-CPU per-srcu_struct NMI safety.

4.	Check for consistent global per-srcu_struct NMI safety.

5.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.

6.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.

7.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.

8.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.

Changes since v1 RFC:

1.	Added enabling patches for arm64, loongarch, s390, and x86.
	These have what appear to me to be NMI-safe this_cpu_inc()
	implementations.

2.	Fix a build error on !SMP kernels built without SRCU.

3.	Fix a build error on !SMP kernels.

						Thanx, Paul

------------------------------------------------------------------------

 b/arch/arm64/Kconfig       |    1 
 b/arch/loongarch/Kconfig   |    1 
 b/arch/s390/Kconfig        |    1 
 b/arch/x86/Kconfig         |    1 
 b/include/linux/srcu.h     |   39 +++++++++++++++++++++
 b/include/linux/srcutiny.h |   11 ++++++
 b/include/linux/srcutree.h |    4 +-
 b/kernel/rcu/Kconfig       |    3 +
 b/kernel/rcu/rcutorture.c  |   11 ++++--
 b/kernel/rcu/srcutree.c    |   24 ++++++-------
 include/linux/srcu.h       |    4 +-
 include/linux/srcutiny.h   |    4 +-
 include/linux/srcutree.h   |   12 +++++-
 kernel/rcu/srcutree.c      |   82 +++++++++++++++++++++++++++++++++++++++------
 14 files changed, 166 insertions(+), 32 deletions(-)

Comments

Frederic Weisbecker Oct. 3, 2022, 2:11 p.m. UTC | #1
On Thu, Sep 29, 2022 at 11:07:14AM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This RFC series provides the second version of an NMI-safe SRCU reader API
> in the guise of srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().
> A given srcu_struct structure must use either the traditional
> srcu_read_lock() and srcu_read_unlock() API or the new _nmisafe() API:
> Mixing and matching is not permitted.  So much so that kernels built
> with CONFIG_PROVE_RCU=y will complain if you try it.
> 
> The reason for this restriction is that I have yet to find a use case
> that is not a accident waiting to happen.  And if free intermixing
> were permitted, it is pretty much a given that someone somewhere will
> get confused and use srcu_read_lock_nmisafe() within NMI handlers and
> srcu_read_lock() elsewhere, which will not (repeat, NOT) provide NMI
> safety.
> 
> I do not expect to push this into the v6.1 merge window.  However, if
> the printk() series that needs it goes in, then I will push it as a fix
> for the resulting regression.
> 
> The series is as follows:
> 
> 1.	Convert ->srcu_lock_count and ->srcu_unlock_count to atomic.
> 
> 2.	Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().
> 
> 3.	Check for consistent per-CPU per-srcu_struct NMI safety.
> 
> 4.	Check for consistent global per-srcu_struct NMI safety.
> 
> 5.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> 
> 6.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> 
> 7.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> 
> 8.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> 
> Changes since v1 RFC:
> 
> 1.	Added enabling patches for arm64, loongarch, s390, and x86.
> 	These have what appear to me to be NMI-safe this_cpu_inc()
> 	implementations.
> 
> 2.	Fix a build error on !SMP kernels built without SRCU.
> 
> 3.	Fix a build error on !SMP kernels.
> 
> 						Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
>  b/arch/arm64/Kconfig       |    1 
>  b/arch/loongarch/Kconfig   |    1 
>  b/arch/s390/Kconfig        |    1 
>  b/arch/x86/Kconfig         |    1 
>  b/include/linux/srcu.h     |   39 +++++++++++++++++++++
>  b/include/linux/srcutiny.h |   11 ++++++
>  b/include/linux/srcutree.h |    4 +-
>  b/kernel/rcu/Kconfig       |    3 +
>  b/kernel/rcu/rcutorture.c  |   11 ++++--
>  b/kernel/rcu/srcutree.c    |   24 ++++++-------
>  include/linux/srcu.h       |    4 +-
>  include/linux/srcutiny.h   |    4 +-
>  include/linux/srcutree.h   |   12 +++++-
>  kernel/rcu/srcutree.c      |   82 +++++++++++++++++++++++++++++++++++++++------
>  14 files changed, 166 insertions(+), 32 deletions(-)

Except for patches 6/7/8, for which I may miss subtle things:

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Paul E. McKenney Oct. 3, 2022, 4:38 p.m. UTC | #2
On Mon, Oct 03, 2022 at 04:11:41PM +0200, Frederic Weisbecker wrote:
> On Thu, Sep 29, 2022 at 11:07:14AM -0700, Paul E. McKenney wrote:
> > Hello!
> > 
> > This RFC series provides the second version of an NMI-safe SRCU reader API
> > in the guise of srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().
> > A given srcu_struct structure must use either the traditional
> > srcu_read_lock() and srcu_read_unlock() API or the new _nmisafe() API:
> > Mixing and matching is not permitted.  So much so that kernels built
> > with CONFIG_PROVE_RCU=y will complain if you try it.
> > 
> > The reason for this restriction is that I have yet to find a use case
> > that is not a accident waiting to happen.  And if free intermixing
> > were permitted, it is pretty much a given that someone somewhere will
> > get confused and use srcu_read_lock_nmisafe() within NMI handlers and
> > srcu_read_lock() elsewhere, which will not (repeat, NOT) provide NMI
> > safety.
> > 
> > I do not expect to push this into the v6.1 merge window.  However, if
> > the printk() series that needs it goes in, then I will push it as a fix
> > for the resulting regression.
> > 
> > The series is as follows:
> > 
> > 1.	Convert ->srcu_lock_count and ->srcu_unlock_count to atomic.
> > 
> > 2.	Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().
> > 
> > 3.	Check for consistent per-CPU per-srcu_struct NMI safety.
> > 
> > 4.	Check for consistent global per-srcu_struct NMI safety.
> > 
> > 5.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> > 
> > 6.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> > 
> > 7.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> > 
> > 8.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> > 
> > Changes since v1 RFC:
> > 
> > 1.	Added enabling patches for arm64, loongarch, s390, and x86.
> > 	These have what appear to me to be NMI-safe this_cpu_inc()
> > 	implementations.
> > 
> > 2.	Fix a build error on !SMP kernels built without SRCU.
> > 
> > 3.	Fix a build error on !SMP kernels.
> > 
> > 						Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> >  b/arch/arm64/Kconfig       |    1 
> >  b/arch/loongarch/Kconfig   |    1 
> >  b/arch/s390/Kconfig        |    1 
> >  b/arch/x86/Kconfig         |    1 
> >  b/include/linux/srcu.h     |   39 +++++++++++++++++++++
> >  b/include/linux/srcutiny.h |   11 ++++++
> >  b/include/linux/srcutree.h |    4 +-
> >  b/kernel/rcu/Kconfig       |    3 +
> >  b/kernel/rcu/rcutorture.c  |   11 ++++--
> >  b/kernel/rcu/srcutree.c    |   24 ++++++-------
> >  include/linux/srcu.h       |    4 +-
> >  include/linux/srcutiny.h   |    4 +-
> >  include/linux/srcutree.h   |   12 +++++-
> >  kernel/rcu/srcutree.c      |   82 +++++++++++++++++++++++++++++++++++++++------
> >  14 files changed, 166 insertions(+), 32 deletions(-)
> 
> Except for patches 6/7/8, for which I may miss subtle things:
> 
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Applied, thank you!

The new -rcu branch is srcunmisafe.2022.10.03a.

							Thanx, Paul
Joel Fernandes Oct. 14, 2022, 10:47 p.m. UTC | #3
On Thu, Sep 29, 2022 at 11:07:14AM -0700, Paul E. McKenney wrote:
> Hello!
> 
> This RFC series provides the second version of an NMI-safe SRCU reader API
> in the guise of srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().

Just a comment about high-level use of the new NMI-safe SRCU api: say if for
certain arch, NMI cannot be interrupted (by breakpoint or something), then
using NMI-safe API will force such arch to potentially use more expensive RMW
atomic than the previously cheap local non-atomic operations that the arch
was able to get away with.

Thoughts? Otherwise, LGTM.

thanks,

 - Joel


> A given srcu_struct structure must use either the traditional
> srcu_read_lock() and srcu_read_unlock() API or the new _nmisafe() API:
> Mixing and matching is not permitted.  So much so that kernels built
> with CONFIG_PROVE_RCU=y will complain if you try it.
> 
> The reason for this restriction is that I have yet to find a use case
> that is not a accident waiting to happen.  And if free intermixing
> were permitted, it is pretty much a given that someone somewhere will
> get confused and use srcu_read_lock_nmisafe() within NMI handlers and
> srcu_read_lock() elsewhere, which will not (repeat, NOT) provide NMI
> safety.
> 
> I do not expect to push this into the v6.1 merge window.  However, if
> the printk() series that needs it goes in, then I will push it as a fix
> for the resulting regression.
> 
> The series is as follows:
> 
> 1.	Convert ->srcu_lock_count and ->srcu_unlock_count to atomic.
> 
> 2.	Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().
> 
> 3.	Check for consistent per-CPU per-srcu_struct NMI safety.
> 
> 4.	Check for consistent global per-srcu_struct NMI safety.
> 
> 5.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> 
> 6.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> 
> 7.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> 
> 8.	Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> 
> Changes since v1 RFC:
> 
> 1.	Added enabling patches for arm64, loongarch, s390, and x86.
> 	These have what appear to me to be NMI-safe this_cpu_inc()
> 	implementations.
> 
> 2.	Fix a build error on !SMP kernels built without SRCU.
> 
> 3.	Fix a build error on !SMP kernels.
> 
> 						Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
>  b/arch/arm64/Kconfig       |    1 
>  b/arch/loongarch/Kconfig   |    1 
>  b/arch/s390/Kconfig        |    1 
>  b/arch/x86/Kconfig         |    1 
>  b/include/linux/srcu.h     |   39 +++++++++++++++++++++
>  b/include/linux/srcutiny.h |   11 ++++++
>  b/include/linux/srcutree.h |    4 +-
>  b/kernel/rcu/Kconfig       |    3 +
>  b/kernel/rcu/rcutorture.c  |   11 ++++--
>  b/kernel/rcu/srcutree.c    |   24 ++++++-------
>  include/linux/srcu.h       |    4 +-
>  include/linux/srcutiny.h   |    4 +-
>  include/linux/srcutree.h   |   12 +++++-
>  kernel/rcu/srcutree.c      |   82 +++++++++++++++++++++++++++++++++++++++------
>  14 files changed, 166 insertions(+), 32 deletions(-)
Joel Fernandes Oct. 14, 2022, 10:52 p.m. UTC | #4
On Fri, Oct 14, 2022 at 6:47 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> On Thu, Sep 29, 2022 at 11:07:14AM -0700, Paul E. McKenney wrote:
> > Hello!
> >
> > This RFC series provides the second version of an NMI-safe SRCU reader API
> > in the guise of srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().
>
> Just a comment about high-level use of the new NMI-safe SRCU api: say if for
> certain arch, NMI cannot be interrupted (by breakpoint or something), then
> using NMI-safe API will force such arch to potentially use more expensive RMW
> atomic than the previously cheap local non-atomic operations that the arch
> was able to get away with.
>
> Thoughts? Otherwise, LGTM.
>

I take it back. You are indeed guarding it correctly as below. I got
confused by another patch that was doing debug checking even for arch
that does not need it (which is a good thing).

+config NEED_SRCU_NMI_SAFE
+ def_bool HAVE_NMI && !ARCH_HAS_NMI_SAFE_THIS_CPU_OPS && !TINY_SRCU
+

Thanks!

 - Joel


> thanks,
>
>  - Joel
>
>
> > A given srcu_struct structure must use either the traditional
> > srcu_read_lock() and srcu_read_unlock() API or the new _nmisafe() API:
> > Mixing and matching is not permitted.  So much so that kernels built
> > with CONFIG_PROVE_RCU=y will complain if you try it.
> >
> > The reason for this restriction is that I have yet to find a use case
> > that is not a accident waiting to happen.  And if free intermixing
> > were permitted, it is pretty much a given that someone somewhere will
> > get confused and use srcu_read_lock_nmisafe() within NMI handlers and
> > srcu_read_lock() elsewhere, which will not (repeat, NOT) provide NMI
> > safety.
> >
> > I do not expect to push this into the v6.1 merge window.  However, if
> > the printk() series that needs it goes in, then I will push it as a fix
> > for the resulting regression.
> >
> > The series is as follows:
> >
> > 1.    Convert ->srcu_lock_count and ->srcu_unlock_count to atomic.
> >
> > 2.    Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe().
> >
> > 3.    Check for consistent per-CPU per-srcu_struct NMI safety.
> >
> > 4.    Check for consistent global per-srcu_struct NMI safety.
> >
> > 5.    Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> >
> > 6.    Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> >
> > 7.    Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> >
> > 8.    Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option.
> >
> > Changes since v1 RFC:
> >
> > 1.    Added enabling patches for arm64, loongarch, s390, and x86.
> >       These have what appear to me to be NMI-safe this_cpu_inc()
> >       implementations.
> >
> > 2.    Fix a build error on !SMP kernels built without SRCU.
> >
> > 3.    Fix a build error on !SMP kernels.
> >
> >                                               Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> >  b/arch/arm64/Kconfig       |    1
> >  b/arch/loongarch/Kconfig   |    1
> >  b/arch/s390/Kconfig        |    1
> >  b/arch/x86/Kconfig         |    1
> >  b/include/linux/srcu.h     |   39 +++++++++++++++++++++
> >  b/include/linux/srcutiny.h |   11 ++++++
> >  b/include/linux/srcutree.h |    4 +-
> >  b/kernel/rcu/Kconfig       |    3 +
> >  b/kernel/rcu/rcutorture.c  |   11 ++++--
> >  b/kernel/rcu/srcutree.c    |   24 ++++++-------
> >  include/linux/srcu.h       |    4 +-
> >  include/linux/srcutiny.h   |    4 +-
> >  include/linux/srcutree.h   |   12 +++++-
> >  kernel/rcu/srcutree.c      |   82 +++++++++++++++++++++++++++++++++++++++------
> >  14 files changed, 166 insertions(+), 32 deletions(-)
John Ogness Oct. 18, 2022, 10:33 a.m. UTC | #5
Hi Paul,

On 2022-09-29, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> This RFC series provides the second version of an NMI-safe SRCU reader
> API in the guise of srcu_read_lock_nmisafe() and
> srcu_read_unlock_nmisafe().

I would like to post a new series for printk that will rely on the
NMI-safe reader API. From the feedback of this series it appears that
the RFC v2 is an acceptable starting point. How should we proceed?

Will Frederic be sending a patch for the extra safety checks using
srcu_nmi_flags? Are you planning on posting a new version? Should I
include this or another version within my upcoming series?

Thanks for all your help on this!

John
Paul E. McKenney Oct. 18, 2022, 3:24 p.m. UTC | #6
On Tue, Oct 18, 2022 at 12:39:40PM +0206, John Ogness wrote:
> Hi Paul,
> 
> On 2022-09-29, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > This RFC series provides the second version of an NMI-safe SRCU reader
> > API in the guise of srcu_read_lock_nmisafe() and
> > srcu_read_unlock_nmisafe().
> 
> I would like to post a new series for printk that will rely on the
> NMI-safe reader API. From the feedback of this series it appears that
> the RFC v2 is an acceptable starting point. How should we proceed?

Currently, I have this series on -rcu branch srcunmisafe.2022.10.03a,
with Frederic's patches on branch "dev".  I will be producing a shiny
new branch with your fix and Frederic's debug later today, Pacific Time.
With luck, based on v6.1-rc1.

> Will Frederic be sending a patch for the extra safety checks using
> srcu_nmi_flags? Are you planning on posting a new version? Should I
> include this or another version within my upcoming series?

I will be incorporating these commits from Frederic:

6558b914fc4e ("srcu: Warn when NMI-unsafe API is used in NMI")
5dc788627109 ("srcu: Explain the reason behind the read side critical section on GP start")
54a118fce487 ("srcu: Debug NMI safety even on archs that don't require it")

Are there other patches I should be applying?

> Thanks for all your help on this!

And looking forward to the new improved printk()!

							Thanx, Paul
John Ogness Oct. 18, 2022, 6:44 p.m. UTC | #7
On 2022-10-18, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> Currently, I have this series on -rcu branch srcunmisafe.2022.10.03a,
> with Frederic's patches on branch "dev".  I will be producing a shiny
> new branch with your fix and Frederic's debug later today, Pacific
> Time.  With luck, based on v6.1-rc1.

Perfect!

> I will be incorporating these commits from Frederic:
>
> 6558b914fc4e ("srcu: Warn when NMI-unsafe API is used in NMI")
> 5dc788627109 ("srcu: Explain the reason behind the read side critical section on GP start")
> 54a118fce487 ("srcu: Debug NMI safety even on archs that don't require it")
>
> Are there other patches I should be applying?

Not that I am aware of.

John
Paul E. McKenney Oct. 18, 2022, 6:59 p.m. UTC | #8
On Tue, Oct 18, 2022 at 08:50:57PM +0206, John Ogness wrote:
> On 2022-10-18, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > Currently, I have this series on -rcu branch srcunmisafe.2022.10.03a,
> > with Frederic's patches on branch "dev".  I will be producing a shiny
> > new branch with your fix and Frederic's debug later today, Pacific
> > Time.  With luck, based on v6.1-rc1.
> 
> Perfect!
> 
> > I will be incorporating these commits from Frederic:
> >
> > 6558b914fc4e ("srcu: Warn when NMI-unsafe API is used in NMI")
> > 5dc788627109 ("srcu: Explain the reason behind the read side critical section on GP start")
> > 54a118fce487 ("srcu: Debug NMI safety even on archs that don't require it")
> >
> > Are there other patches I should be applying?
> 
> Not that I am aware of.

And if you are in a hurry, the v6.0-rc1 stack is at srcunmisafe.2022.10.18a.
With luck, the v6.1-rc1 stack will be at srcunmisafe.2022.10.18b before
the day is over, Pacific Time.

							Thanx, Paul
Paul E. McKenney Oct. 18, 2022, 9:57 p.m. UTC | #9
On Tue, Oct 18, 2022 at 11:59:36AM -0700, Paul E. McKenney wrote:
> On Tue, Oct 18, 2022 at 08:50:57PM +0206, John Ogness wrote:
> > On 2022-10-18, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > Currently, I have this series on -rcu branch srcunmisafe.2022.10.03a,
> > > with Frederic's patches on branch "dev".  I will be producing a shiny
> > > new branch with your fix and Frederic's debug later today, Pacific
> > > Time.  With luck, based on v6.1-rc1.
> > 
> > Perfect!
> > 
> > > I will be incorporating these commits from Frederic:
> > >
> > > 6558b914fc4e ("srcu: Warn when NMI-unsafe API is used in NMI")
> > > 5dc788627109 ("srcu: Explain the reason behind the read side critical section on GP start")
> > > 54a118fce487 ("srcu: Debug NMI safety even on archs that don't require it")
> > >
> > > Are there other patches I should be applying?
> > 
> > Not that I am aware of.
> 
> And if you are in a hurry, the v6.0-rc1 stack is at srcunmisafe.2022.10.18a.
> With luck, the v6.1-rc1 stack will be at srcunmisafe.2022.10.18b before
> the day is over, Pacific Time.

And the v6.1-rc1 stack is now at srcunmisafe.2022.10.18b.  ;-)

								Thanx, Paul
John Ogness Oct. 19, 2022, 11:13 a.m. UTC | #10
On 2022-10-18, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> And the v6.1-rc1 stack is now at srcunmisafe.2022.10.18b.

Thanks!

I guess the kernel test robot will catch this, but if you checkout
commit 79c95dc428ad ("arch/x86: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
Kconfig option") and build for x86_64, you will get:

x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_gp_start_if_needed':
srcutree.c:(.text+0x133a): undefined reference to `__srcu_read_lock_nmisafe'
x86_64-linux-gnu-ld: srcutree.c:(.text+0x1490): undefined reference to `__srcu_read_unlock_nmisafe'
x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_barrier':
srcutree.c:(.text+0x1b03): undefined reference to `__srcu_read_lock_nmisafe'
x86_64-linux-gnu-ld: srcutree.c:(.text+0x1b38): undefined reference to `__srcu_read_unlock_nmisafe'

Note that this error is fixed with a later commit:

commit c2d158a284ab ("srcu: Debug NMI safety even on archs that don't
require it").

This does not affect what I am working on, so feel free to take care of
it whenever it fits your schedule.

John Ogness
Paul E. McKenney Oct. 19, 2022, 7:14 p.m. UTC | #11
On Wed, Oct 19, 2022 at 01:19:53PM +0206, John Ogness wrote:
> On 2022-10-18, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > And the v6.1-rc1 stack is now at srcunmisafe.2022.10.18b.
> 
> Thanks!
> 
> I guess the kernel test robot will catch this, but if you checkout
> commit 79c95dc428ad ("arch/x86: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> Kconfig option") and build for x86_64, you will get:
> 
> x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_gp_start_if_needed':
> srcutree.c:(.text+0x133a): undefined reference to `__srcu_read_lock_nmisafe'
> x86_64-linux-gnu-ld: srcutree.c:(.text+0x1490): undefined reference to `__srcu_read_unlock_nmisafe'
> x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_barrier':
> srcutree.c:(.text+0x1b03): undefined reference to `__srcu_read_lock_nmisafe'
> x86_64-linux-gnu-ld: srcutree.c:(.text+0x1b38): undefined reference to `__srcu_read_unlock_nmisafe'
> 
> Note that this error is fixed with a later commit:
> 
> commit c2d158a284ab ("srcu: Debug NMI safety even on archs that don't
> require it").
> 
> This does not affect what I am working on, so feel free to take care of
> it whenever it fits your schedule.

Good catch, thank you!

It looks like the first two hunks in include/linux/srcu.h from that
later commit need to be in that earlier commit.

Frederic, does this make sense, or am I off in the weeds?

If so, my thought is to make this change in the name of bisectability,
then produce a new srcunmisafe branch.  The printk() series would
then need to rebase or remerge this new series.

John, would that work for you?

							Thanx, Paul
John Ogness Oct. 19, 2022, 9:38 p.m. UTC | #12
On 2022-10-19, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> my thought is to make this change in the name of bisectability,
> then produce a new srcunmisafe branch.  The printk() series would
> then need to rebase or remerge this new series.
>
> John, would that work for you?

Yes, that is fine. It really is just a bisectability issue, so for the
review of my v2 series it does not matter. I will rebase on the new
branch for my v3. ;-)

John
Frederic Weisbecker Oct. 19, 2022, 10:05 p.m. UTC | #13
On Wed, Oct 19, 2022 at 12:14:18PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 19, 2022 at 01:19:53PM +0206, John Ogness wrote:
> > On 2022-10-18, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > And the v6.1-rc1 stack is now at srcunmisafe.2022.10.18b.
> > 
> > Thanks!
> > 
> > I guess the kernel test robot will catch this, but if you checkout
> > commit 79c95dc428ad ("arch/x86: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> > Kconfig option") and build for x86_64, you will get:
> > 
> > x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_gp_start_if_needed':
> > srcutree.c:(.text+0x133a): undefined reference to `__srcu_read_lock_nmisafe'
> > x86_64-linux-gnu-ld: srcutree.c:(.text+0x1490): undefined reference to `__srcu_read_unlock_nmisafe'
> > x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_barrier':
> > srcutree.c:(.text+0x1b03): undefined reference to `__srcu_read_lock_nmisafe'
> > x86_64-linux-gnu-ld: srcutree.c:(.text+0x1b38): undefined reference to `__srcu_read_unlock_nmisafe'
> > 
> > Note that this error is fixed with a later commit:
> > 
> > commit c2d158a284ab ("srcu: Debug NMI safety even on archs that don't
> > require it").
> > 
> > This does not affect what I am working on, so feel free to take care of
> > it whenever it fits your schedule.
> 
> Good catch, thank you!
> 
> It looks like the first two hunks in include/linux/srcu.h from that
> later commit need to be in that earlier commit.
> 
> Frederic, does this make sense, or am I off in the weeds?

Actually you need to do that earlier, in
6584822b1be1 ("srcu: Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe()")

This way you don't only fix x86 bisectability but also the one of all the other safe archs.

And it's not just the first two hunks, you also need to include
the removal of the srcutiny.h/srcutree.h definitions.

So namely you need to apply the following to 6584822b1be1. You might
meet some minor retro-conflicts (the chknmisafe parameter didn't exist yet),
but that's pretty much it:

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 565f60d57484..f0814ffca34b 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -52,8 +52,6 @@ int init_srcu_struct(struct srcu_struct *ssp);
 #else
 /* Dummy definition for things like notifiers.  Actual use gets link error. */
 struct srcu_struct { };
-int __srcu_read_lock_nmisafe(struct srcu_struct *ssp, bool chknmisafe) __acquires(ssp);
-void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx, bool chknmisafe) __releases(ssp);
 #endif
 
 void call_srcu(struct srcu_struct *ssp, struct rcu_head *head,
@@ -66,6 +64,20 @@ unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
 unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
 bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie);
 
+#ifdef CONFIG_NEED_SRCU_NMI_SAFE
+int __srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp);
+void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx) __releases(ssp);
+#else
+static inline int __srcu_read_lock_nmisafe(struct srcu_struct *ssp)
+{
+	return __srcu_read_lock(ssp);
+}
+static inline void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
+{
+	__srcu_read_unlock(ssp, idx);
+}
+#endif /* CONFIG_NEED_SRCU_NMI_SAFE */
+
 #ifdef CONFIG_SRCU
 void srcu_init(void);
 #else /* #ifdef CONFIG_SRCU */
diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index f890301f123d..f3a4d65b91ef 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -89,16 +89,4 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
 		 data_race(READ_ONCE(ssp->srcu_idx)),
 		 data_race(READ_ONCE(ssp->srcu_idx_max)));
 }
-
-static inline int __srcu_read_lock_nmisafe(struct srcu_struct *ssp, bool chknmisafe)
-{
-	BUG();
-	return 0;
-}
-
-static inline void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx, bool chknmisafe)
-{
-	BUG();
-}
-
 #endif
diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
index 35ffdedf86cc..c689a81752c9 100644
--- a/include/linux/srcutree.h
+++ b/include/linux/srcutree.h
@@ -159,7 +155,4 @@ void synchronize_srcu_expedited(struct srcu_struct *ssp);
 void srcu_barrier(struct srcu_struct *ssp);
 void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf);
 
-int __srcu_read_lock_nmisafe(struct srcu_struct *ssp, bool chknmisafe) __acquires(ssp);
-void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx, bool chknmisafe) __releases(ssp);
-
 #endif
Paul E. McKenney Oct. 20, 2022, 10:27 p.m. UTC | #14
On Thu, Oct 20, 2022 at 12:05:37AM +0200, Frederic Weisbecker wrote:
> On Wed, Oct 19, 2022 at 12:14:18PM -0700, Paul E. McKenney wrote:
> > On Wed, Oct 19, 2022 at 01:19:53PM +0206, John Ogness wrote:
> > > On 2022-10-18, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > > And the v6.1-rc1 stack is now at srcunmisafe.2022.10.18b.
> > > 
> > > Thanks!
> > > 
> > > I guess the kernel test robot will catch this, but if you checkout
> > > commit 79c95dc428ad ("arch/x86: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> > > Kconfig option") and build for x86_64, you will get:
> > > 
> > > x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_gp_start_if_needed':
> > > srcutree.c:(.text+0x133a): undefined reference to `__srcu_read_lock_nmisafe'
> > > x86_64-linux-gnu-ld: srcutree.c:(.text+0x1490): undefined reference to `__srcu_read_unlock_nmisafe'
> > > x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_barrier':
> > > srcutree.c:(.text+0x1b03): undefined reference to `__srcu_read_lock_nmisafe'
> > > x86_64-linux-gnu-ld: srcutree.c:(.text+0x1b38): undefined reference to `__srcu_read_unlock_nmisafe'
> > > 
> > > Note that this error is fixed with a later commit:
> > > 
> > > commit c2d158a284ab ("srcu: Debug NMI safety even on archs that don't
> > > require it").
> > > 
> > > This does not affect what I am working on, so feel free to take care of
> > > it whenever it fits your schedule.
> > 
> > Good catch, thank you!
> > 
> > It looks like the first two hunks in include/linux/srcu.h from that
> > later commit need to be in that earlier commit.
> > 
> > Frederic, does this make sense, or am I off in the weeds?
> 
> Actually you need to do that earlier, in
> 6584822b1be1 ("srcu: Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe()")
> 
> This way you don't only fix x86 bisectability but also the one of all the other safe archs.
> 
> And it's not just the first two hunks, you also need to include
> the removal of the srcutiny.h/srcutree.h definitions.
> 
> So namely you need to apply the following to 6584822b1be1. You might
> meet some minor retro-conflicts (the chknmisafe parameter didn't exist yet),
> but that's pretty much it:

Thank you both!

I have an untested but allegedly fixed branch on -rcu on branch
srcunmisafe.2022.10.20a.

							Thanx, Paul

> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index 565f60d57484..f0814ffca34b 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -52,8 +52,6 @@ int init_srcu_struct(struct srcu_struct *ssp);
>  #else
>  /* Dummy definition for things like notifiers.  Actual use gets link error. */
>  struct srcu_struct { };
> -int __srcu_read_lock_nmisafe(struct srcu_struct *ssp, bool chknmisafe) __acquires(ssp);
> -void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx, bool chknmisafe) __releases(ssp);
>  #endif
>  
>  void call_srcu(struct srcu_struct *ssp, struct rcu_head *head,
> @@ -66,6 +64,20 @@ unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
>  unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
>  bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie);
>  
> +#ifdef CONFIG_NEED_SRCU_NMI_SAFE
> +int __srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp);
> +void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx) __releases(ssp);
> +#else
> +static inline int __srcu_read_lock_nmisafe(struct srcu_struct *ssp)
> +{
> +	return __srcu_read_lock(ssp);
> +}
> +static inline void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
> +{
> +	__srcu_read_unlock(ssp, idx);
> +}
> +#endif /* CONFIG_NEED_SRCU_NMI_SAFE */
> +
>  #ifdef CONFIG_SRCU
>  void srcu_init(void);
>  #else /* #ifdef CONFIG_SRCU */
> diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
> index f890301f123d..f3a4d65b91ef 100644
> --- a/include/linux/srcutiny.h
> +++ b/include/linux/srcutiny.h
> @@ -89,16 +89,4 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
>  		 data_race(READ_ONCE(ssp->srcu_idx)),
>  		 data_race(READ_ONCE(ssp->srcu_idx_max)));
>  }
> -
> -static inline int __srcu_read_lock_nmisafe(struct srcu_struct *ssp, bool chknmisafe)
> -{
> -	BUG();
> -	return 0;
> -}
> -
> -static inline void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx, bool chknmisafe)
> -{
> -	BUG();
> -}
> -
>  #endif
> diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> index 35ffdedf86cc..c689a81752c9 100644
> --- a/include/linux/srcutree.h
> +++ b/include/linux/srcutree.h
> @@ -159,7 +155,4 @@ void synchronize_srcu_expedited(struct srcu_struct *ssp);
>  void srcu_barrier(struct srcu_struct *ssp);
>  void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf);
>  
> -int __srcu_read_lock_nmisafe(struct srcu_struct *ssp, bool chknmisafe) __acquires(ssp);
> -void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx, bool chknmisafe) __releases(ssp);
> -
>  #endif
> 
>
Paul E. McKenney Oct. 20, 2022, 10:41 p.m. UTC | #15
On Thu, Oct 20, 2022 at 03:27:18PM -0700, Paul E. McKenney wrote:
> On Thu, Oct 20, 2022 at 12:05:37AM +0200, Frederic Weisbecker wrote:
> > On Wed, Oct 19, 2022 at 12:14:18PM -0700, Paul E. McKenney wrote:
> > > On Wed, Oct 19, 2022 at 01:19:53PM +0206, John Ogness wrote:
> > > > On 2022-10-18, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > > > > And the v6.1-rc1 stack is now at srcunmisafe.2022.10.18b.
> > > > 
> > > > Thanks!
> > > > 
> > > > I guess the kernel test robot will catch this, but if you checkout
> > > > commit 79c95dc428ad ("arch/x86: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
> > > > Kconfig option") and build for x86_64, you will get:
> > > > 
> > > > x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_gp_start_if_needed':
> > > > srcutree.c:(.text+0x133a): undefined reference to `__srcu_read_lock_nmisafe'
> > > > x86_64-linux-gnu-ld: srcutree.c:(.text+0x1490): undefined reference to `__srcu_read_unlock_nmisafe'
> > > > x86_64-linux-gnu-ld: kernel/rcu/srcutree.o: in function `srcu_barrier':
> > > > srcutree.c:(.text+0x1b03): undefined reference to `__srcu_read_lock_nmisafe'
> > > > x86_64-linux-gnu-ld: srcutree.c:(.text+0x1b38): undefined reference to `__srcu_read_unlock_nmisafe'
> > > > 
> > > > Note that this error is fixed with a later commit:
> > > > 
> > > > commit c2d158a284ab ("srcu: Debug NMI safety even on archs that don't
> > > > require it").
> > > > 
> > > > This does not affect what I am working on, so feel free to take care of
> > > > it whenever it fits your schedule.
> > > 
> > > Good catch, thank you!
> > > 
> > > It looks like the first two hunks in include/linux/srcu.h from that
> > > later commit need to be in that earlier commit.
> > > 
> > > Frederic, does this make sense, or am I off in the weeds?
> > 
> > Actually you need to do that earlier, in
> > 6584822b1be1 ("srcu: Create an srcu_read_lock_nmisafe() and srcu_read_unlock_nmisafe()")
> > 
> > This way you don't only fix x86 bisectability but also the one of all the other safe archs.
> > 
> > And it's not just the first two hunks, you also need to include
> > the removal of the srcutiny.h/srcutree.h definitions.
> > 
> > So namely you need to apply the following to 6584822b1be1. You might
> > meet some minor retro-conflicts (the chknmisafe parameter didn't exist yet),
> > but that's pretty much it:
> 
> Thank you both!
> 
> I have an untested but allegedly fixed branch on -rcu on branch
> srcunmisafe.2022.10.20a.

Aside from having accidentally fused Frederic's last two commits together.
I will split them back out on the next rebase.

							Thanx, Paul

> > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > index 565f60d57484..f0814ffca34b 100644
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -52,8 +52,6 @@ int init_srcu_struct(struct srcu_struct *ssp);
> >  #else
> >  /* Dummy definition for things like notifiers.  Actual use gets link error. */
> >  struct srcu_struct { };
> > -int __srcu_read_lock_nmisafe(struct srcu_struct *ssp, bool chknmisafe) __acquires(ssp);
> > -void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx, bool chknmisafe) __releases(ssp);
> >  #endif
> >  
> >  void call_srcu(struct srcu_struct *ssp, struct rcu_head *head,
> > @@ -66,6 +64,20 @@ unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp);
> >  unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp);
> >  bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie);
> >  
> > +#ifdef CONFIG_NEED_SRCU_NMI_SAFE
> > +int __srcu_read_lock_nmisafe(struct srcu_struct *ssp) __acquires(ssp);
> > +void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx) __releases(ssp);
> > +#else
> > +static inline int __srcu_read_lock_nmisafe(struct srcu_struct *ssp)
> > +{
> > +	return __srcu_read_lock(ssp);
> > +}
> > +static inline void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
> > +{
> > +	__srcu_read_unlock(ssp, idx);
> > +}
> > +#endif /* CONFIG_NEED_SRCU_NMI_SAFE */
> > +
> >  #ifdef CONFIG_SRCU
> >  void srcu_init(void);
> >  #else /* #ifdef CONFIG_SRCU */
> > diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
> > index f890301f123d..f3a4d65b91ef 100644
> > --- a/include/linux/srcutiny.h
> > +++ b/include/linux/srcutiny.h
> > @@ -89,16 +89,4 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp,
> >  		 data_race(READ_ONCE(ssp->srcu_idx)),
> >  		 data_race(READ_ONCE(ssp->srcu_idx_max)));
> >  }
> > -
> > -static inline int __srcu_read_lock_nmisafe(struct srcu_struct *ssp, bool chknmisafe)
> > -{
> > -	BUG();
> > -	return 0;
> > -}
> > -
> > -static inline void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx, bool chknmisafe)
> > -{
> > -	BUG();
> > -}
> > -
> >  #endif
> > diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h
> > index 35ffdedf86cc..c689a81752c9 100644
> > --- a/include/linux/srcutree.h
> > +++ b/include/linux/srcutree.h
> > @@ -159,7 +155,4 @@ void synchronize_srcu_expedited(struct srcu_struct *ssp);
> >  void srcu_barrier(struct srcu_struct *ssp);
> >  void srcu_torture_stats_print(struct srcu_struct *ssp, char *tt, char *tf);
> >  
> > -int __srcu_read_lock_nmisafe(struct srcu_struct *ssp, bool chknmisafe) __acquires(ssp);
> > -void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx, bool chknmisafe) __releases(ssp);
> > -
> >  #endif
> > 
> >
John Ogness Oct. 21, 2022, 12:27 p.m. UTC | #16
On 2022-10-20, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> I have an untested but allegedly fixed branch on -rcu on branch
> srcunmisafe.2022.10.20a.

FWIW, my C-I build system was happy with srcunmisafe.2022.10.20a.

John
Paul E. McKenney Oct. 21, 2022, 1:59 p.m. UTC | #17
On Fri, Oct 21, 2022 at 02:33:22PM +0206, John Ogness wrote:
> On 2022-10-20, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > I have an untested but allegedly fixed branch on -rcu on branch
> > srcunmisafe.2022.10.20a.
> 
> FWIW, my C-I build system was happy with srcunmisafe.2022.10.20a.

Thank you for checking!

Today's transformation will have the same overall diffs, so here is
hoping!

							Thanx, Paul
Paul E. McKenney Oct. 21, 2022, 6:41 p.m. UTC | #18
On Fri, Oct 21, 2022 at 02:33:22PM +0206, John Ogness wrote:
> On 2022-10-20, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > I have an untested but allegedly fixed branch on -rcu on branch
> > srcunmisafe.2022.10.20a.
> 
> FWIW, my C-I build system was happy with srcunmisafe.2022.10.20a.

And srcunmisafe.2022.10.21a diffs equal, so it hopefully also passes
your C-I.  ;-)

							Thanx, Paul
John Ogness Oct. 24, 2022, 6:15 a.m. UTC | #19
On 2022-10-21, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> And srcunmisafe.2022.10.21a diffs equal, so it hopefully also passes
> your C-I.

It does. Thanks, Paul!

John
Paul E. McKenney Oct. 24, 2022, 1:47 p.m. UTC | #20
On Mon, Oct 24, 2022 at 08:21:44AM +0206, John Ogness wrote:
> On 2022-10-21, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > And srcunmisafe.2022.10.21a diffs equal, so it hopefully also passes
> > your C-I.
> 
> It does. Thanks, Paul!

Whew!!!  ;-)

							Thanx, Paul
John Ogness Oct. 27, 2022, 9:31 a.m. UTC | #21
Hi Paul,

I am running some tests using srcunmisafe.2022.10.21a and I am hitting
the WARN_ONCE in srcu_check_nmi_safety():

[    1.836703][    T1] rcu: Hierarchical SRCU implementation.
[    1.836707][    T1] rcu:     Max phase no-delay instances is 1000.
[    1.836844][   T15] ------------[ cut here ]------------
[    1.836846][   T15] CPU 0 old state 1 new state 2
[    1.836885][   T15] WARNING: CPU: 0 PID: 15 at kernel/rcu/srcutree.c:652 srcu_check_nmi_safety+0x79/0x90
[    1.836897][   T15] Modules linked in:
[    1.836903][   T15] CPU: 0 PID: 15 Comm: pr/bkl Not tainted 6.1.0-rc1+ #9
[    1.836909][   T15] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[    1.836912][   T15] RIP: 0010:srcu_check_nmi_safety+0x79/0x90
[    1.836919][   T15] Code: d3 80 3d 9f 76 d3 01 00 75 e5 55 8b b0 c8 01 00 00 44 89 c1 48 c7 c7 d0 1f 87 82 c6 05 850
[    1.836923][   T15] RSP: 0000:ffffc90000083e98 EFLAGS: 00010282
[    1.836929][   T15] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[    1.836933][   T15] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
[    1.836936][   T15] RBP: ffffc90000083e98 R08: 0000000000000000 R09: 0000000000000001
[    1.836940][   T15] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[    1.836943][   T15] R13: ffffc90000013d70 R14: ffff888004073900 R15: 0000000000000000
[    1.836946][   T15] FS:  0000000000000000(0000) GS:ffff888019600000(0000) knlGS:0000000000000000
[    1.836951][   T15] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.836954][   T15] CR2: ffff888003c01000 CR3: 0000000002c22001 CR4: 0000000000370ef0
[    1.836962][   T15] Call Trace:
[    1.836964][   T15]  <TASK>
[    1.836970][   T15]  console_bkl_kthread_func+0x27a/0x590
[    1.836981][   T15]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
[    1.836998][   T15]  ? console_fill_outbuf+0x210/0x210
[    1.837003][   T15]  kthread+0x108/0x130
[    1.837012][   T15]  ? kthread_complete_and_exit+0x20/0x20
[    1.837025][   T15]  ret_from_fork+0x1f/0x30
[    1.837059][   T15]  </TASK>
[    1.837062][   T15] irq event stamp: 71
[    1.837065][   T15] hardirqs last  enabled at (73): [<ffffffff81106f99>] vprintk_store+0x1b9/0x5e0
[    1.837070][   T15] hardirqs last disabled at (74): [<ffffffff811071fb>] vprintk_store+0x41b/0x5e0
[    1.837075][   T15] softirqs last  enabled at (0): [<ffffffff8107ce22>] copy_process+0x952/0x1dd0
[    1.837081][   T15] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    1.837085][   T15] ---[ end trace 0000000000000000 ]---
[    1.945054][   T12] Callback from call_rcu_tasks_rude() invoked.

My code is calling srcu_read_lock_nmisafe() from task context, in a
dedicated kthread. I am using DEFINE_STATIC_SRCU() to define/initialize
the srcu struct.

What does the warning imply?

John Ogness
Paul E. McKenney Oct. 27, 2022, 2:10 p.m. UTC | #22
On Thu, Oct 27, 2022 at 11:37:46AM +0206, John Ogness wrote:
> Hi Paul,
> 
> I am running some tests using srcunmisafe.2022.10.21a and I am hitting
> the WARN_ONCE in srcu_check_nmi_safety():
> 
> [    1.836703][    T1] rcu: Hierarchical SRCU implementation.
> [    1.836707][    T1] rcu:     Max phase no-delay instances is 1000.
> [    1.836844][   T15] ------------[ cut here ]------------
> [    1.836846][   T15] CPU 0 old state 1 new state 2
> [    1.836885][   T15] WARNING: CPU: 0 PID: 15 at kernel/rcu/srcutree.c:652 srcu_check_nmi_safety+0x79/0x90
> [    1.836897][   T15] Modules linked in:
> [    1.836903][   T15] CPU: 0 PID: 15 Comm: pr/bkl Not tainted 6.1.0-rc1+ #9
> [    1.836909][   T15] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
> [    1.836912][   T15] RIP: 0010:srcu_check_nmi_safety+0x79/0x90
> [    1.836919][   T15] Code: d3 80 3d 9f 76 d3 01 00 75 e5 55 8b b0 c8 01 00 00 44 89 c1 48 c7 c7 d0 1f 87 82 c6 05 850
> [    1.836923][   T15] RSP: 0000:ffffc90000083e98 EFLAGS: 00010282
> [    1.836929][   T15] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [    1.836933][   T15] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000ffffffff
> [    1.836936][   T15] RBP: ffffc90000083e98 R08: 0000000000000000 R09: 0000000000000001
> [    1.836940][   T15] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> [    1.836943][   T15] R13: ffffc90000013d70 R14: ffff888004073900 R15: 0000000000000000
> [    1.836946][   T15] FS:  0000000000000000(0000) GS:ffff888019600000(0000) knlGS:0000000000000000
> [    1.836951][   T15] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    1.836954][   T15] CR2: ffff888003c01000 CR3: 0000000002c22001 CR4: 0000000000370ef0
> [    1.836962][   T15] Call Trace:
> [    1.836964][   T15]  <TASK>
> [    1.836970][   T15]  console_bkl_kthread_func+0x27a/0x590
> [    1.836981][   T15]  ? _raw_spin_unlock_irqrestore+0x3c/0x60
> [    1.836998][   T15]  ? console_fill_outbuf+0x210/0x210
> [    1.837003][   T15]  kthread+0x108/0x130
> [    1.837012][   T15]  ? kthread_complete_and_exit+0x20/0x20
> [    1.837025][   T15]  ret_from_fork+0x1f/0x30
> [    1.837059][   T15]  </TASK>
> [    1.837062][   T15] irq event stamp: 71
> [    1.837065][   T15] hardirqs last  enabled at (73): [<ffffffff81106f99>] vprintk_store+0x1b9/0x5e0
> [    1.837070][   T15] hardirqs last disabled at (74): [<ffffffff811071fb>] vprintk_store+0x41b/0x5e0
> [    1.837075][   T15] softirqs last  enabled at (0): [<ffffffff8107ce22>] copy_process+0x952/0x1dd0
> [    1.837081][   T15] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [    1.837085][   T15] ---[ end trace 0000000000000000 ]---
> [    1.945054][   T12] Callback from call_rcu_tasks_rude() invoked.
> 
> My code is calling srcu_read_lock_nmisafe() from task context, in a
> dedicated kthread. I am using DEFINE_STATIC_SRCU() to define/initialize
> the srcu struct.
> 
> What does the warning imply?

The warning is claiming that this srcu_struct was passed to
srcu_read_lock() at some point and is now being passed to
srcu_read_lock_nmisafe().

In other words, you have an srcu_read_lock() or srcu_read_unlock()
somewhere that needs to instead be srcu_read_lock_nmisafe() or
srcu_read_unlock_nmisafe().

							Thanx, Paul
John Ogness Oct. 27, 2022, 2:39 p.m. UTC | #23
On 2022-10-27, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> The warning is claiming that this srcu_struct was passed to
> srcu_read_lock() at some point and is now being passed to
> srcu_read_lock_nmisafe().

Indeed, I found some code that was not using my wrappers. Thanks. Living
proof that it was a good idea to add the checks. ;-)

John
Paul E. McKenney Oct. 27, 2022, 4:01 p.m. UTC | #24
On Thu, Oct 27, 2022 at 04:45:47PM +0206, John Ogness wrote:
> On 2022-10-27, "Paul E. McKenney" <paulmck@kernel.org> wrote:
> > The warning is claiming that this srcu_struct was passed to
> > srcu_read_lock() at some point and is now being passed to
> > srcu_read_lock_nmisafe().
> 
> Indeed, I found some code that was not using my wrappers. Thanks. Living
> proof that it was a good idea to add the checks. ;-)

Glad that they helped, and a big "thank you!" to Frederic!

							Thanx, Paul