[1/4] arm64: alternative: wait for other CPUs before patching

Message ID 20211203104723.3412383-2-mark.rutland@arm.com (mailing list archive)
State New, archived
Series arm64: ensure CPUs are quiescent before patching

Commit Message

Mark Rutland Dec. 3, 2021, 10:47 a.m. UTC
In __apply_alternatives_multi_stop() we have a "really simple polling
protocol" to avoid patching code that is concurrently executed on other
CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
complete, but the boot CPU doesn't wait for secondaries to enter the
polling loop, and it's possible that patching starts while secondaries
are still within the stop_machine logic.

Let's fix this by adding a vaguely simple polling protocol where the
boot CPU waits for secondaries to signal that they have entered the
unpatchable stop function. We can use the arch_atomic_*() functions for
this, as they are not patched with alternatives.

At the same time, let's make `all_alternatives_applied` local to
__apply_alternatives_multi_stop(), since it is only used there, and this
makes the code a little clearer.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Comments

Catalin Marinas Dec. 10, 2021, 2:49 p.m. UTC | #1
On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> In __apply_alternatives_multi_stop() we have a "really simple polling
> protocol" to avoid patching code that is concurrently executed on other
> CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> complete, but the boot CPU doesn't wait for secondaries to enter the
> polling loop, and it's possible that patching starts while secondaries
> are still within the stop_machine logic.
> 
> Let's fix this by adding a vaguely simple polling protocol where the
> boot CPU waits for secondaries to signal that they have entered the
> unpatchable stop function. We can use the arch_atomic_*() functions for
> this, as they are not patched with alternatives.
> 
> At the same time, let's make `all_alternatives_applied` local to
> __apply_alternatives_multi_stop(), since it is only used there, and this
> makes the code a little clearer.

Doesn't the stop_machine() mechanism wait for the CPUs to get into the
same state before calling our function, or do we need another stop at a
lower level in the arch code?
Mark Rutland Dec. 13, 2021, 1:01 p.m. UTC | #2
On Fri, Dec 10, 2021 at 02:49:59PM +0000, Catalin Marinas wrote:
> On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > In __apply_alternatives_multi_stop() we have a "really simple polling
> > protocol" to avoid patching code that is concurrently executed on other
> > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > complete, but the boot CPU doesn't wait for secondaries to enter the
> > polling loop, and it's possible that patching starts while secondaries
> > are still within the stop_machine logic.
> > 
> > Let's fix this by adding a vaguely simple polling protocol where the
> > boot CPU waits for secondaries to signal that they have entered the
> > unpatchable stop function. We can use the arch_atomic_*() functions for
> > this, as they are not patched with alternatives.
> > 
> > At the same time, let's make `all_alternatives_applied` local to
> > __apply_alternatives_multi_stop(), since it is only used there, and this
> > makes the code a little clearer.
> 
> Doesn't the stop_machine() mechanism wait for the CPUs to get into the
> same state before calling our function, or do we need another stop at a
> lower level in the arch code?

The stop_machine() logic doesn't wait on the way in; it queues some work on
each CPU sequentially to *enter* the stop function (in this case
__apply_alternatives_multi_stop()), but there's no existing logic to ensure
that all CPUs have entered by some point. On the way out, stop_machine() waits
for all CPUs to *exit* the stop function before returning.

We need to synchronize on the way into __apply_alternatives_multi_stop() to be
sure that no CPUs are executing bits which might be patched -- portions of
stop_machine() itself, anything necessary to dequeue the work, or anything that
might be running before the CPU spots the queued work.
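
To make that concrete, the handshake added by this patch has roughly the
following shape (a simplified sketch of the diff below; the actual patching
and capability bookkeeping are omitted):

	/* Volatile, as we may be patching the guts of READ_ONCE() */
	static volatile int all_alternatives_applied;
	static atomic_t stopped_cpus = ATOMIC_INIT(0);

	if (smp_processor_id()) {
		/* Secondary: announce arrival, then spin until patching is done. */
		arch_atomic_inc(&stopped_cpus);
		while (!all_alternatives_applied)
			cpu_relax();
		isb();
	} else {
		/* Boot CPU: wait until every secondary has entered this function. */
		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
			cpu_relax();
		/* ... apply the alternatives ... */
		all_alternatives_applied = 1;
	}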

Thanks,
Mark.
Will Deacon Dec. 13, 2021, 1:27 p.m. UTC | #3
On Mon, Dec 13, 2021 at 01:01:25PM +0000, Mark Rutland wrote:
> On Fri, Dec 10, 2021 at 02:49:59PM +0000, Catalin Marinas wrote:
> > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > > In __apply_alternatives_multi_stop() we have a "really simple polling
> > > protocol" to avoid patching code that is concurrently executed on other
> > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > > complete, but the boot CPU doesn't wait for secondaries to enter the
> > > polling loop, and it's possible that patching starts while secondaries
> > > are still within the stop_machine logic.
> > > 
> > > Let's fix this by adding a vaguely simple polling protocol where the
> > > boot CPU waits for secondaries to signal that they have entered the
> > > unpatchable stop function. We can use the arch_atomic_*() functions for
> > > this, as they are not patched with alternatives.
> > > 
> > > At the same time, let's make `all_alternatives_applied` local to
> > > __apply_alternatives_multi_stop(), since it is only used there, and this
> > > makes the code a little clearer.
> > 
> > Doesn't the stop_machine() mechanism wait for the CPUs to get into the
> > same state before calling our function, or do we need another stop at a
> > lower level in the arch code?
> 
> The stop_machine() logic doesn't wait on the way in; it queues some work on
> each CPU sequentially to *enter* the stop function (in this case
> __apply_alternatives_multi_stop()), but there's no existing logic to ensure
> that all CPUs have entered by some point. On the way out, stop_machine() waits
> for all CPUs to *exit* the stop function before returning.
> 
> We need to synchronize on the way into __apply_alternatives_multi_stop() to be
> sure that no CPUs are executing bits which might be patched -- portions of
> stop_machine() itself, anything necessary to dequeue the work, or anything that
> might be running before the CPU spots the queued work.

I remember looking at this when I added the current polling loop for arm64
and, at the time, the loop around the state machine in multi_cpu_stop() was
straightforward enough that it wasn't a problem. Since then, however, it
appears to have grown a READ_ONCE(), so it looks like we'll need to do
something.
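
For reference, the loop in question in multi_cpu_stop() (kernel/stop_machine.c)
now looks roughly like this; a paraphrased sketch, not the exact upstream
source:

/* Paraphrased sketch of multi_cpu_stop(); details differ across kernel versions. */
static int multi_cpu_stop(void *data)
{
	struct multi_stop_data *msdata = data;
	enum multi_stop_state newstate, curstate = MULTI_STOP_NONE;
	int err = 0;

	do {
		cpu_relax();
		/* The READ_ONCE() here is what makes this loop patchable code. */
		newstate = READ_ONCE(msdata->state);
		if (newstate != curstate) {
			curstate = newstate;
			switch (curstate) {
			case MULTI_STOP_DISABLE_IRQ:
				local_irq_disable();
				break;
			case MULTI_STOP_RUN:
				/* e.g. __apply_alternatives_multi_stop() */
				err = msdata->fn(msdata->data);
				break;
			default:
				break;
			}
			ack_state(msdata);
		}
	} while (curstate != MULTI_STOP_EXIT);

	return err;
}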

I'll have a look at your patches.

Will
Will Deacon Dec. 13, 2021, 1:31 p.m. UTC | #4
On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> In __apply_alternatives_multi_stop() we have a "really simple polling
> protocol" to avoid patching code that is concurrently executed on other
> CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> complete, but the boot CPU doesn't wait for secondaries to enter the
> polling loop, and it's possible that patching starts while secondaries
> are still within the stop_machine logic.
> 
> Let's fix this by adding a vaguely simple polling protocol where the
> boot CPU waits for secondaries to signal that they have entered the
> unpatchable stop function. We can use the arch_atomic_*() functions for
> this, as they are not patched with alternatives.
> 
> At the same time, let's make `all_alternatives_applied` local to
> __apply_alternatives_multi_stop(), since it is only used there, and this
> makes the code a little clearer.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Andre Przywara <andre.przywara@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Joey Gouly <joey.gouly@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> index 3fb79b76e9d9..4f32d4425aac 100644
> --- a/arch/arm64/kernel/alternative.c
> +++ b/arch/arm64/kernel/alternative.c
> @@ -21,9 +21,6 @@
>  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
>  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
>  
> -/* Volatile, as we may be patching the guts of READ_ONCE() */
> -static volatile int all_alternatives_applied;
> -
>  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
>  
>  struct alt_region {
> @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
>  }
>  
>  /*
> - * We might be patching the stop_machine state machine, so implement a
> - * really simple polling protocol here.
> + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> + * being patched.
> + *
> + * We might be patching the stop_machine state machine or READ_ONCE(), so
> + * we implement a simple polling protocol.
>   */
>  static int __apply_alternatives_multi_stop(void *unused)
>  {
> +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> +	static volatile int all_alternatives_applied;
> +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
>  	struct alt_region region = {
>  		.begin	= (struct alt_instr *)__alt_instructions,
>  		.end	= (struct alt_instr *)__alt_instructions_end,
> @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
>  
>  	/* We always have a CPU 0 at this point (__init) */
>  	if (smp_processor_id()) {
> +		arch_atomic_inc(&stopped_cpus);

Why can't we use normal atomic_inc() here?

>  		while (!all_alternatives_applied)
>  			cpu_relax();
>  		isb();
>  	} else {
>  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
>  
> +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)

and normal atomic_read() here?

Will
Will Deacon Dec. 13, 2021, 1:41 p.m. UTC | #5
On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote:
> On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > In __apply_alternatives_multi_stop() we have a "really simple polling
> > protocol" to avoid patching code that is concurrently executed on other
> > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > complete, but the boot CPU doesn't wait for secondaries to enter the
> > polling loop, and it's possible that patching starts while secondaries
> > are still within the stop_machine logic.
> > 
> > Let's fix this by adding a vaguely simple polling protocol where the
> > boot CPU waits for secondaries to signal that they have entered the
> > unpatchable stop function. We can use the arch_atomic_*() functions for
> > this, as they are not patched with alternatives.
> > 
> > At the same time, let's make `all_alternatives_applied` local to
> > __apply_alternatives_multi_stop(), since it is only used there, and this
> > makes the code a little clearer.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Andre Przywara <andre.przywara@arm.com>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: James Morse <james.morse@arm.com>
> > Cc: Joey Gouly <joey.gouly@arm.com>
> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> > index 3fb79b76e9d9..4f32d4425aac 100644
> > --- a/arch/arm64/kernel/alternative.c
> > +++ b/arch/arm64/kernel/alternative.c
> > @@ -21,9 +21,6 @@
> >  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
> >  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
> >  
> > -/* Volatile, as we may be patching the guts of READ_ONCE() */
> > -static volatile int all_alternatives_applied;
> > -
> >  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
> >  
> >  struct alt_region {
> > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
> >  }
> >  
> >  /*
> > - * We might be patching the stop_machine state machine, so implement a
> > - * really simple polling protocol here.
> > + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> > + * being patched.
> > + *
> > + * We might be patching the stop_machine state machine or READ_ONCE(), so
> > + * we implement a simple polling protocol.
> >   */
> >  static int __apply_alternatives_multi_stop(void *unused)
> >  {
> > +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> > +	static volatile int all_alternatives_applied;
> > +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
> >  	struct alt_region region = {
> >  		.begin	= (struct alt_instr *)__alt_instructions,
> >  		.end	= (struct alt_instr *)__alt_instructions_end,
> > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
> >  
> >  	/* We always have a CPU 0 at this point (__init) */
> >  	if (smp_processor_id()) {
> > +		arch_atomic_inc(&stopped_cpus);
> 
> Why can't we use normal atomic_inc() here?

Ah, ok, this is to deal with instrumentation and you add 'noinstr' when you
factor this out later on. It does, however, mean that we need to be really
careful with this because we're relying on (a) our atomics patching using
static keys and (b) static key patching not requiring stop_machine().

In particular, we cannot backport this to kernels where the atomics were
patched directly.

> 
> >  		while (!all_alternatives_applied)
> >  			cpu_relax();
> >  		isb();
> >  	} else {
> >  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> >  
> > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> 
> and normal atomic_read() here?

This one I'm still thinking doesn't need the arch_ prefix.

Will
Mark Rutland Dec. 13, 2021, 1:49 p.m. UTC | #6
On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote:
> On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > In __apply_alternatives_multi_stop() we have a "really simple polling
> > protocol" to avoid patching code that is concurrently executed on other
> > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > complete, but the boot CPU doesn't wait for secondaries to enter the
> > polling loop, and it's possible that patching starts while secondaries
> > are still within the stop_machine logic.
> > 
> > Let's fix this by adding a vaguely simple polling protocol where the
> > boot CPU waits for secondaries to signal that they have entered the
> > unpatchable stop function. We can use the arch_atomic_*() functions for
> > this, as they are not patched with alternatives.
> > 
> > At the same time, let's make `all_alternatives_applied` local to
> > __apply_alternatives_multi_stop(), since it is only used there, and this
> > makes the code a little clearer.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Andre Przywara <andre.przywara@arm.com>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: James Morse <james.morse@arm.com>
> > Cc: Joey Gouly <joey.gouly@arm.com>
> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> > index 3fb79b76e9d9..4f32d4425aac 100644
> > --- a/arch/arm64/kernel/alternative.c
> > +++ b/arch/arm64/kernel/alternative.c
> > @@ -21,9 +21,6 @@
> >  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
> >  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
> >  
> > -/* Volatile, as we may be patching the guts of READ_ONCE() */
> > -static volatile int all_alternatives_applied;
> > -
> >  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
> >  
> >  struct alt_region {
> > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
> >  }
> >  
> >  /*
> > - * We might be patching the stop_machine state machine, so implement a
> > - * really simple polling protocol here.
> > + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> > + * being patched.
> > + *
> > + * We might be patching the stop_machine state machine or READ_ONCE(), so
> > + * we implement a simple polling protocol.
> >   */
> >  static int __apply_alternatives_multi_stop(void *unused)
> >  {
> > +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> > +	static volatile int all_alternatives_applied;
> > +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
> >  	struct alt_region region = {
> >  		.begin	= (struct alt_instr *)__alt_instructions,
> >  		.end	= (struct alt_instr *)__alt_instructions_end,
> > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
> >  
> >  	/* We always have a CPU 0 at this point (__init) */
> >  	if (smp_processor_id()) {
> > +		arch_atomic_inc(&stopped_cpus);
> 
> Why can't we use normal atomic_inc() here?

In case there's any explicit instrumentation enabled in the atomic_inc()
wrapper, since the instrumentation code may call into patchable code.

Today we'd get away with using atomic_inc(), since currently all the
instrumentation happens to be prior to the actual AMO, but generally to avoid
instrumentation we're supposed to use the arch_atomic_*() ops.
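
For reference, the instrumented wrappers look roughly like this (paraphrasing
the generated atomic-instrumented headers; the exact form varies by kernel
version):

	/* Rough sketch of the generated instrumented wrapper. */
	static __always_inline void
	atomic_inc(atomic_t *v)
	{
		/* KASAN/KCSAN hooks, which may call into patchable code */
		instrument_atomic_read_write(v, sizeof(*v));
		/* the raw, uninstrumented architecture op */
		arch_atomic_inc(v);
	}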

There are some other latent issues with calling into instrumentable code here,
which I plan to address in future patches, so if you want I can make this a
regular atomic_inc() for now and tackle that as a separate problem. Otherwise,
I can elaborate on the mention in the commit message to make that clearer.

> >  		while (!all_alternatives_applied)
> >  			cpu_relax();
> >  		isb();
> >  	} else {
> >  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> >  
> > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> 
> and normal atomic_read() here?

Same story as above.

Thanks,
Mark.
Mark Rutland Dec. 13, 2021, 1:54 p.m. UTC | #7
On Mon, Dec 13, 2021 at 01:41:46PM +0000, Will Deacon wrote:
> On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote:
> > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > > In __apply_alternatives_multi_stop() we have a "really simple polling
> > > protocol" to avoid patching code that is concurrently executed on other
> > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > > complete, but the boot CPU doesn't wait for secondaries to enter the
> > > polling loop, and it's possible that patching starts while secondaries
> > > are still within the stop_machine logic.
> > > 
> > > Let's fix this by adding a vaguely simple polling protocol where the
> > > boot CPU waits for secondaries to signal that they have entered the
> > > unpatchable stop function. We can use the arch_atomic_*() functions for
> > > this, as they are not patched with alternatives.
> > > 
> > > At the same time, let's make `all_alternatives_applied` local to
> > > __apply_alternatives_multi_stop(), since it is only used there, and this
> > > makes the code a little clearer.
> > > 
> > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > Cc: Andre Przywara <andre.przywara@arm.com>
> > > Cc: Ard Biesheuvel <ardb@kernel.org>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: James Morse <james.morse@arm.com>
> > > Cc: Joey Gouly <joey.gouly@arm.com>
> > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
> > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> > > index 3fb79b76e9d9..4f32d4425aac 100644
> > > --- a/arch/arm64/kernel/alternative.c
> > > +++ b/arch/arm64/kernel/alternative.c
> > > @@ -21,9 +21,6 @@
> > >  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
> > >  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
> > >  
> > > -/* Volatile, as we may be patching the guts of READ_ONCE() */
> > > -static volatile int all_alternatives_applied;
> > > -
> > >  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
> > >  
> > >  struct alt_region {
> > > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
> > >  }
> > >  
> > >  /*
> > > - * We might be patching the stop_machine state machine, so implement a
> > > - * really simple polling protocol here.
> > > + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> > > + * being patched.
> > > + *
> > > + * We might be patching the stop_machine state machine or READ_ONCE(), so
> > > + * we implement a simple polling protocol.
> > >   */
> > >  static int __apply_alternatives_multi_stop(void *unused)
> > >  {
> > > +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> > > +	static volatile int all_alternatives_applied;
> > > +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
> > >  	struct alt_region region = {
> > >  		.begin	= (struct alt_instr *)__alt_instructions,
> > >  		.end	= (struct alt_instr *)__alt_instructions_end,
> > > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
> > >  
> > >  	/* We always have a CPU 0 at this point (__init) */
> > >  	if (smp_processor_id()) {
> > > +		arch_atomic_inc(&stopped_cpus);
> > 
> > Why can't we use normal atomic_inc() here?
> 
> Ah, ok, this is to deal with instrumentation and you add 'noinstr' when you
> factor this out later on. It does, however, mean that we need to be really
> careful with this because we're relying on (a) our atomics patching using
> static keys and (b) static key patching not requiring stop_machine().
> 
> In particular, we cannot backport this to kernels where the atomics were
> patched directly.

Another option here would be to use the __ll_sc_*() atomics directly, which at
least will break the build if backported too far?

> > >  		while (!all_alternatives_applied)
> > >  			cpu_relax();
> > >  		isb();
> > >  	} else {
> > >  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> > >  
> > > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> > 
> > and normal atomic_read() here?
> 
> This one I'm still thinking doesn't need the arch_ prefix.

We could use a regular atomic_read() here, yes.

I'd used the arch_atomic_*() form for consistency with the inc().

Thanks,
Mark.
Will Deacon Dec. 14, 2021, 4:01 p.m. UTC | #8
On Mon, Dec 13, 2021 at 01:54:39PM +0000, Mark Rutland wrote:
> On Mon, Dec 13, 2021 at 01:41:46PM +0000, Will Deacon wrote:
> > On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote:
> > > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > > > In __apply_alternatives_multi_stop() we have a "really simple polling
> > > > protocol" to avoid patching code that is concurrently executed on other
> > > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > > > complete, but the boot CPU doesn't wait for secondaries to enter the
> > > > polling loop, and it's possible that patching starts while secondaries
> > > > are still within the stop_machine logic.
> > > > 
> > > > Let's fix this by adding a vaguely simple polling protocol where the
> > > > boot CPU waits for secondaries to signal that they have entered the
> > > > unpatchable stop function. We can use the arch_atomic_*() functions for
> > > > this, as they are not patched with alternatives.
> > > > 
> > > > At the same time, let's make `all_alternatives_applied` local to
> > > > __apply_alternatives_multi_stop(), since it is only used there, and this
> > > > makes the code a little clearer.
> > > > 
> > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > > > Cc: Andre Przywara <andre.przywara@arm.com>
> > > > Cc: Ard Biesheuvel <ardb@kernel.org>
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: James Morse <james.morse@arm.com>
> > > > Cc: Joey Gouly <joey.gouly@arm.com>
> > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > ---
> > > >  arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
> > > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
> > > > index 3fb79b76e9d9..4f32d4425aac 100644
> > > > --- a/arch/arm64/kernel/alternative.c
> > > > +++ b/arch/arm64/kernel/alternative.c
> > > > @@ -21,9 +21,6 @@
> > > >  #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
> > > >  #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
> > > >  
> > > > -/* Volatile, as we may be patching the guts of READ_ONCE() */
> > > > -static volatile int all_alternatives_applied;
> > > > -
> > > >  static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
> > > >  
> > > >  struct alt_region {
> > > > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
> > > >  }
> > > >  
> > > >  /*
> > > > - * We might be patching the stop_machine state machine, so implement a
> > > > - * really simple polling protocol here.
> > > > + * Apply alternatives, ensuring that no CPUs are concurrently executing code
> > > > + * being patched.
> > > > + *
> > > > + * We might be patching the stop_machine state machine or READ_ONCE(), so
> > > > + * we implement a simple polling protocol.
> > > >   */
> > > >  static int __apply_alternatives_multi_stop(void *unused)
> > > >  {
> > > > +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> > > > +	static volatile int all_alternatives_applied;
> > > > +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
> > > >  	struct alt_region region = {
> > > >  		.begin	= (struct alt_instr *)__alt_instructions,
> > > >  		.end	= (struct alt_instr *)__alt_instructions_end,
> > > > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
> > > >  
> > > >  	/* We always have a CPU 0 at this point (__init) */
> > > >  	if (smp_processor_id()) {
> > > > +		arch_atomic_inc(&stopped_cpus);
> > > 
> > > Why can't we use normal atomic_inc() here?
> > 
> > Ah, ok, this is to deal with instrumentation and you add 'noinstr' when you
> > factor this out later on. It does, however, mean that we need to be really
> > careful with this because we're relying on (a) our atomics patching using
> > static keys and (b) static key patching not requiring stop_machine().
> > 
> > In particular, we cannot backport this to kernels where the atomics were
> > patched directly.
> 
> Another option here would be to use the __ll_sc_*() atomics directly, which at
> least will break the build if backported too far?

Hopefully it's sufficient just to add the right Fixes: tag and stick the
kernel version on the CC stable line.
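
Something along these lines in the commit message, with the hash and version
left as placeholders here:

	Fixes: <12-char sha> ("arm64: <subject of the commit being fixed>")
	Cc: <stable@vger.kernel.org> # <first affected kernel version>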

> > > >  		while (!all_alternatives_applied)
> > > >  			cpu_relax();
> > > >  		isb();
> > > >  	} else {
> > > >  		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> > > >  
> > > > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> > > 
> > > and normal atomic_read() here?
> > 
> > This one I'm still thinking doesn't need the arch_ prefix.
> 
> We could use a regular atomic_read() here, yes.
> 
> I'd used the arch_atomic_*() form for consistency with the inc().

I'd rather only use the arch_* forms where they are strictly needed, and
have a comment justifying each use.

Will

Patch

diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 3fb79b76e9d9..4f32d4425aac 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -21,9 +21,6 @@ 
 #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
 #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
 
-/* Volatile, as we may be patching the guts of READ_ONCE() */
-static volatile int all_alternatives_applied;
-
 static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
 
 struct alt_region {
@@ -193,11 +190,17 @@  static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
 }
 
 /*
- * We might be patching the stop_machine state machine, so implement a
- * really simple polling protocol here.
+ * Apply alternatives, ensuring that no CPUs are concurrently executing code
+ * being patched.
+ *
+ * We might be patching the stop_machine state machine or READ_ONCE(), so
+ * we implement a simple polling protocol.
  */
 static int __apply_alternatives_multi_stop(void *unused)
 {
+	/* Volatile, as we may be patching the guts of READ_ONCE() */
+	static volatile int all_alternatives_applied;
+	static atomic_t stopped_cpus = ATOMIC_INIT(0);
 	struct alt_region region = {
 		.begin	= (struct alt_instr *)__alt_instructions,
 		.end	= (struct alt_instr *)__alt_instructions_end,
@@ -205,12 +208,16 @@  static int __apply_alternatives_multi_stop(void *unused)
 
 	/* We always have a CPU 0 at this point (__init) */
 	if (smp_processor_id()) {
+		arch_atomic_inc(&stopped_cpus);
 		while (!all_alternatives_applied)
 			cpu_relax();
 		isb();
 	} else {
 		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
 
+		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
+			cpu_relax();
+
 		bitmap_complement(remaining_capabilities, boot_capabilities,
 				  ARM64_NPATCHABLE);