Message ID | 20211203104723.3412383-2-mark.rutland@arm.com (mailing list archive)
---|---
State | New, archived
Series | arm64: ensure CPUs are quiescent before patching
On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> In __apply_alternatives_multi_stop() we have a "really simple polling
> protocol" to avoid patching code that is concurrently executed on other
> CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> complete, but the boot CPU doesn't wait for secondaries to enter the
> polling loop, and it's possible that patching starts while secondaries
> are still within the stop_machine logic.
>
> Let's fix this by adding a vaguely simple polling protocol where the
> boot CPU waits for secondaries to signal that they have entered the
> unpatchable stop function. We can use the arch_atomic_*() functions for
> this, as they are not patched with alternatives.
>
> At the same time, let's make `all_alternatives_applied` local to
> __apply_alternatives_multi_stop(), since it is only used there, and this
> makes the code a little clearer.

Doesn't the stop_machine() mechanism wait for the CPUs to get into the
same state before calling our function, or do we need another stop at a
lower level in the arch code?
On Fri, Dec 10, 2021 at 02:49:59PM +0000, Catalin Marinas wrote:
> On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > In __apply_alternatives_multi_stop() we have a "really simple polling
> > protocol" to avoid patching code that is concurrently executed on other
> > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > complete, but the boot CPU doesn't wait for secondaries to enter the
> > polling loop, and it's possible that patching starts while secondaries
> > are still within the stop_machine logic.
> >
> > Let's fix this by adding a vaguely simple polling protocol where the
> > boot CPU waits for secondaries to signal that they have entered the
> > unpatchable stop function. We can use the arch_atomic_*() functions for
> > this, as they are not patched with alternatives.
> >
> > At the same time, let's make `all_alternatives_applied` local to
> > __apply_alternatives_multi_stop(), since it is only used there, and this
> > makes the code a little clearer.
>
> Doesn't the stop_machine() mechanism wait for the CPUs to get into the
> same state before calling our function, or do we need another stop at a
> lower level in the arch code?

The stop_machine() logic doesn't wait on the way in; it queues some work on
each CPU sequentially to *enter* the stop function (in this case
__apply_alternatives_multi_stop()), but there's no existing logic to ensure
that all CPUs have entered by some point. On the way out, stop_machine() waits
for all CPUs to *exit* the stop function before returning.

We need to synchronize on the way into __apply_alternatives_multi_stop() to be
sure that no CPUs are executing bits which might be patched -- portions of
stop_machine() itself, anything necessary to dequeue the work, or anything that
might be running before the CPU spots the queued work.

Thanks,
Mark.
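Concretely, the entry-side handshake being proposed amounts to a check-in
counter on the way in and a release flag on the way out. The sketch below is
only an illustration of that idea -- the names (stop_fn, cpus_checked_in,
patching_complete) are made up, and the real implementation is the patch
reviewed further down:

/* Illustration only: these names are hypothetical, not the kernel's. */
static atomic_t cpus_checked_in = ATOMIC_INIT(0);
static volatile int patching_complete;

static int stop_fn(void *unused)
{
	if (smp_processor_id()) {
		/* Secondary CPU: check in, then spin in code that is never patched. */
		arch_atomic_inc(&cpus_checked_in);
		while (!patching_complete)
			cpu_relax();
		isb();
	} else {
		/* Boot CPU: wait for every other online CPU to check in... */
		while (arch_atomic_read(&cpus_checked_in) != num_online_cpus() - 1)
			cpu_relax();

		/* ...patch safely here, then release the secondaries. */
		patching_complete = 1;
	}

	return 0;
}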
On Mon, Dec 13, 2021 at 01:01:25PM +0000, Mark Rutland wrote:
> On Fri, Dec 10, 2021 at 02:49:59PM +0000, Catalin Marinas wrote:
> > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote:
> > > In __apply_alternatives_multi_stop() we have a "really simple polling
> > > protocol" to avoid patching code that is concurrently executed on other
> > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
> > > complete, but the boot CPU doesn't wait for secondaries to enter the
> > > polling loop, and it's possible that patching starts while secondaries
> > > are still within the stop_machine logic.
> > >
> > > Let's fix this by adding a vaguely simple polling protocol where the
> > > boot CPU waits for secondaries to signal that they have entered the
> > > unpatchable stop function. We can use the arch_atomic_*() functions for
> > > this, as they are not patched with alternatives.
> > >
> > > At the same time, let's make `all_alternatives_applied` local to
> > > __apply_alternatives_multi_stop(), since it is only used there, and this
> > > makes the code a little clearer.
> >
> > Doesn't the stop_machine() mechanism wait for the CPUs to get into the
> > same state before calling our function, or do we need another stop at a
> > lower level in the arch code?
>
> The stop_machine() logic doesn't wait on the way in; it queues some work on
> each CPU sequentially to *enter* the stop function (in this case
> __apply_alternatives_multi_stop()), but there's no existing logic to ensure
> that all CPUs have entered by some point. On the way out, stop_machine() waits
> for all CPUs to *exit* the stop function before returning.
>
> We need to synchronize on the way into __apply_alternatives_multi_stop() to be
> sure that no CPUs are executing bits which might be patched -- portions of
> stop_machine() itself, anything necessary to dequeue the work, or anything that
> might be running before the CPU spots the queued work.

I remember looking at this when I added the current polling loop for arm64
and, at the time, the loop around the state machine in multi_cpu_stop() was
straightforward enough that it wasn't a problem. Since then, however, it
appears to have grown a READ_ONCE(), so it looks like we'll need to do
something.

I'll have a look at your patches.

Will
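The loop Will mentions is the one in multi_cpu_stop() (kernel/stop_machine.c).
A heavily simplified model of its shape is shown below -- the types and the
helper name here are stand-ins rather than the real kernel definitions -- and
the relevant detail is the READ_ONCE() at the top of the loop: every CPU spins
there polling a shared state, so patching READ_ONCE() or the alternatives
machinery while other CPUs are still inside this loop is unsafe.

/* Stand-in types modelled on kernel/stop_machine.c; not the real definitions. */
enum multi_stop_state {
	MULTI_STOP_NONE,
	MULTI_STOP_PREPARE,
	MULTI_STOP_DISABLE_IRQ,
	MULTI_STOP_RUN,
	MULTI_STOP_EXIT,
};

struct multi_stop_data {
	cpu_stop_fn_t		fn;
	void			*data;
	enum multi_stop_state	state;
};

static int multi_cpu_stop_sketch(struct multi_stop_data *msdata, bool is_active)
{
	enum multi_stop_state newstate, curstate = MULTI_STOP_NONE;
	int err = 0;

	do {
		cpu_relax();
		/* Every CPU polls the shared state with READ_ONCE()... */
		newstate = READ_ONCE(msdata->state);
		if (newstate != curstate) {
			curstate = newstate;
			switch (curstate) {
			case MULTI_STOP_DISABLE_IRQ:
				local_irq_disable();
				break;
			case MULTI_STOP_RUN:
				/* ...and only the active CPU runs the stop function. */
				if (is_active)
					err = msdata->fn(msdata->data);
				break;
			default:
				break;
			}
		}
	} while (curstate != MULTI_STOP_EXIT);

	return err;
}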
On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote: > In __apply_alternatives_multi_stop() we have a "really simple polling > protocol" to avoid patching code that is concurrently executed on other > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is > complete, but the boot CPU doesn't wait for secondaries to enter the > polling loop, and it's possible that patching starts while secondaries > are still within the stop_machine logic. > > Let's fix this by adding a vaguely simple polling protocol where the > boot CPU waits for secondaries to signal that they have entered the > unpatchable stop function. We can use the arch_atomic_*() functions for > this, as they are not patched with alternatives. > > At the same time, let's make `all_alternatives_applied` local to > __apply_alternatives_multi_stop(), since it is only used there, and this > makes the code a little clearer. > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > Cc: Andre Przywara <andre.przywara@arm.com> > Cc: Ard Biesheuvel <ardb@kernel.org> > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: James Morse <james.morse@arm.com> > Cc: Joey Gouly <joey.gouly@arm.com> > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > Cc: Will Deacon <will@kernel.org> > --- > arch/arm64/kernel/alternative.c | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c > index 3fb79b76e9d9..4f32d4425aac 100644 > --- a/arch/arm64/kernel/alternative.c > +++ b/arch/arm64/kernel/alternative.c > @@ -21,9 +21,6 @@ > #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) > #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset) > > -/* Volatile, as we may be patching the guts of READ_ONCE() */ > -static volatile int all_alternatives_applied; > - > static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS); > > struct alt_region { > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu > } > > /* > - * We might be patching the stop_machine state machine, so implement a > - * really simple polling protocol here. > + * Apply alternatives, ensuring that no CPUs are concurrently executing code > + * being patched. > + * > + * We might be patching the stop_machine state machine or READ_ONCE(), so > + * we implement a simple polling protocol. > */ > static int __apply_alternatives_multi_stop(void *unused) > { > + /* Volatile, as we may be patching the guts of READ_ONCE() */ > + static volatile int all_alternatives_applied; > + static atomic_t stopped_cpus = ATOMIC_INIT(0); > struct alt_region region = { > .begin = (struct alt_instr *)__alt_instructions, > .end = (struct alt_instr *)__alt_instructions_end, > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused) > > /* We always have a CPU 0 at this point (__init) */ > if (smp_processor_id()) { > + arch_atomic_inc(&stopped_cpus); Why can't we use normal atomic_inc() here? > while (!all_alternatives_applied) > cpu_relax(); > isb(); > } else { > DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE); > > + while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1) and normal atomic_read() here? Will
On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote: > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote: > > In __apply_alternatives_multi_stop() we have a "really simple polling > > protocol" to avoid patching code that is concurrently executed on other > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is > > complete, but the boot CPU doesn't wait for secondaries to enter the > > polling loop, and it's possible that patching starts while secondaries > > are still within the stop_machine logic. > > > > Let's fix this by adding a vaguely simple polling protocol where the > > boot CPU waits for secondaries to signal that they have entered the > > unpatchable stop function. We can use the arch_atomic_*() functions for > > this, as they are not patched with alternatives. > > > > At the same time, let's make `all_alternatives_applied` local to > > __apply_alternatives_multi_stop(), since it is only used there, and this > > makes the code a little clearer. > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > Cc: Andre Przywara <andre.przywara@arm.com> > > Cc: Ard Biesheuvel <ardb@kernel.org> > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > Cc: James Morse <james.morse@arm.com> > > Cc: Joey Gouly <joey.gouly@arm.com> > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > Cc: Will Deacon <will@kernel.org> > > --- > > arch/arm64/kernel/alternative.c | 17 ++++++++++++----- > > 1 file changed, 12 insertions(+), 5 deletions(-) > > > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c > > index 3fb79b76e9d9..4f32d4425aac 100644 > > --- a/arch/arm64/kernel/alternative.c > > +++ b/arch/arm64/kernel/alternative.c > > @@ -21,9 +21,6 @@ > > #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) > > #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset) > > > > -/* Volatile, as we may be patching the guts of READ_ONCE() */ > > -static volatile int all_alternatives_applied; > > - > > static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS); > > > > struct alt_region { > > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu > > } > > > > /* > > - * We might be patching the stop_machine state machine, so implement a > > - * really simple polling protocol here. > > + * Apply alternatives, ensuring that no CPUs are concurrently executing code > > + * being patched. > > + * > > + * We might be patching the stop_machine state machine or READ_ONCE(), so > > + * we implement a simple polling protocol. > > */ > > static int __apply_alternatives_multi_stop(void *unused) > > { > > + /* Volatile, as we may be patching the guts of READ_ONCE() */ > > + static volatile int all_alternatives_applied; > > + static atomic_t stopped_cpus = ATOMIC_INIT(0); > > struct alt_region region = { > > .begin = (struct alt_instr *)__alt_instructions, > > .end = (struct alt_instr *)__alt_instructions_end, > > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused) > > > > /* We always have a CPU 0 at this point (__init) */ > > if (smp_processor_id()) { > > + arch_atomic_inc(&stopped_cpus); > > Why can't we use normal atomic_inc() here? Ah, ok, this is to deal with instrumentation and you add 'noinstr' when you factor this out later on. It does, however, mean that we need to be really careful with this because we're relying on (a) our atomics patching using static keys and (b) static key patching not requiring stop_machine(). 
In particular, we cannot backport this to kernels where the atomics were
patched directly.

> > 		while (!all_alternatives_applied)
> > 			cpu_relax();
> > 		isb();
> > 	} else {
> > 		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> >
> > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
>
> and normal atomic_read() here?

This one I'm still thinking doesn't need the arch_ prefix.

Will
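The static keys Will refers to in (a): on arm64 the arch_atomic_*() ops select
between the LL/SC and LSE implementations with a static branch rather than
with an instruction alternative, roughly as below (simplified from
arch/arm64/include/asm/lse.h; the exact capability check has varied between
kernel versions), which is why they stay safe to call while alternatives are
being applied:

/* Simplified from arch/arm64/include/asm/lse.h; details differ by version. */
static __always_inline bool system_uses_lse_atomics(void)
{
	/* A static branch (branch/nop patching), not an instruction alternative. */
	return static_branch_likely(&cpu_hwcap_keys[ARM64_HAS_LSE_ATOMICS]);
}

#define __lse_ll_sc_body(op, ...)					\
({									\
	system_uses_lse_atomics() ?					\
		__lse_##op(__VA_ARGS__) :				\
		__ll_sc_##op(__VA_ARGS__);				\
})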
On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote: > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote: > > In __apply_alternatives_multi_stop() we have a "really simple polling > > protocol" to avoid patching code that is concurrently executed on other > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is > > complete, but the boot CPU doesn't wait for secondaries to enter the > > polling loop, and it's possible that patching starts while secondaries > > are still within the stop_machine logic. > > > > Let's fix this by adding a vaguely simple polling protocol where the > > boot CPU waits for secondaries to signal that they have entered the > > unpatchable stop function. We can use the arch_atomic_*() functions for > > this, as they are not patched with alternatives. > > > > At the same time, let's make `all_alternatives_applied` local to > > __apply_alternatives_multi_stop(), since it is only used there, and this > > makes the code a little clearer. > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > Cc: Andre Przywara <andre.przywara@arm.com> > > Cc: Ard Biesheuvel <ardb@kernel.org> > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > Cc: James Morse <james.morse@arm.com> > > Cc: Joey Gouly <joey.gouly@arm.com> > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > Cc: Will Deacon <will@kernel.org> > > --- > > arch/arm64/kernel/alternative.c | 17 ++++++++++++----- > > 1 file changed, 12 insertions(+), 5 deletions(-) > > > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c > > index 3fb79b76e9d9..4f32d4425aac 100644 > > --- a/arch/arm64/kernel/alternative.c > > +++ b/arch/arm64/kernel/alternative.c > > @@ -21,9 +21,6 @@ > > #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) > > #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset) > > > > -/* Volatile, as we may be patching the guts of READ_ONCE() */ > > -static volatile int all_alternatives_applied; > > - > > static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS); > > > > struct alt_region { > > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu > > } > > > > /* > > - * We might be patching the stop_machine state machine, so implement a > > - * really simple polling protocol here. > > + * Apply alternatives, ensuring that no CPUs are concurrently executing code > > + * being patched. > > + * > > + * We might be patching the stop_machine state machine or READ_ONCE(), so > > + * we implement a simple polling protocol. > > */ > > static int __apply_alternatives_multi_stop(void *unused) > > { > > + /* Volatile, as we may be patching the guts of READ_ONCE() */ > > + static volatile int all_alternatives_applied; > > + static atomic_t stopped_cpus = ATOMIC_INIT(0); > > struct alt_region region = { > > .begin = (struct alt_instr *)__alt_instructions, > > .end = (struct alt_instr *)__alt_instructions_end, > > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused) > > > > /* We always have a CPU 0 at this point (__init) */ > > if (smp_processor_id()) { > > + arch_atomic_inc(&stopped_cpus); > > Why can't we use normal atomic_inc() here? In case there's any explicit instrumentation enabled in the atomic_inc() wrapper, since the instrumentation code may call into patchable code. Today we'd get away with using atomic_inc(), since currently all the instrumentation happens to be prior to the actual AMO, but generally to avoid instrumentation we're supposed to use the arch_atomic_*() ops. 
There are some other latent issues with calling into instrumentable code here,
which I plan to address in future patches, so if you want I can make this a
regular atomic_inc() for now and tackle that as a separate problem. Otherwise,
I can elaborate on the mention in the commit message to make that clearer.

> > 		while (!all_alternatives_applied)
> > 			cpu_relax();
> > 		isb();
> > 	} else {
> > 		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> >
> > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
>
> and normal atomic_read() here?

Same story as above.

Thanks,
Mark.
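The instrumentation Mark describes lives in the generated wrappers in
include/linux/atomic/atomic-instrumented.h, which call an instrumentation hook
before handing off to the raw architecture op -- roughly the following shape
(simplified here):

/* Roughly what the plain atomic_inc() wrapper expands to (simplified). */
static __always_inline void
atomic_inc(atomic_t *v)
{
	/* KASAN/KCSAN hook: may call into code that is itself patchable. */
	instrument_atomic_read_write(v, sizeof(*v));
	arch_atomic_inc(v);	/* the raw, uninstrumented architecture op */
}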
On Mon, Dec 13, 2021 at 01:41:46PM +0000, Will Deacon wrote: > On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote: > > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote: > > > In __apply_alternatives_multi_stop() we have a "really simple polling > > > protocol" to avoid patching code that is concurrently executed on other > > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is > > > complete, but the boot CPU doesn't wait for secondaries to enter the > > > polling loop, and it's possible that patching starts while secondaries > > > are still within the stop_machine logic. > > > > > > Let's fix this by adding a vaguely simple polling protocol where the > > > boot CPU waits for secondaries to signal that they have entered the > > > unpatchable stop function. We can use the arch_atomic_*() functions for > > > this, as they are not patched with alternatives. > > > > > > At the same time, let's make `all_alternatives_applied` local to > > > __apply_alternatives_multi_stop(), since it is only used there, and this > > > makes the code a little clearer. > > > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > > Cc: Andre Przywara <andre.przywara@arm.com> > > > Cc: Ard Biesheuvel <ardb@kernel.org> > > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > > Cc: James Morse <james.morse@arm.com> > > > Cc: Joey Gouly <joey.gouly@arm.com> > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > > Cc: Will Deacon <will@kernel.org> > > > --- > > > arch/arm64/kernel/alternative.c | 17 ++++++++++++----- > > > 1 file changed, 12 insertions(+), 5 deletions(-) > > > > > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c > > > index 3fb79b76e9d9..4f32d4425aac 100644 > > > --- a/arch/arm64/kernel/alternative.c > > > +++ b/arch/arm64/kernel/alternative.c > > > @@ -21,9 +21,6 @@ > > > #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) > > > #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset) > > > > > > -/* Volatile, as we may be patching the guts of READ_ONCE() */ > > > -static volatile int all_alternatives_applied; > > > - > > > static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS); > > > > > > struct alt_region { > > > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu > > > } > > > > > > /* > > > - * We might be patching the stop_machine state machine, so implement a > > > - * really simple polling protocol here. > > > + * Apply alternatives, ensuring that no CPUs are concurrently executing code > > > + * being patched. > > > + * > > > + * We might be patching the stop_machine state machine or READ_ONCE(), so > > > + * we implement a simple polling protocol. > > > */ > > > static int __apply_alternatives_multi_stop(void *unused) > > > { > > > + /* Volatile, as we may be patching the guts of READ_ONCE() */ > > > + static volatile int all_alternatives_applied; > > > + static atomic_t stopped_cpus = ATOMIC_INIT(0); > > > struct alt_region region = { > > > .begin = (struct alt_instr *)__alt_instructions, > > > .end = (struct alt_instr *)__alt_instructions_end, > > > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused) > > > > > > /* We always have a CPU 0 at this point (__init) */ > > > if (smp_processor_id()) { > > > + arch_atomic_inc(&stopped_cpus); > > > > Why can't we use normal atomic_inc() here? > > Ah, ok, this is to deal with instrumentation and you add 'noinstr' when you > factor this out later on. 
> It does, however, mean that we need to be really
> careful with this because we're relying on (a) our atomics patching using
> static keys and (b) static key patching not requiring stop_machine().
>
> In particular, we cannot backport this to kernels where the atomics were
> patched directly.

Another option here would be to use the __ll_sc_*() atomics directly, which at
least will break the build if backported too far?

> > > 		while (!all_alternatives_applied)
> > > 			cpu_relax();
> > > 		isb();
> > > 	} else {
> > > 		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> > >
> > > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> >
> > and normal atomic_read() here?
>
> This one I'm still thinking doesn't need the arch_ prefix.

We could use a regular atomic_read() here, yes.

I'd used the arch_atomic_*() form for consistency with the inc().

Thanks,
Mark.
On Mon, Dec 13, 2021 at 01:54:39PM +0000, Mark Rutland wrote: > On Mon, Dec 13, 2021 at 01:41:46PM +0000, Will Deacon wrote: > > On Mon, Dec 13, 2021 at 01:31:52PM +0000, Will Deacon wrote: > > > On Fri, Dec 03, 2021 at 10:47:20AM +0000, Mark Rutland wrote: > > > > In __apply_alternatives_multi_stop() we have a "really simple polling > > > > protocol" to avoid patching code that is concurrently executed on other > > > > CPUs. Secondary CPUs wait for the boot CPU to signal that patching is > > > > complete, but the boot CPU doesn't wait for secondaries to enter the > > > > polling loop, and it's possible that patching starts while secondaries > > > > are still within the stop_machine logic. > > > > > > > > Let's fix this by adding a vaguely simple polling protocol where the > > > > boot CPU waits for secondaries to signal that they have entered the > > > > unpatchable stop function. We can use the arch_atomic_*() functions for > > > > this, as they are not patched with alternatives. > > > > > > > > At the same time, let's make `all_alternatives_applied` local to > > > > __apply_alternatives_multi_stop(), since it is only used there, and this > > > > makes the code a little clearer. > > > > > > > > Signed-off-by: Mark Rutland <mark.rutland@arm.com> > > > > Cc: Andre Przywara <andre.przywara@arm.com> > > > > Cc: Ard Biesheuvel <ardb@kernel.org> > > > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > > > Cc: James Morse <james.morse@arm.com> > > > > Cc: Joey Gouly <joey.gouly@arm.com> > > > > Cc: Suzuki K Poulose <suzuki.poulose@arm.com> > > > > Cc: Will Deacon <will@kernel.org> > > > > --- > > > > arch/arm64/kernel/alternative.c | 17 ++++++++++++----- > > > > 1 file changed, 12 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c > > > > index 3fb79b76e9d9..4f32d4425aac 100644 > > > > --- a/arch/arm64/kernel/alternative.c > > > > +++ b/arch/arm64/kernel/alternative.c > > > > @@ -21,9 +21,6 @@ > > > > #define ALT_ORIG_PTR(a) __ALT_PTR(a, orig_offset) > > > > #define ALT_REPL_PTR(a) __ALT_PTR(a, alt_offset) > > > > > > > > -/* Volatile, as we may be patching the guts of READ_ONCE() */ > > > > -static volatile int all_alternatives_applied; > > > > - > > > > static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS); > > > > > > > > struct alt_region { > > > > @@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu > > > > } > > > > > > > > /* > > > > - * We might be patching the stop_machine state machine, so implement a > > > > - * really simple polling protocol here. > > > > + * Apply alternatives, ensuring that no CPUs are concurrently executing code > > > > + * being patched. > > > > + * > > > > + * We might be patching the stop_machine state machine or READ_ONCE(), so > > > > + * we implement a simple polling protocol. 
> > > >  */
> > > > static int __apply_alternatives_multi_stop(void *unused)
> > > > {
> > > > +	/* Volatile, as we may be patching the guts of READ_ONCE() */
> > > > +	static volatile int all_alternatives_applied;
> > > > +	static atomic_t stopped_cpus = ATOMIC_INIT(0);
> > > >  	struct alt_region region = {
> > > >  		.begin	= (struct alt_instr *)__alt_instructions,
> > > >  		.end	= (struct alt_instr *)__alt_instructions_end,
> > > > @@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
> > > >
> > > >  	/* We always have a CPU 0 at this point (__init) */
> > > >  	if (smp_processor_id()) {
> > > > +		arch_atomic_inc(&stopped_cpus);
> > >
> > > Why can't we use normal atomic_inc() here?
> >
> > Ah, ok, this is to deal with instrumentation and you add 'noinstr' when you
> > factor this out later on. It does, however, mean that we need to be really
> > careful with this because we're relying on (a) our atomics patching using
> > static keys and (b) static key patching not requiring stop_machine().
> >
> > In particular, we cannot backport this to kernels where the atomics were
> > patched directly.
>
> Another option here would be to use the __ll_sc_*() atomics directly, which at
> least will break the build if backported too far?

Hopefully it's sufficient just to add the right Fixes: tag and stick the
kernel version on the CC stable line.

> > > > 		while (!all_alternatives_applied)
> > > > 			cpu_relax();
> > > > 		isb();
> > > > 	} else {
> > > > 		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
> > > >
> > > > +		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
> > >
> > > and normal atomic_read() here?
> >
> > This one I'm still thinking doesn't need the arch_ prefix.
>
> We could use a regular atomic_read() here, yes.
>
> I'd used the arch_atomic_*() form for consistency with the inc().

I'd rather only use the arch_* forms where they are strictly needed, and have
a comment justifying each use.

Will
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index 3fb79b76e9d9..4f32d4425aac 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -21,9 +21,6 @@
 #define ALT_ORIG_PTR(a)		__ALT_PTR(a, orig_offset)
 #define ALT_REPL_PTR(a)		__ALT_PTR(a, alt_offset)
 
-/* Volatile, as we may be patching the guts of READ_ONCE() */
-static volatile int all_alternatives_applied;
-
 static DECLARE_BITMAP(applied_alternatives, ARM64_NCAPS);
 
 struct alt_region {
@@ -193,11 +190,17 @@ static void __nocfi __apply_alternatives(struct alt_region *region, bool is_modu
 }
 
 /*
- * We might be patching the stop_machine state machine, so implement a
- * really simple polling protocol here.
+ * Apply alternatives, ensuring that no CPUs are concurrently executing code
+ * being patched.
+ *
+ * We might be patching the stop_machine state machine or READ_ONCE(), so
+ * we implement a simple polling protocol.
  */
 static int __apply_alternatives_multi_stop(void *unused)
 {
+	/* Volatile, as we may be patching the guts of READ_ONCE() */
+	static volatile int all_alternatives_applied;
+	static atomic_t stopped_cpus = ATOMIC_INIT(0);
 	struct alt_region region = {
 		.begin	= (struct alt_instr *)__alt_instructions,
 		.end	= (struct alt_instr *)__alt_instructions_end,
@@ -205,12 +208,16 @@ static int __apply_alternatives_multi_stop(void *unused)
 
 	/* We always have a CPU 0 at this point (__init) */
 	if (smp_processor_id()) {
+		arch_atomic_inc(&stopped_cpus);
 		while (!all_alternatives_applied)
 			cpu_relax();
 		isb();
 	} else {
 		DECLARE_BITMAP(remaining_capabilities, ARM64_NPATCHABLE);
 
+		while (arch_atomic_read(&stopped_cpus) != num_online_cpus() - 1)
+			cpu_relax();
+
 		bitmap_complement(remaining_capabilities, boot_capabilities,
 				  ARM64_NPATCHABLE);
In __apply_alternatives_multi_stop() we have a "really simple polling
protocol" to avoid patching code that is concurrently executed on other
CPUs. Secondary CPUs wait for the boot CPU to signal that patching is
complete, but the boot CPU doesn't wait for secondaries to enter the
polling loop, and it's possible that patching starts while secondaries
are still within the stop_machine logic.

Let's fix this by adding a vaguely simple polling protocol where the
boot CPU waits for secondaries to signal that they have entered the
unpatchable stop function. We can use the arch_atomic_*() functions for
this, as they are not patched with alternatives.

At the same time, let's make `all_alternatives_applied` local to
__apply_alternatives_multi_stop(), since it is only used there, and this
makes the code a little clearer.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/alternative.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)