
[RFC] arm64: jump_label: Ensure patched jump_labels are visible to all CPUs

Message ID 20240731133601.3073-1-will@kernel.org (mailing list archive)
State New, archived

Commit Message

Will Deacon July 31, 2024, 1:36 p.m. UTC
Although the Arm architecture permits concurrent modification and
execution of NOP and branch instructions, it still requires some
synchronisation to ensure that other CPUs consistently execute the newly
written instruction:

 >  When the modified instructions are observable, each PE that is
 >  executing the modified instructions must execute an ISB or perform a
 >  context synchronizing event to ensure execution of the modified
 >  instructions
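
As an illustration only (this snippet is not part of the patch): on the
CPU doing the patching, the context synchronizing event can be a local
ISB (the isb() macro from <asm/barrier.h>), but that does nothing for
the other PEs, which is why they have to be interrupted one way or
another:

    /*
     * Local context synchronization only. Remote PEs still need their
     * own ISB, or an exception entry/return (e.g. from an IPI), before
     * they are guaranteed to execute the new instruction.
     */
    isb();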

Prior to commit f6cc0c501649 ("arm64: Avoid calling stop_machine() when
patching jump labels"), the arm64 jump_label patching machinery
performed synchronisation using stop_machine() after each modification.
However, this was problematic when flipping static keys from atomic
contexts (namely, the arm_arch_timer CPU hotplug startup notifier), so
we switched to the _nosync() patching routines to avoid "scheduling
while atomic" BUG()s during boot.

In hindsight, the analysis of the issue in f6cc0c501649 isn't quite
right: it cites the use of IPIs in the default patching routines as the
cause of the lockup, whereas stop_machine() does not rely on IPIs and
the I-cache invalidation is performed using __flush_icache_range(),
which elides the call to kick_all_cpus_sync(). In fact, the blocking
wait for other CPUs is what triggers the BUG() and the problem remains
even after f6cc0c501649, for example because we could block on the
jump_label_mutex. Eventually, the arm_arch_timer driver was fixed to
avoid the static key entirely in commit a862fc2254bd
("clocksource/arm_arch_timer: Remove use of workaround static key").

This all leaves the jump_label patching code in a funny situation on
arm64: we skip the synchronisation with other CPUs purely to avoid a
bug which no longer exists. Consequently, toggling a static key on one
CPU cannot be assumed to take effect on other CPUs, leading to
potential issues, for example with missing preempt notifiers.

Rather than revert f6cc0c501649 and go back to stop_machine() for each
patch site, implement arch_jump_label_transform_apply() and kick all
the other CPUs with an IPI at the end of patching.
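
With HAVE_JUMP_LABEL_BATCH defined, the generic jump_label code queues
every patch site of a key and then calls the apply hook once. Roughly
(a simplified sketch of the flow in kernel/jump_label.c, omitting the
flush-and-retry handling for architectures whose queue can fill up):

    for (; entry < stop && jump_entry_key(entry) == key; entry++)
            arch_jump_label_transform_queue(entry, jump_label_type(entry));

    arch_jump_label_transform_apply();  /* one IPI broadcast for the whole batch */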

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Fixes: f6cc0c501649 ("arm64: Avoid calling stop_machine() when patching jump labels")
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/jump_label.h |  1 +
 arch/arm64/kernel/jump_label.c      | 11 +++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

Comments

Catalin Marinas Aug. 1, 2024, 3:53 p.m. UTC | #1
On Wed, Jul 31, 2024 at 02:36:01PM +0100, Will Deacon wrote:
> Although the Arm architecture permits concurrent modification and
> execution of NOP and branch instructions, it still requires some
> synchronisation to ensure that other CPUs consistently execute the newly
> written instruction:
> 
>  >  When the modified instructions are observable, each PE that is
>  >  executing the modified instructions must execute an ISB or perform a
>  >  context synchronizing event to ensure execution of the modified
>  >  instructions
> 
> [...]

We need to keep an eye out so that the patch is not picked up for 4.19
(where the fixed commit landed), as that branch has neither the
arm_arch_timer fix nor the batch jump label support. LTS 5.4 is fine
though; it has both. If we want it in -stable, we can explicitly
mention the version in the Cc line.

The patch looks good to me.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Marc Zyngier Aug. 1, 2024, 4:33 p.m. UTC | #2
On Wed, 31 Jul 2024 14:36:01 +0100,
Will Deacon <will@kernel.org> wrote:
> 
> Although the Arm architecture permits concurrent modification and
> execution of NOP and branch instructions, it still requires some
> synchronisation to ensure that other CPUs consistently execute the newly
> written instruction:
> 
>  >  When the modified instructions are observable, each PE that is
>  >  executing the modified instructions must execute an ISB or perform a
>  >  context synchronizing event to ensure execution of the modified
>  >  instructions
> 
> [...]

Thanks for the detailed archaeology! This looks pretty reasonable to
me.

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.
Catalin Marinas Aug. 2, 2024, 4:51 p.m. UTC | #3
On Wed, 31 Jul 2024 14:36:01 +0100, Will Deacon wrote:
> Although the Arm architecture permits concurrent modification and
> execution of NOP and branch instructions, it still requires some
> synchronisation to ensure that other CPUs consistently execute the newly
> written instruction:
> 
>  >  When the modified instructions are observable, each PE that is
>  >  executing the modified instructions must execute an ISB or perform a
>  >  context synchronizing event to ensure execution of the modified
>  >  instructions
> 
> [...]

Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64: jump_label: Ensure patched jump_labels are visible to all CPUs
      https://git.kernel.org/arm64/c/cfb00a357864

Patch

diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
index 4e753908b801..a0a5bbae7229 100644
--- a/arch/arm64/include/asm/jump_label.h
+++ b/arch/arm64/include/asm/jump_label.h
@@ -13,6 +13,7 @@ 
 #include <linux/types.h>
 #include <asm/insn.h>
 
+#define HAVE_JUMP_LABEL_BATCH
 #define JUMP_LABEL_NOP_SIZE		AARCH64_INSN_SIZE
 
 #define JUMP_TABLE_ENTRY(key, label)			\
diff --git a/arch/arm64/kernel/jump_label.c b/arch/arm64/kernel/jump_label.c
index faf88ec9c48e..f63ea915d6ad 100644
--- a/arch/arm64/kernel/jump_label.c
+++ b/arch/arm64/kernel/jump_label.c
@@ -7,11 +7,12 @@ 
  */
 #include <linux/kernel.h>
 #include <linux/jump_label.h>
+#include <linux/smp.h>
 #include <asm/insn.h>
 #include <asm/patching.h>
 
-void arch_jump_label_transform(struct jump_entry *entry,
-			       enum jump_label_type type)
+bool arch_jump_label_transform_queue(struct jump_entry *entry,
+				     enum jump_label_type type)
 {
 	void *addr = (void *)jump_entry_code(entry);
 	u32 insn;
@@ -25,4 +26,10 @@  void arch_jump_label_transform(struct jump_entry *entry,
 	}
 
 	aarch64_insn_patch_text_nosync(addr, insn);
+	return true;
+}
+
+void arch_jump_label_transform_apply(void)
+{
+	kick_all_cpus_sync();
 }
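
Nothing changes for users of static keys; an illustrative example (the
key and functions below are made up for this sketch, not taken from the
patch):

    #include <linux/jump_label.h>
    #include <linux/printk.h>

    static DEFINE_STATIC_KEY_FALSE(my_key);

    /* Must be called from a context that can sleep. */
    void my_feature_enable(void)
    {
            /* Patches every site of my_key, then IPIs all CPUs. */
            static_branch_enable(&my_key);
    }

    void my_hot_path(void)
    {
            if (static_branch_unlikely(&my_key))
                    pr_info("my_key is enabled\n");
    }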