diff mbox series

[v3,2/2] riscv: Fix text patching when IPI are used

Message ID 20240229121056.203419-3-alexghiti@rivosinc.com (mailing list archive)
State New
Headers show
Series riscv: fix patching with IPI | expand

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-2-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-2-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-2-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-2-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-2-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-2-test-6 warning .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-2-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-2-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-2-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-2-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-2-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-2-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Alexandre Ghiti Feb. 29, 2024, 12:10 p.m. UTC
For now, we use stop_machine() to patch the text and when we use IPIs for
remote icache flushes (which is emitted in patch_text_nosync()), the system
hangs.

So instead, make sure every CPU executes the stop_machine() patching
function and emit a local icache flush there.

Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
---
 arch/riscv/include/asm/patch.h |  1 +
 arch/riscv/kernel/ftrace.c     | 44 ++++++++++++++++++++++++++++++----
 arch/riscv/kernel/patch.c      | 16 +++++++++----
 3 files changed, 53 insertions(+), 8 deletions(-)

Comments

Conor Dooley March 4, 2024, 7:27 p.m. UTC | #1
On Thu, Feb 29, 2024 at 01:10:56PM +0100, Alexandre Ghiti wrote:
> For now, we use stop_machine() to patch the text and when we use IPIs for
> remote icache flushes (which is emitted in patch_text_nosync()), the system
> hangs.
> 
> So instead, make sure every CPU executes the stop_machine() patching
> function and emit a local icache flush there.
> 
> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> Reviewed-by: Andrea Parri <parri.andrea@gmail.com>

What commit does this fix?
Björn Töpel March 4, 2024, 8:24 p.m. UTC | #2
Conor Dooley <conor@kernel.org> writes:

> On Thu, Feb 29, 2024 at 01:10:56PM +0100, Alexandre Ghiti wrote:
>> For now, we use stop_machine() to patch the text and when we use IPIs for
>> remote icache flushes (which is emitted in patch_text_nosync()), the system
>> hangs.
>> 
>> So instead, make sure every CPU executes the stop_machine() patching
>> function and emit a local icache flush there.
>> 
>> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
>> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>> Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
>
> What commit does this fix?

Hmm. The bug is exposed when the AIA IPI are introduced, and used
(instead of the firmware-based).

I'm not sure this is something we'd like backported, but rather a
prerequisite to AIA.

@Anup @Alex WDYT?
Anup Patel March 5, 2024, 3:03 a.m. UTC | #3
On Tue, Mar 5, 2024 at 1:54 AM Björn Töpel <bjorn@kernel.org> wrote:
>
> Conor Dooley <conor@kernel.org> writes:
>
> > On Thu, Feb 29, 2024 at 01:10:56PM +0100, Alexandre Ghiti wrote:
> >> For now, we use stop_machine() to patch the text and when we use IPIs for
> >> remote icache flushes (which is emitted in patch_text_nosync()), the system
> >> hangs.
> >>
> >> So instead, make sure every CPU executes the stop_machine() patching
> >> function and emit a local icache flush there.
> >>
> >> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
> >> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> >> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> >> Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
> >
> > What commit does this fix?
>
> Hmm. The bug is exposed when the AIA IPI are introduced, and used
> (instead of the firmware-based).
>
> I'm not sure this is something we'd like backported, but rather a
> prerequisite to AIA.
>
> @Anup @Alex WDYT?
>

The current text patching never considered IPIs being injected
directly in S-mode from hart to another so we are seeing this
issue now with AIA IPIs.

We certainly don't need to backport this fix since it's more
of a preparatory fix for AIA IPIs.

Regards,
Anup
Conor Dooley March 5, 2024, 7:59 a.m. UTC | #4
On Tue, Mar 05, 2024 at 08:33:30AM +0530, Anup Patel wrote:
> On Tue, Mar 5, 2024 at 1:54 AM Björn Töpel <bjorn@kernel.org> wrote:
> >
> > Conor Dooley <conor@kernel.org> writes:
> >
> > > On Thu, Feb 29, 2024 at 01:10:56PM +0100, Alexandre Ghiti wrote:
> > >> For now, we use stop_machine() to patch the text and when we use IPIs for
> > >> remote icache flushes (which is emitted in patch_text_nosync()), the system
> > >> hangs.
> > >>
> > >> So instead, make sure every CPU executes the stop_machine() patching
> > >> function and emit a local icache flush there.
> > >>
> > >> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
> > >> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
> > >> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > >> Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
> > >
> > > What commit does this fix?
> >
> > Hmm. The bug is exposed when the AIA IPI are introduced, and used
> > (instead of the firmware-based).
> >
> > I'm not sure this is something we'd like backported, but rather a
> > prerequisite to AIA.
> >
> > @Anup @Alex WDYT?
> >
> 
> The current text patching never considered IPIs being injected
> directly in S-mode from hart to another so we are seeing this
> issue now with AIA IPIs.
> 
> We certainly don't need to backport this fix since it's more
> of a preparatory fix for AIA IPIs.

Whether or not this is backportable, if it fixes a bug, it should get
a Fixes: tag for the commit that it fixes. Fixes: isn't "the backport"
tag, cc: stable is.
Björn Töpel March 5, 2024, 8:21 a.m. UTC | #5
Conor Dooley <conor.dooley@microchip.com> writes:

> On Tue, Mar 05, 2024 at 08:33:30AM +0530, Anup Patel wrote:
>> On Tue, Mar 5, 2024 at 1:54 AM Björn Töpel <bjorn@kernel.org> wrote:
>> >
>> > Conor Dooley <conor@kernel.org> writes:
>> >
>> > > On Thu, Feb 29, 2024 at 01:10:56PM +0100, Alexandre Ghiti wrote:
>> > >> For now, we use stop_machine() to patch the text and when we use IPIs for
>> > >> remote icache flushes (which is emitted in patch_text_nosync()), the system
>> > >> hangs.
>> > >>
>> > >> So instead, make sure every CPU executes the stop_machine() patching
>> > >> function and emit a local icache flush there.
>> > >>
>> > >> Co-developed-by: Björn Töpel <bjorn@rivosinc.com>
>> > >> Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
>> > >> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>> > >> Reviewed-by: Andrea Parri <parri.andrea@gmail.com>
>> > >
>> > > What commit does this fix?
>> >
>> > Hmm. The bug is exposed when the AIA IPI are introduced, and used
>> > (instead of the firmware-based).
>> >
>> > I'm not sure this is something we'd like backported, but rather a
>> > prerequisite to AIA.
>> >
>> > @Anup @Alex WDYT?
>> >
>> 
>> The current text patching never considered IPIs being injected
>> directly in S-mode from hart to another so we are seeing this
>> issue now with AIA IPIs.
>> 
>> We certainly don't need to backport this fix since it's more
>> of a preparatory fix for AIA IPIs.
>
> Whether or not this is backportable, if it fixes a bug, it should get
> a Fixes: tag for the commit that it fixes. Fixes: isn't "the backport"
> tag, cc: stable is.

I guess the question is if this *is* a fix, or rather a change required
for AIA (not a fix).
diff mbox series

Patch

diff --git a/arch/riscv/include/asm/patch.h b/arch/riscv/include/asm/patch.h
index e88b52d39eac..9f5d6e14c405 100644
--- a/arch/riscv/include/asm/patch.h
+++ b/arch/riscv/include/asm/patch.h
@@ -6,6 +6,7 @@ 
 #ifndef _ASM_RISCV_PATCH_H
 #define _ASM_RISCV_PATCH_H
 
+int patch_insn_write(void *addr, const void *insn, size_t len);
 int patch_text_nosync(void *addr, const void *insns, size_t len);
 int patch_text_set_nosync(void *addr, u8 c, size_t len);
 int patch_text(void *addr, u32 *insns, int ninsns);
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index f5aa24d9e1c1..4f4987a6d83d 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -8,6 +8,7 @@ 
 #include <linux/ftrace.h>
 #include <linux/uaccess.h>
 #include <linux/memory.h>
+#include <linux/stop_machine.h>
 #include <asm/cacheflush.h>
 #include <asm/patch.h>
 
@@ -75,8 +76,7 @@  static int __ftrace_modify_call(unsigned long hook_pos, unsigned long target,
 		make_call_t0(hook_pos, target, call);
 
 	/* Replace the auipc-jalr pair at once. Return -EPERM on write error. */
-	if (patch_text_nosync
-	    ((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
+	if (patch_insn_write((void *)hook_pos, enable ? call : nops, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	return 0;
@@ -88,7 +88,7 @@  int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
 
 	make_call_t0(rec->ip, addr, call);
 
-	if (patch_text_nosync((void *)rec->ip, call, MCOUNT_INSN_SIZE))
+	if (patch_insn_write((void *)rec->ip, call, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	return 0;
@@ -99,7 +99,7 @@  int ftrace_make_nop(struct module *mod, struct dyn_ftrace *rec,
 {
 	unsigned int nops[2] = {NOP4, NOP4};
 
-	if (patch_text_nosync((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
+	if (patch_insn_write((void *)rec->ip, nops, MCOUNT_INSN_SIZE))
 		return -EPERM;
 
 	return 0;
@@ -134,6 +134,42 @@  int ftrace_update_ftrace_func(ftrace_func_t func)
 
 	return ret;
 }
+
+struct ftrace_modify_param {
+	int command;
+	atomic_t cpu_count;
+};
+
+static int __ftrace_modify_code(void *data)
+{
+	struct ftrace_modify_param *param = data;
+
+	if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
+		ftrace_modify_all_code(param->command);
+		/*
+		 * Make sure the patching store is effective *before* we
+		 * increment the counter which releases all waiting CPUs
+		 * by using the release variant of atomic increment. The
+		 * release pairs with the call to local_flush_icache_all()
+		 * on the waiting CPU.
+		 */
+		atomic_inc_return_release(&param->cpu_count);
+	} else {
+		while (atomic_read(&param->cpu_count) <= num_online_cpus())
+			cpu_relax();
+	}
+
+	local_flush_icache_all();
+
+	return 0;
+}
+
+void arch_ftrace_update_code(int command)
+{
+	struct ftrace_modify_param param = { command, ATOMIC_INIT(0) };
+
+	stop_machine(__ftrace_modify_code, &param, cpu_online_mask);
+}
 #endif
 
 #ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 0b5c16dfe3f4..9a1bce1adf5a 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -188,7 +188,7 @@  int patch_text_set_nosync(void *addr, u8 c, size_t len)
 }
 NOKPROBE_SYMBOL(patch_text_set_nosync);
 
-static int patch_insn_write(void *addr, const void *insn, size_t len)
+int patch_insn_write(void *addr, const void *insn, size_t len)
 {
 	size_t patched = 0;
 	size_t size;
@@ -232,15 +232,23 @@  static int patch_text_cb(void *data)
 	if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
 		for (i = 0; ret == 0 && i < patch->ninsns; i++) {
 			len = GET_INSN_LENGTH(patch->insns[i]);
-			ret = patch_text_nosync(patch->addr + i * len,
-						&patch->insns[i], len);
+			ret = patch_insn_write(patch->addr + i * len, &patch->insns[i], len);
 		}
-		atomic_inc(&patch->cpu_count);
+		/*
+		 * Make sure the patching store is effective *before* we
+		 * increment the counter which releases all waiting CPUs
+		 * by using the release variant of atomic increment. The
+		 * release pairs with the call to local_flush_icache_all()
+		 * on the waiting CPU.
+		 */
+		atomic_inc_return_release(&patch->cpu_count);
 	} else {
 		while (atomic_read(&patch->cpu_count) <= num_online_cpus())
 			cpu_relax();
 	}
 
+	local_flush_icache_all();
+
 	return ret;
 }
 NOKPROBE_SYMBOL(patch_text_cb);