
RISC-V: Don't check text_mutex during stop_machine

Message ID 20210506071041.417854-1-palmer@dabbelt.com (mailing list archive)
State New, archived
Series RISC-V: Don't check text_mutex during stop_machine

Commit Message

Palmer Dabbelt May 6, 2021, 7:10 a.m. UTC
From: Palmer Dabbelt <palmerdabbelt@google.com>

We're currently using stop_machine() to update ftrace, which means that
the thread that takes text_mutex in ftrace_arch_code_modify_prepare() may not be the
same as the thread that eventually patches the code.  This isn't
actually a race because the lock is still held (preventing any other
concurrent accesses) and there is only one thread running during
stop_machine(), but it does trigger a lockdep failure.

This patch just elides the lockdep check during stop_machine.

Fixes: c15ac4fd60d5 ("riscv/ftrace: Add dynamic function tracer support")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
---
In theory we should be able to avoid using stop_machine() with some
clever code sequences, but that's too big of a change to be considered a
fix.  I also can't find the text I thought was in the ISA manual about
the allowed behaviors for concurrent modification of the instruction
stream, so I might have just mis-remembered that.
---
 arch/riscv/include/asm/ftrace.h |  4 ++++
 arch/riscv/kernel/ftrace.c      | 15 +++++++++++++++
 arch/riscv/kernel/patch.c       | 10 +++++++++-
 3 files changed, 28 insertions(+), 1 deletion(-)

Comments

kernel test robot May 6, 2021, 12:31 p.m. UTC | #1
Hi Palmer,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/perf/core]
[also build test ERROR on linus/master v5.12 next-20210506]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Palmer-Dabbelt/RISC-V-Don-t-check-text_mutex-during-stop_machine/20210506-151316
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ed8e50800bf4c2d904db9c75408a67085e6cca3d
config: riscv-randconfig-c023-20210506 (attached as .config)
compiler: riscv32-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/61fc593f7bfbd5a52e0e5ba3a55f02df81df08d5
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Palmer-Dabbelt/RISC-V-Don-t-check-text_mutex-during-stop_machine/20210506-151316
        git checkout 61fc593f7bfbd5a52e0e5ba3a55f02df81df08d5
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 ARCH=riscv 

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   riscv32-linux-ld: arch/riscv/kernel/patch.o: in function `patch_insn_write':
>> patch.c:(.text+0x68): undefined reference to `riscv_ftrace_in_stop_machine'
>> riscv32-linux-ld: patch.c:(.text+0x6c): undefined reference to `riscv_ftrace_in_stop_machine'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Steven Rostedt May 6, 2021, 1:20 p.m. UTC | #2
On Thu,  6 May 2021 00:10:41 -0700
Palmer Dabbelt <palmer@dabbelt.com> wrote:

> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 7f1e5203de88..da2405652f1d 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -11,6 +11,8 @@
>  #include <asm/cacheflush.h>
>  #include <asm/patch.h>
>  
> +int riscv_ftrace_in_stop_machine;
> +
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  int ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
>  {
> @@ -232,3 +234,16 @@ int ftrace_disable_ftrace_graph_caller(void)
>  }
>  #endif /* CONFIG_DYNAMIC_FTRACE */
>  #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
> +void arch_ftrace_update_code(int command)
> +{
> +	/*
> +	 * The code sequences we use for ftrace can't be patched while the
> +	 * kernel is running, so we need to use stop_machine() to modify them
> +	 * for now.  This doesn't play nice with text_mutex, we use this flag
> +	 * to elide the check.
> +	 */
> +	riscv_ftrace_in_stop_machine = true;
> +	ftrace_run_stop_machine(command);
> +	riscv_ftrace_in_stop_machine = false;
> +}
> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> index 0b552873a577..7983dba477f0 100644

This would work, but my suggestion was to do it without having to add this
arch function. Because the caller of this is:

static void ftrace_run_update_code(int command)
{
	int ret;

	ret = ftrace_arch_code_modify_prepare();
	FTRACE_WARN_ON(ret);
	if (ret)
		return;

	/*
	 * By default we use stop_machine() to modify the code.
	 * But archs can do what ever they want as long as it
	 * is safe. The stop_machine() is the safest, but also
	 * produces the most overhead.
	 */
	arch_ftrace_update_code(command);

	ret = ftrace_arch_code_modify_post_process();
	FTRACE_WARN_ON(ret);
}


Where you already have two hooks that you use to take the text_mutex before
calling arch_ftrace_update_code().

In RISC-V those are:

int ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
{
        mutex_lock(&text_mutex);
        return 0;
}

int ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
{
        mutex_unlock(&text_mutex);
        return 0;
}

Where all you have to do is change them to:

int ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
{
        mutex_lock(&text_mutex);
	riscv_ftrace_in_stop_machine = true;
        return 0;
}

int ftrace_arch_code_modify_post_process(void) __releases(&text_mutex)
{
	riscv_ftrace_in_stop_machine = false;
        mutex_unlock(&text_mutex);
        return 0;
}

And you have the exact same effect. Those functions are only used before
calling the stop machine code you have.

-- Steve
Steven Rostedt May 6, 2021, 1:25 p.m. UTC | #3
On Thu,  6 May 2021 00:10:41 -0700
Palmer Dabbelt <palmer@dabbelt.com> wrote:

> ---
> In theory we should be able to avoid using stop_machine() with some
> clever code sequences, but that's too big of a change to be considered a
> fix.  I also can't find the text I thought was in the ISA manual about
> the allowed behaviors for concurrent modification of the instruction
> stream, so I might have just mis-remembered that.
> ---

I wonder if you could at least use break points, as some other archs do
and as x86 does.

If you have this make believe machine code:

	00 00 00 00		nop

And you want to turn it into a call.

	aa 12 34 56		bl ftrace_caller

And if your architecture has a way to inject a break point on live code.
Let's call this FF for the break point code.

You can inject that first:

	FF 00 00 00

sync all CPUs, at which point all callers will hit this and jump to the
break point handler, which simply does:

	ip = ip + 4;
	return;

and returns to the instruction after this nop/call.

Change the rest of the instruction.

	FF 12 34 56

sync all CPUs so that they all see this new instruction, but are still
triggering the break point.

Then finally remove the break point.

	aa 12 34 56

And you just switched from the nop to the call without using stop machine.
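
Expressed as kernel-style C, that dance might look like the sketch below.
To be clear, write_insn_byte(), write_insn_bytes(), sync_all_cpus() and
BREAKPOINT_INSN are made-up stand-ins for whatever primitives the arch
provides (on x86 the real ones are text_poke() and text_poke_sync()):

	/*
	 * Sketch only: turn one patchable call site from its old
	 * instruction into a new one of the same length, without
	 * stop_machine(), by hiding the intermediate states behind
	 * a break point.
	 */
	static void live_patch_call_site(u8 *addr, const u8 *new_insn, size_t len)
	{
		/* Step 1: plant the break point over the first byte. */
		write_insn_byte(addr, BREAKPOINT_INSN);
		sync_all_cpus();	/* every CPU now traps at this site */

		/* Step 2: rewrite the tail while the break point hides it. */
		write_insn_bytes(addr + 1, new_insn + 1, len - 1);
		sync_all_cpus();	/* every CPU sees the new tail */

		/* Step 3: drop in the first byte of the new instruction. */
		write_insn_byte(addr, new_insn[0]);
		sync_all_cpus();
	}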

-- Steve
Palmer Dabbelt May 22, 2021, 7:32 p.m. UTC | #4
On Thu, 06 May 2021 06:25:50 PDT (-0700), rostedt@goodmis.org wrote:
> On Thu,  6 May 2021 00:10:41 -0700
> Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
>> ---
>> In theory we should be able to avoid using stop_machine() with some
>> clever code sequences, but that's too big of a change to be considered a
>> fix.  I also can't find the text I thought was in the ISA manual about
>> the allowed behaviors for concurrent modification of the instruction
>> stream, so I might have just mis-remembered that.
>> ---
>
> I wonder if you could at least use break points, as some other archs do,
> and what x86 does.
>
> If you have this make believe machine code:
>
> 	00 00 00 00		nop
>
> And you want to turn it into a call.
>
> 	aa 12 34 56		bl ftrace_caller
>
> And if your architecture has a way to inject a break point on live code.
> Let's call this FF for the break point code.
>
> You can inject that first:
>
> 	FF 00 00 00

That would rely on this "concurrent writes to the instruction stream 
don't tear" property that I thought we'd put in the ISA but I can't 
actually find in the text of the document.  That said, I hadn't actually 
thought about replacing single bytes in the instruction stream.  In 
theory we don't have anything that guarantees those won't tear, but I'm 
not sure how any reasonable implementation would actually manifest such 
tearing, so it might be an easy retrofit to the ISA.

I ran through our breakpoint encodings and don't think those help, but 
we do have 0x00000013 as a NOP (addi x0, x0, 0) which is one byte away 
from 0x00000000 (which is defined to be an invalid instruction on all 
RISC-V systems).  There's really no difference between how a breakpoint 
traps and how an illegal instruction traps on RISC-V, so we should be 
able to use that to construct some sort of sequence.

So we can start with

    nop
    nop
    nop
    nop
    nop

then do a single-byte write to turn it into

    unimp
    nop
    nop
    nop
    nop

we can then IPI to all the harts in order to get them on the same page 
about that trap, which we can then skip over.  We'll need some way to 
differentiate this from accidental executions of unimp, but we can just 
build up a table somewhere (it wasn't immediately clear how x86 does 
this).  Then we have no ordering restrictions on converting the rest of 
the stub into what's necessary to trace a function, which should look 
the same as what we have now

    unimp
    save ra, ...(sp)
    auipc ra, ...
    jalr ra, ...
    load ra, ...(sp)

At which point we can IPI everywhere to sync the instruction stream 
again, before we turn off the trap

    nop
    save ra, ...(sp)
    auipc ra, ...
    jalr ra, ...
    load ra, ...(sp)
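
As a rough sketch, and assuming both that single-byte stores to the text
mapping don't tear (the open ISA question above) and that
flush_icache_all() gets a fence.i out to every hart, the sequence could
look something like this -- patch_text_set_byte() is hypothetical, while
patch_text_nosync() is the helper we already have:

	/*
	 * Sketch only: convert the 5-nop stub into the trace call
	 * sequence, guarding the intermediate states with unimp.
	 * nop is 0x00000013, so clearing its low byte yields the
	 * all-zero instruction, which always traps.
	 */
	static void riscv_enable_trace_stub(u32 *stub, const u32 *call_seq)
	{
		/* nop -> unimp: a single-byte store to the low byte. */
		patch_text_set_byte(stub, 0x00);
		flush_icache_all();	/* all harts now trap at the stub */

		/* Rewrite instructions 2-5 behind the trap. */
		patch_text_nosync(stub + 1, &call_seq[1], 4 * sizeof(u32));
		flush_icache_all();

		/* unimp -> nop: restore the low byte. */
		patch_text_set_byte(stub, 0x13);
		flush_icache_all();
	}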

> sync all CPUs where now all callers will hit this and jump to the break
> point handler, which simply does:
>
> 	ip = ip + 4;
> 	return;
>
> and returns back to the instruction after this nop/call.

There'll be some slight headaches there because the length of the all-0 
unimplemented instruction depends on which extensions are supported, but 
we can detect ILEN at runtime so that shouldn't be a showstopper.

> Change the rest of the instruction.
>
> 	FF 12 34 56
>
> sync all CPUs so that they all see this new instruction, but are still
> triggering the break point.
>
> Then finally remove the break point.
>
> 	aa 12 34 56
>
> And you just switched from the nop to the call without using stop machine.

OK, yeah, I think that all should work.  We're still depending on this 
"single byte writes to the instruction stream don't tear" guarantee, but 
I think that's a bit pedantic.  I'll open a question against the ISA spec 
just to be sure, and link it here.

I'm probably not going to have the time to do this for a bit because I'm 
buried in this non-coherent stuff, but if someone wants to take a shot 
at it I'd be happy to take a look.

Thanks for pointing this out, this should get us away from 
stop_machine() way sooner than waiting for a multi-byte modification 
sequence would!
Steven Rostedt May 28, 2021, 10:21 p.m. UTC | #5
On Sat, 22 May 2021 12:32:05 -0700 (PDT)
Palmer Dabbelt <palmer@dabbelt.com> wrote:

> we can then IPI to all the harts in order to get them on the same page 
> about that trap, which we can then skip over.  We'll need some way to 
> differentiate this from accidental executions of unimp, but we can just 
> build up a table somewhere (it wasn't immediately clear how x86 does 

It currently uses the same code as text_poke() does, which batches
updates and keeps track of the locations that were modified. But before
that change (768ae4406a x86/ftrace: Use text_poke()), it had:

int ftrace_int3_handler(struct pt_regs *regs)
{
        unsigned long ip;

        if (WARN_ON_ONCE(!regs))
                return 0;

        ip = regs->ip - INT3_INSN_SIZE;

        if (ftrace_location(ip)) {
                int3_emulate_call(regs, (unsigned long)ftrace_regs_caller);
                return 1;
        } else if (is_ftrace_caller(ip)) {
                if (!ftrace_update_func_call) {
                        int3_emulate_jmp(regs, ip + CALL_INSN_SIZE);
                        return 1;
                }
                int3_emulate_call(regs, ftrace_update_func_call);
                return 1;
        }

        return 0;
}

That "ftrace_location()" is the table you are looking for. It will return
true if the location is registered with ftrace or not (i.e. the mcount
call).

The "int3_emulate_jmp()" is needed to handle the case that we switch from
one trampoline to another trampoline. But that's also an architecture
specific feature, and RISC-V may not have that yet.


-- Steve


> this).  Then we have no ordering restrictions on converting the rest of 
> the stub into what's necessary to trace a function, which should look 
> the same as what we have now

Patch

diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 04dad3380041..ee8f07b4dbee 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -85,4 +85,8 @@  int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
 
 #endif
 
+#ifndef __ASSEMBLY__
+extern int riscv_ftrace_in_stop_machine;
+#endif
+
 #endif /* _ASM_RISCV_FTRACE_H */
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 7f1e5203de88..da2405652f1d 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -11,6 +11,8 @@ 
 #include <asm/cacheflush.h>
 #include <asm/patch.h>
 
+int riscv_ftrace_in_stop_machine;
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 int ftrace_arch_code_modify_prepare(void) __acquires(&text_mutex)
 {
@@ -232,3 +234,16 @@  int ftrace_disable_ftrace_graph_caller(void)
 }
 #endif /* CONFIG_DYNAMIC_FTRACE */
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+void arch_ftrace_update_code(int command)
+{
+	/*
+	 * The code sequences we use for ftrace can't be patched while the
+	 * kernel is running, so we need to use stop_machine() to modify them
+	 * for now.  This doesn't play nice with text_mutex, we use this flag
+	 * to elide the check.
+	 */
+	riscv_ftrace_in_stop_machine = true;
+	ftrace_run_stop_machine(command);
+	riscv_ftrace_in_stop_machine = false;
+}
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 0b552873a577..7983dba477f0 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -11,6 +11,7 @@ 
 #include <asm/kprobes.h>
 #include <asm/cacheflush.h>
 #include <asm/fixmap.h>
+#include <asm/ftrace.h>
 #include <asm/patch.h>
 
 struct patch_insn {
@@ -59,8 +60,15 @@  static int patch_insn_write(void *addr, const void *insn, size_t len)
 	 * Before reaching here, it was expected to lock the text_mutex
 	 * already, so we don't need to give another lock here and could
 	 * ensure that it was safe between each cores.
+	 *
+	 * We're currently using stop_machine() for ftrace, and while that
+	 * ensures text_mutex is held before installing the mappings it does
+	 * not ensure text_mutex is held by the calling thread.  That's safe
+	 * but triggers a lockdep failure, so just elide it for that specific
+	 * case.
 	 */
-	lockdep_assert_held(&text_mutex);
+	if (!riscv_ftrace_in_stop_machine)
+		lockdep_assert_held(&text_mutex);
 
 	if (across_pages)
 		patch_map(addr + len, FIX_TEXT_POKE1);