diff mbox series

riscv: Fix early ftrace nop patching

Message ID 20240523115134.70380-1-alexghiti@rivosinc.com (mailing list archive)
State Accepted
Commit 6ca445d8af0ed5950ebf899415fd6bfcd7d9d7a3
Headers show
Series riscv: Fix early ftrace nop patching | expand

Checks

Context Check Description
conchuod/vmtest-for-next-PR fail PR summary
conchuod/patch-1-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-1-test-2 fail .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-1-test-3 fail .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-1-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-1-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-1-test-6 warning .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-1-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-1-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-1-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-1-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-1-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-1-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Alexandre Ghiti May 23, 2024, 11:51 a.m. UTC
Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
converted ftrace_make_nop() to use patch_insn_write() which does not
emit any icache flush relying entirely on __ftrace_modify_code() to do
that.

But we missed that ftrace_make_nop() was called very early directly when
converting mcount calls into nops (actually on riscv it converts 2B nops
emitted by the compiler into 4B nops).

This caused crashes on multiple HW as reported by Conor and Björn since
the booting core could have half-patched instructions in its icache
which would trigger an illegal instruction trap: fix this by emitting a
local flush icache when early patching nops.

Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/include/asm/cacheflush.h | 6 ++++++
 arch/riscv/kernel/ftrace.c          | 3 +++
 2 files changed, 9 insertions(+)

Comments

Björn Töpel May 23, 2024, 12:48 p.m. UTC | #1
Alexandre Ghiti <alexghiti@rivosinc.com> writes:

> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> converted ftrace_make_nop() to use patch_insn_write() which does not
> emit any icache flush relying entirely on __ftrace_modify_code() to do
> that.
>
> But we missed that ftrace_make_nop() was called very early directly when
> converting mcount calls into nops (actually on riscv it converts 2B nops
> emitted by the compiler into 4B nops).
>
> This caused crashes on multiple HW as reported by Conor and Björn since
> the booting core could have half-patched instructions in its icache
> which would trigger an illegal instruction trap: fix this by emitting a
> local flush icache when early patching nops.
>
> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Nice!

I've manged to reproduce the crash on the VisionFive2 board (however
only triggered when CONFIG_RELOCATABLE=y), and can verify that this fix
solves the issue.

Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Conor Dooley May 23, 2024, 2:13 p.m. UTC | #2
On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> converted ftrace_make_nop() to use patch_insn_write() which does not
> emit any icache flush relying entirely on __ftrace_modify_code() to do
> that.
> 
> But we missed that ftrace_make_nop() was called very early directly when
> converting mcount calls into nops (actually on riscv it converts 2B nops
> emitted by the compiler into 4B nops).
> 
> This caused crashes on multiple HW as reported by Conor and Björn since
> the booting core could have half-patched instructions in its icache
> which would trigger an illegal instruction trap: fix this by emitting a
> local flush icache when early patching nops.
> 
> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Reported-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>

Thanks for the quick fix Alex :)
patchwork-bot+linux-riscv@kernel.org May 23, 2024, 6 p.m. UTC | #3
Hello:

This patch was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Thu, 23 May 2024 13:51:34 +0200 you wrote:
> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> converted ftrace_make_nop() to use patch_insn_write() which does not
> emit any icache flush relying entirely on __ftrace_modify_code() to do
> that.
> 
> But we missed that ftrace_make_nop() was called very early directly when
> converting mcount calls into nops (actually on riscv it converts 2B nops
> emitted by the compiler into 4B nops).
> 
> [...]

Here is the summary with links:
  - riscv: Fix early ftrace nop patching
    https://git.kernel.org/riscv/c/6ca445d8af0e

You are awesome, thank you!
Conor Dooley June 13, 2024, 7:48 a.m. UTC | #4
On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> converted ftrace_make_nop() to use patch_insn_write() which does not
> emit any icache flush relying entirely on __ftrace_modify_code() to do
> that.
> 
> But we missed that ftrace_make_nop() was called very early directly when
> converting mcount calls into nops (actually on riscv it converts 2B nops
> emitted by the compiler into 4B nops).
> 
> This caused crashes on multiple HW as reported by Conor and Björn since
> the booting core could have half-patched instructions in its icache
> which would trigger an illegal instruction trap: fix this by emitting a
> local flush icache when early patching nops.
> 
> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/include/asm/cacheflush.h | 6 ++++++
>  arch/riscv/kernel/ftrace.c          | 3 +++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> index dd8d07146116..ce79c558a4c8 100644
> --- a/arch/riscv/include/asm/cacheflush.h
> +++ b/arch/riscv/include/asm/cacheflush.h
> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
>  	asm volatile ("fence.i" ::: "memory");
>  }
>  
> +static inline void local_flush_icache_range(unsigned long start,
> +					    unsigned long end)
> +{
> +	local_flush_icache_all();
> +}
> +
>  #define PG_dcache_clean PG_arch_1
>  
>  static inline void flush_dcache_folio(struct folio *folio)
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 4f4987a6d83d..32e7c401dfb4 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>  	out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>  	mutex_unlock(&text_mutex);

So, turns out that this patch is not sufficient. I've seen some more
crashes, seemingly due to initialising nops on this mutex_unlock().
Palmer suggested moving the if (!mod) ... into the lock, which fixed
the problem for me.

>  
> +	if (!mod)
> +		local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
> +
>  	return out;
>  }
>  
> -- 
> 2.39.2
>
Alexandre Ghiti June 17, 2024, 1:23 p.m. UTC | #5
Hi Conor,

Sorry for the delay here.

On 13/06/2024 09:48, Conor Dooley wrote:
> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
>> converted ftrace_make_nop() to use patch_insn_write() which does not
>> emit any icache flush relying entirely on __ftrace_modify_code() to do
>> that.
>>
>> But we missed that ftrace_make_nop() was called very early directly when
>> converting mcount calls into nops (actually on riscv it converts 2B nops
>> emitted by the compiler into 4B nops).
>>
>> This caused crashes on multiple HW as reported by Conor and Björn since
>> the booting core could have half-patched instructions in its icache
>> which would trigger an illegal instruction trap: fix this by emitting a
>> local flush icache when early patching nops.
>>
>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>> ---
>>   arch/riscv/include/asm/cacheflush.h | 6 ++++++
>>   arch/riscv/kernel/ftrace.c          | 3 +++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
>> index dd8d07146116..ce79c558a4c8 100644
>> --- a/arch/riscv/include/asm/cacheflush.h
>> +++ b/arch/riscv/include/asm/cacheflush.h
>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
>>   	asm volatile ("fence.i" ::: "memory");
>>   }
>>   
>> +static inline void local_flush_icache_range(unsigned long start,
>> +					    unsigned long end)
>> +{
>> +	local_flush_icache_all();
>> +}
>> +
>>   #define PG_dcache_clean PG_arch_1
>>   
>>   static inline void flush_dcache_folio(struct folio *folio)
>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>> index 4f4987a6d83d..32e7c401dfb4 100644
>> --- a/arch/riscv/kernel/ftrace.c
>> +++ b/arch/riscv/kernel/ftrace.c
>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>>   	out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>>   	mutex_unlock(&text_mutex);
> So, turns out that this patch is not sufficient. I've seen some more
> crashes, seemingly due to initialising nops on this mutex_unlock().
> Palmer suggested moving the if (!mod) ... into the lock, which fixed
> the problem for me.


Ok, it makes sense, I completely missed that. I'll send a fix for that 
shortly so that it can be merged in rc5.

Thanks,

Alex


>
>>   
>> +	if (!mod)
>> +		local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
>> +
>>   	return out;
>>   }
>>   
>> -- 
>> 2.39.2
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
Alexandre Ghiti June 18, 2024, 12:02 p.m. UTC | #6
Hi Conor,

On 17/06/2024 15:23, Alexandre Ghiti wrote:
> Hi Conor,
>
> Sorry for the delay here.
>
> On 13/06/2024 09:48, Conor Dooley wrote:
>> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
>>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
>>> converted ftrace_make_nop() to use patch_insn_write() which does not
>>> emit any icache flush relying entirely on __ftrace_modify_code() to do
>>> that.
>>>
>>> But we missed that ftrace_make_nop() was called very early directly 
>>> when
>>> converting mcount calls into nops (actually on riscv it converts 2B 
>>> nops
>>> emitted by the compiler into 4B nops).
>>>
>>> This caused crashes on multiple HW as reported by Conor and Björn since
>>> the booting core could have half-patched instructions in its icache
>>> which would trigger an illegal instruction trap: fix this by emitting a
>>> local flush icache when early patching nops.
>>>
>>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
>>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>> ---
>>>   arch/riscv/include/asm/cacheflush.h | 6 ++++++
>>>   arch/riscv/kernel/ftrace.c          | 3 +++
>>>   2 files changed, 9 insertions(+)
>>>
>>> diff --git a/arch/riscv/include/asm/cacheflush.h 
>>> b/arch/riscv/include/asm/cacheflush.h
>>> index dd8d07146116..ce79c558a4c8 100644
>>> --- a/arch/riscv/include/asm/cacheflush.h
>>> +++ b/arch/riscv/include/asm/cacheflush.h
>>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
>>>       asm volatile ("fence.i" ::: "memory");
>>>   }
>>>   +static inline void local_flush_icache_range(unsigned long start,
>>> +                        unsigned long end)
>>> +{
>>> +    local_flush_icache_all();
>>> +}
>>> +
>>>   #define PG_dcache_clean PG_arch_1
>>>     static inline void flush_dcache_folio(struct folio *folio)
>>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>>> index 4f4987a6d83d..32e7c401dfb4 100644
>>> --- a/arch/riscv/kernel/ftrace.c
>>> +++ b/arch/riscv/kernel/ftrace.c
>>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct 
>>> dyn_ftrace *rec)
>>>       out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>>>       mutex_unlock(&text_mutex);
>> So, turns out that this patch is not sufficient. I've seen some more
>> crashes, seemingly due to initialising nops on this mutex_unlock().
>> Palmer suggested moving the if (!mod) ... into the lock, which fixed
>> the problem for me.
>
>
> Ok, it makes sense, I completely missed that. I'll send a fix for that 
> shortly so that it can be merged in rc5.
>
> Thanks,
>
> Alex


So I digged a bit more and I'm afraid that the only way to make sure 
this issue does not happen elsewhere is to flush the icache right after 
the patching. We actually can't wait to batch the icache flush since 
along the way, we may call a function that has just been patched (the 
issue that you encountered here).

I don't know how much it will impact the performance but I guess it will.

Unless someone has a better idea (I added Andy and Puranjay in cc), here 
is the patch that implements this, can you give it a try? Thanks

diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 87cbd86576b2..4b95c574fd04 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct 
dyn_ftrace *rec)
         out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
         mutex_unlock(&text_mutex);

-       if (!mod)
-               local_flush_icache_range(rec->ip, rec->ip + 
MCOUNT_INSN_SIZE);
-
         return out;
  }

@@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
         } else {
                 while (atomic_read(&param->cpu_count) <= num_online_cpus())
                         cpu_relax();
-       }

-       local_flush_icache_all();
+               local_flush_icache_all();
+       }

         return 0;
  }
diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
index 4007563fb607..ab03732d06c4 100644
--- a/arch/riscv/kernel/patch.c
+++ b/arch/riscv/kernel/patch.c
@@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)

         memset(waddr, c, len);

+       /*
+        * We could have just patched a function that is about to be
+        * called so make sure we don't execute partially patched
+        * instructions by flushing the icache as soon as possible.
+        */
+       local_flush_icache_range((unsigned long)waddr,
+                                (unsigned long)waddr + len);
+
         patch_unmap(FIX_TEXT_POKE0);

         if (across_pages)
@@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const 
void *insn, size_t len)

         ret = copy_to_kernel_nofault(waddr, insn, len);

+       /*
+        * We could have just patched a function that is about to be
+        * called so make sure we don't execute partially patched
+        * instructions by flushing the icache as soon as possible.
+        */
+       local_flush_icache_range((unsigned long)waddr,
+                                (unsigned long)waddr + len);
+
         patch_unmap(FIX_TEXT_POKE0);

         if (across_pages)
@@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len)

         ret = patch_insn_set(tp, c, len);

-       if (!ret)
-               flush_icache_range((uintptr_t)tp, (uintptr_t)tp + len);
-
         return ret;
  }
  NOKPROBE_SYMBOL(patch_text_set_nosync);
@@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void *insns, 
size_t len)

         ret = patch_insn_write(tp, insns, len);

-       if (!ret)
-               flush_icache_range((uintptr_t) tp, (uintptr_t) tp + len);
-
         return ret;
  }
  NOKPROBE_SYMBOL(patch_text_nosync);
@@ -253,9 +263,9 @@ static int patch_text_cb(void *data)
         } else {
                 while (atomic_read(&patch->cpu_count) <= num_online_cpus())
                         cpu_relax();
-       }

-       local_flush_icache_all();
+               local_flush_icache_all();
+       }

         return ret;
  }


>
>
>>
>>>   +    if (!mod)
>>> +        local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
>>> +
>>>       return out;
>>>   }
>>>   --
>>> 2.39.2
>>>
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Andy Chiu June 18, 2024, 12:48 p.m. UTC | #7
On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Conor,
>
> On 17/06/2024 15:23, Alexandre Ghiti wrote:
> > Hi Conor,
> >
> > Sorry for the delay here.
> >
> > On 13/06/2024 09:48, Conor Dooley wrote:
> >> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> >>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> >>> converted ftrace_make_nop() to use patch_insn_write() which does not
> >>> emit any icache flush relying entirely on __ftrace_modify_code() to do
> >>> that.
> >>>
> >>> But we missed that ftrace_make_nop() was called very early directly
> >>> when
> >>> converting mcount calls into nops (actually on riscv it converts 2B
> >>> nops
> >>> emitted by the compiler into 4B nops).
> >>>
> >>> This caused crashes on multiple HW as reported by Conor and Björn since
> >>> the booting core could have half-patched instructions in its icache
> >>> which would trigger an illegal instruction trap: fix this by emitting a
> >>> local flush icache when early patching nops.
> >>>
> >>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> >>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> >>> ---
> >>>   arch/riscv/include/asm/cacheflush.h | 6 ++++++
> >>>   arch/riscv/kernel/ftrace.c          | 3 +++
> >>>   2 files changed, 9 insertions(+)
> >>>
> >>> diff --git a/arch/riscv/include/asm/cacheflush.h
> >>> b/arch/riscv/include/asm/cacheflush.h
> >>> index dd8d07146116..ce79c558a4c8 100644
> >>> --- a/arch/riscv/include/asm/cacheflush.h
> >>> +++ b/arch/riscv/include/asm/cacheflush.h
> >>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
> >>>       asm volatile ("fence.i" ::: "memory");
> >>>   }
> >>>   +static inline void local_flush_icache_range(unsigned long start,
> >>> +                        unsigned long end)
> >>> +{
> >>> +    local_flush_icache_all();
> >>> +}
> >>> +
> >>>   #define PG_dcache_clean PG_arch_1
> >>>     static inline void flush_dcache_folio(struct folio *folio)
> >>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> >>> index 4f4987a6d83d..32e7c401dfb4 100644
> >>> --- a/arch/riscv/kernel/ftrace.c
> >>> +++ b/arch/riscv/kernel/ftrace.c
> >>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
> >>> dyn_ftrace *rec)
> >>>       out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> >>>       mutex_unlock(&text_mutex);
> >> So, turns out that this patch is not sufficient. I've seen some more
> >> crashes, seemingly due to initialising nops on this mutex_unlock().
> >> Palmer suggested moving the if (!mod) ... into the lock, which fixed
> >> the problem for me.
> >
> >
> > Ok, it makes sense, I completely missed that. I'll send a fix for that
> > shortly so that it can be merged in rc5.
> >
> > Thanks,
> >
> > Alex
>
>
> So I digged a bit more and I'm afraid that the only way to make sure
> this issue does not happen elsewhere is to flush the icache right after
> the patching. We actually can't wait to batch the icache flush since
> along the way, we may call a function that has just been patched (the
> issue that you encountered here).

Trying to provide my thoughts, please let me know if I missed
anything. I think what Conor suggested is safe for init_nop, as it
would be called only when there is only one core (booting) and at the
loading stage of kernel modules. In the first case we just have to
make sure there is no patchable entry before the core executes
fence.i. The second case is unconditionally safe because there is no
read-side of the race.

It is a bit tricky for the generic (runtime) case of ftrace code
patching, but that is not because of the batch fence.i maintenance. As
long as there exists a patchable entry for the stopping thread, it is
possible for them to execute on a partially patched instruction. A
solution for this is again to prevent any patchable entry in the
stop_machine's stopping thread. Another solution is to apply the
atomic ftrace patching series which aims to get rid of the race.

>
> I don't know how much it will impact the performance but I guess it will.
>
> Unless someone has a better idea (I added Andy and Puranjay in cc), here
> is the patch that implements this, can you give it a try? Thanks
>
> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> index 87cbd86576b2..4b95c574fd04 100644
> --- a/arch/riscv/kernel/ftrace.c
> +++ b/arch/riscv/kernel/ftrace.c
> @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct
> dyn_ftrace *rec)
>          out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>          mutex_unlock(&text_mutex);
>
> -       if (!mod)
> -               local_flush_icache_range(rec->ip, rec->ip +
> MCOUNT_INSN_SIZE);
> -
>          return out;
>   }
>
> @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
>          } else {
>                  while (atomic_read(&param->cpu_count) <= num_online_cpus())
>                          cpu_relax();
> -       }
>
> -       local_flush_icache_all();
> +               local_flush_icache_all();
> +       }
>
>          return 0;
>   }
> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> index 4007563fb607..ab03732d06c4 100644
> --- a/arch/riscv/kernel/patch.c
> +++ b/arch/riscv/kernel/patch.c
> @@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)
>
>          memset(waddr, c, len);
>
> +       /*
> +        * We could have just patched a function that is about to be
> +        * called so make sure we don't execute partially patched
> +        * instructions by flushing the icache as soon as possible.
> +        */
> +       local_flush_icache_range((unsigned long)waddr,
> +                                (unsigned long)waddr + len);
> +
>          patch_unmap(FIX_TEXT_POKE0);
>
>          if (across_pages)
> @@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const
> void *insn, size_t len)
>
>          ret = copy_to_kernel_nofault(waddr, insn, len);
>
> +       /*
> +        * We could have just patched a function that is about to be
> +        * called so make sure we don't execute partially patched
> +        * instructions by flushing the icache as soon as possible.
> +        */
> +       local_flush_icache_range((unsigned long)waddr,
> +                                (unsigned long)waddr + len);
> +
>          patch_unmap(FIX_TEXT_POKE0);
>
>          if (across_pages)
> @@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len)
>
>          ret = patch_insn_set(tp, c, len);
>
> -       if (!ret)
> -               flush_icache_range((uintptr_t)tp, (uintptr_t)tp + len);
> -
>          return ret;
>   }
>   NOKPROBE_SYMBOL(patch_text_set_nosync);
> @@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void *insns,
> size_t len)
>
>          ret = patch_insn_write(tp, insns, len);
>
> -       if (!ret)
> -               flush_icache_range((uintptr_t) tp, (uintptr_t) tp + len);
> -
>          return ret;
>   }
>   NOKPROBE_SYMBOL(patch_text_nosync);
> @@ -253,9 +263,9 @@ static int patch_text_cb(void *data)
>          } else {
>                  while (atomic_read(&patch->cpu_count) <= num_online_cpus())
>                          cpu_relax();
> -       }
>
> -       local_flush_icache_all();
> +               local_flush_icache_all();
> +       }
>
>          return ret;
>   }
>
>
> >
> >
> >>
> >>>   +    if (!mod)
> >>> +        local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
> >>> +
> >>>       return out;
> >>>   }
> >>>   --
> >>> 2.39.2
> >>>
> >>>
> >>> _______________________________________________
> >>> linux-riscv mailing list
> >>> linux-riscv@lists.infradead.org
> >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv

Cheers,
Andy
Alexandre Ghiti June 18, 2024, 1:39 p.m. UTC | #8
Hi Andy,

On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu <andy.chiu@sifive.com> wrote:
>
> On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
> >
> > Hi Conor,
> >
> > On 17/06/2024 15:23, Alexandre Ghiti wrote:
> > > Hi Conor,
> > >
> > > Sorry for the delay here.
> > >
> > > On 13/06/2024 09:48, Conor Dooley wrote:
> > >> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> > >>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> > >>> converted ftrace_make_nop() to use patch_insn_write() which does not
> > >>> emit any icache flush relying entirely on __ftrace_modify_code() to do
> > >>> that.
> > >>>
> > >>> But we missed that ftrace_make_nop() was called very early directly
> > >>> when
> > >>> converting mcount calls into nops (actually on riscv it converts 2B
> > >>> nops
> > >>> emitted by the compiler into 4B nops).
> > >>>
> > >>> This caused crashes on multiple HW as reported by Conor and Björn since
> > >>> the booting core could have half-patched instructions in its icache
> > >>> which would trigger an illegal instruction trap: fix this by emitting a
> > >>> local flush icache when early patching nops.
> > >>>
> > >>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> > >>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > >>> ---
> > >>>   arch/riscv/include/asm/cacheflush.h | 6 ++++++
> > >>>   arch/riscv/kernel/ftrace.c          | 3 +++
> > >>>   2 files changed, 9 insertions(+)
> > >>>
> > >>> diff --git a/arch/riscv/include/asm/cacheflush.h
> > >>> b/arch/riscv/include/asm/cacheflush.h
> > >>> index dd8d07146116..ce79c558a4c8 100644
> > >>> --- a/arch/riscv/include/asm/cacheflush.h
> > >>> +++ b/arch/riscv/include/asm/cacheflush.h
> > >>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
> > >>>       asm volatile ("fence.i" ::: "memory");
> > >>>   }
> > >>>   +static inline void local_flush_icache_range(unsigned long start,
> > >>> +                        unsigned long end)
> > >>> +{
> > >>> +    local_flush_icache_all();
> > >>> +}
> > >>> +
> > >>>   #define PG_dcache_clean PG_arch_1
> > >>>     static inline void flush_dcache_folio(struct folio *folio)
> > >>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > >>> index 4f4987a6d83d..32e7c401dfb4 100644
> > >>> --- a/arch/riscv/kernel/ftrace.c
> > >>> +++ b/arch/riscv/kernel/ftrace.c
> > >>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
> > >>> dyn_ftrace *rec)
> > >>>       out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > >>>       mutex_unlock(&text_mutex);
> > >> So, turns out that this patch is not sufficient. I've seen some more
> > >> crashes, seemingly due to initialising nops on this mutex_unlock().
> > >> Palmer suggested moving the if (!mod) ... into the lock, which fixed
> > >> the problem for me.
> > >
> > >
> > > Ok, it makes sense, I completely missed that. I'll send a fix for that
> > > shortly so that it can be merged in rc5.
> > >
> > > Thanks,
> > >
> > > Alex
> >
> >
> > So I digged a bit more and I'm afraid that the only way to make sure
> > this issue does not happen elsewhere is to flush the icache right after
> > the patching. We actually can't wait to batch the icache flush since
> > along the way, we may call a function that has just been patched (the
> > issue that you encountered here).
>
> Trying to provide my thoughts, please let me know if I missed
> anything. I think what Conor suggested is safe for init_nop, as it
> would be called only when there is only one core (booting) and at the
> loading stage of kernel modules. In the first case we just have to
> make sure there is no patchable entry before the core executes
> fence.i. The second case is unconditionally safe because there is no
> read-side of the race.
>
> It is a bit tricky for the generic (runtime) case of ftrace code
> patching, but that is not because of the batch fence.i maintenance. As
> long as there exists a patchable entry for the stopping thread, it is
> possible for them to execute on a partially patched instruction. A
> solution for this is again to prevent any patchable entry in the
> stop_machine's stopping thread. Another solution is to apply the
> atomic ftrace patching series which aims to get rid of the race.

Yeah but my worry is that we can't make sure that functions called
between the patching and the fence.i are not the ones that just get
patched. At least today, patch_insn_write() etc should be marked as
noinstr. But even then, we call core functions that could be patchable
(I went through all and it *seems* we are safe *for now*, but this is
very fragile).

Then what do you think we should do to fix Conor's issue right now:
should we mark the riscv specific functions as noinstr, cross fingers
that the core functions are not (and don't become) patchable and wait
for your atomic patching? Or should we flush as soon as possible as I
proposed above (which to me fixes the issue but hurts performance)?

Thanks,

Alex

>
> >
> > I don't know how much it will impact the performance but I guess it will.
> >
> > Unless someone has a better idea (I added Andy and Puranjay in cc), here
> > is the patch that implements this, can you give it a try? Thanks
> >
> > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > index 87cbd86576b2..4b95c574fd04 100644
> > --- a/arch/riscv/kernel/ftrace.c
> > +++ b/arch/riscv/kernel/ftrace.c
> > @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct
> > dyn_ftrace *rec)
> >          out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> >          mutex_unlock(&text_mutex);
> >
> > -       if (!mod)
> > -               local_flush_icache_range(rec->ip, rec->ip +
> > MCOUNT_INSN_SIZE);
> > -
> >          return out;
> >   }
> >
> > @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
> >          } else {
> >                  while (atomic_read(&param->cpu_count) <= num_online_cpus())
> >                          cpu_relax();
> > -       }
> >
> > -       local_flush_icache_all();
> > +               local_flush_icache_all();
> > +       }
> >
> >          return 0;
> >   }
> > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > index 4007563fb607..ab03732d06c4 100644
> > --- a/arch/riscv/kernel/patch.c
> > +++ b/arch/riscv/kernel/patch.c
> > @@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)
> >
> >          memset(waddr, c, len);
> >
> > +       /*
> > +        * We could have just patched a function that is about to be
> > +        * called so make sure we don't execute partially patched
> > +        * instructions by flushing the icache as soon as possible.
> > +        */
> > +       local_flush_icache_range((unsigned long)waddr,
> > +                                (unsigned long)waddr + len);
> > +
> >          patch_unmap(FIX_TEXT_POKE0);
> >
> >          if (across_pages)
> > @@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const
> > void *insn, size_t len)
> >
> >          ret = copy_to_kernel_nofault(waddr, insn, len);
> >
> > +       /*
> > +        * We could have just patched a function that is about to be
> > +        * called so make sure we don't execute partially patched
> > +        * instructions by flushing the icache as soon as possible.
> > +        */
> > +       local_flush_icache_range((unsigned long)waddr,
> > +                                (unsigned long)waddr + len);
> > +
> >          patch_unmap(FIX_TEXT_POKE0);
> >
> >          if (across_pages)
> > @@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len)
> >
> >          ret = patch_insn_set(tp, c, len);
> >
> > -       if (!ret)
> > -               flush_icache_range((uintptr_t)tp, (uintptr_t)tp + len);
> > -
> >          return ret;
> >   }
> >   NOKPROBE_SYMBOL(patch_text_set_nosync);
> > @@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void *insns,
> > size_t len)
> >
> >          ret = patch_insn_write(tp, insns, len);
> >
> > -       if (!ret)
> > -               flush_icache_range((uintptr_t) tp, (uintptr_t) tp + len);
> > -
> >          return ret;
> >   }
> >   NOKPROBE_SYMBOL(patch_text_nosync);
> > @@ -253,9 +263,9 @@ static int patch_text_cb(void *data)
> >          } else {
> >                  while (atomic_read(&patch->cpu_count) <= num_online_cpus())
> >                          cpu_relax();
> > -       }
> >
> > -       local_flush_icache_all();
> > +               local_flush_icache_all();
> > +       }
> >
> >          return ret;
> >   }
> >
> >
> > >
> > >
> > >>
> > >>>   +    if (!mod)
> > >>> +        local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
> > >>> +
> > >>>       return out;
> > >>>   }
> > >>>   --
> > >>> 2.39.2
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> linux-riscv mailing list
> > >>> linux-riscv@lists.infradead.org
> > >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> > >
> > > _______________________________________________
> > > linux-riscv mailing list
> > > linux-riscv@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> Cheers,
> Andy
Andy Chiu June 19, 2024, 3:40 a.m. UTC | #9
On Tue, Jun 18, 2024 at 9:40 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Hi Andy,
>
> On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu <andy.chiu@sifive.com> wrote:
> >
> > On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
> > >
> > > Hi Conor,
> > >
> > > On 17/06/2024 15:23, Alexandre Ghiti wrote:
> > > > Hi Conor,
> > > >
> > > > Sorry for the delay here.
> > > >
> > > > On 13/06/2024 09:48, Conor Dooley wrote:
> > > >> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> > > >>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> > > >>> converted ftrace_make_nop() to use patch_insn_write() which does not
> > > >>> emit any icache flush relying entirely on __ftrace_modify_code() to do
> > > >>> that.
> > > >>>
> > > >>> But we missed that ftrace_make_nop() was called very early directly
> > > >>> when
> > > >>> converting mcount calls into nops (actually on riscv it converts 2B
> > > >>> nops
> > > >>> emitted by the compiler into 4B nops).
> > > >>>
> > > >>> This caused crashes on multiple HW as reported by Conor and Björn since
> > > >>> the booting core could have half-patched instructions in its icache
> > > >>> which would trigger an illegal instruction trap: fix this by emitting a
> > > >>> local flush icache when early patching nops.
> > > >>>
> > > >>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> > > >>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> > > >>> ---
> > > >>>   arch/riscv/include/asm/cacheflush.h | 6 ++++++
> > > >>>   arch/riscv/kernel/ftrace.c          | 3 +++
> > > >>>   2 files changed, 9 insertions(+)
> > > >>>
> > > >>> diff --git a/arch/riscv/include/asm/cacheflush.h
> > > >>> b/arch/riscv/include/asm/cacheflush.h
> > > >>> index dd8d07146116..ce79c558a4c8 100644
> > > >>> --- a/arch/riscv/include/asm/cacheflush.h
> > > >>> +++ b/arch/riscv/include/asm/cacheflush.h
> > > >>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
> > > >>>       asm volatile ("fence.i" ::: "memory");
> > > >>>   }
> > > >>>   +static inline void local_flush_icache_range(unsigned long start,
> > > >>> +                        unsigned long end)
> > > >>> +{
> > > >>> +    local_flush_icache_all();
> > > >>> +}
> > > >>> +
> > > >>>   #define PG_dcache_clean PG_arch_1
> > > >>>     static inline void flush_dcache_folio(struct folio *folio)
> > > >>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > > >>> index 4f4987a6d83d..32e7c401dfb4 100644
> > > >>> --- a/arch/riscv/kernel/ftrace.c
> > > >>> +++ b/arch/riscv/kernel/ftrace.c
> > > >>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
> > > >>> dyn_ftrace *rec)
> > > >>>       out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > > >>>       mutex_unlock(&text_mutex);
> > > >> So, turns out that this patch is not sufficient. I've seen some more
> > > >> crashes, seemingly due to initialising nops on this mutex_unlock().
> > > >> Palmer suggested moving the if (!mod) ... into the lock, which fixed
> > > >> the problem for me.
> > > >
> > > >
> > > > Ok, it makes sense, I completely missed that. I'll send a fix for that
> > > > shortly so that it can be merged in rc5.
> > > >
> > > > Thanks,
> > > >
> > > > Alex
> > >
> > >
> > > So I digged a bit more and I'm afraid that the only way to make sure
> > > this issue does not happen elsewhere is to flush the icache right after
> > > the patching. We actually can't wait to batch the icache flush since
> > > along the way, we may call a function that has just been patched (the
> > > issue that you encountered here).
> >
> > Trying to provide my thoughts, please let me know if I missed
> > anything. I think what Conor suggested is safe for init_nop, as it
> > would be called only when there is only one core (booting) and at the
> > loading stage of kernel modules. In the first case we just have to
> > make sure there is no patchable entry before the core executes
> > fence.i. The second case is unconditionally safe because there is no
> > read-side of the race.
> >
> > It is a bit tricky for the generic (runtime) case of ftrace code
> > patching, but that is not because of the batch fence.i maintenance. As
> > long as there exists a patchable entry for the stopping thread, it is
> > possible for them to execute on a partially patched instruction. A
> > solution for this is again to prevent any patchable entry in the
> > stop_machine's stopping thread. Another solution is to apply the
> > atomic ftrace patching series which aims to get rid of the race.
>
> Yeah but my worry is that we can't make sure that functions called
> between the patching and the fence.i are not the ones that just get
> patched. At least today, patch_insn_write() etc should be marked as

IIUC, the compiler does not generate a patchable entry for
patch_insn_write, and so do all functions in patch.c. The entire file
is not compiled with patchable entry because the flag is removed in
riscv's Makefile. Please let me know if I misunderstand it.

> noinstr. But even then, we call core functions that could be patchable
> (I went through all and it *seems* we are safe *for now*, but this is
> very fragile).

Yes, functions in the "else" clause, atomic_read() etc, are safe for
now. However, it is impossible to fix as long as there exists a
patchable entry along the way. This is the problem that we cannot
patch 2 instructions with a concurrent read-side. On the other hand,
the path of ftrace_modify_all_code may experience the batch fence.i
maintenance issue, and can be fixed via a local fence.i. This is
because the path is executed by only one core. There does not exist a
concurrent write-side. As a result, we just need to make sure it
leaves each patch-site with a local fence.i to make sure code is
up-to-date.

>
> Then what do you think we should do to fix Conor's issue right now:
> should we mark the riscv specific functions as noinstr, cross fingers
> that the core functions are not (and don't become) patchable and wait
> for your atomic patching? Or should we flush as soon as possible as I
> proposed above (which to me fixes the issue but hurts performance)?

Either way works for me. IMHO, the fix should include:

1. move fence.i before mutex_unlock in init_nop
2. do local fence.i in __ftrace_modify_call
3. make sure the else clause does not happen to have a patchable entry

Thanks,
Andy

>
> Thanks,
>
> Alex
>
> >
> > >
> > > I don't know how much it will impact the performance but I guess it will.
> > >
> > > Unless someone has a better idea (I added Andy and Puranjay in cc), here
> > > is the patch that implements this, can you give it a try? Thanks
> > >
> > > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > > index 87cbd86576b2..4b95c574fd04 100644
> > > --- a/arch/riscv/kernel/ftrace.c
> > > +++ b/arch/riscv/kernel/ftrace.c
> > > @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct
> > > dyn_ftrace *rec)
> > >          out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > >          mutex_unlock(&text_mutex);
> > >
> > > -       if (!mod)
> > > -               local_flush_icache_range(rec->ip, rec->ip +
> > > MCOUNT_INSN_SIZE);
> > > -
> > >          return out;
> > >   }
> > >
> > > @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
> > >          } else {
> > >                  while (atomic_read(&param->cpu_count) <= num_online_cpus())
> > >                          cpu_relax();
> > > -       }
> > >
> > > -       local_flush_icache_all();
> > > +               local_flush_icache_all();
> > > +       }
> > >
> > >          return 0;
> > >   }
> > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > > index 4007563fb607..ab03732d06c4 100644
> > > --- a/arch/riscv/kernel/patch.c
> > > +++ b/arch/riscv/kernel/patch.c
> > > @@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)
> > >
> > >          memset(waddr, c, len);
> > >
> > > +       /*
> > > +        * We could have just patched a function that is about to be
> > > +        * called so make sure we don't execute partially patched
> > > +        * instructions by flushing the icache as soon as possible.
> > > +        */
> > > +       local_flush_icache_range((unsigned long)waddr,
> > > +                                (unsigned long)waddr + len);
> > > +
> > >          patch_unmap(FIX_TEXT_POKE0);
> > >
> > >          if (across_pages)
> > > @@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const
> > > void *insn, size_t len)
> > >
> > >          ret = copy_to_kernel_nofault(waddr, insn, len);
> > >
> > > +       /*
> > > +        * We could have just patched a function that is about to be
> > > +        * called so make sure we don't execute partially patched
> > > +        * instructions by flushing the icache as soon as possible.
> > > +        */
> > > +       local_flush_icache_range((unsigned long)waddr,
> > > +                                (unsigned long)waddr + len);
> > > +
> > >          patch_unmap(FIX_TEXT_POKE0);
> > >
> > >          if (across_pages)
> > > @@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len)
> > >
> > >          ret = patch_insn_set(tp, c, len);
> > >
> > > -       if (!ret)
> > > -               flush_icache_range((uintptr_t)tp, (uintptr_t)tp + len);
> > > -
> > >          return ret;
> > >   }
> > >   NOKPROBE_SYMBOL(patch_text_set_nosync);
> > > @@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void *insns,
> > > size_t len)
> > >
> > >          ret = patch_insn_write(tp, insns, len);
> > >
> > > -       if (!ret)
> > > -               flush_icache_range((uintptr_t) tp, (uintptr_t) tp + len);
> > > -
> > >          return ret;
> > >   }
> > >   NOKPROBE_SYMBOL(patch_text_nosync);
> > > @@ -253,9 +263,9 @@ static int patch_text_cb(void *data)
> > >          } else {
> > >                  while (atomic_read(&patch->cpu_count) <= num_online_cpus())
> > >                          cpu_relax();
> > > -       }
> > >
> > > -       local_flush_icache_all();
> > > +               local_flush_icache_all();
> > > +       }
> > >
> > >          return ret;
> > >   }
> > >
> > >
> > > >
> > > >
> > > >>
> > > >>>   +    if (!mod)
> > > >>> +        local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
> > > >>> +
> > > >>>       return out;
> > > >>>   }
> > > >>>   --
> > > >>> 2.39.2
> > > >>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> linux-riscv mailing list
> > > >>> linux-riscv@lists.infradead.org
> > > >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> > > >
> > > > _______________________________________________
> > > > linux-riscv mailing list
> > > > linux-riscv@lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> > Cheers,
> > Andy
Alexandre Ghiti June 20, 2024, 5:03 p.m. UTC | #10
On 19/06/2024 05:40, Andy Chiu wrote:
> On Tue, Jun 18, 2024 at 9:40 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>> Hi Andy,
>>
>> On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu <andy.chiu@sifive.com> wrote:
>>> On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>>>> Hi Conor,
>>>>
>>>> On 17/06/2024 15:23, Alexandre Ghiti wrote:
>>>>> Hi Conor,
>>>>>
>>>>> Sorry for the delay here.
>>>>>
>>>>> On 13/06/2024 09:48, Conor Dooley wrote:
>>>>>> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
>>>>>>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
>>>>>>> converted ftrace_make_nop() to use patch_insn_write() which does not
>>>>>>> emit any icache flush relying entirely on __ftrace_modify_code() to do
>>>>>>> that.
>>>>>>>
>>>>>>> But we missed that ftrace_make_nop() was called very early directly
>>>>>>> when
>>>>>>> converting mcount calls into nops (actually on riscv it converts 2B
>>>>>>> nops
>>>>>>> emitted by the compiler into 4B nops).
>>>>>>>
>>>>>>> This caused crashes on multiple HW as reported by Conor and Björn since
>>>>>>> the booting core could have half-patched instructions in its icache
>>>>>>> which would trigger an illegal instruction trap: fix this by emitting a
>>>>>>> local flush icache when early patching nops.
>>>>>>>
>>>>>>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
>>>>>>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>>>>>> ---
>>>>>>>    arch/riscv/include/asm/cacheflush.h | 6 ++++++
>>>>>>>    arch/riscv/kernel/ftrace.c          | 3 +++
>>>>>>>    2 files changed, 9 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/riscv/include/asm/cacheflush.h
>>>>>>> b/arch/riscv/include/asm/cacheflush.h
>>>>>>> index dd8d07146116..ce79c558a4c8 100644
>>>>>>> --- a/arch/riscv/include/asm/cacheflush.h
>>>>>>> +++ b/arch/riscv/include/asm/cacheflush.h
>>>>>>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
>>>>>>>        asm volatile ("fence.i" ::: "memory");
>>>>>>>    }
>>>>>>>    +static inline void local_flush_icache_range(unsigned long start,
>>>>>>> +                        unsigned long end)
>>>>>>> +{
>>>>>>> +    local_flush_icache_all();
>>>>>>> +}
>>>>>>> +
>>>>>>>    #define PG_dcache_clean PG_arch_1
>>>>>>>      static inline void flush_dcache_folio(struct folio *folio)
>>>>>>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>>>>>>> index 4f4987a6d83d..32e7c401dfb4 100644
>>>>>>> --- a/arch/riscv/kernel/ftrace.c
>>>>>>> +++ b/arch/riscv/kernel/ftrace.c
>>>>>>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
>>>>>>> dyn_ftrace *rec)
>>>>>>>        out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>>>>>>>        mutex_unlock(&text_mutex);
>>>>>> So, turns out that this patch is not sufficient. I've seen some more
>>>>>> crashes, seemingly due to initialising nops on this mutex_unlock().
>>>>>> Palmer suggested moving the if (!mod) ... into the lock, which fixed
>>>>>> the problem for me.
>>>>>
>>>>> Ok, it makes sense, I completely missed that. I'll send a fix for that
>>>>> shortly so that it can be merged in rc5.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Alex
>>>>
>>>> So I digged a bit more and I'm afraid that the only way to make sure
>>>> this issue does not happen elsewhere is to flush the icache right after
>>>> the patching. We actually can't wait to batch the icache flush since
>>>> along the way, we may call a function that has just been patched (the
>>>> issue that you encountered here).
>>> Trying to provide my thoughts, please let me know if I missed
>>> anything. I think what Conor suggested is safe for init_nop, as it
>>> would be called only when there is only one core (booting) and at the
>>> loading stage of kernel modules. In the first case we just have to
>>> make sure there is no patchable entry before the core executes
>>> fence.i. The second case is unconditionally safe because there is no
>>> read-side of the race.
>>>
>>> It is a bit tricky for the generic (runtime) case of ftrace code
>>> patching, but that is not because of the batch fence.i maintenance. As
>>> long as there exists a patchable entry for the stopping thread, it is
>>> possible for them to execute on a partially patched instruction. A
>>> solution for this is again to prevent any patchable entry in the
>>> stop_machine's stopping thread. Another solution is to apply the
>>> atomic ftrace patching series which aims to get rid of the race.
>> Yeah but my worry is that we can't make sure that functions called
>> between the patching and the fence.i are not the ones that just get
>> patched. At least today, patch_insn_write() etc should be marked as
> IIUC, the compiler does not generate a patchable entry for
> patch_insn_write, and so do all functions in patch.c. The entire file
> is not compiled with patchable entry because the flag is removed in
> riscv's Makefile. Please let me know if I misunderstand it.
>
>> noinstr. But even then, we call core functions that could be patchable
>> (I went through all and it *seems* we are safe *for now*, but this is
>> very fragile).
> Yes, functions in the "else" clause, atomic_read() etc, are safe for
> now. However, it is impossible to fix as long as there exists a
> patchable entry along the way. This is the problem that we cannot
> patch 2 instructions with a concurrent read-side. On the other hand,
> the path of ftrace_modify_all_code may experience the batch fence.i
> maintenance issue, and can be fixed via a local fence.i. This is
> because the path is executed by only one core. There does not exist a
> concurrent write-side. As a result, we just need to make sure it
> leaves each patch-site with a local fence.i to make sure code is
> up-to-date.
>
>> Then what do you think we should do to fix Conor's issue right now:
>> should we mark the riscv specific functions as noinstr, cross fingers
>> that the core functions are not (and don't become) patchable and wait
>> for your atomic patching? Or should we flush as soon as possible as I
>> proposed above (which to me fixes the issue but hurts performance)?
> Either way works for me. IMHO, the fix should include:
>
> 1. move fence.i before mutex_unlock in init_nop
> 2. do local fence.i in __ftrace_modify_call
> 3. make sure the else clause does not happen to have a patchable entry


OK I have just checked again and I agree with you on the whole fix :) I 
may add some comments for 3 so that it is very clear.

Thanks Andy!

Alex


>
> Thanks,
> Andy
>
>> Thanks,
>>
>> Alex
>>
>>>> I don't know how much it will impact the performance but I guess it will.
>>>>
>>>> Unless someone has a better idea (I added Andy and Puranjay in cc), here
>>>> is the patch that implements this, can you give it a try? Thanks
>>>>
>>>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>>>> index 87cbd86576b2..4b95c574fd04 100644
>>>> --- a/arch/riscv/kernel/ftrace.c
>>>> +++ b/arch/riscv/kernel/ftrace.c
>>>> @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct
>>>> dyn_ftrace *rec)
>>>>           out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>>>>           mutex_unlock(&text_mutex);
>>>>
>>>> -       if (!mod)
>>>> -               local_flush_icache_range(rec->ip, rec->ip +
>>>> MCOUNT_INSN_SIZE);
>>>> -
>>>>           return out;
>>>>    }
>>>>
>>>> @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
>>>>           } else {
>>>>                   while (atomic_read(&param->cpu_count) <= num_online_cpus())
>>>>                           cpu_relax();
>>>> -       }
>>>>
>>>> -       local_flush_icache_all();
>>>> +               local_flush_icache_all();
>>>> +       }
>>>>
>>>>           return 0;
>>>>    }
>>>> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
>>>> index 4007563fb607..ab03732d06c4 100644
>>>> --- a/arch/riscv/kernel/patch.c
>>>> +++ b/arch/riscv/kernel/patch.c
>>>> @@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)
>>>>
>>>>           memset(waddr, c, len);
>>>>
>>>> +       /*
>>>> +        * We could have just patched a function that is about to be
>>>> +        * called so make sure we don't execute partially patched
>>>> +        * instructions by flushing the icache as soon as possible.
>>>> +        */
>>>> +       local_flush_icache_range((unsigned long)waddr,
>>>> +                                (unsigned long)waddr + len);
>>>> +
>>>>           patch_unmap(FIX_TEXT_POKE0);
>>>>
>>>>           if (across_pages)
>>>> @@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const
>>>> void *insn, size_t len)
>>>>
>>>>           ret = copy_to_kernel_nofault(waddr, insn, len);
>>>>
>>>> +       /*
>>>> +        * We could have just patched a function that is about to be
>>>> +        * called so make sure we don't execute partially patched
>>>> +        * instructions by flushing the icache as soon as possible.
>>>> +        */
>>>> +       local_flush_icache_range((unsigned long)waddr,
>>>> +                                (unsigned long)waddr + len);
>>>> +
>>>>           patch_unmap(FIX_TEXT_POKE0);
>>>>
>>>>           if (across_pages)
>>>> @@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len)
>>>>
>>>>           ret = patch_insn_set(tp, c, len);
>>>>
>>>> -       if (!ret)
>>>> -               flush_icache_range((uintptr_t)tp, (uintptr_t)tp + len);
>>>> -
>>>>           return ret;
>>>>    }
>>>>    NOKPROBE_SYMBOL(patch_text_set_nosync);
>>>> @@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void *insns,
>>>> size_t len)
>>>>
>>>>           ret = patch_insn_write(tp, insns, len);
>>>>
>>>> -       if (!ret)
>>>> -               flush_icache_range((uintptr_t) tp, (uintptr_t) tp + len);
>>>> -
>>>>           return ret;
>>>>    }
>>>>    NOKPROBE_SYMBOL(patch_text_nosync);
>>>> @@ -253,9 +263,9 @@ static int patch_text_cb(void *data)
>>>>           } else {
>>>>                   while (atomic_read(&patch->cpu_count) <= num_online_cpus())
>>>>                           cpu_relax();
>>>> -       }
>>>>
>>>> -       local_flush_icache_all();
>>>> +               local_flush_icache_all();
>>>> +       }
>>>>
>>>>           return ret;
>>>>    }
>>>>
>>>>
>>>>>
>>>>>>>    +    if (!mod)
>>>>>>> +        local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
>>>>>>> +
>>>>>>>        return out;
>>>>>>>    }
>>>>>>>    --
>>>>>>> 2.39.2
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> linux-riscv mailing list
>>>>>>> linux-riscv@lists.infradead.org
>>>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>> _______________________________________________
>>>>> linux-riscv mailing list
>>>>> linux-riscv@lists.infradead.org
>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>> Cheers,
>>> Andy
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
diff mbox series

Patch

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index dd8d07146116..ce79c558a4c8 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -13,6 +13,12 @@  static inline void local_flush_icache_all(void)
 	asm volatile ("fence.i" ::: "memory");
 }
 
+static inline void local_flush_icache_range(unsigned long start,
+					    unsigned long end)
+{
+	local_flush_icache_all();
+}
+
 #define PG_dcache_clean PG_arch_1
 
 static inline void flush_dcache_folio(struct folio *folio)
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index 4f4987a6d83d..32e7c401dfb4 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -120,6 +120,9 @@  int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
 	out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
 	mutex_unlock(&text_mutex);
 
+	if (!mod)
+		local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
+
 	return out;
 }