
RISC-V: only flush icache when it has VM_EXEC set

Message ID tencent_6D851035F6F2FD0B5A69FB391AE39AC6300A@qq.com (mailing list archive)
State Accepted
Commit 542124fc0d5cccea273bccf2ee58545ac17aad3f
Series RISC-V: only flush icache when it has VM_EXEC set

Checks

Context Check Description
conchuod/vmtest-for-next-PR success PR summary
conchuod/patch-1-test-1 success .github/scripts/patches/tests/build_rv32_defconfig.sh
conchuod/patch-1-test-2 success .github/scripts/patches/tests/build_rv64_clang_allmodconfig.sh
conchuod/patch-1-test-3 success .github/scripts/patches/tests/build_rv64_gcc_allmodconfig.sh
conchuod/patch-1-test-4 success .github/scripts/patches/tests/build_rv64_nommu_k210_defconfig.sh
conchuod/patch-1-test-5 success .github/scripts/patches/tests/build_rv64_nommu_virt_defconfig.sh
conchuod/patch-1-test-6 warning .github/scripts/patches/tests/checkpatch.sh
conchuod/patch-1-test-7 success .github/scripts/patches/tests/dtb_warn_rv64.sh
conchuod/patch-1-test-8 success .github/scripts/patches/tests/header_inline.sh
conchuod/patch-1-test-9 success .github/scripts/patches/tests/kdoc.sh
conchuod/patch-1-test-10 success .github/scripts/patches/tests/module_param.sh
conchuod/patch-1-test-11 success .github/scripts/patches/tests/verify_fixes.sh
conchuod/patch-1-test-12 success .github/scripts/patches/tests/verify_signedoff.sh

Commit Message

Yangyu Chen Jan. 9, 2024, 6:48 p.m. UTC
As an I-Cache flush on current RISC-V needs to send IPIs to every CPU core
in the system, it is very costly. Limiting flush_icache_mm to be called only
when vma->vm_flags has VM_EXEC set helps minimize the frequency of these
operations. It improves performance and reduces disturbances when
copy_from_user_page is needed, such as when profiling with perf.

As for I-D coherence, this will not break if such a page gains the VM_EXEC
flag later, since the executable case is already checked in the __set_pte_at
function.

Signed-off-by: Yangyu Chen <cyy@cyyself.name>
---
 arch/riscv/include/asm/cacheflush.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
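
For reference, the __set_pte_at() check the commit message relies on looks
roughly like this (a paraphrased sketch of the RISC-V helper in
arch/riscv/include/asm/pgtable.h; the exact signature differs between kernel
versions):

/*
 * Sketch, not a verbatim copy: RISC-V flushes the icache whenever a
 * present and executable PTE is installed, so a page that only later
 * gains VM_EXEC is still made I-D coherent before it can be executed.
 */
static void __set_pte_at(pte_t *ptep, pte_t pteval)
{
	if (pte_present(pteval) && pte_exec(pteval))
		flush_icache_pte(pteval);

	set_pte(ptep, pteval);
}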

Comments

Alexandre Ghiti Jan. 10, 2024, 6:47 a.m. UTC | #1
Hi Yangyu,

On 09/01/2024 19:48, Yangyu Chen wrote:
> As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores
> in the system is very costly, limiting flush_icache_mm to be called only
> when vma->vm_flags has VM_EXEC can help minimize the frequency of these
> operations. It improves performance and reduces disturbances when


On which platform did you test this patchset? Do you have any measurements 
of the performance improvements?


> copy_from_user_page is needed such as profiling with perf.
>
> For I-D coherence concerns, it will not fail if such a page adds VM_EXEC
> flags in the future since we have checked it in the __set_pte_at function.
>
> Signed-off-by: Yangyu Chen <cyy@cyyself.name>
> ---
>   arch/riscv/include/asm/cacheflush.h | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> index 3cb53c4df27c..915f532dc336 100644
> --- a/arch/riscv/include/asm/cacheflush.h
> +++ b/arch/riscv/include/asm/cacheflush.h
> @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page)
>    * so instead we just flush the whole thing.
>    */
>   #define flush_icache_range(start, end) flush_icache_all()
> -#define flush_icache_user_page(vma, pg, addr, len) \
> -	flush_icache_mm(vma->vm_mm, 0)
> +#define flush_icache_user_page(vma, pg, addr, len)	\
> +do {							\
> +	if (vma->vm_flags & VM_EXEC)			\
> +		flush_icache_mm(vma->vm_mm, 0);		\
> +} while (0)
>   
>   #ifdef CONFIG_64BIT
>   #define flush_cache_vmap(start, end)	flush_tlb_kernel_range(start, end)


Otherwise, it looks good to me, so you can add:

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks,

Alex
Jisheng Zhang Jan. 15, 2024, 9:27 a.m. UTC | #2
On Wed, Jan 10, 2024 at 02:48:59AM +0800, Yangyu Chen wrote:
> As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores
> in the system is very costly, limiting flush_icache_mm to be called only
> when vma->vm_flags has VM_EXEC can help minimize the frequency of these
> operations. It improves performance and reduces disturbances when
> copy_from_user_page is needed such as profiling with perf.

Hi Yangyu,

Since this is for "performance", can you please add the benchmark, test
platform, and performance numbers to your commit msg?

thanks

> 
> For I-D coherence concerns, it will not fail if such a page adds VM_EXEC
> flags in the future since we have checked it in the __set_pte_at function.
> 
> Signed-off-by: Yangyu Chen <cyy@cyyself.name>
> ---
>  arch/riscv/include/asm/cacheflush.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> index 3cb53c4df27c..915f532dc336 100644
> --- a/arch/riscv/include/asm/cacheflush.h
> +++ b/arch/riscv/include/asm/cacheflush.h
> @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page)
>   * so instead we just flush the whole thing.
>   */
>  #define flush_icache_range(start, end) flush_icache_all()
> -#define flush_icache_user_page(vma, pg, addr, len) \
> -	flush_icache_mm(vma->vm_mm, 0)
> +#define flush_icache_user_page(vma, pg, addr, len)	\
> +do {							\
> +	if (vma->vm_flags & VM_EXEC)			\
> +		flush_icache_mm(vma->vm_mm, 0);		\
> +} while (0)
>  
>  #ifdef CONFIG_64BIT
>  #define flush_cache_vmap(start, end)	flush_tlb_kernel_range(start, end)
> -- 
> 2.43.0
>
Palmer Dabbelt March 20, 2024, 12:48 a.m. UTC | #3
On Tue, 09 Jan 2024 10:48:59 PST (-0800), cyy@cyyself.name wrote:
> As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores
> in the system is very costly, limiting flush_icache_mm to be called only
> when vma->vm_flags has VM_EXEC can help minimize the frequency of these
> operations. It improves performance and reduces disturbances when
> copy_from_user_page is needed such as profiling with perf.
>
> For I-D coherence concerns, it will not fail if such a page adds VM_EXEC
> flags in the future since we have checked it in the __set_pte_at function.
>
> Signed-off-by: Yangyu Chen <cyy@cyyself.name>
> ---
>  arch/riscv/include/asm/cacheflush.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> index 3cb53c4df27c..915f532dc336 100644
> --- a/arch/riscv/include/asm/cacheflush.h
> +++ b/arch/riscv/include/asm/cacheflush.h
> @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page)
>   * so instead we just flush the whole thing.
>   */
>  #define flush_icache_range(start, end) flush_icache_all()
> -#define flush_icache_user_page(vma, pg, addr, len) \
> -	flush_icache_mm(vma->vm_mm, 0)
> +#define flush_icache_user_page(vma, pg, addr, len)	\
> +do {							\
> +	if (vma->vm_flags & VM_EXEC)			\
> +		flush_icache_mm(vma->vm_mm, 0);		\
> +} while (0)
>
>  #ifdef CONFIG_64BIT
>  #define flush_cache_vmap(start, end)	flush_tlb_kernel_range(start, end)

I'm not super worried about the benchmarks, I think we can just 
open-loop assume this is faster by avoiding the flushes.  I do think we 
need a hook into at least tlb_update_vma_flags(), though, to insert the 
fence.i when upgrading a mapping to include VM_EXEC.
Alexandre Ghiti March 22, 2024, 3:50 p.m. UTC | #4
On Wed, Mar 20, 2024 at 1:48 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Tue, 09 Jan 2024 10:48:59 PST (-0800), cyy@cyyself.name wrote:
> > As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores
> > in the system is very costly, limiting flush_icache_mm to be called only
> > when vma->vm_flags has VM_EXEC can help minimize the frequency of these
> > operations. It improves performance and reduces disturbances when
> > copy_from_user_page is needed such as profiling with perf.
> >
> > For I-D coherence concerns, it will not fail if such a page adds VM_EXEC
> > flags in the future since we have checked it in the __set_pte_at function.
> >
> > Signed-off-by: Yangyu Chen <cyy@cyyself.name>
> > ---
> >  arch/riscv/include/asm/cacheflush.h | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
> > index 3cb53c4df27c..915f532dc336 100644
> > --- a/arch/riscv/include/asm/cacheflush.h
> > +++ b/arch/riscv/include/asm/cacheflush.h
> > @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page)
> >   * so instead we just flush the whole thing.
> >   */
> >  #define flush_icache_range(start, end) flush_icache_all()
> > -#define flush_icache_user_page(vma, pg, addr, len) \
> > -     flush_icache_mm(vma->vm_mm, 0)
> > +#define flush_icache_user_page(vma, pg, addr, len)   \
> > +do {                                                 \
> > +     if (vma->vm_flags & VM_EXEC)                    \
> > +             flush_icache_mm(vma->vm_mm, 0);         \
> > +} while (0)
> >
> >  #ifdef CONFIG_64BIT
> >  #define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end)
>
> I'm not super worried about the benchmarks, I think we can just
> open-loop assume this is faster by avoiding the flushes.  I do think we
> need a hook into at least tlb_update_vma_flags(), though, to insert the
> fence.i when upgrading a mapping to include VM_EXEC.

I'd say Yangyu is right when he mentions in the commit log: "For I-D
coherence concerns, it will not fail if such a page adds VM_EXEC flags
in the future since we have checked it in the __set_pte_at function.".
If a region indeed becomes executable, the page table will be modified
to reflect that and will then pass through __set_pte_at() which, as Yangyu
mentions, does the right thing. Or am I missing something?

Thanks,

Alex
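
For reference, the path discussed here (a mapping gaining VM_EXEC and then
reaching __set_pte_at()) looks roughly like this; a hedged call-chain sketch
based on mm/mprotect.c and the generic page-table helpers, where exact helper
names vary by kernel version:

/*
 * mprotect(addr, len, PROT_READ | PROT_EXEC)
 *   -> mprotect_fixup()                  vma->vm_flags gains VM_EXEC
 *     -> change_protection()
 *       -> change_pte_range()
 *         -> ptep_modify_prot_commit() / set_pte_at()
 *           -> __set_pte_at()            RISC-V: flushes icache for exec PTEs
 */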
Yangyu Chen March 23, 2024, 12:39 p.m. UTC | #5
> On Mar 22, 2024, at 23:50, Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
> 
> On Wed, Mar 20, 2024 at 1:48 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>> 
>> On Tue, 09 Jan 2024 10:48:59 PST (-0800), cyy@cyyself.name wrote:
>>> As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores
>>> in the system is very costly, limiting flush_icache_mm to be called only
>>> when vma->vm_flags has VM_EXEC can help minimize the frequency of these
>>> operations. It improves performance and reduces disturbances when
>>> copy_from_user_page is needed such as profiling with perf.
>>> 
>>> For I-D coherence concerns, it will not fail if such a page adds VM_EXEC
>>> flags in the future since we have checked it in the __set_pte_at function.
>>> 
>>> Signed-off-by: Yangyu Chen <cyy@cyyself.name>
>>> ---
>>> arch/riscv/include/asm/cacheflush.h | 7 +++++--
>>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
>>> index 3cb53c4df27c..915f532dc336 100644
>>> --- a/arch/riscv/include/asm/cacheflush.h
>>> +++ b/arch/riscv/include/asm/cacheflush.h
>>> @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page)
>>>  * so instead we just flush the whole thing.
>>>  */
>>> #define flush_icache_range(start, end) flush_icache_all()
>>> -#define flush_icache_user_page(vma, pg, addr, len) \
>>> -     flush_icache_mm(vma->vm_mm, 0)
>>> +#define flush_icache_user_page(vma, pg, addr, len)   \
>>> +do {                                                 \
>>> +     if (vma->vm_flags & VM_EXEC)                    \
>>> +             flush_icache_mm(vma->vm_mm, 0);         \
>>> +} while (0)
>>> 
>>> #ifdef CONFIG_64BIT
>>> #define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end)
>> 
>> I'm not super worried about the benchmarks, I think we can just
>> open-loop assume this is faster by avoiding the flushes.  I do think we
>> need a hook into at least tlb_update_vma_flags(), though, to insert the
>> fence.i when upgrading a mapping to include VM_EXEC.
> 
> I'd say Yangyu is right when he mentions in the commit log: "For I-D
> coherence concerns, it will not fail if such a page adds VM_EXEC flags
> in the future since we have checked it in the __set_pte_at function.".
> If a region indeed becomes executable, the page table will be modified
> to reflect that and then will pass in __set_pte_at() which, as Yangyu
> mentions, does the right thing. Or am I missing something?
> 

I think so. Unless we have some other way to update PTEs than
__set_pte_at, I think it's safe. I'm too busy these days to put
together a precise enough microbenchmark.

Thanks,
Yangyu Chen
patchwork-bot+linux-riscv@kernel.org April 10, 2024, 2:10 p.m. UTC | #6
Hello:

This patch was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Wed, 10 Jan 2024 02:48:59 +0800 you wrote:
> As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores
> in the system is very costly, limiting flush_icache_mm to be called only
> when vma->vm_flags has VM_EXEC can help minimize the frequency of these
> operations. It improves performance and reduces disturbances when
> copy_from_user_page is needed such as profiling with perf.
> 
> For I-D coherence concerns, it will not fail if such a page adds VM_EXEC
> flags in the future since we have checked it in the __set_pte_at function.
> 
> [...]

Here is the summary with links:
  - RISC-V: only flush icache when it has VM_EXEC set
    https://git.kernel.org/riscv/c/542124fc0d5c

You are awesome, thank you!

Patch

diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h
index 3cb53c4df27c..915f532dc336 100644
--- a/arch/riscv/include/asm/cacheflush.h
+++ b/arch/riscv/include/asm/cacheflush.h
@@ -33,8 +33,11 @@  static inline void flush_dcache_page(struct page *page)
  * so instead we just flush the whole thing.
  */
 #define flush_icache_range(start, end) flush_icache_all()
-#define flush_icache_user_page(vma, pg, addr, len) \
-	flush_icache_mm(vma->vm_mm, 0)
+#define flush_icache_user_page(vma, pg, addr, len)	\
+do {							\
+	if (vma->vm_flags & VM_EXEC)			\
+		flush_icache_mm(vma->vm_mm, 0);		\
+} while (0)
 
 #ifdef CONFIG_64BIT
 #define flush_cache_vmap(start, end)	flush_tlb_kernel_range(start, end)
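
The macro changed above is reached from the generic copy_to_user_page(),
used by remote-access paths such as ptrace pokes through access_process_vm().
Roughly, and assuming RISC-V picks this macro up from
include/asm-generic/cacheflush.h (worth double-checking in your tree):

/* Paraphrased from include/asm-generic/cacheflush.h: every remote write
 * into another process ends in copy_to_user_page() and therefore in
 * flush_icache_user_page(). */
#define copy_to_user_page(vma, page, vaddr, dst, src, len)	\
	do {							\
		memcpy(dst, src, len);				\
		flush_icache_user_page(vma, page, vaddr, len);	\
	} while (0)

With the VM_EXEC guard in place, a remote write into a non-executable VMA
(for example a debugger poking a data page) no longer triggers a system-wide
fence.i IPI broadcast; only writes into executable mappings still do.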