diff mbox series

[bpf-next,v2,6/9] riscv, bpf: provide RISC-V specific JIT image alloc/free

Message ID 20191216091343.23260-7-bjorn.topel@gmail.com (mailing list archive)
State New, archived
Headers show
Series riscv: BPF JIT fix, optimizations and far jumps support | expand

Commit Message

Björn Töpel Dec. 16, 2019, 9:13 a.m. UTC
This commit makes sure that the JIT images is kept close to the kernel
text, so BPF calls can use relative calling with auipc/jalr or jal
instead of loading the full 64-bit address and jalr.

The BPF JIT image region is 128 MB before the kernel text.

Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
---
 arch/riscv/include/asm/pgtable.h |  4 ++++
 arch/riscv/net/bpf_jit_comp.c    | 13 +++++++++++++
 2 files changed, 17 insertions(+)

Comments

Daniel Borkmann Dec. 16, 2019, 3:09 p.m. UTC | #1
On 12/16/19 10:13 AM, Björn Töpel wrote:
> This commit makes sure that the JIT images is kept close to the kernel
> text, so BPF calls can use relative calling with auipc/jalr or jal
> instead of loading the full 64-bit address and jalr.
> 
> The BPF JIT image region is 128 MB before the kernel text.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
> ---
>   arch/riscv/include/asm/pgtable.h |  4 ++++
>   arch/riscv/net/bpf_jit_comp.c    | 13 +++++++++++++
>   2 files changed, 17 insertions(+)
> 
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 7ff0ed4f292e..cc3f49415620 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -404,6 +404,10 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
>   #define VMALLOC_END      (PAGE_OFFSET - 1)
>   #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
>   
> +#define BPF_JIT_REGION_SIZE	(SZ_128M)
> +#define BPF_JIT_REGION_START	(PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> +#define BPF_JIT_REGION_END	(VMALLOC_END)
> +

Series looks good to me, thanks; I'd like to get an ACK from Palmer/others on this one.

>   /*
>    * Roughly size the vmemmap space to be large enough to fit enough
>    * struct pages to map half the virtual address space. Then
> diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
> index 8aa19c846881..46cff093f526 100644
> --- a/arch/riscv/net/bpf_jit_comp.c
> +++ b/arch/riscv/net/bpf_jit_comp.c
> @@ -1656,3 +1656,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>   					   tmp : orig_prog);
>   	return prog;
>   }
> +
> +void *bpf_jit_alloc_exec(unsigned long size)
> +{
> +	return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
> +				    BPF_JIT_REGION_END, GFP_KERNEL,
> +				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> +				    __builtin_return_address(0));
> +}
> +
> +void bpf_jit_free_exec(void *addr)
> +{
> +	return vfree(addr);
> +}
>
Björn Töpel Dec. 18, 2019, 6:23 a.m. UTC | #2
On Mon, 16 Dec 2019 at 16:09, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/16/19 10:13 AM, Björn Töpel wrote:
> > This commit makes sure that the JIT images is kept close to the kernel
> > text, so BPF calls can use relative calling with auipc/jalr or jal
> > instead of loading the full 64-bit address and jalr.
> >
> > The BPF JIT image region is 128 MB before the kernel text.
> >
> > Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
> > ---
> >   arch/riscv/include/asm/pgtable.h |  4 ++++
> >   arch/riscv/net/bpf_jit_comp.c    | 13 +++++++++++++
> >   2 files changed, 17 insertions(+)
> >
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 7ff0ed4f292e..cc3f49415620 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -404,6 +404,10 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
> >   #define VMALLOC_END      (PAGE_OFFSET - 1)
> >   #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
> >
> > +#define BPF_JIT_REGION_SIZE  (SZ_128M)
> > +#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> > +#define BPF_JIT_REGION_END   (VMALLOC_END)
> > +
>
> Series looks good to me, thanks; I'd like to get an ACK from Palmer/others on this one.
>

Palmer/Paul/Albert, any thoughts/input? I can respin the the series
w/o the call optimizations (excluding this patch and the next), but
I'd prefer not given it's a nice speed/clean up for calls.
Palmer Dabbelt Dec. 23, 2019, 6:30 p.m. UTC | #3
On Mon, 16 Dec 2019 01:13:40 PST (-0800), Bjorn Topel wrote:
> This commit makes sure that the JIT images is kept close to the kernel
> text, so BPF calls can use relative calling with auipc/jalr or jal
> instead of loading the full 64-bit address and jalr.
>
> The BPF JIT image region is 128 MB before the kernel text.
>
> Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
> ---
>  arch/riscv/include/asm/pgtable.h |  4 ++++
>  arch/riscv/net/bpf_jit_comp.c    | 13 +++++++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 7ff0ed4f292e..cc3f49415620 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -404,6 +404,10 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
>  #define VMALLOC_END      (PAGE_OFFSET - 1)
>  #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
>
> +#define BPF_JIT_REGION_SIZE	(SZ_128M)
> +#define BPF_JIT_REGION_START	(PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> +#define BPF_JIT_REGION_END	(VMALLOC_END)
> +
>  /*
>   * Roughly size the vmemmap space to be large enough to fit enough
>   * struct pages to map half the virtual address space. Then
> diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
> index 8aa19c846881..46cff093f526 100644
> --- a/arch/riscv/net/bpf_jit_comp.c
> +++ b/arch/riscv/net/bpf_jit_comp.c
> @@ -1656,3 +1656,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>  					   tmp : orig_prog);
>  	return prog;
>  }
> +
> +void *bpf_jit_alloc_exec(unsigned long size)
> +{
> +	return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
> +				    BPF_JIT_REGION_END, GFP_KERNEL,
> +				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> +				    __builtin_return_address(0));
> +}
> +
> +void bpf_jit_free_exec(void *addr)
> +{
> +	return vfree(addr);
> +}

Ah, I guess I should have read the whole patch set :)

Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>

and feel free to put the same on whatever patch I just asked that question
on, though maybe this one should go before the others as they sort of depend on
it?
Paul Walmsley Jan. 4, 2020, 1:32 a.m. UTC | #4
On Wed, 18 Dec 2019, Björn Töpel wrote:

> On Mon, 16 Dec 2019 at 16:09, Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 12/16/19 10:13 AM, Björn Töpel wrote:
> > > This commit makes sure that the JIT images is kept close to the kernel
> > > text, so BPF calls can use relative calling with auipc/jalr or jal
> > > instead of loading the full 64-bit address and jalr.
> > >
> > > The BPF JIT image region is 128 MB before the kernel text.
> > >
> > > Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
> > > ---
> > >   arch/riscv/include/asm/pgtable.h |  4 ++++
> > >   arch/riscv/net/bpf_jit_comp.c    | 13 +++++++++++++
> > >   2 files changed, 17 insertions(+)
> > >
> > > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > > index 7ff0ed4f292e..cc3f49415620 100644
> > > --- a/arch/riscv/include/asm/pgtable.h
> > > +++ b/arch/riscv/include/asm/pgtable.h
> > > @@ -404,6 +404,10 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
> > >   #define VMALLOC_END      (PAGE_OFFSET - 1)
> > >   #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
> > >
> > > +#define BPF_JIT_REGION_SIZE  (SZ_128M)
> > > +#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> > > +#define BPF_JIT_REGION_END   (VMALLOC_END)
> > > +
> >
> > Series looks good to me, thanks; I'd like to get an ACK from Palmer/others on this one.
> >
> 
> Palmer/Paul/Albert, any thoughts/input? I can respin the the series
> w/o the call optimizations (excluding this patch and the next), but
> I'd prefer not given it's a nice speed/clean up for calls.

Looks like Palmer's already reviewed it.  One additional request: could 
you update the VM layout debugging messages in arch/riscv/mm/init.c to 
include this new area?

thanks,

- Paul
Björn Töpel Jan. 7, 2020, 10:24 a.m. UTC | #5
Paul,

Back from the holidays. Sorry about the delay!

On Sat, 4 Jan 2020 at 02:32, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
[...]
>
> Looks like Palmer's already reviewed it.  One additional request: could
> you update the VM layout debugging messages in arch/riscv/mm/init.c to
> include this new area?
>

Sure, I'll send that as a follow-up! Related; Other archs, e.g. arm64,
has moved away from dumping the VM layout (commit 071929dbdd86
("arm64: Stop printing the virtual memory layout")), and instead rely
on _PTDUMP. Going forward that probably applies to riscv as well!


Cheers,
Björn
Paul Walmsley Jan. 7, 2020, 10:47 a.m. UTC | #6
On Tue, 7 Jan 2020, Björn Töpel wrote:

> On Sat, 4 Jan 2020 at 02:32, Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
> > Looks like Palmer's already reviewed it.  One additional request: could
> > you update the VM layout debugging messages in arch/riscv/mm/init.c to
> > include this new area?
> 
> Sure, I'll send that as a follow-up! 

Thanks.

> Related; Other archs, e.g. arm64, has moved away from dumping the VM 
> layout (commit 071929dbdd86 ("arm64: Stop printing the virtual memory 
> layout")), and instead rely on _PTDUMP. Going forward that probably 
> applies to riscv as well!

For the specific case of the page table dumper: we're waiting for the 
generic ptdump patchset to be merged first - hopefully for v5.6.  The 
RISC-V integration patches against it were posted to the lists back in 
October.  But, to me, that targets a different use-case than the VM layout 
debug print code.


- Paul
Alexandre Ghiti Feb. 2, 2020, 1:37 p.m. UTC | #7
On 12/16/19 4:13 AM, Björn Töpel wrote:
> This commit makes sure that the JIT images is kept close to the kernel
> text, so BPF calls can use relative calling with auipc/jalr or jal
> instead of loading the full 64-bit address and jalr.
>
> The BPF JIT image region is 128 MB before the kernel text.
>
> Signed-off-by: Björn Töpel <bjorn.topel@gmail.com>
> ---
>   arch/riscv/include/asm/pgtable.h |  4 ++++
>   arch/riscv/net/bpf_jit_comp.c    | 13 +++++++++++++
>   2 files changed, 17 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 7ff0ed4f292e..cc3f49415620 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -404,6 +404,10 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
>   #define VMALLOC_END      (PAGE_OFFSET - 1)
>   #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
>   
> +#define BPF_JIT_REGION_SIZE	(SZ_128M)
> +#define BPF_JIT_REGION_START	(PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> +#define BPF_JIT_REGION_END	(VMALLOC_END)
> +
>   /*
>    * Roughly size the vmemmap space to be large enough to fit enough
>    * struct pages to map half the virtual address space. Then
> diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
> index 8aa19c846881..46cff093f526 100644
> --- a/arch/riscv/net/bpf_jit_comp.c
> +++ b/arch/riscv/net/bpf_jit_comp.c
> @@ -1656,3 +1656,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>   					   tmp : orig_prog);
>   	return prog;
>   }
> +
> +void *bpf_jit_alloc_exec(unsigned long size)
> +{
> +	return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
> +				    BPF_JIT_REGION_END, GFP_KERNEL,
> +				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> +				    __builtin_return_address(0));
> +}
> +
> +void bpf_jit_free_exec(void *addr)
> +{
> +	return vfree(addr);
> +}


I think it would be better to completely avoid this patch and the 
definition of this
new zone by using the generic implementation if we had the patch 
discussed here
regarding modules memory allocation (that in any case we need to fix 
modules loading):

https://lore.kernel.org/linux-riscv/d868acf5-7242-93dc-0051-f97e64dc4387@ghiti.fr/T/#m2be30cb71dc9aa834a50d346961acee26158a238

Alex
Björn Töpel Feb. 3, 2020, 12:28 p.m. UTC | #8
On Sun, 2 Feb 2020 at 14:37, Alex Ghiti <alex@ghiti.fr> wrote:
>
[...]
>
> I think it would be better to completely avoid this patch and the
> definition of this
> new zone by using the generic implementation if we had the patch
> discussed here
> regarding modules memory allocation (that in any case we need to fix
> modules loading):
>
> https://lore.kernel.org/linux-riscv/d868acf5-7242-93dc-0051-f97e64dc4387@ghiti.fr/T/#m2be30cb71dc9aa834a50d346961acee26158a238
>

This patch is already upstream. I agree that when the module
allocation fix is upstream, the BPF image allocation can be folded
into the module allocation. IOW, I wont send any page table dumper
patch for BPF memory.

But keep in mind that the RV BPF JIT relies on having the kernel text
within the 32b range (as does modules)


Cheers,
Björn
Alexandre Ghiti Feb. 3, 2020, 8:57 p.m. UTC | #9
On 2/3/20 7:28 AM, Björn Töpel wrote:
> On Sun, 2 Feb 2020 at 14:37, Alex Ghiti <alex@ghiti.fr> wrote:
> [...]
>> I think it would be better to completely avoid this patch and the
>> definition of this
>> new zone by using the generic implementation if we had the patch
>> discussed here
>> regarding modules memory allocation (that in any case we need to fix
>> modules loading):
>>
>> https://lore.kernel.org/linux-riscv/d868acf5-7242-93dc-0051-f97e64dc4387@ghiti.fr/T/#m2be30cb71dc9aa834a50d346961acee26158a238
>>
> This patch is already upstream. I agree that when the module
> allocation fix is upstream, the BPF image allocation can be folded
> into the module allocation. IOW, I wont send any page table dumper
> patch for BPF memory.


Too late then :) I'll remove this zone with the patch regarding module
allocation.


>
> But keep in mind that the RV BPF JIT relies on having the kernel text
> within the 32b range (as does modules)


Yep, same constraints as for modules ;)

Thanks,

Alex


>
> Cheers,
> Björn
diff mbox series

Patch

diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7ff0ed4f292e..cc3f49415620 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -404,6 +404,10 @@  static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
 #define VMALLOC_END      (PAGE_OFFSET - 1)
 #define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
 
+#define BPF_JIT_REGION_SIZE	(SZ_128M)
+#define BPF_JIT_REGION_START	(PAGE_OFFSET - BPF_JIT_REGION_SIZE)
+#define BPF_JIT_REGION_END	(VMALLOC_END)
+
 /*
  * Roughly size the vmemmap space to be large enough to fit enough
  * struct pages to map half the virtual address space. Then
diff --git a/arch/riscv/net/bpf_jit_comp.c b/arch/riscv/net/bpf_jit_comp.c
index 8aa19c846881..46cff093f526 100644
--- a/arch/riscv/net/bpf_jit_comp.c
+++ b/arch/riscv/net/bpf_jit_comp.c
@@ -1656,3 +1656,16 @@  struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 					   tmp : orig_prog);
 	return prog;
 }
+
+void *bpf_jit_alloc_exec(unsigned long size)
+{
+	return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START,
+				    BPF_JIT_REGION_END, GFP_KERNEL,
+				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
+				    __builtin_return_address(0));
+}
+
+void bpf_jit_free_exec(void *addr)
+{
+	return vfree(addr);
+}