Message ID | 20230616085038.4121892-8-rppt@kernel.org (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | mm: jit/text allocator | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch, async |
bpf/vmtest-bpf-next-PR | fail | PR summary |
bpf/vmtest-bpf-next-VM_Test-1 | success | Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }} |
bpf/vmtest-bpf-next-VM_Test-2 | success | Logs for ShellCheck |
bpf/vmtest-bpf-next-VM_Test-3 | fail | Logs for build for aarch64 with gcc |
bpf/vmtest-bpf-next-VM_Test-4 | success | Logs for build for s390x with gcc |
bpf/vmtest-bpf-next-VM_Test-5 | success | Logs for build for x86_64 with gcc |
bpf/vmtest-bpf-next-VM_Test-6 | success | Logs for build for x86_64 with llvm-16 |
bpf/vmtest-bpf-next-VM_Test-7 | success | Logs for set-matrix |
bpf/vmtest-bpf-next-VM_Test-8 | success | Logs for veristat |
On Fri, Jun 16, 2023 at 1:52 AM Mike Rapoport <rppt@kernel.org> wrote: > > From: "Mike Rapoport (IBM)" <rppt@kernel.org> > > The memory allocations for kprobes on arm64 can be placed anywhere in > vmalloc address space and currently this is implemented with an override > of alloc_insn_page() in arm64. > > Extend execmem_params with a range for generated code allocations and > make kprobes on arm64 use this extension rather than override > alloc_insn_page(). > > Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> > --- > arch/arm64/kernel/module.c | 9 +++++++++ > arch/arm64/kernel/probes/kprobes.c | 7 ------- > include/linux/execmem.h | 11 +++++++++++ > mm/execmem.c | 14 +++++++++++++- > 4 files changed, 33 insertions(+), 8 deletions(-) > > diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c > index c3d999f3a3dd..52b09626bc0f 100644 > --- a/arch/arm64/kernel/module.c > +++ b/arch/arm64/kernel/module.c > @@ -30,6 +30,13 @@ static struct execmem_params execmem_params = { > .alignment = MODULE_ALIGN, > }, > }, > + .jit = { > + .text = { > + .start = VMALLOC_START, > + .end = VMALLOC_END, > + .alignment = 1, > + }, > + }, > }; This is growing fast. :) We have 3 now: text, data, jit. And it will be 5 when we split data into rw data, ro data, ro after init data. I wonder whether we should still do some type enum here. But we can revisit this topic later. Other than that Acked-by: Song Liu <song@kernel.org>
On Fri, Jun 16, 2023 at 01:05:29PM -0700, Song Liu wrote: > On Fri, Jun 16, 2023 at 1:52 AM Mike Rapoport <rppt@kernel.org> wrote: > > > > From: "Mike Rapoport (IBM)" <rppt@kernel.org> > > > > The memory allocations for kprobes on arm64 can be placed anywhere in > > vmalloc address space and currently this is implemented with an override > > of alloc_insn_page() in arm64. > > > > Extend execmem_params with a range for generated code allocations and > > make kprobes on arm64 use this extension rather than override > > alloc_insn_page(). > > > > Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> > > --- > > arch/arm64/kernel/module.c | 9 +++++++++ > > arch/arm64/kernel/probes/kprobes.c | 7 ------- > > include/linux/execmem.h | 11 +++++++++++ > > mm/execmem.c | 14 +++++++++++++- > > 4 files changed, 33 insertions(+), 8 deletions(-) > > > > diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c > > index c3d999f3a3dd..52b09626bc0f 100644 > > --- a/arch/arm64/kernel/module.c > > +++ b/arch/arm64/kernel/module.c > > @@ -30,6 +30,13 @@ static struct execmem_params execmem_params = { > > .alignment = MODULE_ALIGN, > > }, > > }, > > + .jit = { > > + .text = { > > + .start = VMALLOC_START, > > + .end = VMALLOC_END, > > + .alignment = 1, > > + }, > > + }, > > }; > > This is growing fast. :) We have 3 now: text, data, jit. And it will be > 5 when we split data into rw data, ro data, ro after init data. I wonder > whether we should still do some type enum here. But we can revisit > this topic later. I don't think we'd need 5. Four at most :) I don't know yet what would be the best way to differentiate RW and RO data, but ro_after_init surely won't need a new type. It either will be allocated as RW and then the caller will have to set it RO after initialization is done, or it will be allocated as RO and the caller will have to do something like text_poke to update it. > Other than that > > Acked-by: Song Liu <song@kernel.org>
On Sat, Jun 17, 2023 at 09:57:59AM +0300, Mike Rapoport wrote: > > This is growing fast. :) We have 3 now: text, data, jit. And it will be > > 5 when we split data into rw data, ro data, ro after init data. I wonder > > whether we should still do some type enum here. But we can revisit > > this topic later. > > I don't think we'd need 5. Four at most :) > > I don't know yet what would be the best way to differentiate RW and RO > data, but ro_after_init surely won't need a new type. It either will be > allocated as RW and then the caller will have to set it RO after > initialization is done, or it will be allocated as RO and the caller will > have to do something like text_poke to update it. Perhaps ro_after_init could use the same allocation interface and share pages with ro pages - if we just added a refcount for "this page currently needs to be rw, module is still loading?" text_poke() approach wouldn't be workable, you'd have to audit and fix all module init code in the entire kernel.
On Sat, Jun 17, 2023 at 8:37 AM Kent Overstreet <kent.overstreet@linux.dev> wrote: > > On Sat, Jun 17, 2023 at 09:57:59AM +0300, Mike Rapoport wrote: > > > This is growing fast. :) We have 3 now: text, data, jit. And it will be > > > 5 when we split data into rw data, ro data, ro after init data. I wonder > > > whether we should still do some type enum here. But we can revisit > > > this topic later. > > > > I don't think we'd need 5. Four at most :) > > > > I don't know yet what would be the best way to differentiate RW and RO > > data, but ro_after_init surely won't need a new type. It either will be > > allocated as RW and then the caller will have to set it RO after > > initialization is done, or it will be allocated as RO and the caller will > > have to do something like text_poke to update it. > > Perhaps ro_after_init could use the same allocation interface and share > pages with ro pages - if we just added a refcount for "this page > currently needs to be rw, module is still loading?" If we don't relax rules with read only, we will have to separate rw, ro, and ro_after_init. But we can still have page sharing: Two modules can put rw data on the same page. With text poke (ro data poke to be accurate), two modules can put ro data on the same page. > text_poke() approach wouldn't be workable, you'd have to audit and fix > all module init code in the entire kernel. Agreed. For this reason, each module has to have its own page(s) for ro_after_init data. To eventually remove VM_FLUSH_RESET_PERMS, we want ro_after_init data to share the same allocation interface. Thanks, Song
On Sat, Jun 17, 2023 at 09:38:17AM -0700, Song Liu wrote: > On Sat, Jun 17, 2023 at 8:37 AM Kent Overstreet > <kent.overstreet@linux.dev> wrote: > > > > On Sat, Jun 17, 2023 at 09:57:59AM +0300, Mike Rapoport wrote: > > > > This is growing fast. :) We have 3 now: text, data, jit. And it will be > > > > 5 when we split data into rw data, ro data, ro after init data. I wonder > > > > whether we should still do some type enum here. But we can revisit > > > > this topic later. > > > > > > I don't think we'd need 5. Four at most :) > > > > > > I don't know yet what would be the best way to differentiate RW and RO > > > data, but ro_after_init surely won't need a new type. It either will be > > > allocated as RW and then the caller will have to set it RO after > > > initialization is done, or it will be allocated as RO and the caller will > > > have to do something like text_poke to update it. > > > > Perhaps ro_after_init could use the same allocation interface and share > > pages with ro pages - if we just added a refcount for "this page > > currently needs to be rw, module is still loading?" > > If we don't relax rules with read only, we will have to separate rw, ro, > and ro_after_init. But we can still have page sharing: > > Two modules can put rw data on the same page. > With text poke (ro data poke to be accurate), two modules can put > ro data on the same page. > > > text_poke() approach wouldn't be workable, you'd have to audit and fix > > all module init code in the entire kernel. > > Agreed. For this reason, each module has to have its own page(s) for > ro_after_init data. Relaxing page permissions to allow for page sharing could also be a config option. For archs with 64k pages it seems worthwhile.
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c index c3d999f3a3dd..52b09626bc0f 100644 --- a/arch/arm64/kernel/module.c +++ b/arch/arm64/kernel/module.c @@ -30,6 +30,13 @@ static struct execmem_params execmem_params = { .alignment = MODULE_ALIGN, }, }, + .jit = { + .text = { + .start = VMALLOC_START, + .end = VMALLOC_END, + .alignment = 1, + }, + }, }; struct execmem_params __init *execmem_arch_params(void) @@ -40,6 +47,8 @@ struct execmem_params __init *execmem_arch_params(void) execmem_params.modules.text.start = module_alloc_base; execmem_params.modules.text.end = module_alloc_end; + execmem_params.jit.text.pgprot = PAGE_KERNEL_ROX; + /* * KASAN without KASAN_VMALLOC can only deal with module * allocations being served from the reserved module region, diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c index 70b91a8c6bb3..6fccedd02b2a 100644 --- a/arch/arm64/kernel/probes/kprobes.c +++ b/arch/arm64/kernel/probes/kprobes.c @@ -129,13 +129,6 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p) return 0; } -void *alloc_insn_page(void) -{ - return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END, - GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS, - NUMA_NO_NODE, __builtin_return_address(0)); -} - /* arm kprobe: install breakpoint in text */ void __kprobes arch_arm_kprobe(struct kprobe *p) { diff --git a/include/linux/execmem.h b/include/linux/execmem.h index 2e1221310d13..dc7c9a446111 100644 --- a/include/linux/execmem.h +++ b/include/linux/execmem.h @@ -52,12 +52,23 @@ struct execmem_modules_range { struct execmem_range data; }; +/** + * struct execmem_jit_range - architecure parameters for address space + * suitable for JIT code allocations + * @text: address range for text allocations + */ +struct execmem_jit_range { + struct execmem_range text; +}; + /** * struct execmem_params - architecure parameters for code allocations * @modules: parameters for modules address space + * @jit: parameters for jit memory address space */ struct execmem_params { struct execmem_modules_range modules; + struct execmem_jit_range jit; }; /** diff --git a/mm/execmem.c b/mm/execmem.c index f7bf496ad4c3..9730ecef9a30 100644 --- a/mm/execmem.c +++ b/mm/execmem.c @@ -89,7 +89,12 @@ void execmem_free(void *ptr) void *jit_text_alloc(size_t size) { - return execmem_text_alloc(size); + unsigned long start = execmem_params.jit.text.start; + unsigned long end = execmem_params.jit.text.end; + pgprot_t pgprot = execmem_params.jit.text.pgprot; + unsigned int align = execmem_params.jit.text.alignment; + + return execmem_alloc(size, start, end, align, pgprot, 0, 0, false); } void jit_free(void *ptr) @@ -135,6 +140,13 @@ static void execmem_init_missing(struct execmem_params *p) execmem_params.modules.data.fallback_start = m->text.fallback_start; execmem_params.modules.data.fallback_end = m->text.fallback_end; } + + if (!execmem_params.jit.text.start) { + execmem_params.jit.text.start = m->text.start; + execmem_params.jit.text.end = m->text.end; + execmem_params.jit.text.alignment = m->text.alignment; + execmem_params.jit.text.pgprot = m->text.pgprot; + } } void __init execmem_init(void)