Message ID | 20181128000754.18056-3-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Don’t leave executable TLB entries to freed pages | expand |
On Tue, 27 Nov 2018 16:07:54 -0800 Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > Change the module allocations to flush before freeing the pages. > > ... > > --- a/arch/x86/kernel/module.c > +++ b/arch/x86/kernel/module.c > @@ -87,8 +87,8 @@ void *module_alloc(unsigned long size) > p = __vmalloc_node_range(size, MODULE_ALIGN, > MODULES_VADDR + get_module_load_offset(), > MODULES_END, GFP_KERNEL, > - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE, > - __builtin_return_address(0)); > + PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP, > + NUMA_NO_NODE, __builtin_return_address(0)); > if (p && (kasan_module_alloc(p, size) < 0)) { > vfree(p); > return NULL; Should any other architectures do this?
On Wed, 2018-11-28 at 15:11 -0800, Andrew Morton wrote: > On Tue, 27 Nov 2018 16:07:54 -0800 Rick Edgecombe <rick.p.edgecombe@intel.com> > wrote: > > > Change the module allocations to flush before freeing the pages. > > > > ... > > > > --- a/arch/x86/kernel/module.c > > +++ b/arch/x86/kernel/module.c > > @@ -87,8 +87,8 @@ void *module_alloc(unsigned long size) > > p = __vmalloc_node_range(size, MODULE_ALIGN, > > MODULES_VADDR + get_module_load_offset(), > > MODULES_END, GFP_KERNEL, > > - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE, > > - __builtin_return_address(0)); > > + PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP, > > + NUMA_NO_NODE, __builtin_return_address(0)); > > if (p && (kasan_module_alloc(p, size) < 0)) { > > vfree(p); > > return NULL; > > Should any other architectures do this? I would think everything that has something like an NX bit and doesn't use the default module_alloc implementation. I could add the flag for every arch that defines PAGE_KERNEL_EXEC, but I don't have a good way to test on all of those architectures. Thanks, Rick
> On Nov 27, 2018, at 4:07 PM, Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > > Change the module allocations to flush before freeing the pages. > > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > --- > arch/x86/kernel/module.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c > index b052e883dd8c..1694daf256b3 100644 > --- a/arch/x86/kernel/module.c > +++ b/arch/x86/kernel/module.c > @@ -87,8 +87,8 @@ void *module_alloc(unsigned long size) > p = __vmalloc_node_range(size, MODULE_ALIGN, > MODULES_VADDR + get_module_load_offset(), > MODULES_END, GFP_KERNEL, > - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE, > - __builtin_return_address(0)); > + PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP, > + NUMA_NO_NODE, __builtin_return_address(0)); Hmm. How awful is the resulting performance for heavy eBPF users? I’m wondering if the JIT will need some kind of cache to reuse allocations. > if (p && (kasan_module_alloc(p, size) < 0)) { > vfree(p); > return NULL; > -- > 2.17.1 >
On Wed, 2018-11-28 at 17:40 -0800, Andy Lutomirski wrote: > > On Nov 27, 2018, at 4:07 PM, Rick Edgecombe < > > rick.p.edgecombe@intel.com> wrote: > > > > Change the module allocations to flush before freeing the pages. > > > > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > --- > > arch/x86/kernel/module.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c > > index b052e883dd8c..1694daf256b3 100644 > > --- a/arch/x86/kernel/module.c > > +++ b/arch/x86/kernel/module.c > > @@ -87,8 +87,8 @@ void *module_alloc(unsigned long size) > > p = __vmalloc_node_range(size, MODULE_ALIGN, > > MODULES_VADDR + get_module_load_offset(), > > MODULES_END, GFP_KERNEL, > > - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE, > > - __builtin_return_address(0)); > > + PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP, > > + NUMA_NO_NODE, __builtin_return_address(0)); > > Hmm. How awful is the resulting performance for heavy eBPF > users? I’m > wondering if the JIT will need some kind of cache to reuse > allocations. I think it should have no effect for x86 at least. On allocation there is only the setting of the flag. For free-ing there is of course a new TLB flush, but it happens in way that should remove one elsewhere for BPF. On x86 today there are actually already 3 flushes for the operation around a module_alloc JIT free. What's happening is there are two allocations that are RO: the JIT and some data. When freeing, first the JIT is set RW, then vfreed. So this causes 1 TLB flush from the set_memory_rw, and there is now a lazy vmap area from the vfree. When the data allocation is set to RW, vm_unmap_aliases() is called in pageattr.c:change_page_attr_set_clr, so it will cause a flush from clearing the lazy area, then there is the third flush as part of the permissions change like usual. Since now the JIT vfree will call vm_unmap_aliases(), it should not trigger a TLB flush in the second permission change, so remain at 3. > > if (p && (kasan_module_alloc(p, size) < 0)) { > > vfree(p); > > return NULL; > > -- > > 2.17.1 > >
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index b052e883dd8c..1694daf256b3 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -87,8 +87,8 @@ void *module_alloc(unsigned long size) p = __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR + get_module_load_offset(), MODULES_END, GFP_KERNEL, - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE, - __builtin_return_address(0)); + PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP, + NUMA_NO_NODE, __builtin_return_address(0)); if (p && (kasan_module_alloc(p, size) < 0)) { vfree(p); return NULL;
Change the module allocations to flush before freeing the pages. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> --- arch/x86/kernel/module.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)