diff mbox series

[2/2] x86/modules: Make x86 allocs to flush when free

Message ID 20181128000754.18056-3-rick.p.edgecombe@intel.com (mailing list archive)
State New, archived
Headers show
Series Don’t leave executable TLB entries to freed pages | expand

Commit Message

Rick Edgecombe Nov. 28, 2018, 12:07 a.m. UTC
Change the module allocations to flush before freeing the pages.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/kernel/module.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Andrew Morton Nov. 28, 2018, 11:11 p.m. UTC | #1
On Tue, 27 Nov 2018 16:07:54 -0800 Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:

> Change the module allocations to flush before freeing the pages.
> 
> ...
>
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -87,8 +87,8 @@ void *module_alloc(unsigned long size)
>  	p = __vmalloc_node_range(size, MODULE_ALIGN,
>  				    MODULES_VADDR + get_module_load_offset(),
>  				    MODULES_END, GFP_KERNEL,
> -				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> -				    __builtin_return_address(0));
> +				    PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP,
> +				    NUMA_NO_NODE, __builtin_return_address(0));
>  	if (p && (kasan_module_alloc(p, size) < 0)) {
>  		vfree(p);
>  		return NULL;

Should any other architectures do this?
Rick Edgecombe Nov. 29, 2018, 12:02 a.m. UTC | #2
On Wed, 2018-11-28 at 15:11 -0800, Andrew Morton wrote:
> On Tue, 27 Nov 2018 16:07:54 -0800 Rick Edgecombe <rick.p.edgecombe@intel.com>
> wrote:
> 
> > Change the module allocations to flush before freeing the pages.
> > 
> > ...
> > 
> > --- a/arch/x86/kernel/module.c
> > +++ b/arch/x86/kernel/module.c
> > @@ -87,8 +87,8 @@ void *module_alloc(unsigned long size)
> >  	p = __vmalloc_node_range(size, MODULE_ALIGN,
> >  				    MODULES_VADDR + get_module_load_offset(),
> >  				    MODULES_END, GFP_KERNEL,
> > -				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> > -				    __builtin_return_address(0));
> > +				    PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP,
> > +				    NUMA_NO_NODE, __builtin_return_address(0));
> >  	if (p && (kasan_module_alloc(p, size) < 0)) {
> >  		vfree(p);
> >  		return NULL;
> 
> Should any other architectures do this?

I would think everything that has something like an NX bit and doesn't use the
default module_alloc implementation.

I could add the flag for every arch that defines PAGE_KERNEL_EXEC, but I don't
have a good way to test on all of those architectures.

Thanks,

Rick
Andy Lutomirski Nov. 29, 2018, 1:40 a.m. UTC | #3
> On Nov 27, 2018, at 4:07 PM, Rick Edgecombe <rick.p.edgecombe@intel.com> wrote:
>
> Change the module allocations to flush before freeing the pages.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> arch/x86/kernel/module.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index b052e883dd8c..1694daf256b3 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -87,8 +87,8 @@ void *module_alloc(unsigned long size)
>    p = __vmalloc_node_range(size, MODULE_ALIGN,
>                    MODULES_VADDR + get_module_load_offset(),
>                    MODULES_END, GFP_KERNEL,
> -                    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> -                    __builtin_return_address(0));
> +                    PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP,
> +                    NUMA_NO_NODE, __builtin_return_address(0));

Hmm. How awful is the resulting performance for heavy eBPF users?  I’m
wondering if the JIT will need some kind of cache to reuse
allocations.

>    if (p && (kasan_module_alloc(p, size) < 0)) {
>        vfree(p);
>        return NULL;
> --
> 2.17.1
>
Rick Edgecombe Nov. 29, 2018, 6:14 a.m. UTC | #4
On Wed, 2018-11-28 at 17:40 -0800, Andy Lutomirski wrote:
> > On Nov 27, 2018, at 4:07 PM, Rick Edgecombe <
> > rick.p.edgecombe@intel.com> wrote:
> > 
> > Change the module allocations to flush before freeing the pages.
> > 
> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > ---
> > arch/x86/kernel/module.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> > index b052e883dd8c..1694daf256b3 100644
> > --- a/arch/x86/kernel/module.c
> > +++ b/arch/x86/kernel/module.c
> > @@ -87,8 +87,8 @@ void *module_alloc(unsigned long size)
> >    p = __vmalloc_node_range(size, MODULE_ALIGN,
> >                    MODULES_VADDR + get_module_load_offset(),
> >                    MODULES_END, GFP_KERNEL,
> > -                    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> > -                    __builtin_return_address(0));
> > +                    PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP,
> > +                    NUMA_NO_NODE, __builtin_return_address(0));
> 
> Hmm. How awful is the resulting performance for heavy eBPF
> users?  I’m
> wondering if the JIT will need some kind of cache to reuse
> allocations.
I think it should have no effect for x86 at least. On allocation there
is only the setting of the flag. For free-ing there is of course a new
TLB flush, but it happens in way that should remove one elsewhere for
BPF.

On x86 today there are actually already 3 flushes for the operation
around a module_alloc JIT free. What's happening is there are two
allocations that are RO: the JIT and some data. When freeing, first the
JIT is set RW, then vfreed. So this causes 1 TLB flush from the
set_memory_rw, and there is now a lazy vmap area from the vfree. When
the data allocation is set to RW, vm_unmap_aliases() is called in
pageattr.c:change_page_attr_set_clr, so it will cause a flush from
clearing the lazy area, then there is the third flush as part of the
permissions change like usual.

Since now the JIT vfree will call vm_unmap_aliases(), it should not
trigger a TLB flush in the second permission change, so remain at 3.
> >    if (p && (kasan_module_alloc(p, size) < 0)) {
> >        vfree(p);
> >        return NULL;
> > --
> > 2.17.1
> >
diff mbox series

Patch

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b052e883dd8c..1694daf256b3 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -87,8 +87,8 @@  void *module_alloc(unsigned long size)
 	p = __vmalloc_node_range(size, MODULE_ALIGN,
 				    MODULES_VADDR + get_module_load_offset(),
 				    MODULES_END, GFP_KERNEL,
-				    PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
-				    __builtin_return_address(0));
+				    PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP,
+				    NUMA_NO_NODE, __builtin_return_address(0));
 	if (p && (kasan_module_alloc(p, size) < 0)) {
 		vfree(p);
 		return NULL;