
[v3,bpf,RESEND,3/4] module: introduce module_alloc_huge

Message ID 20220414195914.1648345-4-song@kernel.org
State New
Series vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

Commit Message

Song Liu April 14, 2022, 7:59 p.m. UTC
Introduce module_alloc_huge, which allocates huge page backed memory in
module memory space. The primary user of this memory is bpf_prog_pack
(multiple BPF programs sharing a huge page).
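To illustrate the bpf_prog_pack idea mentioned above, here is a minimal user-space sketch, not kernel code. One 2 MiB region stands in for memory obtained from module_alloc_huge() (aligned_alloc() is only a placeholder), and it is carved into 64-byte chunks so that several small program images can share it, placed first-fit. The pack and chunk sizes and the names prog_pack, pack_create(), pack_alloc(), pack_free() and pack_destroy() are hypothetical and chosen for illustration; the real allocator in the BPF core may work differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes: one 2 MiB "huge page" pack split into 64-byte chunks. */
#define PACK_SIZE  (2UL * 1024 * 1024)
#define CHUNK_SIZE 64UL
#define NCHUNKS    (PACK_SIZE / CHUNK_SIZE)

struct prog_pack {
	unsigned char *base;   /* placeholder for huge-page-backed memory */
	bool used[NCHUNKS];    /* one flag per chunk; a real allocator would use a bitmap */
};

static struct prog_pack *pack_create(void)
{
	struct prog_pack *pack = calloc(1, sizeof(*pack));

	if (!pack)
		return NULL;
	/* Align the region to its own size, as a PMD-mapped huge page would be. */
	pack->base = aligned_alloc(PACK_SIZE, PACK_SIZE);
	if (!pack->base) {
		free(pack);
		return NULL;
	}
	return pack;
}

static void pack_destroy(struct prog_pack *pack)
{
	free(pack->base);
	free(pack);
}

/* First-fit search for enough consecutive free chunks to hold @size bytes. */
static void *pack_alloc(struct prog_pack *pack, size_t size)
{
	size_t need = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
	size_t run = 0;

	if (need == 0 || need > NCHUNKS)
		return NULL;
	for (size_t i = 0; i < NCHUNKS; i++) {
		run = pack->used[i] ? 0 : run + 1;
		if (run == need) {
			size_t first = i + 1 - need;

			for (size_t j = first; j <= i; j++)
				pack->used[j] = true;
			return pack->base + first * CHUNK_SIZE;
		}
	}
	return NULL;   /* pack is full; a caller would create another pack */
}

/* Return the chunks covering [@ptr, @ptr + @size) to the pack. */
static void pack_free(struct prog_pack *pack, void *ptr, size_t size)
{
	size_t first = (size_t)((unsigned char *)ptr - pack->base) / CHUNK_SIZE;
	size_t need = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;

	for (size_t j = first; j < first + need; j++)
		pack->used[j] = false;
}
```

For example, a 100-byte image and a 64-byte image end up in chunks 0 to 2 of the same pack; freeing the first one lets a later 128-byte image reuse chunks 0 and 1 without touching a new huge page.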

Signed-off-by: Song Liu <song@kernel.org>
---
 arch/x86/kernel/module.c     | 21 +++++++++++++++++++++
 include/linux/moduleloader.h |  5 +++++
 kernel/module.c              |  5 +++++
 3 files changed, 31 insertions(+)

Comments

Luis Chamberlain April 14, 2022, 8:34 p.m. UTC | #1
On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> Introduce module_alloc_huge, which allocates huge page backed memory in
> module memory space. The primary user of this memory is bpf_prog_pack
> (multiple BPF programs sharing a huge page).
> 
> Signed-off-by: Song Liu <song@kernel.org>

See modules-next [0], as kernel/module.c has been chopped up as of late.
So if you want this to go through the modules tree, this will need to be
rebased on that tree. Fortunately, the amount of code in question does
not seem like much.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next

  Luis
Song Liu April 14, 2022, 9:03 p.m. UTC | #2
Hi Luis,

On Thu, Apr 14, 2022 at 1:34 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> > Introduce module_alloc_huge, which allocates huge page backed memory in
> > module memory space. The primary user of this memory is bpf_prog_pack
> > (multiple BPF programs sharing a huge page).
> >
> > Signed-off-by: Song Liu <song@kernel.org>
>
> See modules-next [0], as kernel/module.c has been chopped up as of late.
> So if you want this to go through the modules tree, this will need to be
> rebased on that tree. Fortunately, the amount of code in question does
> not seem like much.
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next

We are hoping to ship this in 5.18, as the set addresses some issues with
huge page backed vmalloc. I guess we cannot ship it via the modules-next branch.

How about we ship module_alloc_huge() in module.c for 5.18, and once we
update the modules-next branch, I will send another patch to clean it up?

Thanks,
Song
Luis Chamberlain April 14, 2022, 9:11 p.m. UTC | #3
On Thu, Apr 14, 2022 at 02:03:17PM -0700, Song Liu wrote:
> Hi Luis,
> 
> On Thu, Apr 14, 2022 at 1:34 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> > > Introduce module_alloc_huge, which allocates huge page backed memory in
> > > module memory space. The primary user of this memory is bpf_prog_pack
> > > (multiple BPF programs sharing a huge page).
> > >
> > > Signed-off-by: Song Liu <song@kernel.org>
> >
> > See modules-next [0], as kernel/module.c has been chopped up as of late.
> > So if you want this to go through the modules tree, this will need to be
> > rebased on that tree. Fortunately, the amount of code in question does
> > not seem like much.
> >
> > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next
> 
> We are hoping to ship this in 5.18, as the set addresses some issues with
> huge page backed vmalloc. I guess we cannot ship it via the modules-next branch.
> 

Huh, you intend this to go in as a fix for v5.18 (already released) once
properly reviewed?  This seems quite large... for a fix.

> How about we ship module_alloc_huge() in module.c for 5.18, and once we
> update the modules-next branch, I will send another patch to clean it up?

I would rather set expectations right about getting such a large fix into
v5.18. I haven't even sat down to review all the changes in light of
this, but a cursory glance suggests it's rather "large" for a fix.

  Luis
Song Liu April 14, 2022, 9:31 p.m. UTC | #4
On Thu, Apr 14, 2022 at 2:11 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> On Thu, Apr 14, 2022 at 02:03:17PM -0700, Song Liu wrote:
> > Hi Luis,
> >
> > On Thu, Apr 14, 2022 at 1:34 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > >
> > > On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> > > > Introduce module_alloc_huge, which allocates huge page backed memory in
> > > > module memory space. The primary user of this memory is bpf_prog_pack
> > > > (multiple BPF programs sharing a huge page).
> > > >
> > > > Signed-off-by: Song Liu <song@kernel.org>
> > >
> > > See modules-next [0], as kernel/module.c has been chopped up as of late.
> > > So if you want this to go through the modules tree, this will need to be
> > > rebased on that tree. Fortunately, the amount of code in question does
> > > not seem like much.
> > >
> > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next
> >
> > We are hoping to ship this in 5.18, as the set addresses some issues with
> > huge page backed vmalloc. I guess we cannot ship it via the modules-next branch.
> >
>
> Huh, you intend this to go in as a fix for v5.18 (already released) once
> properly reviewed?  This seems quite large... for a fix.
>
> > How about we ship module_alloc_huge() in module.c for 5.18, and once we
> > update the modules-next branch, I will send another patch to clean it up?
>
> I would rather set expectations right about getting such a large fix into
> v5.18. I haven't even sat down to review all the changes in light of
> this, but a cursory glance suggests it's rather "large" for a fix.

Yes, I agree this is a little too big for a fix. I guess we can discuss whether
some of the set needs to wait until 5.19.

Thanks,
Song
Christoph Hellwig April 15, 2022, 6:32 a.m. UTC | #5
On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> Introduce module_alloc_huge, which allocates huge page backed memory in
> module memory space. The primary user of this memory is bpf_prog_pack
> (multiple BPF programs sharing a huge page).
> 
> Signed-off-by: Song Liu <song@kernel.org>
> ---
>  arch/x86/kernel/module.c     | 21 +++++++++++++++++++++
>  include/linux/moduleloader.h |  5 +++++
>  kernel/module.c              |  5 +++++
>  3 files changed, 31 insertions(+)
> 
> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> index b98ffcf4d250..63f6a16c70dc 100644
> --- a/arch/x86/kernel/module.c
> +++ b/arch/x86/kernel/module.c
> @@ -86,6 +86,27 @@ void *module_alloc(unsigned long size)
>  	return p;
>  }
>  
> +void *module_alloc_huge(unsigned long size)
> +{
> +	gfp_t gfp_mask = GFP_KERNEL;
> +	void *p;
> +
> +	if (PAGE_ALIGN(size) > MODULES_LEN)
> +		return NULL;
> +
> +	p = __vmalloc_node_range(size, MODULE_ALIGN,
> +				 MODULES_VADDR + get_module_load_offset(),
> +				 MODULES_END, gfp_mask, PAGE_KERNEL,
> +				 VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
> +				 NUMA_NO_NODE, __builtin_return_address(0));
> +	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
> +		vfree(p);
> +		return NULL;
> +	}
> +
> +	return p;
> +}
> +
>  #ifdef CONFIG_X86_32
>  int apply_relocate(Elf32_Shdr *sechdrs,
>  		   const char *strtab,
> diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> index 9e09d11ffe5b..d34743a88938 100644
> --- a/include/linux/moduleloader.h
> +++ b/include/linux/moduleloader.h
> @@ -26,6 +26,11 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
>     sections.  Returns NULL on failure. */
>  void *module_alloc(unsigned long size);
>  
> +/* Allocator used for allocating memory in module memory space. If size is
> + * greater than PMD_SIZE, allow using huge pages. Returns NULL on failure.
> + */
> +void *module_alloc_huge(unsigned long size);
> +
>  /* Free memory returned from module_alloc. */
>  void module_memfree(void *module_region);
>  
> diff --git a/kernel/module.c b/kernel/module.c
> index 6cea788fd965..b2c6cb682a7d 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2839,6 +2839,11 @@ void * __weak module_alloc(unsigned long size)
>  			NUMA_NO_NODE, __builtin_return_address(0));
>  }
>  
> +void * __weak module_alloc_huge(unsigned long size)
> +{
> +	return vmalloc_huge(size);
> +}

Umm.  This should use the same parameters as module_alloc except for
also passing the new huge page flag.
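A small user-space model of the review comment above. It assumes, as in mainline around v5.18, that the generic __weak module_alloc() allocates from VMALLOC_START..VMALLOC_END with PAGE_KERNEL_EXEC and VM_FLUSH_RESET_PERMS, and that vmalloc_huge() instead maps plain PAGE_KERNEL memory, so the fallback in the patch would lose those properties. The struct vmalloc_params type, the MODEL_* flag values and the address numbers are hypothetical; the sketch only shows the suggested shape: reuse every module_alloc() parameter and OR in the huge-page flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's VM_* flag bits. */
#define MODEL_VM_FLUSH_RESET_PERMS 0x1u
#define MODEL_VM_ALLOW_HUGE_VMAP   0x2u

/* The arguments that decide where and how __vmalloc_node_range() maps memory. */
struct vmalloc_params {
	uintptr_t start;          /* lower bound of the virtual address window */
	uintptr_t end;            /* upper bound of the virtual address window */
	unsigned long align;
	bool exec;                /* models PAGE_KERNEL_EXEC vs. PAGE_KERNEL */
	unsigned int vm_flags;
};

/* Models the generic __weak module_alloc(); the addresses are made up. */
static struct vmalloc_params module_alloc_params(void)
{
	struct vmalloc_params p = {
		.start = 0x10000000u,
		.end = 0x20000000u,
		.align = 1,
		.exec = true,
		.vm_flags = MODEL_VM_FLUSH_RESET_PERMS,
	};
	return p;
}

/* The suggested fallback: identical parameters plus the huge-page flag. */
static struct vmalloc_params module_alloc_huge_params(void)
{
	struct vmalloc_params p = module_alloc_params();

	p.vm_flags |= MODEL_VM_ALLOW_HUGE_VMAP;
	return p;
}
```

Deriving the huge variant from module_alloc_params() keeps the two in sync: if the generic window, protection or flags ever change, the huge-page fallback follows automatically.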
Song Liu April 15, 2022, 3:59 p.m. UTC | #6
> On Apr 14, 2022, at 11:32 PM, Christoph Hellwig <hch@infradead.org> wrote:
> 
> On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
>> Introduce module_alloc_huge, which allocates huge page backed memory in
>> module memory space. The primary user of this memory is bpf_prog_pack
>> (multiple BPF programs sharing a huge page).
>> 
>> Signed-off-by: Song Liu <song@kernel.org>
>> ---
>> arch/x86/kernel/module.c | 21 +++++++++++++++++++++
>> include/linux/moduleloader.h | 5 +++++
>> kernel/module.c | 5 +++++
>> 3 files changed, 31 insertions(+)
>> 
>> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
>> index b98ffcf4d250..63f6a16c70dc 100644
>> --- a/arch/x86/kernel/module.c
>> +++ b/arch/x86/kernel/module.c
>> @@ -86,6 +86,27 @@ void *module_alloc(unsigned long size)
>> 	return p;
>> }
>> 
>> +void *module_alloc_huge(unsigned long size)
>> +{
>> +	gfp_t gfp_mask = GFP_KERNEL;
>> +	void *p;
>> +
>> +	if (PAGE_ALIGN(size) > MODULES_LEN)
>> +		return NULL;
>> +
>> +	p = __vmalloc_node_range(size, MODULE_ALIGN,
>> +				 MODULES_VADDR + get_module_load_offset(),
>> +				 MODULES_END, gfp_mask, PAGE_KERNEL,
>> +				 VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
>> +				 NUMA_NO_NODE, __builtin_return_address(0));
>> +	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
>> +		vfree(p);
>> +		return NULL;
>> +	}
>> +
>> +	return p;
>> +}
>> +
>> #ifdef CONFIG_X86_32
>> int apply_relocate(Elf32_Shdr *sechdrs,
>> 		 const char *strtab,
>> diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
>> index 9e09d11ffe5b..d34743a88938 100644
>> --- a/include/linux/moduleloader.h
>> +++ b/include/linux/moduleloader.h
>> @@ -26,6 +26,11 @@ unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
>> sections. Returns NULL on failure. */
>> void *module_alloc(unsigned long size);
>> 
>> +/* Allocator used for allocating memory in module memory space. If size is
>> + * greater than PMD_SIZE, allow using huge pages. Returns NULL on failure.
>> + */
>> +void *module_alloc_huge(unsigned long size);
>> +
>> /* Free memory returned from module_alloc. */
>> void module_memfree(void *module_region);
>> 
>> diff --git a/kernel/module.c b/kernel/module.c
>> index 6cea788fd965..b2c6cb682a7d 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -2839,6 +2839,11 @@ void * __weak module_alloc(unsigned long size)
>> 			NUMA_NO_NODE, __builtin_return_address(0));
>> }
>> 
>> +void * __weak module_alloc_huge(unsigned long size)
>> +{
>> +	return vmalloc_huge(size);
>> +}
> 
> Umm. This should use the same parameters as module_alloc except for
> also passing the new huge page flag.

Will fix the set and send v4. 

Thanks,
Song
Luis Chamberlain April 15, 2022, 7:03 p.m. UTC | #7
On Thu, Apr 14, 2022 at 02:31:18PM -0700, Song Liu wrote:
> On Thu, Apr 14, 2022 at 2:11 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> >
> > On Thu, Apr 14, 2022 at 02:03:17PM -0700, Song Liu wrote:
> > > Hi Luis,
> > >
> > > On Thu, Apr 14, 2022 at 1:34 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> > > >
> > > > On Thu, Apr 14, 2022 at 12:59:13PM -0700, Song Liu wrote:
> > > > > Introduce module_alloc_huge, which allocates huge page backed memory in
> > > > > module memory space. The primary user of this memory is bpf_prog_pack
> > > > > (multiple BPF programs sharing a huge page).
> > > > >
> > > > > Signed-off-by: Song Liu <song@kernel.org>
> > > >
> > > > See modules-next [0], as kernel/module.c has been chopped up as of late.
> > > > So if you want this to go through the modules tree, this will need to be
> > > > rebased on that tree. Fortunately, the amount of code in question does
> > > > not seem like much.
> > > >
> > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=modules-next
> > >
> > > We are hoping to ship this in 5.18, as the set addresses some issues with
> > > huge page backed vmalloc. I guess we cannot ship it via the modules-next branch.
> > >
> >
> > Huh, you intend this to go in as a fix for v5.18 (already released) once
> > properly reviewed?  This seems quite large... for a fix.
> >
> > > How about we ship module_alloc_huge() in module.c for 5.18, and once we
> > > update the modules-next branch, I will send another patch to clean it up?
> >
> > I would rather set expectations right about getting such a large fix into
> > v5.18. I haven't even sat down to review all the changes in light of
> > this, but a cursory glance suggests it's rather "large" for a fix.
> 
> Yes, I agree this is a little too big for a fix. I guess we can discuss whether
> some of the set needs to wait until 5.19.

Having now reviewed this more thoroughly, along with when the other changes
landed, it seems this is a *large follow-up fix* for an optimization for when
tons of JIT-compiled eBPF programs are used. It's so large that I can't be
confident it doesn't bring in other holes or issues, or that the other stuff
already merged doesn't have issues of its own. So I can't see anything
screaming for this to go in for v5.18, other than that it'd be nice.

So my preference is for this to go in for v5.19, as I see no rush.

  Luis

Patch

diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index b98ffcf4d250..63f6a16c70dc 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -86,6 +86,27 @@  void *module_alloc(unsigned long size)
 	return p;
 }
 
+void *module_alloc_huge(unsigned long size)
+{
+	gfp_t gfp_mask = GFP_KERNEL;
+	void *p;
+
+	if (PAGE_ALIGN(size) > MODULES_LEN)
+		return NULL;
+
+	p = __vmalloc_node_range(size, MODULE_ALIGN,
+				 MODULES_VADDR + get_module_load_offset(),
+				 MODULES_END, gfp_mask, PAGE_KERNEL,
+				 VM_DEFER_KMEMLEAK | VM_ALLOW_HUGE_VMAP,
+				 NUMA_NO_NODE, __builtin_return_address(0));
+	if (p && (kasan_alloc_module_shadow(p, size, gfp_mask) < 0)) {
+		vfree(p);
+		return NULL;
+	}
+
+	return p;
+}
+
 #ifdef CONFIG_X86_32
 int apply_relocate(Elf32_Shdr *sechdrs,
 		   const char *strtab,
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..d34743a88938 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -26,6 +26,11 @@  unsigned int arch_mod_section_prepend(struct module *mod, unsigned int section);
    sections.  Returns NULL on failure. */
 void *module_alloc(unsigned long size);
 
+/* Allocator used for allocating memory in module memory space. If size is
+ * greater than PMD_SIZE, allow using huge pages. Returns NULL on failure.
+ */
+void *module_alloc_huge(unsigned long size);
+
 /* Free memory returned from module_alloc. */
 void module_memfree(void *module_region);
 
diff --git a/kernel/module.c b/kernel/module.c
index 6cea788fd965..b2c6cb682a7d 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2839,6 +2839,11 @@  void * __weak module_alloc(unsigned long size)
 			NUMA_NO_NODE, __builtin_return_address(0));
 }
 
+void * __weak module_alloc_huge(unsigned long size)
+{
+	return vmalloc_huge(size);
+}
+
 bool __weak module_init_section(const char *name)
 {
 	return strstarts(name, ".init");
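The size handling in the x86 module_alloc_huge() and the new header comment can be modeled in user space as below. This is a simplified sketch under stated assumptions: 4 KiB base pages and 2 MiB PMD mappings as on x86-64, a single NUMA node, and that vmalloc, when VM_ALLOW_HUGE_VMAP is set, switches to PMD-sized alignment once a request is at least PMD_SIZE and rounds the mapping up to that granularity (an assumption about vmalloc internals; the header comment says "greater than"). The names plan_mapping(), fits_window(), struct mapping_plan and the MODEL_* constants are hypothetical.

```c
#include <stdbool.h>

/* Assumed x86-64 values: 4 KiB base pages and 2 MiB PMD-level huge pages. */
#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PMD_SIZE  (2UL * 1024 * 1024)

/* Round @x up to a multiple of @a, where @a is a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct mapping_plan {
	unsigned long map_size;   /* bytes actually mapped */
	unsigned long align;      /* alignment of the start address */
	bool use_pmd;             /* true if 2 MiB mappings would be used */
};

/* Mirrors the "PAGE_ALIGN(size) > MODULES_LEN" rejection in the x86 patch. */
static bool fits_window(unsigned long size, unsigned long window_len)
{
	return ALIGN_UP(size, MODEL_PAGE_SIZE) <= window_len;
}

/* Simplified model of how a huge-page-capable vmalloc might size a request. */
static struct mapping_plan plan_mapping(unsigned long size, bool allow_huge)
{
	struct mapping_plan p = { .align = MODEL_PAGE_SIZE, .use_pmd = false };

	if (allow_huge && size >= MODEL_PMD_SIZE) {
		p.use_pmd = true;
		p.align = MODEL_PMD_SIZE;
	}
	p.map_size = ALIGN_UP(size, p.align);
	return p;
}
```

In this model, a request just over 2 MiB is rounded up to 4 MiB when huge mappings are allowed, but only to the next 4 KiB page otherwise; small requests are unaffected by the flag.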