mbox series

[RFC,v2,0/4] vmalloc_exec for modules and BPF programs

Message ID 20221007234315.2877365-1-song@kernel.org (mailing list archive)
Headers show
Series vmalloc_exec for modules and BPF programs | expand

Message

Song Liu Oct. 7, 2022, 11:43 p.m. UTC
Changes RFC v1 => RFC v2:
1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now
   work fine with BPF programs (patch 1, 2, 4). But module side (patch 3)
   still need some work.

This set is a prototype that allows dynamic kernel text (modules, bpf
programs, various trampolines, etc.) to share huge pages. The idea is
similar to Peter's suggestion in [1]. Please refer to each patch for
more detais.

The ultimate goal is to only host kernel text in 2MB pages (for x86_64).

Please share your comments on this.

Thanks!

[1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/
[2] RFC v1: https://lore.kernel.org/linux-mm/20220818224218.2399791-3-song@kernel.org/T/

Song Liu (4):
  vmalloc: introduce vmalloc_exec and vfree_exec
  bpf: use vmalloc_exec
  modules, x86: use vmalloc_exec for module core
  vmalloc_exec: share a huge page with kernel text

 arch/x86/Kconfig              |   1 +
 arch/x86/kernel/alternative.c |  30 +++-
 arch/x86/kernel/module.c      |   1 +
 arch/x86/mm/init_64.c         |   3 +-
 include/linux/vmalloc.h       |   2 +
 kernel/bpf/core.c             | 155 ++----------------
 kernel/module/main.c          |  23 +--
 kernel/module/strict_rwx.c    |   3 -
 kernel/trace/ftrace.c         |   3 +-
 mm/nommu.c                    |   7 +
 mm/vmalloc.c                  | 296 ++++++++++++++++++++++++++++++++++
 11 files changed, 358 insertions(+), 166 deletions(-)

--
2.30.2

Comments

Song Liu Oct. 8, 2022, 12:17 a.m. UTC | #1
+ Luis.
> On Oct 7, 2022, at 4:43 PM, Song Liu <song@kernel.org> wrote:
> 
> Changes RFC v1 => RFC v2:
> 1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now
>   work fine with BPF programs (patch 1, 2, 4). But module side (patch 3)
>   still need some work.
> 
> This set is a prototype that allows dynamic kernel text (modules, bpf
> programs, various trampolines, etc.) to share huge pages. The idea is
> similar to Peter's suggestion in [1]. Please refer to each patch for
> more detais.
> 
> The ultimate goal is to only host kernel text in 2MB pages (for x86_64).
> 
> Please share your comments on this.
> 
> Thanks!
> 
> [1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/
> [2] RFC v1: https://lore.kernel.org/linux-mm/20220818224218.2399791-3-song@kernel.org/T/
> 
> Song Liu (4):
>  vmalloc: introduce vmalloc_exec and vfree_exec
>  bpf: use vmalloc_exec
>  modules, x86: use vmalloc_exec for module core
>  vmalloc_exec: share a huge page with kernel text
> 
> arch/x86/Kconfig              |   1 +
> arch/x86/kernel/alternative.c |  30 +++-
> arch/x86/kernel/module.c      |   1 +
> arch/x86/mm/init_64.c         |   3 +-
> include/linux/vmalloc.h       |   2 +
> kernel/bpf/core.c             | 155 ++----------------
> kernel/module/main.c          |  23 +--
> kernel/module/strict_rwx.c    |   3 -
> kernel/trace/ftrace.c         |   3 +-
> mm/nommu.c                    |   7 +
> mm/vmalloc.c                  | 296 ++++++++++++++++++++++++++++++++++
> 11 files changed, 358 insertions(+), 166 deletions(-)
> 
> --
> 2.30.2
Song Liu Oct. 12, 2022, 7:03 p.m. UTC | #2
Hi Peter,

> On Oct 7, 2022, at 4:43 PM, Song Liu <song@kernel.org> wrote:
> 
> Changes RFC v1 => RFC v2:
> 1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now
>   work fine with BPF programs (patch 1, 2, 4). But module side (patch 3)
>   still need some work.
> 
> This set is a prototype that allows dynamic kernel text (modules, bpf
> programs, various trampolines, etc.) to share huge pages. The idea is
> similar to Peter's suggestion in [1]. Please refer to each patch for
> more detais.
> 
> The ultimate goal is to only host kernel text in 2MB pages (for x86_64).
> 
> Please share your comments on this.
> 
> Thanks!
> 
> [1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/
> [2] RFC v1: https://lore.kernel.org/linux-mm/20220818224218.2399791-3-song@kernel.org/T/

Could you please share your comments on this version? Is this the 
right direction to move this work?

Thanks,
Song

> 
> Song Liu (4):
>  vmalloc: introduce vmalloc_exec and vfree_exec
>  bpf: use vmalloc_exec
>  modules, x86: use vmalloc_exec for module core
>  vmalloc_exec: share a huge page with kernel text
> 
> arch/x86/Kconfig              |   1 +
> arch/x86/kernel/alternative.c |  30 +++-
> arch/x86/kernel/module.c      |   1 +
> arch/x86/mm/init_64.c         |   3 +-
> include/linux/vmalloc.h       |   2 +
> kernel/bpf/core.c             | 155 ++----------------
> kernel/module/main.c          |  23 +--
> kernel/module/strict_rwx.c    |   3 -
> kernel/trace/ftrace.c         |   3 +-
> mm/nommu.c                    |   7 +
> mm/vmalloc.c                  | 296 ++++++++++++++++++++++++++++++++++
> 11 files changed, 358 insertions(+), 166 deletions(-)
> 
> --
> 2.30.2
Christoph Hellwig Oct. 17, 2022, 7:26 a.m. UTC | #3
On Fri, Oct 07, 2022 at 04:43:11PM -0700, Song Liu wrote:
> Changes RFC v1 => RFC v2:
> 1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now
>    work fine with BPF programs (patch 1, 2, 4). But module side (patch 3)
>    still need some work.

Can you please move the changelog under the description of WTF the
series actually does like the normal kernel process?  Explaining the
changes from a previous version before you even describe what the series
does is completely incoherent.

> This set is a prototype that allows dynamic kernel text (modules, bpf
> programs, various trampolines, etc.) to share huge pages. The idea is
> similar to Peter's suggestion in [1]. Please refer to each patch for
> more detais.

Well, nothing explains what the method is to avoid having memory
that is mapped writable and executable at the same time, which really
could use some explanation here (and in the main patch as well).
Song Liu Oct. 17, 2022, 4:23 p.m. UTC | #4
Hi Chritoph, 

> On Oct 17, 2022, at 12:26 AM, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Fri, Oct 07, 2022 at 04:43:11PM -0700, Song Liu wrote:
>> Changes RFC v1 => RFC v2:
>> 1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now
>>   work fine with BPF programs (patch 1, 2, 4). But module side (patch 3)
>>   still need some work.
> 
> Can you please move the changelog under the description of WTF the
> series actually does like the normal kernel process?  Explaining the
> changes from a previous version before you even describe what the series
> does is completely incoherent.

Will fix in the next version. 

> 
>> This set is a prototype that allows dynamic kernel text (modules, bpf
>> programs, various trampolines, etc.) to share huge pages. The idea is
>> similar to Peter's suggestion in [1]. Please refer to each patch for
>> more detais.
> 
> Well, nothing explains what the method is to avoid having memory
> that is mapped writable and executable at the same time, which really
> could use some explanation here (and in the main patch as well).

Thanks for the feedback. I will add this. 

Does the code look good to you? I personally think patch 1, 2, 4 could
ship with a little more work. 

Thanks,
Song
Christoph Hellwig Oct. 18, 2022, 2:50 p.m. UTC | #5
On Mon, Oct 17, 2022 at 04:23:52PM +0000, Song Liu wrote:
> > Well, nothing explains what the method is to avoid having memory
> > that is mapped writable and executable at the same time, which really
> > could use some explanation here (and in the main patch as well).
> 
> Thanks for the feedback. I will add this. 
> 
> Does the code look good to you? I personally think patch 1, 2, 4 could
> ship with a little more work. 

I only took a quick look and I'm not sure how the W^X actually works.
Yes, it alls into the text poke helpers, but how do these work on
less than page sized allocations?
Song Liu Oct. 18, 2022, 3:05 p.m. UTC | #6
> On Oct 18, 2022, at 7:50 AM, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Mon, Oct 17, 2022 at 04:23:52PM +0000, Song Liu wrote:
>>> Well, nothing explains what the method is to avoid having memory
>>> that is mapped writable and executable at the same time, which really
>>> could use some explanation here (and in the main patch as well).
>> 
>> Thanks for the feedback. I will add this. 
>> 
>> Does the code look good to you? I personally think patch 1, 2, 4 could
>> ship with a little more work. 
> 
> I only took a quick look and I'm not sure how the W^X actually works.
> Yes, it alls into the text poke helpers, but how do these work on
> less than page sized allocations?

Aha, I guess I understand your point (and concern) now. 

It is the same as text poke into static kernel text: we create a local 
writable mapping to the memory we need to update. For less than page
sized allocation, this mapping does have access to X memory that may 
belong to a different allocation, just like text poke into static 
kernel text. 

Maybe we need something like vcopy_exec(x_mem, tmp_buf, size), where
we explicitly check the allowed memory of x_mem is bigger or equal to
size. And users of vmalloc_exec should only use vcopy_exec to update
memory from vmalloc_exec. 

Does this make sense? Did I understand your concern correctly? 

Thanks,
Song
Christoph Hellwig Oct. 18, 2022, 3:40 p.m. UTC | #7
On Tue, Oct 18, 2022 at 03:05:24PM +0000, Song Liu wrote:
> Maybe we need something like vcopy_exec(x_mem, tmp_buf, size), where
> we explicitly check the allowed memory of x_mem is bigger or equal to
> size. And users of vmalloc_exec should only use vcopy_exec to update
> memory from vmalloc_exec. 
> 
> Does this make sense? Did I understand your concern correctly? 

It sounds sensible.  Make sure it iѕ properly documented and reviewed
by people that actually know this area.  I just know enough to ask
stupid question, not to very something is actually correct :)
Christoph Hellwig Oct. 18, 2022, 3:40 p.m. UTC | #8
On Tue, Oct 18, 2022 at 05:40:20PM +0200, Christoph Hellwig wrote:
> It sounds sensible.  Make sure it iѕ properly documented and reviewed
> by people that actually know this area.  I just know enough to ask
> stupid question, not to very something is actually correct :)

s/very/verify/