Message ID | 20221007234315.2877365-1-song@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | vmalloc_exec for modules and BPF programs | expand |
+ Luis. > On Oct 7, 2022, at 4:43 PM, Song Liu <song@kernel.org> wrote: > > Changes RFC v1 => RFC v2: > 1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now > work fine with BPF programs (patch 1, 2, 4). But module side (patch 3) > still need some work. > > This set is a prototype that allows dynamic kernel text (modules, bpf > programs, various trampolines, etc.) to share huge pages. The idea is > similar to Peter's suggestion in [1]. Please refer to each patch for > more detais. > > The ultimate goal is to only host kernel text in 2MB pages (for x86_64). > > Please share your comments on this. > > Thanks! > > [1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/ > [2] RFC v1: https://lore.kernel.org/linux-mm/20220818224218.2399791-3-song@kernel.org/T/ > > Song Liu (4): > vmalloc: introduce vmalloc_exec and vfree_exec > bpf: use vmalloc_exec > modules, x86: use vmalloc_exec for module core > vmalloc_exec: share a huge page with kernel text > > arch/x86/Kconfig | 1 + > arch/x86/kernel/alternative.c | 30 +++- > arch/x86/kernel/module.c | 1 + > arch/x86/mm/init_64.c | 3 +- > include/linux/vmalloc.h | 2 + > kernel/bpf/core.c | 155 ++---------------- > kernel/module/main.c | 23 +-- > kernel/module/strict_rwx.c | 3 - > kernel/trace/ftrace.c | 3 +- > mm/nommu.c | 7 + > mm/vmalloc.c | 296 ++++++++++++++++++++++++++++++++++ > 11 files changed, 358 insertions(+), 166 deletions(-) > > -- > 2.30.2
Hi Peter, > On Oct 7, 2022, at 4:43 PM, Song Liu <song@kernel.org> wrote: > > Changes RFC v1 => RFC v2: > 1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now > work fine with BPF programs (patch 1, 2, 4). But module side (patch 3) > still need some work. > > This set is a prototype that allows dynamic kernel text (modules, bpf > programs, various trampolines, etc.) to share huge pages. The idea is > similar to Peter's suggestion in [1]. Please refer to each patch for > more detais. > > The ultimate goal is to only host kernel text in 2MB pages (for x86_64). > > Please share your comments on this. > > Thanks! > > [1] https://lore.kernel.org/bpf/Ys6cWUMHO8XwyYgr@hirez.programming.kicks-ass.net/ > [2] RFC v1: https://lore.kernel.org/linux-mm/20220818224218.2399791-3-song@kernel.org/T/ Could you please share your comments on this version? Is this the right direction to move this work? Thanks, Song > > Song Liu (4): > vmalloc: introduce vmalloc_exec and vfree_exec > bpf: use vmalloc_exec > modules, x86: use vmalloc_exec for module core > vmalloc_exec: share a huge page with kernel text > > arch/x86/Kconfig | 1 + > arch/x86/kernel/alternative.c | 30 +++- > arch/x86/kernel/module.c | 1 + > arch/x86/mm/init_64.c | 3 +- > include/linux/vmalloc.h | 2 + > kernel/bpf/core.c | 155 ++---------------- > kernel/module/main.c | 23 +-- > kernel/module/strict_rwx.c | 3 - > kernel/trace/ftrace.c | 3 +- > mm/nommu.c | 7 + > mm/vmalloc.c | 296 ++++++++++++++++++++++++++++++++++ > 11 files changed, 358 insertions(+), 166 deletions(-) > > -- > 2.30.2
On Fri, Oct 07, 2022 at 04:43:11PM -0700, Song Liu wrote: > Changes RFC v1 => RFC v2: > 1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now > work fine with BPF programs (patch 1, 2, 4). But module side (patch 3) > still need some work. Can you please move the changelog under the description of WTF the series actually does like the normal kernel process? Explaining the changes from a previous version before you even describe what the series does is completely incoherent. > This set is a prototype that allows dynamic kernel text (modules, bpf > programs, various trampolines, etc.) to share huge pages. The idea is > similar to Peter's suggestion in [1]. Please refer to each patch for > more detais. Well, nothing explains what the method is to avoid having memory that is mapped writable and executable at the same time, which really could use some explanation here (and in the main patch as well).
Hi Chritoph, > On Oct 17, 2022, at 12:26 AM, Christoph Hellwig <hch@lst.de> wrote: > > On Fri, Oct 07, 2022 at 04:43:11PM -0700, Song Liu wrote: >> Changes RFC v1 => RFC v2: >> 1. Major rewrite of the logic of vmalloc_exec and vfree_exec. They now >> work fine with BPF programs (patch 1, 2, 4). But module side (patch 3) >> still need some work. > > Can you please move the changelog under the description of WTF the > series actually does like the normal kernel process? Explaining the > changes from a previous version before you even describe what the series > does is completely incoherent. Will fix in the next version. > >> This set is a prototype that allows dynamic kernel text (modules, bpf >> programs, various trampolines, etc.) to share huge pages. The idea is >> similar to Peter's suggestion in [1]. Please refer to each patch for >> more detais. > > Well, nothing explains what the method is to avoid having memory > that is mapped writable and executable at the same time, which really > could use some explanation here (and in the main patch as well). Thanks for the feedback. I will add this. Does the code look good to you? I personally think patch 1, 2, 4 could ship with a little more work. Thanks, Song
On Mon, Oct 17, 2022 at 04:23:52PM +0000, Song Liu wrote: > > Well, nothing explains what the method is to avoid having memory > > that is mapped writable and executable at the same time, which really > > could use some explanation here (and in the main patch as well). > > Thanks for the feedback. I will add this. > > Does the code look good to you? I personally think patch 1, 2, 4 could > ship with a little more work. I only took a quick look and I'm not sure how the W^X actually works. Yes, it alls into the text poke helpers, but how do these work on less than page sized allocations?
> On Oct 18, 2022, at 7:50 AM, Christoph Hellwig <hch@lst.de> wrote: > > On Mon, Oct 17, 2022 at 04:23:52PM +0000, Song Liu wrote: >>> Well, nothing explains what the method is to avoid having memory >>> that is mapped writable and executable at the same time, which really >>> could use some explanation here (and in the main patch as well). >> >> Thanks for the feedback. I will add this. >> >> Does the code look good to you? I personally think patch 1, 2, 4 could >> ship with a little more work. > > I only took a quick look and I'm not sure how the W^X actually works. > Yes, it alls into the text poke helpers, but how do these work on > less than page sized allocations? Aha, I guess I understand your point (and concern) now. It is the same as text poke into static kernel text: we create a local writable mapping to the memory we need to update. For less than page sized allocation, this mapping does have access to X memory that may belong to a different allocation, just like text poke into static kernel text. Maybe we need something like vcopy_exec(x_mem, tmp_buf, size), where we explicitly check the allowed memory of x_mem is bigger or equal to size. And users of vmalloc_exec should only use vcopy_exec to update memory from vmalloc_exec. Does this make sense? Did I understand your concern correctly? Thanks, Song
On Tue, Oct 18, 2022 at 03:05:24PM +0000, Song Liu wrote: > Maybe we need something like vcopy_exec(x_mem, tmp_buf, size), where > we explicitly check the allowed memory of x_mem is bigger or equal to > size. And users of vmalloc_exec should only use vcopy_exec to update > memory from vmalloc_exec. > > Does this make sense? Did I understand your concern correctly? It sounds sensible. Make sure it iѕ properly documented and reviewed by people that actually know this area. I just know enough to ask stupid question, not to very something is actually correct :)
On Tue, Oct 18, 2022 at 05:40:20PM +0200, Christoph Hellwig wrote: > It sounds sensible. Make sure it iѕ properly documented and reviewed > by people that actually know this area. I just know enough to ask > stupid question, not to very something is actually correct :) s/very/verify/