Message ID | 20221031222541.1773452-1-song@kernel.org (mailing list archive) |
---|---|
Headers | show |
Series | vmalloc_exec for modules and BPF programs | expand |
On Mon, Oct 31, 2022 at 03:25:36PM -0700, Song Liu wrote: > This set enables bpf programs and bpf dispatchers to share huge pages with > new API: > vmalloc_exec() > vfree_exec() > vcopy_exec() Maybe it's just me, but I don't like the names very much. They imply a slight extension to the vmalloc API, but while they use the vmalloc mechanisms internally, the API is actually quite different. So why not something like: execmem_alloc execmem_free execmem_fill or execmem_set or copy_to_execmem ?
On Tue, Nov 1, 2022 at 4:26 AM Christoph Hellwig <hch@lst.de> wrote: > > On Mon, Oct 31, 2022 at 03:25:36PM -0700, Song Liu wrote: > > This set enables bpf programs and bpf dispatchers to share huge pages with > > new API: > > vmalloc_exec() > > vfree_exec() > > vcopy_exec() > > Maybe it's just me, but I don't like the names very much. They imply > a slight extension to the vmalloc API, but while they use the vmalloc > mechanisms internally, the API is actually quite different. > > So why not something like: > > execmem_alloc > execmem_free > execmem_fill or execmem_set or copy_to_execmem > > ? I don't have a strong preference on names. We can change the name to whatever we agree on. Thanks, Song
On Mon, Oct 31, 2022 at 03:25:36PM -0700, Song Liu wrote: > Changes RFC v2 => PATCH v1: <-- snip --> > 3. Drop changes for kernel modules and focus on BPF side changes. Yeah because in my testing of RFCv2 things blow up pretty hard, I haven't been able to successfully even boot *any* system with those patches, virutal or bare metal. I'm very happy to help test new iterations if you have anything. Luis
On Mon, 2022-10-31 at 15:25 -0700, Song Liu wrote: > This set enables bpf programs and bpf dispatchers to share huge pages > with > new API: > vmalloc_exec() > vfree_exec() > vcopy_exec() > > The idea is similar to Peter's suggestion in [1]. > > vmalloc_exec() manages a set of PMD_SIZE RO+X memory, and allocates > these > memory to its users. vfree_exec() is used to free memory allocated by > vmalloc_exec(). vcopy_exec() is used to update memory allocated by > vmalloc_exec(). > > Memory allocated by vmalloc_exec() is RO+X, so this doesnot violate > W^X. > The caller has to update the content with text_poke like mechanism. > Specifically, vcopy_exec() is provided to update memory allocated by > vmalloc_exec(). vcopy_exec() also makes sure the update stays in the > boundary of one chunk allocated by vmalloc_exec(). Please refer to > patch > 1/5 for more details of > > Patch 3/5 uses these new APIs in bpf program and bpf dispatcher. > > Patch 4/5 and 5/5 allows static kernel text (_stext to _etext) to > share > PMD_SIZE pages with dynamic kernel text on x86_64. This is achieved > by > allocating PMD_SIZE pages to roundup(_etext, PMD_SIZE), and then use > _etext to roundup(_etext, PMD_SIZE) for dynamic kernel text. It might help to spell out what the benefits of this are. My understanding is that (to my surprise) we actually haven't seen a performance improvement with using 2MB pages for JITs. The main performance benefit you saw on your previous version was from reduced fragmentation of the direct map IIUC. This was from the effect of reusing the same pages for JITs so that new ones don't need to be broken. The other benefit of this thing is reduced shootdowns. It can load a JIT with about only a local TLB flush on average, which should help really high cpu systems some unknown amount.
On Wed, Nov 2, 2022 at 3:30 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > > On Mon, 2022-10-31 at 15:25 -0700, Song Liu wrote: > > This set enables bpf programs and bpf dispatchers to share huge pages > > with > > new API: > > vmalloc_exec() > > vfree_exec() > > vcopy_exec() > > > > The idea is similar to Peter's suggestion in [1]. > > > > vmalloc_exec() manages a set of PMD_SIZE RO+X memory, and allocates > > these > > memory to its users. vfree_exec() is used to free memory allocated by > > vmalloc_exec(). vcopy_exec() is used to update memory allocated by > > vmalloc_exec(). > > > > Memory allocated by vmalloc_exec() is RO+X, so this doesnot violate > > W^X. > > The caller has to update the content with text_poke like mechanism. > > Specifically, vcopy_exec() is provided to update memory allocated by > > vmalloc_exec(). vcopy_exec() also makes sure the update stays in the > > boundary of one chunk allocated by vmalloc_exec(). Please refer to > > patch > > 1/5 for more details of > > > > Patch 3/5 uses these new APIs in bpf program and bpf dispatcher. > > > > Patch 4/5 and 5/5 allows static kernel text (_stext to _etext) to > > share > > PMD_SIZE pages with dynamic kernel text on x86_64. This is achieved > > by > > allocating PMD_SIZE pages to roundup(_etext, PMD_SIZE), and then use > > _etext to roundup(_etext, PMD_SIZE) for dynamic kernel text. > > It might help to spell out what the benefits of this are. My > understanding is that (to my surprise) we actually haven't seen a > performance improvement with using 2MB pages for JITs. The main > performance benefit you saw on your previous version was from reduced > fragmentation of the direct map IIUC. This was from the effect of > reusing the same pages for JITs so that new ones don't need to be > broken. > > The other benefit of this thing is reduced shootdowns. It can load a > JIT with about only a local TLB flush on average, which should help > really high cpu systems some unknown amount. Thanks for pointing out the missing information. I don't have a benchmark that uses very big BPF programs, so the results I have don't show much benefit from fewer iTLB misses. Song