diff mbox series

[v2,bpf-next,14/20] libbpf: Recognize __arena global variables.

Message ID 20240209040608.98927-15-alexei.starovoitov@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series bpf: Introduce BPF arena.

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-25 fail Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-llvm-18 / test
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-9 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-8 fail Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-15 success Logs for x86_64-gcc / test
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-llvm-17 / test
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-llvm-18 / test
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
netdev/series_format fail Series longer than 15 patches (and no cover letter)
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/build_tools success Errors and warnings before: 1 this patch: 1
netdev/cc_maintainers warning 14 maintainers not CCed: quentin@isovalent.com jolsa@kernel.org john.fastabend@gmail.com nathan@kernel.org yonghong.song@linux.dev song@kernel.org martin.lau@linux.dev sdf@google.com morbo@google.com justinstitt@google.com kpsingh@kernel.org ndesaulniers@google.com llvm@lists.linux.dev haoluo@google.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch fail CHECK: No space is necessary after a cast; ERROR: code indent should use tabs where possible; WARNING: Prefer strscpy, strscpy_pad, or __nonstring over strncpy - see: https://github.com/KSPP/linux/issues/90; WARNING: break quoted strings at a space character; WARNING: line length of 82 exceeds 80 columns; WARNING: line length of 83 exceeds 80 columns; WARNING: line length of 84 exceeds 80 columns; WARNING: line length of 85 exceeds 80 columns; WARNING: line length of 88 exceeds 80 columns; WARNING: quoted string split across lines
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Alexei Starovoitov Feb. 9, 2024, 4:06 a.m. UTC
From: Alexei Starovoitov <ast@kernel.org>

LLVM automatically places __arena variables into the ".arena.1" ELF section.
When libbpf sees such a section it creates an internal 'struct bpf_map' LIBBPF_MAP_ARENA
that is connected to the actual BPF_MAP_TYPE_ARENA 'struct bpf_map'.
They share the same kernel-side bpf map and a single map_fd.
Both are emitted into the skeleton: the real arena with the name given by the bpf program
in SEC(".maps") and another with the "__arena_internal" name.
All global variables from the ".arena.1" section are accessible from user space
via skel->arena->name_of_var.

For bss/data/rodata the skeleton/libbpf perform the following sequence:
1. addr = mmap(MAP_ANONYMOUS)
2. user space optionally modifies global vars
3. map_fd = bpf_create_map()
4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
5. mmap(addr, MAP_FIXED, map_fd)
After step 5 user space sees the values it wrote at step 2 at the same addresses.

arena doesn't support update_map_elem. Hence skeleton/libbpf do:
1. addr = mmap(MAP_ANONYMOUS)
2. user space optionally modifies global vars
3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
5. memcpy(real_addr, addr) // this will fault-in and allocate pages
6. munmap(addr)

In the end the look and feel of global data vs __arena global data is the same from the bpf prog's pov.

Another complication is:
struct {
  __uint(type, BPF_MAP_TYPE_ARENA);
} arena SEC(".maps");

int __arena foo;
int bar;

  ptr1 = &foo;   // relocation against ".arena.1" section
  ptr2 = &arena; // relocation against ".maps" section
  ptr3 = &bar;   // relocation against ".bss" section

For the kernel, ptr1 and ptr2 point to the same arena's map_fd
while ptr3 points to a different global array's map_fd.
For the verifier:
ptr1->type == unknown_scalar
ptr2->type == const_ptr_to_map
ptr3->type == ptr_to_map_value

After the verifier and for the JIT all 3 pointers are normal ld_imm64 insns.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/bpf/bpftool/gen.c |  13 ++++-
 tools/lib/bpf/libbpf.c  | 102 +++++++++++++++++++++++++++++++++++-----
 2 files changed, 101 insertions(+), 14 deletions(-)

Comments

Eduard Zingerman Feb. 13, 2024, 12:34 a.m. UTC | #1
On Thu, 2024-02-08 at 20:06 -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> LLVM automatically places __arena variables into ".arena.1" ELF section.
> When libbpf sees such section it creates internal 'struct bpf_map' LIBBPF_MAP_ARENA
> that is connected to actual BPF_MAP_TYPE_ARENA 'struct bpf_map'.
> They share the same kernel's side bpf map and single map_fd.
> Both are emitted into skeleton. Real arena with the name given by bpf program
> in SEC(".maps") and another with "__arena_internal" name.
> All global variables from ".arena.1" section are accessible from user space
> via skel->arena->name_of_var.

[...]

I hit a strange bug when playing with this patch. Consider a simple example [0].
When the following BPF global variable:

    int __arena * __arena bar;

- is commented -- the test passes;
- is uncommented -- the test fails because global variable 'shared' is NULL.

Note: the second __arena is necessary to put 'bar' into the .arena.1 section.

[0] https://github.com/kernel-patches/bpf/commit/6d95c8557c25d01ef3f13e6aef2bda9ac2516484
Alexei Starovoitov Feb. 13, 2024, 12:44 a.m. UTC | #2
On Mon, Feb 12, 2024 at 4:34 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-08 at 20:06 -0800, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > LLVM automatically places __arena variables into ".arena.1" ELF section.
> > When libbpf sees such section it creates internal 'struct bpf_map' LIBBPF_MAP_ARENA
> > that is connected to actual BPF_MAP_TYPE_ARENA 'struct bpf_map'.
> > They share the same kernel's side bpf map and single map_fd.
> > Both are emitted into skeleton. Real arena with the name given by bpf program
> > in SEC(".maps") and another with "__arena_internal" name.
> > All global variables from ".arena.1" section are accessible from user space
> > via skel->arena->name_of_var.
>
> [...]
>
> I hit a strange bug when playing with patch. Consider a simple example [0].
> When the following BPF global variable:
>
>     int __arena * __arena bar;
>
> - is commented -- the test passes;
> - is uncommented -- in the test fails because global variable 'shared' is NULL.

Right. That's expected, because of __uint(max_entries, 1);
The test creates an arena of 1 page and that page is consumed
by the 'int __arena * __arena bar;' variable.
Of course, one variable doesn't take the whole page;
there could have been many arena global vars.
But that page is no longer available to bpf_arena_alloc_pages,
so it returns NULL.
Eduard Zingerman Feb. 13, 2024, 12:49 a.m. UTC | #3
On Mon, 2024-02-12 at 16:44 -0800, Alexei Starovoitov wrote:
> > I hit a strange bug when playing with patch. Consider a simple example [0].
> > When the following BPF global variable:
> > 
> >     int __arena * __arena bar;
> > 
> > - is commented -- the test passes;
> > - is uncommented -- in the test fails because global variable 'shared' is NULL.
> 
> Right. That's expected, because __uint(max_entries, 1);
> The test creates an area on 1 page and it's consumed
> by int __arena * __arena bar; variable.
> Of course, one variable doesn't take the whole page.
> There could have been many arena global vars.
> But that page is not available anymore to bpf_arena_alloc_pages,
> so it returns NULL.

My bad, thank you for explaining.
Alexei Starovoitov Feb. 13, 2024, 2:08 a.m. UTC | #4
On Mon, Feb 12, 2024 at 4:49 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Mon, 2024-02-12 at 16:44 -0800, Alexei Starovoitov wrote:
> > > I hit a strange bug when playing with patch. Consider a simple example [0].
> > > When the following BPF global variable:
> > >
> > >     int __arena * __arena bar;
> > >
> > > - is commented -- the test passes;
> > > - is uncommented -- in the test fails because global variable 'shared' is NULL.
> >
> > Right. That's expected, because __uint(max_entries, 1);
> > The test creates an area on 1 page and it's consumed
> > by int __arena * __arena bar; variable.
> > Of course, one variable doesn't take the whole page.
> > There could have been many arena global vars.
> > But that page is not available anymore to bpf_arena_alloc_pages,
> > so it returns NULL.
>
> My bad, thank you for explaining.

Since it was surprising behavior we can make libbpf
auto-extend max_entries by the number of pages necessary
for arena global vars, but that would be surprising too.

struct {
  __uint(type, BPF_MAP_TYPE_ARENA);
  __uint(map_flags, BPF_F_MMAPABLE);
  __ulong(map_extra, 2ull << 44);  // this is start of user VMA
  __uint(max_entries, 1000);       // this is length of user VMA in pages
} arena SEC(".maps");

If libbpf adds extra pages to max_entries, the user_vm_end shifts too
and libbpf would need to mmap() it with that size.
When all of this is hidden in libbpf it's fine, but it can still be a surprise
to see a different max_entries in map_info and 'bpftool map list'.
Not sure which way is more user friendly.
Eduard Zingerman Feb. 13, 2024, 12:48 p.m. UTC | #5
On Mon, 2024-02-12 at 18:08 -0800, Alexei Starovoitov wrote:
[...]

> Since it was a surprising behavior we can make libbpf
> to auto-extend max_entries with the number of pages necessary
> for arena global vars, but it will be surprising too.
> 
> struct {
>   __uint(type, BPF_MAP_TYPE_ARENA);
>   __uint(map_flags, BPF_F_MMAPABLE);
>   __ulong(map_extra, 2ull << 44);  // this is start of user VMA
>   __uint(max_entries, 1000);       // this is length of user VMA in pages
> } arena SEC(".maps");
> 
> if libbpf adds extra pages to max_entries the user_vm_end shifts too
> and libbpf would need to mmap() it with that size.
> When all is hidden in libbpf it's fine, but still can be a surprise
> to see a different max_entries in map_info and bpftool map list.
> Not sure which way is user friendlier.

Adjusting max_entries would be surprising indeed.
On the other hand, it would remove the error condition about
"Declared arena map size %zd is too small ...".
Probably either way is fine, as long as it is documented.
Don't have a strong opinion.
Eduard Zingerman Feb. 13, 2024, 11:11 p.m. UTC | #6
On Thu, 2024-02-08 at 20:06 -0800, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> LLVM automatically places __arena variables into ".arena.1" ELF section.
> When libbpf sees such section it creates internal 'struct bpf_map' LIBBPF_MAP_ARENA
> that is connected to actual BPF_MAP_TYPE_ARENA 'struct bpf_map'.
> They share the same kernel's side bpf map and single map_fd.
> Both are emitted into skeleton. Real arena with the name given by bpf program
> in SEC(".maps") and another with "__arena_internal" name.
> All global variables from ".arena.1" section are accessible from user space
> via skel->arena->name_of_var.
> 
> For bss/data/rodata the skeleton/libbpf perform the following sequence:
> 1. addr = mmap(MAP_ANONYMOUS)
> 2. user space optionally modifies global vars
> 3. map_fd = bpf_create_map()
> 4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
> 5. mmap(addr, MAP_FIXED, map_fd)
> after step 5 user spaces see the values it wrote at step 2 at the same addresses
> 
> arena doesn't support update_map_elem. Hence skeleton/libbpf do:
> 1. addr = mmap(MAP_ANONYMOUS)
> 2. user space optionally modifies global vars
> 3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
> 4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
> 5. memcpy(real_addr, addr) // this will fault-in and allocate pages
> 6. munmap(addr)
> 
> At the end look and feel of global data vs __arena global data is the same from bpf prog pov.

[...]

So, at first I thought that having two maps is a bit of a hack.
However, after trying to make it work with only one map I don't really
like that either :)

The patch looks good to me, have not spotted any logical issues.

I have two questions if you don't mind:

First is regarding initialization data.
In bpf_object__init_internal_map() the amount of bpf_map_mmap_sz(map)
bytes is mmaped and only data_sz bytes are copied,
then bpf_map_mmap_sz(map) bytes are copied in bpf_object__create_maps().
Is Linux/libc smart enough to skip action on pages which were mmaped but
never touched?

Second is regarding naming.
Currently only one arena is supported, and generated skel has a single '->arena' field.
Is there a plan to support multiple arenas at some point?
If so, should '->arena' field use the same name as arena map declared in program?
Andrii Nakryiko Feb. 13, 2024, 11:15 p.m. UTC | #7
On Thu, Feb 8, 2024 at 8:07 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> LLVM automatically places __arena variables into ".arena.1" ELF section.
> When libbpf sees such section it creates internal 'struct bpf_map' LIBBPF_MAP_ARENA
> that is connected to actual BPF_MAP_TYPE_ARENA 'struct bpf_map'.
> They share the same kernel's side bpf map and single map_fd.
> Both are emitted into skeleton. Real arena with the name given by bpf program
> in SEC(".maps") and another with "__arena_internal" name.
> All global variables from ".arena.1" section are accessible from user space
> via skel->arena->name_of_var.

This "real arena" vs "__arena_internal" seems like an unnecessary
complication. When processing .arena.1 ELF section, we search for
explicit BPF_MAP_TYPE_ARENA map, right? Great, at that point, we can
use that map and its map->mmapable pointer (we mmap() anonymous
memory to hold initial values of global variables). We copy init
values into map->mmapable on actual arena struct bpf_map. Then during
map creation (during load) we do a new mmap(map_fd), taking into
account map_extra. Then memcpy() from the original anonymous mmap into
this arena-linked mmap. Then discard the original mmap.

This way we don't have fake maps anymore, we initialize actual map (we
might need to just remember smaller init mmap_sz, which doesn't seem
like a big complication). WDYT?

BTW, I think bpf_object__load_skeleton() can re-assign skeleton's
arena data pointer, so user accessing skel->arena->var before and
after skeleton load will be getting correct pointer.

>
> For bss/data/rodata the skeleton/libbpf perform the following sequence:
> 1. addr = mmap(MAP_ANONYMOUS)
> 2. user space optionally modifies global vars
> 3. map_fd = bpf_create_map()
> 4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
> 5. mmap(addr, MAP_FIXED, map_fd)
> after step 5 user spaces see the values it wrote at step 2 at the same addresses
>
> arena doesn't support update_map_elem. Hence skeleton/libbpf do:
> 1. addr = mmap(MAP_ANONYMOUS)
> 2. user space optionally modifies global vars
> 3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
> 4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
> 5. memcpy(real_addr, addr) // this will fault-in and allocate pages
> 6. munmap(addr)
>
> At the end look and feel of global data vs __arena global data is the same from bpf prog pov.
>
> Another complication is:
> struct {
>   __uint(type, BPF_MAP_TYPE_ARENA);
> } arena SEC(".maps");
>
> int __arena foo;
> int bar;
>
>   ptr1 = &foo;   // relocation against ".arena.1" section
>   ptr2 = &arena; // relocation against ".maps" section
>   ptr3 = &bar;   // relocation against ".bss" section
>
> Fo the kernel ptr1 and ptr2 has point to the same arena's map_fd
> while ptr3 points to a different global array's map_fd.
> For the verifier:
> ptr1->type == unknown_scalar
> ptr2->type == const_ptr_to_map
> ptr3->type == ptr_to_map_value
>
> after the verifier and for JIT all 3 ptr-s are normal ld_imm64 insns.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  tools/bpf/bpftool/gen.c |  13 ++++-
>  tools/lib/bpf/libbpf.c  | 102 +++++++++++++++++++++++++++++++++++-----
>  2 files changed, 101 insertions(+), 14 deletions(-)
>

[...]

> @@ -1718,10 +1722,34 @@ static int
>  bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
>                               const char *real_name, int sec_idx, void *data, size_t data_sz)
>  {
> +       const long page_sz = sysconf(_SC_PAGE_SIZE);
> +       struct bpf_map *map, *arena = NULL;
>         struct bpf_map_def *def;
> -       struct bpf_map *map;
>         size_t mmap_sz;
> -       int err;
> +       int err, i;
> +
> +       if (type == LIBBPF_MAP_ARENA) {
> +               for (i = 0; i < obj->nr_maps; i++) {
> +                       map = &obj->maps[i];
> +                       if (map->def.type != BPF_MAP_TYPE_ARENA)
> +                               continue;
> +                       arena = map;
> +                       real_name = "__arena_internal";
> +                       mmap_sz = bpf_map_mmap_sz(map);
> +                       if (roundup(data_sz, page_sz) > mmap_sz) {
> +                               pr_warn("Declared arena map size %zd is too small to hold"
> +                                       "global __arena variables of size %zd\n",
> +                                       mmap_sz, data_sz);
> +                               return -E2BIG;
> +                       }
> +                       break;
> +               }
> +               if (!arena) {
> +                       pr_warn("To use global __arena variables the arena map should"
> +                               "be declared explicitly in SEC(\".maps\")\n");
> +                       return -ENOENT;
> +               }

so basically here we found arena, let's arena->mmapable =
mmap(MAP_ANONYMOUS) here, memcpy(arena->mmapable, data, data_sz) and
exit early, not doing bpf_object__add_map()?


> +       }
>
>         map = bpf_object__add_map(obj);
>         if (IS_ERR(map))
> @@ -1732,6 +1760,7 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
>         map->sec_offset = 0;
>         map->real_name = strdup(real_name);
>         map->name = internal_map_name(obj, real_name);
> +       map->arena = arena;
>         if (!map->real_name || !map->name) {
>                 zfree(&map->real_name);
>                 zfree(&map->name);

[...]

> @@ -13437,6 +13508,11 @@ int bpf_object__load_skeleton(struct bpf_object_skeleton *s)
>                         continue;
>                 }
>
> +               if (map->arena) {
> +                       *mmaped = map->arena->mmaped;
> +                       continue;
> +               }
> +

yep, I was going to suggest that we can fix up this pointer in
load_skeleton, nice


>                 if (map->def.map_flags & BPF_F_RDONLY_PROG)
>                         prot = PROT_READ;
>                 else
> --
> 2.34.1
>
Andrii Nakryiko Feb. 13, 2024, 11:17 p.m. UTC | #8
On Tue, Feb 13, 2024 at 3:11 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-08 at 20:06 -0800, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > LLVM automatically places __arena variables into ".arena.1" ELF section.
> > When libbpf sees such section it creates internal 'struct bpf_map' LIBBPF_MAP_ARENA
> > that is connected to actual BPF_MAP_TYPE_ARENA 'struct bpf_map'.
> > They share the same kernel's side bpf map and single map_fd.
> > Both are emitted into skeleton. Real arena with the name given by bpf program
> > in SEC(".maps") and another with "__arena_internal" name.
> > All global variables from ".arena.1" section are accessible from user space
> > via skel->arena->name_of_var.
> >
> > For bss/data/rodata the skeleton/libbpf perform the following sequence:
> > 1. addr = mmap(MAP_ANONYMOUS)
> > 2. user space optionally modifies global vars
> > 3. map_fd = bpf_create_map()
> > 4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
> > 5. mmap(addr, MAP_FIXED, map_fd)
> > after step 5 user spaces see the values it wrote at step 2 at the same addresses
> >
> > arena doesn't support update_map_elem. Hence skeleton/libbpf do:
> > 1. addr = mmap(MAP_ANONYMOUS)
> > 2. user space optionally modifies global vars
> > 3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
> > 4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
> > 5. memcpy(real_addr, addr) // this will fault-in and allocate pages
> > 6. munmap(addr)
> >
> > At the end look and feel of global data vs __arena global data is the same from bpf prog pov.
>
> [...]
>
> So, at first I thought that having two maps is a bit of a hack.

yep, that was my instinct as well

> However, after trying to make it work with only one map I don't really
> like that either :)

Can you elaborate? see my reply to Alexei, I wonder how you
thought about doing this?

>
> The patch looks good to me, have not spotted any logical issues.
>
> I have two questions if you don't mind:
>
> First is regarding initialization data.
> In bpf_object__init_internal_map() the amount of bpf_map_mmap_sz(map)
> bytes is mmaped and only data_sz bytes are copied,
> then bpf_map_mmap_sz(map) bytes are copied in bpf_object__create_maps().
> Is Linux/libc smart enough to skip action on pages which were mmaped but
> never touched?
>
> Second is regarding naming.
> Currently only one arena is supported, and generated skel has a single '->arena' field.
> Is there a plan to support multiple arenas at some point?
> If so, should '->arena' field use the same name as arena map declared in program?
>
Eduard Zingerman Feb. 13, 2024, 11:36 p.m. UTC | #9
On Tue, 2024-02-13 at 15:17 -0800, Andrii Nakryiko wrote:

[...]

> > So, at first I thought that having two maps is a bit of a hack.
> 
> yep, that was my instinct as well
> 
> > However, after trying to make it work with only one map I don't really
> > like that either :)
> 
> Can you elaborate? see my reply to Alexei, I wonder how did you think
> about doing this?

Relocations in the ELF file are against a new section: ".arena.1".
This works nicely with logic in bpf_program__record_reloc().
If single map is used, we effectively need to track two indexes for
the map section:
- one used for relocations against map variables themselves
  (named "generic map reference relocation" in the function code);
- one used for relocations against ".arena.1"
  (named "global data map relocation" in the function code).

This spooked me off:
- either bpf_object__init_internal_map() would have a specialized
  branch for arenas, as with current approach;
- or bpf_program__record_reloc() would have a specialized branch for arenas,
  as with one map approach.

Additionally, skel generation logic currently assumes that mmapable
bindings would be generated only for internal maps,
but that is probably not a big deal.
Andrii Nakryiko Feb. 14, 2024, 12:09 a.m. UTC | #10
On Tue, Feb 13, 2024 at 3:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2024-02-13 at 15:17 -0800, Andrii Nakryiko wrote:
>
> [...]
>
> > > So, at first I thought that having two maps is a bit of a hack.
> >
> > yep, that was my instinct as well
> >
> > > However, after trying to make it work with only one map I don't really
> > > like that either :)
> >
> > Can you elaborate? see my reply to Alexei, I wonder how did you think
> > about doing this?
>
> Relocations in the ELF file are against a new section: ".arena.1".
> This works nicely with logic in bpf_program__record_reloc().
> If single map is used, we effectively need to track two indexes for
> the map section:
> - one used for relocations against map variables themselves
>   (named "generic map reference relocation" in the function code);
> - one used for relocations against ".arena.1"
>   (named "global data map relocation" in the function code).
>
> This spooked me off:
> - either bpf_object__init_internal_map() would have a specialized
>   branch for arenas, as with current approach;
> - or bpf_program__record_reloc() would have a specialized branch for arenas,
>   as with one map approach.

Yes, relocations would know about .arena.1, but it's a pretty simple
check in a few places. We basically have arena *definition* sec_idx
(corresponding to SEC(".maps")) and arena *data* sec_idx. The latter
is what is recorded for global variables in .arena.1. We can remember
this arena data sec_idx in struct bpf_object once during ELF
processing, and then just special case it internally in a few places.

The "fake" bpf_map for __arena_internal is user-visible and requires
autocreate=false tricks, etc. I feel like it's a worse tradeoff from a
user API perspective than a few extra ARENA-specific internal checks
(which we already have a few anyways, ARENA is not completely
transparent internally anyways).


>
> Additionally, skel generation logic currently assumes that mmapable
> bindings would be generated only for internal maps,
> but that is probably not a big deal.

yeah, it's not, we will have STRUCT_OPS maps handled special anyways
(Kui-Feng posted an RFC already), so ARENA won't be the only one
special case
Eduard Zingerman Feb. 14, 2024, 12:16 a.m. UTC | #11
On Tue, 2024-02-13 at 16:09 -0800, Andrii Nakryiko wrote:
[...]

> The "fake" bpf_map for __arena_internal is user-visible and requires
> autocreate=false tricks, etc. I feel like it's a worse tradeoff from a
> user API perspective than a few extra ARENA-specific internal checks
> (which we already have a few anyways, ARENA is not completely
> transparent internally anyways).

By user-visible you mean when doing "bpf_object__for_each_map()", right?
Shouldn't users ignore bpf_map__is_internal() maps?
But I agree that having one map might be a bit cleaner.
Andrii Nakryiko Feb. 14, 2024, 12:29 a.m. UTC | #12
On Tue, Feb 13, 2024 at 4:16 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Tue, 2024-02-13 at 16:09 -0800, Andrii Nakryiko wrote:
> [...]
>
> > The "fake" bpf_map for __arena_internal is user-visible and requires
> > autocreate=false tricks, etc. I feel like it's a worse tradeoff from a
> > user API perspective than a few extra ARENA-specific internal checks
> > (which we already have a few anyways, ARENA is not completely
> > transparent internally anyways).
>
> By user-visible you mean when doing "bpf_object__for_each_map()", right?
> Shouldn't users ignore bpf_map__is_internal() maps?

no, not really, they are valid maps, and you can
bpf_map__set_value_size() on them (for example). Similarly for
bpf_map__initial_value(). And here it will be interesting that before
load you should use bpf_map__initial_value(__arena_internal) if you
want to tune something, and after load it will be
bpf_map__initial_value(real_arena).

While if we combine them, it actually will work more naturally.

> But I agree that having one map might be a bit cleaner.

+1
Alexei Starovoitov Feb. 14, 2024, 1:02 a.m. UTC | #13
On Tue, Feb 13, 2024 at 3:11 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-08 at 20:06 -0800, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > LLVM automatically places __arena variables into ".arena.1" ELF section.
> > When libbpf sees such section it creates internal 'struct bpf_map' LIBBPF_MAP_ARENA
> > that is connected to actual BPF_MAP_TYPE_ARENA 'struct bpf_map'.
> > They share the same kernel's side bpf map and single map_fd.
> > Both are emitted into skeleton. Real arena with the name given by bpf program
> > in SEC(".maps") and another with "__arena_internal" name.
> > All global variables from ".arena.1" section are accessible from user space
> > via skel->arena->name_of_var.
> >
> > For bss/data/rodata the skeleton/libbpf perform the following sequence:
> > 1. addr = mmap(MAP_ANONYMOUS)
> > 2. user space optionally modifies global vars
> > 3. map_fd = bpf_create_map()
> > 4. bpf_update_map_elem(map_fd, addr) // to store values into the kernel
> > 5. mmap(addr, MAP_FIXED, map_fd)
> > after step 5 user spaces see the values it wrote at step 2 at the same addresses
> >
> > arena doesn't support update_map_elem. Hence skeleton/libbpf do:
> > 1. addr = mmap(MAP_ANONYMOUS)
> > 2. user space optionally modifies global vars
> > 3. map_fd = bpf_create_map(MAP_TYPE_ARENA)
> > 4. real_addr = mmap(map->map_extra, MAP_SHARED | MAP_FIXED, map_fd)
> > 5. memcpy(real_addr, addr) // this will fault-in and allocate pages
> > 6. munmap(addr)
> >
> > At the end look and feel of global data vs __arena global data is the same from bpf prog pov.
>
> [...]
>
> So, at first I thought that having two maps is a bit of a hack.
> However, after trying to make it work with only one map I don't really
> like that either :)

My first attempt was with a single arena map, but it ended up with
hacks all over libbpf and bpftool to treat one map differently depending
on conditions.
Two maps simplified the code a lot.

> The patch looks good to me, have not spotted any logical issues.
>
> I have two questions if you don't mind:
>
> First is regarding initialization data.
> In bpf_object__init_internal_map() the amount of bpf_map_mmap_sz(map)
> bytes is mmaped and only data_sz bytes are copied,
> then bpf_map_mmap_sz(map) bytes are copied in bpf_object__create_maps().
> Is Linux/libc smart enough to skip action on pages which were mmaped but
> never touched?

The kernel gives zeroed-out pages to user space.
So it's ok to mmap a page, copy data_sz bytes into it
and later copy the full page from one addr to another.
No garbage is copied.
In this case there will be data by LLVM's construction of ".arena.1".
It looks to me that even .bss-like __arena vars have zeros in data
and non-zero data_sz.

>
> Second is regarding naming.
> Currently only one arena is supported, and generated skel has a single '->arena' field.
> Is there a plan to support multiple arenas at some point?
> If so, should '->arena' field use the same name as arena map declared in program?

I wanted to place all global arena vars into a default name "arena"
and let the skeleton use that name without worrying about what name
the bpf prog gave to the actual map.
Alexei Starovoitov Feb. 14, 2024, 1:24 a.m. UTC | #14
On Tue, Feb 13, 2024 at 4:09 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Feb 13, 2024 at 3:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> >
> > On Tue, 2024-02-13 at 15:17 -0800, Andrii Nakryiko wrote:
> >
> > [...]
> >
> > > > So, at first I thought that having two maps is a bit of a hack.
> > >
> > > yep, that was my instinct as well
> > >
> > > > However, after trying to make it work with only one map I don't really
> > > > like that either :)
> > >
> > > Can you elaborate? see my reply to Alexei, I wonder how did you think
> > > about doing this?
> >
> > Relocations in the ELF file are against a new section: ".arena.1".
> > This works nicely with logic in bpf_program__record_reloc().
> > If single map is used, we effectively need to track two indexes for
> > the map section:
> > - one used for relocations against map variables themselves
> >   (named "generic map reference relocation" in the function code);
> > - one used for relocations against ".arena.1"
> >   (named "global data map relocation" in the function code).
> >
> > This spooked me off:
> > - either bpf_object__init_internal_map() would have a specialized
> >   branch for arenas, as with current approach;
> > - or bpf_program__record_reloc() would have a specialized branch for arenas,
> >   as with one map approach.
>
> Yes, relocations would know about .arena.1, but it's a pretty simple
> check in a few places. We basically have arena *definition* sec_idx
> (corresponding to SEC(".maps")) and arena *data* sec_idx. The latter
> is what is recorded for global variables in .arena.1. We can remember
> this arena data sec_idx in struct bpf_object once during ELF
> processing, and then just special case it internally in a few places.

That was my first attempt, and bpf_program__record_reloc()
became a mess.
Currently it does the relo search either in internal maps
or in obj->efile.btf_maps_shndx.
Doing a double search wasn't nice.
And further, such a dual meaning of the 'struct bpf_map' object messes
with the assumptions of bpf_object__shndx_is_maps, bpf_object__shndx_is_data,
and the way libbpf treats map->libbpf_type everywhere.

bpf_map__is_internal() cannot really say true or false
for such a dual-use map.
Then skeleton gen gets ugly
and needs more public libbpf APIs to use in bpftool gen.
Just a mess.

> The "fake" bpf_map for __arena_internal is user-visible and requires
> autocreate=false tricks, etc. I feel like it's a worse tradeoff from a
> user API perspective than a few extra ARENA-specific internal checks
> (which we already have a few anyways, ARENA is not completely
> transparent internally anyways).

what do you mean 'user visible'?
I can add a filter to avoid generating a pointer for it in a skeleton.
Then it won't be any more visible than other bss/data fake maps.
The 2nd fake arena returns true out of bpf_map__is_internal.

The key comment in the patch:
                /* bpf_object will contain two arena maps:
                 * LIBBPF_MAP_ARENA & BPF_MAP_TYPE_ARENA
                 * and
                 * LIBBPF_MAP_UNSPEC & BPF_MAP_TYPE_ARENA.
                 * The former map->arena will point to latter.
                 */
Eduard Zingerman Feb. 14, 2024, 3:10 p.m. UTC | #15
On Tue, 2024-02-13 at 17:02 -0800, Alexei Starovoitov wrote:
[...]
> > First is regarding initialization data.
> > In bpf_object__init_internal_map() the amount of bpf_map_mmap_sz(map)
> > bytes is mmaped and only data_sz bytes are copied,
> > then bpf_map_mmap_sz(map) bytes are copied in bpf_object__create_maps().
> > Is Linux/libc smart enough to skip action on pages which were mmaped but
> > never touched?
> 
> The kernel gives zeroed-out pages to user space,
> so it's ok to mmap a page, copy data_sz bytes into it,
> and later copy the full page from one address to another.
> No garbage gets copied.
> In this case there will be data by LLVM's construction of ".arena.1".
> It looks to me that even .bss-like __arena vars have zeros in their data
> and a non-zero data_sz.

I was actually worried about the second memcpy increasing RSS unnecessarily,
but I missed that the internal map does:

  def->max_entries = roundup(data_sz, page_sz) / page_sz;

So that is not an issue, as bpf_map_mmap_sz() for the internal map is
proportional to data_sz, not the full arena size.
Sorry for the noise.
Andrii Nakryiko Feb. 14, 2024, 5:24 p.m. UTC | #16
On Tue, Feb 13, 2024 at 5:24 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 13, 2024 at 4:09 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Feb 13, 2024 at 3:37 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
> > >
> > > [...]
> >
> > Yes, relocations would know about .arena.1, but it's a pretty simple
> > check in a few places. We basically have arena *definition* sec_idx
> > (corresponding to SEC(".maps")) and arena *data* sec_idx. The latter
> > is what is recorded for global variables in .arena.1. We can remember
> > this arena data sec_idx in struct bpf_object once during ELF
> > processing, and then just special case it internally in a few places.
>
> That was my first attempt, and bpf_program__record_reloc()
> became a mess.
> Currently it does the relo search either in internal maps
> or in obj->efile.btf_maps_shndx.
> Doing a double search wasn't nice.
> And further, such a dual meaning of the 'struct bpf_map' object messes
> with the assumptions of bpf_object__shndx_is_maps, bpf_object__shndx_is_data,
> and the way libbpf treats map->libbpf_type everywhere.
>
> bpf_map__is_internal() cannot really say true or false
> for such a dual-use map.
> Then skeleton gen gets ugly
> and needs more public libbpf APIs to use in bpftool gen.
> Just a mess.

It might be easier for me to try implementing it the way I see it than
to discuss it over email. I'll give it a try today or tomorrow and get
back to you.

>
> > The "fake" bpf_map for __arena_internal is user-visible and requires
> > autocreate=false tricks, etc. I feel like it's a worse tradeoff from a
> > user API perspective than a few extra ARENA-specific internal checks
> > (which we already have a few anyways, ARENA is not completely
> > transparent internally anyways).
>
> what do you mean 'user visible'?

That __arena_internal (representing the .arena.1 data section) actually is
separate from actual ARENA map (represented by variable in .maps
section). And both have separate `struct bpf_map`, which you can look
up by name or through iterating all maps of bpf_object. And that you
can call getters/setters on __arena_internal, even though the only
thing that actually makes sense there is bpf_map__initial_value(),
which would just as much make sense on ARENA map itself.

> I can add a filter to avoid generating a pointer for it in a skeleton.
> Then it won't be any more visible than other bss/data fake maps.

bss/data are not fake maps, they have corresponding BPF map (ARRAY) in
the kernel. Which is different from __arena_internal. And even if we
hide it from skeleton, it's still there in bpf_object, as I mentioned
above.

Let me try implementing what I have in mind and see how bad it is.

> The 2nd fake arena returns true out of bpf_map__is_internal.
>
> The key comment in the patch:
>                 /* bpf_object will contain two arena maps:
>                  * LIBBPF_MAP_ARENA & BPF_MAP_TYPE_ARENA
>                  * and
>                  * LIBBPF_MAP_UNSPEC & BPF_MAP_TYPE_ARENA.
>                  * The former map->arena will point to latter.
>                  */

Yes, and I'd like to not have two arena maps because they are logically one.
Andrii Nakryiko Feb. 15, 2024, 11:22 p.m. UTC | #17
On Wed, Feb 14, 2024 at 9:24 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Feb 13, 2024 at 5:24 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Feb 13, 2024 at 4:09 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > [...]
> > >
> > > Yes, relocations would know about .arena.1, but it's a pretty simple
> > > check in a few places. We basically have arena *definition* sec_idx
> > > (corresponding to SEC(".maps")) and arena *data* sec_idx. The latter
> > > is what is recorded for global variables in .arena.1. We can remember
> > > this arena data sec_idx in struct bpf_object once during ELF
> > > processing, and then just special case it internally in a few places.
> >
> > That was my first attempt, and bpf_program__record_reloc()
> > became a mess.
> > Currently it does the relo search either in internal maps
> > or in obj->efile.btf_maps_shndx.
> > Doing a double search wasn't nice.
> > And further, such a dual meaning of the 'struct bpf_map' object messes
> > with the assumptions of bpf_object__shndx_is_maps, bpf_object__shndx_is_data,
> > and the way libbpf treats map->libbpf_type everywhere.
> >
> > bpf_map__is_internal() cannot really say true or false
> > for such a dual-use map.
> > Then skeleton gen gets ugly
> > and needs more public libbpf APIs to use in bpftool gen.
> > Just a mess.
>
> It might be easier for me to try implementing it the way I see it than
> to discuss it over email. I'll give it a try today or tomorrow and get
> back to you.
>
> [...]
>
> Yes, and I'd like to not have two arena maps because they are logically one.

Alright, I'm back. I pushed 3 patches on top of your patches into [0].
They are also available at [1], if that's more convenient. I'll paste
the main diff below; gmail will inevitably butcher the formatting, but
it's easier to discuss the code this way.

  [0] https://github.com/anakryiko/linux/commits/arena/
  [1] https://git.kernel.org/pub/scm/linux/kernel/git/andrii/bpf-next.git/log/?h=arena

First, as I was working on the code, I realized that the place where
we do mmap() after creating the ARENA map is different from where we
normally do post-creation steps, so I moved the code to keep all those
extra steps in one place. No changes in logic, but now we also don't
need to close map_fd and so on; I think it's better this way.

And so the main changes are below. There are definitely a few
ARENA-specific checks here and there, but I don't think it's that bad.
A bunch of the code is just undoing changes from the previous patch, so
once you incorporate these changes into your patches it will be even
smaller.

The main outcome is that we don't have a fake map as an independent
struct bpf_map and bpf_map__initial_value() logic works transparently.

We'll probably need similar special-casing for STRUCT_OPS maps that
Kui-Feng is adding, so ARENA won't be the only one.

The slightly annoying part is that special casing is necessary because
of the map->mmaped assumption that it has to be munmap()'ed and that
its size is calculated by bpf_map_mmap_sz() (I could have hacked
map->def.value_size for this, but that felt wrong). We could
generalize/fix that, but I chose not to do that just yet.


commit 2a7a90e06d02a4edb60cf92c19aee2b3f05d3cca
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Thu Feb 15 14:55:00 2024 -0800

    libbpf: remove fake __arena_internal map

    Unify actual ARENA map and fake __arena_internal map. .arena.1 ELF
    section isn't a stand-alone BPF map, so it shouldn't be represented as
    `struct bpf_map *` instance in libbpf APIs. Instead, use ELF section
    data as initial data image, which is exposed through skeleton and
    bpf_map__initial_value() to the user, if they need to tune it before the
    load phase. During load phase, this initial image is copied over into
    mmap()'ed region corresponding to ARENA, and discarded.

    Few small checks here and there had to be added to make sure this
    approach works with bpf_map__initial_value(), mostly due to hard-coded
    assumption that map->mmaped is set up with mmap() syscall and should be
    munmap()'ed. For ARENA, .arena.1 can be (much) smaller than maximum
    ARENA size, so this smaller data size has to be tracked separately.
    Given it is enforced that there is only one ARENA for entire bpf_object
    instance, we just keep it in a separate field. This can be generalized
    if necessary later.

    bpftool is adjusted a bit as well, because ARENA map is not reported as
    "internal" (it's not a great fit in this case), plus we need to take
    into account that ARENA map can exist without corresponding .arena.1 ELF
    section, so user-facing data section in skeleton is optional.

    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
index 273da2098231..6e17b95436de 100644
--- a/tools/bpf/bpftool/gen.c
+++ b/tools/bpf/bpftool/gen.c
@@ -82,7 +82,7 @@ static bool get_map_ident(const struct bpf_map *map, char *buf, size_t buf_sz)
     const char *name = bpf_map__name(map);
     int i, n;

-    if (!bpf_map__is_internal(map) || bpf_map__type(map) == BPF_MAP_TYPE_ARENA) {
+    if (!bpf_map__is_internal(map)) {
         snprintf(buf, buf_sz, "%s", name);
         return true;
     }
@@ -109,7 +109,7 @@ static bool get_datasec_ident(const char *sec_name, char *buf, size_t buf_sz)
     /* recognize hard coded LLVM section name */
     if (strcmp(sec_name, ".arena.1") == 0) {
         /* this is the name to use in skeleton */
-        strncpy(buf, "arena", buf_sz);
+        snprintf(buf, buf_sz, "arena");
         return true;
     }
     for  (i = 0, n = ARRAY_SIZE(pfxs); i < n; i++) {
@@ -242,14 +242,16 @@ static const struct btf_type *find_type_for_map(struct btf *btf, const char *map

 static bool is_mmapable_map(const struct bpf_map *map, char *buf, size_t sz)
 {
-    if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
-        return false;
+    size_t tmp_sz;

-    if (bpf_map__type(map) == BPF_MAP_TYPE_ARENA) {
-        strncpy(buf, "arena", sz);
+    if (bpf_map__type(map) == BPF_MAP_TYPE_ARENA && bpf_map__initial_value(map, &tmp_sz)) {
+        snprintf(buf, sz, "arena");
         return true;
     }

+    if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
+        return false;
+
     if (!get_map_ident(map, buf, sz))
         return false;

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5a53f1ed87f2..c72577bef439 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -506,7 +506,6 @@ enum libbpf_map_type {
     LIBBPF_MAP_BSS,
     LIBBPF_MAP_RODATA,
     LIBBPF_MAP_KCONFIG,
-    LIBBPF_MAP_ARENA,
 };

 struct bpf_map_def {
@@ -549,7 +548,6 @@ struct bpf_map {
     bool reused;
     bool autocreate;
     __u64 map_extra;
-    struct bpf_map *arena;
 };

 enum extern_type {
@@ -616,7 +614,6 @@ enum sec_type {
     SEC_BSS,
     SEC_DATA,
     SEC_RODATA,
-    SEC_ARENA,
 };

 struct elf_sec_desc {
@@ -634,6 +631,7 @@ struct elf_state {
     Elf_Data *symbols;
     Elf_Data *st_ops_data;
     Elf_Data *st_ops_link_data;
+    Elf_Data *arena_data;
     size_t shstrndx; /* section index for section name strings */
     size_t strtabidx;
     struct elf_sec_desc *secs;
@@ -644,6 +642,7 @@ struct elf_state {
     int symbols_shndx;
     int st_ops_shndx;
     int st_ops_link_shndx;
+    int arena_data_shndx;
 };

 struct usdt_manager;
@@ -703,6 +702,10 @@ struct bpf_object {

     struct usdt_manager *usdt_man;

+    struct bpf_map *arena_map;
+    void *arena_data;
+    size_t arena_data_sz;
+
     struct kern_feature_cache *feat_cache;
     char *token_path;
     int token_fd;
@@ -1340,6 +1343,7 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
     obj->efile.symbols = NULL;
     obj->efile.st_ops_data = NULL;
     obj->efile.st_ops_link_data = NULL;
+    obj->efile.arena_data = NULL;

     zfree(&obj->efile.secs);
     obj->efile.sec_cnt = 0;
@@ -1722,34 +1726,10 @@ static int
 bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
                   const char *real_name, int sec_idx, void *data, size_t data_sz)
 {
-    const long page_sz = sysconf(_SC_PAGE_SIZE);
-    struct bpf_map *map, *arena = NULL;
     struct bpf_map_def *def;
+    struct bpf_map *map;
     size_t mmap_sz;
-    int err, i;
-
-    if (type == LIBBPF_MAP_ARENA) {
-        for (i = 0; i < obj->nr_maps; i++) {
-            map = &obj->maps[i];
-            if (map->def.type != BPF_MAP_TYPE_ARENA)
-                continue;
-            arena = map;
-            real_name = "__arena_internal";
-                mmap_sz = bpf_map_mmap_sz(map);
-            if (roundup(data_sz, page_sz) > mmap_sz) {
-                pr_warn("Declared arena map size %zd is too small to hold"
-                    "global __arena variables of size %zd\n",
-                    mmap_sz, data_sz);
-                return -E2BIG;
-            }
-            break;
-        }
-        if (!arena) {
-            pr_warn("To use global __arena variables the arena map should"
-                "be declared explicitly in SEC(\".maps\")\n");
-            return -ENOENT;
-        }
-    }
+    int err;

     map = bpf_object__add_map(obj);
     if (IS_ERR(map))
@@ -1760,7 +1740,6 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
     map->sec_offset = 0;
     map->real_name = strdup(real_name);
     map->name = internal_map_name(obj, real_name);
-    map->arena = arena;
     if (!map->real_name || !map->name) {
         zfree(&map->real_name);
         zfree(&map->name);
@@ -1768,32 +1747,18 @@ bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
     }

     def = &map->def;
-    if (type == LIBBPF_MAP_ARENA) {
-        /* bpf_object will contain two arena maps:
-         * LIBBPF_MAP_ARENA & BPF_MAP_TYPE_ARENA
-         * and
-         * LIBBPF_MAP_UNSPEC & BPF_MAP_TYPE_ARENA.
-         * The former map->arena will point to latter.
-         */
-        def->type = BPF_MAP_TYPE_ARENA;
-        def->key_size = 0;
-        def->value_size = 0;
-        def->max_entries = roundup(data_sz, page_sz) / page_sz;
-        def->map_flags = BPF_F_MMAPABLE;
-    } else {
-        def->type = BPF_MAP_TYPE_ARRAY;
-        def->key_size = sizeof(int);
-        def->value_size = data_sz;
-        def->max_entries = 1;
-        def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG
-            ? BPF_F_RDONLY_PROG : 0;
+    def->type = BPF_MAP_TYPE_ARRAY;
+    def->key_size = sizeof(int);
+    def->value_size = data_sz;
+    def->max_entries = 1;
+    def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG
+        ? BPF_F_RDONLY_PROG : 0;

-        /* failures are fine because of maps like .rodata.str1.1 */
-        (void) map_fill_btf_type_info(obj, map);
+    /* failures are fine because of maps like .rodata.str1.1 */
+    (void) map_fill_btf_type_info(obj, map);

-        if (map_is_mmapable(obj, map))
-            def->map_flags |= BPF_F_MMAPABLE;
-    }
+    if (map_is_mmapable(obj, map))
+        def->map_flags |= BPF_F_MMAPABLE;

     pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n",
          map->name, map->sec_idx, map->sec_offset, def->map_flags);
@@ -1857,13 +1822,6 @@ static int bpf_object__init_global_data_maps(struct bpf_object *obj)
                                 NULL,
                                 sec_desc->data->d_size);
             break;
-        case SEC_ARENA:
-            sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
-            err = bpf_object__init_internal_map(obj, LIBBPF_MAP_ARENA,
-                                sec_name, sec_idx,
-                                sec_desc->data->d_buf,
-                                sec_desc->data->d_size);
-            break;
         default:
             /* skip */
             break;
@@ -2786,6 +2744,32 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
     return 0;
 }

+static int init_arena_map_data(struct bpf_object *obj, struct bpf_map *map,
+                   const char *sec_name, int sec_idx,
+                   void *data, size_t data_sz)
+{
+    const long page_sz = sysconf(_SC_PAGE_SIZE);
+    size_t mmap_sz;
+
+    mmap_sz = bpf_map_mmap_sz(obj->arena_map);
+    if (roundup(data_sz, page_sz) > mmap_sz) {
+        pr_warn("elf: sec '%s': declared ARENA map size (%zu) is too small to hold global __arena variables of size %zu\n",
+            sec_name, mmap_sz, data_sz);
+        return -E2BIG;
+    }
+
+    obj->arena_data = malloc(data_sz);
+    if (!obj->arena_data)
+        return -ENOMEM;
+    memcpy(obj->arena_data, data, data_sz);
+    obj->arena_data_sz = data_sz;
+
+    /* make bpf_map__init_value() work for ARENA maps */
+    map->mmaped = obj->arena_data;
+
+    return 0;
+}
+
 static int bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict,
                       const char *pin_root_path)
 {
@@ -2835,6 +2819,33 @@ static int bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict,
             return err;
     }

+    for (i = 0; i < obj->nr_maps; i++) {
+        struct bpf_map *map = &obj->maps[i];
+
+        if (map->def.type != BPF_MAP_TYPE_ARENA)
+            continue;
+
+        if (obj->arena_map) {
+            pr_warn("map '%s': only single ARENA map is supported (map '%s' is also ARENA)\n",
+                map->name, obj->arena_map->name);
+            return -EINVAL;
+        }
+        obj->arena_map = map;
+
+        if (obj->efile.arena_data) {
+            err = init_arena_map_data(obj, map, ARENA_SEC, obj->efile.arena_data_shndx,
+                          obj->efile.arena_data->d_buf,
+                          obj->efile.arena_data->d_size);
+            if (err)
+                return err;
+        }
+    }
+    if (obj->efile.arena_data && !obj->arena_map) {
+        pr_warn("elf: sec '%s': to use global __arena variables the ARENA map should be explicitly declared in SEC(\".maps\")\n",
+            ARENA_SEC);
+        return -ENOENT;
+    }
+
     return 0;
 }

@@ -3699,9 +3710,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
                 obj->efile.st_ops_link_data = data;
                 obj->efile.st_ops_link_shndx = idx;
             } else if (strcmp(name, ARENA_SEC) == 0) {
-                sec_desc->sec_type = SEC_ARENA;
-                sec_desc->shdr = sh;
-                sec_desc->data = data;
+                obj->efile.arena_data = data;
+                obj->efile.arena_data_shndx = idx;
             } else {
                 pr_info("elf: skipping unrecognized data section(%d) %s\n",
                     idx, name);
@@ -4204,7 +4214,6 @@ static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
     case SEC_BSS:
     case SEC_DATA:
     case SEC_RODATA:
-    case SEC_ARENA:
         return true;
     default:
         return false;
@@ -4230,8 +4239,6 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
         return LIBBPF_MAP_DATA;
     case SEC_RODATA:
         return LIBBPF_MAP_RODATA;
-    case SEC_ARENA:
-        return LIBBPF_MAP_ARENA;
     default:
         return LIBBPF_MAP_UNSPEC;
     }
@@ -4332,6 +4339,15 @@ static int bpf_program__record_reloc(struct bpf_program *prog,
     type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx);
     sym_sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, shdr_idx));

+    /* arena data relocation */
+    if (shdr_idx == obj->efile.arena_data_shndx) {
+        reloc_desc->type = RELO_DATA;
+        reloc_desc->insn_idx = insn_idx;
+        reloc_desc->map_idx = obj->arena_map - obj->maps;
+        reloc_desc->sym_off = sym->st_value;
+        return 0;
+    }
+
     /* generic map reference relocation */
     if (type == LIBBPF_MAP_UNSPEC) {
         if (!bpf_object__shndx_is_maps(obj, shdr_idx)) {
@@ -4385,7 +4401,7 @@ static int bpf_program__record_reloc(struct bpf_program *prog,

     reloc_desc->type = RELO_DATA;
     reloc_desc->insn_idx = insn_idx;
-    reloc_desc->map_idx = map->arena ? map->arena - obj->maps : map_idx;
+    reloc_desc->map_idx = map_idx;
     reloc_desc->sym_off = sym->st_value;
     return 0;
 }
@@ -4872,8 +4888,6 @@ bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
             bpf_gen__map_freeze(obj->gen_loader, map - obj->maps);
         return 0;
     }
-    if (map_type == LIBBPF_MAP_ARENA)
-        return 0;

     err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0);
     if (err) {
@@ -5166,15 +5180,6 @@ bpf_object__create_maps(struct bpf_object *obj)
         if (bpf_map__is_internal(map) && !kernel_supports(obj, FEAT_GLOBAL_DATA))
             map->autocreate = false;

-        if (map->libbpf_type == LIBBPF_MAP_ARENA) {
-            size_t len = bpf_map_mmap_sz(map);
-
-            memcpy(map->arena->mmaped, map->mmaped, len);
-            map->autocreate = false;
-            munmap(map->mmaped, len);
-            map->mmaped = NULL;
-        }
-
         if (!map->autocreate) {
             pr_debug("map '%s': skipped auto-creating...\n", map->name);
             continue;
@@ -5229,6 +5234,10 @@ bpf_object__create_maps(struct bpf_object *obj)
                         map->name, err);
                     return err;
                 }
+                if (obj->arena_data) {
+                    memcpy(map->mmaped, obj->arena_data, obj->arena_data_sz);
+                    zfree(&obj->arena_data);
+                }
             }
             if (map->init_slots_sz && map->def.type != BPF_MAP_TYPE_PROG_ARRAY) {
                 err = init_map_in_map_slots(obj, map);
@@ -8716,13 +8725,9 @@ static void bpf_map__destroy(struct bpf_map *map)
     zfree(&map->init_slots);
     map->init_slots_sz = 0;

-    if (map->mmaped) {
-        size_t mmap_sz;
-
-        mmap_sz = bpf_map_mmap_sz(map);
-        munmap(map->mmaped, mmap_sz);
-        map->mmaped = NULL;
-    }
+    if (map->mmaped && map->mmaped != map->obj->arena_data)
+        munmap(map->mmaped, bpf_map_mmap_sz(map));
+    map->mmaped = NULL;

     if (map->st_ops) {
         zfree(&map->st_ops->data);
@@ -8782,6 +8787,8 @@ void bpf_object__close(struct bpf_object *obj)
     if (obj->token_fd > 0)
         close(obj->token_fd);

+    zfree(&obj->arena_data);
+
     free(obj);
 }

@@ -9803,8 +9810,6 @@ static bool map_uses_real_name(const struct bpf_map *map)
         return true;
     if (map->libbpf_type == LIBBPF_MAP_RODATA && strcmp(map->real_name, RODATA_SEC) != 0)
         return true;
-    if (map->libbpf_type == LIBBPF_MAP_ARENA)
-        return true;
     return false;
 }

@@ -10006,22 +10011,35 @@ __u32 bpf_map__btf_value_type_id(const struct bpf_map *map)
 int bpf_map__set_initial_value(struct bpf_map *map,
                    const void *data, size_t size)
 {
+    size_t actual_sz;
+
     if (map->obj->loaded || map->reused)
         return libbpf_err(-EBUSY);

-    if (!map->mmaped || map->libbpf_type == LIBBPF_MAP_KCONFIG ||
-        size != map->def.value_size)
+    if (!map->mmaped || map->libbpf_type == LIBBPF_MAP_KCONFIG)
+        return libbpf_err(-EINVAL);
+
+    if (map->def.type == BPF_MAP_TYPE_ARENA)
+        actual_sz = map->obj->arena_data_sz;
+    else
+        actual_sz = map->def.value_size;
+    if (size != actual_sz)
         return libbpf_err(-EINVAL);

     memcpy(map->mmaped, data, size);
     return 0;
 }

-void *bpf_map__initial_value(struct bpf_map *map, size_t *psize)
+void *bpf_map__initial_value(const struct bpf_map *map, size_t *psize)
 {
     if (!map->mmaped)
         return NULL;
-    *psize = map->def.value_size;
+
+    if (map->def.type == BPF_MAP_TYPE_ARENA)
+        *psize = map->obj->arena_data_sz;
+    else
+        *psize = map->def.value_size;
+
     return map->mmaped;
 }

@@ -13510,8 +13528,8 @@ int bpf_object__load_skeleton(struct bpf_object_skeleton *s)
             continue;
         }

-        if (map->arena) {
-            *mmaped = map->arena->mmaped;
+        if (map->def.type == BPF_MAP_TYPE_ARENA) {
+            *mmaped = map->mmaped;
             continue;
         }

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 5723cbbfcc41..7b510761f545 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -1014,7 +1014,7 @@ LIBBPF_API int bpf_map__set_map_extra(struct bpf_map *map, __u64 map_extra);

 LIBBPF_API int bpf_map__set_initial_value(struct bpf_map *map,
                       const void *data, size_t size);
-LIBBPF_API void *bpf_map__initial_value(struct bpf_map *map, size_t *psize);
+LIBBPF_API void *bpf_map__initial_value(const struct bpf_map *map, size_t *psize);

 /**
  * @brief **bpf_map__is_internal()** tells the caller whether or not the
Alexei Starovoitov Feb. 16, 2024, 2:45 a.m. UTC | #18
On Thu, Feb 15, 2024 at 3:22 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
>  {
> @@ -2835,6 +2819,33 @@ static int
> bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict,
>              return err;
>      }
>
> +    for (i = 0; i < obj->nr_maps; i++) {
> +        struct bpf_map *map = &obj->maps[i];
> +
> +        if (map->def.type != BPF_MAP_TYPE_ARENA)
> +            continue;
> +
> +        if (obj->arena_map) {
> +            pr_warn("map '%s': only single ARENA map is supported
> (map '%s' is also ARENA)\n",
> +                map->name, obj->arena_map->name);
> +            return -EINVAL;
> +        }
> +        obj->arena_map = map;
> +
> +        if (obj->efile.arena_data) {
> +            err = init_arena_map_data(obj, map, ARENA_SEC,
> obj->efile.arena_data_shndx,
> +                          obj->efile.arena_data->d_buf,
> +                          obj->efile.arena_data->d_size);
> +            if (err)
> +                return err;
> +        }
> +    }
> +    if (obj->efile.arena_data && !obj->arena_map) {
> +        pr_warn("elf: sec '%s': to use global __arena variables the
> ARENA map should be explicitly declared in SEC(\".maps\")\n",
> +            ARENA_SEC);
> +        return -ENOENT;
> +    }
> +
>      return 0;
>  }
>
> @@ -3699,9 +3710,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
>                  obj->efile.st_ops_link_data = data;
>                  obj->efile.st_ops_link_shndx = idx;
>              } else if (strcmp(name, ARENA_SEC) == 0) {
> -                sec_desc->sec_type = SEC_ARENA;
> -                sec_desc->shdr = sh;
> -                sec_desc->data = data;
> +                obj->efile.arena_data = data;
> +                obj->efile.arena_data_shndx = idx;

I see. So these two are sort-of main tricks.
Special case ARENA_SEC like ".maps" and then look for this
obj level map in the right spots.
The special case around bpf_map__[set_]initial_value kind of breaks
the layering with:
if (map->def.type == BPF_MAP_TYPE_ARENA)
  actual_sz = map->obj->arena_data_sz;
but no big deal.

How do you want me to squash the patches?
Keep "rename is_internal_mmapable_map into is_mmapable_map" patch as-is
and then squash mine and your 2nd and 3rd?
Andrii Nakryiko Feb. 16, 2024, 4:51 a.m. UTC | #19
On Thu, Feb 15, 2024 at 6:45 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Feb 15, 2024 at 3:22 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> >  {
> > @@ -2835,6 +2819,33 @@ static int
> > bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict,
> >              return err;
> >      }
> >
> > +    for (i = 0; i < obj->nr_maps; i++) {
> > +        struct bpf_map *map = &obj->maps[i];
> > +
> > +        if (map->def.type != BPF_MAP_TYPE_ARENA)
> > +            continue;
> > +
> > +        if (obj->arena_map) {
> > +            pr_warn("map '%s': only single ARENA map is supported
> > (map '%s' is also ARENA)\n",
> > +                map->name, obj->arena_map->name);
> > +            return -EINVAL;
> > +        }
> > +        obj->arena_map = map;
> > +
> > +        if (obj->efile.arena_data) {
> > +            err = init_arena_map_data(obj, map, ARENA_SEC,
> > obj->efile.arena_data_shndx,
> > +                          obj->efile.arena_data->d_buf,
> > +                          obj->efile.arena_data->d_size);
> > +            if (err)
> > +                return err;
> > +        }
> > +    }
> > +    if (obj->efile.arena_data && !obj->arena_map) {
> > +        pr_warn("elf: sec '%s': to use global __arena variables the
> > ARENA map should be explicitly declared in SEC(\".maps\")\n",
> > +            ARENA_SEC);
> > +        return -ENOENT;
> > +    }
> > +
> >      return 0;
> >  }
> >
> > @@ -3699,9 +3710,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
> >                  obj->efile.st_ops_link_data = data;
> >                  obj->efile.st_ops_link_shndx = idx;
> >              } else if (strcmp(name, ARENA_SEC) == 0) {
> > -                sec_desc->sec_type = SEC_ARENA;
> > -                sec_desc->shdr = sh;
> > -                sec_desc->data = data;
> > +                obj->efile.arena_data = data;
> > +                obj->efile.arena_data_shndx = idx;
>
> I see. So these two are sort-of main tricks.
> Special case ARENA_SEC like ".maps" and then look for this
> obj level map in the right spots.

yep

> The special case around bpf_map__[set_]initial_value kind of breaks
> the layering with:
> if (map->def.type == BPF_MAP_TYPE_ARENA)
>   actual_sz = map->obj->arena_data_sz;
> but no big deal.
>

true, and struct_ops will be another special case, so we might want to
think about generalizing that a bit, but that's a separate thing we
can handle later on

> How do you want me to squash the patches?
> Keep "rename is_internal_mmapable_map into is_mmapable_map" patch as-is

yep

> and then squash mine and your 2nd and 3rd?

I think `libbpf: move post-creation steps for ARENA map` should be
squashed into your `libbpf: Add support for bpf_arena.` which
introduces ARENA map by itself. And then `libbpf: Recognize __arena
global varaibles.` and `libbpf: remove fake __arena_internal map` can
be squashed together as well.
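For readers following along, the user-facing shape under discussion — an explicitly declared arena map plus `__arena` globals that LLVM drops into the hard-coded `.arena.1` section — looks roughly like this. A sketch only, not buildable standalone: it assumes `<vmlinux.h>`, `<bpf/bpf_helpers.h>`, and an address-space macro along the lines of `#define __arena __attribute__((address_space(1)))` as used by the selftests in this series; the `shared_counter` name is illustrative.

```c
struct {
	__uint(type, BPF_MAP_TYPE_ARENA);
	__uint(map_flags, BPF_F_MMAPABLE);
	__uint(max_entries, 10);	/* arena size in pages */
} arena SEC(".maps");			/* must be declared explicitly */

/* LLVM places __arena globals into the ".arena.1" section,
 * which libbpf folds into the arena map declared above. */
int __arena shared_counter;

SEC("syscall")
int touch_arena(void *ctx)
{
	shared_counter++;
	return 0;
}
```

With the bpftool change in the patch below, the generated skeleton names this datasec "arena", so userspace would then reach the same pages via something like `skel->arena->shared_counter` after load.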
diff mbox series

Patch

diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c
index a9334c57e859..74fabbdbad2b 100644
--- a/tools/bpf/bpftool/gen.c
+++ b/tools/bpf/bpftool/gen.c
@@ -82,7 +82,7 @@  static bool get_map_ident(const struct bpf_map *map, char *buf, size_t buf_sz)
 	const char *name = bpf_map__name(map);
 	int i, n;
 
-	if (!bpf_map__is_internal(map)) {
+	if (!bpf_map__is_internal(map) || bpf_map__type(map) == BPF_MAP_TYPE_ARENA) {
 		snprintf(buf, buf_sz, "%s", name);
 		return true;
 	}
@@ -106,6 +106,12 @@  static bool get_datasec_ident(const char *sec_name, char *buf, size_t buf_sz)
 	static const char *pfxs[] = { ".data", ".rodata", ".bss", ".kconfig" };
 	int i, n;
 
+	/* recognize hard coded LLVM section name */
+	if (strcmp(sec_name, ".arena.1") == 0) {
+		/* this is the name to use in skeleton */
+		strncpy(buf, "arena", buf_sz);
+		return true;
+	}
 	for  (i = 0, n = ARRAY_SIZE(pfxs); i < n; i++) {
 		const char *pfx = pfxs[i];
 
@@ -239,6 +245,11 @@  static bool is_internal_mmapable_map(const struct bpf_map *map, char *buf, size_
 	if (!bpf_map__is_internal(map) || !(bpf_map__map_flags(map) & BPF_F_MMAPABLE))
 		return false;
 
+	if (bpf_map__type(map) == BPF_MAP_TYPE_ARENA) {
+		strncpy(buf, "arena", sz);
+		return true;
+	}
+
 	if (!get_map_ident(map, buf, sz))
 		return false;
 
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f8158e250327..d5364280a06c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -498,6 +498,7 @@  struct bpf_struct_ops {
 #define KSYMS_SEC ".ksyms"
 #define STRUCT_OPS_SEC ".struct_ops"
 #define STRUCT_OPS_LINK_SEC ".struct_ops.link"
+#define ARENA_SEC ".arena.1"
 
 enum libbpf_map_type {
 	LIBBPF_MAP_UNSPEC,
@@ -505,6 +506,7 @@  enum libbpf_map_type {
 	LIBBPF_MAP_BSS,
 	LIBBPF_MAP_RODATA,
 	LIBBPF_MAP_KCONFIG,
+	LIBBPF_MAP_ARENA,
 };
 
 struct bpf_map_def {
@@ -547,6 +549,7 @@  struct bpf_map {
 	bool reused;
 	bool autocreate;
 	__u64 map_extra;
+	struct bpf_map *arena;
 };
 
 enum extern_type {
@@ -613,6 +616,7 @@  enum sec_type {
 	SEC_BSS,
 	SEC_DATA,
 	SEC_RODATA,
+	SEC_ARENA,
 };
 
 struct elf_sec_desc {
@@ -1718,10 +1722,34 @@  static int
 bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
 			      const char *real_name, int sec_idx, void *data, size_t data_sz)
 {
+	const long page_sz = sysconf(_SC_PAGE_SIZE);
+	struct bpf_map *map, *arena = NULL;
 	struct bpf_map_def *def;
-	struct bpf_map *map;
 	size_t mmap_sz;
-	int err;
+	int err, i;
+
+	if (type == LIBBPF_MAP_ARENA) {
+		for (i = 0; i < obj->nr_maps; i++) {
+			map = &obj->maps[i];
+			if (map->def.type != BPF_MAP_TYPE_ARENA)
+				continue;
+			arena = map;
+			real_name = "__arena_internal";
+			mmap_sz = bpf_map_mmap_sz(map);
+			if (roundup(data_sz, page_sz) > mmap_sz) {
+				pr_warn("Declared arena map size %zd is too small to hold "
+					"global __arena variables of size %zd\n",
+					mmap_sz, data_sz);
+				return -E2BIG;
+			}
+			break;
+		}
+		if (!arena) {
+			pr_warn("To use global __arena variables the arena map should "
+				"be declared explicitly in SEC(\".maps\")\n");
+			return -ENOENT;
+		}
+	}
 
 	map = bpf_object__add_map(obj);
 	if (IS_ERR(map))
@@ -1732,6 +1760,7 @@  bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
 	map->sec_offset = 0;
 	map->real_name = strdup(real_name);
 	map->name = internal_map_name(obj, real_name);
+	map->arena = arena;
 	if (!map->real_name || !map->name) {
 		zfree(&map->real_name);
 		zfree(&map->name);
@@ -1739,18 +1768,32 @@  bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type,
 	}
 
 	def = &map->def;
-	def->type = BPF_MAP_TYPE_ARRAY;
-	def->key_size = sizeof(int);
-	def->value_size = data_sz;
-	def->max_entries = 1;
-	def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG
-			 ? BPF_F_RDONLY_PROG : 0;
+	if (type == LIBBPF_MAP_ARENA) {
+		/* bpf_object will contain two arena maps:
+		 * LIBBPF_MAP_ARENA & BPF_MAP_TYPE_ARENA
+		 * and
+		 * LIBBPF_MAP_UNSPEC & BPF_MAP_TYPE_ARENA.
+		 * The former map->arena will point to latter.
+		 */
+		def->type = BPF_MAP_TYPE_ARENA;
+		def->key_size = 0;
+		def->value_size = 0;
+		def->max_entries = roundup(data_sz, page_sz) / page_sz;
+		def->map_flags = BPF_F_MMAPABLE;
+	} else {
+		def->type = BPF_MAP_TYPE_ARRAY;
+		def->key_size = sizeof(int);
+		def->value_size = data_sz;
+		def->max_entries = 1;
+		def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG
+			? BPF_F_RDONLY_PROG : 0;
 
-	/* failures are fine because of maps like .rodata.str1.1 */
-	(void) map_fill_btf_type_info(obj, map);
+		/* failures are fine because of maps like .rodata.str1.1 */
+		(void) map_fill_btf_type_info(obj, map);
 
-	if (map_is_mmapable(obj, map))
-		def->map_flags |= BPF_F_MMAPABLE;
+		if (map_is_mmapable(obj, map))
+			def->map_flags |= BPF_F_MMAPABLE;
+	}
 
 	pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n",
 		 map->name, map->sec_idx, map->sec_offset, def->map_flags);
@@ -1814,6 +1857,13 @@  static int bpf_object__init_global_data_maps(struct bpf_object *obj)
 							    NULL,
 							    sec_desc->data->d_size);
 			break;
+		case SEC_ARENA:
+			sec_name = elf_sec_name(obj, elf_sec_by_idx(obj, sec_idx));
+			err = bpf_object__init_internal_map(obj, LIBBPF_MAP_ARENA,
+							    sec_name, sec_idx,
+							    sec_desc->data->d_buf,
+							    sec_desc->data->d_size);
+			break;
 		default:
 			/* skip */
 			break;
@@ -3646,6 +3696,10 @@  static int bpf_object__elf_collect(struct bpf_object *obj)
 			} else if (strcmp(name, STRUCT_OPS_LINK_SEC) == 0) {
 				obj->efile.st_ops_link_data = data;
 				obj->efile.st_ops_link_shndx = idx;
+			} else if (strcmp(name, ARENA_SEC) == 0) {
+				sec_desc->sec_type = SEC_ARENA;
+				sec_desc->shdr = sh;
+				sec_desc->data = data;
 			} else {
 				pr_info("elf: skipping unrecognized data section(%d) %s\n",
 					idx, name);
@@ -4148,6 +4202,7 @@  static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
 	case SEC_BSS:
 	case SEC_DATA:
 	case SEC_RODATA:
+	case SEC_ARENA:
 		return true;
 	default:
 		return false;
@@ -4173,6 +4228,8 @@  bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
 		return LIBBPF_MAP_DATA;
 	case SEC_RODATA:
 		return LIBBPF_MAP_RODATA;
+	case SEC_ARENA:
+		return LIBBPF_MAP_ARENA;
 	default:
 		return LIBBPF_MAP_UNSPEC;
 	}
@@ -4326,7 +4383,7 @@  static int bpf_program__record_reloc(struct bpf_program *prog,
 
 	reloc_desc->type = RELO_DATA;
 	reloc_desc->insn_idx = insn_idx;
-	reloc_desc->map_idx = map_idx;
+	reloc_desc->map_idx = map->arena ? map->arena - obj->maps : map_idx;
 	reloc_desc->sym_off = sym->st_value;
 	return 0;
 }
@@ -4813,6 +4870,9 @@  bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
 			bpf_gen__map_freeze(obj->gen_loader, map - obj->maps);
 		return 0;
 	}
+	if (map_type == LIBBPF_MAP_ARENA)
+		return 0;
+
 	err = bpf_map_update_elem(map->fd, &zero, map->mmaped, 0);
 	if (err) {
 		err = -errno;
@@ -5119,6 +5179,15 @@  bpf_object__create_maps(struct bpf_object *obj)
 		if (bpf_map__is_internal(map) && !kernel_supports(obj, FEAT_GLOBAL_DATA))
 			map->autocreate = false;
 
+		if (map->libbpf_type == LIBBPF_MAP_ARENA) {
+			size_t len = bpf_map_mmap_sz(map);
+
+			memcpy(map->arena->mmaped, map->mmaped, len);
+			map->autocreate = false;
+			munmap(map->mmaped, len);
+			map->mmaped = NULL;
+		}
+
 		if (!map->autocreate) {
 			pr_debug("map '%s': skipped auto-creating...\n", map->name);
 			continue;
@@ -9735,6 +9804,8 @@  static bool map_uses_real_name(const struct bpf_map *map)
 		return true;
 	if (map->libbpf_type == LIBBPF_MAP_RODATA && strcmp(map->real_name, RODATA_SEC) != 0)
 		return true;
+	if (map->libbpf_type == LIBBPF_MAP_ARENA)
+		return true;
 	return false;
 }
 
@@ -13437,6 +13508,11 @@  int bpf_object__load_skeleton(struct bpf_object_skeleton *s)
 			continue;
 		}
 
+		if (map->arena) {
+			*mmaped = map->arena->mmaped;
+			continue;
+		}
+
 		if (map->def.map_flags & BPF_F_RDONLY_PROG)
 			prot = PROT_READ;
 		else