[bpf-next,0/7] bpf, mm: bpf memory usage

Message ID 20230202014158.19616-1-laoar.shao@gmail.com (mailing list archive)

Message

Yafang Shao Feb. 2, 2023, 1:41 a.m. UTC
Currently we can't get bpf memory usage reliably. bpftool now shows the
bpf memory footprint, which is different from the bpf memory usage. In
some cases the difference between the footprint shown in bpftool and the
memory actually allocated by bpf can be quite large, for example:

- non-preallocated bpf map
  The memory usage of a non-preallocated bpf map changes dynamically. The
  allocated element count can be anywhere from 0 to the max entries, but
  the memory footprint in bpftool only shows a fixed number.
- bpf metadata consumes more memory than bpf elements
  In some corner cases, the bpf metadata can consume a lot more memory
  than the bpf elements do. For example, this can happen when the element
  size is quite small.
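
To make the two cases above concrete, here is a minimal userspace sketch
of the gap between a fixed footprint and the actual usage. It is a
simplified model, not the formula bpftool or the kernel uses: the struct,
the function names and the per-element overhead value are all assumptions
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to a multiple of 8, as a simple stand-in for alignment. */
static uint64_t round_up8(uint64_t n)
{
	return (n + 7) & ~(uint64_t)7;
}

/* Hypothetical description of a hash map; not a kernel structure. */
struct toy_map_desc {
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t per_elem_overhead; /* assumed metadata bytes per element */
};

/* Fixed footprint: depends only on the declared geometry of the map. */
static uint64_t toy_footprint(const struct toy_map_desc *m)
{
	return (uint64_t)m->max_entries *
	       (round_up8(m->key_size) + round_up8(m->value_size));
}

/* Usage of a non-preallocated map: grows with the live element count. */
static uint64_t toy_usage(const struct toy_map_desc *m, uint32_t nr_elems)
{
	assert(nr_elems <= m->max_entries);
	return (uint64_t)nr_elems *
	       (round_up8(m->key_size) + round_up8(m->value_size) +
		m->per_elem_overhead);
}
```

With 4-byte keys and values and an assumed 48 bytes of metadata per
element, the model reports the same footprint whether the map is empty or
full, while the usage ranges from zero to several times the footprint.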

We need a way to get the bpf memory usage, especially as more and more
bpf programs will be running in production environments, so the bpf
memory usage is not trivial.

This patchset introduces a new map op, ->map_mem_usage, to get the memory
usage. In this op, the memory usage is derived from the pointers that
have already been allocated by a bpf map. To keep the code simple, we
ignore some small pointers, as their size is quite small compared with
the total usage.
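
As a rough illustration of the idea of a per-map-type callback, the
following userspace sketch models an ops table with a map_mem_usage
entry. The callback signature, the toy_* names and the fallback to 0 are
assumptions for this sketch only; the actual definition is in patch #5.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_map;

/* Hypothetical ops table; the real callback signature may differ. */
struct toy_map_ops {
	uint64_t (*map_mem_usage)(const struct toy_map *map);
};

struct toy_map {
	const struct toy_map_ops *ops;
	uint64_t buckets_bytes; /* size of the bucket array allocation */
	uint64_t elems_bytes;   /* total size of the element allocations */
};

/* A hash-table-like implementation sums the allocations it owns. */
static uint64_t toy_htab_mem_usage(const struct toy_map *map)
{
	return map->buckets_bytes + map->elems_bytes;
}

static const struct toy_map_ops toy_htab_ops = {
	.map_mem_usage = toy_htab_mem_usage,
};

/* Generic caller: report 0 when a map type lacks the callback. */
static uint64_t toy_map_mem_usage(const struct toy_map *map)
{
	if (map->ops && map->ops->map_mem_usage)
		return map->ops->map_mem_usage(map);
	return 0;
}
```

Keeping the accounting behind a callback lets each map type decide which
of its own allocations to count, which matches the plan to extend the
support map type by map type.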

In order to get the memory size from the pointers, some generic mm helpers
are introduced first, for example percpu_size(), vsize() and kvsize().
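
The general technique of recovering an allocation size from a pointer can
be sketched in userspace with a toy allocator that records the size in a
header in front of each block. This is only an analogy: it does not show
how percpu_size(), vsize() or kvsize() are implemented, and every toy_*
name and the power-of-two size classes are assumptions.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_ALLOC ((size_t)1 << 30)

/* Header stored immediately before each block handed to the caller. */
union toy_hdr {
	size_t size;       /* bytes actually reserved for the payload */
	max_align_t align; /* keeps the payload suitably aligned */
};

/* Round a request up to an assumed power-of-two size class. */
static size_t toy_size_class(size_t n)
{
	size_t c = 8;

	while (c < n)
		c <<= 1;
	return c;
}

static void *toy_alloc(size_t n)
{
	union toy_hdr *h;
	size_t cls;

	if (n == 0 || n > TOY_MAX_ALLOC)
		return NULL;
	cls = toy_size_class(n);
	h = malloc(sizeof(*h) + cls);
	if (!h)
		return NULL;
	h->size = cls;
	return h + 1;
}

/* Given only the pointer, report how much memory backs it. */
static size_t toy_size(const void *ptr)
{
	if (!ptr)
		return 0;
	return ((const union toy_hdr *)ptr - 1)->size;
}

static void toy_free(void *ptr)
{
	if (ptr)
		free((union toy_hdr *)ptr - 1);
}
```

Note that the reported size is the size reserved by the allocator, which
can be larger than what the caller asked for. That is the quantity a
usage report needs.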

This patchset only implements the bpf memory usage for hashtab. I will
extend it to other maps and bpf progs (bpf progs can dynamically allocate
memory via bpf_obj_new()) in the future.

The detailed result can be found in patch #7.

Patch #1~#4: Generic mm helpers
Patch #5   : Introduce new ops
Patch #6   : Helpers for bpf_mem_alloc
Patch #7   : hashtab memory usage

Future work:
- extend it to other maps
- extend it to bpf prog
- per-container bpf memory usage 

Historical discussions:
- RFC PATCH v1 mm, bpf: Add BPF into /proc/meminfo
  https://lwn.net/Articles/917647/  
- RFC PATCH v2 mm, bpf: Add BPF into /proc/meminfo
  https://lwn.net/Articles/919848/

Yafang Shao (7):
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  mm: percpu: introduce percpu_size()
  mm: vmalloc: introduce vsize()
  mm: util: introduce kvsize()
  bpf: add new map ops ->map_mem_usage
  bpf: introduce bpf_mem_alloc_size()
  bpf: hashtab memory usage

 include/linux/bpf.h           |  2 ++
 include/linux/bpf_mem_alloc.h |  2 ++
 include/linux/percpu.h        |  1 +
 include/linux/slab.h          |  1 +
 include/linux/vmalloc.h       |  1 +
 kernel/bpf/hashtab.c          | 80 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/memalloc.c         | 70 +++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c          | 18 ++++++----
 mm/percpu-internal.h          |  4 ++-
 mm/percpu.c                   | 35 +++++++++++++++++++
 mm/util.c                     | 15 ++++++++
 mm/vmalloc.c                  | 17 +++++++++
 12 files changed, 237 insertions(+), 9 deletions(-)

Comments

John Fastabend Feb. 4, 2023, 2:15 a.m. UTC | #1
Yafang Shao wrote:
> Currently we can't get bpf memory usage reliably. bpftool now shows the
> bpf memory footprint, which is different from the bpf memory usage. In
> some cases the difference between the footprint shown in bpftool and the
> memory actually allocated by bpf can be quite large, for example:
> 
> - non-preallocated bpf map
>   The memory usage of a non-preallocated bpf map changes dynamically. The
>   allocated element count can be anywhere from 0 to the max entries, but
>   the memory footprint in bpftool only shows a fixed number.
> - bpf metadata consumes more memory than bpf elements
>   In some corner cases, the bpf metadata can consume a lot more memory
>   than the bpf elements do. For example, this can happen when the element
>   size is quite small.

Just following up slightly on the previous comment.

The metadata should be fixed and knowable, correct? What I'm getting at
is whether this can be calculated directly instead of through a BPF
helper and walking the entire map.

> 
> We need a way to get the bpf memory usage, especially as more and more
> bpf programs will be running in production environments, so the bpf
> memory usage is not trivial.

In our environments we track map usage, so we always know how many entries
are in a map. I don't think we use this to calculate memory footprint
at the moment, but just for map usage. It seems, though, that once you
have this, calculating the memory footprint can be done out of band,
because element and overhead costs are fixed.
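
A minimal sketch of such an out-of-band estimate, assuming the entry
count is already tracked and the costs really are constants. The struct,
the function name and the example numbers are hypothetical, and the
constants would have to be calibrated for a given kernel build.

```c
#include <stdint.h>

/* Hypothetical cost model supplied by the monitoring side. */
struct toy_cost_model {
	uint64_t fixed_bytes;     /* assumed one-time cost, e.g. buckets */
	uint64_t per_entry_bytes; /* assumed element plus its overhead */
};

/* Linear estimate: fixed cost plus a constant cost per live entry. */
static uint64_t toy_estimate_bytes(const struct toy_cost_model *c,
				   uint64_t nr_entries)
{
	return c->fixed_bytes + nr_entries * c->per_entry_bytes;
}
```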

> 
> This patchset introduces a new map op, ->map_mem_usage, to get the memory
> usage. In this op, the memory usage is derived from the pointers that
> have already been allocated by a bpf map. To keep the code simple, we
> ignore some small pointers, as their size is quite small compared with
> the total usage.
> 
> In order to get the memory size from the pointers, some generic mm helpers
> are introduced first, for example percpu_size(), vsize() and kvsize().
> 
> This patchset only implements the bpf memory usage for hashtab. I will
> extend it to other maps and bpf progs (bpf progs can dynamically allocate
> memory via bpf_obj_new()) in the future.

My preference would be to calculate this out of band. Walking a
large map and doing it in a critical section to get the memory
usage does not seem optimal.

Yafang Shao Feb. 5, 2023, 4:03 a.m. UTC | #2
On Sat, Feb 4, 2023 at 10:15 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Yafang Shao wrote:
> > Currently we can't get bpf memory usage reliably. bpftool now shows the
> > bpf memory footprint, which is different from the bpf memory usage. In
> > some cases the difference between the footprint shown in bpftool and the
> > memory actually allocated by bpf can be quite large, for example:
> >
> > - non-preallocated bpf map
> >   The memory usage of a non-preallocated bpf map changes dynamically. The
> >   allocated element count can be anywhere from 0 to the max entries, but
> >   the memory footprint in bpftool only shows a fixed number.
> > - bpf metadata consumes more memory than bpf elements
> >   In some corner cases, the bpf metadata can consume a lot more memory
> >   than the bpf elements do. For example, this can happen when the element
> >   size is quite small.
>
> Just following up slightly on previous comment.
>
> The metadata should be fixed and knowable, correct?

The metadata of BPF itself is fixed, but the metadata of MM allocation
depends on the kernel configuration.

> What I'm getting at
> is whether this can be calculated directly instead of through a BPF
> helper and walking the entire map.
>

As I explained in another thread, it doesn't walk the entire map.

> >
> > We need a way to get the bpf memory usage, especially as more and more
> > bpf programs will be running in production environments, so the bpf
> > memory usage is not trivial.
>
> In our environments we track map usage, so we always know how many entries
> are in a map. I don't think we use this to calculate memory footprint
> at the moment, but just for map usage. It seems, though, that once you
> have this, calculating the memory footprint can be done out of band,
> because element and overhead costs are fixed.
>
> >
> > This patchset introduces a new map op, ->map_mem_usage, to get the memory
> > usage. In this op, the memory usage is derived from the pointers that
> > have already been allocated by a bpf map. To keep the code simple, we
> > ignore some small pointers, as their size is quite small compared with
> > the total usage.
> >
> > In order to get the memory size from the pointers, some generic mm helpers
> > are introduced first, for example percpu_size(), vsize() and kvsize().
> >
> > This patchset only implements the bpf memory usage for hashtab. I will
> > extend it to other maps and bpf progs (bpf progs can dynamically allocate
> > memory via bpf_obj_new()) in the future.
>
> My preference would be to calculate this out of band. Walking a
> large map and doing it in a critical section to get the memory
> usage does not seem optimal.
>

I don't quite understand what you mean by calculating it out of band.
This patchset introduces a BPF helper which is used in bpftool, so it
is already out of band, right?
We should do it in bpftool, because sysadmins want a generic way
to get the system-wide bpf memory usage.
Ho-Ren Chuang Feb. 7, 2023, 12:48 a.m. UTC | #3
Hi Yafang and everyone,

We've proposed very similar features at
https://lore.kernel.org/bpf/CAAYibXgiCOOEY9NvLXbY4ve7pH8xWrZjnczrj6SHy3x_TtOU1g@mail.gmail.com/#t

We are very excited to see that we are not the only ones eager to have
this feature upstream to monitor an eBPF map's actual usage. This shows
the need for such an ability in eBPF.

Regarding the use cases, please also check
https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/#t
We are developing an app, similar to the Linux `top` command, to monitor
the memory footprint of eBPF programs/maps.

Thank you,

Ho-Ren Chuang Feb. 7, 2023, 12:53 a.m. UTC | #4
Hi Yafang and everyone,

We've proposed very similar features at
https://lore.kernel.org/bpf/CAAYibXgiCOOEY9NvLXbY4ve7pH8xWrZjnczrj6SHy3x_TtOU1g@mail.gmail.com/#t

We are very excited to see that we are not the only ones eager to have
this feature upstream to monitor an eBPF map's actual usage. This shows
the need for such an ability in eBPF.

Regarding the use cases, please also check
https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/#t
We are developing an app, similar to the Linux `top` command, to monitor
the memory footprint of eBPF programs/maps.

Thank you,


Yafang Shao Feb. 7, 2023, 7:02 a.m. UTC | #5
On Tue, Feb 7, 2023 at 8:49 AM Ho-Ren Chuang <horenc@vt.edu> wrote:
>
> Hi Yafang and everyone,
>
> We've proposed very similar features at https://lore.kernel.org/bpf/CAAYibXgiCOOEY9NvLXbY4ve7pH8xWrZjnczrj6SHy3x_TtOU1g@mail.gmail.com/#t
>

I have looked through your patchset. Maybe we can use max_entries to
show the used_entries for a preallocated hashtab? For a preallocated
hashtab the memory is already allocated, so it doesn't matter how many
entries are currently in use. Then we can avoid the runtime overhead
that Alexei is worried about.
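
The suggestion can be sketched as follows. This is a userspace toy, not
code from either patchset: the struct, its fields and the function name
are assumptions made up to show the branch on preallocation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of a hash table for accounting purposes. */
struct toy_htab {
	bool prealloc;        /* all elements were allocated at creation */
	uint32_t max_entries;
	uint32_t count;       /* live elements, kept for non-prealloc maps */
};

/*
 * For a preallocated table the memory is committed regardless of how
 * many slots are in use, so report max_entries and skip any runtime
 * counting. Otherwise report the tracked element count.
 */
static uint32_t toy_entries_for_accounting(const struct toy_htab *h)
{
	return h->prealloc ? h->max_entries : h->count;
}
```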

>
> We are very excited to see that we are not the only ones eager to have this feature upstream to monitor an eBPF map's actual usage. This shows the need for such an ability in eBPF.
>

Happy to hear that this feature could help you.
I think over time there will be more users who want to monitor the bpf
memory usage :)

>
> Regarding the use cases, please also check https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/#t . We are developing an app, similar to the Linux `top` command, to monitor the memory footprint of eBPF programs/maps.
>
>
> Thank you,
>