Message ID | 20201201215900.3569844-1-guro@fb.com (mailing list archive) |
---|---|
Headers | show |
Series | bpf: switch to memcg-based memory accounting | expand |
Hello: This series was applied to bpf/bpf-next.git (refs/heads/master): On Tue, 1 Dec 2020 13:58:26 -0800 you wrote: > Currently bpf is using the memlock rlimit for the memory accounting. > This approach has its downsides and over time has created a significant > amount of problems: > > 1) The limit is per-user, but because most bpf operations are performed > as root, the limit has a little value. > > [...] Here is the summary with links: - [bpf-next,v9,01/34] mm: memcontrol: use helpers to read page's memcg data https://git.kernel.org/bpf/bpf-next/c/bcfe06bf2622 - [bpf-next,v9,02/34] mm: memcontrol/slab: use helpers to access slab page's memcg_data https://git.kernel.org/bpf/bpf-next/c/270c6a71460e - [bpf-next,v9,03/34] mm: introduce page memcg flags https://git.kernel.org/bpf/bpf-next/c/87944e2992bd - [bpf-next,v9,04/34] mm: convert page kmemcg type to a page memcg flag https://git.kernel.org/bpf/bpf-next/c/18b2db3b0385 - [bpf-next,v9,05/34] bpf: memcg-based memory accounting for bpf progs https://git.kernel.org/bpf/bpf-next/c/ddf8503c7c43 - [bpf-next,v9,06/34] bpf: prepare for memcg-based memory accounting for bpf maps https://git.kernel.org/bpf/bpf-next/c/48edc1f78aab - [bpf-next,v9,07/34] bpf: memcg-based memory accounting for bpf maps https://git.kernel.org/bpf/bpf-next/c/d5299b67dd59 - [bpf-next,v9,08/34] bpf: refine memcg-based memory accounting for arraymap maps https://git.kernel.org/bpf/bpf-next/c/6d192c7938b7 - [bpf-next,v9,09/34] bpf: refine memcg-based memory accounting for cpumap maps https://git.kernel.org/bpf/bpf-next/c/e88cc05b61f3 - [bpf-next,v9,10/34] bpf: memcg-based memory accounting for cgroup storage maps https://git.kernel.org/bpf/bpf-next/c/3a61c7c58b30 - [bpf-next,v9,11/34] bpf: refine memcg-based memory accounting for devmap maps https://git.kernel.org/bpf/bpf-next/c/1440290adf7b - [bpf-next,v9,12/34] bpf: refine memcg-based memory accounting for hashtab maps https://git.kernel.org/bpf/bpf-next/c/881456811a33 - [bpf-next,v9,13/34] bpf: memcg-based memory accounting for lpm_trie maps https://git.kernel.org/bpf/bpf-next/c/353e7af4bf5e - [bpf-next,v9,14/34] bpf: memcg-based memory accounting for bpf ringbuffer https://git.kernel.org/bpf/bpf-next/c/be4035c734d1 - [bpf-next,v9,15/34] bpf: memcg-based memory accounting for bpf local storage maps https://git.kernel.org/bpf/bpf-next/c/e9aae8beba82 - [bpf-next,v9,16/34] bpf: refine memcg-based memory accounting for sockmap and sockhash maps https://git.kernel.org/bpf/bpf-next/c/7846dd9f835e - [bpf-next,v9,17/34] bpf: refine memcg-based memory accounting for xskmap maps https://git.kernel.org/bpf/bpf-next/c/28e1dcdef0cb - [bpf-next,v9,18/34] bpf: eliminate rlimit-based memory accounting for arraymap maps https://git.kernel.org/bpf/bpf-next/c/1bc5975613ed - [bpf-next,v9,19/34] bpf: eliminate rlimit-based memory accounting for bpf_struct_ops maps https://git.kernel.org/bpf/bpf-next/c/f043733f31e5 - [bpf-next,v9,20/34] bpf: eliminate rlimit-based memory accounting for cpumap maps https://git.kernel.org/bpf/bpf-next/c/711cabaf1432 - [bpf-next,v9,21/34] bpf: eliminate rlimit-based memory accounting for cgroup storage maps https://git.kernel.org/bpf/bpf-next/c/087b0d39fe22 - [bpf-next,v9,22/34] bpf: eliminate rlimit-based memory accounting for devmap maps https://git.kernel.org/bpf/bpf-next/c/844f157f6c0a - [bpf-next,v9,23/34] bpf: eliminate rlimit-based memory accounting for hashtab maps https://git.kernel.org/bpf/bpf-next/c/755e5d55367a - [bpf-next,v9,24/34] bpf: eliminate rlimit-based memory accounting for lpm_trie maps https://git.kernel.org/bpf/bpf-next/c/cbddcb574d41 - [bpf-next,v9,25/34] bpf: eliminate rlimit-based memory accounting for queue_stack_maps maps https://git.kernel.org/bpf/bpf-next/c/a37fb7ef24a4 - [bpf-next,v9,26/34] bpf: eliminate rlimit-based memory accounting for reuseport_array maps https://git.kernel.org/bpf/bpf-next/c/db54330d3e13 - [bpf-next,v9,27/34] bpf: eliminate rlimit-based memory accounting for bpf ringbuffer https://git.kernel.org/bpf/bpf-next/c/abbdd0813f34 - [bpf-next,v9,28/34] bpf: eliminate rlimit-based memory accounting for sockmap and sockhash maps https://git.kernel.org/bpf/bpf-next/c/0d2c4f964050 - [bpf-next,v9,29/34] bpf: eliminate rlimit-based memory accounting for stackmap maps https://git.kernel.org/bpf/bpf-next/c/370868107bf6 - [bpf-next,v9,30/34] bpf: eliminate rlimit-based memory accounting for xskmap maps https://git.kernel.org/bpf/bpf-next/c/819a4f323579 - [bpf-next,v9,31/34] bpf: eliminate rlimit-based memory accounting for bpf local storage maps https://git.kernel.org/bpf/bpf-next/c/ab31be378a63 - [bpf-next,v9,32/34] bpf: eliminate rlimit-based memory accounting infra for bpf maps https://git.kernel.org/bpf/bpf-next/c/80ee81e0403c - [bpf-next,v9,33/34] bpf: eliminate rlimit-based memory accounting for bpf progs https://git.kernel.org/bpf/bpf-next/c/3ac1f01b43b6 - [bpf-next,v9,34/34] bpf: samples: do not touch RLIMIT_MEMLOCK https://git.kernel.org/bpf/bpf-next/c/5b0764b2d345 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html
On Tue, Dec 1, 2020 at 1:59 PM Roman Gushchin <guro@fb.com> wrote: > > 5) Cryptic -EPERM is returned on exceeding the limit. Libbpf even had > a function to "explain" this case for users. ... > v9: > - always charge the saved memory cgroup, by Daniel, Toke and Alexei > - added bpf_map_kzalloc() > - rebase and minor fixes This looks great. Applied. Please follow up with a change to libbpf's pr_perm_msg(). That helpful warning should stay for old kernels, but it would be misleading for new kernels. libbpf probably needs a feature check to make this warning conditional. Thanks!
On Wed, Dec 02, 2020 at 06:54:46PM -0800, Alexei Starovoitov wrote: > On Tue, Dec 1, 2020 at 1:59 PM Roman Gushchin <guro@fb.com> wrote: > > > > 5) Cryptic -EPERM is returned on exceeding the limit. Libbpf even had > > a function to "explain" this case for users. > ... > > v9: > > - always charge the saved memory cgroup, by Daniel, Toke and Alexei > > - added bpf_map_kzalloc() > > - rebase and minor fixes > > This looks great. Applied. Thanks! > Please follow up with a change to libbpf's pr_perm_msg(). > That helpful warning should stay for old kernels, but it would be > misleading for new kernels. > libbpf probably needs a feature check to make this warning conditional. I think we've discussed it several months ago and at that time we didn't find a good way to check this feature. I'll think again, but if somebody has any ideas here, I'll appreciate a lot.
On 12/3/20 4:26 AM, Roman Gushchin wrote: > On Wed, Dec 02, 2020 at 06:54:46PM -0800, Alexei Starovoitov wrote: >> On Tue, Dec 1, 2020 at 1:59 PM Roman Gushchin <guro@fb.com> wrote: >>> >>> 5) Cryptic -EPERM is returned on exceeding the limit. Libbpf even had >>> a function to "explain" this case for users. >> ... >>> v9: >>> - always charge the saved memory cgroup, by Daniel, Toke and Alexei >>> - added bpf_map_kzalloc() >>> - rebase and minor fixes >> >> This looks great. Applied. > > Thanks! > >> Please follow up with a change to libbpf's pr_perm_msg(). >> That helpful warning should stay for old kernels, but it would be >> misleading for new kernels. >> libbpf probably needs a feature check to make this warning conditional. > > I think we've discussed it several months ago and at that time we didn't > find a good way to check this feature. I'll think again, but if somebody > has any ideas here, I'll appreciate a lot. Hm, bit tricky, agree .. given we only throw the warning in pr_perm_msg() for non-root and thus probing options are also limited, otherwise just probing for a helper that was added in this same cycle would have been good enough as a simple heuristic. I wonder if it would make sense to add some hint inside the bpf_{prog,map}_show_fdinfo() to indicate that accounting with memcg is enabled for the prog/map one way or another? Not just for the sake of pr_perm_msg(), but in general for apps to stop messing with rlimit at this point. Maybe also bpftool feature probe could be extended to indicate that as well (e.g. the json output can be fed into Go natively).
On Fri, Dec 4, 2020 at 4:37 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 12/3/20 4:26 AM, Roman Gushchin wrote: > > On Wed, Dec 02, 2020 at 06:54:46PM -0800, Alexei Starovoitov wrote: > >> On Tue, Dec 1, 2020 at 1:59 PM Roman Gushchin <guro@fb.com> wrote: > >>> > >>> 5) Cryptic -EPERM is returned on exceeding the limit. Libbpf even had > >>> a function to "explain" this case for users. > >> ... > >>> v9: > >>> - always charge the saved memory cgroup, by Daniel, Toke and Alexei > >>> - added bpf_map_kzalloc() > >>> - rebase and minor fixes > >> > >> This looks great. Applied. > > > > Thanks! > > > >> Please follow up with a change to libbpf's pr_perm_msg(). > >> That helpful warning should stay for old kernels, but it would be > >> misleading for new kernels. > >> libbpf probably needs a feature check to make this warning conditional. > > > > I think we've discussed it several months ago and at that time we didn't > > find a good way to check this feature. I'll think again, but if somebody > > has any ideas here, I'll appreciate a lot. > > Hm, bit tricky, agree .. given we only throw the warning in pr_perm_msg() for > non-root and thus probing options are also limited, otherwise just probing for > a helper that was added in this same cycle would have been good enough as a > simple heuristic. I wonder if it would make sense to add some hint inside the > bpf_{prog,map}_show_fdinfo() to indicate that accounting with memcg is enabled I think the initial version was emitting 0 for memlock, so that was a pretty simple way to prove stuff. But I think it was changed at the last minute to emit some non-zero "estimate" of memory used or something like that? > for the prog/map one way or another? Not just for the sake of pr_perm_msg(), but > in general for apps to stop messing with rlimit at this point. Maybe also bpftool > feature probe could be extended to indicate that as well (e.g. the json output > can be fed into Go natively).