Message ID | 20240105104819.3916743-1-houtao@huaweicloud.com (mailing list archive) |
---|---|
Headers | show |
Series | bpf: inline bpf_kptr_xchg() | expand |
On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@huaweicloud.com> wrote: > > From: Hou Tao <houtao1@huawei.com> > > Hi, > > The motivation of inlining bpf_kptr_xchg() comes from the performance > profiling of bpf memory allocator benchmark [1]. The benchmark uses > bpf_kptr_xchg() to stash the allocated objects and to pop the stashed > objects for free. After inling bpf_kptr_xchg(), the performance for > object free on 8-CPUs VM increases about 2%~10%. However the performance > gain comes with costs: both the kasan and kcsan checks on the pointer > will be unavailable. Initially the inline is implemented in do_jit() for > x86-64 directly, but I think it will more portable to implement the > inline in verifier. How much work would it take to enable this on other major architectures? AFAICT, most jit compilers already handle BPF_XCHG, so it should be relatively simple? Other than this, for the set Acked-by: Song Liu <song@kernel.org> Thanks, Song
On 1/6/2024 6:53 AM, Song Liu wrote: > On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@huaweicloud.com> wrote: >> From: Hou Tao <houtao1@huawei.com> >> >> Hi, >> >> The motivation of inlining bpf_kptr_xchg() comes from the performance >> profiling of bpf memory allocator benchmark [1]. The benchmark uses >> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed >> objects for free. After inling bpf_kptr_xchg(), the performance for >> object free on 8-CPUs VM increases about 2%~10%. However the performance >> gain comes with costs: both the kasan and kcsan checks on the pointer >> will be unavailable. Initially the inline is implemented in do_jit() for >> x86-64 directly, but I think it will more portable to implement the >> inline in verifier. > How much work would it take to enable this on other major architectures? > AFAICT, most jit compilers already handle BPF_XCHG, so it should be > relatively simple? Yes. I think enabling this inline will be relatively simple. As said in patch #1, the inline depends on two conditions: 1) atomic_xchg() support on pointer-sized word. 2) the implementation of xchg is the same as atomic_xchg() on pointer-sized words. For condition 1), I think most major architecture JIT backends have support it. So the following work is to check the implementation of xchg and atomic_xchg(), to enable the inline and to do more test. I will try to enable the inline on arm64 first. And will x86-64 + arm64 be enough for the definition of "major architectures" ? Or Should it include riscv, s380, powerpc as well ? > Other than this, for the set > > Acked-by: Song Liu <song@kernel.org> Thanks for the ack. > > Thanks, > Song > .
On Fri, Jan 5, 2024 at 6:34 PM Hou Tao <houtao@huaweicloud.com> wrote: > > > > On 1/6/2024 6:53 AM, Song Liu wrote: > > On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@huaweicloud.com> wrote: > >> From: Hou Tao <houtao1@huawei.com> > >> > >> Hi, > >> > >> The motivation of inlining bpf_kptr_xchg() comes from the performance > >> profiling of bpf memory allocator benchmark [1]. The benchmark uses > >> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed > >> objects for free. After inling bpf_kptr_xchg(), the performance for > >> object free on 8-CPUs VM increases about 2%~10%. However the performance > >> gain comes with costs: both the kasan and kcsan checks on the pointer > >> will be unavailable. Initially the inline is implemented in do_jit() for > >> x86-64 directly, but I think it will more portable to implement the > >> inline in verifier. > > How much work would it take to enable this on other major architectures? > > AFAICT, most jit compilers already handle BPF_XCHG, so it should be > > relatively simple? > > Yes. I think enabling this inline will be relatively simple. As said in > patch #1, the inline depends on two conditions: > 1) atomic_xchg() support on pointer-sized word. > 2) the implementation of xchg is the same as atomic_xchg() on > pointer-sized words. > For condition 1), I think most major architecture JIT backends have > support it. So the following work is to check the implementation of xchg > and atomic_xchg(), to enable the inline and to do more test. Thanks for the clarification. > I will try to enable the inline on arm64 first. And will x86-64 + arm64 > be enough for the definition of "major architectures" ? Or Should it > include riscv, s380, powerpc as well ? x86_64 + arm64 is "major" enough. :) Maintainers of other JIT engines can help with other archs. Thanks, Song
Hello: This series was applied to bpf/bpf-next.git (master) by Alexei Starovoitov <ast@kernel.org>: On Fri, 5 Jan 2024 18:48:16 +0800 you wrote: > From: Hou Tao <houtao1@huawei.com> > > Hi, > > The motivation of inlining bpf_kptr_xchg() comes from the performance > profiling of bpf memory allocator benchmark [1]. The benchmark uses > bpf_kptr_xchg() to stash the allocated objects and to pop the stashed > objects for free. After inling bpf_kptr_xchg(), the performance for > object free on 8-CPUs VM increases about 2%~10%. However the performance > gain comes with costs: both the kasan and kcsan checks on the pointer > will be unavailable. Initially the inline is implemented in do_jit() for > x86-64 directly, but I think it will more portable to implement the > inline in verifier. > > [...] Here is the summary with links: - [bpf-next,v3,1/3] bpf: Support inlining bpf_kptr_xchg() helper https://git.kernel.org/bpf/bpf-next/c/ac780beba187 - [bpf-next,v3,2/3] selftests/bpf: Factor out get_xlated_program() helper https://git.kernel.org/bpf/bpf-next/c/10cdab919df6 - [bpf-next,v3,3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg() https://git.kernel.org/bpf/bpf-next/c/ca8cf57c7754 You are awesome, thank you!
From: Hou Tao <houtao1@huawei.com> Hi, The motivation of inlining bpf_kptr_xchg() comes from the performance profiling of bpf memory allocator benchmark [1]. The benchmark uses bpf_kptr_xchg() to stash the allocated objects and to pop the stashed objects for free. After inling bpf_kptr_xchg(), the performance for object free on 8-CPUs VM increases about 2%~10%. However the performance gain comes with costs: both the kasan and kcsan checks on the pointer will be unavailable. Initially the inline is implemented in do_jit() for x86-64 directly, but I think it will more portable to implement the inline in verifier. Patch #1 supports inlining bpf_kptr_xchg() helper and enables it on x86-4. Patch #2 factors out a helper for newly-added test in patch #3. Patch #3 tests whether the inlining of bpf_kptr_xchg() is expected. Please see individual patches for more details. And comments are always welcome. Change Log: v3: * rebased on bpf-next tree * patch 1 & 2: Add Rvb-by and Ack-by tags from Eduard * patch 3: use inline assembly and naked function instead of c code (suggested by Eduard) v2: https://lore.kernel.org/bpf/20231223104042.1432300-1-houtao@huaweicloud.com/ * rebased on bpf-next tree * drop patch #1 in v1 due to discussion in [2] * patch #1: add the motivation in the commit message, merge patch #1 and #3 into the new patch in v2. (Daniel) * patch #2/#3: newly-added patch to test the inlining of bpf_kptr_xchg() (Eduard) v1: https://lore.kernel.org/bpf/95b8c2cd-44d5-5fe1-60b5-7e8218779566@huaweicloud.com/ [1]: https://lore.kernel.org/bpf/20231221141501.3588586-1-houtao@huaweicloud.com/ [2]: https://lore.kernel.org/bpf/fd94efb9-4a56-c982-dc2e-c66be5202cb7@huaweicloud.com/ Hou Tao (3): bpf: Support inlining bpf_kptr_xchg() helper selftests/bpf: Factor out get_xlated_program() helper selftests/bpf: Test the inlining of bpf_kptr_xchg() arch/x86/net/bpf_jit_comp.c | 5 ++ include/linux/filter.h | 1 + kernel/bpf/core.c | 10 ++++ kernel/bpf/helpers.c | 1 + kernel/bpf/verifier.c | 17 +++++++ .../selftests/bpf/prog_tests/ctx_rewrite.c | 44 ---------------- .../bpf/prog_tests/kptr_xchg_inline.c | 51 +++++++++++++++++++ .../selftests/bpf/progs/kptr_xchg_inline.c | 48 +++++++++++++++++ tools/testing/selftests/bpf/test_verifier.c | 47 +---------------- tools/testing/selftests/bpf/testing_helpers.c | 42 +++++++++++++++ tools/testing/selftests/bpf/testing_helpers.h | 6 +++ 11 files changed, 183 insertions(+), 89 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c create mode 100644 tools/testing/selftests/bpf/progs/kptr_xchg_inline.c