[bpf-next,v3,0/3] bpf: inline bpf_kptr_xchg()

Message ID 20240105104819.3916743-1-houtao@huaweicloud.com (mailing list archive)

Message

Hou Tao Jan. 5, 2024, 10:48 a.m. UTC
From: Hou Tao <houtao1@huawei.com>

Hi,

The motivation for inlining bpf_kptr_xchg() comes from performance
profiling of the bpf memory allocator benchmark [1]. The benchmark uses
bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
objects for freeing. After inlining bpf_kptr_xchg(), the performance of
object freeing on an 8-CPU VM increases by about 2%~10%. However, the
performance gain comes at a cost: both the kasan and kcsan checks on the
pointer become unavailable. Initially the inlining was implemented in
do_jit() for x86-64 directly, but I think it is more portable to
implement the inlining in the verifier.

Patch #1 supports inlining the bpf_kptr_xchg() helper and enables it on
x86-64. Patch #2 factors out a helper for the newly-added test in patch
#3. Patch #3 tests whether the inlining of bpf_kptr_xchg() works as
expected. Please see the individual patches for more details. And
comments are always welcome.

Change Log:
v3:
  * rebased on bpf-next tree
  * patch 1 & 2: Add Rvb-by and Ack-by tags from Eduard
  * patch 3: use inline assembly and naked function instead of c code
             (suggested by Eduard)

v2: https://lore.kernel.org/bpf/20231223104042.1432300-1-houtao@huaweicloud.com/
  * rebased on bpf-next tree
  * drop patch #1 in v1 due to discussion in [2]
  * patch #1: add the motivation in the commit message, merge patch #1
              and #3 into the new patch in v2. (Daniel)
  * patch #2/#3: newly-added patch to test the inlining of
                 bpf_kptr_xchg() (Eduard)

v1: https://lore.kernel.org/bpf/95b8c2cd-44d5-5fe1-60b5-7e8218779566@huaweicloud.com/

[1]: https://lore.kernel.org/bpf/20231221141501.3588586-1-houtao@huaweicloud.com/
[2]: https://lore.kernel.org/bpf/fd94efb9-4a56-c982-dc2e-c66be5202cb7@huaweicloud.com/

Hou Tao (3):
  bpf: Support inlining bpf_kptr_xchg() helper
  selftests/bpf: Factor out get_xlated_program() helper
  selftests/bpf: Test the inlining of bpf_kptr_xchg()

 arch/x86/net/bpf_jit_comp.c                   |  5 ++
 include/linux/filter.h                        |  1 +
 kernel/bpf/core.c                             | 10 ++++
 kernel/bpf/helpers.c                          |  1 +
 kernel/bpf/verifier.c                         | 17 +++++++
 .../selftests/bpf/prog_tests/ctx_rewrite.c    | 44 ----------------
 .../bpf/prog_tests/kptr_xchg_inline.c         | 51 +++++++++++++++++++
 .../selftests/bpf/progs/kptr_xchg_inline.c    | 48 +++++++++++++++++
 tools/testing/selftests/bpf/test_verifier.c   | 47 +----------------
 tools/testing/selftests/bpf/testing_helpers.c | 42 +++++++++++++++
 tools/testing/selftests/bpf/testing_helpers.h |  6 +++
 11 files changed, 183 insertions(+), 89 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c
 create mode 100644 tools/testing/selftests/bpf/progs/kptr_xchg_inline.c

Comments

Song Liu Jan. 5, 2024, 10:53 p.m. UTC | #1
On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Hi,
>
> The motivation of inlining bpf_kptr_xchg() comes from the performance
> profiling of bpf memory allocator benchmark [1]. The benchmark uses
> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
> objects for free. After inling bpf_kptr_xchg(), the performance for
> object free on 8-CPUs VM increases about 2%~10%. However the performance
> gain comes with costs: both the kasan and kcsan checks on the pointer
> will be unavailable. Initially the inline is implemented in do_jit() for
> x86-64 directly, but I think it will more portable to implement the
> inline in verifier.

How much work would it take to enable this on other major architectures?
AFAICT, most jit compilers already handle BPF_XCHG, so it should be
relatively simple?

Other than this, for the set

Acked-by: Song Liu <song@kernel.org>

Thanks,
Song
Hou Tao Jan. 6, 2024, 2:34 a.m. UTC | #2
On 1/6/2024 6:53 AM, Song Liu wrote:
> On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Hi,
>>
>> The motivation of inlining bpf_kptr_xchg() comes from the performance
>> profiling of bpf memory allocator benchmark [1]. The benchmark uses
>> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
>> objects for free. After inling bpf_kptr_xchg(), the performance for
>> object free on 8-CPUs VM increases about 2%~10%. However the performance
>> gain comes with costs: both the kasan and kcsan checks on the pointer
>> will be unavailable. Initially the inline is implemented in do_jit() for
>> x86-64 directly, but I think it will more portable to implement the
>> inline in verifier.
> How much work would it take to enable this on other major architectures?
> AFAICT, most jit compilers already handle BPF_XCHG, so it should be
> relatively simple?

Yes. I think enabling this inlining will be relatively simple. As said in
patch #1, the inlining depends on two conditions:
1) atomic_xchg() is supported on pointer-sized words.
2) the implementation of xchg is the same as atomic_xchg() on
pointer-sized words.
For condition 1), I think most major architecture JIT backends already
support it. So the remaining work is to check the implementations of xchg
and atomic_xchg(), enable the inlining, and do more testing.

I will try to enable the inlining on arm64 first. Will x86-64 + arm64
be enough for the definition of "major architectures"? Or should it
include riscv, s390, powerpc as well?

> Other than this, for the set
>
> Acked-by: Song Liu <song@kernel.org>

Thanks for the ack.
>
> Thanks,
> Song
Song Liu Jan. 6, 2024, 8:51 a.m. UTC | #3
On Fri, Jan 5, 2024 at 6:34 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
>
>
> On 1/6/2024 6:53 AM, Song Liu wrote:
> > On Fri, Jan 5, 2024 at 2:47 AM Hou Tao <houtao@huaweicloud.com> wrote:
> >> From: Hou Tao <houtao1@huawei.com>
> >>
> >> Hi,
> >>
> >> The motivation of inlining bpf_kptr_xchg() comes from the performance
> >> profiling of bpf memory allocator benchmark [1]. The benchmark uses
> >> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
> >> objects for free. After inling bpf_kptr_xchg(), the performance for
> >> object free on 8-CPUs VM increases about 2%~10%. However the performance
> >> gain comes with costs: both the kasan and kcsan checks on the pointer
> >> will be unavailable. Initially the inline is implemented in do_jit() for
> >> x86-64 directly, but I think it will more portable to implement the
> >> inline in verifier.
> > How much work would it take to enable this on other major architectures?
> > AFAICT, most jit compilers already handle BPF_XCHG, so it should be
> > relatively simple?
>
> Yes. I think enabling this inline will be relatively simple. As said in
> patch #1, the inline depends on two conditions:
> 1) atomic_xchg() support on pointer-sized word.
> 2)  the implementation of xchg is the same as atomic_xchg() on
> pointer-sized words.
> For condition 1), I think most major architecture JIT backends have
> support it. So the following work is to check the implementation of xchg
> and atomic_xchg(), to enable the inline and to do more test.

Thanks for the clarification.

> I will try to enable the inline on arm64 first. And will x86-64 + arm64
> be enough for the definition of "major architectures" ? Or Should it
> include riscv, s380, powerpc as well ?

x86_64 + arm64 is "major" enough. :) Maintainers of other JIT engines
can help with other archs.

Thanks,
Song
patchwork-bot+netdevbpf@kernel.org Jan. 12, 2024, 2:30 a.m. UTC | #4
Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Fri,  5 Jan 2024 18:48:16 +0800 you wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Hi,
> 
> The motivation of inlining bpf_kptr_xchg() comes from the performance
> profiling of bpf memory allocator benchmark [1]. The benchmark uses
> bpf_kptr_xchg() to stash the allocated objects and to pop the stashed
> objects for free. After inling bpf_kptr_xchg(), the performance for
> object free on 8-CPUs VM increases about 2%~10%. However the performance
> gain comes with costs: both the kasan and kcsan checks on the pointer
> will be unavailable. Initially the inline is implemented in do_jit() for
> x86-64 directly, but I think it will more portable to implement the
> inline in verifier.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v3,1/3] bpf: Support inlining bpf_kptr_xchg() helper
    https://git.kernel.org/bpf/bpf-next/c/ac780beba187
  - [bpf-next,v3,2/3] selftests/bpf: Factor out get_xlated_program() helper
    https://git.kernel.org/bpf/bpf-next/c/10cdab919df6
  - [bpf-next,v3,3/3] selftests/bpf: Test the inlining of bpf_kptr_xchg()
    https://git.kernel.org/bpf/bpf-next/c/ca8cf57c7754

You are awesome, thank you!