mbox series

[v6,bpf-next,0/3] bpf: Fix tailcall hierarchy

Message ID 20240714123902.32305-1-hffilwlqm@gmail.com (mailing list archive)
Headers show
Series bpf: Fix tailcall hierarchy | expand

Message

Leon Hwang July 14, 2024, 12:38 p.m. UTC
This patchset fixes a tailcall hierarchy issue.

The issue is confirmed in the discussions of "bpf, x64: Fix tailcall
infinite loop"[0].

The issue has been resolved on both x86_64 and arm64[1].

I provide a long commit message in the "bpf, x64: Fix tailcall hierarchy"
patch to describe how the issue happens and how this patchset resolves the
issue in details.

How does this patchset resolve the issue?

In short, it stores tail_call_cnt on the stack of main prog, and propagates
tail_call_cnt_ptr to its subprogs.

First, at the prologue of main prog, it initializes tail_call_cnt and
prepares tail_call_cnt_ptr. And at the prologue of subprog, it reuses
the tail_call_cnt_ptr from caller.

Then, when a tailcall happens, it increments tail_call_cnt by its pointer.

v5 -> v6:
  * Address comments from Eduard:
    * Add JITed dumping along annotating comments in "bpf, x64: Fix
      tailcall hierarchy".
    * Rewrite two selftests with RUN_TESTS macro.

v4 -> v5:
  * Solution changes from tailcall run ctx to tail_call_cnt and its pointer.
    It's because v4 solution is unable to handle the case that there is no
    tailcall in subprog but there is tailcall in EXT prog which attaches to
    the subprog.

v3 -> v4:
  * Solution changes from per-task tail_call_cnt to tailcall run ctx.
    As for per-cpu/per-task solution, there is a case it is unable to handle[2].

v2 -> v3:
  * Solution changes from percpu tail_call_cnt to tail_call_cnt at task_struct.

v1 -> v2:
  * Solution changes from extra run-time call insn to percpu tail_call_cnt.
  * Address comments from Alexei:
    * Use percpu tail_call_cnt.
    * Use asm to make sure no callee saved registers are touched.

RFC v2 -> v1:
  * Solution changes from propagating tail_call_cnt with its pointer to extra
    run-time call insn.
  * Address comments from Maciej:
    * Replace all memcpy(prog, x86_nops[5], X86_PATCH_SIZE) with
        emit_nops(&prog, X86_PATCH_SIZE)

RFC v1 -> RFC v2:
  * Address comments from Stanislav:
    * Separate moving emit_nops() as first patch.

Links:
[0] https://lore.kernel.org/bpf/6203dd01-789d-f02c-5293-def4c1b18aef@gmail.com/
[1] https://github.com/kernel-patches/bpf/pull/7350/checks
[2] https://lore.kernel.org/bpf/CAADnVQK1qF+uBjwom2s2W-yEmgd_3rGi5Nr+KiV3cW0T+UPPfA@mail.gmail.com/

Leon Hwang (3):
  bpf, x64: Fix tailcall hierarchy
  bpf, arm64: Fix tailcall hierarchy
  selftests/bpf: Add testcases for tailcall hierarchy fixing

 arch/arm64/net/bpf_jit_comp.c                 |  57 +++-
 arch/x86/net/bpf_jit_comp.c                   | 107 ++++--
 .../selftests/bpf/prog_tests/tailcalls.c      | 320 ++++++++++++++++++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy1.c   |  34 ++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy2.c   |  70 ++++
 .../bpf/progs/tailcall_bpf2bpf_hierarchy3.c   |  62 ++++
 .../progs/tailcall_bpf2bpf_hierarchy_fentry.c |  35 ++
 tools/testing/selftests/bpf/progs/tc_dummy.c  |  12 +
 8 files changed, 653 insertions(+), 44 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy1.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy2.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy3.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_hierarchy_fentry.c
 create mode 100644 tools/testing/selftests/bpf/progs/tc_dummy.c

Comments

bot+bpf-ci@kernel.org July 19, 2024, 11:52 p.m. UTC | #1
Dear patch submitter,

CI has tested the following submission:
Status:     FAILURE
Name:       [v6,bpf-next,0/3] bpf: Fix tailcall hierarchy
Patchwork:  https://patchwork.kernel.org/project/netdevbpf/list/?series=871133&state=*
Matrix:     https://github.com/kernel-patches/bpf/actions/runs/10015594671

Failed jobs:
test_maps-s390x-gcc: https://github.com/kernel-patches/bpf/actions/runs/10015594671/job/27687419423


Please note: this email is coming from an unmonitored mailbox. If you have
questions or feedback, please reach out to the Meta Kernel CI team at
kernel-ci@meta.com.
Alexei Starovoitov July 20, 2024, 2:57 a.m. UTC | #2
On Sun, Jul 14, 2024 at 5:39 AM Leon Hwang <hffilwlqm@gmail.com> wrote:
>
> This patchset fixes a tailcall hierarchy issue.

Thank you for sticking to it. Applied to bpf-next.

It took almost a year, but the final fix is imo much better
than all the previous attempts.
I suspect some tail_call users may see a small performance
degradation, but it's a trade off we had to make.
The evolution of the fix made the perf hit as small as possible.
patchwork-bot+netdevbpf@kernel.org July 20, 2024, 3 a.m. UTC | #3
Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Sun, 14 Jul 2024 20:38:59 +0800 you wrote:
> This patchset fixes a tailcall hierarchy issue.
> 
> The issue is confirmed in the discussions of "bpf, x64: Fix tailcall
> infinite loop"[0].
> 
> The issue has been resolved on both x86_64 and arm64[1].
> 
> [...]

Here is the summary with links:
  - [v6,bpf-next,1/3] bpf, x64: Fix tailcall hierarchy
    https://git.kernel.org/bpf/bpf-next/c/00ac9693a3ba
  - [v6,bpf-next,2/3] bpf, arm64: Fix tailcall hierarchy
    https://git.kernel.org/bpf/bpf-next/c/a53c0f324aed
  - [v6,bpf-next,3/3] selftests/bpf: Add testcases for tailcall hierarchy fixing
    https://git.kernel.org/bpf/bpf-next/c/92a227f61911

You are awesome, thank you!