
[RFC,bpf-next,v4,0/4] bpf, x64: Fix tailcall infinite loop

Message ID 20230903151448.61696-1-hffilwlqm@gmail.com (mailing list archive)

Message

Leon Hwang Sept. 3, 2023, 3:14 p.m. UTC
This patch series fixes a tailcall infinite loop on x64.

Since commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and tailcall
handling in JIT"), tailcalls on x64 work better than before.

Since commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF subprograms
for x64 JIT"), tailcalls are able to run in BPF subprograms on x64.

Since commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF program
to other BPF programs"), a BPF program is able to trace other BPF programs.

What happens when all of them are combined?

1. FENTRY/FEXIT on a BPF subprogram.
2. A tailcall runs in the BPF subprogram.
3. The tailcall calls the subprogram's caller.

As a result, a tailcall infinite loop comes up, and the loop halts the
machine.

As we know, in the tail call context, tail_call_cnt is propagated between
BPF subprograms via the stack and the rax register. The same happens in
trampolines.

How did I discover the bug?

Since commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth 256 for
subprogs with tailcalls"), the total stack size is limited to around 8KiB.
I then wrote some bpf progs to validate the stack consumption: tailcalls
running in bpf2bpf, with FENTRY/FEXIT tracing on bpf2bpf[1].

While doing so, I accidentally created a tailcall loop, and the loop halted
my VM. Without the loop, the bpf progs would consume over 8KiB of stack.
But that _stack overflow_ did not halt my VM.

With bpf_printk(), I confirmed that the tailcall count limit did not work
as expected. So I read the code and fixed it.

Finally, unfortunately, I have only fixed it on x64, not on the other
arches. As a result, CI tests failed because this bug has not been fixed
on s390x.

Any help with s390x would be appreciated.

v3 -> v4:
  * Suggestions from Maciej:
    * Separate tail_call_cnt initialisation to a single patch.
    * Unnecessary to check subprog.

v2 -> v3:
  * Suggestions from Alexei:
    * Fix comment style.
    * Remove FIXME comment.
    * Remove arch_prepare_bpf_trampoline() function comment update.
    * Remove the unnecessary 'tgt_prog->aux->func[subprog]->is_func' check.

[1]: https://github.com/Asphaltt/learn-by-example/tree/main/ebpf/tailcall-stackoverflow

Leon Hwang (4):
  bpf, x64: Comment tail_call_cnt initialisation
  bpf, x64: Fix tailcall infinite loop
  selftests/bpf: Correct map_fd to data_fd in tailcalls
  selftests/bpf: Add testcases for tailcall infinite loop fixing

 arch/x86/net/bpf_jit_comp.c                   |  32 +-
 include/linux/bpf.h                           |   5 +
 kernel/bpf/trampoline.c                       |   4 +-
 kernel/bpf/verifier.c                         |   3 +
 .../selftests/bpf/prog_tests/tailcalls.c      | 311 +++++++++++++++++-
 .../bpf/progs/tailcall_bpf2bpf_fentry.c       |  18 +
 .../bpf/progs/tailcall_bpf2bpf_fexit.c        |  18 +
 7 files changed, 377 insertions(+), 14 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fentry.c
 create mode 100644 tools/testing/selftests/bpf/progs/tailcall_bpf2bpf_fexit.c


base-commit: 9930e4af4b509bcf6f060b09b16884f26102d110

Comments

Ilya Leoshkevich Sept. 4, 2023, 1:10 p.m. UTC | #1
On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote:
> This patch series fixes a tailcall infinite loop on x64.
> 
> From commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and
> tailcall
> handling in JIT"), the tailcall on x64 works better than before.
> 
> From commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF
> subprograms
> for x64 JIT"), tailcall is able to run in BPF subprograms on x64.
> 
> From commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF
> program
> to other BPF programs"), BPF program is able to trace other BPF
> programs.
> 
> How about combining them all together?
> 
> 1. FENTRY/FEXIT on a BPF subprogram.
> 2. A tailcall runs in the BPF subprogram.
> 3. The tailcall calls the subprogram's caller.
> 
> As a result, a tailcall infinite loop comes up. And the loop would
> halt
> the machine.
> 
> As we know, in tail call context, the tail_call_cnt propagates by
> stack
> and rax register between BPF subprograms. So do in trampolines.
> 
> How did I discover the bug?
> 
> From commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth 256
> for
> subprogs with tailcalls"), the total stack size limits to around
> 8KiB.
> Then, I write some bpf progs to validate the stack consuming, that
> are
> tailcalls running in bpf2bpf and FENTRY/FEXIT tracing on bpf2bpf[1].
> 
> At that time, accidently, I made a tailcall loop. And then the loop
> halted
> my VM. Without the loop, the bpf progs would consume over 8KiB stack
> size.
> But the _stack-overflow_ did not halt my VM.
> 
> With bpf_printk(), I confirmed that the tailcall count limit did not
> work
> expectedly. Next, read the code and fix it.
> 
> Finally, unfortunately, I only fix it on x64 but other arches. As a
> result, CI tests failed because this bug hasn't been fixed on s390x.
> 
> Some helps on s390x are requested.

I will take a look, thanks for letting me know.
I noticed there was something peculiar in this area when implementing
the trampoline:

	 * Note 1: The callee can increment the tail call counter, but
	 * we do not load it back, since the x86 JIT does not do this
	 * either.

but I thought that this was intentional.


[...]
Leon Hwang Sept. 4, 2023, 3:15 p.m. UTC | #2
On 2023/9/4 21:10, Ilya Leoshkevich wrote:
> On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote:
> [...]
> 
> I will take a look, thanks for letting me know.

Great.

> I noticed there was something peculiar in this area when implementing
> the trampoline:
> 
> 	 * Note 1: The callee can increment the tail call counter, but
> 	 * we do not load it back, since the x86 JIT does not do this
> 	 * either.>
> but I thought that this was intentional.

I do think so.

But wait, should we load it back?

Let me show a demo:

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 4);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

static __noinline
int subprog_tail_01(struct __sk_buff *skb)
{
	if (load_byte(skb, 0))
		bpf_tail_call_static(skb, &jmp_table, 1);
	else
		bpf_tail_call_static(skb, &jmp_table, 0);
	return 1;
}

static __noinline
int subprog_tail_23(struct __sk_buff *skb)
{
	if (load_byte(skb, 0))
		bpf_tail_call_static(skb, &jmp_table, 3);
	else
		bpf_tail_call_static(skb, &jmp_table, 2);
	return 1;
}

int count0 = 0;

SEC("tc")
int classifier_01(struct __sk_buff *skb)
{
	count0++;
	return subprog_tail_01(skb);
}

int count1 = 0;

SEC("tc")
int classifier_23(struct __sk_buff *skb)
{
	count1++;
	return subprog_tail_23(skb);
}

static __noinline
int subprog_tailcall(struct __sk_buff *skb, int index)
{
	volatile int retval = 0;
	bpf_tail_call(skb, &jmp_table, index);
	return retval;
}

SEC("tc")
int entry(struct __sk_buff *skb)
{
	subprog_tailcall(skb, 0);
	subprog_tailcall(skb, 2);

	return 0;
}

Finally, count0 is 33, and count1 is 33, too. This breaks the
MAX_TAIL_CALL_CNT limit via tailcalls in subprogs.

From 04fd61ab36ec065e ("bpf: allow bpf programs to tail-call other bpf
programs"):
The chain of tail calls can form unpredictable dynamic loops therefore
tail_call_cnt is used to limit the number of calls and currently is set
to 32.

It seems that MAX_TAIL_CALL_CNT limits the maximum number of tailcall
hierarchy layers.

So, should we load it back?

Thanks,
Leon
Ilya Leoshkevich Sept. 6, 2023, 12:57 a.m. UTC | #3
On Mon, 2023-09-04 at 23:15 +0800, Leon Hwang wrote:
> 
> 
> On 2023/9/4 21:10, Ilya Leoshkevich wrote:
> > On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote:
> > > [...]
> > 
> > I will take a look, thanks for letting me know.
> 
> Great.
> 
> > I noticed there was something peculiar in this area when
> > implementing
> > the trampoline:
> > 
> >          * Note 1: The callee can increment the tail call counter,
> > but
> >          * we do not load it back, since the x86 JIT does not do
> > this
> >          * either.>
> > but I thought that this was intentional.
> 
> I do think so.
> 
> But wait, should we load it back?
> 
> Let me show a demo:
> 
> [...]
> 
> Finally, count0 is 33. And count1 is 33, too. It breaks the
> MAX_TAIL_CALL_CNT limit by the way tailcall in subprog.

Thanks for coming up with an example; I could not create something like
this back then and just left a note for the future.

> From 04fd61ab36ec065e ("bpf: allow bpf programs to tail-call other
> bpf
> programs"):
> The chain of tail calls can form unpredictable dynamic loops
> therefore
> tail_call_cnt is used to limit the number of calls and currently is
> set
> to 32.
> 
> It seems like that MAX_TAIL_CALL_CNT limits the max tailcall
> hierarchy
> layers.
> 
> So, should we load it back?

I wonder if the current behavior can lead to high system load in some
cases. The current limit on the number of instructions is 1M; with tail
calls it goes up to MAX_TAIL_CALL_CNT * 1M. If one uses the technique
above to the fullest extent, i.e., on each tail call hierarchy level,
can't one execute (2 ** MAX_TAIL_CALL_CNT) * 1M instructions as a
result?

> Thanks,
> Leon
Leon Hwang Sept. 6, 2023, 2:39 a.m. UTC | #4
On 6/9/23 08:57, Ilya Leoshkevich wrote:
> On Mon, 2023-09-04 at 23:15 +0800, Leon Hwang wrote:
>>
>>
>> On 2023/9/4 21:10, Ilya Leoshkevich wrote:
>>> On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote:
>>>> [...]
>>>
>>> I will take a look, thanks for letting me know.
>>
>> Great.
>>
>>> I noticed there was something peculiar in this area when
>>> implementing
>>> the trampoline:
>>>
>>>          * Note 1: The callee can increment the tail call counter,
>>> but
>>>          * we do not load it back, since the x86 JIT does not do
>>> this
>>>          * either.>
>>> but I thought that this was intentional.
>>
>> I do think so.
>>
>> But wait, should we load it back?
>>
>> Let me show a demo:
>>
>> [...]
>>
>> Finally, count0 is 33. And count1 is 33, too. It breaks the
>> MAX_TAIL_CALL_CNT limit by the way tailcall in subprog.
> 
> Thanks for coming up with an example, I could not create something like
> this back then and just left a note for the future.
> 
>> From 04fd61ab36ec065e ("bpf: allow bpf programs to tail-call other
>> bpf
>> programs"):
>> The chain of tail calls can form unpredictable dynamic loops
>> therefore
>> tail_call_cnt is used to limit the number of calls and currently is
>> set
>> to 32.
>>
>> It seems like that MAX_TAIL_CALL_CNT limits the max tailcall
>> hierarchy
>> layers.
>>
>> So, should we load it back?
> 
> I wonder if the current behavior can lead to high system load in some
> cases. The current limit on the number of instructions is 1M; with tail
> calls it goes up to MAX_TAIL_CALL_CNT * 1M. If one uses the technique
> above to the fullest extent, i.e., on each tail call hierarchy level,
> can't one execute (2 ** MAX_TAIL_CALL_CNT) * 1M instructions as a
> result?

Executing (2 ** MAX_TAIL_CALL_CNT) * 1M instructions is really harmful.

So, we should load it back.

I'll try to fix it with another patchset.

Thanks,
Leon