Message ID: 20230903151448.61696-1-hffilwlqm@gmail.com (mailing list archive)
Series:     bpf, x64: Fix tailcall infinite loop
On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote:
> This patch series fixes a tailcall infinite loop on x64.
>
> Since commit ebf7d1f508a73871 ("bpf, x64: rework pro/epilogue and
> tailcall handling in JIT"), tailcalls on x64 work better than before.
>
> Since commit e411901c0b775a3a ("bpf: allow for tailcalls in BPF
> subprograms for x64 JIT"), tailcalls are able to run in BPF
> subprograms on x64.
>
> Since commit 5b92a28aae4dd0f8 ("bpf: Support attaching tracing BPF
> program to other BPF programs"), a BPF program is able to trace other
> BPF programs.
>
> How about combining them all together?
>
> 1. FENTRY/FEXIT on a BPF subprogram.
> 2. A tailcall runs in the BPF subprogram.
> 3. The tailcall calls the subprogram's caller.
>
> As a result, a tailcall infinite loop comes up, and the loop halts
> the machine.
>
> As we know, in tail call context, tail_call_cnt propagates via the
> stack and the rax register between BPF subprograms. The same is done
> in trampolines.
>
> How did I discover the bug?
>
> Since commit 7f6e4312e15a5c37 ("bpf: Limit caller's stack depth 256
> for subprogs with tailcalls"), the total stack size is limited to
> around 8KiB. So I wrote some bpf progs to validate the stack
> consumption: tailcalls running in bpf2bpf, with FENTRY/FEXIT tracing
> on bpf2bpf[1].
>
> At that time, I accidentally made a tailcall loop, and the loop
> halted my VM. Without the loop, the bpf progs would consume over 8KiB
> of stack; but that _stack overflow_ did not halt my VM.
>
> With bpf_printk(), I confirmed that the tailcall count limit did not
> work as expected. So I read the code and fixed it.
>
> Unfortunately, I have only fixed it on x64 so far, not on the other
> arches. As a result, CI tests failed, because this bug hasn't been
> fixed on s390x.
>
> Some help on s390x is requested.

I will take a look, thanks for letting me know.

I noticed there was something peculiar in this area when implementing
the trampoline:

 * Note 1: The callee can increment the tail call counter, but
 * we do not load it back, since the x86 JIT does not do this
 * either.

but I thought that this was intentional.

[...]
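For concreteness, the three-step scenario above corresponds to a
program shape roughly like the following. This is an illustrative,
untested sketch: the names entry/subprog and the single-slot jmp_table
are made up for this note, not taken from the actual selftests.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

/* Steps 2 and 3: the subprogram tail-calls slot 0, which userspace is
 * assumed to have populated with entry(), i.e. the subprogram's own
 * caller.
 */
static __noinline int subprog(struct __sk_buff *skb)
{
	bpf_tail_call_static(skb, &jmp_table, 0);
	return 0;
}

SEC("tc")
int entry(struct __sk_buff *skb)
{
	return subprog(skb);	/* bpf2bpf call */
}

char _license[] SEC("license") = "GPL";

Without tracing attached, tail_call_cnt stops the entry -> subprog ->
entry cycle once it reaches the limit. Step 1, attaching FENTRY/FEXIT
to subprog(), is what breaks the counter propagation and turns the
cycle into an infinite loop.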
On 2023/9/4 21:10, Ilya Leoshkevich wrote:
> On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote:
>> This patch series fixes a tailcall infinite loop on x64.
>>
>> [...]
>>
>> Some help on s390x is requested.
>
> I will take a look, thanks for letting me know.

Great.

> I noticed there was something peculiar in this area when implementing
> the trampoline:
>
>  * Note 1: The callee can increment the tail call counter, but
>  * we do not load it back, since the x86 JIT does not do this
>  * either.
>
> but I thought that this was intentional.

I do think so.

But wait, should we load it back?

Let me show a demo:

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 4);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

static __noinline
int subprog_tail_01(struct __sk_buff *skb)
{
	if (load_byte(skb, 0))
		bpf_tail_call_static(skb, &jmp_table, 1);
	else
		bpf_tail_call_static(skb, &jmp_table, 0);
	return 1;
}

static __noinline
int subprog_tail_23(struct __sk_buff *skb)
{
	if (load_byte(skb, 0))
		bpf_tail_call_static(skb, &jmp_table, 3);
	else
		bpf_tail_call_static(skb, &jmp_table, 2);
	return 1;
}

int count0 = 0;

SEC("tc")
int classifier_01(struct __sk_buff *skb)
{
	count0++;
	return subprog_tail_01(skb);
}

int count1 = 0;

SEC("tc")
int classifier_23(struct __sk_buff *skb)
{
	count1++;
	return subprog_tail_23(skb);
}

static __noinline
int subprog_tailcall(struct __sk_buff *skb, int index)
{
	volatile int retval = 0;

	bpf_tail_call(skb, &jmp_table, index);
	return retval;
}

SEC("tc")
int entry(struct __sk_buff *skb)
{
	subprog_tailcall(skb, 0);
	subprog_tailcall(skb, 2);

	return 0;
}

Finally, count0 is 33, and count1 is 33 too. Tailcalls in subprogs
break the MAX_TAIL_CALL_CNT limit this way.
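To reproduce this, a minimal userspace loader along the following lines
should do. Note this is an untested sketch; the object file name
"tailcall_demo.bpf.o" and the use of bpf_prog_test_run_opts to drive
entry() are assumptions, not part of the original demo.

#include <stdio.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

int main(void)
{
	static char pkt[64];	/* zeroed packet: load_byte() returns 0 */
	LIBBPF_OPTS(bpf_test_run_opts, topts,
		.data_in = pkt,
		.data_size_in = sizeof(pkt),
		.repeat = 1,
	);
	struct bpf_object *obj;
	int map_fd, prog_fd, key, val;

	obj = bpf_object__open_file("tailcall_demo.bpf.o", NULL);
	if (!obj || bpf_object__load(obj))
		return 1;

	map_fd = bpf_map__fd(bpf_object__find_map_by_name(obj, "jmp_table"));

	/* Slots 0/1 run classifier_01 and slots 2/3 run classifier_23,
	 * so each classifier keeps tail-calling itself via its subprog.
	 */
	val = bpf_program__fd(bpf_object__find_program_by_name(obj, "classifier_01"));
	for (key = 0; key < 2; key++)
		bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
	val = bpf_program__fd(bpf_object__find_program_by_name(obj, "classifier_23"));
	for (key = 2; key < 4; key++)
		bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);

	/* Run entry() once; reading count0/count1 back from the
	 * object's global-data map (e.g. via bpftool) shows 33/33.
	 */
	prog_fd = bpf_program__fd(bpf_object__find_program_by_name(obj, "entry"));
	bpf_prog_test_run_opts(prog_fd, &topts);

	bpf_object__close(obj);
	return 0;
}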
From commit 04fd61ab36ec065e ("bpf: allow bpf programs to tail-call
other bpf programs"):

    The chain of tail calls can form unpredictable dynamic loops
    therefore tail_call_cnt is used to limit the number of calls and
    currently is set to 32.

It seems that MAX_TAIL_CALL_CNT limits the maximum number of tailcall
hierarchy layers.

So, should we load it back?

Thanks,
Leon
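To make the 33 above concrete, here is a tiny userspace model of the
counter check (my sketch of the logic in the interpreter's
JMP_TAIL_CALL handling in kernel/bpf/core.c, not actual kernel code;
MAX_TAIL_CALL_CNT is 33 on current kernels):

#include <stdio.h>

#define MAX_TAIL_CALL_CNT 33	/* 32 when commit 04fd61ab36ec065e was written */

static int count;	/* plays the role of count0 in the demo */

/* One chain of tail calls sharing a single counter: each program body
 * runs, then either fails the next tail call (once the counter has
 * reached the limit) or increments the counter and "jumps" on.
 */
static void classifier(int *tail_call_cnt)
{
	count++;
	if (*tail_call_cnt >= MAX_TAIL_CALL_CNT)
		return;			/* bpf_tail_call() falls through */
	(*tail_call_cnt)++;
	classifier(tail_call_cnt);	/* "jump" to the next program */
}

int main(void)
{
	int tcc = 0;

	/* entry()'s first tail call into the chain: */
	if (tcc < MAX_TAIL_CALL_CNT) {
		tcc++;
		classifier(&tcc);
	}
	printf("count = %d\n", count);	/* prints 33 */
	return 0;
}

Because the caller never loads the callee's incremented counter back,
entry()'s second subprog_tailcall() starts again from a stale counter
of 0, which is how the demo reaches 33 + 33 successful tail calls in
total instead of 33.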
On Mon, 2023-09-04 at 23:15 +0800, Leon Hwang wrote:
> On 2023/9/4 21:10, Ilya Leoshkevich wrote:
> > On Sun, 2023-09-03 at 23:14 +0800, Leon Hwang wrote:
> > > This patch series fixes a tailcall infinite loop on x64.
> > >
> > > [...]
> > >
> > > Some help on s390x is requested.
> >
> > I will take a look, thanks for letting me know.
>
> Great.
>
> > I noticed there was something peculiar in this area when
> > implementing the trampoline:
> >
> >  * Note 1: The callee can increment the tail call counter, but
> >  * we do not load it back, since the x86 JIT does not do this
> >  * either.
> >
> > but I thought that this was intentional.
>
> I do think so.
>
> But wait, should we load it back?
>
> Let me show a demo:
>
> [...]
>
> Finally, count0 is 33, and count1 is 33 too. Tailcalls in subprogs
> break the MAX_TAIL_CALL_CNT limit this way.

Thanks for coming up with an example; I could not create something like
this back then and just left a note for the future.

> From commit 04fd61ab36ec065e ("bpf: allow bpf programs to tail-call
> other bpf programs"):
>
>     The chain of tail calls can form unpredictable dynamic loops
>     therefore tail_call_cnt is used to limit the number of calls and
>     currently is set to 32.
>
> It seems that MAX_TAIL_CALL_CNT limits the maximum number of tailcall
> hierarchy layers.
>
> So, should we load it back?

I wonder if the current behavior can lead to high system load in some
cases. The current limit on the number of instructions is 1M; with tail
calls it goes up to MAX_TAIL_CALL_CNT * 1M. If one uses the technique
above to the fullest extent, i.e., on each tail call hierarchy level,
can't one execute (2 ** MAX_TAIL_CALL_CNT) * 1M instructions as a
result?
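Taking that estimate at face value, a back-of-the-envelope sketch (my
arithmetic only, assuming MAX_TAIL_CALL_CNT = 33 and the verifier's 1M
instruction limit):

#include <stdio.h>

int main(void)
{
	/* If every tail call hierarchy level can restart the counter
	 * for two fresh chains, the per-program instruction bound is
	 * amplified by up to 2^MAX_TAIL_CALL_CNT.
	 */
	const double insn_limit = 1e6;
	double worst = insn_limit;
	int level;

	for (level = 0; level < 33; level++)	/* MAX_TAIL_CALL_CNT */
		worst *= 2;			/* two fresh chains per level */

	printf("worst case: %.3g insns\n", worst);	/* ~8.6e15 */
	printf("at 1e9 insn/s: ~%.0f days\n",
	       worst / 1e9 / 86400);			/* ~99 days */
	return 0;
}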
On 6/9/23 08:57, Ilya Leoshkevich wrote:
> On Mon, 2023-09-04 at 23:15 +0800, Leon Hwang wrote:
>> On 2023/9/4 21:10, Ilya Leoshkevich wrote:
>>> [...]
>>>
>>> I noticed there was something peculiar in this area when
>>> implementing the trampoline:
>>>
>>>  * Note 1: The callee can increment the tail call counter, but
>>>  * we do not load it back, since the x86 JIT does not do this
>>>  * either.
>>>
>>> but I thought that this was intentional.
>>
>> I do think so.
>>
>> But wait, should we load it back?
>>
>> Let me show a demo:
>>
>> [...]
>>
>> Finally, count0 is 33, and count1 is 33 too. Tailcalls in subprogs
>> break the MAX_TAIL_CALL_CNT limit this way.
>
> Thanks for coming up with an example; I could not create something
> like this back then and just left a note for the future.
>
>> It seems that MAX_TAIL_CALL_CNT limits the maximum number of
>> tailcall hierarchy layers.
>>
>> So, should we load it back?
>
> I wonder if the current behavior can lead to high system load in some
> cases. The current limit on the number of instructions is 1M; with
> tail calls it goes up to MAX_TAIL_CALL_CNT * 1M. If one uses the
> technique above to the fullest extent, i.e., on each tail call
> hierarchy level, can't one execute (2 ** MAX_TAIL_CALL_CNT) * 1M
> instructions as a result?

Executing (2 ** MAX_TAIL_CALL_CNT) * 1M instructions would be really
harmful, so we should load it back. I'll try to fix it with another
patchset.

Thanks,
Leon