Message ID | 20210505162307.2545061-1-revest@chromium.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | [bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 5 maintainers not CCed: netdev@vger.kernel.org yhs@fb.com kafai@fb.com john.fastabend@gmail.com songliubraving@fb.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 6 this patch: 6 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 8 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 6 this patch: 6 |
netdev/header_inline | success | Link |
On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
>
> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> per-cpu buffer that they use to store temporary data (arguments to
> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> at the end of their scope with bpf_bprintf_cleanup.
>
> If one of these helpers gets called within the scope of one of these
> helpers, for example: a first bpf program gets called, uses

Can we afford having a few struct bpf_printf_bufs? They are just 512
bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
only situation where this can occur, right? If someone is doing
bpf_snprintf() and an interrupt occurs and we run another BPF program,
it will be impossible to do bpf_snprintf() or bpf_trace_printk() from
the second BPF program, etc. We can't eliminate the probability, but
having a small stack of buffers would make the probability so
minuscule as to not worry about it at all.

The good thing is that try_get_fmt_tmp_buf() abstracts all the details,
so the changes are minimal. The nestedness property is preserved for
non-sleepable BPF programs, right? If we want this to work for
sleepable programs we'd need to either: 1) disable migration or
2) instead of assuming a stack of buffers, do a loop to find an unused
one. Should be acceptable performance-wise, as it's not the fastest
code anyway (printf'ing in general).

In any case, re-using the same buffer for the sort-of-optional-to-work
bpf_trace_printk() and the probably-important-to-work bpf_snprintf() is
suboptimal, so it seems worth fixing this.

Thoughts?

> bpf_trace_printk, which calls raw_spin_lock_irqsave, which is traced by
> another bpf program that calls bpf_trace_printk again, then the second
> "get" fails. Essentially, these helpers are not re-entrant, and that's
> not too bad because they simply return -EBUSY and recover gracefully.
>
> However, when this happens, the code hits a WARN_ON_ONCE. The guidelines
> in include/asm-generic/bug.h say "Do not use these macros [...] on
> transient conditions like ENOMEM or EAGAIN."
>
> This condition qualifies as transient: for example, the next
> raw_spin_lock_irqsave probe is likely to succeed, so it does not deserve
> a WARN_ON_ONCE.
>
> The guidelines also say "Do not use these macros when checking for
> invalid external inputs (e.g. invalid system call arguments)" and, in a
> way, this can be seen as an invalid input because syzkaller triggered
> it.
>
> Signed-off-by: Florent Revest <revest@chromium.org>
> Reported-by: syzbot <syzbot@syzkaller.appspotmail.com>
> Fixes: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
> ---
>  kernel/bpf/helpers.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 544773970dbc..007fa26eb3f5 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -709,7 +709,7 @@ static int try_get_fmt_tmp_buf(char **tmp_buf)
>
>  	preempt_disable();
>  	used = this_cpu_inc_return(bpf_printf_buf_used);
> -	if (WARN_ON_ONCE(used > 1)) {
> +	if (used > 1) {
>  		this_cpu_dec(bpf_printf_buf_used);
>  		preempt_enable();
>  		return -EBUSY;
> --
> 2.31.1.527.g47e6f16901-goog
>
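[Editor's note: for illustration, the buffer-stack idea described above could look roughly like the sketch below, modeled on the existing try_get_fmt_tmp_buf()/bpf_bprintf_cleanup() pair. The MAX_BPRINTF_NEST_LEVEL name, the 3-buffer sizing, and the cleanup logic are assumptions, not code from this series.]

```c
#include <linux/percpu.h>
#include <linux/preempt.h>

#define MAX_PRINTF_BUF_LEN	512
#define MAX_BPRINTF_NEST_LEVEL	3	/* assumed: "3-5 of them" */

struct bpf_printf_bufs {
	char bufs[MAX_BPRINTF_NEST_LEVEL][MAX_PRINTF_BUF_LEN];
};
static DEFINE_PER_CPU(struct bpf_printf_bufs, bpf_printf_bufs);
static DEFINE_PER_CPU(int, bpf_printf_nest_level);

static int try_get_fmt_tmp_buf(char **tmp_buf)
{
	struct bpf_printf_bufs *bufs;
	int nest_level;

	preempt_disable();
	nest_level = this_cpu_inc_return(bpf_printf_nest_level);
	if (nest_level > MAX_BPRINTF_NEST_LEVEL) {
		/* nested deeper than we have buffers for: back off */
		this_cpu_dec(bpf_printf_nest_level);
		preempt_enable();
		return -EBUSY;
	}
	bufs = this_cpu_ptr(&bpf_printf_bufs);
	/* each nesting level owns a distinct buffer on this CPU */
	*tmp_buf = bufs->bufs[nest_level - 1];
	return 0;
}

static void bpf_bprintf_cleanup(void)
{
	if (this_cpu_read(bpf_printf_nest_level)) {
		this_cpu_dec(bpf_printf_nest_level);
		preempt_enable();
	}
}
```

Because callers on a given CPU strictly nest while preemption is disabled, releases happen in LIFO order and a simple per-CPU counter is enough to hand each level its own buffer.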
On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
[...]
> In any case, re-using the same buffer for the sort-of-optional-to-work
> bpf_trace_printk() and the probably-important-to-work bpf_snprintf() is
> suboptimal, so it seems worth fixing this.
>
> Thoughts?

Yes, agree, it would otherwise be really hard to debug. I had the same
thought on why not allow nesting here, given that users very likely
expect these helpers to just work in all contexts.

Thanks,
Daniel
On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
[...]
> > The good thing is that try_get_fmt_tmp_buf() abstracts all the details,
> > so the changes are minimal. The nestedness property is preserved for
> > non-sleepable BPF programs, right? If we want this to work for
> > sleepable programs we'd need to either: 1) disable migration or
> > 2) instead of assuming a stack of buffers, do a loop to find an unused
> > one.

oh wait, we already disable migration for sleepable BPF progs, so it
should be good to do the nestedness level only

[...]
On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
[...]
> oh wait, we already disable migration for sleepable BPF progs, so it
> should be good to do the nestedness level only

actually, migrate_disable() might not be enough. Unless it is
impossible for some reason I miss, in the worst case two sleepable
programs (A and B) could be intermixed on the same CPU: A
starts&sleeps - B starts&sleeps - A continues&returns - B continues,
and nestedness doesn't work anymore. So something like "reserving a
slot" would work better.

[...]
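[Editor's note: a hedged sketch of what "reserving a slot" could mean: loop over a small per-CPU pool and claim a buffer with an atomic flag, so two interleaved sleepable programs each keep their own slot. All names, the pool size, and the cleanup signature are hypothetical, not code from this thread.]

```c
#include <linux/atomic.h>
#include <linux/percpu.h>
#include <linux/preempt.h>

#define MAX_BPRINTF_BUFS	3	/* assumed pool size */

struct bpf_printf_buf {
	char buf[512];	/* must stay first: cleanup casts back to the slot */
	atomic_t used;
};
static DEFINE_PER_CPU(struct bpf_printf_buf, bpf_printf_bufs[MAX_BPRINTF_BUFS]);

static int try_get_fmt_tmp_buf(char **tmp_buf)
{
	int i;

	/* Pin the task to this CPU; unlike preempt_disable(), this is
	 * allowed around code that may sleep (sleepable BPF programs).
	 */
	migrate_disable();
	for (i = 0; i < MAX_BPRINTF_BUFS; i++) {
		struct bpf_printf_buf *b = this_cpu_ptr(&bpf_printf_bufs[i]);

		/* atomically claim the first free slot; a sleeping user
		 * keeps its own slot marked used, so interleaving is fine
		 */
		if (atomic_cmpxchg(&b->used, 0, 1) == 0) {
			*tmp_buf = b->buf;
			return 0;
		}
	}
	migrate_enable();
	return -EBUSY;
}

static void bpf_bprintf_cleanup(char *buf)
{
	/* buf is the first member, so the cast recovers the slot */
	struct bpf_printf_buf *b = (struct bpf_printf_buf *)buf;

	atomic_set(&b->used, 0);
	migrate_enable();
}
```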
On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
[...]
> actually, migrate_disable() might not be enough. Unless it is
> impossible for some reason I miss, in the worst case two sleepable
> programs (A and B) could be intermixed on the same CPU: A
> starts&sleeps - B starts&sleeps - A continues&returns - B continues,
> and nestedness doesn't work anymore. So something like "reserving a
> slot" would work better.

Iiuc, try_get_fmt_tmp_buf does preempt_disable to avoid that situation?

[...]

What would you think of just letting the helpers own these 512-byte
buffers as local variables on their stacks? Then bpf_bprintf_prepare
would only need to write there, there would be no acquire semantics
(like try_get_fmt_tmp_buf) and the stack frame would just be freed on
helper return, so there would be no bpf_bprintf_cleanup either. We
would also not pre-reserve static memory for all CPUs, and it would
become trivial to handle re-entrant helper calls.

I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
code but I've not been convinced of its necessity.
On Wed, May 5, 2021 at 3:29 PM Florent Revest <revest@chromium.org> wrote:
> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
[...]
> What would you think of just letting the helpers own these 512-byte
> buffers as local variables on their stacks? Then bpf_bprintf_prepare
> would only need to write there, there would be no acquire semantics
> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
> helper return, so there would be no bpf_bprintf_cleanup either. We
> would also not pre-reserve static memory for all CPUs, and it would
> become trivial to handle re-entrant helper calls.
>
> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
> code but I've not been convinced of its necessity.

I got the impression that an extra 512 bytes on the kernel stack is
quite a lot and that's why we have per-cpu buffers. Especially since
bpf_trace_printk() can be called from any context, including NMI.
On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> On Wed, May 5, 2021 at 3:29 PM Florent Revest <revest@chromium.org> wrote:
[...]
> I got the impression that an extra 512 bytes on the kernel stack is
> quite a lot and that's why we have per-cpu buffers. Especially since
> bpf_trace_printk() can be called from any context, including NMI.

Ok, I understand.

What about having one buffer per helper, synchronized with a spinlock?
Actually, bpf_trace_printk already has that, not for the bprintf
arguments but for the bprintf output, so this wouldn't change much for
the performance of the helpers anyway:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385

These helpers are not performance sensitive, so a per-cpu stack of
buffers feels over-engineered to me (and is also complexity I feel a
bit uncomfortable with).
On 5/6/21 10:17 PM, Florent Revest wrote:
> On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
[...]
> What about having one buffer per helper, synchronized with a spinlock?
> Actually, bpf_trace_printk already has that, not for the bprintf
> arguments but for the bprintf output, so this wouldn't change much for
> the performance of the helpers anyway:
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385
>
> These helpers are not performance sensitive, so a per-cpu stack of
> buffers feels over-engineered to me (and is also complexity I feel a
> bit uncomfortable with).

But wouldn't this have the same potential for causing a deadlock? A
simple example would be if you have a tracing prog attached to
bstr_printf(), and one of the other helpers using the same lock called
from a non-tracing prog. If it can be avoided fairly easily, I'd also
opt for per-cpu buffers as Andrii mentioned earlier. We've had a few
prior examples with similar issues [0].

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99
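[Editor's note: to make the hazard concrete, here is the call sequence being described, with a hypothetical shared_lock standing in for the proposed per-helper lock; this is an illustration, not code from the thread.]

```c
/*
 * bpf_snprintf()                              <- non-tracing prog
 *   raw_spin_lock_irqsave(&shared_lock, f);   <- lock taken
 *   bstr_printf(...)                          <- tracing prog attached here
 *     bpf_trace_printk()                      <- called by the tracing prog
 *       raw_spin_lock_irqsave(&shared_lock, f);
 *                 <- this CPU already holds shared_lock: self-deadlock
 */
```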
On Thu, May 6, 2021 at 11:38 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 5/6/21 10:17 PM, Florent Revest wrote:
[...]
> But wouldn't this have the same potential for causing a deadlock? A
> simple example would be if you have a tracing prog attached to
> bstr_printf(), and one of the other helpers using the same lock called
> from a non-tracing prog.

Ah, right, I see :/

> If it can be avoided fairly easily, I'd also opt for per-cpu buffers
> as Andrii mentioned earlier. We've had a few prior examples with
> similar issues [0].
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99

Ok, it's not as bad as I imagined, thank you Daniel :) I'll look into
it at the beginning of next week.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 544773970dbc..007fa26eb3f5 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -709,7 +709,7 @@ static int try_get_fmt_tmp_buf(char **tmp_buf)
 
 	preempt_disable();
 	used = this_cpu_inc_return(bpf_printf_buf_used);
-	if (WARN_ON_ONCE(used > 1)) {
+	if (used > 1) {
 		this_cpu_dec(bpf_printf_buf_used);
 		preempt_enable();
 		return -EBUSY;
The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
per-cpu buffer that they use to store temporary data (arguments to
bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
at the end of their scope with bpf_bprintf_cleanup.

If one of these helpers gets called within the scope of one of these
helpers, for example: a first bpf program gets called, uses
bpf_trace_printk, which calls raw_spin_lock_irqsave, which is traced by
another bpf program that calls bpf_trace_printk again, then the second
"get" fails. Essentially, these helpers are not re-entrant, and that's
not too bad because they simply return -EBUSY and recover gracefully.

However, when this happens, the code hits a WARN_ON_ONCE. The guidelines
in include/asm-generic/bug.h say "Do not use these macros [...] on
transient conditions like ENOMEM or EAGAIN."

This condition qualifies as transient: for example, the next
raw_spin_lock_irqsave probe is likely to succeed, so it does not deserve
a WARN_ON_ONCE.

The guidelines also say "Do not use these macros when checking for
invalid external inputs (e.g. invalid system call arguments)" and, in a
way, this can be seen as an invalid input because syzkaller triggered
it.

Signed-off-by: Florent Revest <revest@chromium.org>
Reported-by: syzbot <syzbot@syzkaller.appspotmail.com>
Fixes: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
---
 kernel/bpf/helpers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)