Message ID | 20210421190736.1538217-1-linux@rasmusvillemoes.dk |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | bpf: remove pointless code from bpf_do_trace_printk() |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On Wed, Apr 21, 2021 at 6:19 PM Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
>
> The comment is wrong. snprintf(buf, 16, "") and snprintf(buf, 16,
> "%s", "") etc. will certainly put '\0' in buf[0]. The only case where
> snprintf() does not guarantee a nul-terminated string is when it is
> given a buffer size of 0 (which of course prevents it from writing
> anything at all to the buffer).
>
> Remove it before it gets cargo-culted elsewhere.
>
> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> ---
>  kernel/trace/bpf_trace.c | 3 ---
>  1 file changed, 3 deletions(-)
>

The change looks good to me, but please rebase it on top of the
bpf-next tree. This is not a bug, so it doesn't have to go into the
bpf tree. As it is right now, it doesn't apply cleanly onto bpf-next.
On 22/04/2021 05.32, Andrii Nakryiko wrote:
> On Wed, Apr 21, 2021 at 6:19 PM Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
>> [...]
>
> The change looks good to me, but please rebase it on top of the
> bpf-next tree. This is not a bug, so it doesn't have to go into the
> bpf tree. As it is right now, it doesn't apply cleanly onto bpf-next.

Thanks for the pointer. Looking in next-20210420, it seems to me that

commit d9c9e4db186ab4d81f84e6f22b225d333b9424e3
Author: Florent Revest <revest@chromium.org>
Date:   Mon Apr 19 17:52:38 2021 +0200

    bpf: Factorize bpf_trace_printk and bpf_seq_printf

is buggy. In particular, these two snippets:

+#define BPF_CAST_FMT_ARG(arg_nb, args, mod) \
+       (mod[arg_nb] == BPF_PRINTF_LONG_LONG || \
+        (mod[arg_nb] == BPF_PRINTF_LONG && __BITS_PER_LONG == 64) \
+         ? (u64)args[arg_nb] \
+         : (u32)args[arg_nb])

+       ret = snprintf(buf, sizeof(buf), fmt, BPF_CAST_FMT_ARG(0, args, mod),
+               BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod));

Regardless of the casts done in that macro, the type of the resulting
expression is that resulting from C promotion rules. And
(foo ? (u64)bla : (u32)blib) has type u64, which is thus the type the
compiler uses when building the vararg list being passed into
snprintf(). C simply doesn't allow you to change types at run-time in
this way.
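Rasmus's point about the conditional operator can be checked in ordinary userspace C. In the minimal sketch below, `u32`/`u64` are local stand-ins for the kernel typedefs, and `IS_U64()`, `cond_expr_size()` and `cast_fmt_arg()` are hypothetical helpers that mirror the shape of `BPF_CAST_FMT_ARG`. The usual arithmetic conversions give both operands of `?:` one common type, fixed at compile time, so the expression is `u64` whichever branch is taken at run time; the `u32` cast only truncates the value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32; /* local stand-ins for the kernel's typedefs */
typedef uint64_t u64;

/* Evaluates to 1 if the (unevaluated) expression has type u64 (C11). */
#define IS_U64(expr) _Generic((expr), u64: 1, default: 0)

/* The common type of the two branches is decided at compile time. */
_Static_assert(sizeof(1 ? (u64)0 : (u32)0) == sizeof(u64),
	       "?: with a u64 and a u32 operand has type u64");

/* sizeof() does not evaluate its operand, so this reports the size of
 * the expression's static type: 8 bytes regardless of want_64bit. */
static size_t cond_expr_size(int want_64bit, u64 arg)
{
	return sizeof(want_64bit ? (u64)arg : (u32)arg);
}

/* Function-shaped analogue of BPF_CAST_FMT_ARG: the u32 branch
 * truncates the value, which is then widened back to u64, so a
 * variadic callee would still be handed an 8-byte argument. */
static u64 cast_fmt_arg(int want_64bit, u64 arg)
{
	return want_64bit ? (u64)arg : (u32)arg;
}
```

For example, `cast_fmt_arg(0, (u64)-8)` yields `0x00000000fffffff8`: the upper half is cleared, but the result still occupies a 64-bit slot in the vararg list.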
It probably works fine on x86-64, which passes the first six or so
arguments in registers: va_start() puts those registers into the
va_list opaque structure, and when it comes time to do a va_arg(int),
just the lower 32 bits are used. It is broken on i386 and other
architectures where arguments are passed on the stack (and for x86-64
as well had there been a few more arguments) and va_arg(ap, int) is
essentially ({ int res = *(int *)ap; ap += 4; res; }) [or maybe it's
-= 4 because stack direction etc., that's not really relevant here].

Rasmus
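The stack-based failure mode can be simulated without invoking undefined behaviour: lay the arguments out the way the caller pushes them (one 8-byte `u64` slot each, since that is the conditional's type) and read them back the way a 32-bit `va_arg(ap, int)` would on such an ABI, 4 bytes at a time. This is only a sketch: it assumes a little-endian target such as i386, ignores per-ABI alignment and stack-direction details, and `push_u64_slots()`/`read_u32_slot()` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Caller side: every argument occupies an 8-byte slot, because the
 * conditional expression always has type u64. */
static void push_u64_slots(unsigned char *stack, const uint64_t *args,
			   size_t n)
{
	memcpy(stack, args, n * sizeof(*args));
}

/* Callee side: roughly what va_arg(ap, int) does on a stack-based ABI,
 * i.e. read 4 bytes at the cursor and advance the cursor by 4. */
static uint32_t read_u32_slot(const unsigned char **ap)
{
	uint32_t v;

	memcpy(&v, *ap, sizeof(v));
	*ap += sizeof(v);
	return v;
}
```

On a little-endian machine, pushing -8, 9, 150 and reading three 32-bit values gives 0xfffffff8 (i.e. -8), then 0xffffffff (the upper half of (u64)-8), then 9: every value after the first is off by half a slot.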
On Thu, Apr 22, 2021 at 9:13 AM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> On 22/04/2021 05.32, Andrii Nakryiko wrote:
> > On Wed, Apr 21, 2021 at 6:19 PM Rasmus Villemoes
> > <linux@rasmusvillemoes.dk> wrote:
> >> [...]
> >
> > The change looks good to me, but please rebase it on top of the
> > bpf-next tree. This is not a bug, so it doesn't have to go into the
> > bpf tree. As it is right now, it doesn't apply cleanly onto bpf-next.

FWIW the idea of the patch also looks good to me :)

> Thanks for the pointer. Looking in next-20210420, it seems to me that
>
> commit d9c9e4db186ab4d81f84e6f22b225d333b9424e3
> Author: Florent Revest <revest@chromium.org>
> Date:   Mon Apr 19 17:52:38 2021 +0200
>
>     bpf: Factorize bpf_trace_printk and bpf_seq_printf
>
> is buggy. In particular, these two snippets:
> [...]
>
> Rasmus

Thank you Rasmus :)

It seems that we went offtrack in
https://lore.kernel.org/bpf/CAEf4BzZVEGM4esi-Rz67_xX_RTDrgxViy0gHfpeauECR5bmRNA@mail.gmail.com/
and we do need something like "88a5c690b6 bpf: fix bpf_trace_printk on
32 bit archs". Thinking about it again, it's clearer now why the
__BPF_TP_EMIT macro emits 2^3=8 different __trace_printk() indeed.

In the case of bpf_trace_printk with a maximum of 3 args, it's
relatively cheap; but for bpf_seq_printf and bpf_snprintf which accept
up to 12 arguments, that would be 2^12=4096 calls. Until now
bpf_seq_printf has just ignored this problem and just considered
everything as u64, I wonder if that'd be the best approach for these
two helpers anyway.
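The combinatorial cost follows from how the `__BPF_TP_EMIT` approach works: a variadic argument's type is fixed at compile time, so the only portable way to pass the right type is one separately typed call per possibility, chosen at run time. The one-argument sketch below shows that pattern; `enum fmt_mod` and `format_one()` are hypothetical names, loosely modelled on the `BPF_PRINTF_*` modifiers quoted above. Nesting one such switch per argument yields (choices per argument)^(number of arguments) call sites: 3^3 = 27 with three choices and three arguments, and already 2^12 = 4096 with only two choices and 12 arguments.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical length modifiers, loosely modelled on BPF_PRINTF_*. */
enum fmt_mod { MOD_INT, MOD_LONG, MOD_LONG_LONG };

/* Format a single raw 64-bit argument. Each case is a distinct call
 * whose variadic argument has the type the conversion expects, which a
 * single call with a ?: expression cannot achieve. Only unsigned
 * conversions (%u, %lu, %llx, ...) are assumed here; signed ones would
 * need the matching signed casts. */
static int format_one(char *buf, size_t size, const char *fmt,
		      enum fmt_mod mod, uint64_t raw)
{
	switch (mod) {
	case MOD_LONG_LONG:
		return snprintf(buf, size, fmt, (unsigned long long)raw);
	case MOD_LONG:
		return snprintf(buf, size, fmt, (unsigned long)raw);
	case MOD_INT:
	default:
		return snprintf(buf, size, fmt, (unsigned int)raw);
	}
}
```

For instance, formatting `0x1DABBAD00` with `"%x"` and `MOD_INT` prints `dabbad00` (the value is truncated to 32 bits, but the argument type matches the conversion), while `"%llx"` with `MOD_LONG_LONG` prints all nine hex digits.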
On 22/04/2021 11.23, Florent Revest wrote:
> On Thu, Apr 22, 2021 at 9:13 AM Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
>> [...]
>> Rasmus
>
> Thank you Rasmus :)

I think you were lucky (or unlucky, depending on how you look at it)
with your test case

+       num_ret = BPF_SNPRINTF(num_out, sizeof(num_out),
+                              "%d %u %x %li %llu %lX",
+                              -8, 9, 150, -424242, 1337, 0xDABBAD00);

because it just so happens that the eventual snprintf() call uses three
arguments for itself, so the first three 32-bit arguments end up being
passed via registers, while the 64 bit arguments are passed via the
stack. Can I get you to test what would happen if you interchanged
these, i.e. changed the test case to do

+       num_ret = BPF_SNPRINTF(num_out, sizeof(num_out),
+                              "%li %llu %lX %d %u %x",
+                              -424242, 1337, 0xDABBAD00, -8, 9, 150);

(or just add a few more expects-a-32-bit argument format specifiers and
corresponding arguments). My guess is that up until formatting -8 it
goes well, but when vsnprintf() is to grab the argument corresponding to
%u, it will get the 0xffffffff from the upper half of (u64)-8.

> It seems that we went offtrack in
> https://lore.kernel.org/bpf/CAEf4BzZVEGM4esi-Rz67_xX_RTDrgxViy0gHfpeauECR5bmRNA@mail.gmail.com/
> and we do need something like "88a5c690b6 bpf: fix bpf_trace_printk on
> 32 bit archs". Thinking about it again, it's clearer now why the
> __BPF_TP_EMIT macro emits 2^3=8 different __trace_printk() indeed.

Isn't it 3^3 = 27, or has that been reduced in -next compared to Linus'
master? Doesn't matter much, just curious.

> In the case of bpf_trace_printk with a maximum of 3 args, it's
> relatively cheap; but for bpf_seq_printf and bpf_snprintf which accept
> up to 12 arguments, that would be 2^12=4096 calls.

Yeah, that doesn't scale at all.

> Until now
> bpf_seq_printf has just ignored this problem and just considered
> everything as u64, I wonder if that'd be the best approach for these
> two helpers anyway.

[wild handwaving ahead]

One possibility, if one is willing to get hands dirty and dig into ABI
details on various arches, is to create a

    struct fake_va_list {
        union {
            va_list ap;       /* opaque, compiler-provided */
            arch_va_list _ap; /* arch-provided, must match layout of ap */
        };
        void *stack;
    };

Then do

    struct fake_va_list fva;
    u64 buf[24]; /* or whatever you want to support, can be different in different functions */

    fake_va_init(&fva, buf);
    /* various C code, parsing format string etc. */
    if (arg[i] is really 32 bits)
        fake_va_push(&fva, (u32)arg[i]);
    else
        fake_va_push(&fva, (u64)arg[i]);
    /* etc. */
    ...
    vsnprintf(out, size, fmt, fva.ap);

On arches like x86-64, where va_list is really a typedef for a
one-element array of

    struct __va_list_tag {
        unsigned int gp_offset;
        unsigned int fp_offset;
        void *overflow_arg_area;
        void *reg_save_area;
    };

fake_va_init() would make the va_list look like the reg_save_area is
already used (i.e., set gp_offset to 48), and initialize both
->_ap.overflow_arg_area and ->stack to point at the given buffer.
fake_va_push() would use and update stack appropriately. For 32 bit x86,
va_list is really just a pointer, so fake_va_init would essentially just
do "fva->_ap = fva->stack = buf", and fake_va_push() would again just
need to manipulate ->stack.

It's not pretty, but I don't think it necessarily requires too much
arch-specific work (fake_va_push() could be common, perhaps just with an
arch define to say whether 64 bit arguments need ->stack to first be
up-aligned to an 8 byte boundary).

Rasmus
On Thu, Apr 22, 2021 at 12:09 PM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> On 22/04/2021 11.23, Florent Revest wrote:
> > [...]
> > Thank you Rasmus :)
>
> I think you were lucky (or unlucky, depending on how you look at it)
> with your test case
> [...]
> Can I get you to test what would happen if you interchanged
> these, i.e. changed the test case to do
>
> +       num_ret = BPF_SNPRINTF(num_out, sizeof(num_out),
> +                              "%li %llu %lX %d %u %x",
> +                              -424242, 1337, 0xDABBAD00, -8, 9, 150);
>
> (or just add a few more expects-a-32-bit argument format specifiers and
> corresponding arguments). My guess is that up until formatting -8 it
> goes well, but when vsnprintf() is to grab the argument corresponding to
> %u, it will get the 0xffffffff from the upper half of (u64)-8.

I will need to come up with a repro and let you know yes :)

> [...]
> [wild handwaving ahead]
>
> One possibility, if one is willing to get hands dirty and dig into ABI
> details on various arches, is to create a
> [...]
>
> Rasmus

Creative! :D I think these arch-specific structures would be a hard
sell though ahah.

I was having a stroll through lib/vsprintf.c and noticed bstr_printf:

 * This function like C99 vsnprintf, but the difference is that vsnprintf gets
 * arguments from stack, and bstr_printf gets arguments from @bin_buf which is
 * a binary buffer that generated by vbin_printf.

Maybe it would be easier to just build our argument buffer similarly
to what vbin_printf does.
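The vbin_printf()/bstr_printf() idea sidesteps the calling convention entirely: the producer stores each argument in a plain buffer at the width its conversion specifier implies, and the consumer reads it back at that same width, so no run-time type choice ever has to be expressed as a C vararg. The userspace sketch below illustrates only that principle. `struct arg_buf`, the `arg_push_*()`/`arg_pop()` helpers and `format_packed()` are hypothetical; they do not reproduce the kernel's actual binary-buffer layout, and the formatter handles just `%d %u %x` and `%lld %llu %llx`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical packed argument buffer: each value is stored at the
 * width its conversion expects, aligned to its own size. */
struct arg_buf {
	unsigned char data[96];
	size_t len; /* bytes written by the producer */
	size_t pos; /* read cursor used by the consumer */
};

static size_t align_up(size_t n, size_t a) /* a: power of two */
{
	return (n + a - 1) & ~(a - 1);
}

static int arg_push(struct arg_buf *b, const void *v, size_t size)
{
	size_t off = align_up(b->len, size);

	if (off + size > sizeof(b->data))
		return -1; /* buffer full */
	memcpy(b->data + off, v, size);
	b->len = off + size;
	return 0;
}

static int arg_push_u32(struct arg_buf *b, uint32_t v) { return arg_push(b, &v, sizeof(v)); }
static int arg_push_u64(struct arg_buf *b, uint64_t v) { return arg_push(b, &v, sizeof(v)); }

/* Read `size` bytes at the aligned cursor; yields zeros if the format
 * asks for more arguments than were pushed. */
static void arg_pop(struct arg_buf *b, void *v, size_t size)
{
	b->pos = align_up(b->pos, size);
	if (b->pos + size <= b->len)
		memcpy(v, b->data + b->pos, size);
	else
		memset(v, 0, size);
	b->pos += size;
}

/* Supports only %d %u %x (32-bit) and %lld %llu %llx (64-bit); every
 * other character is copied as-is. Returns the output length, or -1
 * if `out` is too small. */
static int format_packed(char *out, size_t size, const char *fmt,
			 struct arg_buf *b)
{
	size_t n = 0;

	if (size == 0)
		return -1;
	out[0] = '\0';
	b->pos = 0;
	while (*fmt) {
		int is64 = strncmp(fmt, "%ll", 3) == 0;
		char conv = fmt[is64 ? 3 : 1];
		int w;

		if (fmt[0] != '%' || conv == '\0' || !strchr("dux", conv)) {
			w = snprintf(out + n, size - n, "%c", *fmt++);
		} else if (is64) {
			uint64_t v;

			arg_pop(b, &v, sizeof(v));
			fmt += 4;
			if (conv == 'd')
				w = snprintf(out + n, size - n, "%lld", (long long)(int64_t)v);
			else
				w = snprintf(out + n, size - n, conv == 'u' ? "%llu" : "%llx",
					     (unsigned long long)v);
		} else {
			uint32_t v;

			arg_pop(b, &v, sizeof(v));
			fmt += 2;
			if (conv == 'd')
				w = snprintf(out + n, size - n, "%d", (int)(int32_t)v);
			else
				w = snprintf(out + n, size - n, conv == 'u' ? "%u" : "%x",
					     (unsigned int)v);
		}
		if (w < 0 || (size_t)w >= size - n)
			return -1; /* output truncated */
		n += (size_t)w;
	}
	return (int)n;
}
```

Fed Rasmus's reordered arguments (each pushed at the width its specifier implies), `format_packed()` with `"%lld %llu %llx %d %u %x"` produces `-424242 1337 dabbad00 -8 9 96`, because the consumer's read widths always match the producer's write widths.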
On Thu, Apr 22, 2021 at 2:36 PM Florent Revest <revest@chromium.org> wrote:
>
> On Thu, Apr 22, 2021 at 12:09 PM Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
> > [...]
>
> Creative! :D I think these arch-specific structures would be a hard
> sell though ahah.
>
> I was having a stroll through lib/vsprintf.c and noticed bstr_printf:
>
>  * This function like C99 vsnprintf, but the difference is that vsnprintf gets
>  * arguments from stack, and bstr_printf gets arguments from @bin_buf which is
>  * a binary buffer that generated by vbin_printf.
>
> Maybe it would be easier to just build our argument buffer similarly
> to what vbin_printf does.

I've been experimenting with this idea and it is quite promising :) it
also makes the code much cleaner, I find. I'll send a series asap.

BPF maintainers: should we fix forward or do you prefer reverting the
snprintf series and then re-applying another snprintf series without
the regression in bpf_trace_printk that mangles some argument types ?
(bpf_seq_printf has always been like that so no regression there)
On Thu, Apr 22, 2021 at 8:35 AM Florent Revest <revest@chromium.org> wrote:
> >
> > I was having a stroll through lib/vsprintf.c and noticed bstr_printf:
> >
> >  * This function like C99 vsnprintf, but the difference is that vsnprintf gets
> >  * arguments from stack, and bstr_printf gets arguments from @bin_buf which is
> >  * a binary buffer that generated by vbin_printf.
> >
> > Maybe it would be easier to just build our argument buffer similarly
> > to what vbin_printf does.
>
> I've been experimenting with this idea and it is quite promising :) it
> also makes the code much cleaner, I find. I'll send a series asap.

You mean to use bstr_printf internally ? That could work indeed.
Make sure CONFIG_BINARY_PRINTF is selected.
CONFIG_TRACING does it already.

> BPF maintainers: should we fix forward or do you prefer reverting the
> snprintf series and then re-applying another snprintf series without
> the regression in bpf_trace_printk that mangles some argument types ?
> (bpf_seq_printf has always been like that so no regression there)

Pls send it as a follow up.
Along with another patch to clean verifier bits we discussed.
The merge window is approaching, so it has to be done asap.
On Thu, Apr 22, 2021 at 5:44 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Apr 22, 2021 at 8:35 AM Florent Revest <revest@chromium.org> wrote:
> > [...]
> > I've been experimenting with this idea and it is quite promising :) it
> > also makes the code much cleaner, I find. I'll send a series asap.
>
> You mean to use bstr_printf internally ? That could work indeed.
> Make sure CONFIG_BINARY_PRINTF is selected.
> CONFIG_TRACING does it already.

Yes :)

> > BPF maintainers: should we fix forward or do you prefer reverting the
> > snprintf series and then re-applying another snprintf series without
> > the regression in bpf_trace_printk that mangles some argument types ?
> > (bpf_seq_printf has always been like that so no regression there)
>
> Pls send it as a follow up.
> Along with another patch to clean verifier bits we discussed.
> The merge window is approaching, so it has to be done asap.

On it ;)
On Thu, Apr 22, 2021 at 2:23 AM Florent Revest <revest@chromium.org> wrote:
>
> On Thu, Apr 22, 2021 at 9:13 AM Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
> > [...]
>
> Thank you Rasmus :)
>
> It seems that we went offtrack in
> https://lore.kernel.org/bpf/CAEf4BzZVEGM4esi-Rz67_xX_RTDrgxViy0gHfpeauECR5bmRNA@mail.gmail.com/
> and we do need something like "88a5c690b6 bpf: fix bpf_trace_printk on
> 32 bit archs". Thinking about it again, it's clearer now why the
> __BPF_TP_EMIT macro emits 2^3=8 different __trace_printk() indeed.

Yeah, we were wondering but no one could guess why it was done the way
it was done :) Next time we should invest in a better comment ;-P
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b0c45d923f0f..4ee55df84cd3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -412,9 +412,6 @@ static __printf(1, 0) int bpf_do_trace_printk(const char *fmt, ...)
 	va_start(ap, fmt);
 	ret = vsnprintf(buf, sizeof(buf), fmt, ap);
 	va_end(ap);
-	/* vsnprintf() will not append null for zero-length strings */
-	if (ret == 0)
-		buf[0] = '\0';
 	trace_bpf_trace_printk(buf);
 	raw_spin_unlock_irqrestore(&trace_printk_lock, flags);
 
The comment is wrong. snprintf(buf, 16, "") and snprintf(buf, 16,
"%s", "") etc. will certainly put '\0' in buf[0]. The only case where
snprintf() does not guarantee a nul-terminated string is when it is
given a buffer size of 0 (which of course prevents it from writing
anything at all to the buffer).

Remove it before it gets cargo-culted elsewhere.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
---
 kernel/trace/bpf_trace.c | 3 ---
 1 file changed, 3 deletions(-)
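The commit message's claim can be checked against a C99-conforming userspace snprintf(), used here as a stand-in for the kernel's vsnprintf() (the patch relies on the two agreeing on this point). In the sketch below, `first_byte_after_empty()` is a made-up helper: for any non-zero size an empty result still stores the terminating '\0' in buf[0], and only a size of 0 leaves the buffer untouched. That is why the removed `if (ret == 0) buf[0] = '\0';` could never change anything in bpf_do_trace_printk(), whose `buf` has non-zero size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pre-fill a buffer with 'X', format an empty result into its first
 * `size` bytes, and return buf[0]. *ret receives snprintf()'s return
 * value. `fmt` is expected to be "" or "%s"; with "", the extra ""
 * argument is evaluated but otherwise ignored, as C permits. */
static char first_byte_after_empty(size_t size, const char *fmt, int *ret)
{
	char buf[16];

	memset(buf, 'X', sizeof(buf));
	if (size > sizeof(buf))
		size = sizeof(buf);
	*ret = snprintf(buf, size, fmt, "");
	return buf[0];
}
```

With size 16, both `""` and `"%s"` leave buf[0] == '\0' and return 0; with size 0 the call still returns 0 but buf[0] keeps its 'X'.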