Message ID | 20210324022211.1718762-4-revest@chromium.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | Add a snprintf eBPF helper | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for bpf-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | warning | 9 maintainers not CCed: netdev@vger.kernel.org joe@cilium.io linux-api@vger.kernel.org mingo@redhat.com kafai@fb.com rostedt@goodmis.org john.fastabend@gmail.com songliubraving@fb.com quentin@isovalent.com |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 12236 this patch: 12236 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | warning | CHECK: Alignment should match open parenthesis WARNING: line length of 87 exceeds 80 columns |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 12742 this patch: 12740 |
netdev/header_inline | success | Link |
On Tue, Mar 23, 2021 at 7:23 PM Florent Revest <revest@chromium.org> wrote: > > The implementation takes inspiration from the existing bpf_trace_printk > helper but there are a few differences: > > To allow for a large number of format-specifiers, parameters are > provided in an array, like in bpf_seq_printf. > > Because the output string takes two arguments and the array of > parameters also takes two arguments, the format string needs to fit in > one argument. But because ARG_PTR_TO_CONST_STR guarantees to point to a > NULL-terminated read-only map, we don't need a format string length arg. > > Because the format-string is known at verification time, we also move > most of the format string validation, currently done in formatting > helper calls, into the verifier logic. This makes debugging easier and > also slightly improves the runtime performance. > > Signed-off-by: Florent Revest <revest@chromium.org> > --- > include/linux/bpf.h | 6 ++++ > include/uapi/linux/bpf.h | 28 ++++++++++++++++++ > kernel/bpf/helpers.c | 2 ++ > kernel/bpf/verifier.c | 41 +++++++++++++++++++++++++++ > kernel/trace/bpf_trace.c | 52 ++++++++++++++++++++++++++++++++++ > tools/include/uapi/linux/bpf.h | 28 ++++++++++++++++++ > 6 files changed, 157 insertions(+) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 7b5319d75b3e..f3d9c8fa60b3 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -1893,6 +1893,7 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto; > extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto; > extern const struct bpf_func_proto bpf_copy_from_user_proto; > extern const struct bpf_func_proto bpf_snprintf_btf_proto; > +extern const struct bpf_func_proto bpf_snprintf_proto; > extern const struct bpf_func_proto bpf_per_cpu_ptr_proto; > extern const struct bpf_func_proto bpf_this_cpu_ptr_proto; > extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto; > @@ -2018,4 +2019,9 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, > struct btf_id_set; > bool btf_id_set_contains(const struct btf_id_set *set, u32 id); > > +enum bpf_printf_mod_type; > +int bpf_printf_preamble(char *fmt, u32 fmt_size, const u64 *raw_args, > + u64 *final_args, enum bpf_printf_mod_type *mod, > + u32 num_args); > + > #endif /* _LINUX_BPF_H */ > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 2d3036e292a9..86af61e912c6 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -4660,6 +4660,33 @@ union bpf_attr { > * Return > * The number of traversed map elements for success, **-EINVAL** for > * invalid **flags**. > + * > + * long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data, u32 data_len) > + * Description > + * Outputs a string into the **str** buffer of size **str_size** > + * based on a format string stored in a read-only map pointed by > + * **fmt**. > + * > + * Each format specifier in **fmt** corresponds to one u64 element > + * in the **data** array. For strings and pointers where pointees > + * are accessed, only the pointer values are stored in the *data* > + * array. The *data_len* is the size of *data* in bytes. > + * > + * Formats **%s** and **%p{i,I}{4,6}** require to read kernel > + * memory. Reading kernel memory may fail due to either invalid > + * address or valid address but requiring a major memory fault. If > + * reading kernel memory fails, the string for **%s** will be an > + * empty string, and the ip address for **%p{i,I}{4,6}** will be 0. would it make sense for sleepable programs to allow memory fault when reading memory? > + * Not returning error to bpf program is consistent with what > + * **bpf_trace_printk**\ () does for now. > + * > + * Return > + * The strictly positive length of the formatted string, including > + * the trailing zero character. If the return value is greater than > + * **str_size**, **str** contains a truncated string, guaranteed to > + * be zero-terminated. Except when str_size == 0. > + * > + * Or **-EBUSY** if the per-CPU memory copy buffer is busy. > */ > #define __BPF_FUNC_MAPPER(FN) \ > FN(unspec), \ > @@ -4827,6 +4854,7 @@ union bpf_attr { > FN(sock_from_file), \ > FN(check_mtu), \ > FN(for_each_map_elem), \ > + FN(snprintf), \ > /* */ > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > index 074800226327..12f4cfb04fe7 100644 > --- a/kernel/bpf/helpers.c > +++ b/kernel/bpf/helpers.c > @@ -750,6 +750,8 @@ bpf_base_func_proto(enum bpf_func_id func_id) > return &bpf_probe_read_kernel_str_proto; > case BPF_FUNC_snprintf_btf: > return &bpf_snprintf_btf_proto; > + case BPF_FUNC_snprintf: > + return &bpf_snprintf_proto; > default: > return NULL; > } > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 9e03608725b4..a89599dc51c9 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -5729,6 +5729,41 @@ static int check_reference_leak(struct bpf_verifier_env *env) > return state->acquired_refs ? -EINVAL : 0; > } > > +static int check_bpf_snprintf_call(struct bpf_verifier_env *env, > + struct bpf_reg_state *regs) > +{ > + struct bpf_reg_state *fmt_reg = ®s[BPF_REG_3]; > + struct bpf_reg_state *data_len_reg = ®s[BPF_REG_5]; > + struct bpf_map *fmt_map = fmt_reg->map_ptr; > + int err, fmt_map_off, num_args; > + u64 fmt_addr; > + char *fmt; > + > + /* data must be an array of u64 so data_len must be a multiple of 8 */ > + if (data_len_reg->var_off.value & 7) `% 8` is not cool anymore... :) > + return -EINVAL; > + num_args = data_len_reg->var_off.value / 8; > + > + /* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const > + * and map_direct_value_addr is set. > + */ > + fmt_map_off = fmt_reg->off + fmt_reg->var_off.value; > + err = fmt_map->ops->map_direct_value_addr(fmt_map, &fmt_addr, > + fmt_map_off); > + if (err) > + return err; > + fmt = (char *)fmt_addr + fmt_map_off; > + > + /* We are also guaranteed that fmt+fmt_map_off is NULL terminated, we > + * can focus on validating the format specifiers. > + */ > + err = bpf_printf_preamble(fmt, UINT_MAX, NULL, NULL, NULL, num_args); > + if (err < 0) > + verbose(env, "Invalid format string\n"); > + > + return err; > +} > + > static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn, > int *insn_idx_p) > { > @@ -5843,6 +5878,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn > return -EINVAL; > } > > + if (func_id == BPF_FUNC_snprintf) { > + err = check_bpf_snprintf_call(env, regs); > + if (err < 0) > + return err; > + } > + > /* reset caller saved regs */ > for (i = 0; i < CALLER_SAVED_REGS; i++) { > mark_reg_not_init(env, regs, caller_saved[i]); > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > index 0fdca94a3c9c..15cbc8b63206 100644 > --- a/kernel/trace/bpf_trace.c > +++ b/kernel/trace/bpf_trace.c > @@ -1230,6 +1230,56 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = { > .arg5_type = ARG_ANYTHING, > }; > > +#define MAX_SNPRINTF_VARARGS 12 > + > +BPF_CALL_5(bpf_snprintf, char *, str, u32, str_size, char *, fmt, > + const void *, data, u32, data_len) > +{ > + enum bpf_printf_mod_type mod[MAX_SNPRINTF_VARARGS]; > + u64 args[MAX_SNPRINTF_VARARGS]; > + int err, num_args; > + > + if (data_len & 7 || data_len > MAX_SNPRINTF_VARARGS * 8 || > + (data_len && !data)) see previous patches, data_len > 0 should be iff data != NULL, I think > + return -EINVAL; > + num_args = data_len / 8; > + > + /* ARG_PTR_TO_CONST_STR guarantees that fmt is zero-terminated so we > + * can safely give an unbounded size. > + */ > + err = bpf_printf_preamble(fmt, UINT_MAX, data, args, mod, num_args); > + if (err < 0) > + return err; > + > + /* Maximumly we can have MAX_SNPRINTF_VARARGS parameters, just give > + * all of them to snprintf(). > + */ > + err = snprintf(str, str_size, fmt, BPF_CAST_FMT_ARG(0, args, mod), > + BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod), > + BPF_CAST_FMT_ARG(3, args, mod), BPF_CAST_FMT_ARG(4, args, mod), > + BPF_CAST_FMT_ARG(5, args, mod), BPF_CAST_FMT_ARG(6, args, mod), > + BPF_CAST_FMT_ARG(7, args, mod), BPF_CAST_FMT_ARG(8, args, mod), > + BPF_CAST_FMT_ARG(9, args, mod), BPF_CAST_FMT_ARG(10, args, mod), > + BPF_CAST_FMT_ARG(11, args, mod)); > + if (str_size) > + str[str_size - 1] = '\0'; hm... what if err < str_size ? > + > + bpf_printf_postamble(); > + > + return err + 1; > +} > + [...]
On Fri, Mar 26, 2021 at 11:55 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > On Tue, Mar 23, 2021 at 7:23 PM Florent Revest <revest@chromium.org> wrote: > > The implementation takes inspiration from the existing bpf_trace_printk > > helper but there are a few differences: > > > > To allow for a large number of format-specifiers, parameters are > > provided in an array, like in bpf_seq_printf. > > > > Because the output string takes two arguments and the array of > > parameters also takes two arguments, the format string needs to fit in > > one argument. But because ARG_PTR_TO_CONST_STR guarantees to point to a > > NULL-terminated read-only map, we don't need a format string length arg. > > > > Because the format-string is known at verification time, we also move > > most of the format string validation, currently done in formatting > > helper calls, into the verifier logic. This makes debugging easier and > > also slightly improves the runtime performance. > > > > Signed-off-by: Florent Revest <revest@chromium.org> > > --- > > include/linux/bpf.h | 6 ++++ > > include/uapi/linux/bpf.h | 28 ++++++++++++++++++ > > kernel/bpf/helpers.c | 2 ++ > > kernel/bpf/verifier.c | 41 +++++++++++++++++++++++++++ > > kernel/trace/bpf_trace.c | 52 ++++++++++++++++++++++++++++++++++ > > tools/include/uapi/linux/bpf.h | 28 ++++++++++++++++++ > > 6 files changed, 157 insertions(+) > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > index 7b5319d75b3e..f3d9c8fa60b3 100644 > > --- a/include/linux/bpf.h > > +++ b/include/linux/bpf.h > > @@ -1893,6 +1893,7 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto; > > extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto; > > extern const struct bpf_func_proto bpf_copy_from_user_proto; > > extern const struct bpf_func_proto bpf_snprintf_btf_proto; > > +extern const struct bpf_func_proto bpf_snprintf_proto; > > extern const struct bpf_func_proto bpf_per_cpu_ptr_proto; > > extern const struct bpf_func_proto bpf_this_cpu_ptr_proto; > > extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto; > > @@ -2018,4 +2019,9 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, > > struct btf_id_set; > > bool btf_id_set_contains(const struct btf_id_set *set, u32 id); > > > > +enum bpf_printf_mod_type; > > +int bpf_printf_preamble(char *fmt, u32 fmt_size, const u64 *raw_args, > > + u64 *final_args, enum bpf_printf_mod_type *mod, > > + u32 num_args); > > + > > #endif /* _LINUX_BPF_H */ > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 2d3036e292a9..86af61e912c6 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -4660,6 +4660,33 @@ union bpf_attr { > > * Return > > * The number of traversed map elements for success, **-EINVAL** for > > * invalid **flags**. > > + * > > + * long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data, u32 data_len) > > + * Description > > + * Outputs a string into the **str** buffer of size **str_size** > > + * based on a format string stored in a read-only map pointed by > > + * **fmt**. > > + * > > + * Each format specifier in **fmt** corresponds to one u64 element > > + * in the **data** array. For strings and pointers where pointees > > + * are accessed, only the pointer values are stored in the *data* > > + * array. The *data_len* is the size of *data* in bytes. > > + * > > + * Formats **%s** and **%p{i,I}{4,6}** require to read kernel > > + * memory. Reading kernel memory may fail due to either invalid > > + * address or valid address but requiring a major memory fault. If > > + * reading kernel memory fails, the string for **%s** will be an > > + * empty string, and the ip address for **%p{i,I}{4,6}** will be 0. > > would it make sense for sleepable programs to allow memory fault when > reading memory? Probably yes. How would you do that ? I'm guessing that in bpf_trace_copy_string you would call either strncpy_from_X_nofault or strncpy_from_X depending on a condition but I'm not sure which one. > > + * Not returning error to bpf program is consistent with what > > + * **bpf_trace_printk**\ () does for now. > > + * > > + * Return > > + * The strictly positive length of the formatted string, including > > + * the trailing zero character. If the return value is greater than > > + * **str_size**, **str** contains a truncated string, guaranteed to > > + * be zero-terminated. > > Except when str_size == 0. Right > > + * > > + * Or **-EBUSY** if the per-CPU memory copy buffer is busy. > > */ > > #define __BPF_FUNC_MAPPER(FN) \ > > FN(unspec), \ > > @@ -4827,6 +4854,7 @@ union bpf_attr { > > FN(sock_from_file), \ > > FN(check_mtu), \ > > FN(for_each_map_elem), \ > > + FN(snprintf), \ > > /* */ > > > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > > index 074800226327..12f4cfb04fe7 100644 > > --- a/kernel/bpf/helpers.c > > +++ b/kernel/bpf/helpers.c > > @@ -750,6 +750,8 @@ bpf_base_func_proto(enum bpf_func_id func_id) > > return &bpf_probe_read_kernel_str_proto; > > case BPF_FUNC_snprintf_btf: > > return &bpf_snprintf_btf_proto; > > + case BPF_FUNC_snprintf: > > + return &bpf_snprintf_proto; > > default: > > return NULL; > > } > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index 9e03608725b4..a89599dc51c9 100644 > > --- a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -5729,6 +5729,41 @@ static int check_reference_leak(struct bpf_verifier_env *env) > > return state->acquired_refs ? -EINVAL : 0; > > } > > > > +static int check_bpf_snprintf_call(struct bpf_verifier_env *env, > > + struct bpf_reg_state *regs) > > +{ > > + struct bpf_reg_state *fmt_reg = ®s[BPF_REG_3]; > > + struct bpf_reg_state *data_len_reg = ®s[BPF_REG_5]; > > + struct bpf_map *fmt_map = fmt_reg->map_ptr; > > + int err, fmt_map_off, num_args; > > + u64 fmt_addr; > > + char *fmt; > > + > > + /* data must be an array of u64 so data_len must be a multiple of 8 */ > > + if (data_len_reg->var_off.value & 7) > > `% 8` is not cool anymore... :) Haha, this is a leftover from bpf_seq_printf but I agree % 8 is nicer. > > + return -EINVAL; > > + num_args = data_len_reg->var_off.value / 8; > > + > > + /* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const > > + * and map_direct_value_addr is set. > > + */ > > + fmt_map_off = fmt_reg->off + fmt_reg->var_off.value; > > + err = fmt_map->ops->map_direct_value_addr(fmt_map, &fmt_addr, > > + fmt_map_off); > > + if (err) > > + return err; > > + fmt = (char *)fmt_addr + fmt_map_off; > > + > > + /* We are also guaranteed that fmt+fmt_map_off is NULL terminated, we > > + * can focus on validating the format specifiers. > > + */ > > + err = bpf_printf_preamble(fmt, UINT_MAX, NULL, NULL, NULL, num_args); > > + if (err < 0) > > + verbose(env, "Invalid format string\n"); > > + > > + return err; > > +} > > + > > static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn, > > int *insn_idx_p) > > { > > @@ -5843,6 +5878,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn > > return -EINVAL; > > } > > > > + if (func_id == BPF_FUNC_snprintf) { > > + err = check_bpf_snprintf_call(env, regs); > > + if (err < 0) > > + return err; > > + } > > + > > /* reset caller saved regs */ > > for (i = 0; i < CALLER_SAVED_REGS; i++) { > > mark_reg_not_init(env, regs, caller_saved[i]); > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > index 0fdca94a3c9c..15cbc8b63206 100644 > > --- a/kernel/trace/bpf_trace.c > > +++ b/kernel/trace/bpf_trace.c > > @@ -1230,6 +1230,56 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = { > > .arg5_type = ARG_ANYTHING, > > }; > > > > +#define MAX_SNPRINTF_VARARGS 12 > > + > > +BPF_CALL_5(bpf_snprintf, char *, str, u32, str_size, char *, fmt, > > + const void *, data, u32, data_len) > > +{ > > + enum bpf_printf_mod_type mod[MAX_SNPRINTF_VARARGS]; > > + u64 args[MAX_SNPRINTF_VARARGS]; > > + int err, num_args; > > + > > + if (data_len & 7 || data_len > MAX_SNPRINTF_VARARGS * 8 || > > + (data_len && !data)) > > see previous patches, data_len > 0 should be iff data != NULL, I think Commented there. > > + return -EINVAL; > > + num_args = data_len / 8; > > + > > + /* ARG_PTR_TO_CONST_STR guarantees that fmt is zero-terminated so we > > + * can safely give an unbounded size. > > + */ > > + err = bpf_printf_preamble(fmt, UINT_MAX, data, args, mod, num_args); > > + if (err < 0) > > + return err; > > + > > + /* Maximumly we can have MAX_SNPRINTF_VARARGS parameters, just give > > + * all of them to snprintf(). > > + */ > > + err = snprintf(str, str_size, fmt, BPF_CAST_FMT_ARG(0, args, mod), > > + BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod), > > + BPF_CAST_FMT_ARG(3, args, mod), BPF_CAST_FMT_ARG(4, args, mod), > > + BPF_CAST_FMT_ARG(5, args, mod), BPF_CAST_FMT_ARG(6, args, mod), > > + BPF_CAST_FMT_ARG(7, args, mod), BPF_CAST_FMT_ARG(8, args, mod), > > + BPF_CAST_FMT_ARG(9, args, mod), BPF_CAST_FMT_ARG(10, args, mod), > > + BPF_CAST_FMT_ARG(11, args, mod)); > > + if (str_size) > > + str[str_size - 1] = '\0'; > > hm... what if err < str_size ? Then there would be two zeroes, one set by snprintf in the middle and one set by us at the end. :| I was a bit lazy there, I agree it would be nicer if we'd do if (err >= str_size) instead. Also makes me wonder what if str == NULL and str_size != 0. I just assumed that the verifier would prevent that from happening but discussions in the other patches make me unsure now.
On Tue, Apr 6, 2021 at 9:06 AM Florent Revest <revest@chromium.org> wrote: > > On Fri, Mar 26, 2021 at 11:55 PM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > On Tue, Mar 23, 2021 at 7:23 PM Florent Revest <revest@chromium.org> wrote: > > > The implementation takes inspiration from the existing bpf_trace_printk > > > helper but there are a few differences: > > > > > > To allow for a large number of format-specifiers, parameters are > > > provided in an array, like in bpf_seq_printf. > > > > > > Because the output string takes two arguments and the array of > > > parameters also takes two arguments, the format string needs to fit in > > > one argument. But because ARG_PTR_TO_CONST_STR guarantees to point to a > > > NULL-terminated read-only map, we don't need a format string length arg. > > > > > > Because the format-string is known at verification time, we also move > > > most of the format string validation, currently done in formatting > > > helper calls, into the verifier logic. This makes debugging easier and > > > also slightly improves the runtime performance. > > > > > > Signed-off-by: Florent Revest <revest@chromium.org> > > > --- > > > include/linux/bpf.h | 6 ++++ > > > include/uapi/linux/bpf.h | 28 ++++++++++++++++++ > > > kernel/bpf/helpers.c | 2 ++ > > > kernel/bpf/verifier.c | 41 +++++++++++++++++++++++++++ > > > kernel/trace/bpf_trace.c | 52 ++++++++++++++++++++++++++++++++++ > > > tools/include/uapi/linux/bpf.h | 28 ++++++++++++++++++ > > > 6 files changed, 157 insertions(+) > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > index 7b5319d75b3e..f3d9c8fa60b3 100644 > > > --- a/include/linux/bpf.h > > > +++ b/include/linux/bpf.h > > > @@ -1893,6 +1893,7 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto; > > > extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto; > > > extern const struct bpf_func_proto bpf_copy_from_user_proto; > > > extern const struct bpf_func_proto bpf_snprintf_btf_proto; > > > +extern const struct bpf_func_proto bpf_snprintf_proto; > > > extern const struct bpf_func_proto bpf_per_cpu_ptr_proto; > > > extern const struct bpf_func_proto bpf_this_cpu_ptr_proto; > > > extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto; > > > @@ -2018,4 +2019,9 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, > > > struct btf_id_set; > > > bool btf_id_set_contains(const struct btf_id_set *set, u32 id); > > > > > > +enum bpf_printf_mod_type; > > > +int bpf_printf_preamble(char *fmt, u32 fmt_size, const u64 *raw_args, > > > + u64 *final_args, enum bpf_printf_mod_type *mod, > > > + u32 num_args); > > > + > > > #endif /* _LINUX_BPF_H */ > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > index 2d3036e292a9..86af61e912c6 100644 > > > --- a/include/uapi/linux/bpf.h > > > +++ b/include/uapi/linux/bpf.h > > > @@ -4660,6 +4660,33 @@ union bpf_attr { > > > * Return > > > * The number of traversed map elements for success, **-EINVAL** for > > > * invalid **flags**. > > > + * > > > + * long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data, u32 data_len) > > > + * Description > > > + * Outputs a string into the **str** buffer of size **str_size** > > > + * based on a format string stored in a read-only map pointed by > > > + * **fmt**. > > > + * > > > + * Each format specifier in **fmt** corresponds to one u64 element > > > + * in the **data** array. For strings and pointers where pointees > > > + * are accessed, only the pointer values are stored in the *data* > > > + * array. The *data_len* is the size of *data* in bytes. > > > + * > > > + * Formats **%s** and **%p{i,I}{4,6}** require to read kernel > > > + * memory. Reading kernel memory may fail due to either invalid > > > + * address or valid address but requiring a major memory fault. If > > > + * reading kernel memory fails, the string for **%s** will be an > > > + * empty string, and the ip address for **%p{i,I}{4,6}** will be 0. > > > > would it make sense for sleepable programs to allow memory fault when > > reading memory? > > Probably yes. How would you do that ? I'm guessing that in > bpf_trace_copy_string you would call either strncpy_from_X_nofault or > strncpy_from_X depending on a condition but I'm not sure which one. So you'd have different bpf_snprintf_proto definitions for sleepable and non-sleepable programs. And each implementation would call bpf_printf_prepare() with a flag specifying which copy_string variant to use (sleepable or not). So for BPF users it would be the same bpf_snprintf() helper, but it would transparently be doing different things depending on which BPF program it is being called from. That's how we do bpf_get_stack(), for example, see bpf_get_stack_proto_pe/bpf_get_stack_proto_raw_tp/bpf_get_stack_proto_tp. But consider that for a follow up, no need to address right now. > > > > + * Not returning error to bpf program is consistent with what > > > + * **bpf_trace_printk**\ () does for now. > > > + * > > > + * Return > > > + * The strictly positive length of the formatted string, including > > > + * the trailing zero character. If the return value is greater than > > > + * **str_size**, **str** contains a truncated string, guaranteed to > > > + * be zero-terminated. > > > > Except when str_size == 0. > > Right > So I assume you'll adjust the comment? I always find it confusing when zero case is allowed but it is not specified what's the behavior is. > > > + * > > > + * Or **-EBUSY** if the per-CPU memory copy buffer is busy. > > > */ [...] > > > + err = snprintf(str, str_size, fmt, BPF_CAST_FMT_ARG(0, args, mod), > > > + BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod), > > > + BPF_CAST_FMT_ARG(3, args, mod), BPF_CAST_FMT_ARG(4, args, mod), > > > + BPF_CAST_FMT_ARG(5, args, mod), BPF_CAST_FMT_ARG(6, args, mod), > > > + BPF_CAST_FMT_ARG(7, args, mod), BPF_CAST_FMT_ARG(8, args, mod), > > > + BPF_CAST_FMT_ARG(9, args, mod), BPF_CAST_FMT_ARG(10, args, mod), > > > + BPF_CAST_FMT_ARG(11, args, mod)); > > > + if (str_size) > > > + str[str_size - 1] = '\0'; > > > > hm... what if err < str_size ? > > Then there would be two zeroes, one set by snprintf in the middle and > one set by us at the end. :| I was a bit lazy there, I agree it would > be nicer if we'd do if (err >= str_size) instead. > snprintf() seems to be always zero-terminating the string if str_size > 0, and does nothing if str_size == 0, which is exactly what you want, so you can just drop that zero termination logic. > Also makes me wonder what if str == NULL and str_size != 0. I just > assumed that the verifier would prevent that from happening but > discussions in the other patches make me unsure now. ARG_CONST_SIZE_OR_ZERO should make sure that ARG_PTR_TO_MEM before that is a valid initialized memory. But please double-check, of course.
On Thu, Apr 8, 2021 at 12:03 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > On Tue, Apr 6, 2021 at 9:06 AM Florent Revest <revest@chromium.org> wrote: > > On Fri, Mar 26, 2021 at 11:55 PM Andrii Nakryiko > > <andrii.nakryiko@gmail.com> wrote: > > > On Tue, Mar 23, 2021 at 7:23 PM Florent Revest <revest@chromium.org> wrote: > > > > + * Formats **%s** and **%p{i,I}{4,6}** require to read kernel > > > > + * memory. Reading kernel memory may fail due to either invalid > > > > + * address or valid address but requiring a major memory fault. If > > > > + * reading kernel memory fails, the string for **%s** will be an > > > > + * empty string, and the ip address for **%p{i,I}{4,6}** will be 0. > > > > > > would it make sense for sleepable programs to allow memory fault when > > > reading memory? > > > > Probably yes. How would you do that ? I'm guessing that in > > bpf_trace_copy_string you would call either strncpy_from_X_nofault or > > strncpy_from_X depending on a condition but I'm not sure which one. > > So you'd have different bpf_snprintf_proto definitions for sleepable > and non-sleepable programs. And each implementation would call > bpf_printf_prepare() with a flag specifying which copy_string variant > to use (sleepable or not). So for BPF users it would be the same > bpf_snprintf() helper, but it would transparently be doing different > things depending on which BPF program it is being called from. That's > how we do bpf_get_stack(), for example, see > bpf_get_stack_proto_pe/bpf_get_stack_proto_raw_tp/bpf_get_stack_proto_tp. > > But consider that for a follow up, no need to address right now. Ok let's keep this separate. > > > > > > + * Not returning error to bpf program is consistent with what > > > > + * **bpf_trace_printk**\ () does for now. > > > > + * > > > > + * Return > > > > + * The strictly positive length of the formatted string, including > > > > + * the trailing zero character. If the return value is greater than > > > > + * **str_size**, **str** contains a truncated string, guaranteed to > > > > + * be zero-terminated. > > > > > > Except when str_size == 0. > > > > Right > > > > So I assume you'll adjust the comment? I always find it confusing when > zero case is allowed but it is not specified what's the behavior is. Yes, sorry it wasn't clear :) I agree it's worth being explicit. > > > > + err = snprintf(str, str_size, fmt, BPF_CAST_FMT_ARG(0, args, mod), > > > > + BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod), > > > > + BPF_CAST_FMT_ARG(3, args, mod), BPF_CAST_FMT_ARG(4, args, mod), > > > > + BPF_CAST_FMT_ARG(5, args, mod), BPF_CAST_FMT_ARG(6, args, mod), > > > > + BPF_CAST_FMT_ARG(7, args, mod), BPF_CAST_FMT_ARG(8, args, mod), > > > > + BPF_CAST_FMT_ARG(9, args, mod), BPF_CAST_FMT_ARG(10, args, mod), > > > > + BPF_CAST_FMT_ARG(11, args, mod)); > > > > + if (str_size) > > > > + str[str_size - 1] = '\0'; > > > > > > hm... what if err < str_size ? > > > > Then there would be two zeroes, one set by snprintf in the middle and > > one set by us at the end. :| I was a bit lazy there, I agree it would > > be nicer if we'd do if (err >= str_size) instead. > > > > snprintf() seems to be always zero-terminating the string if str_size > > 0, and does nothing if str_size == 0, which is exactly what you > want, so you can just drop that zero termination logic. Oh, that's right! I was confused by snprintf's documentation "the resulting string is truncated" but as I read the vsnprintf implementation I see this is indeed always zero-terminated. Great :) > > Also makes me wonder what if str == NULL and str_size != 0. I just > > assumed that the verifier would prevent that from happening but > > discussions in the other patches make me unsure now. > > > ARG_CONST_SIZE_OR_ZERO should make sure that ARG_PTR_TO_MEM before > that is a valid initialized memory. But please double-check, of > course. Will do.
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 7b5319d75b3e..f3d9c8fa60b3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1893,6 +1893,7 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto; extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto; extern const struct bpf_func_proto bpf_copy_from_user_proto; extern const struct bpf_func_proto bpf_snprintf_btf_proto; +extern const struct bpf_func_proto bpf_snprintf_proto; extern const struct bpf_func_proto bpf_per_cpu_ptr_proto; extern const struct bpf_func_proto bpf_this_cpu_ptr_proto; extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto; @@ -2018,4 +2019,9 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, struct btf_id_set; bool btf_id_set_contains(const struct btf_id_set *set, u32 id); +enum bpf_printf_mod_type; +int bpf_printf_preamble(char *fmt, u32 fmt_size, const u64 *raw_args, + u64 *final_args, enum bpf_printf_mod_type *mod, + u32 num_args); + #endif /* _LINUX_BPF_H */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 2d3036e292a9..86af61e912c6 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -4660,6 +4660,33 @@ union bpf_attr { * Return * The number of traversed map elements for success, **-EINVAL** for * invalid **flags**. + * + * long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data, u32 data_len) + * Description + * Outputs a string into the **str** buffer of size **str_size** + * based on a format string stored in a read-only map pointed by + * **fmt**. + * + * Each format specifier in **fmt** corresponds to one u64 element + * in the **data** array. For strings and pointers where pointees + * are accessed, only the pointer values are stored in the *data* + * array. The *data_len* is the size of *data* in bytes. + * + * Formats **%s** and **%p{i,I}{4,6}** require to read kernel + * memory. Reading kernel memory may fail due to either invalid + * address or valid address but requiring a major memory fault. If + * reading kernel memory fails, the string for **%s** will be an + * empty string, and the ip address for **%p{i,I}{4,6}** will be 0. + * Not returning error to bpf program is consistent with what + * **bpf_trace_printk**\ () does for now. + * + * Return + * The strictly positive length of the formatted string, including + * the trailing zero character. If the return value is greater than + * **str_size**, **str** contains a truncated string, guaranteed to + * be zero-terminated. + * + * Or **-EBUSY** if the per-CPU memory copy buffer is busy. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -4827,6 +4854,7 @@ union bpf_attr { FN(sock_from_file), \ FN(check_mtu), \ FN(for_each_map_elem), \ + FN(snprintf), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 074800226327..12f4cfb04fe7 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -750,6 +750,8 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_probe_read_kernel_str_proto; case BPF_FUNC_snprintf_btf: return &bpf_snprintf_btf_proto; + case BPF_FUNC_snprintf: + return &bpf_snprintf_proto; default: return NULL; } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 9e03608725b4..a89599dc51c9 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5729,6 +5729,41 @@ static int check_reference_leak(struct bpf_verifier_env *env) return state->acquired_refs ? -EINVAL : 0; } +static int check_bpf_snprintf_call(struct bpf_verifier_env *env, + struct bpf_reg_state *regs) +{ + struct bpf_reg_state *fmt_reg = ®s[BPF_REG_3]; + struct bpf_reg_state *data_len_reg = ®s[BPF_REG_5]; + struct bpf_map *fmt_map = fmt_reg->map_ptr; + int err, fmt_map_off, num_args; + u64 fmt_addr; + char *fmt; + + /* data must be an array of u64 so data_len must be a multiple of 8 */ + if (data_len_reg->var_off.value & 7) + return -EINVAL; + num_args = data_len_reg->var_off.value / 8; + + /* fmt being ARG_PTR_TO_CONST_STR guarantees that var_off is const + * and map_direct_value_addr is set. + */ + fmt_map_off = fmt_reg->off + fmt_reg->var_off.value; + err = fmt_map->ops->map_direct_value_addr(fmt_map, &fmt_addr, + fmt_map_off); + if (err) + return err; + fmt = (char *)fmt_addr + fmt_map_off; + + /* We are also guaranteed that fmt+fmt_map_off is NULL terminated, we + * can focus on validating the format specifiers. + */ + err = bpf_printf_preamble(fmt, UINT_MAX, NULL, NULL, NULL, num_args); + if (err < 0) + verbose(env, "Invalid format string\n"); + + return err; +} + static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn, int *insn_idx_p) { @@ -5843,6 +5878,12 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn return -EINVAL; } + if (func_id == BPF_FUNC_snprintf) { + err = check_bpf_snprintf_call(env, regs); + if (err < 0) + return err; + } + /* reset caller saved regs */ for (i = 0; i < CALLER_SAVED_REGS; i++) { mark_reg_not_init(env, regs, caller_saved[i]); diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 0fdca94a3c9c..15cbc8b63206 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1230,6 +1230,56 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = { .arg5_type = ARG_ANYTHING, }; +#define MAX_SNPRINTF_VARARGS 12 + +BPF_CALL_5(bpf_snprintf, char *, str, u32, str_size, char *, fmt, + const void *, data, u32, data_len) +{ + enum bpf_printf_mod_type mod[MAX_SNPRINTF_VARARGS]; + u64 args[MAX_SNPRINTF_VARARGS]; + int err, num_args; + + if (data_len & 7 || data_len > MAX_SNPRINTF_VARARGS * 8 || + (data_len && !data)) + return -EINVAL; + num_args = data_len / 8; + + /* ARG_PTR_TO_CONST_STR guarantees that fmt is zero-terminated so we + * can safely give an unbounded size. + */ + err = bpf_printf_preamble(fmt, UINT_MAX, data, args, mod, num_args); + if (err < 0) + return err; + + /* Maximumly we can have MAX_SNPRINTF_VARARGS parameters, just give + * all of them to snprintf(). + */ + err = snprintf(str, str_size, fmt, BPF_CAST_FMT_ARG(0, args, mod), + BPF_CAST_FMT_ARG(1, args, mod), BPF_CAST_FMT_ARG(2, args, mod), + BPF_CAST_FMT_ARG(3, args, mod), BPF_CAST_FMT_ARG(4, args, mod), + BPF_CAST_FMT_ARG(5, args, mod), BPF_CAST_FMT_ARG(6, args, mod), + BPF_CAST_FMT_ARG(7, args, mod), BPF_CAST_FMT_ARG(8, args, mod), + BPF_CAST_FMT_ARG(9, args, mod), BPF_CAST_FMT_ARG(10, args, mod), + BPF_CAST_FMT_ARG(11, args, mod)); + if (str_size) + str[str_size - 1] = '\0'; + + bpf_printf_postamble(); + + return err + 1; +} + +const struct bpf_func_proto bpf_snprintf_proto = { + .func = bpf_snprintf, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_MEM, + .arg2_type = ARG_CONST_SIZE_OR_ZERO, + .arg3_type = ARG_PTR_TO_CONST_STR, + .arg4_type = ARG_PTR_TO_MEM, + .arg5_type = ARG_CONST_SIZE_OR_ZERO, +}; + const struct bpf_func_proto * bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) { @@ -1332,6 +1382,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) return &bpf_task_storage_delete_proto; case BPF_FUNC_for_each_map_elem: return &bpf_for_each_map_elem_proto; + case BPF_FUNC_snprintf: + return &bpf_snprintf_proto; default: return NULL; } diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 2d3036e292a9..86af61e912c6 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -4660,6 +4660,33 @@ union bpf_attr { * Return * The number of traversed map elements for success, **-EINVAL** for * invalid **flags**. + * + * long bpf_snprintf(char *str, u32 str_size, const char *fmt, u64 *data, u32 data_len) + * Description + * Outputs a string into the **str** buffer of size **str_size** + * based on a format string stored in a read-only map pointed by + * **fmt**. + * + * Each format specifier in **fmt** corresponds to one u64 element + * in the **data** array. For strings and pointers where pointees + * are accessed, only the pointer values are stored in the *data* + * array. The *data_len* is the size of *data* in bytes. + * + * Formats **%s** and **%p{i,I}{4,6}** require to read kernel + * memory. Reading kernel memory may fail due to either invalid + * address or valid address but requiring a major memory fault. If + * reading kernel memory fails, the string for **%s** will be an + * empty string, and the ip address for **%p{i,I}{4,6}** will be 0. + * Not returning error to bpf program is consistent with what + * **bpf_trace_printk**\ () does for now. + * + * Return + * The strictly positive length of the formatted string, including + * the trailing zero character. If the return value is greater than + * **str_size**, **str** contains a truncated string, guaranteed to + * be zero-terminated. + * + * Or **-EBUSY** if the per-CPU memory copy buffer is busy. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -4827,6 +4854,7 @@ union bpf_attr { FN(sock_from_file), \ FN(check_mtu), \ FN(for_each_map_elem), \ + FN(snprintf), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper
The implementation takes inspiration from the existing bpf_trace_printk helper but there are a few differences: To allow for a large number of format-specifiers, parameters are provided in an array, like in bpf_seq_printf. Because the output string takes two arguments and the array of parameters also takes two arguments, the format string needs to fit in one argument. But because ARG_PTR_TO_CONST_STR guarantees to point to a NULL-terminated read-only map, we don't need a format string length arg. Because the format-string is known at verification time, we also move most of the format string validation, currently done in formatting helper calls, into the verifier logic. This makes debugging easier and also slightly improves the runtime performance. Signed-off-by: Florent Revest <revest@chromium.org> --- include/linux/bpf.h | 6 ++++ include/uapi/linux/bpf.h | 28 ++++++++++++++++++ kernel/bpf/helpers.c | 2 ++ kernel/bpf/verifier.c | 41 +++++++++++++++++++++++++++ kernel/trace/bpf_trace.c | 52 ++++++++++++++++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 28 ++++++++++++++++++ 6 files changed, 157 insertions(+)