Message ID | bc06e1f4bef09ba3d431d7a7236303746a7adb57.1727329823.git.vmalik@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | bpf: Add kfuncs for read-only string operations | expand |
On Wed, Sep 25, 2024 at 11:18 PM Viktor Malik <vmalik@redhat.com> wrote: > > Kernel contains highly optimised implementation of traditional string > operations. Expose them as kfuncs to allow BPF programs leverage the > kernel implementation instead of needing to reimplement the operations. > > For now, add kfuncs for all string operations which do not copy memory > around: strcmp, strchr, strrchr, strnchr, strstr, strnstr, strlen, > strnlen, strpbrk, strspn, strcspn. Do not add strncmp as it is already > exposed as a helper. > > Signed-off-by: Viktor Malik <vmalik@redhat.com> > --- > kernel/bpf/helpers.c | 66 ++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 66 insertions(+) > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c > index 1a43d06eab28..daa19760d8c8 100644 > --- a/kernel/bpf/helpers.c > +++ b/kernel/bpf/helpers.c > @@ -3004,6 +3004,61 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user > return ret + 1; > } > > +__bpf_kfunc int bpf_strcmp(const char *cs, const char *ct) I don't think this will work as you hope it will work. I suspect BPF verifier thinks you are asking to a pointer to a singular char (1-byte memory, basically), and expects kfunc to not access beyond that single byte. Which is not what you are doing, so it's a violation of declared kfunc contract. We do have support to pass constant string pointers that are known at verification time (see bpf_strncmp() and also is_kfunc_arg_const_str() for kfunc-like equivalent), but I don't think that's what you want here. Right now, the only way to pass dynamically sized anything is through dynptr, AFAIU. pw-bot: cr > +{ > + return strcmp(cs, ct); > +} > + > +__bpf_kfunc char *bpf_strchr(const char *s, int c) > +{ > + return strchr(s, c); > +} > + > +__bpf_kfunc char *bpf_strrchr(const char *s, int c) > +{ > + return strrchr(s, c); > +} > + > +__bpf_kfunc char *bpf_strnchr(const char *s, size_t count, int c) > +{ > + return strnchr(s, count, c); > +} > + > +__bpf_kfunc char *bpf_strstr(const char *s1, const char *s2) > +{ > + return strstr(s1, s2); > +} > + > +__bpf_kfunc char *bpf_strnstr(const char *s1, const char *s2, size_t len) > +{ > + return strnstr(s1, s2, len); > +} > + > +__bpf_kfunc size_t bpf_strlen(const char *s) > +{ > + return strlen(s); > +} > + > +__bpf_kfunc size_t bpf_strnlen(const char *s, size_t count) > +{ > + return strnlen(s, count); > +} > + > +__bpf_kfunc char *bpf_strpbrk(const char *cs, const char *ct) > +{ > + return strpbrk(cs, ct); > +} > + > +__bpf_kfunc size_t bpf_strspn(const char *s, const char *accept) > +{ > + return strspn(s, accept); > +} > + > +__bpf_kfunc size_t bpf_strcspn(const char *s, const char *reject) > +{ > + return strcspn(s, reject); > +} > + > __bpf_kfunc_end_defs(); > > BTF_KFUNCS_START(generic_btf_ids) > @@ -3090,6 +3145,17 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW) > BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL) > BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY) > BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE) > +BTF_ID_FLAGS(func, bpf_strcmp) > +BTF_ID_FLAGS(func, bpf_strchr) > +BTF_ID_FLAGS(func, bpf_strrchr) > +BTF_ID_FLAGS(func, bpf_strnchr) > +BTF_ID_FLAGS(func, bpf_strstr) > +BTF_ID_FLAGS(func, bpf_strnstr) > +BTF_ID_FLAGS(func, bpf_strlen) > +BTF_ID_FLAGS(func, bpf_strnlen) > +BTF_ID_FLAGS(func, bpf_strpbrk) > +BTF_ID_FLAGS(func, bpf_strspn) > +BTF_ID_FLAGS(func, bpf_strcspn) > BTF_KFUNCS_END(common_btf_ids) > > static const struct btf_kfunc_id_set common_kfunc_set = { > -- > 2.46.0 >
On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: [...] > Right now, the only way to pass dynamically sized anything is through > dynptr, AFAIU. But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, e.g. used for bpf_copy_from_user_str(): /** * bpf_copy_from_user_str() - Copy a string from an unsafe user address * @dst: Destination address, in kernel space. This buffer must be * at least @dst__sz bytes long. * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. * ... */ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) However, this suffix won't work for strnstr because of the arguments order. [...]
On Tue, Oct 1, 2024 at 4:26 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: > > [...] > > > Right now, the only way to pass dynamically sized anything is through > > dynptr, AFAIU. > > But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, > e.g. used for bpf_copy_from_user_str(): > > /** > * bpf_copy_from_user_str() - Copy a string from an unsafe user address > * @dst: Destination address, in kernel space. This buffer must be > * at least @dst__sz bytes long. > * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. > * ... > */ > __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) > > However, this suffix won't work for strnstr because of the arguments order. Stating the obvious... we don't need to keep the order exactly the same. Regarding all of these kfuncs... as Andrii pointed out 'const char *s' means that the verifier will check that 's' points to a valid byte. I think we can do a hybrid static + dynamic safety scheme here. All of the kfunc signatures can stay the same, but we'd have to open code all string helpers with __get_kernel_nofault() instead of direct memory access. Since the first byte is guaranteed to be valid by the verifier we only need to make sure that the s+N bytes won't cause page faults and __get_kernel_nofault is an efficient mechanism to do that. It's just an annotated load. No extra overhead. So readonly kfuncs can look like: bpf_str...(const char *src) while kfuncs that need a destination buffer will look like: bpf_str...(void *dst, u32 dst__sz, ...) bpf_strcpy(), strncpy, strlcpy shouldn't be introduced though. but bpf_strscpy_pad(void *dst, u32 dst__sz, const char *src) would be good to have. And it will be just as fast as strscpy_pad().
On Tue, Oct 1, 2024 at 7:48 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Tue, Oct 1, 2024 at 4:26 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: > > > > [...] > > > > > Right now, the only way to pass dynamically sized anything is through > > > dynptr, AFAIU. > > > > But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, > > e.g. used for bpf_copy_from_user_str(): > > > > /** > > * bpf_copy_from_user_str() - Copy a string from an unsafe user address > > * @dst: Destination address, in kernel space. This buffer must be > > * at least @dst__sz bytes long. > > * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. > > * ... > > */ > > __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) > > > > However, this suffix won't work for strnstr because of the arguments order. > > Stating the obvious... we don't need to keep the order exactly the same. > > Regarding all of these kfuncs... as Andrii pointed out 'const char *s' > means that the verifier will check that 's' points to a valid byte. > I think we can do a hybrid static + dynamic safety scheme here. > All of the kfunc signatures can stay the same, but we'd have to > open code all string helpers with __get_kernel_nofault() instead of > direct memory access. > Since the first byte is guaranteed to be valid by the verifier > we only need to make sure that the s+N bytes won't cause page faults You mean to just check that s[N-1] can be read? Given a large enough N, couldn't it be that some page between s[0] and s[N-1] still can be unmapped, defeating this check? > and __get_kernel_nofault is an efficient mechanism to do that. > It's just an annotated load. No extra overhead. > > So readonly kfuncs can look like: > bpf_str...(const char *src) > > while kfuncs that need a destination buffer will look like: > bpf_str...(void *dst, u32 dst__sz, ...) > > bpf_strcpy(), strncpy, strlcpy shouldn't be introduced though. > > but bpf_strscpy_pad(void *dst, u32 dst__sz, const char *src) > would be good to have. > And it will be just as fast as strscpy_pad().
On Tue, Oct 1, 2024 at 10:04 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Tue, Oct 1, 2024 at 7:48 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Tue, Oct 1, 2024 at 4:26 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: > > > > > > [...] > > > > > > > Right now, the only way to pass dynamically sized anything is through > > > > dynptr, AFAIU. > > > > > > But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, > > > e.g. used for bpf_copy_from_user_str(): > > > > > > /** > > > * bpf_copy_from_user_str() - Copy a string from an unsafe user address > > > * @dst: Destination address, in kernel space. This buffer must be > > > * at least @dst__sz bytes long. > > > * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. > > > * ... > > > */ > > > __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) > > > > > > However, this suffix won't work for strnstr because of the arguments order. > > > > Stating the obvious... we don't need to keep the order exactly the same. > > > > Regarding all of these kfuncs... as Andrii pointed out 'const char *s' > > means that the verifier will check that 's' points to a valid byte. > > I think we can do a hybrid static + dynamic safety scheme here. > > All of the kfunc signatures can stay the same, but we'd have to > > open code all string helpers with __get_kernel_nofault() instead of > > direct memory access. > > Since the first byte is guaranteed to be valid by the verifier > > we only need to make sure that the s+N bytes won't cause page faults > > You mean to just check that s[N-1] can be read? Given a large enough > N, couldn't it be that some page between s[0] and s[N-1] still can be > unmapped, defeating this check? Just checking s[0] and s[N-1] is not enough, obviously, and especially, since the logic won't know where nul byte is, so N is unknown. I meant to that all of str* kfuncs will be reading all bytes via __get_kernel_nofault() until they find \0. It can be optimized to 8 byte access. The open coding (aka copy-paste) is unfortunate, of course.
On Tue, Oct 1, 2024 at 10:34 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Tue, Oct 1, 2024 at 10:04 AM Andrii Nakryiko > <andrii.nakryiko@gmail.com> wrote: > > > > On Tue, Oct 1, 2024 at 7:48 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Tue, Oct 1, 2024 at 4:26 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > > > > > > > > On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: > > > > > > > > [...] > > > > > > > > > Right now, the only way to pass dynamically sized anything is through > > > > > dynptr, AFAIU. > > > > > > > > But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, > > > > e.g. used for bpf_copy_from_user_str(): > > > > > > > > /** > > > > * bpf_copy_from_user_str() - Copy a string from an unsafe user address > > > > * @dst: Destination address, in kernel space. This buffer must be > > > > * at least @dst__sz bytes long. > > > > * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. > > > > * ... > > > > */ > > > > __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) > > > > > > > > However, this suffix won't work for strnstr because of the arguments order. > > > > > > Stating the obvious... we don't need to keep the order exactly the same. > > > > > > Regarding all of these kfuncs... as Andrii pointed out 'const char *s' > > > means that the verifier will check that 's' points to a valid byte. > > > I think we can do a hybrid static + dynamic safety scheme here. > > > All of the kfunc signatures can stay the same, but we'd have to > > > open code all string helpers with __get_kernel_nofault() instead of > > > direct memory access. > > > Since the first byte is guaranteed to be valid by the verifier > > > we only need to make sure that the s+N bytes won't cause page faults > > > > You mean to just check that s[N-1] can be read? Given a large enough > > N, couldn't it be that some page between s[0] and s[N-1] still can be > > unmapped, defeating this check? > > Just checking s[0] and s[N-1] is not enough, obviously, and especially, > since the logic won't know where nul byte is, so N is unknown. > I meant to that all of str* kfuncs will be reading all bytes > via __get_kernel_nofault() until they find \0. Ah, ok, I see what you mean now. > It can be optimized to 8 byte access. > The open coding (aka copy-paste) is unfortunate, of course. Yep, this sucks.
On 10/1/24 19:40, Andrii Nakryiko wrote: > On Tue, Oct 1, 2024 at 10:34 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: >> >> On Tue, Oct 1, 2024 at 10:04 AM Andrii Nakryiko >> <andrii.nakryiko@gmail.com> wrote: >>> >>> On Tue, Oct 1, 2024 at 7:48 AM Alexei Starovoitov >>> <alexei.starovoitov@gmail.com> wrote: >>>> >>>> On Tue, Oct 1, 2024 at 4:26 AM Eduard Zingerman <eddyz87@gmail.com> wrote: >>>>> >>>>> On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: >>>>> >>>>> [...] >>>>> >>>>>> Right now, the only way to pass dynamically sized anything is through >>>>>> dynptr, AFAIU. >>>>> >>>>> But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, >>>>> e.g. used for bpf_copy_from_user_str(): >>>>> >>>>> /** >>>>> * bpf_copy_from_user_str() - Copy a string from an unsafe user address >>>>> * @dst: Destination address, in kernel space. This buffer must be >>>>> * at least @dst__sz bytes long. >>>>> * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. >>>>> * ... >>>>> */ >>>>> __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) >>>>> >>>>> However, this suffix won't work for strnstr because of the arguments order. >>>> >>>> Stating the obvious... we don't need to keep the order exactly the same. >>>> >>>> Regarding all of these kfuncs... as Andrii pointed out 'const char *s' >>>> means that the verifier will check that 's' points to a valid byte. >>>> I think we can do a hybrid static + dynamic safety scheme here. >>>> All of the kfunc signatures can stay the same, but we'd have to >>>> open code all string helpers with __get_kernel_nofault() instead of >>>> direct memory access. >>>> Since the first byte is guaranteed to be valid by the verifier >>>> we only need to make sure that the s+N bytes won't cause page faults >>> >>> You mean to just check that s[N-1] can be read? Given a large enough >>> N, couldn't it be that some page between s[0] and s[N-1] still can be >>> unmapped, defeating this check? >> >> Just checking s[0] and s[N-1] is not enough, obviously, and especially, >> since the logic won't know where nul byte is, so N is unknown. >> I meant to that all of str* kfuncs will be reading all bytes >> via __get_kernel_nofault() until they find \0. > > Ah, ok, I see what you mean now. > >> It can be optimized to 8 byte access. >> The open coding (aka copy-paste) is unfortunate, of course. > > Yep, this sucks. Yeah, that's quite annoying. I really wanted to avoid doing that. Also, we won't be able to use arch-optimized versions of the functions. Just to make sure I understand things correctly - can we do what Eduard suggested and add explicit sizes for all arguments using the __sz suffix? So something like: const char *bpf_strnstr(const char *s1, u32 s1__sz, const char *s2, u32 s2__sz); Or that would still not be sufficient b/c the strings may still be unsafe and we need an additional safety check (using __get_kernel_nofault suggested by Alexei)?
On Tue, Oct 1, 2024 at 11:12 PM Viktor Malik <vmalik@redhat.com> wrote: > > On 10/1/24 19:40, Andrii Nakryiko wrote: > > On Tue, Oct 1, 2024 at 10:34 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > >> > >> On Tue, Oct 1, 2024 at 10:04 AM Andrii Nakryiko > >> <andrii.nakryiko@gmail.com> wrote: > >>> > >>> On Tue, Oct 1, 2024 at 7:48 AM Alexei Starovoitov > >>> <alexei.starovoitov@gmail.com> wrote: > >>>> > >>>> On Tue, Oct 1, 2024 at 4:26 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > >>>>> > >>>>> On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: > >>>>> > >>>>> [...] > >>>>> > >>>>>> Right now, the only way to pass dynamically sized anything is through > >>>>>> dynptr, AFAIU. > >>>>> > >>>>> But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, > >>>>> e.g. used for bpf_copy_from_user_str(): > >>>>> > >>>>> /** > >>>>> * bpf_copy_from_user_str() - Copy a string from an unsafe user address > >>>>> * @dst: Destination address, in kernel space. This buffer must be > >>>>> * at least @dst__sz bytes long. > >>>>> * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. > >>>>> * ... > >>>>> */ > >>>>> __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) > >>>>> > >>>>> However, this suffix won't work for strnstr because of the arguments order. > >>>> > >>>> Stating the obvious... we don't need to keep the order exactly the same. > >>>> > >>>> Regarding all of these kfuncs... as Andrii pointed out 'const char *s' > >>>> means that the verifier will check that 's' points to a valid byte. > >>>> I think we can do a hybrid static + dynamic safety scheme here. > >>>> All of the kfunc signatures can stay the same, but we'd have to > >>>> open code all string helpers with __get_kernel_nofault() instead of > >>>> direct memory access. > >>>> Since the first byte is guaranteed to be valid by the verifier > >>>> we only need to make sure that the s+N bytes won't cause page faults > >>> > >>> You mean to just check that s[N-1] can be read? Given a large enough > >>> N, couldn't it be that some page between s[0] and s[N-1] still can be > >>> unmapped, defeating this check? > >> > >> Just checking s[0] and s[N-1] is not enough, obviously, and especially, > >> since the logic won't know where nul byte is, so N is unknown. > >> I meant to that all of str* kfuncs will be reading all bytes > >> via __get_kernel_nofault() until they find \0. > > > > Ah, ok, I see what you mean now. > > > >> It can be optimized to 8 byte access. > >> The open coding (aka copy-paste) is unfortunate, of course. > > > > Yep, this sucks. > > Yeah, that's quite annoying. I really wanted to avoid doing that. Also, > we won't be able to use arch-optimized versions of the functions. > > Just to make sure I understand things correctly - can we do what Eduard > suggested and add explicit sizes for all arguments using the __sz > suffix? So something like: > > const char *bpf_strnstr(const char *s1, u32 s1__sz, const char *s2, u32 s2__sz); That's ok-ish, but you probably want: const char *bpf_strnstr(void *s1, u32 s1__sz, void *s2, u32 s2__sz); and then to call strnstr() you still need to strnlen(s2, s2__sz). But a more general question... how always passing size will work for bpftrace ? Does it always know the upper bound of storage where strings are stored? I would think __get_kernel_nofault() approach is user friendlier.
On 10/2/24 18:55, Alexei Starovoitov wrote: > On Tue, Oct 1, 2024 at 11:12 PM Viktor Malik <vmalik@redhat.com> wrote: >> >> On 10/1/24 19:40, Andrii Nakryiko wrote: >>> On Tue, Oct 1, 2024 at 10:34 AM Alexei Starovoitov >>> <alexei.starovoitov@gmail.com> wrote: >>>> >>>> On Tue, Oct 1, 2024 at 10:04 AM Andrii Nakryiko >>>> <andrii.nakryiko@gmail.com> wrote: >>>>> >>>>> On Tue, Oct 1, 2024 at 7:48 AM Alexei Starovoitov >>>>> <alexei.starovoitov@gmail.com> wrote: >>>>>> >>>>>> On Tue, Oct 1, 2024 at 4:26 AM Eduard Zingerman <eddyz87@gmail.com> wrote: >>>>>>> >>>>>>> On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>>> Right now, the only way to pass dynamically sized anything is through >>>>>>>> dynptr, AFAIU. >>>>>>> >>>>>>> But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, >>>>>>> e.g. used for bpf_copy_from_user_str(): >>>>>>> >>>>>>> /** >>>>>>> * bpf_copy_from_user_str() - Copy a string from an unsafe user address >>>>>>> * @dst: Destination address, in kernel space. This buffer must be >>>>>>> * at least @dst__sz bytes long. >>>>>>> * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. >>>>>>> * ... >>>>>>> */ >>>>>>> __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) >>>>>>> >>>>>>> However, this suffix won't work for strnstr because of the arguments order. >>>>>> >>>>>> Stating the obvious... we don't need to keep the order exactly the same. >>>>>> >>>>>> Regarding all of these kfuncs... as Andrii pointed out 'const char *s' >>>>>> means that the verifier will check that 's' points to a valid byte. >>>>>> I think we can do a hybrid static + dynamic safety scheme here. >>>>>> All of the kfunc signatures can stay the same, but we'd have to >>>>>> open code all string helpers with __get_kernel_nofault() instead of >>>>>> direct memory access. >>>>>> Since the first byte is guaranteed to be valid by the verifier >>>>>> we only need to make sure that the s+N bytes won't cause page faults >>>>> >>>>> You mean to just check that s[N-1] can be read? Given a large enough >>>>> N, couldn't it be that some page between s[0] and s[N-1] still can be >>>>> unmapped, defeating this check? >>>> >>>> Just checking s[0] and s[N-1] is not enough, obviously, and especially, >>>> since the logic won't know where nul byte is, so N is unknown. >>>> I meant to that all of str* kfuncs will be reading all bytes >>>> via __get_kernel_nofault() until they find \0. >>> >>> Ah, ok, I see what you mean now. >>> >>>> It can be optimized to 8 byte access. >>>> The open coding (aka copy-paste) is unfortunate, of course. >>> >>> Yep, this sucks. >> >> Yeah, that's quite annoying. I really wanted to avoid doing that. Also, >> we won't be able to use arch-optimized versions of the functions. >> >> Just to make sure I understand things correctly - can we do what Eduard >> suggested and add explicit sizes for all arguments using the __sz >> suffix? So something like: >> >> const char *bpf_strnstr(const char *s1, u32 s1__sz, const char *s2, u32 s2__sz); > > That's ok-ish, but you probably want: > > const char *bpf_strnstr(void *s1, u32 s1__sz, void *s2, u32 s2__sz); > > and then to call strnstr() you still need to strnlen(s2, s2__sz). > > But a more general question... how always passing size will work > for bpftrace ? Does it always know the upper bound of storage where > strings are stored? Yes, it does. The strings must be read via the str() call (which internally calls bpf_probe_read_str) and there's an upper bound on the size of each string. > I would think __get_kernel_nofault() approach is user friendlier. That's probably true but isn't there still the problem that strings are not necessarily null-terminated? And in such case, unbounded string functions may not terminate which is not allowed in BPF?
On Wed, Oct 2, 2024 at 9:51 PM Viktor Malik <vmalik@redhat.com> wrote: > > On 10/2/24 18:55, Alexei Starovoitov wrote: > > On Tue, Oct 1, 2024 at 11:12 PM Viktor Malik <vmalik@redhat.com> wrote: > >> > >> On 10/1/24 19:40, Andrii Nakryiko wrote: > >>> On Tue, Oct 1, 2024 at 10:34 AM Alexei Starovoitov > >>> <alexei.starovoitov@gmail.com> wrote: > >>>> > >>>> On Tue, Oct 1, 2024 at 10:04 AM Andrii Nakryiko > >>>> <andrii.nakryiko@gmail.com> wrote: > >>>>> > >>>>> On Tue, Oct 1, 2024 at 7:48 AM Alexei Starovoitov > >>>>> <alexei.starovoitov@gmail.com> wrote: > >>>>>> > >>>>>> On Tue, Oct 1, 2024 at 4:26 AM Eduard Zingerman <eddyz87@gmail.com> wrote: > >>>>>>> > >>>>>>> On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: > >>>>>>> > >>>>>>> [...] > >>>>>>> > >>>>>>>> Right now, the only way to pass dynamically sized anything is through > >>>>>>>> dynptr, AFAIU. > >>>>>>> > >>>>>>> But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, > >>>>>>> e.g. used for bpf_copy_from_user_str(): > >>>>>>> > >>>>>>> /** > >>>>>>> * bpf_copy_from_user_str() - Copy a string from an unsafe user address > >>>>>>> * @dst: Destination address, in kernel space. This buffer must be > >>>>>>> * at least @dst__sz bytes long. > >>>>>>> * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. > >>>>>>> * ... > >>>>>>> */ > >>>>>>> __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) > >>>>>>> > >>>>>>> However, this suffix won't work for strnstr because of the arguments order. > >>>>>> > >>>>>> Stating the obvious... we don't need to keep the order exactly the same. > >>>>>> > >>>>>> Regarding all of these kfuncs... as Andrii pointed out 'const char *s' > >>>>>> means that the verifier will check that 's' points to a valid byte. > >>>>>> I think we can do a hybrid static + dynamic safety scheme here. > >>>>>> All of the kfunc signatures can stay the same, but we'd have to > >>>>>> open code all string helpers with __get_kernel_nofault() instead of > >>>>>> direct memory access. > >>>>>> Since the first byte is guaranteed to be valid by the verifier > >>>>>> we only need to make sure that the s+N bytes won't cause page faults > >>>>> > >>>>> You mean to just check that s[N-1] can be read? Given a large enough > >>>>> N, couldn't it be that some page between s[0] and s[N-1] still can be > >>>>> unmapped, defeating this check? > >>>> > >>>> Just checking s[0] and s[N-1] is not enough, obviously, and especially, > >>>> since the logic won't know where nul byte is, so N is unknown. > >>>> I meant to that all of str* kfuncs will be reading all bytes > >>>> via __get_kernel_nofault() until they find \0. > >>> > >>> Ah, ok, I see what you mean now. > >>> > >>>> It can be optimized to 8 byte access. > >>>> The open coding (aka copy-paste) is unfortunate, of course. > >>> > >>> Yep, this sucks. > >> > >> Yeah, that's quite annoying. I really wanted to avoid doing that. Also, > >> we won't be able to use arch-optimized versions of the functions. > >> > >> Just to make sure I understand things correctly - can we do what Eduard > >> suggested and add explicit sizes for all arguments using the __sz > >> suffix? So something like: > >> > >> const char *bpf_strnstr(const char *s1, u32 s1__sz, const char *s2, u32 s2__sz); > > > > That's ok-ish, but you probably want: > > > > const char *bpf_strnstr(void *s1, u32 s1__sz, void *s2, u32 s2__sz); > > > > and then to call strnstr() you still need to strnlen(s2, s2__sz). > > > > But a more general question... how always passing size will work > > for bpftrace ? Does it always know the upper bound of storage where > > strings are stored? > > Yes, it does. The strings must be read via the str() call (which > internally calls bpf_probe_read_str) and there's an upper bound on the > size of each string. which sounds like a bpftrace current limitation and not something to depend on from kfunc design pov. Wouldn't you want strings to be arbitrary length? > > I would think __get_kernel_nofault() approach is user friendlier. > > That's probably true but isn't there still the problem that strings are > not necessarily null-terminated? And in such case, unbounded string > functions may not terminate which is not allowed in BPF? kfuncs that are searching for nul via loop with __get_kernel_nofault() would certainly need an upper bound. Something like PATH_MAX (4k) or XATTR_SIZE_MAX (64k) would cover 99.99% of use cases.
On 10/3/24 19:02, Alexei Starovoitov wrote: > On Wed, Oct 2, 2024 at 9:51 PM Viktor Malik <vmalik@redhat.com> wrote: >> >> On 10/2/24 18:55, Alexei Starovoitov wrote: >>> On Tue, Oct 1, 2024 at 11:12 PM Viktor Malik <vmalik@redhat.com> wrote: >>>> >>>> On 10/1/24 19:40, Andrii Nakryiko wrote: >>>>> On Tue, Oct 1, 2024 at 10:34 AM Alexei Starovoitov >>>>> <alexei.starovoitov@gmail.com> wrote: >>>>>> >>>>>> On Tue, Oct 1, 2024 at 10:04 AM Andrii Nakryiko >>>>>> <andrii.nakryiko@gmail.com> wrote: >>>>>>> >>>>>>> On Tue, Oct 1, 2024 at 7:48 AM Alexei Starovoitov >>>>>>> <alexei.starovoitov@gmail.com> wrote: >>>>>>>> >>>>>>>> On Tue, Oct 1, 2024 at 4:26 AM Eduard Zingerman <eddyz87@gmail.com> wrote: >>>>>>>>> >>>>>>>>> On Mon, 2024-09-30 at 15:00 -0700, Andrii Nakryiko wrote: >>>>>>>>> >>>>>>>>> [...] >>>>>>>>> >>>>>>>>>> Right now, the only way to pass dynamically sized anything is through >>>>>>>>>> dynptr, AFAIU. >>>>>>>>> >>>>>>>>> But we do have 'is_kfunc_arg_mem_size()' that checks for __sz suffix, >>>>>>>>> e.g. used for bpf_copy_from_user_str(): >>>>>>>>> >>>>>>>>> /** >>>>>>>>> * bpf_copy_from_user_str() - Copy a string from an unsafe user address >>>>>>>>> * @dst: Destination address, in kernel space. This buffer must be >>>>>>>>> * at least @dst__sz bytes long. >>>>>>>>> * @dst__sz: Maximum number of bytes to copy, includes the trailing NUL. >>>>>>>>> * ... >>>>>>>>> */ >>>>>>>>> __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user *unsafe_ptr__ign, u64 flags) >>>>>>>>> >>>>>>>>> However, this suffix won't work for strnstr because of the arguments order. >>>>>>>> >>>>>>>> Stating the obvious... we don't need to keep the order exactly the same. >>>>>>>> >>>>>>>> Regarding all of these kfuncs... as Andrii pointed out 'const char *s' >>>>>>>> means that the verifier will check that 's' points to a valid byte. >>>>>>>> I think we can do a hybrid static + dynamic safety scheme here. >>>>>>>> All of the kfunc signatures can stay the same, but we'd have to >>>>>>>> open code all string helpers with __get_kernel_nofault() instead of >>>>>>>> direct memory access. >>>>>>>> Since the first byte is guaranteed to be valid by the verifier >>>>>>>> we only need to make sure that the s+N bytes won't cause page faults >>>>>>> >>>>>>> You mean to just check that s[N-1] can be read? Given a large enough >>>>>>> N, couldn't it be that some page between s[0] and s[N-1] still can be >>>>>>> unmapped, defeating this check? >>>>>> >>>>>> Just checking s[0] and s[N-1] is not enough, obviously, and especially, >>>>>> since the logic won't know where nul byte is, so N is unknown. >>>>>> I meant to that all of str* kfuncs will be reading all bytes >>>>>> via __get_kernel_nofault() until they find \0. >>>>> >>>>> Ah, ok, I see what you mean now. >>>>> >>>>>> It can be optimized to 8 byte access. >>>>>> The open coding (aka copy-paste) is unfortunate, of course. >>>>> >>>>> Yep, this sucks. >>>> >>>> Yeah, that's quite annoying. I really wanted to avoid doing that. Also, >>>> we won't be able to use arch-optimized versions of the functions. >>>> >>>> Just to make sure I understand things correctly - can we do what Eduard >>>> suggested and add explicit sizes for all arguments using the __sz >>>> suffix? So something like: >>>> >>>> const char *bpf_strnstr(const char *s1, u32 s1__sz, const char *s2, u32 s2__sz); >>> >>> That's ok-ish, but you probably want: >>> >>> const char *bpf_strnstr(void *s1, u32 s1__sz, void *s2, u32 s2__sz); >>> >>> and then to call strnstr() you still need to strnlen(s2, s2__sz). >>> >>> But a more general question... how always passing size will work >>> for bpftrace ? Does it always know the upper bound of storage where >>> strings are stored? >> >> Yes, it does. The strings must be read via the str() call (which >> internally calls bpf_probe_read_str) and there's an upper bound on the >> size of each string. > > which sounds like a bpftrace current limitation and not something to > depend on from kfunc design pov. > Wouldn't you want strings to be arbitrary length? Sure, there's just a lot of work to be done on that front... But I agree, it shouldn't be a limitation for the kfuncs. I just wanted to point out that if we agreed to only create kfuncs for bounded functions, bpftrace use-case would be (at least for the moment) covered. Anyways, it seems to me that both the bounded and the unbounded versions have their place. Would it be ok with you to open-code just the unbounded ones and call in-kernel implementations for the bounded ones? Or would you prefer to have everything unified (i.e. open-coded)? Also, just out of curiosity, what are the ways to create/obtain strings of unbounded length in BPF programs? Arguments of BTF-enabled program types (like fentry)? Any other examples? Because IIUC, when you read strings from kernel/userspace memory using bpf_probe_read_str, you always need to specify the size. > >>> I would think __get_kernel_nofault() approach is user friendlier. >> >> That's probably true but isn't there still the problem that strings are >> not necessarily null-terminated? And in such case, unbounded string >> functions may not terminate which is not allowed in BPF? > > kfuncs that are searching for nul via loop with __get_kernel_nofault() > would certainly need an upper bound. > Something like PATH_MAX (4k) or XATTR_SIZE_MAX (64k) > would cover 99.99% of use cases. Right, makes sense. Thanks! Viktor >
On Thu, Oct 3, 2024 at 12:37 PM Viktor Malik <vmalik@redhat.com> wrote: > > Anyways, it seems to me that both the bounded and the unbounded versions > have their place. Would it be ok with you to open-code just the > unbounded ones and call in-kernel implementations for the bounded ones? Right. Open coding unbounded ones is not a lot of code. We can copy paste from arch/x86/boot/string.c and replace pointer deref with __get_kernel_nofault(). No need to be fancy. The bounded ones should call into in-kernel bits that are optimized in asm. Documenting the difference in performance between bounded vs unbounded should be part of the patch. > Also, just out of curiosity, what are the ways to create/obtain strings > of unbounded length in BPF programs? Arguments of BTF-enabled program > types (like fentry)? Any other examples? Because IIUC, when you read > strings from kernel/userspace memory using bpf_probe_read_str, you > always need to specify the size. The main use case is argv/env processing in bpf-lsm programs. These strings are nul terminated and can be very large. Attackers use multi megabyte env vars to hide things. Folks push them into ringbuf and strstr() in user space as a workaround. Unbounded bpf_strstr() kfunc would be handy.
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 1a43d06eab28..daa19760d8c8 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -3004,6 +3004,61 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user return ret + 1; } +__bpf_kfunc int bpf_strcmp(const char *cs, const char *ct) +{ + return strcmp(cs, ct); +} + +__bpf_kfunc char *bpf_strchr(const char *s, int c) +{ + return strchr(s, c); +} + +__bpf_kfunc char *bpf_strrchr(const char *s, int c) +{ + return strrchr(s, c); +} + +__bpf_kfunc char *bpf_strnchr(const char *s, size_t count, int c) +{ + return strnchr(s, count, c); +} + +__bpf_kfunc char *bpf_strstr(const char *s1, const char *s2) +{ + return strstr(s1, s2); +} + +__bpf_kfunc char *bpf_strnstr(const char *s1, const char *s2, size_t len) +{ + return strnstr(s1, s2, len); +} + +__bpf_kfunc size_t bpf_strlen(const char *s) +{ + return strlen(s); +} + +__bpf_kfunc size_t bpf_strnlen(const char *s, size_t count) +{ + return strnlen(s, count); +} + +__bpf_kfunc char *bpf_strpbrk(const char *cs, const char *ct) +{ + return strpbrk(cs, ct); +} + +__bpf_kfunc size_t bpf_strspn(const char *s, const char *accept) +{ + return strspn(s, accept); +} + +__bpf_kfunc size_t bpf_strcspn(const char *s, const char *reject) +{ + return strcspn(s, reject); +} + __bpf_kfunc_end_defs(); BTF_KFUNCS_START(generic_btf_ids) @@ -3090,6 +3145,17 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW) BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE) +BTF_ID_FLAGS(func, bpf_strcmp) +BTF_ID_FLAGS(func, bpf_strchr) +BTF_ID_FLAGS(func, bpf_strrchr) +BTF_ID_FLAGS(func, bpf_strnchr) +BTF_ID_FLAGS(func, bpf_strstr) +BTF_ID_FLAGS(func, bpf_strnstr) +BTF_ID_FLAGS(func, bpf_strlen) +BTF_ID_FLAGS(func, bpf_strnlen) +BTF_ID_FLAGS(func, bpf_strpbrk) +BTF_ID_FLAGS(func, bpf_strspn) +BTF_ID_FLAGS(func, bpf_strcspn) BTF_KFUNCS_END(common_btf_ids) static const struct btf_kfunc_id_set common_kfunc_set = {
Kernel contains highly optimised implementation of traditional string operations. Expose them as kfuncs to allow BPF programs leverage the kernel implementation instead of needing to reimplement the operations. For now, add kfuncs for all string operations which do not copy memory around: strcmp, strchr, strrchr, strnchr, strstr, strnstr, strlen, strnlen, strpbrk, strspn, strcspn. Do not add strncmp as it is already exposed as a helper. Signed-off-by: Viktor Malik <vmalik@redhat.com> --- kernel/bpf/helpers.c | 66 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+)