Message ID | 20230510122045.2259-1-zegao@tencent.com (mailing list archive)
---|---
State | Changes Requested
Delegated to: | BPF
Headers | show
Series | bpf: reject blacklisted symbols in kprobe_multi to avoid recursive trap | expand
On 5/10/23 5:20 AM, Ze Gao wrote:
> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> however it does not take kprobe-blacklisted symbols into consideration,
> which likely introduces recursive traps and blows up the stack.
>
> This patch adds a simple check and removes those symbols that are in the
> kprobe blacklist from a fprobe during bpf_kprobe_multi_link_attach. Also,
> check_kprobe_address_safe is open for more future checks.
>
> Note that ftrace provides a recursion detection mechanism, but for kprobe
> only; we can directly reject those cases early without turning to ftrace.
>
> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---
>  kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 9a050e36dc6c..44c68bc06bbd 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
>  	return arr.mods_cnt;
>  }
>
> +static inline int check_kprobe_address_safe(unsigned long addr)
> +{
> +	if (within_kprobe_blacklist(addr))
> +		return -EINVAL;
> +	else
> +		return 0;
> +}
> +
> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> +{
> +	int i, cnt;
> +	char symname[KSYM_NAME_LEN];
> +
> +	for (i = 0; i < num; ++i) {
> +		if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> +			lookup_symbol_name(addrs[i], symname);
> +			pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);

So the user request cannot be fulfilled: a warning is issued, some of the
requested probes are discarded, and the rest are processed. That does not
sound like a good idea.

Maybe we should do the filtering in user space, e.g., in libbpf: check
/sys/kernel/debug/kprobes/blacklist and return an error earlier?
bpftrace/libbpf-tools/bcc-tools all do filtering before requesting
kprobes from the kernel.

> +			/* mark blacklisted symbol for remove */
> +			addrs[i] = 0;
> +		}
> +	}
> +
> +	/* remove blacklisted symbol from addrs */
> +	for (i = 0, cnt = 0; i < num; ++i) {
> +		if (addrs[i])
> +			addrs[cnt++] = addrs[i];
> +	}
> +
> +	return cnt;
> +}
> +
>  int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>  {
>  	struct bpf_kprobe_multi_link *link = NULL;
> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
>  	else
>  		link->fp.entry_handler = kprobe_multi_link_handler;
>
> +	cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> +	if (!cnt) {
> +		err = -EINVAL;
> +		goto error;
> +	}
> +
>  	link->addrs = addrs;
>  	link->cookies = cookies;
>  	link->cnt = cnt;
On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
> On 5/10/23 5:20 AM, Ze Gao wrote:

[... quoted patch trimmed ...]

> So user request cannot be fulfilled and a warning is issued and some
> of user requests are discarded and the rest is proceeded. Does not
> sound a good idea.
>
> Maybe we should do filtering in user space, e.g., in libbpf, check
> /sys/kernel/debug/kprobes/blacklist and return error
> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> requesting kprobe in the kernel.

also fprobe uses ftrace directly without going through the kprobe paths,
so I wonder whether some of the kprobe-blacklisted functions are actually
safe

jirka
On 5/10/23 10:27 AM, Jiri Olsa wrote:
> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
>> On 5/10/23 5:20 AM, Ze Gao wrote:

[... quoted patch trimmed ...]

>> Maybe we should do filtering in user space, e.g., in libbpf, check
>> /sys/kernel/debug/kprobes/blacklist and return error
>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
>> requesting kprobe in the kernel.
>
> also fprobe uses ftrace drectly without paths in kprobe, so I wonder
> some of the kprobe blacklisted functions are actually safe

Could you give a pointer about 'some of the kprobe blacklisted
functions are actually safe'?

> jirka
On 5/10/23 1:20 PM, Yonghong Song wrote:
> On 5/10/23 10:27 AM, Jiri Olsa wrote:
>> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:

[... quoted patch trimmed ...]

>> also fprobe uses ftrace drectly without paths in kprobe, so I wonder
>> some of the kprobe blacklisted functions are actually safe
>
> Could you give a pointer about 'some of the kprobe blacklisted
> functions are actually safe'?

Thanks Jiri for answering my question. It is not clear whether the
kprobe blacklist == the fprobe blacklist; probably not.

You mentioned:
    note that ftrace provides recursion detection mechanism,
    but for kprobe only
Maybe the right choice is to improve ftrace to provide a recursion
detection mechanism for fprobe as well?
I'm afraid filtering in user-space tools is not enough, because this is a
kernel BUG: it will reliably trigger a kernel crash if you run a command
like

    retsnoop -e 'pick_next_task_fair' -a ':kernel/sched/*.c' -vvv

which is caused by BPF_LINK_TYPE_KPROBE_MULTI accidentally attaching bpf
progs to preempt_count_{add, sub}, which in turn triggers a stack
overflow because the handler itself calls those functions.

Checking in libbpf is not enough when some tools use the bpf syscall
directly; we cannot cover all cases in user space. So a kernel-side
check is a must -- we cannot rely on users to not crash the kernel.

Thanks
Ze

On Wed, May 10, 2023 at 10:14 PM Yonghong Song <yhs@meta.com> wrote:
>
> On 5/10/23 5:20 AM, Ze Gao wrote:

[... quoted patch trimmed ...]

> So user request cannot be fulfilled and a warning is issued and some
> of user requests are discarded and the rest is proceeded. Does not
> sound a good idea.
>
> Maybe we should do filtering in user space, e.g., in libbpf, check
> /sys/kernel/debug/kprobes/blacklist and return error
> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> requesting kprobe in the kernel.
Thanks Yonghong for your sage reviews.

Yes, this is an option I am also considering; I will try it out later to
see if it works. But, as you said, it's not clear whether the kprobe
blacklist == the fprobe blacklist. There are also cases I need to
investigate, like how to avoid recursion when kprobes and fprobes are
mixed.

Rejecting kprobe-blacklisted symbols is kind of brute-force, yet a
straightforward way to avoid kernel crashes AFAIK.

Ze

On Thu, May 11, 2023 at 7:54 AM Yonghong Song <yhs@meta.com> wrote:
>
> On 5/10/23 1:20 PM, Yonghong Song wrote:

[... quoted thread trimmed ...]

> Thanks Jiri for answering my question. it is not clear whether
> kprobe blacklist == fprobe blacklist, probably not.
>
> You mentioned:
>     note that ftrace provides recursion detection mechanism,
>     but for kprobe only
> Maybe the right choice is to improve ftrace to provide recursion
> detection mechanism for fprobe as well?
I just looked through fprobe_handler; from the code, it already does a
recursion check. So the root cause of the crash I mentioned above may be
more complicated than what I read from the exception backtrace. It seems
more effort is needed to find a better solution than my initial
proposal. I will keep the thread updated if there is any progress.

Thanks
Ze

On Thu, May 11, 2023 at 9:24 AM Ze Gao <zegao2021@gmail.com> wrote:
>
> Thank yonghong for your sage reviews.
> Yes, this is an option I am also considering . I will try this out
> later to see if works
>
> But like you said it's not clear whether kprobe blacklist == fprobe blacklist.
> And also there are cases I need to investigate on, like how to avoid recursions
> when kprobes and fprobes are mixed.

[... quoted thread trimmed ...]
Yes, Jiri. Thanks for pointing it out. It's true that not all
kprobe-blacklisted functions should be banned from bpf_kprobe.

I tried some of them, and all the kprobe-blacklisted symbols I hooked
work fine except preempt_count_{sub, add}. So the takeaway here is that
preempt_count_{sub, add} must be rejected, at least for now, since
kprobe_multi_link_prog_run (i.e., the fprobe handler) and
rethook_trampoline_handler (i.e., the rethook handler) call
preempt_count_{sub, add}.

I'm considering providing a general fprobe_blacklist framework, just
like what kprobe does, to allow others to mark functions used inside a
fprobe handler or rethook handler as NOFPROBE, to avoid potential stack
recursion. But only after I figure out how ftrace handles recursion
currently and why it fails in the case I ran into.

Thanks
Ze

On Thu, May 11, 2023 at 1:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:

[... quoted thread trimmed ...]

> also fprobe uses ftrace drectly without paths in kprobe, so I wonder
> some of the kprobe blacklisted functions are actually safe
>
> jirka
On 5/11/23 10:53 PM, Ze Gao wrote:
> Yes, Jiri. Thanks for pointing it out. It's true that not all probe
> blacklisted functions should be banned from bpf_kprobe.
>
> I tried some of them, and all kprobe blacklisted symbols I hooked
> works fine except preempt_count_{sub, add}.
> so the takeaway here is preempt_cout_{sub, add} must be rejected at
> least for now since kprobe_multi_link_prog_run
> ( i.e., the fprobe handler) and rethook_trampoline_handler( i.e. the
> rethook handler) calls preempt_cout_{sub, add}.
>
> I'm considering providing a general fprobe_blacklist framework just
> like what kprobe does to allow others to mark
> functions used inside fprobe handler or rethook handler as NOFPROBE to
> avoid potential stack recursion. But only after
> I figure out how ftrace handles recursion problems currently and why
> it fails in the case I ran into.

A fprobe_blacklist might make sense indeed, as fprobe and kprobe are
quite different... Thanks for working on this.

[... quoted thread trimmed ...]
On Fri, May 12, 2023 at 07:29:02AM -0700, Yonghong Song wrote: > > > On 5/11/23 10:53 PM, Ze Gao wrote: > > Yes, Jiri. Thanks for pointing it out. It's true that not all probe > > blacklisted functions should be banned from bpf_kprobe. > > > > I tried some of them, and all kprobe blacklisted symbols I hooked > > works fine except preempt_count_{sub, add}. > > so the takeaway here is preempt_cout_{sub, add} must be rejected at > > least for now since kprobe_multi_link_prog_run > > ( i.e., the fprobe handler) and rethook_trampoline_handler( i.e. the > > rethook handler) calls preempt_cout_{sub, add}. check BTF_SET_START(btf_id_deny) list for functions that we do not allow to attach for tracing programs.. the direct ftrace interface used by trampolines has likely similar limitations as fptrace_ops API used by fprobe > > > > I'm considering providing a general fprobe_blacklist framework just > > like what kprobe does to allow others to mark > > functions used inside fprobe handler or rethook handler as NOFPROBE to > > avoid potential stack recursion. But only after > > I figure out how ftrace handles recursion problems currently and why > > it fails in the case I ran into. > > A fprobe_blacklist might make sense indeed as fprobe and kprobe are quite > different... Thanks for working on this. +1 jirka > > > > > Thanks > > Ze > > > > On Thu, May 11, 2023 at 1:28 AM Jiri Olsa <olsajiri@gmail.com> wrote: > > > > > > On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote: > > > > > > > > > > > > On 5/10/23 5:20 AM, Ze Gao wrote: > > > > > BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe, > > > > > however it does not takes those kprobe blacklisted into consideration, > > > > > which likely introduce recursive traps and blows up stacks. > > > > > > > > > > this patch adds simple check and remove those are in kprobe_blacklist > > > > > from one fprobe during bpf_kprobe_multi_link_attach. 
And also > > > > > check_kprobe_address_safe is open for more future checks. > > > > > > > > > > note that ftrace provides recursion detection mechanism, but for kprobe > > > > > only, we can directly reject those cases early without turning to ftrace. > > > > > > > > > > Signed-off-by: Ze Gao <zegao@tencent.com> > > > > > --- > > > > > kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++ > > > > > 1 file changed, 37 insertions(+) > > > > > > > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > > > > index 9a050e36dc6c..44c68bc06bbd 100644 > > > > > --- a/kernel/trace/bpf_trace.c > > > > > +++ b/kernel/trace/bpf_trace.c > > > > > @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3 > > > > > return arr.mods_cnt; > > > > > } > > > > > +static inline int check_kprobe_address_safe(unsigned long addr) > > > > > +{ > > > > > + if (within_kprobe_blacklist(addr)) > > > > > + return -EINVAL; > > > > > + else > > > > > + return 0; > > > > > +} > > > > > + > > > > > +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num) > > > > > +{ > > > > > + int i, cnt; > > > > > + char symname[KSYM_NAME_LEN]; > > > > > + > > > > > + for (i = 0; i < num; ++i) { > > > > > + if (check_kprobe_address_safe((unsigned long)addrs[i])) { > > > > > + lookup_symbol_name(addrs[i], symname); > > > > > + pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]); > > > > > > > > So user request cannot be fulfilled and a warning is issued and some > > > > of user requests are discarded and the rest is proceeded. Does not > > > > sound a good idea. > > > > > > > > Maybe we should do filtering in user space, e.g., in libbpf, check > > > > /sys/kernel/debug/kprobes/blacklist and return error > > > > earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before > > > > requesting kprobe in the kernel. 
> > > > > > also fprobe uses ftrace drectly without paths in kprobe, so I wonder > > > some of the kprobe blacklisted functions are actually safe > > > > > > jirka > > > > > > > > > > > > + /* mark blacklisted symbol for remove */ > > > > > + addrs[i] = 0; > > > > > + } > > > > > + } > > > > > + > > > > > + /* remove blacklisted symbol from addrs */ > > > > > + for (i = 0, cnt = 0; i < num; ++i) { > > > > > + if (addrs[i]) > > > > > + addrs[cnt++] = addrs[i]; > > > > > + } > > > > > + > > > > > + return cnt; > > > > > +} > > > > > + > > > > > int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) > > > > > { > > > > > struct bpf_kprobe_multi_link *link = NULL; > > > > > @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr > > > > > else > > > > > link->fp.entry_handler = kprobe_multi_link_handler; > > > > > + cnt = check_bpf_kprobe_addrs_safe(addrs, cnt); > > > > > + if (!cnt) { > > > > > + err = -EINVAL; > > > > > + goto error; > > > > > + } > > > > > + > > > > > link->addrs = addrs; > > > > > link->cookies = cookies; > > > > > link->cnt = cnt;
On Fri, 12 May 2023 07:29:02 -0700
Yonghong Song <yhs@meta.com> wrote:

> A fprobe_blacklist might make sense indeed as fprobe and kprobe are
> quite different... Thanks for working on this.

Hmm, I think I see the problem:

fprobe_kprobe_handler() {
  kprobe_busy_begin() {
    preempt_disable() {
      preempt_count_add() { <-- trace
        fprobe_kprobe_handler() {
          [ wash, rinse, repeat, CRASH!!! ]

Either kprobe_busy_begin() needs to use the preempt_disable_notrace()
versions, or fprobe_kprobe_handler() needs a
ftrace_test_recursion_trylock() call.

-- Steve
Exactly, and rethook_trampoline_handler suffers from the same problem. I've
posted two patches for kprobe and rethook that use the notrace version of
preempt_{disable, enable} to fix fprobe + rethook:

[1] https://lore.kernel.org/all/20230513081656.375846-1-zegao@tencent.com/T/#u
[2] https://lore.kernel.org/all/20230513090548.376522-1-zegao@tencent.com/T/#u

Even worse, the bpf callback introduces more such cases, since it is
typically organized as follows to guard the lifetime of bpf-related
resources (per-cpu access or the trampoline):

    migrate_disable()
    rcu_read_lock()
    ...
    bpf_prog_run()
    ...
    rcu_read_unlock()
    migrate_enable()

Solving such bugs for good may require introducing an fprobe blacklist and
a bpf_kprobe blacklist, just as Jiri and Yonghong suggested, because bpf
kprobe works at a different (higher, more constrained) level than fprobe
and ftrace, and we cannot blindly mark functions (migrate_disable,
__rcu_read_lock, etc.) used in tracer callbacks from external subsystems
without risking semantic breakage. I will try to implement these ideas
later.

Thanks,
Ze

On Sat, May 13, 2023 at 12:18 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 12 May 2023 07:29:02 -0700
> Yonghong Song <yhs@meta.com> wrote:
>
> > A fprobe_blacklist might make sense indeed as fprobe and kprobe are
> > quite different... Thanks for working on this.
>
> Hmm, I think I see the problem:
>
> fprobe_kprobe_handler() {
>   kprobe_busy_begin() {
>     preempt_disable() {
>       preempt_count_add() { <-- trace
>         fprobe_kprobe_handler() {
>           [ wash, rinse, repeat, CRASH!!! ]
>
> Either the kprobe_busy_begin() needs to use preempt_disable_notrace()
> versions, or fprobe_kprobe_handler() needs a
> ftrace_test_recursion_trylock() call.
>
> -- Steve
On 5/12/23 9:17 PM, Steven Rostedt wrote:
> On Fri, 12 May 2023 07:29:02 -0700
> Yonghong Song <yhs@meta.com> wrote:
>
>> A fprobe_blacklist might make sense indeed as fprobe and kprobe are
>> quite different... Thanks for working on this.
>
> Hmm, I think I see the problem:
>
> fprobe_kprobe_handler() {
>   kprobe_busy_begin() {
>     preempt_disable() {
>       preempt_count_add() { <-- trace
>         fprobe_kprobe_handler() {
>           [ wash, rinse, repeat, CRASH!!! ]
>
> Either the kprobe_busy_begin() needs to use preempt_disable_notrace()
> versions, or fprobe_kprobe_handler() needs a
> ftrace_test_recursion_trylock() call.

Currently, in the verifier we have:

BTF_SET_START(btf_id_deny)
BTF_ID_UNUSED
#ifdef CONFIG_SMP
BTF_ID(func, migrate_disable)
BTF_ID(func, migrate_enable)
#endif
#if !defined CONFIG_PREEMPT_RCU && !defined CONFIG_TINY_RCU
BTF_ID(func, rcu_read_unlock_strict)
#endif
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
BTF_ID(func, preempt_count_add)
BTF_ID(func, preempt_count_sub)
#endif
#ifdef CONFIG_PREEMPT_RCU
BTF_ID(func, __rcu_read_lock)
BTF_ID(func, __rcu_read_unlock)
#endif
BTF_SET_END(btf_id_deny)
...
	} else if (prog->type == BPF_PROG_TYPE_TRACING &&
		   btf_id_set_contains(&btf_id_deny, btf_id)) {
		return -EINVAL;
	}

Since we do not have an explicit deny list available to user space, the
above check prevents tracing programs (fentry, fexit, fmod_ret) from
attaching to those few functions. For the fprobe_kprobe case,
constructing a user-visible deny list would be best. Otherwise, we can
add a btf_id_deny_fprobe btf set, which should work too.

>
> -- Steve
Dear all,

On Thu, May 11, 2023 at 9:06 AM Ze Gao <zegao2021@gmail.com> wrote:
>
> I'm afraid filtering in user space tools is not enough, cause it's a kernel BUG.
>
> it would 100% trigger a kernel crash if you run cmd like
>
> retsnoop -e 'pick_next_task_fair' -a ':kernel/sched/*.c' -vvv
>
> which is caused by that BPF_LINK_TYPE_KPROBE_MULTI accidentally
> attaches bpf progs to preempt_count_{add, sub}, which in turn
> triggers stackoverflow because the handler itself calls those
> functions.

I managed to see the big picture of this problem by digging into the
code. Here is how it goes:

rethook_trampoline_handler {
  preempt_disable() {
    preempt_count_add() { <-- trace
      fprobe_handler() {
        ftrace_test_recursion_trylock
        ...
        ftrace_test_recursion_unlock  <-- it fails to detect the
            recursion caused by rethook routines
            (rethook_trampoline_handler precisely, for this case)
      }
      ...
      rethook_trampoline_handler {
        [ wash, rinse, repeat, CRASH!!! ]

There are some pitfalls here:

1. The fprobe exit callback should be guarded by the ftrace recursion
   check as well, since the user might call any traceable function,
   just like kprobe_multi_link_prog_run calls migrate_{disable, enable}.
   In this case, detection in fprobe_handler alone is not enough.

2. rethook_trampoline_handler should use preempt_{disable, enable}_notrace
   instead, because it runs beyond the recursion-free region guarded as
   in 1.

3. Many rethook helper functions are also used outside the recursion-free
   regions, and therefore they should be marked notrace.

I've posted a new series of patches to resolve the cases mentioned above:

[Link]: https://lkml.org/lkml/2023/5/14/452

In theory, bpf_kprobe, as the user of fprobe + rethook, is spared from
suffering recursion again by applying these. And

[Link]: https://lore.kernel.org/all/20230513090548.376522-1-zegao@tencent.com/T/#u

becomes optional.
That leaves one final question: whether we need an fprobe blacklist or a
bpf_kprobe blacklist depends on how we deal with a user request when one
of its expected hook points fails because recursion is detected. Do we
reject it up front via a blacklist, or let it silently fail to execute
the callback? This needs your gentle advice.

Regards,
Ze
On Sat, 13 May 2023 00:17:57 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 12 May 2023 07:29:02 -0700
> Yonghong Song <yhs@meta.com> wrote:
>
> > A fprobe_blacklist might make sense indeed as fprobe and kprobe are
> > quite different... Thanks for working on this.
>
> Hmm, I think I see the problem:
>
> fprobe_kprobe_handler() {
>   kprobe_busy_begin() {
>     preempt_disable() {
>       preempt_count_add() { <-- trace
>         fprobe_kprobe_handler() {
>           [ wash, rinse, repeat, CRASH!!! ]
>
> Either the kprobe_busy_begin() needs to use preempt_disable_notrace()
> versions, or fprobe_kprobe_handler() needs a
> ftrace_test_recursion_trylock() call.

Oops, I got it. Is preempt_count_add() traceable? If so, kprobe_busy_begin()
should be updated.

Thanks,

>
> -- Steve
On Thu, 11 May 2023 09:24:18 +0800 Ze Gao <zegao2021@gmail.com> wrote: > Thank yonghong for your sage reviews. > Yes, this is an option I am also considering . I will try this out > later to see if works > > But like you said it's not clear whether kprobe blacklist== fprobe blacklist. Just FYI, those are not the same. kprobe blacklist is functions marked by __kprobes or NOKPROBE_SYMBOL(), but fprobe blacklist is "notrace" functions. Thank you, > And also there are cases I need to investigate on, like how to avoid recursions > when kprobes and fprobes are mixed. > > Rejecting symbols kprobe_blacklisted is kinda brute-force yet a straight way to > avoid kernel crash AFAIK. > > Ze > > On Thu, May 11, 2023 at 7:54 AM Yonghong Song <yhs@meta.com> wrote: > > > > > > > > On 5/10/23 1:20 PM, Yonghong Song wrote: > > > > > > > > > On 5/10/23 10:27 AM, Jiri Olsa wrote: > > >> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote: > > >>> > > >>> > > >>> On 5/10/23 5:20 AM, Ze Gao wrote: > > >>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe, > > >>>> however it does not takes those kprobe blacklisted into consideration, > > >>>> which likely introduce recursive traps and blows up stacks. > > >>>> > > >>>> this patch adds simple check and remove those are in kprobe_blacklist > > >>>> from one fprobe during bpf_kprobe_multi_link_attach. And also > > >>>> check_kprobe_address_safe is open for more future checks. > > >>>> > > >>>> note that ftrace provides recursion detection mechanism, but for kprobe > > >>>> only, we can directly reject those cases early without turning to > > >>>> ftrace. 
> > >>>> > > >>>> Signed-off-by: Ze Gao <zegao@tencent.com> > > >>>> --- > > >>>> kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++ > > >>>> 1 file changed, 37 insertions(+) > > >>>> > > >>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > >>>> index 9a050e36dc6c..44c68bc06bbd 100644 > > >>>> --- a/kernel/trace/bpf_trace.c > > >>>> +++ b/kernel/trace/bpf_trace.c > > >>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct > > >>>> module ***mods, unsigned long *addrs, u3 > > >>>> return arr.mods_cnt; > > >>>> } > > >>>> +static inline int check_kprobe_address_safe(unsigned long addr) > > >>>> +{ > > >>>> + if (within_kprobe_blacklist(addr)) > > >>>> + return -EINVAL; > > >>>> + else > > >>>> + return 0; > > >>>> +} > > >>>> + > > >>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num) > > >>>> +{ > > >>>> + int i, cnt; > > >>>> + char symname[KSYM_NAME_LEN]; > > >>>> + > > >>>> + for (i = 0; i < num; ++i) { > > >>>> + if (check_kprobe_address_safe((unsigned long)addrs[i])) { > > >>>> + lookup_symbol_name(addrs[i], symname); > > >>>> + pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", > > >>>> symname, addrs[i]); > > >>> > > >>> So user request cannot be fulfilled and a warning is issued and some > > >>> of user requests are discarded and the rest is proceeded. Does not > > >>> sound a good idea. > > >>> > > >>> Maybe we should do filtering in user space, e.g., in libbpf, check > > >>> /sys/kernel/debug/kprobes/blacklist and return error > > >>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before > > >>> requesting kprobe in the kernel. > > >> > > >> also fprobe uses ftrace drectly without paths in kprobe, so I wonder > > >> some of the kprobe blacklisted functions are actually safe > > > > > > Could you give a pointer about 'some of the kprobe blacklisted > > > functions are actually safe'? > > > > Thanks Jiri for answering my question. 
it is not clear whether > > kprobe blacklist == fprobe blacklist, probably not. > > > > You mentioned: > > note that ftrace provides recursion detection mechanism, > > but for kprobe only > > Maybe the right choice is to improve ftrace to provide recursion > > detection mechanism for fprobe as well? > > > > > > > >> > > >> jirka > > >> > > >>> > > >>>> + /* mark blacklisted symbol for remove */ > > >>>> + addrs[i] = 0; > > >>>> + } > > >>>> + } > > >>>> + > > >>>> + /* remove blacklisted symbol from addrs */ > > >>>> + for (i = 0, cnt = 0; i < num; ++i) { > > >>>> + if (addrs[i]) > > >>>> + addrs[cnt++] = addrs[i]; > > >>>> + } > > >>>> + > > >>>> + return cnt; > > >>>> +} > > >>>> + > > >>>> int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, > > >>>> struct bpf_prog *prog) > > >>>> { > > >>>> struct bpf_kprobe_multi_link *link = NULL; > > >>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union > > >>>> bpf_attr *attr, struct bpf_prog *pr > > >>>> else > > >>>> link->fp.entry_handler = kprobe_multi_link_handler; > > >>>> + cnt = check_bpf_kprobe_addrs_safe(addrs, cnt); > > >>>> + if (!cnt) { > > >>>> + err = -EINVAL; > > >>>> + goto error; > > >>>> + } > > >>>> + > > >>>> link->addrs = addrs; > > >>>> link->cookies = cookies; > > >>>> link->cnt = cnt;
On Tue, 16 May 2023 13:31:53 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Sat, 13 May 2023 00:17:57 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Fri, 12 May 2023 07:29:02 -0700
> > Yonghong Song <yhs@meta.com> wrote:
> >
> > > A fprobe_blacklist might make sense indeed as fprobe and kprobe are
> > > quite different... Thanks for working on this.
> >
> > Hmm, I think I see the problem:
> >
> > fprobe_kprobe_handler() {
> >   kprobe_busy_begin() {
> >     preempt_disable() {
> >       preempt_count_add() { <-- trace
> >         fprobe_kprobe_handler() {
> >           [ wash, rinse, repeat, CRASH!!! ]
> >
> > Either the kprobe_busy_begin() needs to use preempt_disable_notrace()
> > versions, or fprobe_kprobe_handler() needs a
> > ftrace_test_recursion_trylock() call.
>
> Oops, I got it. Is preempt_count_add() traceable? If so, kprobe_busy_begin()
> should be updated.

OK, preempt_count_add() is NOKPROBE_SYMBOL(), so kprobe_busy_begin() should
be safe. The problem is in fprobe_kprobe_handler() then.

Thanks!

> Thanks,
>
> > -- Steve
>
> --
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
On Fri, 12 May 2023 07:29:02 -0700
Yonghong Song <yhs@meta.com> wrote:

>
> On 5/11/23 10:53 PM, Ze Gao wrote:
> > Yes, Jiri. Thanks for pointing it out. It's true that not all probe
> > blacklisted functions should be banned from bpf_kprobe.
> >
> > I tried some of them, and all kprobe blacklisted symbols I hooked
> > work fine except preempt_count_{sub, add}.
> > so the takeaway here is preempt_count_{sub, add} must be rejected at
> > least for now since kprobe_multi_link_prog_run
> > (i.e., the fprobe handler) and rethook_trampoline_handler (i.e. the
> > rethook handler) call preempt_count_{sub, add}.
> >
> > I'm considering providing a general fprobe_blacklist framework just
> > like what kprobe does to allow others to mark
> > functions used inside fprobe handler or rethook handler as NOFPROBE to
> > avoid potential stack recursion. But only after
> > I figure out how ftrace handles recursion problems currently and why
> > it fails in the case I ran into.
>
> A fprobe_blacklist might make sense indeed as fprobe and kprobe are
> quite different... Thanks for working on this.

No, I don't like fprobe_blacklist, because you can filter user-given
functions with <tracefs>/available_filter_functions :)
If a function is not listed there, you cannot put an fprobe on it.
IOW, kprobe_multi_link_prog_run only covers those functions (a white-list).

At the tooling side, it should check whether the probe is defined for a
single function or for multiple functions, and use the kprobe blacklist
(single) or available_filter_functions (multiple) accordingly.
Thank you, > > > > > Thanks > > Ze > > > > On Thu, May 11, 2023 at 1:28 AM Jiri Olsa <olsajiri@gmail.com> wrote: > >> > >> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote: > >>> > >>> > >>> On 5/10/23 5:20 AM, Ze Gao wrote: > >>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe, > >>>> however it does not takes those kprobe blacklisted into consideration, > >>>> which likely introduce recursive traps and blows up stacks. > >>>> > >>>> this patch adds simple check and remove those are in kprobe_blacklist > >>>> from one fprobe during bpf_kprobe_multi_link_attach. And also > >>>> check_kprobe_address_safe is open for more future checks. > >>>> > >>>> note that ftrace provides recursion detection mechanism, but for kprobe > >>>> only, we can directly reject those cases early without turning to ftrace. > >>>> > >>>> Signed-off-by: Ze Gao <zegao@tencent.com> > >>>> --- > >>>> kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++ > >>>> 1 file changed, 37 insertions(+) > >>>> > >>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > >>>> index 9a050e36dc6c..44c68bc06bbd 100644 > >>>> --- a/kernel/trace/bpf_trace.c > >>>> +++ b/kernel/trace/bpf_trace.c > >>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3 > >>>> return arr.mods_cnt; > >>>> } > >>>> +static inline int check_kprobe_address_safe(unsigned long addr) > >>>> +{ > >>>> + if (within_kprobe_blacklist(addr)) > >>>> + return -EINVAL; > >>>> + else > >>>> + return 0; > >>>> +} > >>>> + > >>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num) > >>>> +{ > >>>> + int i, cnt; > >>>> + char symname[KSYM_NAME_LEN]; > >>>> + > >>>> + for (i = 0; i < num; ++i) { > >>>> + if (check_kprobe_address_safe((unsigned long)addrs[i])) { > >>>> + lookup_symbol_name(addrs[i], symname); > >>>> + pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]); > >>> > >>> So user request cannot 
be fulfilled and a warning is issued and some > >>> of user requests are discarded and the rest is proceeded. Does not > >>> sound a good idea. > >>> > >>> Maybe we should do filtering in user space, e.g., in libbpf, check > >>> /sys/kernel/debug/kprobes/blacklist and return error > >>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before > >>> requesting kprobe in the kernel. > >> > >> also fprobe uses ftrace drectly without paths in kprobe, so I wonder > >> some of the kprobe blacklisted functions are actually safe > >> > >> jirka > >> > >>> > >>>> + /* mark blacklisted symbol for remove */ > >>>> + addrs[i] = 0; > >>>> + } > >>>> + } > >>>> + > >>>> + /* remove blacklisted symbol from addrs */ > >>>> + for (i = 0, cnt = 0; i < num; ++i) { > >>>> + if (addrs[i]) > >>>> + addrs[cnt++] = addrs[i]; > >>>> + } > >>>> + > >>>> + return cnt; > >>>> +} > >>>> + > >>>> int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) > >>>> { > >>>> struct bpf_kprobe_multi_link *link = NULL; > >>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr > >>>> else > >>>> link->fp.entry_handler = kprobe_multi_link_handler; > >>>> + cnt = check_bpf_kprobe_addrs_safe(addrs, cnt); > >>>> + if (!cnt) { > >>>> + err = -EINVAL; > >>>> + goto error; > >>>> + } > >>>> + > >>>> link->addrs = addrs; > >>>> link->cookies = cookies; > >>>> link->cnt = cnt;
On 5/15/23 10:49 PM, Masami Hiramatsu (Google) wrote: > On Fri, 12 May 2023 07:29:02 -0700 > Yonghong Song <yhs@meta.com> wrote: > >> >> >> On 5/11/23 10:53 PM, Ze Gao wrote: >>> Yes, Jiri. Thanks for pointing it out. It's true that not all probe >>> blacklisted functions should be banned from bpf_kprobe. >>> >>> I tried some of them, and all kprobe blacklisted symbols I hooked >>> works fine except preempt_count_{sub, add}. >>> so the takeaway here is preempt_cout_{sub, add} must be rejected at >>> least for now since kprobe_multi_link_prog_run >>> ( i.e., the fprobe handler) and rethook_trampoline_handler( i.e. the >>> rethook handler) calls preempt_cout_{sub, add}. >>> >>> I'm considering providing a general fprobe_blacklist framework just >>> like what kprobe does to allow others to mark >>> functions used inside fprobe handler or rethook handler as NOFPROBE to >>> avoid potential stack recursion. But only after >>> I figure out how ftrace handles recursion problems currently and why >>> it fails in the case I ran into. >> >> A fprobe_blacklist might make sense indeed as fprobe and kprobe are >> quite different... Thanks for working on this. > > No, I don't like fprobe_blacklist, because you can filter user given > function with <tracefs>/available_filter_functions :) > If the function is not listed there, you can not put fprobe on it. > IOW, kprobe_multi_link_prog_run only covers those functions. (white-list) > > At the tooling side, it should check whether the probe is defined for > single function or multiple functions, and use kprobe-blacklist (single) > or available_filter_functions (multiple). Thanks for clarification. So basically fprobe blacklist is similar to fentry, not able to trace functions marked with notrace. So agree, the checking scheme could be: - user space to check available_filter_functions - a few other tracable functions but may have recursion effect handled by infrastructure for fprobe case. 
fentry case is already covered by verifier to deny a few functions like preempt_count_sub etc. > > Thank you, > >> >>> >>> Thanks >>> Ze >>> >>> On Thu, May 11, 2023 at 1:28 AM Jiri Olsa <olsajiri@gmail.com> wrote: >>>> >>>> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote: >>>>> >>>>> >>>>> On 5/10/23 5:20 AM, Ze Gao wrote: >>>>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe, >>>>>> however it does not takes those kprobe blacklisted into consideration, >>>>>> which likely introduce recursive traps and blows up stacks. >>>>>> >>>>>> this patch adds simple check and remove those are in kprobe_blacklist >>>>>> from one fprobe during bpf_kprobe_multi_link_attach. And also >>>>>> check_kprobe_address_safe is open for more future checks. >>>>>> >>>>>> note that ftrace provides recursion detection mechanism, but for kprobe >>>>>> only, we can directly reject those cases early without turning to ftrace. >>>>>> >>>>>> Signed-off-by: Ze Gao <zegao@tencent.com> >>>>>> --- >>>>>> kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++ >>>>>> 1 file changed, 37 insertions(+) >>>>>> >>>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c >>>>>> index 9a050e36dc6c..44c68bc06bbd 100644 >>>>>> --- a/kernel/trace/bpf_trace.c >>>>>> +++ b/kernel/trace/bpf_trace.c >>>>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3 >>>>>> return arr.mods_cnt; >>>>>> } >>>>>> +static inline int check_kprobe_address_safe(unsigned long addr) >>>>>> +{ >>>>>> + if (within_kprobe_blacklist(addr)) >>>>>> + return -EINVAL; >>>>>> + else >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num) >>>>>> +{ >>>>>> + int i, cnt; >>>>>> + char symname[KSYM_NAME_LEN]; >>>>>> + >>>>>> + for (i = 0; i < num; ++i) { >>>>>> + if (check_kprobe_address_safe((unsigned long)addrs[i])) { >>>>>> + lookup_symbol_name(addrs[i], symname); >>>>>> + 
pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]); >>>>> >>>>> So user request cannot be fulfilled and a warning is issued and some >>>>> of user requests are discarded and the rest is proceeded. Does not >>>>> sound a good idea. >>>>> >>>>> Maybe we should do filtering in user space, e.g., in libbpf, check >>>>> /sys/kernel/debug/kprobes/blacklist and return error >>>>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before >>>>> requesting kprobe in the kernel. >>>> >>>> also fprobe uses ftrace drectly without paths in kprobe, so I wonder >>>> some of the kprobe blacklisted functions are actually safe >>>> >>>> jirka >>>> >>>>> >>>>>> + /* mark blacklisted symbol for remove */ >>>>>> + addrs[i] = 0; >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + /* remove blacklisted symbol from addrs */ >>>>>> + for (i = 0, cnt = 0; i < num; ++i) { >>>>>> + if (addrs[i]) >>>>>> + addrs[cnt++] = addrs[i]; >>>>>> + } >>>>>> + >>>>>> + return cnt; >>>>>> +} >>>>>> + >>>>>> int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) >>>>>> { >>>>>> struct bpf_kprobe_multi_link *link = NULL; >>>>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr >>>>>> else >>>>>> link->fp.entry_handler = kprobe_multi_link_handler; >>>>>> + cnt = check_bpf_kprobe_addrs_safe(addrs, cnt); >>>>>> + if (!cnt) { >>>>>> + err = -EINVAL; >>>>>> + goto error; >>>>>> + } >>>>>> + >>>>>> link->addrs = addrs; >>>>>> link->cookies = cookies; >>>>>> link->cnt = cnt; > >
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9a050e36dc6c..44c68bc06bbd 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
 	return arr.mods_cnt;
 }
 
+static inline int check_kprobe_address_safe(unsigned long addr)
+{
+	if (within_kprobe_blacklist(addr))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
+{
+	int i, cnt;
+	char symname[KSYM_NAME_LEN];
+
+	for (i = 0; i < num; ++i) {
+		if (check_kprobe_address_safe((unsigned long)addrs[i])) {
+			lookup_symbol_name(addrs[i], symname);
+			pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
+			/* mark blacklisted symbol for remove */
+			addrs[i] = 0;
+		}
+	}
+
+	/* remove blacklisted symbol from addrs */
+	for (i = 0, cnt = 0; i < num; ++i) {
+		if (addrs[i])
+			addrs[cnt++] = addrs[i];
+	}
+
+	return cnt;
+}
+
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	struct bpf_kprobe_multi_link *link = NULL;
@@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 	else
 		link->fp.entry_handler = kprobe_multi_link_handler;
 
+	cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
+	if (!cnt) {
+		err = -EINVAL;
+		goto error;
+	}
+
 	link->addrs = addrs;
 	link->cookies = cookies;
 	link->cnt = cnt;
BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe;
however, it does not take kprobe-blacklisted symbols into consideration,
which likely introduces recursive traps and blows up the stack.

This patch adds a simple check and removes addresses that are in the
kprobe blacklist from a fprobe during bpf_kprobe_multi_link_attach.
check_kprobe_address_safe is also open for more future checks.

Note that ftrace provides a recursion detection mechanism, but for
kprobes only; we can directly reject those cases early without turning
to ftrace.

Signed-off-by: Ze Gao <zegao@tencent.com>
---
 kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)