
bpf: reject blacklisted symbols in kprobe_multi to avoid recursive trap

Message ID 20230510122045.2259-1-zegao@tencent.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series bpf: reject blacklisted symbols in kprobe_multi to avoid recursive trap

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 23 this patch: 23
netdev/cc_maintainers success CCed 15 of 15 maintainers
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 23 this patch: 23
netdev/checkpatch warning WARNING: From:/Signed-off-by: email address mismatch: 'From: Ze Gao <zegao2021@gmail.com>' != 'Signed-off-by: Ze Gao <zegao@tencent.com>' WARNING: line length of 93 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline fail Was 0 now: 1
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-36 success Logs for veristat
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_parallel on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_verifier on aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on s390x with gcc

Commit Message

Ze Gao May 10, 2023, 12:20 p.m. UTC
BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
however it does not take kprobe-blacklisted symbols into consideration,
which can introduce recursive traps and blow up the stack.

This patch adds a simple check and removes the addresses that are in the
kprobe blacklist from a fprobe during bpf_kprobe_multi_link_attach.
check_kprobe_address_safe is also left open for more checks in the future.

Note that ftrace provides a recursion detection mechanism, but only for
kprobes; we can directly reject those cases early without turning to ftrace.

Signed-off-by: Ze Gao <zegao@tencent.com>
---
 kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

Comments

Yonghong Song May 10, 2023, 2:13 p.m. UTC | #1
On 5/10/23 5:20 AM, Ze Gao wrote:
> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> however it does not takes those kprobe blacklisted into consideration,
> which likely introduce recursive traps and blows up stacks.
> 
> this patch adds simple check and remove those are in kprobe_blacklist
> from one fprobe during bpf_kprobe_multi_link_attach. And also
> check_kprobe_address_safe is open for more future checks.
> 
> note that ftrace provides recursion detection mechanism, but for kprobe
> only, we can directly reject those cases early without turning to ftrace.
> 
> Signed-off-by: Ze Gao <zegao@tencent.com>
> ---
>   kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
>   1 file changed, 37 insertions(+)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 9a050e36dc6c..44c68bc06bbd 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
>   	return arr.mods_cnt;
>   }
>   
> +static inline int check_kprobe_address_safe(unsigned long addr)
> +{
> +	if (within_kprobe_blacklist(addr))
> +		return -EINVAL;
> +	else
> +		return 0;
> +}
> +
> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> +{
> +	int i, cnt;
> +	char symname[KSYM_NAME_LEN];
> +
> +	for (i = 0; i < num; ++i) {
> +		if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> +			lookup_symbol_name(addrs[i], symname);
> +			pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);

So the user request cannot be fully fulfilled: a warning is issued, some
of the requested probes are discarded, and the rest proceed. That does
not sound like a good idea.

Maybe we should do the filtering in user space, e.g., in libbpf: check
/sys/kernel/debug/kprobes/blacklist and return an error earlier?
bpftrace/libbpf-tools/bcc-tools all do filtering before requesting
kprobes from the kernel.
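
For what it's worth, a minimal user-space sketch of that kind of filtering,
assuming the usual "0x<start>-0x<end>\t<symbol>" line format of the debugfs
blacklist file (the helper name is made up for the example, it is not an
existing libbpf API):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if @sym is listed in /sys/kernel/debug/kprobes/blacklist.
 * Each line is assumed to look like "0x<start>-0x<end>\t<symbol>".
 */
static bool sym_is_kprobe_blacklisted(const char *sym)
{
	FILE *f = fopen("/sys/kernel/debug/kprobes/blacklist", "r");
	char line[256], name[128];
	bool hit = false;

	if (!f)
		return false;	/* debugfs not mounted: let the kernel decide */

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "%*s %127s", name) == 1 &&
		    strcmp(name, sym) == 0) {
			hit = true;
			break;
		}
	}
	fclose(f);
	return hit;
}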

> +			/* mark blacklisted symbol for remove */
> +			addrs[i] = 0;
> +		}
> +	}
> +
> +	/* remove blacklisted symbol from addrs */
> +	for (i = 0, cnt = 0; i < num; ++i) {
> +		if (addrs[i])
> +			addrs[cnt++]  = addrs[i];
> +	}
> +
> +	return cnt;
> +}
> +
>   int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>   {
>   	struct bpf_kprobe_multi_link *link = NULL;
> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
>   	else
>   		link->fp.entry_handler = kprobe_multi_link_handler;
>   
> +	cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> +	if (!cnt) {
> +		err = -EINVAL;
> +		goto error;
> +	}
> +
>   	link->addrs = addrs;
>   	link->cookies = cookies;
>   	link->cnt = cnt;
Jiri Olsa May 10, 2023, 5:27 p.m. UTC | #2
On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
> 
> 
> On 5/10/23 5:20 AM, Ze Gao wrote:
> > BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> > however it does not takes those kprobe blacklisted into consideration,
> > which likely introduce recursive traps and blows up stacks.
> > 
> > this patch adds simple check and remove those are in kprobe_blacklist
> > from one fprobe during bpf_kprobe_multi_link_attach. And also
> > check_kprobe_address_safe is open for more future checks.
> > 
> > note that ftrace provides recursion detection mechanism, but for kprobe
> > only, we can directly reject those cases early without turning to ftrace.
> > 
> > Signed-off-by: Ze Gao <zegao@tencent.com>
> > ---
> >   kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> >   1 file changed, 37 insertions(+)
> > 
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 9a050e36dc6c..44c68bc06bbd 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
> >   	return arr.mods_cnt;
> >   }
> > +static inline int check_kprobe_address_safe(unsigned long addr)
> > +{
> > +	if (within_kprobe_blacklist(addr))
> > +		return -EINVAL;
> > +	else
> > +		return 0;
> > +}
> > +
> > +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> > +{
> > +	int i, cnt;
> > +	char symname[KSYM_NAME_LEN];
> > +
> > +	for (i = 0; i < num; ++i) {
> > +		if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> > +			lookup_symbol_name(addrs[i], symname);
> > +			pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
> 
> So user request cannot be fulfilled and a warning is issued and some
> of user requests are discarded and the rest is proceeded. Does not
> sound a good idea.
> 
> Maybe we should do filtering in user space, e.g., in libbpf, check
> /sys/kernel/debug/kprobes/blacklist and return error
> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> requesting kprobe in the kernel.

also fprobe uses ftrace directly without going through the kprobe paths,
so I wonder whether some of the kprobe blacklisted functions are actually
safe
jirka

> 
> > +			/* mark blacklisted symbol for remove */
> > +			addrs[i] = 0;
> > +		}
> > +	}
> > +
> > +	/* remove blacklisted symbol from addrs */
> > +	for (i = 0, cnt = 0; i < num; ++i) {
> > +		if (addrs[i])
> > +			addrs[cnt++]  = addrs[i];
> > +	}
> > +
> > +	return cnt;
> > +}
> > +
> >   int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> >   {
> >   	struct bpf_kprobe_multi_link *link = NULL;
> > @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> >   	else
> >   		link->fp.entry_handler = kprobe_multi_link_handler;
> > +	cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> > +	if (!cnt) {
> > +		err = -EINVAL;
> > +		goto error;
> > +	}
> > +
> >   	link->addrs = addrs;
> >   	link->cookies = cookies;
> >   	link->cnt = cnt;
Yonghong Song May 10, 2023, 8:20 p.m. UTC | #3
On 5/10/23 10:27 AM, Jiri Olsa wrote:
> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
>>
>>
>> On 5/10/23 5:20 AM, Ze Gao wrote:
>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
>>> however it does not takes those kprobe blacklisted into consideration,
>>> which likely introduce recursive traps and blows up stacks.
>>>
>>> this patch adds simple check and remove those are in kprobe_blacklist
>>> from one fprobe during bpf_kprobe_multi_link_attach. And also
>>> check_kprobe_address_safe is open for more future checks.
>>>
>>> note that ftrace provides recursion detection mechanism, but for kprobe
>>> only, we can directly reject those cases early without turning to ftrace.
>>>
>>> Signed-off-by: Ze Gao <zegao@tencent.com>
>>> ---
>>>    kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 37 insertions(+)
>>>
>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>> index 9a050e36dc6c..44c68bc06bbd 100644
>>> --- a/kernel/trace/bpf_trace.c
>>> +++ b/kernel/trace/bpf_trace.c
>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
>>>    	return arr.mods_cnt;
>>>    }
>>> +static inline int check_kprobe_address_safe(unsigned long addr)
>>> +{
>>> +	if (within_kprobe_blacklist(addr))
>>> +		return -EINVAL;
>>> +	else
>>> +		return 0;
>>> +}
>>> +
>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
>>> +{
>>> +	int i, cnt;
>>> +	char symname[KSYM_NAME_LEN];
>>> +
>>> +	for (i = 0; i < num; ++i) {
>>> +		if (check_kprobe_address_safe((unsigned long)addrs[i])) {
>>> +			lookup_symbol_name(addrs[i], symname);
>>> +			pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
>>
>> So user request cannot be fulfilled and a warning is issued and some
>> of user requests are discarded and the rest is proceeded. Does not
>> sound a good idea.
>>
>> Maybe we should do filtering in user space, e.g., in libbpf, check
>> /sys/kernel/debug/kprobes/blacklist and return error
>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
>> requesting kprobe in the kernel.
> 
> also fprobe uses ftrace drectly without paths in kprobe, so I wonder
> some of the kprobe blacklisted functions are actually safe

Could you give a pointer about 'some of the kprobe blacklisted
functions are actually safe'?

> 
> jirka
> 
>>
>>> +			/* mark blacklisted symbol for remove */
>>> +			addrs[i] = 0;
>>> +		}
>>> +	}
>>> +
>>> +	/* remove blacklisted symbol from addrs */
>>> +	for (i = 0, cnt = 0; i < num; ++i) {
>>> +		if (addrs[i])
>>> +			addrs[cnt++]  = addrs[i];
>>> +	}
>>> +
>>> +	return cnt;
>>> +}
>>> +
>>>    int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>>    {
>>>    	struct bpf_kprobe_multi_link *link = NULL;
>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
>>>    	else
>>>    		link->fp.entry_handler = kprobe_multi_link_handler;
>>> +	cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
>>> +	if (!cnt) {
>>> +		err = -EINVAL;
>>> +		goto error;
>>> +	}
>>> +
>>>    	link->addrs = addrs;
>>>    	link->cookies = cookies;
>>>    	link->cnt = cnt;
Yonghong Song May 10, 2023, 11:54 p.m. UTC | #4
On 5/10/23 1:20 PM, Yonghong Song wrote:
> 
> 
> On 5/10/23 10:27 AM, Jiri Olsa wrote:
>> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
>>>
>>>
>>> On 5/10/23 5:20 AM, Ze Gao wrote:
>>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
>>>> however it does not takes those kprobe blacklisted into consideration,
>>>> which likely introduce recursive traps and blows up stacks.
>>>>
>>>> this patch adds simple check and remove those are in kprobe_blacklist
>>>> from one fprobe during bpf_kprobe_multi_link_attach. And also
>>>> check_kprobe_address_safe is open for more future checks.
>>>>
>>>> note that ftrace provides recursion detection mechanism, but for kprobe
>>>> only, we can directly reject those cases early without turning to 
>>>> ftrace.
>>>>
>>>> Signed-off-by: Ze Gao <zegao@tencent.com>
>>>> ---
>>>>    kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 37 insertions(+)
>>>>
>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>>> index 9a050e36dc6c..44c68bc06bbd 100644
>>>> --- a/kernel/trace/bpf_trace.c
>>>> +++ b/kernel/trace/bpf_trace.c
>>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct 
>>>> module ***mods, unsigned long *addrs, u3
>>>>        return arr.mods_cnt;
>>>>    }
>>>> +static inline int check_kprobe_address_safe(unsigned long addr)
>>>> +{
>>>> +    if (within_kprobe_blacklist(addr))
>>>> +        return -EINVAL;
>>>> +    else
>>>> +        return 0;
>>>> +}
>>>> +
>>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
>>>> +{
>>>> +    int i, cnt;
>>>> +    char symname[KSYM_NAME_LEN];
>>>> +
>>>> +    for (i = 0; i < num; ++i) {
>>>> +        if (check_kprobe_address_safe((unsigned long)addrs[i])) {
>>>> +            lookup_symbol_name(addrs[i], symname);
>>>> +            pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", 
>>>> symname, addrs[i]);
>>>
>>> So user request cannot be fulfilled and a warning is issued and some
>>> of user requests are discarded and the rest is proceeded. Does not
>>> sound a good idea.
>>>
>>> Maybe we should do filtering in user space, e.g., in libbpf, check
>>> /sys/kernel/debug/kprobes/blacklist and return error
>>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
>>> requesting kprobe in the kernel.
>>
>> also fprobe uses ftrace drectly without paths in kprobe, so I wonder
>> some of the kprobe blacklisted functions are actually safe
> 
> Could you give a pointer about 'some of the kprobe blacklisted
> functions are actually safe'?

Thanks, Jiri, for answering my question. It is not clear whether the
kprobe blacklist == the fprobe blacklist, probably not.

You mentioned:
   note that ftrace provides recursion detection mechanism,
   but for kprobe only
Maybe the right choice is to improve ftrace to provide a recursion
detection mechanism for fprobe as well?

> 
>>
>> jirka
>>
>>>
>>>> +            /* mark blacklisted symbol for remove */
>>>> +            addrs[i] = 0;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    /* remove blacklisted symbol from addrs */
>>>> +    for (i = 0, cnt = 0; i < num; ++i) {
>>>> +        if (addrs[i])
>>>> +            addrs[cnt++]  = addrs[i];
>>>> +    }
>>>> +
>>>> +    return cnt;
>>>> +}
>>>> +
>>>>    int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, 
>>>> struct bpf_prog *prog)
>>>>    {
>>>>        struct bpf_kprobe_multi_link *link = NULL;
>>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union 
>>>> bpf_attr *attr, struct bpf_prog *pr
>>>>        else
>>>>            link->fp.entry_handler = kprobe_multi_link_handler;
>>>> +    cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
>>>> +    if (!cnt) {
>>>> +        err = -EINVAL;
>>>> +        goto error;
>>>> +    }
>>>> +
>>>>        link->addrs = addrs;
>>>>        link->cookies = cookies;
>>>>        link->cnt = cnt;
Ze Gao May 11, 2023, 1:06 a.m. UTC | #5
I'm afraid filtering in user-space tools is not enough, because it's a
kernel BUG.

It would 100% trigger a kernel crash if you run a command like

retsnoop -e 'pick_next_task_fair' -a ':kernel/sched/*.c' -vvv

which is caused by BPF_LINK_TYPE_KPROBE_MULTI accidentally attaching bpf
progs to preempt_count_{add, sub}, which in turn triggers a stack
overflow because the handler itself calls those functions.
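
For illustration, even a trivial kprobe.multi program with a broad wildcard
ends up attached to those helpers; a minimal sketch (the glob below is an
assumption for the example, not the exact pattern retsnoop generates):

// SPDX-License-Identifier: GPL-2.0
/* kprobe_multi_repro.bpf.c -- illustrative only */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char LICENSE[] SEC("license") = "GPL";

/* A wildcard like this resolves to preempt_count_add()/preempt_count_sub(),
 * and the fprobe handler then recurses through those very functions.
 */
SEC("kprobe.multi/preempt_count_*")
int empty_probe(void *ctx)
{
	return 0;
}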

Checking in libbpf is not enough since some tools use the bpf syscall
directly; we cannot cover all cases.

So a kernel-side check is a must; we cannot rely on users not to crash
the kernel.

Thanks
Ze

On Wed, May 10, 2023 at 10:14 PM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 5/10/23 5:20 AM, Ze Gao wrote:
> > BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> > however it does not takes those kprobe blacklisted into consideration,
> > which likely introduce recursive traps and blows up stacks.
> >
> > this patch adds simple check and remove those are in kprobe_blacklist
> > from one fprobe during bpf_kprobe_multi_link_attach. And also
> > check_kprobe_address_safe is open for more future checks.
> >
> > note that ftrace provides recursion detection mechanism, but for kprobe
> > only, we can directly reject those cases early without turning to ftrace.
> >
> > Signed-off-by: Ze Gao <zegao@tencent.com>
> > ---
> >   kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> >   1 file changed, 37 insertions(+)
> >
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 9a050e36dc6c..44c68bc06bbd 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
> >       return arr.mods_cnt;
> >   }
> >
> > +static inline int check_kprobe_address_safe(unsigned long addr)
> > +{
> > +     if (within_kprobe_blacklist(addr))
> > +             return -EINVAL;
> > +     else
> > +             return 0;
> > +}
> > +
> > +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> > +{
> > +     int i, cnt;
> > +     char symname[KSYM_NAME_LEN];
> > +
> > +     for (i = 0; i < num; ++i) {
> > +             if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> > +                     lookup_symbol_name(addrs[i], symname);
> > +                     pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
>
> So user request cannot be fulfilled and a warning is issued and some
> of user requests are discarded and the rest is proceeded. Does not
> sound a good idea.
>
> Maybe we should do filtering in user space, e.g., in libbpf, check
> /sys/kernel/debug/kprobes/blacklist and return error
> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> requesting kprobe in the kernel.
>
> > +                     /* mark blacklisted symbol for remove */
> > +                     addrs[i] = 0;
> > +             }
> > +     }
> > +
> > +     /* remove blacklisted symbol from addrs */
> > +     for (i = 0, cnt = 0; i < num; ++i) {
> > +             if (addrs[i])
> > +                     addrs[cnt++]  = addrs[i];
> > +     }
> > +
> > +     return cnt;
> > +}
> > +
> >   int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> >   {
> >       struct bpf_kprobe_multi_link *link = NULL;
> > @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> >       else
> >               link->fp.entry_handler = kprobe_multi_link_handler;
> >
> > +     cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> > +     if (!cnt) {
> > +             err = -EINVAL;
> > +             goto error;
> > +     }
> > +
> >       link->addrs = addrs;
> >       link->cookies = cookies;
> >       link->cnt = cnt;
Ze Gao May 11, 2023, 1:24 a.m. UTC | #6
Thanks, Yonghong, for your sage reviews. Yes, this is an option I am also
considering. I will try this out later to see if it works.

But like you said, it's not clear whether the kprobe blacklist == the
fprobe blacklist. And there are also cases I need to investigate, like
how to avoid recursion when kprobes and fprobes are mixed.

Rejecting kprobe-blacklisted symbols is kind of brute-force, yet a
straightforward way to avoid a kernel crash AFAIK.

Ze

On Thu, May 11, 2023 at 7:54 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 5/10/23 1:20 PM, Yonghong Song wrote:
> >
> >
> > On 5/10/23 10:27 AM, Jiri Olsa wrote:
> >> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
> >>>
> >>>
> >>> On 5/10/23 5:20 AM, Ze Gao wrote:
> >>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> >>>> however it does not takes those kprobe blacklisted into consideration,
> >>>> which likely introduce recursive traps and blows up stacks.
> >>>>
> >>>> this patch adds simple check and remove those are in kprobe_blacklist
> >>>> from one fprobe during bpf_kprobe_multi_link_attach. And also
> >>>> check_kprobe_address_safe is open for more future checks.
> >>>>
> >>>> note that ftrace provides recursion detection mechanism, but for kprobe
> >>>> only, we can directly reject those cases early without turning to
> >>>> ftrace.
> >>>>
> >>>> Signed-off-by: Ze Gao <zegao@tencent.com>
> >>>> ---
> >>>>    kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> >>>>    1 file changed, 37 insertions(+)
> >>>>
> >>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> >>>> index 9a050e36dc6c..44c68bc06bbd 100644
> >>>> --- a/kernel/trace/bpf_trace.c
> >>>> +++ b/kernel/trace/bpf_trace.c
> >>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct
> >>>> module ***mods, unsigned long *addrs, u3
> >>>>        return arr.mods_cnt;
> >>>>    }
> >>>> +static inline int check_kprobe_address_safe(unsigned long addr)
> >>>> +{
> >>>> +    if (within_kprobe_blacklist(addr))
> >>>> +        return -EINVAL;
> >>>> +    else
> >>>> +        return 0;
> >>>> +}
> >>>> +
> >>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> >>>> +{
> >>>> +    int i, cnt;
> >>>> +    char symname[KSYM_NAME_LEN];
> >>>> +
> >>>> +    for (i = 0; i < num; ++i) {
> >>>> +        if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> >>>> +            lookup_symbol_name(addrs[i], symname);
> >>>> +            pr_warn("bpf_kprobe: %s at %lx is blacklisted\n",
> >>>> symname, addrs[i]);
> >>>
> >>> So user request cannot be fulfilled and a warning is issued and some
> >>> of user requests are discarded and the rest is proceeded. Does not
> >>> sound a good idea.
> >>>
> >>> Maybe we should do filtering in user space, e.g., in libbpf, check
> >>> /sys/kernel/debug/kprobes/blacklist and return error
> >>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> >>> requesting kprobe in the kernel.
> >>
> >> also fprobe uses ftrace drectly without paths in kprobe, so I wonder
> >> some of the kprobe blacklisted functions are actually safe
> >
> > Could you give a pointer about 'some of the kprobe blacklisted
> > functions are actually safe'?
>
> Thanks Jiri for answering my question. it is not clear whether
> kprobe blacklist == fprobe blacklist, probably not.
>
> You mentioned:
>    note that ftrace provides recursion detection mechanism,
>    but for kprobe only
> Maybe the right choice is to improve ftrace to provide recursion
> detection mechanism for fprobe as well?
>
> >
> >>
> >> jirka
> >>
> >>>
> >>>> +            /* mark blacklisted symbol for remove */
> >>>> +            addrs[i] = 0;
> >>>> +        }
> >>>> +    }
> >>>> +
> >>>> +    /* remove blacklisted symbol from addrs */
> >>>> +    for (i = 0, cnt = 0; i < num; ++i) {
> >>>> +        if (addrs[i])
> >>>> +            addrs[cnt++]  = addrs[i];
> >>>> +    }
> >>>> +
> >>>> +    return cnt;
> >>>> +}
> >>>> +
> >>>>    int bpf_kprobe_multi_link_attach(const union bpf_attr *attr,
> >>>> struct bpf_prog *prog)
> >>>>    {
> >>>>        struct bpf_kprobe_multi_link *link = NULL;
> >>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union
> >>>> bpf_attr *attr, struct bpf_prog *pr
> >>>>        else
> >>>>            link->fp.entry_handler = kprobe_multi_link_handler;
> >>>> +    cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> >>>> +    if (!cnt) {
> >>>> +        err = -EINVAL;
> >>>> +        goto error;
> >>>> +    }
> >>>> +
> >>>>        link->addrs = addrs;
> >>>>        link->cookies = cookies;
> >>>>        link->cnt = cnt;
Ze Gao May 11, 2023, 2:06 a.m. UTC | #7
I just looked through fprobe_handler; it already does a recursion check
in the code. So the root cause of the case I mentioned above, which
triggers a kernel crash, may be much more complicated than what I read
from the exception backtrace.

It seems more effort is needed to find a better solution than my initial
proposal. I will keep the thread updated if there is any progress.

Thanks
Ze

On Thu, May 11, 2023 at 9:24 AM Ze Gao <zegao2021@gmail.com> wrote:
>
> Thank yonghong for your sage reviews.
> Yes, this is an option I am also considering . I will try this out
> later to see if works
>
> But like you said it's not clear whether kprobe blacklist== fprobe blacklist.
> And also there are cases I need to investigate on, like how to avoid recursions
> when kprobes and fprobes are mixed.
>
> Rejecting symbols  kprobe_blacklisted is kinda brute-force yet a straight way to
> avoid kernel crash AFAIK.
>
> Ze
>
> On Thu, May 11, 2023 at 7:54 AM Yonghong Song <yhs@meta.com> wrote:
> >
> >
> >
> > On 5/10/23 1:20 PM, Yonghong Song wrote:
> > >
> > >
> > > On 5/10/23 10:27 AM, Jiri Olsa wrote:
> > >> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
> > >>>
> > >>>
> > >>> On 5/10/23 5:20 AM, Ze Gao wrote:
> > >>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> > >>>> however it does not takes those kprobe blacklisted into consideration,
> > >>>> which likely introduce recursive traps and blows up stacks.
> > >>>>
> > >>>> this patch adds simple check and remove those are in kprobe_blacklist
> > >>>> from one fprobe during bpf_kprobe_multi_link_attach. And also
> > >>>> check_kprobe_address_safe is open for more future checks.
> > >>>>
> > >>>> note that ftrace provides recursion detection mechanism, but for kprobe
> > >>>> only, we can directly reject those cases early without turning to
> > >>>> ftrace.
> > >>>>
> > >>>> Signed-off-by: Ze Gao <zegao@tencent.com>
> > >>>> ---
> > >>>>    kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> > >>>>    1 file changed, 37 insertions(+)
> > >>>>
> > >>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > >>>> index 9a050e36dc6c..44c68bc06bbd 100644
> > >>>> --- a/kernel/trace/bpf_trace.c
> > >>>> +++ b/kernel/trace/bpf_trace.c
> > >>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct
> > >>>> module ***mods, unsigned long *addrs, u3
> > >>>>        return arr.mods_cnt;
> > >>>>    }
> > >>>> +static inline int check_kprobe_address_safe(unsigned long addr)
> > >>>> +{
> > >>>> +    if (within_kprobe_blacklist(addr))
> > >>>> +        return -EINVAL;
> > >>>> +    else
> > >>>> +        return 0;
> > >>>> +}
> > >>>> +
> > >>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> > >>>> +{
> > >>>> +    int i, cnt;
> > >>>> +    char symname[KSYM_NAME_LEN];
> > >>>> +
> > >>>> +    for (i = 0; i < num; ++i) {
> > >>>> +        if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> > >>>> +            lookup_symbol_name(addrs[i], symname);
> > >>>> +            pr_warn("bpf_kprobe: %s at %lx is blacklisted\n",
> > >>>> symname, addrs[i]);
> > >>>
> > >>> So user request cannot be fulfilled and a warning is issued and some
> > >>> of user requests are discarded and the rest is proceeded. Does not
> > >>> sound a good idea.
> > >>>
> > >>> Maybe we should do filtering in user space, e.g., in libbpf, check
> > >>> /sys/kernel/debug/kprobes/blacklist and return error
> > >>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> > >>> requesting kprobe in the kernel.
> > >>
> > >> also fprobe uses ftrace drectly without paths in kprobe, so I wonder
> > >> some of the kprobe blacklisted functions are actually safe
> > >
> > > Could you give a pointer about 'some of the kprobe blacklisted
> > > functions are actually safe'?
> >
> > Thanks Jiri for answering my question. it is not clear whether
> > kprobe blacklist == fprobe blacklist, probably not.
> >
> > You mentioned:
> >    note that ftrace provides recursion detection mechanism,
> >    but for kprobe only
> > Maybe the right choice is to improve ftrace to provide recursion
> > detection mechanism for fprobe as well?
> >
> > >
> > >>
> > >> jirka
> > >>
> > >>>
> > >>>> +            /* mark blacklisted symbol for remove */
> > >>>> +            addrs[i] = 0;
> > >>>> +        }
> > >>>> +    }
> > >>>> +
> > >>>> +    /* remove blacklisted symbol from addrs */
> > >>>> +    for (i = 0, cnt = 0; i < num; ++i) {
> > >>>> +        if (addrs[i])
> > >>>> +            addrs[cnt++]  = addrs[i];
> > >>>> +    }
> > >>>> +
> > >>>> +    return cnt;
> > >>>> +}
> > >>>> +
> > >>>>    int bpf_kprobe_multi_link_attach(const union bpf_attr *attr,
> > >>>> struct bpf_prog *prog)
> > >>>>    {
> > >>>>        struct bpf_kprobe_multi_link *link = NULL;
> > >>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union
> > >>>> bpf_attr *attr, struct bpf_prog *pr
> > >>>>        else
> > >>>>            link->fp.entry_handler = kprobe_multi_link_handler;
> > >>>> +    cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> > >>>> +    if (!cnt) {
> > >>>> +        err = -EINVAL;
> > >>>> +        goto error;
> > >>>> +    }
> > >>>> +
> > >>>>        link->addrs = addrs;
> > >>>>        link->cookies = cookies;
> > >>>>        link->cnt = cnt;
Ze Gao May 12, 2023, 5:53 a.m. UTC | #8
Yes, Jiri. Thanks for pointing it out. It's true that not all kprobe
blacklisted functions should be banned from bpf_kprobe.

I tried some of them, and all the kprobe blacklisted symbols I hooked
work fine except preempt_count_{sub, add}. So the takeaway here is that
preempt_count_{sub, add} must be rejected at least for now, since
kprobe_multi_link_prog_run (i.e., the fprobe handler) and
rethook_trampoline_handler (i.e., the rethook handler) call
preempt_count_{sub, add}.

I'm considering providing a general fprobe_blacklist framework, just
like what kprobe does, to allow others to mark functions used inside the
fprobe handler or rethook handler as NOFPROBE, to avoid potential stack
recursion. But only after I figure out how ftrace currently handles
recursion problems and why it fails in the case I ran into.
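
A very rough sketch of what I have in mind, purely hypothetical and modeled
on how kprobes keeps its blacklist entries (none of these names exist today):

#include <linux/list.h>
#include <linux/types.h>

/* Hypothetical: a list of address ranges that fprobe must not hook,
 * mirroring within_kprobe_blacklist()'s walk over kprobe_blacklist.
 */
struct fprobe_blacklist_entry {
	struct list_head list;
	unsigned long start_addr;
	unsigned long end_addr;
};

static LIST_HEAD(fprobe_blacklist);

static bool within_fprobe_blacklist(unsigned long addr)
{
	struct fprobe_blacklist_entry *ent;

	list_for_each_entry(ent, &fprobe_blacklist, list) {
		if (addr >= ent->start_addr && addr < ent->end_addr)
			return true;
	}
	return false;
}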

Thanks
Ze

On Thu, May 11, 2023 at 1:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
> >
> >
> > On 5/10/23 5:20 AM, Ze Gao wrote:
> > > BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> > > however it does not takes those kprobe blacklisted into consideration,
> > > which likely introduce recursive traps and blows up stacks.
> > >
> > > this patch adds simple check and remove those are in kprobe_blacklist
> > > from one fprobe during bpf_kprobe_multi_link_attach. And also
> > > check_kprobe_address_safe is open for more future checks.
> > >
> > > note that ftrace provides recursion detection mechanism, but for kprobe
> > > only, we can directly reject those cases early without turning to ftrace.
> > >
> > > Signed-off-by: Ze Gao <zegao@tencent.com>
> > > ---
> > >   kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> > >   1 file changed, 37 insertions(+)
> > >
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index 9a050e36dc6c..44c68bc06bbd 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
> > >     return arr.mods_cnt;
> > >   }
> > > +static inline int check_kprobe_address_safe(unsigned long addr)
> > > +{
> > > +   if (within_kprobe_blacklist(addr))
> > > +           return -EINVAL;
> > > +   else
> > > +           return 0;
> > > +}
> > > +
> > > +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> > > +{
> > > +   int i, cnt;
> > > +   char symname[KSYM_NAME_LEN];
> > > +
> > > +   for (i = 0; i < num; ++i) {
> > > +           if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> > > +                   lookup_symbol_name(addrs[i], symname);
> > > +                   pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
> >
> > So user request cannot be fulfilled and a warning is issued and some
> > of user requests are discarded and the rest is proceeded. Does not
> > sound a good idea.
> >
> > Maybe we should do filtering in user space, e.g., in libbpf, check
> > /sys/kernel/debug/kprobes/blacklist and return error
> > earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> > requesting kprobe in the kernel.
>
> also fprobe uses ftrace drectly without paths in kprobe, so I wonder
> some of the kprobe blacklisted functions are actually safe
>
> jirka
>
> >
> > > +                   /* mark blacklisted symbol for remove */
> > > +                   addrs[i] = 0;
> > > +           }
> > > +   }
> > > +
> > > +   /* remove blacklisted symbol from addrs */
> > > +   for (i = 0, cnt = 0; i < num; ++i) {
> > > +           if (addrs[i])
> > > +                   addrs[cnt++]  = addrs[i];
> > > +   }
> > > +
> > > +   return cnt;
> > > +}
> > > +
> > >   int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> > >   {
> > >     struct bpf_kprobe_multi_link *link = NULL;
> > > @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> > >     else
> > >             link->fp.entry_handler = kprobe_multi_link_handler;
> > > +   cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> > > +   if (!cnt) {
> > > +           err = -EINVAL;
> > > +           goto error;
> > > +   }
> > > +
> > >     link->addrs = addrs;
> > >     link->cookies = cookies;
> > >     link->cnt = cnt;
Yonghong Song May 12, 2023, 2:29 p.m. UTC | #9
On 5/11/23 10:53 PM, Ze Gao wrote:
> Yes, Jiri. Thanks for pointing it out. It's true that not all probe
> blacklisted functions should be banned from bpf_kprobe.
> 
> I tried some of them, and all kprobe blacklisted symbols I hooked
> works fine except preempt_count_{sub, add}.
> so the takeaway here is preempt_cout_{sub, add} must be rejected at
> least for now since kprobe_multi_link_prog_run
> ( i.e., the fprobe handler) and rethook_trampoline_handler( i.e. the
> rethook handler) calls preempt_cout_{sub, add}.
> 
> I'm considering providing a general  fprobe_blacklist framework just
> like what kprobe does to allow others to mark
> functions used inside fprobe handler or rethook handler as NOFPROBE to
> avoid potential stack recursion. But only after
> I figure out how ftrace handles recursion problems currently and why
> it fails in the case I ran into.

A fprobe_blacklist might make sense indeed as fprobe and kprobe are 
quite different... Thanks for working on this.

> 
> Thanks
> Ze
> 
> On Thu, May 11, 2023 at 1:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>>
>> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
>>>
>>>
>>> On 5/10/23 5:20 AM, Ze Gao wrote:
>>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
>>>> however it does not takes those kprobe blacklisted into consideration,
>>>> which likely introduce recursive traps and blows up stacks.
>>>>
>>>> this patch adds simple check and remove those are in kprobe_blacklist
>>>> from one fprobe during bpf_kprobe_multi_link_attach. And also
>>>> check_kprobe_address_safe is open for more future checks.
>>>>
>>>> note that ftrace provides recursion detection mechanism, but for kprobe
>>>> only, we can directly reject those cases early without turning to ftrace.
>>>>
>>>> Signed-off-by: Ze Gao <zegao@tencent.com>
>>>> ---
>>>>    kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 37 insertions(+)
>>>>
>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>>> index 9a050e36dc6c..44c68bc06bbd 100644
>>>> --- a/kernel/trace/bpf_trace.c
>>>> +++ b/kernel/trace/bpf_trace.c
>>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
>>>>      return arr.mods_cnt;
>>>>    }
>>>> +static inline int check_kprobe_address_safe(unsigned long addr)
>>>> +{
>>>> +   if (within_kprobe_blacklist(addr))
>>>> +           return -EINVAL;
>>>> +   else
>>>> +           return 0;
>>>> +}
>>>> +
>>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
>>>> +{
>>>> +   int i, cnt;
>>>> +   char symname[KSYM_NAME_LEN];
>>>> +
>>>> +   for (i = 0; i < num; ++i) {
>>>> +           if (check_kprobe_address_safe((unsigned long)addrs[i])) {
>>>> +                   lookup_symbol_name(addrs[i], symname);
>>>> +                   pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
>>>
>>> So user request cannot be fulfilled and a warning is issued and some
>>> of user requests are discarded and the rest is proceeded. Does not
>>> sound a good idea.
>>>
>>> Maybe we should do filtering in user space, e.g., in libbpf, check
>>> /sys/kernel/debug/kprobes/blacklist and return error
>>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
>>> requesting kprobe in the kernel.
>>
>> also fprobe uses ftrace drectly without paths in kprobe, so I wonder
>> some of the kprobe blacklisted functions are actually safe
>>
>> jirka
>>
>>>
>>>> +                   /* mark blacklisted symbol for remove */
>>>> +                   addrs[i] = 0;
>>>> +           }
>>>> +   }
>>>> +
>>>> +   /* remove blacklisted symbol from addrs */
>>>> +   for (i = 0, cnt = 0; i < num; ++i) {
>>>> +           if (addrs[i])
>>>> +                   addrs[cnt++]  = addrs[i];
>>>> +   }
>>>> +
>>>> +   return cnt;
>>>> +}
>>>> +
>>>>    int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>>>    {
>>>>      struct bpf_kprobe_multi_link *link = NULL;
>>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
>>>>      else
>>>>              link->fp.entry_handler = kprobe_multi_link_handler;
>>>> +   cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
>>>> +   if (!cnt) {
>>>> +           err = -EINVAL;
>>>> +           goto error;
>>>> +   }
>>>> +
>>>>      link->addrs = addrs;
>>>>      link->cookies = cookies;
>>>>      link->cnt = cnt;
Jiri Olsa May 12, 2023, 10:33 p.m. UTC | #10
On Fri, May 12, 2023 at 07:29:02AM -0700, Yonghong Song wrote:
> 
> 
> On 5/11/23 10:53 PM, Ze Gao wrote:
> > Yes, Jiri. Thanks for pointing it out. It's true that not all probe
> > blacklisted functions should be banned from bpf_kprobe.
> > 
> > I tried some of them, and all kprobe blacklisted symbols I hooked
> > works fine except preempt_count_{sub, add}.
> > so the takeaway here is preempt_cout_{sub, add} must be rejected at
> > least for now since kprobe_multi_link_prog_run
> > ( i.e., the fprobe handler) and rethook_trampoline_handler( i.e. the
> > rethook handler) calls preempt_cout_{sub, add}.

check the BTF_SET_START(btf_id_deny) list for functions that we do not
allow to attach for tracing programs. The direct ftrace interface used by
trampolines likely has similar limitations to the ftrace_ops API used by
fprobe


> > 
> > I'm considering providing a general  fprobe_blacklist framework just
> > like what kprobe does to allow others to mark
> > functions used inside fprobe handler or rethook handler as NOFPROBE to
> > avoid potential stack recursion. But only after
> > I figure out how ftrace handles recursion problems currently and why
> > it fails in the case I ran into.
> 
> A fprobe_blacklist might make sense indeed as fprobe and kprobe are quite
> different... Thanks for working on this.

+1

jirka

> 
> > 
> > Thanks
> > Ze
> > 
> > On Thu, May 11, 2023 at 1:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > 
> > > On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
> > > > 
> > > > 
> > > > On 5/10/23 5:20 AM, Ze Gao wrote:
> > > > > BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> > > > > however it does not takes those kprobe blacklisted into consideration,
> > > > > which likely introduce recursive traps and blows up stacks.
> > > > > 
> > > > > this patch adds simple check and remove those are in kprobe_blacklist
> > > > > from one fprobe during bpf_kprobe_multi_link_attach. And also
> > > > > check_kprobe_address_safe is open for more future checks.
> > > > > 
> > > > > note that ftrace provides recursion detection mechanism, but for kprobe
> > > > > only, we can directly reject those cases early without turning to ftrace.
> > > > > 
> > > > > Signed-off-by: Ze Gao <zegao@tencent.com>
> > > > > ---
> > > > >    kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> > > > >    1 file changed, 37 insertions(+)
> > > > > 
> > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > > index 9a050e36dc6c..44c68bc06bbd 100644
> > > > > --- a/kernel/trace/bpf_trace.c
> > > > > +++ b/kernel/trace/bpf_trace.c
> > > > > @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
> > > > >      return arr.mods_cnt;
> > > > >    }
> > > > > +static inline int check_kprobe_address_safe(unsigned long addr)
> > > > > +{
> > > > > +   if (within_kprobe_blacklist(addr))
> > > > > +           return -EINVAL;
> > > > > +   else
> > > > > +           return 0;
> > > > > +}
> > > > > +
> > > > > +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> > > > > +{
> > > > > +   int i, cnt;
> > > > > +   char symname[KSYM_NAME_LEN];
> > > > > +
> > > > > +   for (i = 0; i < num; ++i) {
> > > > > +           if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> > > > > +                   lookup_symbol_name(addrs[i], symname);
> > > > > +                   pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
> > > > 
> > > > So user request cannot be fulfilled and a warning is issued and some
> > > > of user requests are discarded and the rest is proceeded. Does not
> > > > sound a good idea.
> > > > 
> > > > Maybe we should do filtering in user space, e.g., in libbpf, check
> > > > /sys/kernel/debug/kprobes/blacklist and return error
> > > > earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> > > > requesting kprobe in the kernel.
> > > 
> > > also fprobe uses ftrace drectly without paths in kprobe, so I wonder
> > > some of the kprobe blacklisted functions are actually safe
> > > 
> > > jirka
> > > 
> > > > 
> > > > > +                   /* mark blacklisted symbol for remove */
> > > > > +                   addrs[i] = 0;
> > > > > +           }
> > > > > +   }
> > > > > +
> > > > > +   /* remove blacklisted symbol from addrs */
> > > > > +   for (i = 0, cnt = 0; i < num; ++i) {
> > > > > +           if (addrs[i])
> > > > > +                   addrs[cnt++]  = addrs[i];
> > > > > +   }
> > > > > +
> > > > > +   return cnt;
> > > > > +}
> > > > > +
> > > > >    int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> > > > >    {
> > > > >      struct bpf_kprobe_multi_link *link = NULL;
> > > > > @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> > > > >      else
> > > > >              link->fp.entry_handler = kprobe_multi_link_handler;
> > > > > +   cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> > > > > +   if (!cnt) {
> > > > > +           err = -EINVAL;
> > > > > +           goto error;
> > > > > +   }
> > > > > +
> > > > >      link->addrs = addrs;
> > > > >      link->cookies = cookies;
> > > > >      link->cnt = cnt;
Steven Rostedt May 13, 2023, 4:17 a.m. UTC | #11
On Fri, 12 May 2023 07:29:02 -0700
Yonghong Song <yhs@meta.com> wrote:

> A fprobe_blacklist might make sense indeed as fprobe and kprobe are 
> quite different... Thanks for working on this.

Hmm, I think I see the problem:

fprobe_kprobe_handler() {
   kprobe_busy_begin() {
      preempt_disable() {
         preempt_count_add() {  <-- trace
            fprobe_kprobe_handler() {
		[ wash, rinse, repeat, CRASH!!! ]

Either the kprobe_busy_begin() needs to use the preempt_disable_notrace()
versions, or fprobe_kprobe_handler() needs a
ftrace_test_recursion_trylock() call.
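
A rough sketch of the second option, simplified and not the actual fprobe
code, just the usual ftrace_func_t signature with the recursion guard added:

#include <linux/ftrace.h>
#include <linux/kprobes.h>
#include <linux/trace_recursion.h>

static void fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
				  struct ftrace_ops *ops,
				  struct ftrace_regs *fregs)
{
	int bit;

	/* Bail out if this CPU is already inside a traced region, instead of
	 * recursing through preempt_count_add() and friends.
	 */
	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0)
		return;

	kprobe_busy_begin();
	/* ... run the fprobe entry handler here ... */
	kprobe_busy_end();

	ftrace_test_recursion_unlock(bit);
}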

-- Steve
Ze Gao May 13, 2023, 9:19 a.m. UTC | #12
Exactly, and rethook_trampoline_handler suffers the same problem.

And I've posted two patches for kprobe and rethook that use the notrace
version of preempt_{disable, enable} to fix fprobe+rethook:
[1] https://lore.kernel.org/all/20230513081656.375846-1-zegao@tencent.com/T/#u
[2] https://lore.kernel.org/all/20230513090548.376522-1-zegao@tencent.com/T/#u

Even worse, the bpf callback introduces more such use cases, which are
typically organized as follows to guard the lifetime of bpf-related
resources (per-cpu access or the trampoline):

migrate_disable()
rcu_read_lock()
...
bpf_prog_run()
...
rcu_read_unlock()
migrate_enable().

But solving such bugs completely may require introducing a
fprobe_blacklist and a bpf_kprobe_blacklist, just like what Jiri and
Yonghong suggested, since bpf kprobe works at a different (higher and
more constrained) level than fprobe and ftrace, and we cannot blindly
mark functions (migrate_disable, __rcu_read_lock, etc.) used in tracer
callbacks from external subsystems without risking semantic breakage.
I will try to implement these ideas later.

Thanks,
Ze

On Sat, May 13, 2023 at 12:18 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 12 May 2023 07:29:02 -0700
> Yonghong Song <yhs@meta.com> wrote:
>
> > A fprobe_blacklist might make sense indeed as fprobe and kprobe are
> > quite different... Thanks for working on this.
>
> Hmm, I think I see the problem:
>
> fprobe_kprobe_handler() {
>    kprobe_busy_begin() {
>       preempt_disable() {
>          preempt_count_add() {  <-- trace
>             fprobe_kprobe_handler() {
>                 [ wash, rinse, repeat, CRASH!!! ]
>
> Either the kprobe_busy_begin() needs to use preempt_disable_notrace()
> versions, or fprobe_kprobe_handle() needs a
> ftrace_test_recursion_trylock() call.
>
> -- Steve
Yonghong Song May 14, 2023, 5:11 p.m. UTC | #13
On 5/12/23 9:17 PM, Steven Rostedt wrote:
> On Fri, 12 May 2023 07:29:02 -0700
> Yonghong Song <yhs@meta.com> wrote:
> 
>> A fprobe_blacklist might make sense indeed as fprobe and kprobe are
>> quite different... Thanks for working on this.
> 
> Hmm, I think I see the problem:
> 
> fprobe_kprobe_handler() {
>     kprobe_busy_begin() {
>        preempt_disable() {
>           preempt_count_add() {  <-- trace
>              fprobe_kprobe_handler() {
> 		[ wash, rinse, repeat, CRASH!!! ]
> 
> Either the kprobe_busy_begin() needs to use preempt_disable_notrace()
> versions, or fprobe_kprobe_handle() needs a
> ftrace_test_recursion_trylock() call.

Currently, in verifier we have:

BTF_SET_START(btf_id_deny)
BTF_ID_UNUSED
#ifdef CONFIG_SMP
BTF_ID(func, migrate_disable)
BTF_ID(func, migrate_enable)
#endif
#if !defined CONFIG_PREEMPT_RCU && !defined CONFIG_TINY_RCU
BTF_ID(func, rcu_read_unlock_strict)
#endif
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
BTF_ID(func, preempt_count_add)
BTF_ID(func, preempt_count_sub)
#endif
#ifdef CONFIG_PREEMPT_RCU
BTF_ID(func, __rcu_read_lock)
BTF_ID(func, __rcu_read_unlock)
#endif
BTF_SET_END(btf_id_deny)

         ...
         } else if (prog->type == BPF_PROG_TYPE_TRACING &&
                    btf_id_set_contains(&btf_id_deny, btf_id)) {
                 return -EINVAL;
         }

Since we do not have an explicit deny list available to user space, the
above check prevents tracing a few functions for tracing progs (fentry,
fexit, fmod_ret).

For the fprobe_kprobe case, constructing a user-visible deny list would
be best. Otherwise, we can add a btf_id_deny_fprobe BTF set, which
should work too.
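
A rough sketch of the latter, reusing the same BTF_SET machinery (the set
name and where it gets consulted are just assumptions; the kprobe_multi path
would also need a symbol-to-BTF-id lookup to use it):

/* Hypothetical extra deny set for fprobe-based attachment. */
BTF_SET_START(btf_id_deny_fprobe)
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_TRACE_PREEMPT_TOGGLE)
BTF_ID(func, preempt_count_add)
BTF_ID(func, preempt_count_sub)
#endif
BTF_SET_END(btf_id_deny_fprobe)

	/* ... somewhere on the kprobe_multi/fprobe attach path ... */
	if (btf_id_set_contains(&btf_id_deny_fprobe, btf_id))
		return -EINVAL;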

> 
> -- Steve
Ze Gao May 15, 2023, 5:59 a.m. UTC | #14
Dear all,

On Thu, May 11, 2023 at 9:06 AM Ze Gao <zegao2021@gmail.com> wrote:
>
> I'm afraid filtering in user space tools is not enough, cause it's a kernel BUG.
>
> it would 100% trigger a kernel crash if you run cmd like
>
> retsnoop -e 'pick_next_task_fair' -a ':kernel/sched/*.c' -vvv
>
> which is caused by that BPF_LINK_TYPE_KPROBE_MULTI accidentally
> attaches bpf progs
> to preempt_count_{add, sub}, which in turn triggers stackoverflow
> because the handler itself
> calls those functions.

I managed to see the big picture of this problem by digging into the code.

Here is how it goes:

rethook_trampoline_handler {
   preempt_disable() {
      preempt_count_add() {  <-- trace
          fprobe_handler() {
            ftrace_test_recursion_trylock
            ...
            ftrace_test_recursion_unlock  <-- fails to detect the recursion caused by
                                              rethook routines (here, rethook_trampoline_handler)
          }
          ...
          rethook_trampoline_handler {
          [ wash, rinse, repeat, CRASH!!! ]

There are some pitfalls here:
1. The fprobe exit callback should be guarded by the ftrace recursion
check as well, since the user callback might call any traceable function,
just like kprobe_multi_link_prog_run calls migrate_{disable, enable}. In
this case, detection in fprobe_handler alone is not enough.
2. rethook_trampoline_handler should use preempt_{disable,
enable}_notrace instead, because it runs beyond the recursion-guarded
region described in 1 (see the sketch after this list).
3. Many rethook helper functions are also used outside the
recursion-guarded regions and should therefore be marked notrace.
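
For pitfall 2, a minimal illustration of the kind of change meant in
rethook_trampoline_handler (illustrative hunk only, not the actually
posted patch):

-	preempt_disable();
+	preempt_disable_notrace();	/* runs inside the return trampoline itself */
 	...
-	preempt_enable();
+	preempt_enable_notrace();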

I've posted a new series of patches to resolve the cases mentioned above:
  [Link]: https://lkml.org/lkml/2023/5/14/452

In theory, bpf_kprobe, as the user of fprobe + rethook, is spared from
recursion again by applying these, and
  [Link]: https://lore.kernel.org/all/20230513090548.376522-1-zegao@tencent.com/T/#u
becomes optional.

That leaves one final question: whether we need a fprobe blacklist or a
bpf_kprobe blacklist depends on how we deal with user requests when one
of the expected hook points fails because recursion is detected. Do we
reject them in the first place via a blacklist, or let the callback
silently fail to execute? I would appreciate your gentle advice.

Regards,
Ze
Masami Hiramatsu (Google) May 16, 2023, 4:31 a.m. UTC | #15
On Sat, 13 May 2023 00:17:57 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 12 May 2023 07:29:02 -0700
> Yonghong Song <yhs@meta.com> wrote:
> 
> > A fprobe_blacklist might make sense indeed as fprobe and kprobe are 
> > quite different... Thanks for working on this.
> 
> Hmm, I think I see the problem:
> 
> fprobe_kprobe_handler() {
>    kprobe_busy_begin() {
>       preempt_disable() {
>          preempt_count_add() {  <-- trace
>             fprobe_kprobe_handler() {
> 		[ wash, rinse, repeat, CRASH!!! ]
> 
> Either the kprobe_busy_begin() needs to use preempt_disable_notrace()
> versions, or fprobe_kprobe_handle() needs a
> ftrace_test_recursion_trylock() call.

Oops, I got it. Is preempt_count_add() traceable? If so, kprobe_busy_begin()
should be updated.

Thanks,

> 
> -- Steve
Masami Hiramatsu (Google) May 16, 2023, 4:57 a.m. UTC | #16
On Thu, 11 May 2023 09:24:18 +0800
Ze Gao <zegao2021@gmail.com> wrote:

> Thank yonghong for your sage reviews.
> Yes, this is an option I am also considering . I will try this out
> later to see if works
> 
> But like you said it's not clear whether kprobe blacklist== fprobe blacklist.

Just FYI, those are not the same: the kprobe blacklist is the functions
marked by __kprobes or NOKPROBE_SYMBOL(), while the fprobe blacklist is
effectively the "notrace" functions.
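
A tiny illustration of the difference, with made-up functions just to show
the two annotations:

#include <linux/compiler.h>	/* notrace */
#include <linux/kprobes.h>	/* NOKPROBE_SYMBOL() */

/* notrace: no mcount/fentry call site is emitted, so ftrace -- and therefore
 * fprobe -- cannot hook this function at all.
 */
static void notrace my_ftrace_invisible_helper(void)
{
}

/* NOKPROBE_SYMBOL: still visible to ftrace/fprobe, but kprobes refuses to
 * probe this function because it lands in the kprobe blacklist.
 */
static void my_kprobe_denied_helper(void)
{
}
NOKPROBE_SYMBOL(my_kprobe_denied_helper);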

Thank you,

> And also there are cases I need to investigate on, like how to avoid recursions
> when kprobes and fprobes are mixed.
> 
> Rejecting symbols  kprobe_blacklisted is kinda brute-force yet a straight way to
> avoid kernel crash AFAIK.
> 
> Ze
> 
> On Thu, May 11, 2023 at 7:54 AM Yonghong Song <yhs@meta.com> wrote:
> >
> >
> >
> > On 5/10/23 1:20 PM, Yonghong Song wrote:
> > >
> > >
> > > On 5/10/23 10:27 AM, Jiri Olsa wrote:
> > >> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
> > >>>
> > >>>
> > >>> On 5/10/23 5:20 AM, Ze Gao wrote:
> > >>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> > >>>> however it does not takes those kprobe blacklisted into consideration,
> > >>>> which likely introduce recursive traps and blows up stacks.
> > >>>>
> > >>>> this patch adds simple check and remove those are in kprobe_blacklist
> > >>>> from one fprobe during bpf_kprobe_multi_link_attach. And also
> > >>>> check_kprobe_address_safe is open for more future checks.
> > >>>>
> > >>>> note that ftrace provides recursion detection mechanism, but for kprobe
> > >>>> only, we can directly reject those cases early without turning to
> > >>>> ftrace.
> > >>>>
> > >>>> Signed-off-by: Ze Gao <zegao@tencent.com>
> > >>>> ---
> > >>>>    kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> > >>>>    1 file changed, 37 insertions(+)
> > >>>>
> > >>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > >>>> index 9a050e36dc6c..44c68bc06bbd 100644
> > >>>> --- a/kernel/trace/bpf_trace.c
> > >>>> +++ b/kernel/trace/bpf_trace.c
> > >>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct
> > >>>> module ***mods, unsigned long *addrs, u3
> > >>>>        return arr.mods_cnt;
> > >>>>    }
> > >>>> +static inline int check_kprobe_address_safe(unsigned long addr)
> > >>>> +{
> > >>>> +    if (within_kprobe_blacklist(addr))
> > >>>> +        return -EINVAL;
> > >>>> +    else
> > >>>> +        return 0;
> > >>>> +}
> > >>>> +
> > >>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> > >>>> +{
> > >>>> +    int i, cnt;
> > >>>> +    char symname[KSYM_NAME_LEN];
> > >>>> +
> > >>>> +    for (i = 0; i < num; ++i) {
> > >>>> +        if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> > >>>> +            lookup_symbol_name(addrs[i], symname);
> > >>>> +            pr_warn("bpf_kprobe: %s at %lx is blacklisted\n",
> > >>>> symname, addrs[i]);
> > >>>
> > >>> So the user request cannot be fulfilled: a warning is issued, some
> > >>> of the user's requests are discarded, and the rest proceeds. That does
> > >>> not sound like a good idea.
> > >>>
> > >>> Maybe we should do filtering in user space, e.g., in libbpf: check
> > >>> /sys/kernel/debug/kprobes/blacklist and return an error
> > >>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> > >>> requesting kprobes from the kernel.
> > >>
> > >> also fprobe uses ftrace directly without the kprobe paths, so I wonder whether
> > >> some of the kprobe blacklisted functions are actually safe
> > >
> > > Could you give a pointer about 'some of the kprobe blacklisted
> > > functions are actually safe'?
> >
> > Thanks, Jiri, for answering my question. It is not clear whether
> > kprobe blacklist == fprobe blacklist, probably not.
> >
> > You mentioned:
> >    note that ftrace provides recursion detection mechanism,
> >    but for kprobe only
> > Maybe the right choice is to improve ftrace to provide a recursion
> > detection mechanism for fprobe as well?
> >
> > >
> > >>
> > >> jirka
> > >>
> > >>>
> > >>>> +            /* mark blacklisted symbol for remove */
> > >>>> +            addrs[i] = 0;
> > >>>> +        }
> > >>>> +    }
> > >>>> +
> > >>>> +    /* remove blacklisted symbol from addrs */
> > >>>> +    for (i = 0, cnt = 0; i < num; ++i) {
> > >>>> +        if (addrs[i])
> > >>>> +            addrs[cnt++]  = addrs[i];
> > >>>> +    }
> > >>>> +
> > >>>> +    return cnt;
> > >>>> +}
> > >>>> +
> > >>>>    int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> > >>>>    {
> > >>>>        struct bpf_kprobe_multi_link *link = NULL;
> > >>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> > >>>>        else
> > >>>>            link->fp.entry_handler = kprobe_multi_link_handler;
> > >>>> +    cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> > >>>> +    if (!cnt) {
> > >>>> +        err = -EINVAL;
> > >>>> +        goto error;
> > >>>> +    }
> > >>>> +
> > >>>>        link->addrs = addrs;
> > >>>>        link->cookies = cookies;
> > >>>>        link->cnt = cnt;
Masami Hiramatsu (Google) May 16, 2023, 5:10 a.m. UTC | #17
On Tue, 16 May 2023 13:31:53 +0900
Masami Hiramatsu (Google) <mhiramat@kernel.org> wrote:

> On Sat, 13 May 2023 00:17:57 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > On Fri, 12 May 2023 07:29:02 -0700
> > Yonghong Song <yhs@meta.com> wrote:
> > 
> > > A fprobe_blacklist might make sense indeed as fprobe and kprobe are 
> > > quite different... Thanks for working on this.
> > 
> > Hmm, I think I see the problem:
> > 
> > fprobe_kprobe_handler() {
> >    kprobe_busy_begin() {
> >       preempt_disable() {
> >          preempt_count_add() {  <-- trace
> >             fprobe_kprobe_handler() {
> > 		[ wash, rinse, repeat, CRASH!!! ]
> > 
> > Either the kprobe_busy_begin() needs to use preempt_disable_notrace()
> > versions, or fprobe_kprobe_handler() needs a
> > ftrace_test_recursion_trylock() call.
> 
> Oops, I got it. Is preempt_count_add() traceable? If so, kprobe_busy_begin()
> should be updated.

OK, preempt_count_add() is marked NOKPROBE_SYMBOL(), so kprobe_busy_begin()
should be safe. The problem is in fprobe_kprobe_handler() then.
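Below is a minimal sketch of the recursion guard Steve suggested, with the signature and body simplified for illustration (this is not the actual upstream fix): the handler bails out via ftrace_test_recursion_trylock() if this context is already inside an ftrace callback, which is exactly what would otherwise loop through preempt_count_add().

#include <linux/ftrace.h>
#include <linux/kprobes.h>
#include <linux/trace_recursion.h>

/* Illustrative only: just the recursion-protection pattern. */
static void fprobe_kprobe_handler_sketch(unsigned long ip, unsigned long parent_ip,
					 struct ftrace_ops *ops,
					 struct ftrace_regs *fregs)
{
	int bit;

	/* Refuse to re-enter if we are already inside an ftrace callback. */
	bit = ftrace_test_recursion_trylock(ip, parent_ip);
	if (bit < 0)
		return;

	kprobe_busy_begin();
	/* ... invoke the fprobe entry handler / BPF program here ... */
	kprobe_busy_end();

	ftrace_test_recursion_unlock(bit);
}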

Thanks!

> 
> Thanks,
> 
> > 
> > -- Steve
> 
> 
> -- 
> Masami Hiramatsu (Google) <mhiramat@kernel.org>
Masami Hiramatsu (Google) May 16, 2023, 5:49 a.m. UTC | #18
On Fri, 12 May 2023 07:29:02 -0700
Yonghong Song <yhs@meta.com> wrote:

> 
> 
> On 5/11/23 10:53 PM, Ze Gao wrote:
> > Yes, Jiri. Thanks for pointing it out. It's true that not all probe
> > blacklisted functions should be banned from bpf_kprobe.
> > 
> > I tried some of them, and all the kprobe blacklisted symbols I hooked
> > work fine except preempt_count_{sub, add}.
> > So the takeaway here is that preempt_count_{sub, add} must be rejected at
> > least for now, since kprobe_multi_link_prog_run
> > (i.e., the fprobe handler) and rethook_trampoline_handler (i.e., the
> > rethook handler) call preempt_count_{sub, add}.
> > 
> > I'm considering providing a general fprobe_blacklist framework, just
> > like what kprobe does, to allow others to mark
> > functions used inside the fprobe handler or rethook handler as NOFPROBE to
> > avoid potential stack recursion. But only after
> > I figure out how ftrace handles recursion problems currently and why
> > it fails in the case I ran into.
> 
> A fprobe_blacklist might make sense indeed as fprobe and kprobe are 
> quite different... Thanks for working on this.

No, I don't like fprobe_blacklist, because you can filter the user-given
functions with <tracefs>/available_filter_functions :)
If a function is not listed there, you cannot put an fprobe on it.
IOW, kprobe_multi_link_prog_run only covers those functions (a white-list).

On the tooling side, it should check whether the probe is defined for a
single function or multiple functions, and use the kprobe blacklist (single)
or available_filter_functions (multiple).
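A minimal user-space sketch of that tooling-side check, assuming tracefs is mounted at /sys/kernel/tracing (the path and the helper name are illustrative): accept a symbol for a multi-function (fprobe-based) attach only if it is listed in available_filter_functions.

#include <stdio.h>
#include <string.h>

/* Return 1 if "sym" appears in available_filter_functions, 0 otherwise. */
static int sym_is_ftrace_attachable(const char *sym)
{
	FILE *f = fopen("/sys/kernel/tracing/available_filter_functions", "r");
	char line[512];
	int found = 0;

	if (!f)
		return 0;

	while (fgets(line, sizeof(line), f)) {
		/* Entries look like "func_name" or "func_name [module]". */
		line[strcspn(line, " \t\n")] = '\0';
		if (strcmp(line, sym) == 0) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}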

Thank you,

> 
> > 
> > Thanks
> > Ze
> > 
> > On Thu, May 11, 2023 at 1:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >>
> >> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
> >>>
> >>>
> >>> On 5/10/23 5:20 AM, Ze Gao wrote:
> >>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
> >>>> however it does not takes those kprobe blacklisted into consideration,
> >>>> which likely introduce recursive traps and blows up stacks.
> >>>>
> >>>> this patch adds simple check and remove those are in kprobe_blacklist
> >>>> from one fprobe during bpf_kprobe_multi_link_attach. And also
> >>>> check_kprobe_address_safe is open for more future checks.
> >>>>
> >>>> note that ftrace provides recursion detection mechanism, but for kprobe
> >>>> only, we can directly reject those cases early without turning to ftrace.
> >>>>
> >>>> Signed-off-by: Ze Gao <zegao@tencent.com>
> >>>> ---
> >>>>    kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
> >>>>    1 file changed, 37 insertions(+)
> >>>>
> >>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> >>>> index 9a050e36dc6c..44c68bc06bbd 100644
> >>>> --- a/kernel/trace/bpf_trace.c
> >>>> +++ b/kernel/trace/bpf_trace.c
> >>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
> >>>>      return arr.mods_cnt;
> >>>>    }
> >>>> +static inline int check_kprobe_address_safe(unsigned long addr)
> >>>> +{
> >>>> +   if (within_kprobe_blacklist(addr))
> >>>> +           return -EINVAL;
> >>>> +   else
> >>>> +           return 0;
> >>>> +}
> >>>> +
> >>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
> >>>> +{
> >>>> +   int i, cnt;
> >>>> +   char symname[KSYM_NAME_LEN];
> >>>> +
> >>>> +   for (i = 0; i < num; ++i) {
> >>>> +           if (check_kprobe_address_safe((unsigned long)addrs[i])) {
> >>>> +                   lookup_symbol_name(addrs[i], symname);
> >>>> +                   pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
> >>>
> >>> So the user request cannot be fulfilled: a warning is issued, some
> >>> of the user's requests are discarded, and the rest proceeds. That does
> >>> not sound like a good idea.
> >>>
> >>> Maybe we should do filtering in user space, e.g., in libbpf: check
> >>> /sys/kernel/debug/kprobes/blacklist and return an error
> >>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
> >>> requesting kprobes from the kernel.
> >>
> >> also fprobe uses ftrace directly without the kprobe paths, so I wonder whether
> >> some of the kprobe blacklisted functions are actually safe
> >>
> >> jirka
> >>
> >>>
> >>>> +                   /* mark blacklisted symbol for remove */
> >>>> +                   addrs[i] = 0;
> >>>> +           }
> >>>> +   }
> >>>> +
> >>>> +   /* remove blacklisted symbol from addrs */
> >>>> +   for (i = 0, cnt = 0; i < num; ++i) {
> >>>> +           if (addrs[i])
> >>>> +                   addrs[cnt++]  = addrs[i];
> >>>> +   }
> >>>> +
> >>>> +   return cnt;
> >>>> +}
> >>>> +
> >>>>    int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> >>>>    {
> >>>>      struct bpf_kprobe_multi_link *link = NULL;
> >>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> >>>>      else
> >>>>              link->fp.entry_handler = kprobe_multi_link_handler;
> >>>> +   cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
> >>>> +   if (!cnt) {
> >>>> +           err = -EINVAL;
> >>>> +           goto error;
> >>>> +   }
> >>>> +
> >>>>      link->addrs = addrs;
> >>>>      link->cookies = cookies;
> >>>>      link->cnt = cnt;
Yonghong Song May 16, 2023, 3:16 p.m. UTC | #19
On 5/15/23 10:49 PM, Masami Hiramatsu (Google) wrote:
> On Fri, 12 May 2023 07:29:02 -0700
> Yonghong Song <yhs@meta.com> wrote:
> 
>>
>>
>> On 5/11/23 10:53 PM, Ze Gao wrote:
>>> Yes, Jiri. Thanks for pointing it out. It's true that not all probe
>>> blacklisted functions should be banned from bpf_kprobe.
>>>
>>> I tried some of them, and all the kprobe blacklisted symbols I hooked
>>> work fine except preempt_count_{sub, add}.
>>> So the takeaway here is that preempt_count_{sub, add} must be rejected at
>>> least for now, since kprobe_multi_link_prog_run
>>> (i.e., the fprobe handler) and rethook_trampoline_handler (i.e., the
>>> rethook handler) call preempt_count_{sub, add}.
>>>
>>> I'm considering providing a general fprobe_blacklist framework, just
>>> like what kprobe does, to allow others to mark
>>> functions used inside the fprobe handler or rethook handler as NOFPROBE to
>>> avoid potential stack recursion. But only after
>>> I figure out how ftrace handles recursion problems currently and why
>>> it fails in the case I ran into.
>>
>> A fprobe_blacklist might make sense indeed as fprobe and kprobe are
>> quite different... Thanks for working on this.
> 
> No, I don't like fprobe_blacklist, because you can filter the user-given
> functions with <tracefs>/available_filter_functions :)
> If a function is not listed there, you cannot put an fprobe on it.
> IOW, kprobe_multi_link_prog_run only covers those functions (a white-list).
> 
> On the tooling side, it should check whether the probe is defined for a
> single function or multiple functions, and use the kprobe blacklist (single)
> or available_filter_functions (multiple).

Thanks for the clarification. So basically the fprobe blacklist is similar to
fentry: it cannot trace functions marked with notrace. So agreed,
the checking scheme could be:
   - user space checks available_filter_functions
   - the few other traceable functions that may have recursion effects are
     handled by the infrastructure for the fprobe case; the fentry case is
     already covered by the verifier, which denies a few functions like
     preempt_count_sub etc.

> 
> Thank you,
> 
>>
>>>
>>> Thanks
>>> Ze
>>>
>>> On Thu, May 11, 2023 at 1:28 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>>>>
>>>> On Wed, May 10, 2023 at 07:13:58AM -0700, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 5/10/23 5:20 AM, Ze Gao wrote:
>>>>>> BPF_LINK_TYPE_KPROBE_MULTI attaches kprobe programs through fprobe,
>>>>>> however it does not takes those kprobe blacklisted into consideration,
>>>>>> which likely introduce recursive traps and blows up stacks.
>>>>>>
>>>>>> this patch adds simple check and remove those are in kprobe_blacklist
>>>>>> from one fprobe during bpf_kprobe_multi_link_attach. And also
>>>>>> check_kprobe_address_safe is open for more future checks.
>>>>>>
>>>>>> note that ftrace provides recursion detection mechanism, but for kprobe
>>>>>> only, we can directly reject those cases early without turning to ftrace.
>>>>>>
>>>>>> Signed-off-by: Ze Gao <zegao@tencent.com>
>>>>>> ---
>>>>>>     kernel/trace/bpf_trace.c | 37 +++++++++++++++++++++++++++++++++++++
>>>>>>     1 file changed, 37 insertions(+)
>>>>>>
>>>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>>>>> index 9a050e36dc6c..44c68bc06bbd 100644
>>>>>> --- a/kernel/trace/bpf_trace.c
>>>>>> +++ b/kernel/trace/bpf_trace.c
>>>>>> @@ -2764,6 +2764,37 @@ static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
>>>>>>       return arr.mods_cnt;
>>>>>>     }
>>>>>> +static inline int check_kprobe_address_safe(unsigned long addr)
>>>>>> +{
>>>>>> +   if (within_kprobe_blacklist(addr))
>>>>>> +           return -EINVAL;
>>>>>> +   else
>>>>>> +           return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
>>>>>> +{
>>>>>> +   int i, cnt;
>>>>>> +   char symname[KSYM_NAME_LEN];
>>>>>> +
>>>>>> +   for (i = 0; i < num; ++i) {
>>>>>> +           if (check_kprobe_address_safe((unsigned long)addrs[i])) {
>>>>>> +                   lookup_symbol_name(addrs[i], symname);
>>>>>> +                   pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
>>>>>
>>>>> So the user request cannot be fulfilled: a warning is issued, some
>>>>> of the user's requests are discarded, and the rest proceeds. That does
>>>>> not sound like a good idea.
>>>>>
>>>>> Maybe we should do filtering in user space, e.g., in libbpf: check
>>>>> /sys/kernel/debug/kprobes/blacklist and return an error
>>>>> earlier? bpftrace/libbpf-tools/bcc-tools all do filtering before
>>>>> requesting kprobes from the kernel.
>>>>
>>>> also fprobe uses ftrace directly without the kprobe paths, so I wonder whether
>>>> some of the kprobe blacklisted functions are actually safe
>>>>
>>>> jirka
>>>>
>>>>>
>>>>>> +                   /* mark blacklisted symbol for remove */
>>>>>> +                   addrs[i] = 0;
>>>>>> +           }
>>>>>> +   }
>>>>>> +
>>>>>> +   /* remove blacklisted symbol from addrs */
>>>>>> +   for (i = 0, cnt = 0; i < num; ++i) {
>>>>>> +           if (addrs[i])
>>>>>> +                   addrs[cnt++]  = addrs[i];
>>>>>> +   }
>>>>>> +
>>>>>> +   return cnt;
>>>>>> +}
>>>>>> +
>>>>>>     int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>>>>>     {
>>>>>>       struct bpf_kprobe_multi_link *link = NULL;
>>>>>> @@ -2859,6 +2890,12 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
>>>>>>       else
>>>>>>               link->fp.entry_handler = kprobe_multi_link_handler;
>>>>>> +   cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
>>>>>> +   if (!cnt) {
>>>>>> +           err = -EINVAL;
>>>>>> +           goto error;
>>>>>> +   }
>>>>>> +
>>>>>>       link->addrs = addrs;
>>>>>>       link->cookies = cookies;
>>>>>>       link->cnt = cnt;
> 
>

Patch

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9a050e36dc6c..44c68bc06bbd 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2764,6 +2764,37 @@  static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u3
 	return arr.mods_cnt;
 }
 
+static inline int check_kprobe_address_safe(unsigned long addr)
+{
+	if (within_kprobe_blacklist(addr))
+		return -EINVAL;
+	else
+		return 0;
+}
+
+static int check_bpf_kprobe_addrs_safe(unsigned long *addrs, int num)
+{
+	int i, cnt;
+	char symname[KSYM_NAME_LEN];
+
+	for (i = 0; i < num; ++i) {
+		if (check_kprobe_address_safe((unsigned long)addrs[i])) {
+			lookup_symbol_name(addrs[i], symname);
+			pr_warn("bpf_kprobe: %s at %lx is blacklisted\n", symname, addrs[i]);
+			/* mark blacklisted symbol for remove */
+			addrs[i] = 0;
+		}
+	}
+
+	/* remove blacklisted symbol from addrs */
+	for (i = 0, cnt = 0; i < num; ++i) {
+		if (addrs[i])
+			addrs[cnt++]  = addrs[i];
+	}
+
+	return cnt;
+}
+
 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	struct bpf_kprobe_multi_link *link = NULL;
@@ -2859,6 +2890,12 @@  int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
 	else
 		link->fp.entry_handler = kprobe_multi_link_handler;
 
+	cnt = check_bpf_kprobe_addrs_safe(addrs, cnt);
+	if (!cnt) {
+		err = -EINVAL;
+		goto error;
+	}
+
 	link->addrs = addrs;
 	link->cookies = cookies;
 	link->cnt = cnt;
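For comparison with the review discussion above, here is a minimal sketch (not part of the submitted patch, and only illustrative) of the stricter behaviour reviewers leaned towards: fail the whole attach as soon as any requested address is kprobe-blacklisted, instead of silently dropping entries and proceeding with the rest.

/* Illustrative alternative: reject the whole request on the first
 * blacklisted address so user space sees the failure explicitly.
 */
static int check_bpf_kprobe_addrs_strict(const unsigned long *addrs, u32 num)
{
	char symname[KSYM_NAME_LEN];
	u32 i;

	for (i = 0; i < num; i++) {
		if (within_kprobe_blacklist(addrs[i])) {
			lookup_symbol_name(addrs[i], symname);
			pr_warn("bpf_kprobe: %s at %lx is blacklisted\n",
				symname, addrs[i]);
			return -EINVAL;
		}
	}

	return 0;
}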