diff mbox series

[v1,bpf-next,1/7] bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire

Message ID 20230801203630.3581291-2-davemarchevsky@fb.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1354 this patch: 1354
netdev/cc_maintainers warning 8 maintainers not CCed: kpsingh@kernel.org martin.lau@linux.dev john.fastabend@gmail.com sdf@google.com song@kernel.org yonghong.song@linux.dev jolsa@kernel.org haoluo@google.com
netdev/build_clang success Errors and warnings before: 1365 this patch: 1365
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1377 this patch: 1377
netdev/checkpatch warning WARNING: line length of 105 exceeds 80 columns WARNING: line length of 107 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-next-VM_Test-2 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-8 success Logs for veristat

Commit Message

Dave Marchevsky Aug. 1, 2023, 8:36 p.m. UTC
It's straightforward to prove that kptr_struct_meta must be non-NULL for
any valid call to these kfuncs:

  * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
    struct in user BTF with a special field (e.g. bpf_refcount,
    {rb,list}_node). These are stored in that BTF's struct_meta_tab.

  * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
    have {rb,list}_node field and that it's at the correct offset.
    Similarly, check_kfunc_args ensures bpf_refcount field existence for
    node param to bpf_refcount_acquire.

  * So a btf_struct_meta must have been created for the struct type of
    node param to these kfuncs

  * That BTF and its struct_meta_tab are guaranteed to still be around.
    Any arbitrary {rb,list} node the BPF program interacts with either:
    came from bpf_obj_new or a collection removal kfunc in the same
    program, in which case the BTF is associated with the program and
    still around; or came from bpf_kptr_xchg, in which case the BTF was
    associated with the map and is still around

Instead of silently continuing with NULL struct_meta, which caused
confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
kptr_struct_meta for node param to list and rbtree insert funcs"), let's
error out. Then, at runtime, we can confidently say that the
implementations of these kfuncs were given a non-NULL kptr_struct_meta,
meaning that special-field-specific functionality like
bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
series are guaranteed to execute.

This patch doesn't change functionality, just makes it easier to reason
about existing functionality.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
---
 kernel/bpf/verifier.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Yonghong Song Aug. 2, 2023, 3:57 a.m. UTC | #1
On 8/1/23 1:36 PM, Dave Marchevsky wrote:
> It's straightforward to prove that kptr_struct_meta must be non-NULL for
> any valid call to these kfuncs:
> 
>    * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
>      struct in user BTF with a special field (e.g. bpf_refcount,
>      {rb,list}_node). These are stored in that BTF's struct_meta_tab.
> 
>    * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
>      have {rb,list}_node field and that it's at the correct offset.
>      Similarly, check_kfunc_args ensures bpf_refcount field existence for
>      node param to bpf_refcount_acquire.
> 
>    * So a btf_struct_meta must have been created for the struct type of
>      node param to these kfuncs
> 
>    * That BTF and its struct_meta_tab are guaranteed to still be around.
>      Any arbitrary {rb,list} node the BPF program interacts with either:
>      came from bpf_obj_new or a collection removal kfunc in the same
>      program, in which case the BTF is associated with the program and
>      still around; or came from bpf_kptr_xchg, in which case the BTF was
>      associated with the map and is still around
> 
> Instead of silently continuing with NULL struct_meta, which caused
> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
> kptr_struct_meta for node param to list and rbtree insert funcs"), let's
> error out. Then, at runtime, we can confidently say that the
> implementations of these kfuncs were given a non-NULL kptr_struct_meta,
> meaning that special-field-specific functionality like
> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
> series are guaranteed to execute.

The subject says '... for collection insert and refcount_acquire'.
Why pick these? Couldn't we check all kptr_struct_meta use cases?

> 
> This patch doesn't change functionality, just makes it easier to reason
> about existing functionality.
> 
> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
> ---
>   kernel/bpf/verifier.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index e7b1af016841..ec37e84a11c6 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>   		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>   
> +		if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&

Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in 
this 'if' branch, right?

> +		    !kptr_struct_meta) {
> +			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
> +				insn_idx);
> +			return -EFAULT;
> +		}
> +
>   		insn_buf[0] = addr[0];
>   		insn_buf[1] = addr[1];
>   		insn_buf[2] = *insn;
> @@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   	} else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
>   		   desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
>   		   desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
> +		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>   		int struct_meta_reg = BPF_REG_3;
>   		int node_offset_reg = BPF_REG_4;
>   
> @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   			node_offset_reg = BPF_REG_5;
>   		}
>   
> +		if (!kptr_struct_meta) {
> +			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
> +				insn_idx);
> +			return -EFAULT;
> +		}
> +
>   		__fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
>   						node_offset_reg, insn, insn_buf, cnt);
>   	} else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||

In my opinion, such selective defensive programming is not necessary. By
searching for kptr_struct_meta in the code, it is reasonably easy to find
whether we have any mismatch or not. Also, selftest coverage should
cover these cases (probably already), right?

If the defensive programming here is still desirable to warn at
verification time, I think we should just check all uses of
kptr_struct_meta.
Dave Marchevsky Aug. 2, 2023, 7:23 p.m. UTC | #2
On 8/1/23 11:57 PM, Yonghong Song wrote:
> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>> [...]
> 
> The subject says '... for collection insert and refcount_acquire'.
> Why picks these? We could check for all kptr_struct_meta use cases?
> 

fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs:

  - bpf_obj_new_impl
  - bpf_obj_drop_impl
  - collection insert kfuncs
    - bpf_rbtree_add_impl
    - bpf_list_push_{front,back}_impl
  - bpf_refcount_acquire_impl

A btf_struct_meta is only created for a struct if it has a non-null btf_record,
which in turn only happens if the struct has any special fields (spin_lock,
refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a
struct type without any special fields, the kptr_struct_meta arg can be
NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so
the same holds for that kfunc.

By definition, rbtree and list nodes must be some struct type with a
struct bpf_{rb,list}_node field, and similar logic applies to refcounted
types, so if there's no kptr_struct_meta for their node arg, there was some
verifier-internal issue.


>> [...]
>> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>           struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>           struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>>   +        if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
> 
> Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in this 'if' branch, right?
> 

The body of this 'else if' also handles kptr_struct_meta setup for bpf_obj_drop,
for which a NULL kptr_struct_meta is valid.

>> [...]
> 
> In my opinion, such selective defensive programming is not necessary. By searching kptr_struct_meta in the code, it is reasonably easy to find
> whether we have any mismatch or not. Also self test coverage should
> cover these cases (probably already) right?
> 
> If the defensive programming here is still desirable to warn at verification time, I think we should just check all of uses for kptr_struct_meta.

Something like this patch probably should've been included with the series
containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"),
since that commit found that kptr_struct_meta wasn't being set for collection
insert kfuncs and fixed the issue. It was annoyingly hard to root-cause
because, among other things, many of these kfunc impls check that
the btf_struct_meta is non-NULL before using it, with some fallback logic.
I don't like those unnecessary NULL checks either, and considered removing
them in this patch, but decided to leave them in since we already had
a case where struct_meta wasn't being set.

On second thought, maybe it's better to take the unnecessary runtime checks
out and leave these verification-time checks in. If, at runtime, those kfuncs
see a NULL btf_struct_meta, I'd rather they fail loudly in the future
with a NULL deref splat, than potentially leaking memory or similarly
subtle failures. WDYT?

I don't feel particularly strongly about these verification-time checks,
but the level of 'selective defensive programming' here feels similar to
other 'verifier internal error' checks sprinkled throughout verifier.c,
so that argument doesn't feel very persuasive to me.
Yonghong Song Aug. 2, 2023, 9:41 p.m. UTC | #3
On 8/2/23 12:23 PM, Dave Marchevsky wrote:
> On 8/1/23 11:57 PM, Yonghong Song wrote:
>> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>>> [...]
>>
>> The subject says '... for collection insert and refcount_acquire'.
>> Why picks these? We could check for all kptr_struct_meta use cases?
>>
> 
> fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs:
> 
>    - bpf_obj_new_impl
>    - bpf_obj_drop_impl
>    - collection insert kfuncs
>      - bpf_rbtree_add_impl
>      - bpf_list_push_{front,back}_impl
>    - bpf_refcount_acquire_impl
> 
> A btf_struct_meta is only created for a struct if it has a non-null btf_record,
> which in turn only happens if the struct has any special fields (spin_lock,
> refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a
> struct type without any special fields, the kptr_struct_meta arg can be
> NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so
> the same holds for that kfunc.
> 
> By definition rbtree and list nodes must be some struct type w/
> struct bpf_{rb,list}_node field, and similar logic for refcounted, so if there's
> no kptr_struct_meta for their node arg, there was some verifier-internal issue.
> 
> 
>>> [...]
>>
>> In my opinion, such selective defensive programming is not necessary. By searching kptr_struct_meta in the code, it is reasonably easy to find
>> whether we have any mismatch or not. Also self test coverage should
>> cover these cases (probably already) right?
>>
>> If the defensive programming here is still desirable to warn at verification time, I think we should just check all of uses for kptr_struct_meta.
> 
> Something like this patch probably should've been included with the series
> containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"),
> since that commit found that kptr_struct_meta wasn't being set for collection
> insert kfuncs and fixed the issue. It was annoyingly hard to root-cause
> because, among other things, many of these kfunc impls check that
> the btf_struct_meta is non-NULL before using it, with some fallback logic.
> I don't like those unnecessary NULL checks either, and considered removing
> them in this patch, but decided to leave them in since we already had
> a case where struct_meta wasn't being set.
> 
> On second thought, maybe it's better to take the unnecessary runtime checks
> out and leave these verification-time checks in. If, at runtime, those kfuncs
> see a NULL btf_struct_meta, I'd rather they fail loudly in the future
> with a NULL deref splat, than potentially leaking memory or similarly
> subtle failures. WDYT?

Certainly I agree with you that verification failure is much better than
debugging runtime.

Here, we covered a few kfuncs which always require a non-NULL
kptr_struct_meta. But as you mentioned above, we also have cases where,
for a given kfunc, the kptr_struct_meta could be either NULL or non-NULL.

Take bpf_obj_new_impl: for some programs the kptr_struct_meta cannot be
NULL based on the bpf prog, but somehow the verifier passes a NULL ptr
anyway. Should we check for this in fixup_kfunc_call() as well?

> 
> I don't feel particularly strongly about these verification-time checks,
> but the level of 'selective defensive programming' here feels similar to
> other 'verifier internal error' checks sprinkled throughout verifier.c,
> so that argument doesn't feel very persuasive to me.

I am okay with this patch but I wonder whether we can cover more
cases.
David Marchevsky Aug. 4, 2023, 6:17 a.m. UTC | #4
On 8/2/23 5:41 PM, Yonghong Song wrote:
> On 8/2/23 12:23 PM, Dave Marchevsky wrote:
>> [...]
>>
>> On second thought, maybe it's better to take the unnecessary runtime checks
>> out and leave these verification-time checks in. If, at runtime, those kfuncs
>> see a NULL btf_struct_meta, I'd rather they fail loudly in the future
>> with a NULL deref splat, than potentially leaking memory or similarly
>> subtle failures. WDYT?
> 
> Certainly I agree with you that verification failure is much better than
> debugging runtime.
> 
> Here, we covered a few kfunc which always requires non-NULL kptr_struct_meta. But as you mentioned in the above, we also have
> cases where for a kfunc, the kptr_struct_meta could be NULL or non-NULL.
> 
> Let us say, kfunc bpf_obj_new_impl, for some cases, the kptr_struct_meta
> cannot be NULL based on bpf prog, but somehow, the verifier passes
> a NULL ptr to the program. Should we check this at fixup_kfunc_call()
> as well?
> 

I see what you mean. Since btf_struct_meta is a (btf_id, btf_record) tuple
currently, and btf_record is a view of a BTF type optimized for quickly
answering questions about special fields (existence / location / type), we
could certainly go look at the kfunc arg BTF types to confirm, as they're the
source of truth. But this would be redoing the work necessary to generate the
btf_record in the first place, which feels like it defeats the purpose of
having this view of the data at verification time. We definitely don't want
to push this BTF digging to runtime either.

Another approach: we could focus on the specific issue that caused the
bug which the commit I mentioned earlier fixes, and enforce the following
invariants:

   * All bpf_obj_new_impl calls for a particular type must use the same
     btf_struct_meta
   * Any kfunc or helper taking a user-allocated param must use the same
     btf_struct_meta as the bpf_obj_new_impl call that allocated it

The whole point of 'special' fields is to take special-field-specific action
on them, or due to their existence. This would ensure that all helpers use the
same special field info. Then, e.g., if bpf_obj_new's bpf_obj_init_fields does
something to a special field, bpf_obj_drop will have the same btf_struct_meta
and thus the same field info for bpf_obj_free_fields.

Maybe instead of "bpf_obj_new_impl calls" above it's
more accurate to say "kfunc calls that acquire the
user-allocated type".

I think this second idea makes sense and is better than piecemeal checking.
Will think on it more and try to incorporate it into v2 of this series.
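[Editor's note: the second invariant could be enforced with a small piece of
per-type bookkeeping at verification time. The sketch below is hypothetical;
meta_tracker and record_meta_use are illustrative names, not kernel code. It
shows the "first acquiring call pins the meta, every later use must match"
idea.]

```c
#include <stdint.h>

#define EFAULT 14
#define MAX_TRACKED 64

/* Hypothetical verification-time bookkeeping: remember which
 * btf_struct_meta pointer was supplied the first time a type was
 * acquired (e.g. by bpf_obj_new_impl), and reject any later kfunc
 * fixup that supplies a different meta for the same btf_id. */
struct meta_use {
	uint32_t btf_id;
	const void *meta;   /* btf_struct_meta pointer seen first */
};

struct meta_tracker {
	int cnt;
	struct meta_use uses[MAX_TRACKED];
};

int record_meta_use(struct meta_tracker *t, uint32_t btf_id, const void *meta)
{
	for (int i = 0; i < t->cnt; i++) {
		if (t->uses[i].btf_id != btf_id)
			continue;
		/* Invariant: all users of this type share one meta. */
		return t->uses[i].meta == meta ? 0 : -EFAULT;
	}
	if (t->cnt >= MAX_TRACKED)
		return -EFAULT;
	t->uses[t->cnt].btf_id = btf_id;
	t->uses[t->cnt].meta = meta;
	t->cnt++;
	return 0;
}
```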

>>
>> I don't feel particularly strongly about these verification-time checks,
>> but the level of 'selective defensive programming' here feels similar to
>> other 'verifier internal error' checks sprinkled throughout verifier.c,
>> so that argument doesn't feel very persuasive to me.
> 
> I am okay with this patch but I wonder whether we can cover more
> cases.
Yonghong Song Aug. 4, 2023, 3:37 p.m. UTC | #5
On 8/3/23 11:17 PM, David Marchevsky wrote:
> On 8/2/23 5:41 PM, Yonghong Song wrote:
>>
>>
>> On 8/2/23 12:23 PM, Dave Marchevsky wrote:
>>>
>>>
>>> On 8/1/23 11:57 PM, Yonghong Song wrote:
>>>>
>>>>
>>>> On 8/1/23 1:36 PM, Dave Marchevsky wrote:
>>>>> It's straightforward to prove that kptr_struct_meta must be non-NULL for
>>>>> any valid call to these kfuncs:
>>>>>
>>>>>      * btf_parse_struct_metas in btf.c creates a btf_struct_meta for any
>>>>>        struct in user BTF with a special field (e.g. bpf_refcount,
>>>>>        {rb,list}_node). These are stored in that BTF's struct_meta_tab.
>>>>>
>>>>>      * __process_kf_arg_ptr_to_graph_node in verifier.c ensures that nodes
>>>>>        have {rb,list}_node field and that it's at the correct offset.
>>>>>        Similarly, check_kfunc_args ensures bpf_refcount field existence for
>>>>>        node param to bpf_refcount_acquire.
>>>>>
>>>>>      * So a btf_struct_meta must have been created for the struct type of
>>>>>        node param to these kfuncs
>>>>>
>>>>>      * That BTF and its struct_meta_tab are guaranteed to still be around.
>>>>>        Any arbitrary {rb,list} node the BPF program interacts with either:
>>>>>        came from bpf_obj_new or a collection removal kfunc in the same
>>>>>        program, in which case the BTF is associated with the program and
>>>>>        still around; or came from bpf_kptr_xchg, in which case the BTF was
>>>>>        associated with the map and is still around
>>>>>
>>>>> Instead of silently continuing with NULL struct_meta, which caused
>>>>> confusing bugs such as those addressed by commit 2140a6e3422d ("bpf: Set
>>>>> kptr_struct_meta for node param to list and rbtree insert funcs"), let's
>>>>> error out. Then, at runtime, we can confidently say that the
>>>>> implementations of these kfuncs were given a non-NULL kptr_struct_meta,
>>>>> meaning that special-field-specific functionality like
>>>>> bpf_obj_free_fields and the bpf_obj_drop change introduced later in this
>>>>> series are guaranteed to execute.
>>>>
>>>> The subject says '... for collection insert and refcount_acquire'.
>>>> Why pick these? We could check all kptr_struct_meta use cases, right?
>>>>
>>>
>>> fixup_kfunc_call sets kptr_struct_meta arg for the following kfuncs:
>>>
>>>     - bpf_obj_new_impl
>>>     - bpf_obj_drop_impl
>>>     - collection insert kfuncs
>>>       - bpf_rbtree_add_impl
>>>       - bpf_list_push_{front,back}_impl
>>>     - bpf_refcount_acquire_impl
>>>
>>> A btf_struct_meta is only created for a struct if it has a non-null btf_record,
>>> which in turn only happens if the struct has any special fields (spin_lock,
>>> refcount, {rb,list}_node, etc.). Since it's valid to call bpf_obj_new on a
>>> struct type without any special fields, the kptr_struct_meta arg can be
>>> NULL. The result of such bpf_obj_new allocation must be bpf_obj_drop-able, so
>>> the same holds for that kfunc.
>>>
>>> By definition rbtree and list nodes must be some struct type w/
>>> struct bpf_{rb,list}_node field, and similar logic for refcounted, so if there's
>>> no kptr_struct_meta for their node arg, there was some verifier-internal issue.
>>>
>>>
>>>>>
>>>>> This patch doesn't change functionality, just makes it easier to reason
>>>>> about existing functionality.
>>>>>
>>>>> Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
>>>>> ---
>>>>>     kernel/bpf/verifier.c | 14 ++++++++++++++
>>>>>     1 file changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>>>>> index e7b1af016841..ec37e84a11c6 100644
>>>>> --- a/kernel/bpf/verifier.c
>>>>> +++ b/kernel/bpf/verifier.c
>>>>> @@ -18271,6 +18271,13 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>>>             struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>>>>             struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
>>>>>     +        if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
>>>>
>>>> Why check for KF_bpf_refcount_acquire_impl? We can cover all cases in this 'if' branch, right?
>>>>
>>>
>>> The body of this 'else if' also handles kptr_struct_meta setup for bpf_obj_drop,
>>> for which NULL kptr_struct_meta is valid.
>>>
>>>>> +            !kptr_struct_meta) {
>>>>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>>>>> +                insn_idx);
>>>>> +            return -EFAULT;
>>>>> +        }
>>>>> +
>>>>>             insn_buf[0] = addr[0];
>>>>>             insn_buf[1] = addr[1];
>>>>>             insn_buf[2] = *insn;
>>>>> @@ -18278,6 +18285,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>>>         } else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
>>>>>                desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
>>>>>                desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
>>>>> +        struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
>>>>>             int struct_meta_reg = BPF_REG_3;
>>>>>             int node_offset_reg = BPF_REG_4;
>>>>>     @@ -18287,6 +18295,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>>>>>                 node_offset_reg = BPF_REG_5;
>>>>>             }
>>>>>     +        if (!kptr_struct_meta) {
>>>>> +            verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
>>>>> +                insn_idx);
>>>>> +            return -EFAULT;
>>>>> +        }
>>>>> +
>>>>>             __fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
>>>>>                             node_offset_reg, insn, insn_buf, cnt);
>>>>>         } else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||
>>>>
>>>> In my opinion, such selective defensive programming is not necessary. By
>>>> searching for kptr_struct_meta in the code, it is reasonably easy to find
>>>> whether we have any mismatch or not. Also, selftest coverage should
>>>> cover these cases (probably already), right?
>>>>
>>>> If the defensive programming here is still desirable to warn at
>>>> verification time, I think we should just check all uses of
>>>> kptr_struct_meta.
>>>
>>> Something like this patch probably should've been included with the series
>>> containing 2140a6e3422d ("bpf: Set kptr_struct_meta for node param to list and rbtree insert funcs"),
>>> since that commit found that kptr_struct_meta wasn't being set for collection
>>> insert kfuncs and fixed the issue. It was annoyingly hard to root-cause
>>> because, among other things, many of these kfunc impls check that
>>> the btf_struct_meta is non-NULL before using it, with some fallback logic.
>>> I don't like those unnecessary NULL checks either, and considered removing
>>> them in this patch, but decided to leave them in since we already had
>>> a case where struct_meta wasn't being set.
>>>
>>> On second thought, maybe it's better to take the unnecessary runtime checks
>>> out and leave these verification-time checks in. If, at runtime, those kfuncs
>>> see a NULL btf_struct_meta, I'd rather they fail loudly in the future
>>> with a NULL deref splat, than potentially leaking memory or similarly
>>> subtle failures. WDYT?
>>
>> Certainly I agree with you that verification failure is much better than
>> debugging runtime.
>>
>> Here, we covered a few kfuncs which always require a non-NULL
>> kptr_struct_meta. But as you mentioned above, we also have
>>
>> Let us say, for kfunc bpf_obj_new_impl, in some cases the kptr_struct_meta
>> cannot be NULL based on the bpf prog, but somehow the verifier passes
>> a NULL ptr to the program. Should we check this at fixup_kfunc_call()
>> as well?
>>
> 
> I see what you mean. Since btf_struct_meta is a (btf_id, btf_record) tuple
> currently, and btf_record is a view of a BTF type optimized for quickly
> answering questions about special fields (existence / location / type), we
> could certainly go look at the kfunc arg BTF types to confirm, as they're the
> source of truth. But this would be redoing the work necessary to generate the
> btf_record in the first place, which feels like it defeats the purpose of
> having this view of the data at verification time. We definitely don't want
> to push this BTF digging to runtime either.
> 
> Another approach: we could focus on the specific issue that caused the
> bug which the commit I mentioned earlier fixes, and enforce the following
> invariants:
> 
>     * All bpf_obj_new_impl calls for a particular type must use the same
>       btf_struct_meta
>     * Any kfunc or helper taking a user-allocated param must use the same
>       btf_struct_meta as the bpf_obj_new_impl call that allocated it
> 
> The whole point of 'special' fields is to take special-field-specific action
> on them, or due to their existence. This would ensure that all helpers use
> the same special field info. Then, e.g., if bpf_obj_new's bpf_obj_init_fields
> does something to a special field, bpf_obj_drop will have the same
> btf_struct_meta and thus the same field info for bpf_obj_free_fields.
> 
> Maybe instead of "bpf_obj_new_impl calls" above it's
> more accurate to say "kfunc calls that acquire the
> user-allocated type".
> 
> I think this second idea makes sense and is better than piecemeal checking.
> Will think on it more and try to incorporate it into v2 of this series.

This sounds better as it is generic, and it should also cover the two cases
in this patch, since they fall under the second bullet above (any kfunc or
helper taking a user-allocated param must use the same btf_struct_meta as
the bpf_obj_new_impl call that allocated it).

> 
>>>
>>> I don't feel particularly strongly about these verification-time checks,
>>> but the level of 'selective defensive programming' here feels similar to
>>> other 'verifier internal error' checks sprinkled throughout verifier.c,
>>> so that argument doesn't feel very persuasive to me.
>>
>> I am okay with this patch but I wonder whether we can cover more
>> cases.
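[Editor's note: the split discussed in this thread between kfuncs that
tolerate a NULL kptr_struct_meta and those for which NULL is a
verifier-internal error can be summarized as a small predicate. This is an
illustrative userspace model; the enum constants are stand-ins named after
the kfuncs, not the kernel's actual special_kfunc_list indices.]

```c
#include <stdbool.h>

/* Stand-in identifiers for the kfuncs that fixup_kfunc_call gives a
 * kptr_struct_meta arg. */
enum kfunc_id {
	KF_bpf_obj_new_impl,
	KF_bpf_obj_drop_impl,
	KF_bpf_list_push_front_impl,
	KF_bpf_list_push_back_impl,
	KF_bpf_rbtree_add_impl,
	KF_bpf_refcount_acquire_impl,
};

/* bpf_obj_new/bpf_obj_drop may legitimately receive a NULL meta: it is
 * valid to allocate (and thus drop) a plain struct with no special
 * fields. Collection inserts and bpf_refcount_acquire operate on types
 * that by definition contain a special field ({rb,list}_node or
 * bpf_refcount), so a NULL meta there signals a verifier-internal
 * error, which is what the patch's -EFAULT checks enforce. */
bool null_meta_is_internal_error(enum kfunc_id id)
{
	switch (id) {
	case KF_bpf_list_push_front_impl:
	case KF_bpf_list_push_back_impl:
	case KF_bpf_rbtree_add_impl:
	case KF_bpf_refcount_acquire_impl:
		return true;
	default:
		return false;
	}
}
```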
diff mbox series

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e7b1af016841..ec37e84a11c6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18271,6 +18271,13 @@  static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
 
+		if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
+		    !kptr_struct_meta) {
+			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		insn_buf[0] = addr[0];
 		insn_buf[1] = addr[1];
 		insn_buf[2] = *insn;
@@ -18278,6 +18285,7 @@  static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
+		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		int struct_meta_reg = BPF_REG_3;
 		int node_offset_reg = BPF_REG_4;
 
@@ -18287,6 +18295,12 @@  static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			node_offset_reg = BPF_REG_5;
 		}
 
+		if (!kptr_struct_meta) {
+			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		__fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
 						node_offset_reg, insn, insn_buf, cnt);
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||