diff mbox series

[bpf-next] bpf: Handle MEM_RCU type properly

Message ID 20221129023713.2216451-1-yhs@fb.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series [bpf-next] bpf: Handle MEM_RCU type properly | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1397 this patch: 1398
netdev/cc_maintainers fail 1 blamed authors not CCed: martin.lau@linux.dev; 11 maintainers not CCed: sdf@google.com shuah@kernel.org martin.lau@linux.dev kpsingh@kernel.org linux-kselftest@vger.kernel.org void@manifault.com haoluo@google.com jolsa@kernel.org song@kernel.org mykolal@fb.com john.fastabend@gmail.com
netdev/build_clang success Errors and warnings before: 157 this patch: 157
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn fail Errors and warnings before: 1387 this patch: 1388
netdev/checkpatch warning CHECK: Prefer using the BIT macro WARNING: line length of 82 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns WARNING: line length of 96 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-7 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_maps on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_no_alu32 on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for test_progs_no_alu32_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-27 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-28 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-29 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-30 success Logs for test_progs_parallel on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-32 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-34 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-35 success Logs for test_verifier on aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-37 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-38 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-36 fail Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for test_progs_parallel on s390x with gcc

Commit Message

Yonghong Song Nov. 29, 2022, 2:37 a.m. UTC
Commit 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
introduced MEM_RCU and bpf_rcu_read_lock/unlock() support. In that
commit, a rcu pointer is tagged with both MEM_RCU and PTR_TRUSTED
so that it can be passed into kfuncs or helpers as an argument.

Martin raised a good question in [1] such that the rcu pointer,
although being able to accessing the object, might have reference
count of 0. This might cause a problem if the rcu pointer is passed
to a kfunc which expects trusted arguments where ref count should
be greater than 0.

So this patch tries to fix this problem by tagging rcu pointer with
MEM_RCU only. Special acquire functions are needed to try to
acquire a reference with the consideration that the original rcu
pointer ref count could be 0. This special acquire function's
argument needs to be KF_RCU, a new introduced kfunc flag. In
verifier, KF_RCU will require the actual argument register type
to be MEM_RCU.

 [1] https://lore.kernel.org/bpf/ac70f574-4023-664e-b711-e0d3b18117fd@linux.dev/

Fixes: 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf_verifier.h                  |  2 +-
 include/linux/btf.h                           |  1 +
 kernel/bpf/helpers.c                          | 14 ++++++++
 kernel/bpf/verifier.c                         | 36 +++++++++++++------
 .../selftests/bpf/prog_tests/cgrp_kfunc.c     |  4 +--
 .../selftests/bpf/prog_tests/task_kfunc.c     |  4 +--
 .../selftests/bpf/progs/rcu_read_lock.c       |  7 ++--
 7 files changed, 50 insertions(+), 18 deletions(-)

Comments

Yonghong Song Nov. 30, 2022, 12:11 a.m. UTC | #1
On 11/28/22 6:37 PM, Yonghong Song wrote:
> Commit 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
> introduced MEM_RCU and bpf_rcu_read_lock/unlock() support. In that
> commit, a rcu pointer is tagged with both MEM_RCU and PTR_TRUSTED
> so that it can be passed into kfuncs or helpers as an argument.
> 
> Martin raised a good question in [1] such that the rcu pointer,
> although being able to accessing the object, might have reference
> count of 0. This might cause a problem if the rcu pointer is passed
> to a kfunc which expects trusted arguments where ref count should
> be greater than 0.
> 
> So this patch tries to fix this problem by tagging rcu pointer with
> MEM_RCU only. Special acquire functions are needed to try to
> acquire a reference with the consideration that the original rcu
> pointer ref count could be 0. This special acquire function's
> argument needs to be KF_RCU, a new introduced kfunc flag. In
> verifier, KF_RCU will require the actual argument register type
> to be MEM_RCU.
> 
>   [1] https://lore.kernel.org/bpf/ac70f574-4023-664e-b711-e0d3b18117fd@linux.dev/
> 
> Fixes: 9bb00b2895cb ("bpf: Add kfunc bpf_rcu_read_lock/unlock()")
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>   include/linux/bpf_verifier.h                  |  2 +-
>   include/linux/btf.h                           |  1 +
>   kernel/bpf/helpers.c                          | 14 ++++++++
>   kernel/bpf/verifier.c                         | 36 +++++++++++++------
>   .../selftests/bpf/prog_tests/cgrp_kfunc.c     |  4 +--
>   .../selftests/bpf/prog_tests/task_kfunc.c     |  4 +--
>   .../selftests/bpf/progs/rcu_read_lock.c       |  7 ++--
>   7 files changed, 50 insertions(+), 18 deletions(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index c05aa6e1f6f5..6f192dd9025e 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -683,7 +683,7 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
>   	}
>   }
>   
> -#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | MEM_RCU | PTR_TRUSTED)
> +#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)
>   
>   static inline bool bpf_type_has_unsafe_modifiers(u32 type)
>   {
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 9ed00077db6e..cbd6e4096f8c 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -70,6 +70,7 @@
>   #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
>   #define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
>   #define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> +#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
>   
>   /*
>    * Return the name of the passed struct, if exists, or halt the build if for
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index a5a511430f2a..46fbe027f3b6 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -1837,6 +1837,19 @@ struct task_struct *bpf_task_acquire(struct task_struct *p)
>   	return p;
>   }
>   
> +/**
> + * bpf_task_acquire_rcu - Acquire a reference to a rcu task object. A task
> + * acquired by this kfunc which is not stored in a map as a kptr, must be
> + * released by calling bpf_task_release().
> + * @p: The task on which a reference is being acquired or NULL.

A typo. The argument @p cannot be NULL. Will wait for more feedbacks
before sending next version with the fix.

> + */
> +struct task_struct *bpf_task_acquire_rcu(struct task_struct *p)
> +{
> +	if (!refcount_inc_not_zero(&p->rcu_users))
> +		return NULL;
> +	return p;
> +}
> +
>   /**
[...]
Martin KaFai Lau Dec. 1, 2022, 7:05 a.m. UTC | #2
On 11/28/22 6:37 PM, Yonghong Song wrote:
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index c05aa6e1f6f5..6f192dd9025e 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -683,7 +683,7 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
>   	}
>   }
>   
> -#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | MEM_RCU | PTR_TRUSTED)
> +#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)

[ ... ]

> +static bool is_rcu_reg(const struct bpf_reg_state *reg)
> +{
> +	return reg->type & MEM_RCU;
> +}
> +
>   static int check_pkt_ptr_alignment(struct bpf_verifier_env *env,
>   				   const struct bpf_reg_state *reg,
>   				   int off, int size, bool strict)
> @@ -4775,12 +4780,10 @@ static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>   		/* Mark value register as MEM_RCU only if it is protected by
>   		 * bpf_rcu_read_lock() and the ptr reg is trusted. MEM_RCU
>   		 * itself can already indicate trustedness inside the rcu
> -		 * read lock region. Also mark it as PTR_TRUSTED.
> +		 * read lock region.
>   		 */
>   		if (!env->cur_state->active_rcu_lock || !is_trusted_reg(reg))
>   			flag &= ~MEM_RCU;

How about dereferencing a PTR_TO_BTF_ID | MEM_RCU, like:

	/* parent: PTR_TO_BTF_ID | MEM_RCU */
	parent = current->real_parent;
	/* gparent: PTR_TO_BTF_ID */
	gparent = parent->real_parent;

Should "gparent" have MEM_RCU also?

Also, should PTR_MAYBE_NULL be added to "parent"?
	
> -		else
> -			flag |= PTR_TRUSTED;
>   	} else if (reg->type & MEM_RCU) {
>   		/* ptr (reg) is marked as MEM_RCU, but the struct field is not tagged
>   		 * with __rcu. Mark the flag as PTR_UNTRUSTED conservatively.
> @@ -5945,7 +5948,7 @@ static const struct bpf_reg_types btf_ptr_types = {
>   	.types = {
>   		PTR_TO_BTF_ID,
>   		PTR_TO_BTF_ID | PTR_TRUSTED,
> -		PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED,
> +		PTR_TO_BTF_ID | MEM_RCU,
>   	},
>   };
>   static const struct bpf_reg_types percpu_btf_ptr_types = {
> @@ -6124,7 +6127,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>   	case PTR_TO_BTF_ID:
>   	case PTR_TO_BTF_ID | MEM_ALLOC:
>   	case PTR_TO_BTF_ID | PTR_TRUSTED:
> -	case PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED:
> +	case PTR_TO_BTF_ID | MEM_RCU:
>   	case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED:
>   		/* When referenced PTR_TO_BTF_ID is passed to release function,
>   		 * it's fixed offset must be 0.	In the other cases, fixed offset
> @@ -8022,6 +8025,11 @@ static bool is_kfunc_destructive(struct bpf_kfunc_call_arg_meta *meta)
>   	return meta->kfunc_flags & KF_DESTRUCTIVE;
>   }
>   
> +static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return meta->kfunc_flags & KF_RCU;
> +}
> +
>   static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg)
>   {
>   	return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
> @@ -8706,13 +8714,19 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>   		switch (kf_arg_type) {
>   		case KF_ARG_PTR_TO_ALLOC_BTF_ID:
>   		case KF_ARG_PTR_TO_BTF_ID:
> -			if (!is_kfunc_trusted_args(meta))
> +			if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
>   				break;
>   
> -			if (!is_trusted_reg(reg)) {
> -				verbose(env, "R%d must be referenced or trusted\n", regno);
> +			if (!is_trusted_reg(reg) && !is_rcu_reg(reg)) {
> +				verbose(env, "R%d must be referenced, trusted or rcu\n", regno);
>   				return -EINVAL;
>   			}
> +
> +			if (is_kfunc_rcu(meta) != is_rcu_reg(reg)) {

I think is_trusted_reg(reg) should also be acceptable to bpf_task_acquire_rcu().

nit. bpf_task_acquire_not_zero() may be a better kfunc name.

> +				verbose(env, "R%d does not match kf arg rcu tagging\n", regno);
> +				return -EINVAL;
> +			}
> +
>   			fallthrough;
>   		case KF_ARG_PTR_TO_CTX:
>   			/* Trusted arguments have the same offset checks as release arguments */
> @@ -8823,7 +8837,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>   		case KF_ARG_PTR_TO_BTF_ID:
>   			/* Only base_type is checked, further checks are done here */
>   			if ((base_type(reg->type) != PTR_TO_BTF_ID ||
> -			     bpf_type_has_unsafe_modifiers(reg->type)) &&
> +			     (bpf_type_has_unsafe_modifiers(reg->type) && !is_rcu_reg(reg))) &&
>   			    !reg2btf_ids[base_type(reg->type)]) {
>   				verbose(env, "arg#%d is %s ", i, reg_type_str(env, reg->type));
>   				verbose(env, "expected %s or socket\n",
> @@ -8938,7 +8952,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>   		} else if (rcu_unlock) {
>   			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>   				if (reg->type & MEM_RCU) {
> -					reg->type &= ~(MEM_RCU | PTR_TRUSTED);
> +					reg->type &= ~MEM_RCU;
>   					reg->type |= PTR_UNTRUSTED;
>   				}
>   			}));
> diff --git a/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c b/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
> index 973f0c5af965..5fbd9edd2c4c 100644
> --- a/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
> +++ b/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
> @@ -93,10 +93,10 @@ static struct {
>   	const char *prog_name;
>   	const char *expected_err_msg;
>   } failure_tests[] = {
> -	{"cgrp_kfunc_acquire_untrusted", "R1 must be referenced or trusted"},
> +	{"cgrp_kfunc_acquire_untrusted", "R1 must be referenced, trusted or rcu"},
>   	{"cgrp_kfunc_acquire_fp", "arg#0 pointer type STRUCT cgroup must point"},
>   	{"cgrp_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"},
> -	{"cgrp_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"},
> +	{"cgrp_kfunc_acquire_trusted_walked", "R1 must be referenced, trusted or rcu"},
>   	{"cgrp_kfunc_acquire_null", "arg#0 pointer type STRUCT cgroup must point"},
>   	{"cgrp_kfunc_acquire_unreleased", "Unreleased reference"},
>   	{"cgrp_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
> diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> index ffd8ef4303c8..80708c073de6 100644
> --- a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> +++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
> @@ -87,10 +87,10 @@ static struct {
>   	const char *prog_name;
>   	const char *expected_err_msg;
>   } failure_tests[] = {
> -	{"task_kfunc_acquire_untrusted", "R1 must be referenced or trusted"},
> +	{"task_kfunc_acquire_untrusted", "R1 must be referenced, trusted or rcu"},
>   	{"task_kfunc_acquire_fp", "arg#0 pointer type STRUCT task_struct must point"},
>   	{"task_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"},
> -	{"task_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"},
> +	{"task_kfunc_acquire_trusted_walked", "R1 must be referenced, trusted or rcu"},

hmm... why this description is changed here?  The bpf_task_acquire kfunc-flags 
has not changed.
Yonghong Song Dec. 1, 2022, 5:47 p.m. UTC | #3
On 11/30/22 11:05 PM, Martin KaFai Lau wrote:
> On 11/28/22 6:37 PM, Yonghong Song wrote:
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index c05aa6e1f6f5..6f192dd9025e 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -683,7 +683,7 @@ static inline bool bpf_prog_check_recur(const 
>> struct bpf_prog *prog)
>>       }
>>   }
>> -#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | MEM_RCU | PTR_TRUSTED)
>> +#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)
> 
> [ ... ]
> 
>> +static bool is_rcu_reg(const struct bpf_reg_state *reg)
>> +{
>> +    return reg->type & MEM_RCU;
>> +}
>> +
>>   static int check_pkt_ptr_alignment(struct bpf_verifier_env *env,
>>                      const struct bpf_reg_state *reg,
>>                      int off, int size, bool strict)
>> @@ -4775,12 +4780,10 @@ static int check_ptr_to_btf_access(struct 
>> bpf_verifier_env *env,
>>           /* Mark value register as MEM_RCU only if it is protected by
>>            * bpf_rcu_read_lock() and the ptr reg is trusted. MEM_RCU
>>            * itself can already indicate trustedness inside the rcu
>> -         * read lock region. Also mark it as PTR_TRUSTED.
>> +         * read lock region.
>>            */
>>           if (!env->cur_state->active_rcu_lock || !is_trusted_reg(reg))
>>               flag &= ~MEM_RCU;
> 
> How about dereferencing a PTR_TO_BTF_ID | MEM_RCU, like:
> 
>      /* parent: PTR_TO_BTF_ID | MEM_RCU */
>      parent = current->real_parent;
>      /* gparent: PTR_TO_BTF_ID */
>      gparent = parent->real_parent;
> 
> Should "gparent" have MEM_RCU also?

Currently, no. We have logic in the code like

         if (flag & MEM_RCU) {
                 /* Mark value register as MEM_RCU only if it is 
protected by
                  * bpf_rcu_read_lock() and the ptr reg is trusted. MEM_RCU
                  * itself can already indicate trustedness inside the rcu
                  * read lock region.
                  */
                 if (!env->cur_state->active_rcu_lock || 
!is_trusted_reg(reg))
                         flag &= ~MEM_RCU;
         }

Since 'parent' is not trusted, so it will not be marked as MEM_RCU.
It can be marked as MEM_RCU if we do (based on the current patch)

	parent = current->real_parent;
	parent2 = bpf_task_acquire_rcu(parent);
	if (!parent2) goto out;
	gparent = parent2->real_parent;

Now gparent will be marked as MEM_RCU.

> 
> Also, should PTR_MAYBE_NULL be added to "parent"?

I think we might need to do this. Although from kernel code,
task->real_parent, current->cgroups seem not NULL. But for sure
there are cases where the rcu ptr could be NULL. This might
be conservative for some cases, and if it is absolutely
performance critical, we later could tag related __rcu member
with btf_decl_tag to indicate its non-null status.

> 
>> -        else
>> -            flag |= PTR_TRUSTED;
>>       } else if (reg->type & MEM_RCU) {
>>           /* ptr (reg) is marked as MEM_RCU, but the struct field is 
>> not tagged
>>            * with __rcu. Mark the flag as PTR_UNTRUSTED conservatively.
>> @@ -5945,7 +5948,7 @@ static const struct bpf_reg_types btf_ptr_types = {
>>       .types = {
>>           PTR_TO_BTF_ID,
>>           PTR_TO_BTF_ID | PTR_TRUSTED,
>> -        PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED,
>> +        PTR_TO_BTF_ID | MEM_RCU,
>>       },
>>   };
>>   static const struct bpf_reg_types percpu_btf_ptr_types = {
>> @@ -6124,7 +6127,7 @@ int check_func_arg_reg_off(struct 
>> bpf_verifier_env *env,
>>       case PTR_TO_BTF_ID:
>>       case PTR_TO_BTF_ID | MEM_ALLOC:
>>       case PTR_TO_BTF_ID | PTR_TRUSTED:
>> -    case PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED:
>> +    case PTR_TO_BTF_ID | MEM_RCU:
>>       case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED:
>>           /* When referenced PTR_TO_BTF_ID is passed to release function,
>>            * it's fixed offset must be 0.    In the other cases, fixed 
>> offset
>> @@ -8022,6 +8025,11 @@ static bool is_kfunc_destructive(struct 
>> bpf_kfunc_call_arg_meta *meta)
>>       return meta->kfunc_flags & KF_DESTRUCTIVE;
>>   }
>> +static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
>> +{
>> +    return meta->kfunc_flags & KF_RCU;
>> +}
>> +
>>   static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta 
>> *meta, int arg)
>>   {
>>       return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
>> @@ -8706,13 +8714,19 @@ static int check_kfunc_args(struct 
>> bpf_verifier_env *env, struct bpf_kfunc_call_
>>           switch (kf_arg_type) {
>>           case KF_ARG_PTR_TO_ALLOC_BTF_ID:
>>           case KF_ARG_PTR_TO_BTF_ID:
>> -            if (!is_kfunc_trusted_args(meta))
>> +            if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
>>                   break;
>> -            if (!is_trusted_reg(reg)) {
>> -                verbose(env, "R%d must be referenced or trusted\n", 
>> regno);
>> +            if (!is_trusted_reg(reg) && !is_rcu_reg(reg)) {
>> +                verbose(env, "R%d must be referenced, trusted or 
>> rcu\n", regno);
>>                   return -EINVAL;
>>               }
>> +
>> +            if (is_kfunc_rcu(meta) != is_rcu_reg(reg)) {
> 
> I think is_trusted_reg(reg) should also be acceptable to 
> bpf_task_acquire_rcu().

Yes, good point. trusted is a super set of rcu.

> 
> nit. bpf_task_acquire_not_zero() may be a better kfunc name.

Will use this one.

> 
>> +                verbose(env, "R%d does not match kf arg rcu 
>> tagging\n", regno);
>> +                return -EINVAL;
>> +            }
>> +
>>               fallthrough;
>>           case KF_ARG_PTR_TO_CTX:
>>               /* Trusted arguments have the same offset checks as 
>> release arguments */
>> @@ -8823,7 +8837,7 @@ static int check_kfunc_args(struct 
>> bpf_verifier_env *env, struct bpf_kfunc_call_
>>           case KF_ARG_PTR_TO_BTF_ID:
>>               /* Only base_type is checked, further checks are done 
>> here */
>>               if ((base_type(reg->type) != PTR_TO_BTF_ID ||
>> -                 bpf_type_has_unsafe_modifiers(reg->type)) &&
>> +                 (bpf_type_has_unsafe_modifiers(reg->type) && 
>> !is_rcu_reg(reg))) &&
>>                   !reg2btf_ids[base_type(reg->type)]) {
>>                   verbose(env, "arg#%d is %s ", i, reg_type_str(env, 
>> reg->type));
>>                   verbose(env, "expected %s or socket\n",
>> @@ -8938,7 +8952,7 @@ static int check_kfunc_call(struct 
>> bpf_verifier_env *env, struct bpf_insn *insn,
>>           } else if (rcu_unlock) {
>>               bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
>>                   if (reg->type & MEM_RCU) {
>> -                    reg->type &= ~(MEM_RCU | PTR_TRUSTED);
>> +                    reg->type &= ~MEM_RCU;
>>                       reg->type |= PTR_UNTRUSTED;
>>                   }
>>               }));
>> diff --git a/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c 
>> b/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
>> index 973f0c5af965..5fbd9edd2c4c 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
>> @@ -93,10 +93,10 @@ static struct {
>>       const char *prog_name;
>>       const char *expected_err_msg;
>>   } failure_tests[] = {
>> -    {"cgrp_kfunc_acquire_untrusted", "R1 must be referenced or 
>> trusted"},
>> +    {"cgrp_kfunc_acquire_untrusted", "R1 must be referenced, trusted 
>> or rcu"},
>>       {"cgrp_kfunc_acquire_fp", "arg#0 pointer type STRUCT cgroup must 
>> point"},
>>       {"cgrp_kfunc_acquire_unsafe_kretprobe", "reg type unsupported 
>> for arg#0 function"},
>> -    {"cgrp_kfunc_acquire_trusted_walked", "R1 must be referenced or 
>> trusted"},
>> +    {"cgrp_kfunc_acquire_trusted_walked", "R1 must be referenced, 
>> trusted or rcu"},
>>       {"cgrp_kfunc_acquire_null", "arg#0 pointer type STRUCT cgroup 
>> must point"},
>>       {"cgrp_kfunc_acquire_unreleased", "Unreleased reference"},
>>       {"cgrp_kfunc_get_non_kptr_param", "arg#0 expected pointer to map 
>> value"},
>> diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c 
>> b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
>> index ffd8ef4303c8..80708c073de6 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
>> @@ -87,10 +87,10 @@ static struct {
>>       const char *prog_name;
>>       const char *expected_err_msg;
>>   } failure_tests[] = {
>> -    {"task_kfunc_acquire_untrusted", "R1 must be referenced or 
>> trusted"},
>> +    {"task_kfunc_acquire_untrusted", "R1 must be referenced, trusted 
>> or rcu"},
>>       {"task_kfunc_acquire_fp", "arg#0 pointer type STRUCT task_struct 
>> must point"},
>>       {"task_kfunc_acquire_unsafe_kretprobe", "reg type unsupported 
>> for arg#0 function"},
>> -    {"task_kfunc_acquire_trusted_walked", "R1 must be referenced or 
>> trusted"},
>> +    {"task_kfunc_acquire_trusted_walked", "R1 must be referenced, 
>> trusted or rcu"},
> 
> hmm... why this description is changed here?  The bpf_task_acquire 
> kfunc-flags has not changed.

This is due to my code change below.

-                       if (!is_trusted_reg(reg)) {
-                               verbose(env, "R%d must be referenced or 
trusted\n", regno);
+                       if (!is_trusted_reg(reg) && !is_rcu_reg(reg)) {
+                               verbose(env, "R%d must be referenced, 
trusted or rcu\n", regno);
                                 return -EINVAL;
                         }

I will change code differently to avoid verifier log change.
Martin KaFai Lau Dec. 1, 2022, 10:05 p.m. UTC | #4
On 12/1/22 9:47 AM, Yonghong Song wrote:
> 
> 
> On 11/30/22 11:05 PM, Martin KaFai Lau wrote:
>> On 11/28/22 6:37 PM, Yonghong Song wrote:
>>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>>> index c05aa6e1f6f5..6f192dd9025e 100644
>>> --- a/include/linux/bpf_verifier.h
>>> +++ b/include/linux/bpf_verifier.h
>>> @@ -683,7 +683,7 @@ static inline bool bpf_prog_check_recur(const struct 
>>> bpf_prog *prog)
>>>       }
>>>   }
>>> -#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | MEM_RCU | PTR_TRUSTED)
>>> +#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)
>>
>> [ ... ]
>>
>>> +static bool is_rcu_reg(const struct bpf_reg_state *reg)
>>> +{
>>> +    return reg->type & MEM_RCU;
>>> +}
>>> +
>>>   static int check_pkt_ptr_alignment(struct bpf_verifier_env *env,
>>>                      const struct bpf_reg_state *reg,
>>>                      int off, int size, bool strict)
>>> @@ -4775,12 +4780,10 @@ static int check_ptr_to_btf_access(struct 
>>> bpf_verifier_env *env,
>>>           /* Mark value register as MEM_RCU only if it is protected by
>>>            * bpf_rcu_read_lock() and the ptr reg is trusted. MEM_RCU
>>>            * itself can already indicate trustedness inside the rcu
>>> -         * read lock region. Also mark it as PTR_TRUSTED.
>>> +         * read lock region.
>>>            */
>>>           if (!env->cur_state->active_rcu_lock || !is_trusted_reg(reg))
>>>               flag &= ~MEM_RCU;
>>
>> How about dereferencing a PTR_TO_BTF_ID | MEM_RCU, like:
>>
>>      /* parent: PTR_TO_BTF_ID | MEM_RCU */
>>      parent = current->real_parent;
>>      /* gparent: PTR_TO_BTF_ID */
>>      gparent = parent->real_parent;
>>
>> Should "gparent" have MEM_RCU also?
> 
> Currently, no. We have logic in the code like
> 
>          if (flag & MEM_RCU) {
>                  /* Mark value register as MEM_RCU only if it is protected by
>                   * bpf_rcu_read_lock() and the ptr reg is trusted. MEM_RCU
>                   * itself can already indicate trustedness inside the rcu
>                   * read lock region.
>                   */
>                  if (!env->cur_state->active_rcu_lock || !is_trusted_reg(reg))
>                          flag &= ~MEM_RCU;
>          }
> 
> Since 'parent' is not trusted, so it will not be marked as MEM_RCU.
> It can be marked as MEM_RCU if we do (based on the current patch)

or adding a is_rcu_trusted_reg() to consider both is_trusted_reg and MEM_RCU 
before cleaning ~MEM_RCU here.  It seems the check_kfunc_args() below will need 
a similar is_rcu_trusted_reg() test also.

> 
>      parent = current->real_parent;
>      parent2 = bpf_task_acquire_rcu(parent);
>      if (!parent2) goto out;
>      gparent = parent2->real_parent;
> 
> Now gparent will be marked as MEM_RCU.
> 
>>
>> Also, should PTR_MAYBE_NULL be added to "parent"?
> 
> I think we might need to do this. Although from kernel code,
> task->real_parent, current->cgroups seem not NULL. But for sure
> there are cases where the rcu ptr could be NULL. This might
> be conservative for some cases, and if it is absolutely
> performance critical, we later could tag related __rcu member
> with btf_decl_tag to indicate its non-null status.
> 
>>
>>> -        else
>>> -            flag |= PTR_TRUSTED;
>>>       } else if (reg->type & MEM_RCU) {
>>>           /* ptr (reg) is marked as MEM_RCU, but the struct field is not tagged
>>>            * with __rcu. Mark the flag as PTR_UNTRUSTED conservatively.
>>> @@ -5945,7 +5948,7 @@ static const struct bpf_reg_types btf_ptr_types = {
>>>       .types = {
>>>           PTR_TO_BTF_ID,
>>>           PTR_TO_BTF_ID | PTR_TRUSTED,
>>> -        PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED,
>>> +        PTR_TO_BTF_ID | MEM_RCU,
>>>       },
>>>   };
>>>   static const struct bpf_reg_types percpu_btf_ptr_types = {
>>> @@ -6124,7 +6127,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
>>>       case PTR_TO_BTF_ID:
>>>       case PTR_TO_BTF_ID | MEM_ALLOC:
>>>       case PTR_TO_BTF_ID | PTR_TRUSTED:
>>> -    case PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED:
>>> +    case PTR_TO_BTF_ID | MEM_RCU:
>>>       case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED:
>>>           /* When referenced PTR_TO_BTF_ID is passed to release function,
>>>            * it's fixed offset must be 0.    In the other cases, fixed offset
>>> @@ -8022,6 +8025,11 @@ static bool is_kfunc_destructive(struct 
>>> bpf_kfunc_call_arg_meta *meta)
>>>       return meta->kfunc_flags & KF_DESTRUCTIVE;
>>>   }
>>> +static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
>>> +{
>>> +    return meta->kfunc_flags & KF_RCU;
>>> +}
>>> +
>>>   static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int 
>>> arg)
>>>   {
>>>       return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
>>> @@ -8706,13 +8714,19 @@ static int check_kfunc_args(struct bpf_verifier_env 
>>> *env, struct bpf_kfunc_call_
>>>           switch (kf_arg_type) {
>>>           case KF_ARG_PTR_TO_ALLOC_BTF_ID:
>>>           case KF_ARG_PTR_TO_BTF_ID:
>>> -            if (!is_kfunc_trusted_args(meta))
>>> +            if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
>>>                   break;
>>> -            if (!is_trusted_reg(reg)) {
>>> -                verbose(env, "R%d must be referenced or trusted\n", regno);
>>> +            if (!is_trusted_reg(reg) && !is_rcu_reg(reg)) {
>>> +                verbose(env, "R%d must be referenced, trusted or rcu\n", 
>>> regno);
>>>                   return -EINVAL;
>>>               }
>>> +
>>> +            if (is_kfunc_rcu(meta) != is_rcu_reg(reg)) {
>>
>> I think is_trusted_reg(reg) should also be acceptable to bpf_task_acquire_rcu().
> 
> Yes, good point. trusted is a super set of rcu.
> 
>>
>> nit. bpf_task_acquire_not_zero() may be a better kfunc name.
> 
> Will use this one.
Yonghong Song Dec. 2, 2022, 12:08 a.m. UTC | #5
On 12/1/22 2:05 PM, Martin KaFai Lau wrote:
> On 12/1/22 9:47 AM, Yonghong Song wrote:
>>
>>
>> On 11/30/22 11:05 PM, Martin KaFai Lau wrote:
>>> On 11/28/22 6:37 PM, Yonghong Song wrote:
>>>> diff --git a/include/linux/bpf_verifier.h 
>>>> b/include/linux/bpf_verifier.h
>>>> index c05aa6e1f6f5..6f192dd9025e 100644
>>>> --- a/include/linux/bpf_verifier.h
>>>> +++ b/include/linux/bpf_verifier.h
>>>> @@ -683,7 +683,7 @@ static inline bool bpf_prog_check_recur(const 
>>>> struct bpf_prog *prog)
>>>>       }
>>>>   }
>>>> -#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | MEM_RCU | PTR_TRUSTED)
>>>> +#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)
>>>
>>> [ ... ]
>>>
>>>> +static bool is_rcu_reg(const struct bpf_reg_state *reg)
>>>> +{
>>>> +    return reg->type & MEM_RCU;
>>>> +}
>>>> +
>>>>   static int check_pkt_ptr_alignment(struct bpf_verifier_env *env,
>>>>                      const struct bpf_reg_state *reg,
>>>>                      int off, int size, bool strict)
>>>> @@ -4775,12 +4780,10 @@ static int check_ptr_to_btf_access(struct 
>>>> bpf_verifier_env *env,
>>>>           /* Mark value register as MEM_RCU only if it is protected by
>>>>            * bpf_rcu_read_lock() and the ptr reg is trusted. MEM_RCU
>>>>            * itself can already indicate trustedness inside the rcu
>>>> -         * read lock region. Also mark it as PTR_TRUSTED.
>>>> +         * read lock region.
>>>>            */
>>>>           if (!env->cur_state->active_rcu_lock || !is_trusted_reg(reg))
>>>>               flag &= ~MEM_RCU;
>>>
>>> How about dereferencing a PTR_TO_BTF_ID | MEM_RCU, like:
>>>
>>>      /* parent: PTR_TO_BTF_ID | MEM_RCU */
>>>      parent = current->real_parent;
>>>      /* gparent: PTR_TO_BTF_ID */
>>>      gparent = parent->real_parent;
>>>
>>> Should "gparent" have MEM_RCU also?
>>
>> Currently, no. We have logic in the code like
>>
>>          if (flag & MEM_RCU) {
>>                  /* Mark value register as MEM_RCU only if it is 
>> protected by
>>                   * bpf_rcu_read_lock() and the ptr reg is trusted. 
>> MEM_RCU
>>                   * itself can already indicate trustedness inside the 
>> rcu
>>                   * read lock region.
>>                   */
>>                  if (!env->cur_state->active_rcu_lock || 
>> !is_trusted_reg(reg))
>>                          flag &= ~MEM_RCU;
>>          }
>>
>> Since 'parent' is not trusted, so it will not be marked as MEM_RCU.
>> It can be marked as MEM_RCU if we do (based on the current patch)
> 
> or adding a is_rcu_trusted_reg() to consider both is_trusted_reg and 
> MEM_RCU before cleaning ~MEM_RCU here.  It seems the check_kfunc_args() 
> below will need a similar is_rcu_trusted_reg() test also.

Sounds good. Will do.

> 
>>
>>      parent = current->real_parent;
>>      parent2 = bpf_task_acquire_rcu(parent);
>>      if (!parent2) goto out;
>>      gparent = parent2->real_parent;
>>
>> Now gparent will be marked as MEM_RCU.
>>
>>>
>>> Also, should PTR_MAYBE_NULL be added to "parent"?
>>
>> I think we might need to do this. Although from kernel code,
>> task->real_parent, current->cgroups seem not NULL. But for sure
>> there are cases where the rcu ptr could be NULL. This might
>> be conservative for some cases, and if it is absolutely
>> performance critical, we later could tag related __rcu member
>> with btf_decl_tag to indicate its non-null status.
>>
>>>
>>>> -        else
>>>> -            flag |= PTR_TRUSTED;
>>>>       } else if (reg->type & MEM_RCU) {
>>>>           /* ptr (reg) is marked as MEM_RCU, but the struct field is 
>>>> not tagged
>>>>            * with __rcu. Mark the flag as PTR_UNTRUSTED conservatively.
>>>> @@ -5945,7 +5948,7 @@ static const struct bpf_reg_types 
>>>> btf_ptr_types = {
>>>>       .types = {
>>>>           PTR_TO_BTF_ID,
>>>>           PTR_TO_BTF_ID | PTR_TRUSTED,
>>>> -        PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED,
>>>> +        PTR_TO_BTF_ID | MEM_RCU,
>>>>       },
>>>>   };
>>>>   static const struct bpf_reg_types percpu_btf_ptr_types = {
>>>> @@ -6124,7 +6127,7 @@ int check_func_arg_reg_off(struct 
>>>> bpf_verifier_env *env,
>>>>       case PTR_TO_BTF_ID:
>>>>       case PTR_TO_BTF_ID | MEM_ALLOC:
>>>>       case PTR_TO_BTF_ID | PTR_TRUSTED:
>>>> -    case PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED:
>>>> +    case PTR_TO_BTF_ID | MEM_RCU:
>>>>       case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED:
>>>>           /* When referenced PTR_TO_BTF_ID is passed to release 
>>>> function,
>>>>            * it's fixed offset must be 0.    In the other cases, 
>>>> fixed offset
>>>> @@ -8022,6 +8025,11 @@ static bool is_kfunc_destructive(struct 
>>>> bpf_kfunc_call_arg_meta *meta)
>>>>       return meta->kfunc_flags & KF_DESTRUCTIVE;
>>>>   }
>>>> +static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
>>>> +{
>>>> +    return meta->kfunc_flags & KF_RCU;
>>>> +}
>>>> +
>>>>   static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta 
>>>> *meta, int arg)
>>>>   {
>>>>       return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
>>>> @@ -8706,13 +8714,19 @@ static int check_kfunc_args(struct 
>>>> bpf_verifier_env *env, struct bpf_kfunc_call_
>>>>           switch (kf_arg_type) {
>>>>           case KF_ARG_PTR_TO_ALLOC_BTF_ID:
>>>>           case KF_ARG_PTR_TO_BTF_ID:
>>>> -            if (!is_kfunc_trusted_args(meta))
>>>> +            if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
>>>>                   break;
>>>> -            if (!is_trusted_reg(reg)) {
>>>> -                verbose(env, "R%d must be referenced or trusted\n", 
>>>> regno);
>>>> +            if (!is_trusted_reg(reg) && !is_rcu_reg(reg)) {
>>>> +                verbose(env, "R%d must be referenced, trusted or 
>>>> rcu\n", regno);
>>>>                   return -EINVAL;
>>>>               }
>>>> +
>>>> +            if (is_kfunc_rcu(meta) != is_rcu_reg(reg)) {
>>>
>>> I think is_trusted_reg(reg) should also be acceptable to 
>>> bpf_task_acquire_rcu().
>>
>> Yes, good point. trusted is a super set of rcu.
>>
>>>
>>> nit. bpf_task_acquire_not_zero() may be a better kfunc name.
>>
>> Will use this one.
>
diff mbox series

Patch

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index c05aa6e1f6f5..6f192dd9025e 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -683,7 +683,7 @@  static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
 	}
 }
 
-#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | MEM_RCU | PTR_TRUSTED)
+#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)
 
 static inline bool bpf_type_has_unsafe_modifiers(u32 type)
 {
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 9ed00077db6e..cbd6e4096f8c 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -70,6 +70,7 @@ 
 #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
 #define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
 #define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
+#define KF_RCU          (1 << 7) /* kfunc only takes rcu pointer arguments */
 
 /*
  * Return the name of the passed struct, if exists, or halt the build if for
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index a5a511430f2a..46fbe027f3b6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1837,6 +1837,19 @@  struct task_struct *bpf_task_acquire(struct task_struct *p)
 	return p;
 }
 
+/**
+ * bpf_task_acquire_rcu - Acquire a reference to a rcu task object. A task
+ * acquired by this kfunc which is not stored in a map as a kptr, must be
+ * released by calling bpf_task_release().
+ * @p: The task on which a reference is being acquired or NULL.
+ */
+struct task_struct *bpf_task_acquire_rcu(struct task_struct *p)
+{
+	if (!refcount_inc_not_zero(&p->rcu_users))
+		return NULL;
+	return p;
+}
+
 /**
  * bpf_task_kptr_get - Acquire a reference on a struct task_struct kptr. A task
  * kptr acquired by this kfunc which is not subsequently stored in a map, must
@@ -2013,6 +2026,7 @@  BTF_ID_FLAGS(func, bpf_list_push_back)
 BTF_ID_FLAGS(func, bpf_list_pop_front, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_list_pop_back, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_task_acquire_rcu, KF_ACQUIRE | KF_RCU | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_kptr_get, KF_ACQUIRE | KF_KPTR_GET | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_task_release, KF_RELEASE)
 #ifdef CONFIG_CGROUPS
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6599d25dae38..dda89f4d62a7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4275,7 +4275,7 @@  static bool is_trusted_reg(const struct bpf_reg_state *reg)
 		return true;
 
 	/* If a register is not referenced, it is trusted if it has the
-	 * MEM_ALLOC, MEM_RCU or PTR_TRUSTED type modifiers, and no others. Some of the
+	 * MEM_ALLOC or PTR_TRUSTED type modifiers, and no others. Some of the
 	 * other type modifiers may be safe, but we elect to take an opt-in
 	 * approach here as some (e.g. PTR_UNTRUSTED and PTR_MAYBE_NULL) are
 	 * not.
@@ -4287,6 +4287,11 @@  static bool is_trusted_reg(const struct bpf_reg_state *reg)
 	       !bpf_type_has_unsafe_modifiers(reg->type);
 }
 
+static bool is_rcu_reg(const struct bpf_reg_state *reg)
+{
+	return reg->type & MEM_RCU;
+}
+
 static int check_pkt_ptr_alignment(struct bpf_verifier_env *env,
 				   const struct bpf_reg_state *reg,
 				   int off, int size, bool strict)
@@ -4775,12 +4780,10 @@  static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 		/* Mark value register as MEM_RCU only if it is protected by
 		 * bpf_rcu_read_lock() and the ptr reg is trusted. MEM_RCU
 		 * itself can already indicate trustedness inside the rcu
-		 * read lock region. Also mark it as PTR_TRUSTED.
+		 * read lock region.
 		 */
 		if (!env->cur_state->active_rcu_lock || !is_trusted_reg(reg))
 			flag &= ~MEM_RCU;
-		else
-			flag |= PTR_TRUSTED;
 	} else if (reg->type & MEM_RCU) {
 		/* ptr (reg) is marked as MEM_RCU, but the struct field is not tagged
 		 * with __rcu. Mark the flag as PTR_UNTRUSTED conservatively.
@@ -5945,7 +5948,7 @@  static const struct bpf_reg_types btf_ptr_types = {
 	.types = {
 		PTR_TO_BTF_ID,
 		PTR_TO_BTF_ID | PTR_TRUSTED,
-		PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED,
+		PTR_TO_BTF_ID | MEM_RCU,
 	},
 };
 static const struct bpf_reg_types percpu_btf_ptr_types = {
@@ -6124,7 +6127,7 @@  int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	case PTR_TO_BTF_ID:
 	case PTR_TO_BTF_ID | MEM_ALLOC:
 	case PTR_TO_BTF_ID | PTR_TRUSTED:
-	case PTR_TO_BTF_ID | MEM_RCU | PTR_TRUSTED:
+	case PTR_TO_BTF_ID | MEM_RCU:
 	case PTR_TO_BTF_ID | MEM_ALLOC | PTR_TRUSTED:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * it's fixed offset must be 0.	In the other cases, fixed offset
@@ -8022,6 +8025,11 @@  static bool is_kfunc_destructive(struct bpf_kfunc_call_arg_meta *meta)
 	return meta->kfunc_flags & KF_DESTRUCTIVE;
 }
 
+static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return meta->kfunc_flags & KF_RCU;
+}
+
 static bool is_kfunc_arg_kptr_get(struct bpf_kfunc_call_arg_meta *meta, int arg)
 {
 	return arg == 0 && (meta->kfunc_flags & KF_KPTR_GET);
@@ -8706,13 +8714,19 @@  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		switch (kf_arg_type) {
 		case KF_ARG_PTR_TO_ALLOC_BTF_ID:
 		case KF_ARG_PTR_TO_BTF_ID:
-			if (!is_kfunc_trusted_args(meta))
+			if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta))
 				break;
 
-			if (!is_trusted_reg(reg)) {
-				verbose(env, "R%d must be referenced or trusted\n", regno);
+			if (!is_trusted_reg(reg) && !is_rcu_reg(reg)) {
+				verbose(env, "R%d must be referenced, trusted or rcu\n", regno);
 				return -EINVAL;
 			}
+
+			if (is_kfunc_rcu(meta) != is_rcu_reg(reg)) {
+				verbose(env, "R%d does not match kf arg rcu tagging\n", regno);
+				return -EINVAL;
+			}
+
 			fallthrough;
 		case KF_ARG_PTR_TO_CTX:
 			/* Trusted arguments have the same offset checks as release arguments */
@@ -8823,7 +8837,7 @@  static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 		case KF_ARG_PTR_TO_BTF_ID:
 			/* Only base_type is checked, further checks are done here */
 			if ((base_type(reg->type) != PTR_TO_BTF_ID ||
-			     bpf_type_has_unsafe_modifiers(reg->type)) &&
+			     (bpf_type_has_unsafe_modifiers(reg->type) && !is_rcu_reg(reg))) &&
 			    !reg2btf_ids[base_type(reg->type)]) {
 				verbose(env, "arg#%d is %s ", i, reg_type_str(env, reg->type));
 				verbose(env, "expected %s or socket\n",
@@ -8938,7 +8952,7 @@  static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		} else if (rcu_unlock) {
 			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
 				if (reg->type & MEM_RCU) {
-					reg->type &= ~(MEM_RCU | PTR_TRUSTED);
+					reg->type &= ~MEM_RCU;
 					reg->type |= PTR_UNTRUSTED;
 				}
 			}));
diff --git a/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c b/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
index 973f0c5af965..5fbd9edd2c4c 100644
--- a/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/cgrp_kfunc.c
@@ -93,10 +93,10 @@  static struct {
 	const char *prog_name;
 	const char *expected_err_msg;
 } failure_tests[] = {
-	{"cgrp_kfunc_acquire_untrusted", "R1 must be referenced or trusted"},
+	{"cgrp_kfunc_acquire_untrusted", "R1 must be referenced, trusted or rcu"},
 	{"cgrp_kfunc_acquire_fp", "arg#0 pointer type STRUCT cgroup must point"},
 	{"cgrp_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"},
-	{"cgrp_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"},
+	{"cgrp_kfunc_acquire_trusted_walked", "R1 must be referenced, trusted or rcu"},
 	{"cgrp_kfunc_acquire_null", "arg#0 pointer type STRUCT cgroup must point"},
 	{"cgrp_kfunc_acquire_unreleased", "Unreleased reference"},
 	{"cgrp_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
diff --git a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
index ffd8ef4303c8..80708c073de6 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_kfunc.c
@@ -87,10 +87,10 @@  static struct {
 	const char *prog_name;
 	const char *expected_err_msg;
 } failure_tests[] = {
-	{"task_kfunc_acquire_untrusted", "R1 must be referenced or trusted"},
+	{"task_kfunc_acquire_untrusted", "R1 must be referenced, trusted or rcu"},
 	{"task_kfunc_acquire_fp", "arg#0 pointer type STRUCT task_struct must point"},
 	{"task_kfunc_acquire_unsafe_kretprobe", "reg type unsupported for arg#0 function"},
-	{"task_kfunc_acquire_trusted_walked", "R1 must be referenced or trusted"},
+	{"task_kfunc_acquire_trusted_walked", "R1 must be referenced, trusted or rcu"},
 	{"task_kfunc_acquire_null", "arg#0 pointer type STRUCT task_struct must point"},
 	{"task_kfunc_acquire_unreleased", "Unreleased reference"},
 	{"task_kfunc_get_non_kptr_param", "arg#0 expected pointer to map value"},
diff --git a/tools/testing/selftests/bpf/progs/rcu_read_lock.c b/tools/testing/selftests/bpf/progs/rcu_read_lock.c
index 94a970076b98..3035cd080ec4 100644
--- a/tools/testing/selftests/bpf/progs/rcu_read_lock.c
+++ b/tools/testing/selftests/bpf/progs/rcu_read_lock.c
@@ -23,7 +23,7 @@  struct bpf_key *bpf_lookup_user_key(__u32 serial, __u64 flags) __ksym;
 void bpf_key_put(struct bpf_key *key) __ksym;
 void bpf_rcu_read_lock(void) __ksym;
 void bpf_rcu_read_unlock(void) __ksym;
-struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
+struct task_struct *bpf_task_acquire_rcu(struct task_struct *p) __ksym;
 void bpf_task_release(struct task_struct *p) __ksym;
 
 SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
@@ -135,8 +135,11 @@  int task_acquire(void *ctx)
 	bpf_rcu_read_lock();
 	real_parent = task->real_parent;
 	/* acquire a reference which can be used outside rcu read lock region */
-	real_parent = bpf_task_acquire(real_parent);
+	real_parent = bpf_task_acquire_rcu(real_parent);
 	bpf_rcu_read_unlock();
+	if (!real_parent)
+		return 0;
+
 	(void)bpf_task_storage_get(&map_a, real_parent, 0, 0);
 	bpf_task_release(real_parent);
 	return 0;