
[bpf-next,v3,05/13] bpf: Allow storing referenced kptr in map

Message ID 20220320155510.671497-6-memxor@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Series Introduce typed pointer support in BPF maps

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1792 this patch: 1793
netdev/cc_maintainers warning 6 maintainers not CCed: netdev@vger.kernel.org songliubraving@fb.com yhs@fb.com john.fastabend@gmail.com kafai@fb.com kpsingh@kernel.org
netdev/build_clang success Errors and warnings before: 194 this patch: 194
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 1811 this patch: 1812
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 89 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 92 exceeds 80 columns WARNING: line length of 94 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary

Commit Message

Kumar Kartikeya Dwivedi March 20, 2022, 3:55 p.m. UTC
Extending the code in the previous commit, introduce referenced kptr
support, which needs to be tagged using the 'kptr_ref' tag instead.
Unlike unreferenced kptrs, referenced kptrs have a lot more
restrictions. In addition to the type matching, only a newly introduced
bpf_kptr_xchg helper is allowed to modify the map value at that offset.
This transfers the referenced pointer being stored into the map,
releases the reference state held by the program, and returns the old
value, creating a new reference state for the returned pointer.

Similar to the unreferenced pointer case, the return value for this
case will also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned
pointer must eventually be released by calling the corresponding
release function, or it must be transferred into another map.

It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear
the value and obtain the old value, if any.

BPF_LDX, BPF_STX, and BPF_ST cannot access a referenced kptr. A future
commit will permit using BPF_LDX for such pointers, while attempting to
make it safe, since the lifetime of the object won't be guaranteed.

There are valid reasons to enforce the restriction of permitting only
bpf_kptr_xchg to operate on a referenced kptr. The pointer value must be
consistent in the face of concurrent modification, and any prior value
contained in the map must also be released before a new one is moved
into it. To ensure proper transfer of this ownership, bpf_kptr_xchg
returns the old value, which the verifier requires the user to either
free or move into another map, and it releases the reference held for
the pointer being moved in.

In the future, direct BPF_XCHG instruction may also be permitted to work
like bpf_kptr_xchg helper.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h            |   7 ++
 include/uapi/linux/bpf.h       |  12 ++++
 kernel/bpf/btf.c               |  11 ++-
 kernel/bpf/helpers.c           |  22 ++++++
 kernel/bpf/verifier.c          | 128 ++++++++++++++++++++++++++++-----
 tools/include/uapi/linux/bpf.h |  12 ++++
 6 files changed, 175 insertions(+), 17 deletions(-)
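
For illustration, a minimal sketch of how a BPF program might use the
'kptr_ref' tag and bpf_kptr_xchg described above. It follows the pattern of
the selftests later in this series; the __kptr_ref macro is defined locally
from the "kptr_ref" BTF type tag this patch recognizes, the
prog_test_ref_kfunc type and its bpf_kfunc_call_test_acquire/release kfuncs
are assumed to be available from the kernel's BPF test module, and
bpf_kptr_xchg is assumed to be present in the generated helper definitions
once this series lands. Treat it as a sketch, not a drop-in program:

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* BTF type tag recognized by this patch for referenced kptrs */
#define __kptr_ref __attribute__((btf_type_tag("kptr_ref")))

/* Acquire/release kfuncs from the kernel's BPF test module (assumed) */
extern struct prog_test_ref_kfunc *
bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) __ksym;
extern void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;

struct map_value {
	struct prog_test_ref_kfunc __kptr_ref *ref_ptr;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct map_value);
} array_map SEC(".maps");

SEC("tc")
int store_and_swap(struct __sk_buff *ctx)
{
	struct prog_test_ref_kfunc *p, *old;
	struct map_value *v;
	unsigned long sp = 0;
	int key = 0;

	v = bpf_map_lookup_elem(&array_map, &key);
	if (!v)
		return 0;

	/* Acquire a referenced pointer; the verifier tracks it as a
	 * reference owned by the program.
	 */
	p = bpf_kfunc_call_test_acquire(&sp);
	if (!p)
		return 0;

	/* Move the reference into the map. The old value (possibly NULL)
	 * comes back as a new reference that must be released or moved
	 * into a map before program exit.
	 */
	old = bpf_kptr_xchg(&v->ref_ptr, p);
	if (old)
		bpf_kfunc_call_test_release(old);

	return 0;
}

char _license[] SEC("license") = "GPL";

Exchanging NULL instead, i.e. bpf_kptr_xchg(&v->ref_ptr, NULL), clears the
slot and hands ownership of any previously stored pointer back to the
program, which must then release it or move it into a map.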

Comments

Martin KaFai Lau March 22, 2022, 8:59 p.m. UTC | #1
On Sun, Mar 20, 2022 at 09:25:02PM +0530, Kumar Kartikeya Dwivedi wrote:
>  static int map_kptr_match_type(struct bpf_verifier_env *env,
>  			       struct bpf_map_value_off_desc *off_desc,
> -			       struct bpf_reg_state *reg, u32 regno)
> +			       struct bpf_reg_state *reg, u32 regno,
> +			       bool ref_ptr)
>  {
>  	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
>  	const char *reg_name = "";
> +	bool fixed_off_ok = true;
>  
>  	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
>  		goto bad_type;
> @@ -3525,7 +3530,26 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>  	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
>  	reg_name = kernel_type_name(reg->btf, reg->btf_id);
>  
> -	if (__check_ptr_off_reg(env, reg, regno, true))
> +	if (ref_ptr) {
> +		if (!reg->ref_obj_id) {
> +			verbose(env, "R%d must be referenced %s%s\n", regno,
> +				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> +			return -EACCES;
> +		}
The is_release_function() checkings under check_helper_call() is
not the same?

> +		/* reg->off can be used to store pointer to a certain type formed by
> +		 * incrementing pointer of a parent structure the object is embedded in,
> +		 * e.g. map may expect unreferenced struct path *, and user should be
> +		 * allowed a store using &file->f_path. However, in the case of
> +		 * referenced pointer, we cannot do this, because the reference is only
> +		 * for the parent structure, not its embedded object(s), and because
> +		 * the transfer of ownership happens for the original pointer to and
> +		 * from the map (before its eventual release).
> +		 */
> +		if (reg->off)
> +			fixed_off_ok = false;
I thought the new check_func_arg_reg_off() is supposed to handle the
is_release_function() case.  The check_func_arg_reg_off() called
in check_func_arg() can not handle this case?

> +	}
> +	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
> +	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
>  		return -EACCES;
>  
>  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,

[ ... ]

> @@ -5390,6 +5473,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
>  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
>  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
>  static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
> +static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
>  
>  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>  	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
> @@ -5417,11 +5501,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
>  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
>  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
>  	[ARG_PTR_TO_TIMER]		= &timer_types,
> +	[ARG_PTR_TO_KPTR]		= &kptr_types,
>  };
>  
>  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  			  enum bpf_arg_type arg_type,
> -			  const u32 *arg_btf_id)
> +			  const u32 *arg_btf_id,
> +			  struct bpf_call_arg_meta *meta)
>  {
>  	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
>  	enum bpf_reg_type expected, type = reg->type;
> @@ -5474,8 +5560,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
>  			arg_btf_id = compatible->btf_id;
>  		}
>  
> -		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> -					  btf_vmlinux, *arg_btf_id)) {
> +		if (meta->func_id == BPF_FUNC_kptr_xchg) {
> +			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
> +				return -EACCES;
> +		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> +						 btf_vmlinux, *arg_btf_id)) {
>  			verbose(env, "R%d is of type %s but %s is expected\n",
>  				regno, kernel_type_name(reg->btf, reg->btf_id),
>  				kernel_type_name(btf_vmlinux, *arg_btf_id));
Kumar Kartikeya Dwivedi March 25, 2022, 2:57 p.m. UTC | #2
On Wed, Mar 23, 2022 at 02:29:12AM IST, Martin KaFai Lau wrote:
> On Sun, Mar 20, 2022 at 09:25:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> >  static int map_kptr_match_type(struct bpf_verifier_env *env,
> >  			       struct bpf_map_value_off_desc *off_desc,
> > -			       struct bpf_reg_state *reg, u32 regno)
> > +			       struct bpf_reg_state *reg, u32 regno,
> > +			       bool ref_ptr)
> >  {
> >  	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> >  	const char *reg_name = "";
> > +	bool fixed_off_ok = true;
> >
> >  	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> >  		goto bad_type;
> > @@ -3525,7 +3530,26 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> >  	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
> >  	reg_name = kernel_type_name(reg->btf, reg->btf_id);
> >
> > -	if (__check_ptr_off_reg(env, reg, regno, true))
> > +	if (ref_ptr) {
> > +		if (!reg->ref_obj_id) {
> > +			verbose(env, "R%d must be referenced %s%s\n", regno,
> > +				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > +			return -EACCES;
> > +		}
> The is_release_function() checkings under check_helper_call() is
> not the same?
>
> > +		/* reg->off can be used to store pointer to a certain type formed by
> > +		 * incrementing pointer of a parent structure the object is embedded in,
> > +		 * e.g. map may expect unreferenced struct path *, and user should be
> > +		 * allowed a store using &file->f_path. However, in the case of
> > +		 * referenced pointer, we cannot do this, because the reference is only
> > +		 * for the parent structure, not its embedded object(s), and because
> > +		 * the transfer of ownership happens for the original pointer to and
> > +		 * from the map (before its eventual release).
> > +		 */
> > +		if (reg->off)
> > +			fixed_off_ok = false;
> I thought the new check_func_arg_reg_off() is supposed to handle the
> is_release_function() case.  The check_func_arg_reg_off() called
> in check_func_arg() can not handle this case?
>

The difference there is, it wouldn't check for reg->off == 0 if reg->ref_obj_id
is 0. So in that case, I should probably check reg->ref_obj_id to be non-zero
when ref_ptr is true, and then call check_func_arg_reg_off, with the comment
that this would eventually be an argument to the release function, so the
argument should be checked with check_func_arg_reg_off.

> > +	}
> > +	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
> > +	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
> >  		return -EACCES;
> >
> >  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
>
> [ ... ]
>
> > @@ -5390,6 +5473,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> >  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
> >  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> >  static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
> > +static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
> >
> >  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> >  	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
> > @@ -5417,11 +5501,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> >  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
> >  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
> >  	[ARG_PTR_TO_TIMER]		= &timer_types,
> > +	[ARG_PTR_TO_KPTR]		= &kptr_types,
> >  };
> >
> >  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> >  			  enum bpf_arg_type arg_type,
> > -			  const u32 *arg_btf_id)
> > +			  const u32 *arg_btf_id,
> > +			  struct bpf_call_arg_meta *meta)
> >  {
> >  	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> >  	enum bpf_reg_type expected, type = reg->type;
> > @@ -5474,8 +5560,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> >  			arg_btf_id = compatible->btf_id;
> >  		}
> >
> > -		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > -					  btf_vmlinux, *arg_btf_id)) {
> > +		if (meta->func_id == BPF_FUNC_kptr_xchg) {
> > +			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
> > +				return -EACCES;
> > +		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > +						 btf_vmlinux, *arg_btf_id)) {
> >  			verbose(env, "R%d is of type %s but %s is expected\n",
> >  				regno, kernel_type_name(reg->btf, reg->btf_id),
> >  				kernel_type_name(btf_vmlinux, *arg_btf_id));

--
Kartikeya
Martin KaFai Lau March 25, 2022, 11:39 p.m. UTC | #3
On Fri, Mar 25, 2022 at 08:27:00PM +0530, Kumar Kartikeya Dwivedi wrote:
> On Wed, Mar 23, 2022 at 02:29:12AM IST, Martin KaFai Lau wrote:
> > On Sun, Mar 20, 2022 at 09:25:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> > >  static int map_kptr_match_type(struct bpf_verifier_env *env,
> > >  			       struct bpf_map_value_off_desc *off_desc,
> > > -			       struct bpf_reg_state *reg, u32 regno)
> > > +			       struct bpf_reg_state *reg, u32 regno,
> > > +			       bool ref_ptr)
> > >  {
> > >  	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> > >  	const char *reg_name = "";
> > > +	bool fixed_off_ok = true;
> > >
> > >  	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> > >  		goto bad_type;
> > > @@ -3525,7 +3530,26 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> > >  	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > >  	reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > >
> > > -	if (__check_ptr_off_reg(env, reg, regno, true))
> > > +	if (ref_ptr) {
> > > +		if (!reg->ref_obj_id) {
> > > +			verbose(env, "R%d must be referenced %s%s\n", regno,
> > > +				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > > +			return -EACCES;
> > > +		}
> > The is_release_function() checkings under check_helper_call() is
> > not the same?
> >
> > > +		/* reg->off can be used to store pointer to a certain type formed by
> > > +		 * incrementing pointer of a parent structure the object is embedded in,
> > > +		 * e.g. map may expect unreferenced struct path *, and user should be
> > > +		 * allowed a store using &file->f_path. However, in the case of
> > > +		 * referenced pointer, we cannot do this, because the reference is only
> > > +		 * for the parent structure, not its embedded object(s), and because
> > > +		 * the transfer of ownership happens for the original pointer to and
> > > +		 * from the map (before its eventual release).
> > > +		 */
> > > +		if (reg->off)
> > > +			fixed_off_ok = false;
> > I thought the new check_func_arg_reg_off() is supposed to handle the
> > is_release_function() case.  The check_func_arg_reg_off() called
> > in check_func_arg() can not handle this case?
> >
> 
> The difference there is, it wouldn't check for reg->off == 0 if reg->ref_obj_id
> is 0.
If ref_obj_id is not 0, check_func_arg_reg_off() will reject reg->off.
check_func_arg_reg_off is called after check_reg_type().

If ref_obj_id is 0, the is_release_function() check in the
check_helper_call() should complain:
	verbose(env, "func %s#%d reference has not been acquired before\n",
		func_id_name(func_id), func_id);
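
For context, the surrounding logic in check_helper_call() is roughly the
following (paraphrased from the current verifier, not part of this patch):

	if (is_release_function(func_id)) {
		err = release_reference(env, meta.ref_obj_id);
		if (err) {
			verbose(env, "func %s#%d reference has not been acquired before\n",
				func_id_name(func_id), func_id);
			return err;
		}
	}

so a zero ref_obj_id on a to-be-released argument is already rejected there.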

I am quite confused why it needs special reg->off and
reg->ref_obj_id checking here for the map_kptr helper taking
PTR_TO_BTF_ID arg but not other helper taking PTR_TO_BTF_ID arg.
The existing checks for the other helper taking PTR_TO_BTF_ID
arg is not enough?

> So in that case, I should probably check reg->ref_obj_id to be non-zero
> when ref_ptr is true, and then call check_func_arg_reg_off, with the comment
> that this would eventually be an argument to the release function, so the
> argument should be checked with check_func_arg_reg_off.



> 
> > > +	}
> > > +	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
> > > +	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
> > >  		return -EACCES;
> > >
> > >  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> >
> > [ ... ]
> >
> > > @@ -5390,6 +5473,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> > >  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
> > >  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > >  static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
> > > +static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > >
> > >  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> > >  	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
> > > @@ -5417,11 +5501,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> > >  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
> > >  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
> > >  	[ARG_PTR_TO_TIMER]		= &timer_types,
> > > +	[ARG_PTR_TO_KPTR]		= &kptr_types,
> > >  };
> > >
> > >  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > >  			  enum bpf_arg_type arg_type,
> > > -			  const u32 *arg_btf_id)
> > > +			  const u32 *arg_btf_id,
> > > +			  struct bpf_call_arg_meta *meta)
> > >  {
> > >  	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> > >  	enum bpf_reg_type expected, type = reg->type;
> > > @@ -5474,8 +5560,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > >  			arg_btf_id = compatible->btf_id;
> > >  		}
> > >
> > > -		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > -					  btf_vmlinux, *arg_btf_id)) {
> > > +		if (meta->func_id == BPF_FUNC_kptr_xchg) {
> > > +			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
> > > +				return -EACCES;
> > > +		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > +						 btf_vmlinux, *arg_btf_id)) {
> > >  			verbose(env, "R%d is of type %s but %s is expected\n",
> > >  				regno, kernel_type_name(reg->btf, reg->btf_id),
> > >  				kernel_type_name(btf_vmlinux, *arg_btf_id));
> 
> --
> Kartikeya
Kumar Kartikeya Dwivedi March 26, 2022, 1:01 a.m. UTC | #4
On Sat, Mar 26, 2022 at 05:09:52AM IST, Martin KaFai Lau wrote:
> On Fri, Mar 25, 2022 at 08:27:00PM +0530, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Mar 23, 2022 at 02:29:12AM IST, Martin KaFai Lau wrote:
> > > On Sun, Mar 20, 2022 at 09:25:02PM +0530, Kumar Kartikeya Dwivedi wrote:
> > > >  static int map_kptr_match_type(struct bpf_verifier_env *env,
> > > >  			       struct bpf_map_value_off_desc *off_desc,
> > > > -			       struct bpf_reg_state *reg, u32 regno)
> > > > +			       struct bpf_reg_state *reg, u32 regno,
> > > > +			       bool ref_ptr)
> > > >  {
> > > >  	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
> > > >  	const char *reg_name = "";
> > > > +	bool fixed_off_ok = true;
> > > >
> > > >  	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
> > > >  		goto bad_type;
> > > > @@ -3525,7 +3530,26 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
> > > >  	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
> > > >  	reg_name = kernel_type_name(reg->btf, reg->btf_id);
> > > >
> > > > -	if (__check_ptr_off_reg(env, reg, regno, true))
> > > > +	if (ref_ptr) {
> > > > +		if (!reg->ref_obj_id) {
> > > > +			verbose(env, "R%d must be referenced %s%s\n", regno,
> > > > +				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
> > > > +			return -EACCES;
> > > > +		}
> > > The is_release_function() checkings under check_helper_call() is
> > > not the same?
> > >
> > > > +		/* reg->off can be used to store pointer to a certain type formed by
> > > > +		 * incrementing pointer of a parent structure the object is embedded in,
> > > > +		 * e.g. map may expect unreferenced struct path *, and user should be
> > > > +		 * allowed a store using &file->f_path. However, in the case of
> > > > +		 * referenced pointer, we cannot do this, because the reference is only
> > > > +		 * for the parent structure, not its embedded object(s), and because
> > > > +		 * the transfer of ownership happens for the original pointer to and
> > > > +		 * from the map (before its eventual release).
> > > > +		 */
> > > > +		if (reg->off)
> > > > +			fixed_off_ok = false;
> > > I thought the new check_func_arg_reg_off() is supposed to handle the
> > > is_release_function() case.  The check_func_arg_reg_off() called
> > > in check_func_arg() can not handle this case?
> > >
> >
> > The difference there is, it wouldn't check for reg->off == 0 if reg->ref_obj_id
> > is 0.
> If ref_obj_id is not 0, check_func_arg_reg_off() will reject reg->off.
> check_func_arg_reg_off is called after check_reg_type().
>
> If ref_obj_id is 0, the is_release_function() check in the
> check_helper_call() should complain:
> 	verbose(env, "func %s#%d reference has not been acquired before\n",
> 		func_id_name(func_id), func_id);
>
> I am quite confused why it needs special reg->off and
> reg->ref_obj_id checking here for the map_kptr helper taking
> PTR_TO_BTF_ID arg but not other helper taking PTR_TO_BTF_ID arg.
> The existing checks for the other helper taking PTR_TO_BTF_ID
> arg is not enough?
>

Yes, you're right, it should be enough. We just need to check for the normal
case here, with fixed_off_ok = true, since that store can come from BPF_STX.
In the referenced case, it was earlier also possible to store using BPF_XCHG,
but not anymore, so now the checks for the helper should be enough and will
complain as needed.

Will drop this in v4, and just keep __check_ptr_off_reg(env, reg, regno, false).

> > So in that case, I should probably check reg->ref_obj_id to be non-zero
> > when ref_ptr is true, and then call check_func_arg_reg_off, with the comment
> > that this would eventually be an argument to the release function, so the
> > argument should be checked with check_func_arg_reg_off.
>
>
>
> >
> > > > +	}
> > > > +	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
> > > > +	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
> > > >  		return -EACCES;
> > > >
> > > >  	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > >
> > > [ ... ]
> > >
> > > > @@ -5390,6 +5473,7 @@ static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
> > > >  static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
> > > >  static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > > >  static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
> > > > +static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
> > > >
> > > >  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> > > >  	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
> > > > @@ -5417,11 +5501,13 @@ static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
> > > >  	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
> > > >  	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
> > > >  	[ARG_PTR_TO_TIMER]		= &timer_types,
> > > > +	[ARG_PTR_TO_KPTR]		= &kptr_types,
> > > >  };
> > > >
> > > >  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > > >  			  enum bpf_arg_type arg_type,
> > > > -			  const u32 *arg_btf_id)
> > > > +			  const u32 *arg_btf_id,
> > > > +			  struct bpf_call_arg_meta *meta)
> > > >  {
> > > >  	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
> > > >  	enum bpf_reg_type expected, type = reg->type;
> > > > @@ -5474,8 +5560,11 @@ static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
> > > >  			arg_btf_id = compatible->btf_id;
> > > >  		}
> > > >
> > > > -		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > > -					  btf_vmlinux, *arg_btf_id)) {
> > > > +		if (meta->func_id == BPF_FUNC_kptr_xchg) {
> > > > +			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
> > > > +				return -EACCES;
> > > > +		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
> > > > +						 btf_vmlinux, *arg_btf_id)) {
> > > >  			verbose(env, "R%d is of type %s but %s is expected\n",
> > > >  				regno, kernel_type_name(reg->btf, reg->btf_id),
> > > >  				kernel_type_name(btf_vmlinux, *arg_btf_id));
> >
> > --
> > Kartikeya

--
Kartikeya

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 48ddde854d67..6814e4885fab 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -160,10 +160,15 @@  enum {
 	BPF_MAP_VALUE_OFF_MAX = 8,
 };
 
+enum {
+	BPF_MAP_VALUE_OFF_F_REF = (1U << 0),
+};
+
 struct bpf_map_value_off_desc {
 	u32 offset;
 	u32 btf_id;
 	struct btf *btf;
+	int flags;
 };
 
 struct bpf_map_value_off {
@@ -413,6 +418,7 @@  enum bpf_arg_type {
 	ARG_PTR_TO_STACK,	/* pointer to stack */
 	ARG_PTR_TO_CONST_STR,	/* pointer to a null terminated read-only string */
 	ARG_PTR_TO_TIMER,	/* pointer to bpf_timer */
+	ARG_PTR_TO_KPTR,	/* pointer to kptr */
 	__BPF_ARG_TYPE_MAX,
 
 	/* Extended arg_types. */
@@ -422,6 +428,7 @@  enum bpf_arg_type {
 	ARG_PTR_TO_SOCKET_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET,
 	ARG_PTR_TO_ALLOC_MEM_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM,
 	ARG_PTR_TO_STACK_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_STACK,
+	ARG_PTR_TO_BTF_ID_OR_NULL	= PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID,
 
 	/* This must be the last entry. Its purpose is to ensure the enum is
 	 * wide enough to hold the higher bits reserved for bpf_type_flag.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7604e7d5438f..b4e89da75d77 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5143,6 +5143,17 @@  union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * void *bpf_kptr_xchg(void *map_value, void *ptr)
+ *	Description
+ *		Exchange kptr at pointer *map_value* with *ptr*, and return the
+ *		old value. *ptr* can be NULL, otherwise it must be a referenced
+ *		pointer which will be released when this helper is called.
+ *	Return
+ *		The old value of kptr (which can be NULL). The returned pointer
+ *		if not NULL, is a reference which must be released using its
+ *		corresponding release function, or moved into a BPF map before
+ *		program exit.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5350,7 @@  union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(kptr_xchg),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 92afbec0a887..e36ad26a5a6e 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3175,6 +3175,7 @@  enum {
 struct btf_field_info {
 	const struct btf_type *type;
 	u32 off;
+	int flags;
 };
 
 static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t,
@@ -3191,6 +3192,8 @@  static int btf_find_field_struct(const struct btf *btf, const struct btf_type *t
 static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 			       u32 off, int sz, struct btf_field_info *info)
 {
+	int flags;
+
 	/* For PTR, sz is always == 8 */
 	if (!btf_type_is_ptr(t))
 		return BTF_FIELD_IGNORE;
@@ -3201,7 +3204,11 @@  static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 	/* Reject extra tags */
 	if (btf_type_is_type_tag(btf_type_by_id(btf, t->type)))
 		return -EINVAL;
-	if (strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+	if (!strcmp("kptr", __btf_name_by_offset(btf, t->name_off)))
+		flags = 0;
+	else if (!strcmp("kptr_ref", __btf_name_by_offset(btf, t->name_off)))
+		flags = BPF_MAP_VALUE_OFF_F_REF;
+	else
 		return -EINVAL;
 
 	/* Get the base type */
@@ -3213,6 +3220,7 @@  static int btf_find_field_kptr(const struct btf *btf, const struct btf_type *t,
 
 	info->type = t;
 	info->off = off;
+	info->flags = flags;
 	return BTF_FIELD_FOUND;
 }
 
@@ -3415,6 +3423,7 @@  struct bpf_map_value_off *btf_find_kptr(const struct btf *btf,
 		tab->off[i].offset = info_arr[i].off;
 		tab->off[i].btf_id = id;
 		tab->off[i].btf = off_btf;
+		tab->off[i].flags = info_arr[i].flags;
 		tab->nr_off = i + 1;
 	}
 	return tab;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 315053ef6a75..2e95f94d4efa 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1374,6 +1374,26 @@  void bpf_timer_cancel_and_free(void *val)
 	kfree(t);
 }
 
+BPF_CALL_2(bpf_kptr_xchg, void *, map_value, void *, ptr)
+{
+	unsigned long *kptr = map_value;
+
+	return xchg(kptr, (unsigned long)ptr);
+}
+
+static u32 bpf_kptr_xchg_btf_id;
+
+const struct bpf_func_proto bpf_kptr_xchg_proto = {
+	.func         = bpf_kptr_xchg,
+	.gpl_only     = false,
+	.ret_type     = RET_PTR_TO_BTF_ID_OR_NULL,
+	.ret_btf_id   = &bpf_kptr_xchg_btf_id,
+	.arg1_type    = ARG_PTR_TO_KPTR,
+	.arg2_type    = ARG_PTR_TO_BTF_ID_OR_NULL,
+	.arg2_btf_id  = &bpf_kptr_xchg_btf_id,
+	.arg2_release = true,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1452,6 +1472,8 @@  bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_timer_start_proto;
 	case BPF_FUNC_timer_cancel:
 		return &bpf_timer_cancel_proto;
+	case BPF_FUNC_kptr_xchg:
+		return &bpf_kptr_xchg_proto;
 	default:
 		break;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b8cd34607215..f731a0b45acb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -258,6 +258,7 @@  struct bpf_call_arg_meta {
 	struct btf *ret_btf;
 	u32 ret_btf_id;
 	u32 subprogno;
+	struct bpf_map_value_off_desc *kptr_off_desc;
 };
 
 struct btf *btf_vmlinux;
@@ -480,7 +481,8 @@  static bool is_release_function(enum bpf_func_id func_id)
 {
 	return func_id == BPF_FUNC_sk_release ||
 	       func_id == BPF_FUNC_ringbuf_submit ||
-	       func_id == BPF_FUNC_ringbuf_discard;
+	       func_id == BPF_FUNC_ringbuf_discard ||
+	       func_id == BPF_FUNC_kptr_xchg;
 }
 
 static bool may_be_acquire_function(enum bpf_func_id func_id)
@@ -500,7 +502,8 @@  static bool is_acquire_function(enum bpf_func_id func_id,
 	if (func_id == BPF_FUNC_sk_lookup_tcp ||
 	    func_id == BPF_FUNC_sk_lookup_udp ||
 	    func_id == BPF_FUNC_skc_lookup_tcp ||
-	    func_id == BPF_FUNC_ringbuf_reserve)
+	    func_id == BPF_FUNC_ringbuf_reserve ||
+	    func_id == BPF_FUNC_kptr_xchg)
 		return true;
 
 	if (func_id == BPF_FUNC_map_lookup_elem &&
@@ -3510,10 +3513,12 @@  int check_ptr_off_reg(struct bpf_verifier_env *env,
 
 static int map_kptr_match_type(struct bpf_verifier_env *env,
 			       struct bpf_map_value_off_desc *off_desc,
-			       struct bpf_reg_state *reg, u32 regno)
+			       struct bpf_reg_state *reg, u32 regno,
+			       bool ref_ptr)
 {
 	const char *targ_name = kernel_type_name(off_desc->btf, off_desc->btf_id);
 	const char *reg_name = "";
+	bool fixed_off_ok = true;
 
 	if (reg->type != PTR_TO_BTF_ID && reg->type != PTR_TO_BTF_ID_OR_NULL)
 		goto bad_type;
@@ -3525,7 +3530,26 @@  static int map_kptr_match_type(struct bpf_verifier_env *env,
 	/* We need to verify reg->type and reg->btf, before accessing reg->btf */
 	reg_name = kernel_type_name(reg->btf, reg->btf_id);
 
-	if (__check_ptr_off_reg(env, reg, regno, true))
+	if (ref_ptr) {
+		if (!reg->ref_obj_id) {
+			verbose(env, "R%d must be referenced %s%s\n", regno,
+				reg_type_str(env, PTR_TO_BTF_ID), targ_name);
+			return -EACCES;
+		}
+		/* reg->off can be used to store pointer to a certain type formed by
+		 * incrementing pointer of a parent structure the object is embedded in,
+		 * e.g. map may expect unreferenced struct path *, and user should be
+		 * allowed a store using &file->f_path. However, in the case of
+		 * referenced pointer, we cannot do this, because the reference is only
+		 * for the parent structure, not its embedded object(s), and because
+		 * the transfer of ownership happens for the original pointer to and
+		 * from the map (before its eventual release).
+		 */
+		if (reg->off)
+			fixed_off_ok = false;
+	}
+	/* var_off is rejected by __check_ptr_off_reg for PTR_TO_BTF_ID */
+	if (__check_ptr_off_reg(env, reg, regno, fixed_off_ok))
 		return -EACCES;
 
 	if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
@@ -3568,6 +3592,12 @@  static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	if (BPF_MODE(insn->code) != BPF_MEM)
 		goto end;
 
+	/* We cannot directly access kptr_ref */
+	if (off_desc->flags & BPF_MAP_VALUE_OFF_F_REF) {
+		verbose(env, "accessing referenced kptr disallowed\n");
+		return -EACCES;
+	}
+
 	if (class == BPF_LDX) {
 		val_reg = reg_state(env, value_regno);
 		/* We can simply mark the value_regno receiving the pointer
@@ -3579,7 +3609,7 @@  static int check_map_kptr_access(struct bpf_verifier_env *env, u32 regno,
 	} else if (class == BPF_STX) {
 		val_reg = reg_state(env, value_regno);
 		if (!register_is_null(val_reg) &&
-		    map_kptr_match_type(env, off_desc, val_reg, value_regno))
+		    map_kptr_match_type(env, off_desc, val_reg, value_regno, false))
 			return -EACCES;
 	} else if (class == BPF_ST) {
 		if (insn->imm) {
@@ -5255,6 +5285,59 @@  static int process_timer_func(struct bpf_verifier_env *env, int regno,
 	return 0;
 }
 
+static int process_kptr_func(struct bpf_verifier_env *env, int regno,
+			     struct bpf_call_arg_meta *meta)
+{
+	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
+	struct bpf_map_value_off_desc *off_desc;
+	struct bpf_map *map_ptr = reg->map_ptr;
+	u32 kptr_off;
+	int ret;
+
+	if (!tnum_is_const(reg->var_off)) {
+		verbose(env,
+			"R%d doesn't have constant offset. kptr has to be at the constant offset\n",
+			regno);
+		return -EINVAL;
+	}
+	if (!map_ptr->btf) {
+		verbose(env, "map '%s' has to have BTF in order to use bpf_kptr_xchg\n",
+			map_ptr->name);
+		return -EINVAL;
+	}
+	if (!map_value_has_kptr(map_ptr)) {
+		ret = PTR_ERR(map_ptr->kptr_off_tab);
+		if (ret == -E2BIG)
+			verbose(env, "map '%s' has more than %d kptr\n", map_ptr->name,
+				BPF_MAP_VALUE_OFF_MAX);
+		else if (ret == -EEXIST)
+			verbose(env, "map '%s' has repeating kptr BTF tags\n", map_ptr->name);
+		else
+			verbose(env, "map '%s' has no valid kptr\n", map_ptr->name);
+		return -EINVAL;
+	}
+
+	meta->map_ptr = map_ptr;
+	/* Check access for BPF_WRITE */
+	meta->raw_mode = true;
+	ret = check_helper_mem_access(env, regno, sizeof(u64), false, meta);
+	if (ret < 0)
+		return ret;
+
+	kptr_off = reg->off + reg->var_off.value;
+	off_desc = bpf_map_kptr_off_contains(map_ptr, kptr_off);
+	if (!off_desc) {
+		verbose(env, "off=%d doesn't point to kptr\n", kptr_off);
+		return -EACCES;
+	}
+	if (!(off_desc->flags & BPF_MAP_VALUE_OFF_F_REF)) {
+		verbose(env, "off=%d kptr isn't referenced kptr\n", kptr_off);
+		return -EACCES;
+	}
+	meta->kptr_off_desc = off_desc;
+	return 0;
+}
+
 static bool arg_type_is_mem_ptr(enum bpf_arg_type type)
 {
 	return base_type(type) == ARG_PTR_TO_MEM ||
@@ -5390,6 +5473,7 @@  static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
 static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
 static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
 static const struct bpf_reg_types timer_types = { .types = { PTR_TO_MAP_VALUE } };
+static const struct bpf_reg_types kptr_types = { .types = { PTR_TO_MAP_VALUE } };
 
 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_MAP_KEY]		= &map_key_value_types,
@@ -5417,11 +5501,13 @@  static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
 	[ARG_PTR_TO_STACK]		= &stack_ptr_types,
 	[ARG_PTR_TO_CONST_STR]		= &const_str_ptr_types,
 	[ARG_PTR_TO_TIMER]		= &timer_types,
+	[ARG_PTR_TO_KPTR]		= &kptr_types,
 };
 
 static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 			  enum bpf_arg_type arg_type,
-			  const u32 *arg_btf_id)
+			  const u32 *arg_btf_id,
+			  struct bpf_call_arg_meta *meta)
 {
 	struct bpf_reg_state *regs = cur_regs(env), *reg = &regs[regno];
 	enum bpf_reg_type expected, type = reg->type;
@@ -5474,8 +5560,11 @@  static int check_reg_type(struct bpf_verifier_env *env, u32 regno,
 			arg_btf_id = compatible->btf_id;
 		}
 
-		if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
-					  btf_vmlinux, *arg_btf_id)) {
+		if (meta->func_id == BPF_FUNC_kptr_xchg) {
+			if (map_kptr_match_type(env, meta->kptr_off_desc, reg, regno, true))
+				return -EACCES;
+		} else if (!btf_struct_ids_match(&env->log, reg->btf, reg->btf_id, reg->off,
+						 btf_vmlinux, *arg_btf_id)) {
 			verbose(env, "R%d is of type %s but %s is expected\n",
 				regno, kernel_type_name(reg->btf, reg->btf_id),
 				kernel_type_name(btf_vmlinux, *arg_btf_id));
@@ -5585,7 +5674,7 @@  static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 		 */
 		goto skip_type_check;
 
-	err = check_reg_type(env, regno, arg_type, fn->arg_btf_id[arg]);
+	err = check_reg_type(env, regno, arg_type, fn->arg_btf_id[arg], meta);
 	if (err)
 		return err;
 
@@ -5750,6 +5839,9 @@  static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "string is not zero-terminated\n");
 			return -EINVAL;
 		}
+	} else if (arg_type == ARG_PTR_TO_KPTR) {
+		if (process_kptr_func(env, regno, meta))
+			return -EACCES;
 	}
 
 	return err;
@@ -6092,10 +6184,10 @@  static bool check_btf_id_ok(const struct bpf_func_proto *fn)
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(fn->arg_type); i++) {
-		if (fn->arg_type[i] == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
+		if (base_type(fn->arg_type[i]) == ARG_PTR_TO_BTF_ID && !fn->arg_btf_id[i])
 			return false;
 
-		if (fn->arg_type[i] != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i])
+		if (base_type(fn->arg_type[i]) != ARG_PTR_TO_BTF_ID && fn->arg_btf_id[i])
 			return false;
 	}
 
@@ -6979,21 +7071,25 @@  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			regs[BPF_REG_0].btf_id = meta.ret_btf_id;
 		}
 	} else if (base_type(ret_type) == RET_PTR_TO_BTF_ID) {
+		struct btf *ret_btf;
 		int ret_btf_id;
 
 		mark_reg_known_zero(env, regs, BPF_REG_0);
 		regs[BPF_REG_0].type = PTR_TO_BTF_ID | ret_flag;
-		ret_btf_id = *fn->ret_btf_id;
+		if (func_id == BPF_FUNC_kptr_xchg) {
+			ret_btf = meta.kptr_off_desc->btf;
+			ret_btf_id = meta.kptr_off_desc->btf_id;
+		} else {
+			ret_btf = btf_vmlinux;
+			ret_btf_id = *fn->ret_btf_id;
+		}
 		if (ret_btf_id == 0) {
 			verbose(env, "invalid return type %u of func %s#%d\n",
 				base_type(ret_type), func_id_name(func_id),
 				func_id);
 			return -EINVAL;
 		}
-		/* current BPF helper definitions are only coming from
-		 * built-in code with type IDs from  vmlinux BTF
-		 */
-		regs[BPF_REG_0].btf = btf_vmlinux;
+		regs[BPF_REG_0].btf = ret_btf;
 		regs[BPF_REG_0].btf_id = ret_btf_id;
 	} else {
 		verbose(env, "unknown return type %u of func %s#%d\n",
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7604e7d5438f..b4e89da75d77 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5143,6 +5143,17 @@  union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * void *bpf_kptr_xchg(void *map_value, void *ptr)
+ *	Description
+ *		Exchange kptr at pointer *map_value* with *ptr*, and return the
+ *		old value. *ptr* can be NULL, otherwise it must be a referenced
+ *		pointer which will be released when this helper is called.
+ *	Return
+ *		The old value of kptr (which can be NULL). The returned pointer
+ *		if not NULL, is a reference which must be released using its
+ *		corresponding release function, or moved into a BPF map before
+ *		program exit.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5339,6 +5350,7 @@  union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(kptr_xchg),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper