
[v2,bpf-next,09/20] bpf: Recognize cast_kern/user instructions in the verifier.

Message ID: 20240209040608.98927-10-alexei.starovoitov@gmail.com (mailing list archive)
State: Changes Requested
Delegated to: BPF
Series: bpf: Introduce BPF arena.

Checks

Context Check Description
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-8 fail Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-15 success Logs for x86_64-gcc / test
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-llvm-17 / test
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-llvm-18 / test
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-25 fail Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-llvm-18 / test
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-18 / veristat
netdev/series_format fail Series longer than 15 patches (and no cover letter)
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 7798 this patch: 7798
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 8 maintainers not CCed: jolsa@kernel.org john.fastabend@gmail.com yonghong.song@linux.dev martin.lau@linux.dev song@kernel.org sdf@google.com kpsingh@kernel.org haoluo@google.com
netdev/build_clang success Errors and warnings before: 2369 this patch: 2369
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8294 this patch: 8294
netdev/checkpatch warning WARNING: line length of 100 exceeds 80 columns WARNING: line length of 102 exceeds 80 columns WARNING: line length of 103 exceeds 80 columns WARNING: line length of 104 exceeds 80 columns WARNING: line length of 106 exceeds 80 columns WARNING: line length of 108 exceeds 80 columns WARNING: line length of 115 exceeds 80 columns WARNING: line length of 81 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 84 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 86 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 92 exceeds 80 columns WARNING: line length of 94 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 6 this patch: 6
netdev/source_inline success Was 0 now: 0

Commit Message

Alexei Starovoitov Feb. 9, 2024, 4:05 a.m. UTC
From: Alexei Starovoitov <ast@kernel.org>

rX = bpf_cast_kern(rY, addr_space) tells the verifier that rX->type = PTR_TO_ARENA.
Any further operations on a PTR_TO_ARENA register have to stay in the 32-bit domain.

The verifier will mark loads/stores through PTR_TO_ARENA with PROBE_MEM32.
The JIT will generate them as kern_vm_start + 32bit_addr memory accesses.
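
For illustration only (a sketch, not code from this patch; the helper
and its argument names are made up), such an access behaves as if the
JIT emitted:

  static inline u64 arena_load64(u64 kern_vm_start, u64 arena_ptr)
  {
          /* only the low 32 bits of the pointer act as an arena offset */
          return *(u64 *)(kern_vm_start + (u32)arena_ptr);
  }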

rX = bpf_cast_user(rY, addr_space) tells the verifier that rX->type = unknown scalar.
If arena->map_flags has BPF_F_NO_USER_CONV set, cast_user is converted to a mov32 as well.
Otherwise the JIT will convert it to:
  rX = (u32)rY;
  if (rX)
     rX |= arena->user_vm_start & ~(u64)~0U;
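
In C terms (an illustrative sketch of the same sequence, not from the
patch; the helper name is made up, and it assumes the arena does not
cross a 4GB boundary):

  static inline u64 arena_cast_user(u64 ptr, u64 user_vm_start)
  {
          u32 off = (u32)ptr;      /* low 32 bits are the arena offset */

          if (!off)
                  return 0;        /* NULL stays NULL across the cast */
          /* ~(u64)~0U == 0xffffffff00000000, i.e. the upper half */
          return off | (user_vm_start & ~(u64)~0U);
  }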

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h          |   1 +
 include/linux/bpf_verifier.h |   1 +
 kernel/bpf/log.c             |   3 ++
 kernel/bpf/verifier.c        | 102 ++++++++++++++++++++++++++++++++---
 4 files changed, 100 insertions(+), 7 deletions(-)

Comments

Eduard Zingerman Feb. 10, 2024, 1:13 a.m. UTC | #1
On Thu, 2024-02-08 at 20:05 -0800, Alexei Starovoitov wrote:
[...]

> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 3c77a3ab1192..5eeb9bf7e324 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c

[...]

> @@ -13837,6 +13844,21 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
>  
>  	dst_reg = &regs[insn->dst_reg];
>  	src_reg = NULL;
> +
> +	if (dst_reg->type == PTR_TO_ARENA) {
> +		struct bpf_insn_aux_data *aux = cur_aux(env);
> +
> +		if (BPF_CLASS(insn->code) == BPF_ALU64)
> +			/*
> +			 * 32-bit operations zero upper bits automatically.
> +			 * 64-bit operations need to be converted to 32.
> +			 */
> +			aux->needs_zext = true;

It should be possible to write an example where the same insn is
visited with both PTR_TO_ARENA and some other PTR type.
Such examples should be rejected, as is currently done in do_check()
for BPF_{ST,STX} using save_aux_ptr_type().
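
For illustration, such a program could look like this (a hypothetical
BPF C sketch, not an actual selftest; __arena, ctx_t and the field
names are assumptions):

  char __arena *base;                      /* PTR_TO_ARENA after cast */

  int prog(struct ctx_t *ctx)
  {
          char buf[16];
          char *p;

          if (ctx->flag)
                  p = buf;                 /* PTR_TO_STACK path */
          else
                  p = (char *)base;        /* PTR_TO_ARENA path */
          p += 8;   /* one insn, two dst types: needs_zext set on one path only */
          return 0;
  }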

[...]

> @@ -13954,16 +13976,17 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  	} else if (opcode == BPF_MOV) {
>  
>  		if (BPF_SRC(insn->code) == BPF_X) {
> -			if (insn->imm != 0) {
> -				verbose(env, "BPF_MOV uses reserved fields\n");
> -				return -EINVAL;
> -			}
> -
>  			if (BPF_CLASS(insn->code) == BPF_ALU) {
> -				if (insn->off != 0 && insn->off != 8 && insn->off != 16) {
> +				if ((insn->off != 0 && insn->off != 8 && insn->off != 16) ||
> +				    insn->imm) {
>  					verbose(env, "BPF_MOV uses reserved fields\n");
>  					return -EINVAL;
>  				}
> +			} else if (insn->off == BPF_ARENA_CAST_KERN || insn->off == BPF_ARENA_CAST_USER) {
> +				if (!insn->imm) {
> +					verbose(env, "cast_kern/user insn must have non zero imm32\n");
> +					return -EINVAL;
> +				}
>  			} else {
>  				if (insn->off != 0 && insn->off != 8 && insn->off != 16 &&
>  				    insn->off != 32) {

I think it is now necessary to check insn->imm here;
as is, it allows an ALU64 move with a non-zero imm.
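
For instance (a hypothetical raw encoding, purely illustrative), the
checks as posted would let this through:

  struct bpf_insn sketchy_mov = {
          .code    = BPF_ALU64 | BPF_MOV | BPF_X,
          .dst_reg = BPF_REG_1,
          .src_reg = BPF_REG_2,
          .off     = 0,   /* looks like a plain r1 = r2 ... */
          .imm     = 42,  /* ... but imm is reserved here and non-zero,
                           * and later gets misread as an arena cast */
  };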

> @@ -13993,7 +14016,12 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  			struct bpf_reg_state *dst_reg = regs + insn->dst_reg;
>  
>  			if (BPF_CLASS(insn->code) == BPF_ALU64) {
> -				if (insn->off == 0) {
> +				if (insn->imm) {
> +					/* off == BPF_ARENA_CAST_KERN || off == BPF_ARENA_CAST_USER */
> +					mark_reg_unknown(env, regs, insn->dst_reg);
> +					if (insn->off == BPF_ARENA_CAST_KERN)
> +						dst_reg->type = PTR_TO_ARENA;

This effectively allows casting anything to PTR_TO_ARENA.
Do we want to check that src_reg somehow originates from the arena?
Might be tricky; a new type modifier bit or something like that.
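
E.g. (illustrative, reusing the cast notation from the commit message),
nothing would reject:

  /* ctx is PTR_TO_CTX, yet dst becomes PTR_TO_ARENA after the cast */
  void *p = bpf_cast_kern(ctx, 1);   /* the addr_space value is arbitrary */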

> +				} else if (insn->off == 0) {
>  					/* case: R1 = R2
>  					 * copy register state to dest reg
>  					 */
> @@ -14059,6 +14087,9 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  						dst_reg->subreg_def = env->insn_idx + 1;
>  						coerce_subreg_to_size_sx(dst_reg, insn->off >> 3);
>  					}
> +				} else if (src_reg->type == PTR_TO_ARENA) {
> +					mark_reg_unknown(env, regs, insn->dst_reg);
> +					dst_reg->type = PTR_TO_ARENA;

This describes the case wX = wY, where rY is PTR_TO_ARENA;
should rX be marked as SCALAR instead of PTR_TO_ARENA?

[...]

> @@ -18235,6 +18272,31 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
>  				fdput(f);
>  				return -EBUSY;
>  			}
> +			if (map->map_type == BPF_MAP_TYPE_ARENA) {
> +				if (env->prog->aux->arena) {

Does this have to be (env->prog->aux->arena && env->prog->aux->arena != map) ?

> +					verbose(env, "Only one arena per program\n");
> +					fdput(f);
> +					return -EBUSY;
> +				}

[...]

> @@ -18799,6 +18861,18 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  			   insn->code == (BPF_ST | BPF_MEM | BPF_W) ||
>  			   insn->code == (BPF_ST | BPF_MEM | BPF_DW)) {
>  			type = BPF_WRITE;
> +		} else if (insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->imm) {
> +			if (insn->off == BPF_ARENA_CAST_KERN ||
> +			    (((struct bpf_map *)env->prog->aux->arena)->map_flags & BPF_F_NO_USER_CONV)) {
> +				/* convert to 32-bit mov that clears upper 32-bit */
> +				insn->code = BPF_ALU | BPF_MOV | BPF_X;
> +				/* clear off, so it's a normal 'wX = wY' from JIT pov */
> +				insn->off = 0;
> +			} /* else insn->off == BPF_ARENA_CAST_USER should be handled by JIT */
> +			continue;
> +		} else if (env->insn_aux_data[i + delta].needs_zext) {
> +			/* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
> +			insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);

Tbh, I think this should be done in do_misc_fixups();
mixing it with context handling in convert_ctx_accesses()
seems a bit confusing.

Alexei Starovoitov Feb. 13, 2024, 2:58 a.m. UTC | #2
On Fri, Feb 9, 2024 at 5:13 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-08 at 20:05 -0800, Alexei Starovoitov wrote:
> [...]
>
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 3c77a3ab1192..5eeb9bf7e324 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
>
> [...]
>
> > @@ -13837,6 +13844,21 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
> >
> >       dst_reg = &regs[insn->dst_reg];
> >       src_reg = NULL;
> > +
> > +     if (dst_reg->type == PTR_TO_ARENA) {
> > +             struct bpf_insn_aux_data *aux = cur_aux(env);
> > +
> > +             if (BPF_CLASS(insn->code) == BPF_ALU64)
> > +                     /*
> > +                      * 32-bit operations zero upper bits automatically.
> > +                      * 64-bit operations need to be converted to 32.
> > +                      */
> > +                     aux->needs_zext = true;
>
> It should be possible to write an example where the same insn is
> visited with both PTR_TO_ARENA and some other PTR type.
> Such examples should be rejected, as is currently done in do_check()
> for BPF_{ST,STX} using save_aux_ptr_type().

Good catch. Fixed reg_type_mismatch_ok().
Didn't craft a unit test. That will be in a follow-up.
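
Presumably the fix has this shape (a sketch based on the existing
reg_type_mismatch_ok(), not the actual v3 diff):

  static bool reg_type_mismatch_ok(enum bpf_reg_type type)
  {
          switch (base_type(type)) {
          case PTR_TO_CTX:
          case PTR_TO_SOCKET:
          case PTR_TO_SOCK_COMMON:
          case PTR_TO_TCP_SOCK:
          case PTR_TO_XDP_SOCK:
          case PTR_TO_BTF_ID:
          case PTR_TO_ARENA:      /* new: mixed types now rejected in do_check() */
                  return false;
          default:
                  return true;
          }
  }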

> [...]
>
> > @@ -13954,16 +13976,17 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >       } else if (opcode == BPF_MOV) {
> >
> >               if (BPF_SRC(insn->code) == BPF_X) {
> > -                     if (insn->imm != 0) {
> > -                             verbose(env, "BPF_MOV uses reserved fields\n");
> > -                             return -EINVAL;
> > -                     }
> > -
> >                       if (BPF_CLASS(insn->code) == BPF_ALU) {
> > -                             if (insn->off != 0 && insn->off != 8 && insn->off != 16) {
> > +                             if ((insn->off != 0 && insn->off != 8 && insn->off != 16) ||
> > +                                 insn->imm) {
> >                                       verbose(env, "BPF_MOV uses reserved fields\n");
> >                                       return -EINVAL;
> >                               }
> > +                     } else if (insn->off == BPF_ARENA_CAST_KERN || insn->off == BPF_ARENA_CAST_USER) {
> > +                             if (!insn->imm) {
> > +                                     verbose(env, "cast_kern/user insn must have non zero imm32\n");
> > +                                     return -EINVAL;
> > +                             }
> >                       } else {
> >                               if (insn->off != 0 && insn->off != 8 && insn->off != 16 &&
> >                                   insn->off != 32) {
>
> I think it is now necessary to check insn->imm here;
> as is, it allows an ALU64 move with a non-zero imm.

Great catch too. Fixed.

> > @@ -13993,7 +14016,12 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >                       struct bpf_reg_state *dst_reg = regs + insn->dst_reg;
> >
> >                       if (BPF_CLASS(insn->code) == BPF_ALU64) {
> > -                             if (insn->off == 0) {
> > +                             if (insn->imm) {
> > +                                     /* off == BPF_ARENA_CAST_KERN || off == BPF_ARENA_CAST_USER */
> > +                                     mark_reg_unknown(env, regs, insn->dst_reg);
> > +                                     if (insn->off == BPF_ARENA_CAST_KERN)
> > +                                             dst_reg->type = PTR_TO_ARENA;
>
> This effectively allows casting anything to PTR_TO_ARENA.
> Do we want to check that src_reg somehow originates from the arena?
> Might be tricky; a new type modifier bit or something like that.

Yes. Casting anything is fine.
I don't think we need to enforce anything.
Those insns will be LLVM-generated. If src_reg is somehow ptr_to_ctx
or something, it's likely an LLVM bug or a crazy manual type cast
by the user; if they do so, let them experience the debug pains.
The kernel won't crash.

> > +                             } else if (insn->off == 0) {
> >                                       /* case: R1 = R2
> >                                        * copy register state to dest reg
> >                                        */
> > @@ -14059,6 +14087,9 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
> >                                               dst_reg->subreg_def = env->insn_idx + 1;
> >                                               coerce_subreg_to_size_sx(dst_reg, insn->off >> 3);
> >                                       }
> > +                             } else if (src_reg->type == PTR_TO_ARENA) {
> > +                                     mark_reg_unknown(env, regs, insn->dst_reg);
> > +                                     dst_reg->type = PTR_TO_ARENA;
>
> This describes the case wX = wY, where rY is PTR_TO_ARENA;
> should rX be marked as SCALAR instead of PTR_TO_ARENA?

That was a leftover from earlier experiments when the alu64->alu32
conversion was done early.
Removed this hunk now.

> [...]
>
> > @@ -18235,6 +18272,31 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
> >                               fdput(f);
> >                               return -EBUSY;
> >                       }
> > +                     if (map->map_type == BPF_MAP_TYPE_ARENA) {
> > +                             if (env->prog->aux->arena) {
>
> Does this have to be (env->prog->aux->arena && env->prog->aux->arena != map) ?

No. All maps in used_maps[] are unique.
Adding "env->prog->aux->arena != map" won't make any difference.
It will only be confusing.

> > +                                     verbose(env, "Only one arena per program\n");
> > +                                     fdput(f);
> > +                                     return -EBUSY;
> > +                             }
>
> [...]
>
> > @@ -18799,6 +18861,18 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
> >                          insn->code == (BPF_ST | BPF_MEM | BPF_W) ||
> >                          insn->code == (BPF_ST | BPF_MEM | BPF_DW)) {
> >                       type = BPF_WRITE;
> > +             } else if (insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->imm) {
> > +                     if (insn->off == BPF_ARENA_CAST_KERN ||
> > +                         (((struct bpf_map *)env->prog->aux->arena)->map_flags & BPF_F_NO_USER_CONV)) {
> > +                             /* convert to 32-bit mov that clears upper 32-bit */
> > +                             insn->code = BPF_ALU | BPF_MOV | BPF_X;
> > +                             /* clear off, so it's a normal 'wX = wY' from JIT pov */
> > +                             insn->off = 0;
> > +                     } /* else insn->off == BPF_ARENA_CAST_USER should be handled by JIT */
> > +                     continue;
> > +             } else if (env->insn_aux_data[i + delta].needs_zext) {
> > +                     /* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
> > +                     insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
>
> Tbh, I think this should be done in do_misc_fixups();
> mixing it with context handling in convert_ctx_accesses()
> seems a bit confusing.

Good point. Moved.

Thanks a lot for the review!

Eduard Zingerman Feb. 13, 2024, 12:01 p.m. UTC | #3
On Mon, 2024-02-12 at 18:58 -0800, Alexei Starovoitov wrote:

[...]

> Yes. Casting anything is fine.
> I don't think we need to enforce anything.
> Those insns will be LLVM-generated. If src_reg is somehow ptr_to_ctx
> or something, it's likely an LLVM bug or a crazy manual type cast
> by the user; if they do so, let them experience the debug pains.
> The kernel won't crash.

Ok, makes sense.

[...]

> > > @@ -18235,6 +18272,31 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
> > >                               fdput(f);
> > >                               return -EBUSY;
> > >                       }
> > > +                     if (map->map_type == BPF_MAP_TYPE_ARENA) {
> > > +                             if (env->prog->aux->arena) {
> > 
> > Does this have to be (env->prog->aux->arena && env->prog->aux->arena != map) ?
> 
> No. All maps in used_maps[] are unique.
> Adding "env->prog->aux->arena != map" won't make any difference.
> It will only be confusing.

Right, sorry, I missed the loop above that checks whether the map has
already been seen.

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 26419a57bf9f..70d5351427e6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -889,6 +889,7 @@  enum bpf_reg_type {
 	 * an explicit null check is required for this struct.
 	 */
 	PTR_TO_MEM,		 /* reg points to valid memory region */
+	PTR_TO_ARENA,
 	PTR_TO_BUF,		 /* reg points to a read/write buffer */
 	PTR_TO_FUNC,		 /* reg points to a bpf program function */
 	CONST_PTR_TO_DYNPTR,	 /* reg points to a const struct bpf_dynptr */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 84365e6dd85d..43c95e3e2a3c 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -547,6 +547,7 @@  struct bpf_insn_aux_data {
 	u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
 	bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */
 	bool zext_dst; /* this insn zero extends dst reg */
+	bool needs_zext; /* alu op needs to clear upper bits */
 	bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
 	bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
 	bool call_with_percpu_alloc_ptr; /* {this,per}_cpu_ptr() with prog percpu alloc */
diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c
index 594a234f122b..677076c760ff 100644
--- a/kernel/bpf/log.c
+++ b/kernel/bpf/log.c
@@ -416,6 +416,7 @@  const char *reg_type_str(struct bpf_verifier_env *env, enum bpf_reg_type type)
 		[PTR_TO_XDP_SOCK]	= "xdp_sock",
 		[PTR_TO_BTF_ID]		= "ptr_",
 		[PTR_TO_MEM]		= "mem",
+		[PTR_TO_ARENA]		= "arena",
 		[PTR_TO_BUF]		= "buf",
 		[PTR_TO_FUNC]		= "func",
 		[PTR_TO_MAP_KEY]	= "map_key",
@@ -651,6 +652,8 @@  static void print_reg_state(struct bpf_verifier_env *env,
 	}
 
 	verbose(env, "%s", reg_type_str(env, t));
+	if (t == PTR_TO_ARENA)
+		return;
 	if (t == PTR_TO_STACK) {
 		if (state->frameno != reg->frameno)
 			verbose(env, "[%d]", reg->frameno);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c77a3ab1192..5eeb9bf7e324 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4370,6 +4370,7 @@  static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_MEM:
 	case PTR_TO_FUNC:
 	case PTR_TO_MAP_KEY:
+	case PTR_TO_ARENA:
 		return true;
 	default:
 		return false;
@@ -5805,6 +5806,8 @@  static int check_ptr_alignment(struct bpf_verifier_env *env,
 	case PTR_TO_XDP_SOCK:
 		pointer_desc = "xdp_sock ";
 		break;
+	case PTR_TO_ARENA:
+		return 0;
 	default:
 		break;
 	}
@@ -6906,6 +6909,9 @@  static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 
 		if (!err && value_regno >= 0 && (rdonly_mem || t == BPF_READ))
 			mark_reg_unknown(env, regs, value_regno);
+	} else if (reg->type == PTR_TO_ARENA) {
+		if (t == BPF_READ && value_regno >= 0)
+			mark_reg_unknown(env, regs, value_regno);
 	} else {
 		verbose(env, "R%d invalid mem access '%s'\n", regno,
 			reg_type_str(env, reg->type));
@@ -8377,6 +8383,7 @@  static int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	case PTR_TO_MEM | MEM_RINGBUF:
 	case PTR_TO_BUF:
 	case PTR_TO_BUF | MEM_RDONLY:
+	case PTR_TO_ARENA:
 	case SCALAR_VALUE:
 		return 0;
 	/* All the rest must be rejected, except PTR_TO_BTF_ID which allows
@@ -13837,6 +13844,21 @@  static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 
 	dst_reg = &regs[insn->dst_reg];
 	src_reg = NULL;
+
+	if (dst_reg->type == PTR_TO_ARENA) {
+		struct bpf_insn_aux_data *aux = cur_aux(env);
+
+		if (BPF_CLASS(insn->code) == BPF_ALU64)
+			/*
+			 * 32-bit operations zero upper bits automatically.
+			 * 64-bit operations need to be converted to 32.
+			 */
+			aux->needs_zext = true;
+
+		/* Any arithmetic operations are allowed on arena pointers */
+		return 0;
+	}
+
 	if (dst_reg->type != SCALAR_VALUE)
 		ptr_reg = dst_reg;
 	else
@@ -13954,16 +13976,17 @@  static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 	} else if (opcode == BPF_MOV) {
 
 		if (BPF_SRC(insn->code) == BPF_X) {
-			if (insn->imm != 0) {
-				verbose(env, "BPF_MOV uses reserved fields\n");
-				return -EINVAL;
-			}
-
 			if (BPF_CLASS(insn->code) == BPF_ALU) {
-				if (insn->off != 0 && insn->off != 8 && insn->off != 16) {
+				if ((insn->off != 0 && insn->off != 8 && insn->off != 16) ||
+				    insn->imm) {
 					verbose(env, "BPF_MOV uses reserved fields\n");
 					return -EINVAL;
 				}
+			} else if (insn->off == BPF_ARENA_CAST_KERN || insn->off == BPF_ARENA_CAST_USER) {
+				if (!insn->imm) {
+					verbose(env, "cast_kern/user insn must have non zero imm32\n");
+					return -EINVAL;
+				}
 			} else {
 				if (insn->off != 0 && insn->off != 8 && insn->off != 16 &&
 				    insn->off != 32) {
@@ -13993,7 +14016,12 @@  static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 			struct bpf_reg_state *dst_reg = regs + insn->dst_reg;
 
 			if (BPF_CLASS(insn->code) == BPF_ALU64) {
-				if (insn->off == 0) {
+				if (insn->imm) {
+					/* off == BPF_ARENA_CAST_KERN || off == BPF_ARENA_CAST_USER */
+					mark_reg_unknown(env, regs, insn->dst_reg);
+					if (insn->off == BPF_ARENA_CAST_KERN)
+						dst_reg->type = PTR_TO_ARENA;
+				} else if (insn->off == 0) {
 					/* case: R1 = R2
 					 * copy register state to dest reg
 					 */
@@ -14059,6 +14087,9 @@  static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn)
 						dst_reg->subreg_def = env->insn_idx + 1;
 						coerce_subreg_to_size_sx(dst_reg, insn->off >> 3);
 					}
+				} else if (src_reg->type == PTR_TO_ARENA) {
+					mark_reg_unknown(env, regs, insn->dst_reg);
+					dst_reg->type = PTR_TO_ARENA;
 				} else {
 					mark_reg_unknown(env, regs,
 							 insn->dst_reg);
@@ -15142,6 +15173,10 @@  static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 
 	if (insn->src_reg == BPF_PSEUDO_MAP_VALUE ||
 	    insn->src_reg == BPF_PSEUDO_MAP_IDX_VALUE) {
+		if (map->map_type == BPF_MAP_TYPE_ARENA) {
+			__mark_reg_unknown(env, dst_reg);
+			return 0;
+		}
 		dst_reg->type = PTR_TO_MAP_VALUE;
 		dst_reg->off = aux->map_off;
 		WARN_ON_ONCE(map->max_entries != 1);
@@ -16519,6 +16554,8 @@  static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold,
 		 * the same stack frame, since fp-8 in foo != fp-8 in bar
 		 */
 		return regs_exact(rold, rcur, idmap) && rold->frameno == rcur->frameno;
+	case PTR_TO_ARENA:
+		return true;
 	default:
 		return regs_exact(rold, rcur, idmap);
 	}
@@ -18235,6 +18272,31 @@  static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
 				fdput(f);
 				return -EBUSY;
 			}
+			if (map->map_type == BPF_MAP_TYPE_ARENA) {
+				if (env->prog->aux->arena) {
+					verbose(env, "Only one arena per program\n");
+					fdput(f);
+					return -EBUSY;
+				}
+				if (!env->allow_ptr_leaks || !env->bpf_capable) {
+					verbose(env, "CAP_BPF and CAP_PERFMON are required to use arena\n");
+					fdput(f);
+					return -EPERM;
+				}
+				if (!env->prog->jit_requested) {
+					verbose(env, "JIT is required to use arena\n");
+					return -EOPNOTSUPP;
+				}
+				if (!bpf_jit_supports_arena()) {
+					verbose(env, "JIT doesn't support arena\n");
+					return -EOPNOTSUPP;
+				}
+				env->prog->aux->arena = (void *)map;
+				if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) {
+					verbose(env, "arena's user address must be set via map_extra or mmap()\n");
+					return -EINVAL;
+				}
+			}
 
 			fdput(f);
 next_insn:
@@ -18799,6 +18861,18 @@  static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			   insn->code == (BPF_ST | BPF_MEM | BPF_W) ||
 			   insn->code == (BPF_ST | BPF_MEM | BPF_DW)) {
 			type = BPF_WRITE;
+		} else if (insn->code == (BPF_ALU64 | BPF_MOV | BPF_X) && insn->imm) {
+			if (insn->off == BPF_ARENA_CAST_KERN ||
+			    (((struct bpf_map *)env->prog->aux->arena)->map_flags & BPF_F_NO_USER_CONV)) {
+				/* convert to 32-bit mov that clears upper 32-bit */
+				insn->code = BPF_ALU | BPF_MOV | BPF_X;
+				/* clear off, so it's a normal 'wX = wY' from JIT pov */
+				insn->off = 0;
+			} /* else insn->off == BPF_ARENA_CAST_USER should be handled by JIT */
+			continue;
+		} else if (env->insn_aux_data[i + delta].needs_zext) {
+			/* Convert BPF_CLASS(insn->code) == BPF_ALU64 to 32-bit ALU */
+			insn->code = BPF_ALU | BPF_OP(insn->code) | BPF_SRC(insn->code);
 		} else {
 			continue;
 		}
@@ -18856,6 +18930,14 @@  static int convert_ctx_accesses(struct bpf_verifier_env *env)
 				env->prog->aux->num_exentries++;
 			}
 			continue;
+		case PTR_TO_ARENA:
+			if (BPF_MODE(insn->code) == BPF_MEMSX) {
+				verbose(env, "sign extending loads from arena are not supported yet\n");
+				return -EOPNOTSUPP;
+			}
+			insn->code = BPF_CLASS(insn->code) | BPF_PROBE_MEM32 | BPF_SIZE(insn->code);
+			env->prog->aux->num_exentries++;
+			continue;
 		default:
 			continue;
 		}
@@ -19041,13 +19123,19 @@  static int jit_subprogs(struct bpf_verifier_env *env)
 		func[i]->aux->nr_linfo = prog->aux->nr_linfo;
 		func[i]->aux->jited_linfo = prog->aux->jited_linfo;
 		func[i]->aux->linfo_idx = env->subprog_info[i].linfo_idx;
+		func[i]->aux->arena = prog->aux->arena;
 		num_exentries = 0;
 		insn = func[i]->insnsi;
 		for (j = 0; j < func[i]->len; j++, insn++) {
 			if (BPF_CLASS(insn->code) == BPF_LDX &&
 			    (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
+			     BPF_MODE(insn->code) == BPF_PROBE_MEM32 ||
 			     BPF_MODE(insn->code) == BPF_PROBE_MEMSX))
 				num_exentries++;
+			if ((BPF_CLASS(insn->code) == BPF_STX ||
+			     BPF_CLASS(insn->code) == BPF_ST) &&
+			     BPF_MODE(insn->code) == BPF_PROBE_MEM32)
+				num_exentries++;
 		}
 		func[i]->aux->num_exentries = num_exentries;
 		func[i]->aux->tail_call_reachable = env->subprog_info[i].tail_call_reachable;