diff mbox series

[bpf,v2,1/2] bpf: fix accesses to uninit stack slots

Message ID 20231126015045.1092826-2-andreimatei1@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series bpf: fix accesses to uninit stack slots | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/codegen success Generated files up to date
netdev/tree_selection success Clearly marked for bpf
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1178 this patch: 1178
netdev/cc_maintainers fail 1 blamed authors not CCed: ast@kernel.org; 16 maintainers not CCed: bjorn@rivosinc.com shuah@kernel.org andrii@kernel.org jolsa@kernel.org song@kernel.org ast@kernel.org martin.lau@linux.dev kpsingh@kernel.org john.fastabend@gmail.com sdf@google.com haoluo@google.com void@manifault.com yonghong.song@linux.dev mykolal@fb.com linux-kselftest@vger.kernel.org daniel@iogearbox.net
netdev/build_clang success Errors and warnings before: 1143 this patch: 1143
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1205 this patch: 1205
netdev/checkpatch warning CHECK: Lines should not end with a '(' WARNING: Missing a blank line after declarations WARNING: Please use correct Fixes: style 'Fixes: <12 chars of sha1> ("<title line>")' - ie: 'Fixes: 01f810ace9ed ("bpf: Allow variable-offset stack access")' WARNING: line length of 83 exceeds 80 columns WARNING: line length of 97 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-2 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-3 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-VM_Test-9 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-14 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-VM_Test-15 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-16 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-18 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-19 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-21 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-llvm-16 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-llvm-16 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-llvm-16 / build / build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-llvm-16 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-29 success Logs for x86_64-llvm-16 / veristat
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-llvm-16 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-11 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-10 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc

Commit Message

Andrei Matei Nov. 26, 2023, 1:50 a.m. UTC
Privileged programs are supposed to be able to read uninitialized stack
memory (ever since 6715df8d5) but, before this patch, these accesses
were permitted inconsistently. In particular, accesses were permitted
above state->allocated_stack, but not below it. In other words, if the
stack was already "large enough", the access was permitted, but
otherwise the access was rejected instead of being allowed to "grow the
stack". This undesired rejection was happening in two places:
- in check_stack_slot_within_bounds()
- in check_stack_range_initialized()
This patch arranges for these accesses to be permitted.

This patch also fixes the tracking of the stack size for variable-offset
reads. This second fix is bundled in the same commit as the first one
because they're inter-related. Before this patch, writes to the stack
using registers containing a variable offset (as opposed to registers
with fixed, known values) were not properly contributing to the
function's needed stack size. As a result, it was possible for a program
to verify, but then to attempt to read out-of-bounds data at runtime
because a too small stack had been allocated for it.

Each function tracks the size of the stack it needs in
bpf_subprog_info.stack_depth, which is maintained by
update_stack_depth(). For regular memory accesses, check_mem_access()
was calling update_state_depth() but it was passing in only the fixed
part of the offset register, ignoring the variable offset. This was
incorrect; the minimum possible value of that register should be used
instead.

This tracking is now fixed by centralizing the tracking of stack size in
grow_stack_state(), and by lifting the calls to grow_stack_state() to
check_stack_access_within_bounds() as suggested by Andrii. The code is
now simpler and more convincingly tracks the correct maximum stack size.
check_stack_range_initialized() can now rely on enough stack having been
allocated for the access; this helps with the fix for the first issue.

Reported-by: Hao Sun <sunhao.th@gmail.com>
Fixes: 01f810ace9ed3 ("bpf: Allow variable-offset stack access")
Closes: https://lore.kernel.org/bpf/CABWLsev9g8UP_c3a=1qbuZUi20tGoUXoU07FPf-5FLvhOKOY+Q@mail.gmail.com/
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
---
 include/linux/bpf_verifier.h                  |  4 ++
 kernel/bpf/verifier.c                         | 70 ++++++++-----------
 .../selftests/bpf/progs/test_global_func16.c  |  2 +-
 .../bpf/progs/verifier_basic_stack.c          |  6 +-
 .../selftests/bpf/progs/verifier_int_ptr.c    |  2 +-
 .../selftests/bpf/progs/verifier_raw_stack.c  |  2 +-
 .../selftests/bpf/progs/verifier_var_off.c    |  4 +-
 .../selftests/bpf/verifier/atomic_cmpxchg.c   | 11 ---
 tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
 9 files changed, 42 insertions(+), 61 deletions(-)

Comments

Eduard Zingerman Nov. 28, 2023, 1:33 a.m. UTC | #1
On Sat, 2023-11-25 at 20:50 -0500, Andrei Matei wrote:
> Privileged programs are supposed to be able to read uninitialized stack
> memory (ever since 6715df8d5) but, before this patch, these accesses
> were permitted inconsistently. In particular, accesses were permitted
> above state->allocated_stack, but not below it. In other words, if the
> stack was already "large enough", the access was permitted, but
> otherwise the access was rejected instead of being allowed to "grow the
> stack". This undesired rejection was happening in two places:
> - in check_stack_slot_within_bounds()
> - in check_stack_range_initialized()
> This patch arranges for these accesses to be permitted.
> 
> This patch also fixes the tracking of the stack size for variable-offset
> reads. This second fix is bundled in the same commit as the first one
> because they're inter-related. Before this patch, writes to the stack
> using registers containing a variable offset (as opposed to registers
> with fixed, known values) were not properly contributing to the
> function's needed stack size. As a result, it was possible for a program
> to verify, but then to attempt to read out-of-bounds data at runtime
> because a too small stack had been allocated for it.
> 
> Each function tracks the size of the stack it needs in
> bpf_subprog_info.stack_depth, which is maintained by
> update_stack_depth(). For regular memory accesses, check_mem_access()
> was calling update_state_depth() but it was passing in only the fixed
> part of the offset register, ignoring the variable offset. This was
> incorrect; the minimum possible value of that register should be used
> instead.
> 
> This tracking is now fixed by centralizing the tracking of stack size in
> grow_stack_state(), and by lifting the calls to grow_stack_state() to
> check_stack_access_within_bounds() as suggested by Andrii. The code is
> now simpler and more convincingly tracks the correct maximum stack size.
> check_stack_range_initialized() can now rely on enough stack having been
> allocated for the access; this helps with the fix for the first issue.
> 
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Fixes: 01f810ace9ed3 ("bpf: Allow variable-offset stack access")
> Closes: https://lore.kernel.org/bpf/CABWLsev9g8UP_c3a=1qbuZUi20tGoUXoU07FPf-5FLvhOKOY+Q@mail.gmail.com/
> Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
> ---

I think these changes make sense.
Question: would it be possible to recover some of the tests (those
converted from failure to success) by changing execution mode from
priv to unpriv?
  
Also, I think there are some tests that do oob stack read in branches
that should be proven unreachable, with expectation that if certain
verifier logic does not work as expected stack access would serve as a
canary. Have no idea how to identify these tests, though.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]
> @@ -1697,6 +1699,12 @@ static int grow_stack_state(struct bpf_func_state *state, int size)
>  		return -ENOMEM;
>  
>  	state->allocated_stack = size;
> +
> +	/* update known max for given subprogram */
> +	u16 stack = env->subprog_info[state->subprogno].stack_depth;

Nit: 'u16 stack;' should be at the top of the function.

> +	if (stack < size)
> +		env->subprog_info[state->subprogno].stack_depth = size;
> +
>  	return 0;
>  }

[...]
Eduard Zingerman Nov. 28, 2023, 1:43 a.m. UTC | #2
On Tue, 2023-11-28 at 03:33 +0200, Eduard Zingerman wrote:
> I think these changes make sense.

(Under assumption that access to uninitialized stack should be allowed,
 although I implemented the patch-set that made such behavior legal
 I still don't really like this idea).
Eduard Zingerman Nov. 28, 2023, 2:14 p.m. UTC | #3
On Tue, 2023-11-28 at 03:33 +0200, Eduard Zingerman wrote:
[...]
> Also, I think there are some tests that do oob stack read in branches
> that should be proven unreachable, with expectation that if certain
> verifier logic does not work as expected stack access would serve as a
> canary. Have no idea how to identify these tests, though.

I looked through all test cases I ever added (not so many) and it
seems that only one test case should be updated:

diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
index b2181f850d3e..3aca3dc145b5 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -846,7 +846,7 @@ __naked int delayed_precision_mark(void)
                "call %[bpf_iter_num_next];"
                "if r0 == 0 goto 2f;"
                "if r6 != 42 goto 3f;"
-               "r7 = -32;"
+               "r7 = -33;"
                "call %[bpf_get_prandom_u32];"
                "r6 = r0;"
                "goto 1b;\n"

Here oob access is replaced by unaligned, this does not affect error
message, but should be future proof in case if widening logic would
get smarter.
Andrii Nakryiko Nov. 29, 2023, 6:05 a.m. UTC | #4
On Sat, Nov 25, 2023 at 5:53 PM Andrei Matei <andreimatei1@gmail.com> wrote:
>
> Privileged programs are supposed to be able to read uninitialized stack
> memory (ever since 6715df8d5) but, before this patch, these accesses
> were permitted inconsistently. In particular, accesses were permitted
> above state->allocated_stack, but not below it. In other words, if the
> stack was already "large enough", the access was permitted, but
> otherwise the access was rejected instead of being allowed to "grow the
> stack". This undesired rejection was happening in two places:
> - in check_stack_slot_within_bounds()
> - in check_stack_range_initialized()
> This patch arranges for these accesses to be permitted.
>
> This patch also fixes the tracking of the stack size for variable-offset
> reads. This second fix is bundled in the same commit as the first one
> because they're inter-related. Before this patch, writes to the stack
> using registers containing a variable offset (as opposed to registers
> with fixed, known values) were not properly contributing to the
> function's needed stack size. As a result, it was possible for a program
> to verify, but then to attempt to read out-of-bounds data at runtime
> because a too small stack had been allocated for it.
>
> Each function tracks the size of the stack it needs in
> bpf_subprog_info.stack_depth, which is maintained by
> update_stack_depth(). For regular memory accesses, check_mem_access()
> was calling update_state_depth() but it was passing in only the fixed
> part of the offset register, ignoring the variable offset. This was
> incorrect; the minimum possible value of that register should be used
> instead.
>
> This tracking is now fixed by centralizing the tracking of stack size in
> grow_stack_state(), and by lifting the calls to grow_stack_state() to
> check_stack_access_within_bounds() as suggested by Andrii. The code is
> now simpler and more convincingly tracks the correct maximum stack size.
> check_stack_range_initialized() can now rely on enough stack having been
> allocated for the access; this helps with the fix for the first issue.
>
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Fixes: 01f810ace9ed3 ("bpf: Allow variable-offset stack access")
> Closes: https://lore.kernel.org/bpf/CABWLsev9g8UP_c3a=1qbuZUi20tGoUXoU07FPf-5FLvhOKOY+Q@mail.gmail.com/
> Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
> ---
>  include/linux/bpf_verifier.h                  |  4 ++
>  kernel/bpf/verifier.c                         | 70 ++++++++-----------
>  .../selftests/bpf/progs/test_global_func16.c  |  2 +-
>  .../bpf/progs/verifier_basic_stack.c          |  6 +-
>  .../selftests/bpf/progs/verifier_int_ptr.c    |  2 +-
>  .../selftests/bpf/progs/verifier_raw_stack.c  |  2 +-
>  .../selftests/bpf/progs/verifier_var_off.c    |  4 +-
>  .../selftests/bpf/verifier/atomic_cmpxchg.c   | 11 ---
>  tools/testing/selftests/bpf/verifier/calls.c  |  2 +-
>  9 files changed, 42 insertions(+), 61 deletions(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index aa4d19d0bc94..5fc389e8be35 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -630,6 +630,10 @@ struct bpf_verifier_env {
>         int exception_callback_subprog;
>         bool explore_alu_limits;
>         bool allow_ptr_leaks;
> +       /* Allow access to uninitialized stack memory. Writes with fixed offset are
> +        * always allowed, so this refers to reads (with fixed or variable offset),
> +        * to writes with variable offset and to indirect (helper) accesses.
> +        */
>         bool allow_uninit_stack;
>         bool bpf_capable;
>         bool bypass_spec_v1;
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index af2819d5c8ee..f9546dd73f3c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1685,10 +1685,12 @@ static int resize_reference_state(struct bpf_func_state *state, size_t n)
>         return 0;
>  }
>
> -static int grow_stack_state(struct bpf_func_state *state, int size)
> +/* Possibly update state->allocated_stack to be at least size bytes. Also
> + * possibly update the function's high-water mark in its bpf_subprog_info.
> + */
> +static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state *state, int size)
>  {
>         size_t old_n = state->allocated_stack / BPF_REG_SIZE, n = size / BPF_REG_SIZE;

shouldn't this be rounding up? (size + BPF_REG_SIZE - 1) / BPF_REG_SIZE?

> -

According to kernel code style, there should be an empty line between
variable declaration and other statements. Please keep this empty
line.

>         if (old_n >= n)
>                 return 0;
>

[...]

> @@ -6761,13 +6748,15 @@ static int check_ptr_to_map_access(struct bpf_verifier_env *env,
>   * The minimum valid offset is -MAX_BPF_STACK for writes, and
>   * -state->allocated_stack for reads.
>   */
> -static int check_stack_slot_within_bounds(int off,
> -                                         struct bpf_func_state *state,
> -                                         enum bpf_access_type t)
> +static int check_stack_slot_within_bounds(
> +               struct bpf_verifier_env *env,
> +               int off,
> +               struct bpf_func_state *state,
> +               enum bpf_access_type t)

nit: please keep code style, first argument is normally on the same
line as opening parenthesis

>  {
>         int min_valid_off;
>
> -       if (t == BPF_WRITE)
> +       if (t == BPF_WRITE || env->allow_uninit_stack)
>                 min_valid_off = -MAX_BPF_STACK;
>         else
>                 min_valid_off = -state->allocated_stack;

[...]

> diff --git a/tools/testing/selftests/bpf/progs/verifier_var_off.c b/tools/testing/selftests/bpf/progs/verifier_var_off.c
> index 83a90afba785..bbf3628c625a 100644
> --- a/tools/testing/selftests/bpf/progs/verifier_var_off.c
> +++ b/tools/testing/selftests/bpf/progs/verifier_var_off.c
> @@ -61,7 +61,7 @@ __naked void stack_read_priv_vs_unpriv(void)
>
>  SEC("lwt_in")
>  __description("variable-offset stack read, uninitialized")
> -__failure __msg("invalid variable-offset read from stack R2")
> +__success
>  __naked void variable_offset_stack_read_uninitialized(void)
>  {
>         asm volatile ("                                 \
> @@ -255,7 +255,7 @@ __naked void access_min_out_of_bound(void)
>
>  SEC("lwt_in")
>  __description("indirect variable-offset stack access, min_off < min_initialized")
> -__failure __msg("invalid indirect read from stack R2 var_off")
> +__success

as Eduard mentioned, can you please try updating program type to
something that is allowed in unprivileged mode, e.g., SEC("socket")
and make sure that it still fails in unpriv mode

>  __naked void access_min_off_min_initialized(void)
>  {
>         asm volatile ("                                 \

[...]
Andrei Matei Nov. 29, 2023, 4:48 p.m. UTC | #5
[...]

> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index af2819d5c8ee..f9546dd73f3c 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -1685,10 +1685,12 @@ static int resize_reference_state(struct bpf_func_state *state, size_t n)
> >         return 0;
> >  }
> >
> > -static int grow_stack_state(struct bpf_func_state *state, int size)
> > +/* Possibly update state->allocated_stack to be at least size bytes. Also
> > + * possibly update the function's high-water mark in its bpf_subprog_info.
> > + */
> > +static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state *state, int size)
> >  {
> >         size_t old_n = state->allocated_stack / BPF_REG_SIZE, n = size / BPF_REG_SIZE;
>
> shouldn't this be rounding up? (size + BPF_REG_SIZE - 1) / BPF_REG_SIZE?

You're saying this was always broken, regardless of the current patch, right? I
think you're right, but that seems like a bug that should have been
caught somehow; I'm surprised no programs crashed the verifier. Perhaps in
practice all stack accesses are 8-byte aligned, so the rounding doesn't matter?

I'll spend a bit of time reading code and come back.

[...]
Andrii Nakryiko Nov. 29, 2023, 11:54 p.m. UTC | #6
On Wed, Nov 29, 2023 at 8:48 AM Andrei Matei <andreimatei1@gmail.com> wrote:
>
> [...]
>
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index af2819d5c8ee..f9546dd73f3c 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -1685,10 +1685,12 @@ static int resize_reference_state(struct bpf_func_state *state, size_t n)
> > >         return 0;
> > >  }
> > >
> > > -static int grow_stack_state(struct bpf_func_state *state, int size)
> > > +/* Possibly update state->allocated_stack to be at least size bytes. Also
> > > + * possibly update the function's high-water mark in its bpf_subprog_info.
> > > + */
> > > +static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state *state, int size)
> > >  {
> > >         size_t old_n = state->allocated_stack / BPF_REG_SIZE, n = size / BPF_REG_SIZE;
> >
> > shouldn't this be rounding up? (size + BPF_REG_SIZE - 1) / BPF_REG_SIZE?
>
> You're saying this was always broken, regardless of the current patch, right? I

I think so, yes...

> think you're right, but that seems like a bug that should have been
> caught somehow; I'm surprised no programs crashed the verifier. Perhaps in
> practice all stack accesses are 8-byte aligned, so the rounding doesn't matter?
>
> I'll spend a bit of time reading code and come back.

Thanks!

>
> [...]
Andrei Matei Dec. 2, 2023, 10:41 p.m. UTC | #7
On Wed, Nov 29, 2023 at 6:55 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Nov 29, 2023 at 8:48 AM Andrei Matei <andreimatei1@gmail.com> wrote:
> >
> > [...]
> >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index af2819d5c8ee..f9546dd73f3c 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -1685,10 +1685,12 @@ static int resize_reference_state(struct bpf_func_state *state, size_t n)
> > > >         return 0;
> > > >  }
> > > >
> > > > -static int grow_stack_state(struct bpf_func_state *state, int size)
> > > > +/* Possibly update state->allocated_stack to be at least size bytes. Also
> > > > + * possibly update the function's high-water mark in its bpf_subprog_info.
> > > > + */
> > > > +static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state *state, int size)
> > > >  {
> > > >         size_t old_n = state->allocated_stack / BPF_REG_SIZE, n = size / BPF_REG_SIZE;
> > >
> > > shouldn't this be rounding up? (size + BPF_REG_SIZE - 1) / BPF_REG_SIZE?
> >
> > You're saying this was always broken, regardless of the current patch, right? I
>
> I think so, yes...

I believe that this code was OK after all because all the callers to
grow_stack_state() do the rounding. This seems fragile though; I'll include a
patch to push the rounding inside grow_stack_state() if it's OK to you.

While reading the code around how the stack is accessed, something else caught
my eye. I'm not sure if there's a problem or not; perhaps you could take a
look. Around here in check_stack_write_fixed_off() [1], the code that deals
with register spills has a couple of cases, including:

[1] https://github.com/torvalds/linux/blob/815fb87b753055df2d9e50f6cd80eb10235fe3e9/kernel/bpf/verifier.c#L4733-L4744
} else if (reg && is_spillable_regtype(reg->type)) {
  /* register containing pointer is being spilled into stack */
  if (size != BPF_REG_SIZE) {
    verbose_linfo(env, insn_idx, "; ");
    verbose(env, "invalid size of register spill\n");
    return -EACCES;
  }
  if (state != cur && reg->type == PTR_TO_STACK) {
    verbose(env, "cannot spill pointers to stack into stack frame of
the caller\n");
    return -EINVAL;
  }
  save_register_state(state, spi, reg, size);
}


This branch, as opposed to all the others, calls save_register_state() without
checking that `!(off % BPF_REG_SIZE)`. I believe save_register_spill()
implicitly assumes that the access is aligned, so it'd be bad if `off` was not
aligned. Is it possible for the offset to not be aligned? I'm not sure... The
higher-level check_mem_access() starts with a check_ptr_alignment(), which I
think does the required check at least on the code paths that I've tried. But
then why do all the other branches in check_stack_write_fixed_off() check for
the alignment explicitly? FWIW, the branch in question also has a
is_spillable_regtype() check which is perhaps relevant.

Would an assertion that off is indeed aligned in the branch I'm talking about
(or elsewhere around) be welcomed?

Apologies if I'm paranoid for no reason.


>
> > think you're right, but that seems like a bug that should have been
> > caught somehow; I'm surprised no programs crashed the verifier. Perhaps in
> > practice all stack accesses are 8-byte aligned, so the rounding doesn't matter?
> >
> > I'll spend a bit of time reading code and come back.
>
> Thanks!
>
> >
> > [...]
diff mbox series

Patch

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index aa4d19d0bc94..5fc389e8be35 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -630,6 +630,10 @@  struct bpf_verifier_env {
 	int exception_callback_subprog;
 	bool explore_alu_limits;
 	bool allow_ptr_leaks;
+	/* Allow access to uninitialized stack memory. Writes with fixed offset are
+	 * always allowed, so this refers to reads (with fixed or variable offset),
+	 * to writes with variable offset and to indirect (helper) accesses.
+	 */
 	bool allow_uninit_stack;
 	bool bpf_capable;
 	bool bypass_spec_v1;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index af2819d5c8ee..f9546dd73f3c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1685,10 +1685,12 @@  static int resize_reference_state(struct bpf_func_state *state, size_t n)
 	return 0;
 }
 
-static int grow_stack_state(struct bpf_func_state *state, int size)
+/* Possibly update state->allocated_stack to be at least size bytes. Also
+ * possibly update the function's high-water mark in its bpf_subprog_info.
+ */
+static int grow_stack_state(struct bpf_verifier_env *env, struct bpf_func_state *state, int size)
 {
 	size_t old_n = state->allocated_stack / BPF_REG_SIZE, n = size / BPF_REG_SIZE;
-
 	if (old_n >= n)
 		return 0;
 
@@ -1697,6 +1699,12 @@  static int grow_stack_state(struct bpf_func_state *state, int size)
 		return -ENOMEM;
 
 	state->allocated_stack = size;
+
+	/* update known max for given subprogram */
+	u16 stack = env->subprog_info[state->subprogno].stack_depth;
+	if (stack < size)
+		env->subprog_info[state->subprogno].stack_depth = size;
+
 	return 0;
 }
 
@@ -4669,9 +4677,6 @@  static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 	struct bpf_reg_state *reg = NULL;
 	u32 dst_reg = insn->dst_reg;
 
-	err = grow_stack_state(state, round_up(slot + 1, BPF_REG_SIZE));
-	if (err)
-		return err;
 	/* caller checked that off % size == 0 and -MAX_BPF_STACK <= off < 0,
 	 * so it's aligned access and [off, off + size) are within stack limits
 	 */
@@ -4827,10 +4832,6 @@  static int check_stack_write_var_off(struct bpf_verifier_env *env,
 	    (!value_reg && is_bpf_st_mem(insn) && insn->imm == 0))
 		writing_zero = true;
 
-	err = grow_stack_state(state, round_up(-min_off, BPF_REG_SIZE));
-	if (err)
-		return err;
-
 	for (i = min_off; i < max_off; i++) {
 		int spi;
 
@@ -5959,20 +5960,6 @@  static int check_ptr_alignment(struct bpf_verifier_env *env,
 					   strict);
 }
 
-static int update_stack_depth(struct bpf_verifier_env *env,
-			      const struct bpf_func_state *func,
-			      int off)
-{
-	u16 stack = env->subprog_info[func->subprogno].stack_depth;
-
-	if (stack >= -off)
-		return 0;
-
-	/* update known max for given subprogram */
-	env->subprog_info[func->subprogno].stack_depth = -off;
-	return 0;
-}
-
 /* starting from main bpf function walk all instructions of the function
  * and recursively walk all callees that given function can call.
  * Ignore jump and exit insns.
@@ -6761,13 +6748,15 @@  static int check_ptr_to_map_access(struct bpf_verifier_env *env,
  * The minimum valid offset is -MAX_BPF_STACK for writes, and
  * -state->allocated_stack for reads.
  */
-static int check_stack_slot_within_bounds(int off,
-					  struct bpf_func_state *state,
-					  enum bpf_access_type t)
+static int check_stack_slot_within_bounds(
+		struct bpf_verifier_env *env,
+		int off,
+		struct bpf_func_state *state,
+		enum bpf_access_type t)
 {
 	int min_valid_off;
 
-	if (t == BPF_WRITE)
+	if (t == BPF_WRITE || env->allow_uninit_stack)
 		min_valid_off = -MAX_BPF_STACK;
 	else
 		min_valid_off = -state->allocated_stack;
@@ -6822,9 +6811,9 @@  static int check_stack_access_within_bounds(
 			max_off = min_off;
 	}
 
-	err = check_stack_slot_within_bounds(min_off, state, type);
+	err = check_stack_slot_within_bounds(env, min_off, state, type);
 	if (!err)
-		err = check_stack_slot_within_bounds(max_off, state, type);
+		err = check_stack_slot_within_bounds(env, max_off, state, type);
 
 	if (err) {
 		if (tnum_is_const(reg->var_off)) {
@@ -6837,8 +6826,10 @@  static int check_stack_access_within_bounds(
 			verbose(env, "invalid variable-offset%s stack R%d var_off=%s size=%d\n",
 				err_extra, regno, tn_buf, access_size);
 		}
+		return err;
 	}
-	return err;
+
+	return grow_stack_state(env, state, round_up(-min_off, BPF_REG_SIZE));
 }
 
 /* check whether memory at (regno + off) is accessible for t = (read | write)
@@ -6853,7 +6844,6 @@  static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 {
 	struct bpf_reg_state *regs = cur_regs(env);
 	struct bpf_reg_state *reg = regs + regno;
-	struct bpf_func_state *state;
 	int size, err = 0;
 
 	size = bpf_size_to_bytes(bpf_size);
@@ -6996,11 +6986,6 @@  static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (err)
 			return err;
 
-		state = func(env, reg);
-		err = update_stack_depth(env, state, off);
-		if (err)
-			return err;
-
 		if (t == BPF_READ)
 			err = check_stack_read(env, regno, off, size,
 					       value_regno);
@@ -7195,7 +7180,8 @@  static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 
 /* When register 'regno' is used to read the stack (either directly or through
  * a helper function) make sure that it's within stack boundary and, depending
- * on the access type, that all elements of the stack are initialized.
+ * on the access type and privileges, that all elements of the stack are
+ * initialized.
  *
  * 'off' includes 'regno->off', but not its dynamic part (if any).
  *
@@ -7303,8 +7289,11 @@  static int check_stack_range_initialized(
 
 		slot = -i - 1;
 		spi = slot / BPF_REG_SIZE;
-		if (state->allocated_stack <= slot)
-			goto err;
+		if (state->allocated_stack <= slot) {
+			verbose(env, "verifier bug: allocated_stack too small");
+			return -EFAULT;
+		}
+
 		stype = &state->stack[spi].slot_type[slot % BPF_REG_SIZE];
 		if (*stype == STACK_MISC)
 			goto mark;
@@ -7328,7 +7317,6 @@  static int check_stack_range_initialized(
 			goto mark;
 		}
 
-err:
 		if (tnum_is_const(reg->var_off)) {
 			verbose(env, "invalid%s read from stack R%d off %d+%d size %d\n",
 				err_extra, regno, min_off, i - min_off, access_size);
@@ -7353,7 +7341,7 @@  static int check_stack_range_initialized(
 		 * helper may write to the entire memory range.
 		 */
 	}
-	return update_stack_depth(env, state, min_off);
+	return 0;
 }
 
 static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
diff --git a/tools/testing/selftests/bpf/progs/test_global_func16.c b/tools/testing/selftests/bpf/progs/test_global_func16.c
index e7206304632e..e3e64bc472cd 100644
--- a/tools/testing/selftests/bpf/progs/test_global_func16.c
+++ b/tools/testing/selftests/bpf/progs/test_global_func16.c
@@ -13,7 +13,7 @@  __noinline int foo(int (*arr)[10])
 }
 
 SEC("cgroup_skb/ingress")
-__failure __msg("invalid indirect read from stack")
+__success
 int global_func16(struct __sk_buff *skb)
 {
 	int array[10];
diff --git a/tools/testing/selftests/bpf/progs/verifier_basic_stack.c b/tools/testing/selftests/bpf/progs/verifier_basic_stack.c
index 359df865a8f3..069c3f91705c 100644
--- a/tools/testing/selftests/bpf/progs/verifier_basic_stack.c
+++ b/tools/testing/selftests/bpf/progs/verifier_basic_stack.c
@@ -27,8 +27,8 @@  __naked void stack_out_of_bounds(void)
 
 SEC("socket")
 __description("uninitialized stack1")
-__failure __msg("invalid indirect read from stack")
-__failure_unpriv
+__success
+__failure_unpriv __msg_unpriv("invalid indirect read from stack")
 __naked void uninitialized_stack1(void)
 {
 	asm volatile ("					\
@@ -45,7 +45,7 @@  __naked void uninitialized_stack1(void)
 
 SEC("socket")
 __description("uninitialized stack2")
-__failure __msg("invalid read from stack")
+__success
 __failure_unpriv
 __naked void uninitialized_stack2(void)
 {
diff --git a/tools/testing/selftests/bpf/progs/verifier_int_ptr.c b/tools/testing/selftests/bpf/progs/verifier_int_ptr.c
index b054f9c48143..b5cedc0d23c1 100644
--- a/tools/testing/selftests/bpf/progs/verifier_int_ptr.c
+++ b/tools/testing/selftests/bpf/progs/verifier_int_ptr.c
@@ -7,7 +7,7 @@ 
 
 SEC("cgroup/sysctl")
 __description("ARG_PTR_TO_LONG uninitialized")
-__failure __msg("invalid indirect read from stack R4 off -16+0 size 8")
+__success
 __naked void arg_ptr_to_long_uninitialized(void)
 {
 	asm volatile ("					\
diff --git a/tools/testing/selftests/bpf/progs/verifier_raw_stack.c b/tools/testing/selftests/bpf/progs/verifier_raw_stack.c
index efbfc3a4ad6a..5468c5302495 100644
--- a/tools/testing/selftests/bpf/progs/verifier_raw_stack.c
+++ b/tools/testing/selftests/bpf/progs/verifier_raw_stack.c
@@ -7,7 +7,7 @@ 
 
 SEC("tc")
 __description("raw_stack: no skb_load_bytes")
-__failure __msg("invalid read from stack R6 off=-8 size=8")
+__success
 __naked void stack_no_skb_load_bytes(void)
 {
 	asm volatile ("					\
diff --git a/tools/testing/selftests/bpf/progs/verifier_var_off.c b/tools/testing/selftests/bpf/progs/verifier_var_off.c
index 83a90afba785..bbf3628c625a 100644
--- a/tools/testing/selftests/bpf/progs/verifier_var_off.c
+++ b/tools/testing/selftests/bpf/progs/verifier_var_off.c
@@ -61,7 +61,7 @@  __naked void stack_read_priv_vs_unpriv(void)
 
 SEC("lwt_in")
 __description("variable-offset stack read, uninitialized")
-__failure __msg("invalid variable-offset read from stack R2")
+__success
 __naked void variable_offset_stack_read_uninitialized(void)
 {
 	asm volatile ("					\
@@ -255,7 +255,7 @@  __naked void access_min_out_of_bound(void)
 
 SEC("lwt_in")
 __description("indirect variable-offset stack access, min_off < min_initialized")
-__failure __msg("invalid indirect read from stack R2 var_off")
+__success
 __naked void access_min_off_min_initialized(void)
 {
 	asm volatile ("					\
diff --git a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
index 319337bdcfc8..9a7b1106fda8 100644
--- a/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
+++ b/tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c
@@ -83,17 +83,6 @@ 
 	.result = REJECT,
 	.errstr = "!read_ok",
 },
-{
-	"Can't use cmpxchg on uninit memory",
-	.insns = {
-		BPF_MOV64_IMM(BPF_REG_0, 3),
-		BPF_MOV64_IMM(BPF_REG_2, 4),
-		BPF_ATOMIC_OP(BPF_DW, BPF_CMPXCHG, BPF_REG_10, BPF_REG_2, -8),
-		BPF_EXIT_INSN(),
-	},
-	.result = REJECT,
-	.errstr = "invalid read from stack",
-},
 {
 	"BPF_W cmpxchg should zero top 32 bits",
 	.insns = {
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index 3d5cd51071f0..89b79997c1b4 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -1505,7 +1505,7 @@ 
 	.prog_type = BPF_PROG_TYPE_XDP,
 	.fixup_map_hash_8b = { 23 },
 	.result = REJECT,
-	.errstr = "invalid read from stack R7 off=-16 size=8",
+	.errstr = "R0 invalid mem access 'scalar'",
 },
 {
 	"calls: two calls that receive map_value via arg=ptr_stack_of_caller. test1",