diff mbox series

[bpf-next,v1,07/13] bpf: Fix partial dynptr stack slot reads/writes

Message ID 20221018135920.726360-8-memxor@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Fixes for dynptr | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 10 this patch: 10
netdev/cc_maintainers warning 8 maintainers not CCed: sdf@google.com john.fastabend@gmail.com yhs@fb.com haoluo@google.com jolsa@kernel.org kpsingh@kernel.org song@kernel.org martin.lau@linux.dev
netdev/build_clang success Errors and warnings before: 5 this patch: 5
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 10 this patch: 10
netdev/checkpatch warning WARNING: line length of 100 exceeds 80 columns WARNING: line length of 95 exceeds 80 columns WARNING: void function return statements are not generally useful
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-1 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 fail Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10 fail Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-9 fail Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 fail Logs for test_progs_no_alu32 on s390x with gcc

Commit Message

Kumar Kartikeya Dwivedi Oct. 18, 2022, 1:59 p.m. UTC
Currently, while reads are disallowed for dynptr stack slots, writes are
not. Reads don't work from both direct access and helpers, while writes
do work in both cases, but have the effect of overwriting the slot_type.

While this is fine, handling for a few edge cases is missing. Firstly,
a user can overwrite the stack slots of dynptr partially.

Consider the following layout:
spi: [d][d][?]
      2  1  0

First slot is at spi 2, second at spi 1.
Now, do a write of 1 to 8 bytes for spi 1.

This will essentially either write STACK_MISC for all slot_types or
STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write
of zeroes). The end result is that slot is scrubbed.

Now, the layout is:
spi: [d][m][?]
      2  1  0

Suppose if user initializes spi = 1 as dynptr.
We get:
spi: [d][d][d]
      2  1  0

But this time, both spi 2 and spi 1 have first_slot = true.

Now, when passing spi 2 to dynptr helper, it will consider it as
initialized as it does not check whether second slot has first_slot ==
false. And spi 1 should already work as normal.

This effectively replaced size + offset of first dynptr, hence allowing
invalid OOB reads and writes.

Make a few changes to protect against this:
When writing to PTR_TO_STACK using BPF insns, when we touch spi of a
STACK_DYNPTR type, mark both first and second slot (regardless of which
slot we touch) as STACK_INVALID. Reads are already prevented.

Second, prevent writing	to stack memory from helpers if the range may
contain any STACK_DYNPTR slots. Reads are already prevented.

For helpers, we cannot allow it to destroy dynptrs from the writes as
depending on arguments, helper may take uninit_mem and dynptr both at
the same time. This would mean that helper may write to uninit_mem
before it reads the dynptr, which would be bad.

PTR_TO_MEM: [?????dd]

Depending on the code inside the helper, it may end up overwriting the
dynptr contents first and then read those as the dynptr argument.

Verifier would only simulate destruction when it does byte by byte
access simulation in check_helper_call for meta.access_size, and
fail to catch this case, as it happens after argument checks.

The same would need to be done for any other non-trivial objects created
on the stack in the future, such as bpf_list_head on stack, or
bpf_rb_root on stack.

A common misunderstanding in the current code is that MEM_UNINIT means
writes, but note that writes may also be performed even without
MEM_UNINIT in case of helpers, in that case the code after handling meta
&& meta->raw_mode will complain when it sees STACK_DYNPTR. So that
invalid read case also covers writes to potential STACK_DYNPTR slots.
The only loophole was in case of meta->raw_mode which simulated writes
through instructions which could overwrite them.

A future series sequenced after this will focus on the clean up of
helper access checks and bugs around that.

Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 76 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

Comments

Joanne Koong Oct. 21, 2022, 10:50 p.m. UTC | #1
On Tue, Oct 18, 2022 at 6:59 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> Currently, while reads are disallowed for dynptr stack slots, writes are
> not. Reads don't work from both direct access and helpers, while writes
> do work in both cases, but have the effect of overwriting the slot_type.
>
> While this is fine, handling for a few edge cases is missing. Firstly,
> a user can overwrite the stack slots of dynptr partially.
>
> Consider the following layout:
> spi: [d][d][?]
>       2  1  0
>
> First slot is at spi 2, second at spi 1.
> Now, do a write of 1 to 8 bytes for spi 1.
>
> This will essentially either write STACK_MISC for all slot_types or
> STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write
> of zeroes). The end result is that slot is scrubbed.
>
> Now, the layout is:
> spi: [d][m][?]
>       2  1  0
>
> Suppose if user initializes spi = 1 as dynptr.
> We get:
> spi: [d][d][d]
>       2  1  0
>
> But this time, both spi 2 and spi 1 have first_slot = true.
>
> Now, when passing spi 2 to dynptr helper, it will consider it as
> initialized as it does not check whether second slot has first_slot ==
> false. And spi 1 should already work as normal.
>
> This effectively replaced size + offset of first dynptr, hence allowing
> invalid OOB reads and writes.
>
> Make a few changes to protect against this:
> When writing to PTR_TO_STACK using BPF insns, when we touch spi of a
> STACK_DYNPTR type, mark both first and second slot (regardless of which
> slot we touch) as STACK_INVALID. Reads are already prevented.
>
> Second, prevent writing to stack memory from helpers if the range may
> contain any STACK_DYNPTR slots. Reads are already prevented.
>
> For helpers, we cannot allow it to destroy dynptrs from the writes as
> depending on arguments, helper may take uninit_mem and dynptr both at
> the same time. This would mean that helper may write to uninit_mem
> before it reads the dynptr, which would be bad.
>
> PTR_TO_MEM: [?????dd]
>
> Depending on the code inside the helper, it may end up overwriting the
> dynptr contents first and then read those as the dynptr argument.
>
> Verifier would only simulate destruction when it does byte by byte
> access simulation in check_helper_call for meta.access_size, and
> fail to catch this case, as it happens after argument checks.
>
> The same would need to be done for any other non-trivial objects created
> on the stack in the future, such as bpf_list_head on stack, or
> bpf_rb_root on stack.
>
> A common misunderstanding in the current code is that MEM_UNINIT means
> writes, but note that writes may also be performed even without
> MEM_UNINIT in case of helpers, in that case the code after handling meta
> && meta->raw_mode will complain when it sees STACK_DYNPTR. So that
> invalid read case also covers writes to potential STACK_DYNPTR slots.
> The only loophole was in case of meta->raw_mode which simulated writes
> through instructions which could overwrite them.
>
> A future series sequenced after this will focus on the clean up of
> helper access checks and bugs around that.

thanks for your work on this (and on the rest of the stack, which I'm
still working on reviewing)

Regarding writes leading to partial dynptr stack slots, I'm regretting
not having the verifier flat-out reject this in the first place
(instead of it being allowed but internally the stack slot gets marked
as invalid) - I think it overall ends up being more confusing to end
users, where there it's not obvious at all that writing to the dynptr
on the stack automatically invalidates it. I'm not sure whether it's
too late from a public API behavior perspective to change this or not.
ANyways, assuming it is too late, I left a few comments below.

>
> Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs")
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
>  kernel/bpf/verifier.c | 76 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0fd73f96c5e2..89ae384ea6a7 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -740,6 +740,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg1,
>         __mark_dynptr_regs(reg1, NULL, type);
>  }
>
> +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
> +                                      struct bpf_func_state *state, int spi);
>
>  static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
>                                    enum bpf_arg_type arg_type, int insn_idx)
> @@ -755,6 +757,9 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
>         if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
>                 return -EINVAL;
>
> +       destroy_stack_slots_dynptr(env, state, spi);
> +       destroy_stack_slots_dynptr(env, state, spi - 1);

I don't think we need these two lines. mark_stack_slots_dynptr() is
called only in the case where an uninitialized dynptr is getting
initialized; is_dynptr_reg_valid_uninit() will have already been
called prior to this (in check_func_arg()), where
is_dynptr_reg_valid_uninit() will have checked that for any
uninitialized dynptr, the stack slot has not already been marked as
STACK_DYNTPR. Maybe I'm missing something in this analysis? What are
your thoughts?

> +
>         for (i = 0; i < BPF_REG_SIZE; i++) {
>                 state->stack[spi].slot_type[i] = STACK_DYNPTR;
>                 state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
> @@ -829,6 +834,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
>         return 0;
>  }
>
> +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
> +                                      struct bpf_func_state *state, int spi)
> +{
> +       int i;
> +
> +       /* We always ensure that STACK_DYNPTR is never set partially,
> +        * hence just checking for slot_type[0] is enough. This is
> +        * different for STACK_SPILL, where it may be only set for
> +        * 1 byte, so code has to use is_spilled_reg.
> +        */
> +       if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
> +               return;
> +       /* Reposition spi to first slot */
> +       if (!state->stack[spi].spilled_ptr.dynptr.first_slot)
> +               spi = spi + 1;
> +
> +       mark_stack_slot_scratched(env, spi);
> +       mark_stack_slot_scratched(env, spi - 1);
> +
> +       /* Writing partially to one dynptr stack slot destroys both. */
> +       for (i = 0; i < BPF_REG_SIZE; i++) {
> +               state->stack[spi].slot_type[i] = STACK_INVALID;
> +               state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> +       }
> +
> +       /* Do not release reference state, we are destroying dynptr on stack,
> +        * not using some helper to release it. Just reset register.
> +        */
> +       __mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
> +       __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
> +
> +       /* Same reason as unmark_stack_slots_dynptr above */
> +       state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
> +       state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
> +
> +       return;
> +}

I think it'd be cleaner if we combined this and
unmark_stack_slots_dynptr() into one function. The logic is pretty
much the same except for if the reference state should be released.

> +
>  static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
>  {
>         struct bpf_func_state *state = func(env, reg);
> @@ -3183,6 +3226,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
>                         env->insn_aux_data[insn_idx].sanitize_stack_spill = true;
>         }
>
> +       destroy_stack_slots_dynptr(env, state, spi);

If the stack slot is a dynptr, I think we can just return after this
call, else we do extra work and mark the stack slots as STACK_MISC
(3rd case in the if statement).

> +
>         mark_stack_slot_scratched(env, spi);
>         if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
>             !register_is_null(reg) && env->bpf_capable) {
> @@ -3296,6 +3341,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
>         if (err)
>                 return err;
>
> +       for (i = min_off; i < max_off; i++) {
> +               int slot, spi;
> +
> +               slot = -i - 1;
> +               spi = slot / BPF_REG_SIZE;
> +               destroy_stack_slots_dynptr(env, state, spi);
> +       }
>

Instead of calling destroy_stack_slots_dynptr() in
check_stack_write_fixed_off() and check_stack_write_var_off(), I think
calling it from check_stack_write() would be a better place. I think
that'd be more efficient as well where if it is a write to a dynptr,
we can directly return after invalidating the stack slot.

>         /* Variable offset writes destroy any spilled pointers in range. */
>         for (i = min_off; i < max_off; i++) {
> @@ -5257,6 +5309,30 @@ static int check_stack_range_initialized(
>         }
>
>         if (meta && meta->raw_mode) {
> +               /* Ensure we won't be overwriting dynptrs when simulating byte
> +                * by byte access in check_helper_call using meta.access_size.
> +                * This would be a problem if we have a helper in the future
> +                * which takes:
> +                *
> +                *      helper(uninit_mem, len, dynptr)
> +                *
> +                * Now, uninint_mem may overlap with dynptr pointer. Hence, it
> +                * may end up writing to dynptr itself when touching memory from
> +                * arg 1. This can be relaxed on a case by case basis for known
> +                * safe cases, but reject due to the possibilitiy of aliasing by
> +                * default.
> +                */
> +               for (i = min_off; i < max_off + access_size; i++) {
> +                       slot = -i - 1;
> +                       spi = slot / BPF_REG_SIZE;

I think we can just use get_spi(i) here

> +                       /* raw_mode may write past allocated_stack */
> +                       if (state->allocated_stack <= slot)
> +                               continue;

break?

> +                       if (state->stack[spi].slot_type[slot % BPF_REG_SIZE] == STACK_DYNPTR) {
> +                               verbose(env, "potential write to dynptr at off=%d disallowed\n", i);
> +                               return -EACCES;
> +                       }
> +               }
>                 meta->access_size = access_size;
>                 meta->regno = regno;
>                 return 0;
> --
> 2.38.0
>
Joanne Koong Oct. 21, 2022, 10:57 p.m. UTC | #2
[...]
>
> > +                       /* raw_mode may write past allocated_stack */
> > +                       if (state->allocated_stack <= slot)
> > +                               continue;
>
> break?

nvm, i think this should stay "continue".

>
> > +                       if (state->stack[spi].slot_type[slot % BPF_REG_SIZE] == STACK_DYNPTR) {
> > +                               verbose(env, "potential write to dynptr at off=%d disallowed\n", i);
> > +                               return -EACCES;
> > +                       }
> > +               }
> >                 meta->access_size = access_size;
> >                 meta->regno = regno;
> >                 return 0;
> > --
> > 2.38.0
> >
Kumar Kartikeya Dwivedi Oct. 22, 2022, 4:08 a.m. UTC | #3
On Sat, Oct 22, 2022 at 04:20:28AM IST, Joanne Koong wrote:
> On Tue, Oct 18, 2022 at 6:59 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > Currently, while reads are disallowed for dynptr stack slots, writes are
> > not. Reads don't work from both direct access and helpers, while writes
> > do work in both cases, but have the effect of overwriting the slot_type.
> >
> > While this is fine, handling for a few edge cases is missing. Firstly,
> > a user can overwrite the stack slots of dynptr partially.
> >
> > Consider the following layout:
> > spi: [d][d][?]
> >       2  1  0
> >
> > First slot is at spi 2, second at spi 1.
> > Now, do a write of 1 to 8 bytes for spi 1.
> >
> > This will essentially either write STACK_MISC for all slot_types or
> > STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write
> > of zeroes). The end result is that slot is scrubbed.
> >
> > Now, the layout is:
> > spi: [d][m][?]
> >       2  1  0
> >
> > Suppose if user initializes spi = 1 as dynptr.
> > We get:
> > spi: [d][d][d]
> >       2  1  0
> >
> > But this time, both spi 2 and spi 1 have first_slot = true.
> >
> > Now, when passing spi 2 to dynptr helper, it will consider it as
> > initialized as it does not check whether second slot has first_slot ==
> > false. And spi 1 should already work as normal.
> >
> > This effectively replaced size + offset of first dynptr, hence allowing
> > invalid OOB reads and writes.
> >
> > Make a few changes to protect against this:
> > When writing to PTR_TO_STACK using BPF insns, when we touch spi of a
> > STACK_DYNPTR type, mark both first and second slot (regardless of which
> > slot we touch) as STACK_INVALID. Reads are already prevented.
> >
> > Second, prevent writing to stack memory from helpers if the range may
> > contain any STACK_DYNPTR slots. Reads are already prevented.
> >
> > For helpers, we cannot allow it to destroy dynptrs from the writes as
> > depending on arguments, helper may take uninit_mem and dynptr both at
> > the same time. This would mean that helper may write to uninit_mem
> > before it reads the dynptr, which would be bad.
> >
> > PTR_TO_MEM: [?????dd]
> >
> > Depending on the code inside the helper, it may end up overwriting the
> > dynptr contents first and then read those as the dynptr argument.
> >
> > Verifier would only simulate destruction when it does byte by byte
> > access simulation in check_helper_call for meta.access_size, and
> > fail to catch this case, as it happens after argument checks.
> >
> > The same would need to be done for any other non-trivial objects created
> > on the stack in the future, such as bpf_list_head on stack, or
> > bpf_rb_root on stack.
> >
> > A common misunderstanding in the current code is that MEM_UNINIT means
> > writes, but note that writes may also be performed even without
> > MEM_UNINIT in case of helpers, in that case the code after handling meta
> > && meta->raw_mode will complain when it sees STACK_DYNPTR. So that
> > invalid read case also covers writes to potential STACK_DYNPTR slots.
> > The only loophole was in case of meta->raw_mode which simulated writes
> > through instructions which could overwrite them.
> >
> > A future series sequenced after this will focus on the clean up of
> > helper access checks and bugs around that.
>
> thanks for your work on this (and on the rest of the stack, which I'm
> still working on reviewing)
>
> Regarding writes leading to partial dynptr stack slots, I'm regretting
> not having the verifier flat-out reject this in the first place
> (instead of it being allowed but internally the stack slot gets marked
> as invalid) - I think it overall ends up being more confusing to end
> users, where there it's not obvious at all that writing to the dynptr
> on the stack automatically invalidates it. I'm not sure whether it's
> too late from a public API behavior perspective to change this or not.

It would be incorrect to reject writes into dynptrs whose reference is not
tracked by the verifier (so bpf_dynptr_from_mem), because the compiler would be
free to reuse the stack space for some other variable when the local dynptr
variable's lifetime ends, and the verifier would have no way to know when the
variable went out of scope.

I feel it is also incorrect to refuse bpf_dynptr_from_mem where unref dynptr
already exists as well. Right now it sees STACK_DYNPTR in the slot_type and
fails. But consider something like this:

void prog(void)
{
	{
		struct bpf_dynptr ptr;
		bpf_dynptr_from_mem(...);
		...
	}

	...

	{
		struct bpf_dynptr ptr;
		bpf_dynptr_from_mem(...);
	}
}

The program is valid, but if ptr in both scopes share the same stack slots, the
call in the second scope would fail because verifier would see STACK_DYNPTR in
slot_type.

It is fine though to simply reject writes in case of dynptrs obtained from
bpf_ringbuf_reserve_dynptr, because if they are overwritten before being
released, it will end up being an error later due to unreleased reference state.
The lifetime of the object in this case is being controlled using BPF helpers
explicitly.

So I think it is ok to do in the second case, and it is unaffected by backward
compatibility constraints. It wouldn't have been possible for the unref case
even when you started out with this.

> ANyways, assuming it is too late, I left a few comments below.
>
> >
> > Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs")
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > ---
> >  kernel/bpf/verifier.c | 76 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 76 insertions(+)
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 0fd73f96c5e2..89ae384ea6a7 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -740,6 +740,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg1,
> >         __mark_dynptr_regs(reg1, NULL, type);
> >  }
> >
> > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
> > +                                      struct bpf_func_state *state, int spi);
> >
> >  static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> >                                    enum bpf_arg_type arg_type, int insn_idx)
> > @@ -755,6 +757,9 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
> >         if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> >                 return -EINVAL;
> >
> > +       destroy_stack_slots_dynptr(env, state, spi);
> > +       destroy_stack_slots_dynptr(env, state, spi - 1);
>
> I don't think we need these two lines. mark_stack_slots_dynptr() is
> called only in the case where an uninitialized dynptr is getting
> initialized; is_dynptr_reg_valid_uninit() will have already been
> called prior to this (in check_func_arg()), where
> is_dynptr_reg_valid_uninit() will have checked that for any
> uninitialized dynptr, the stack slot has not already been marked as
> STACK_DYNTPR. Maybe I'm missing something in this analysis? What are
> your thoughts?
>

You're right, it shouldn't be needed here now.
In case of insn writes we already destroy both slots of a pair.

If we decide to allow mark_stack_slots_dynptr on STACK_DYNPTR that is
unreferenced, per the discussion above, I will keep it, because it would be
needed then, otherwise I will drop it.

> > +
> >         for (i = 0; i < BPF_REG_SIZE; i++) {
> >                 state->stack[spi].slot_type[i] = STACK_DYNPTR;
> >                 state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
> > @@ -829,6 +834,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
> >         return 0;
> >  }
> >
> > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
> > +                                      struct bpf_func_state *state, int spi)
> > +{
> > +       int i;
> > +
> > +       /* We always ensure that STACK_DYNPTR is never set partially,
> > +        * hence just checking for slot_type[0] is enough. This is
> > +        * different for STACK_SPILL, where it may be only set for
> > +        * 1 byte, so code has to use is_spilled_reg.
> > +        */
> > +       if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
> > +               return;
> > +       /* Reposition spi to first slot */
> > +       if (!state->stack[spi].spilled_ptr.dynptr.first_slot)
> > +               spi = spi + 1;
> > +
> > +       mark_stack_slot_scratched(env, spi);
> > +       mark_stack_slot_scratched(env, spi - 1);
> > +
> > +       /* Writing partially to one dynptr stack slot destroys both. */
> > +       for (i = 0; i < BPF_REG_SIZE; i++) {
> > +               state->stack[spi].slot_type[i] = STACK_INVALID;
> > +               state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > +       }
> > +
> > +       /* Do not release reference state, we are destroying dynptr on stack,
> > +        * not using some helper to release it. Just reset register.
> > +        */
> > +       __mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
> > +       __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
> > +
> > +       /* Same reason as unmark_stack_slots_dynptr above */
> > +       state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > +       state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > +
> > +       return;
> > +}
>
> I think it'd be cleaner if we combined this and
> unmark_stack_slots_dynptr() into one function. The logic is pretty
> much the same except for if the reference state should be released.
>

Ack, will do. I can put this logic in a common function and both could be
callers of that, passing true/false, so it remains readable while avoiding the
duplication.

> > +
> >  static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> >  {
> >         struct bpf_func_state *state = func(env, reg);
> > @@ -3183,6 +3226,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> >                         env->insn_aux_data[insn_idx].sanitize_stack_spill = true;
> >         }
> >
> > +       destroy_stack_slots_dynptr(env, state, spi);
>
> If the stack slot is a dynptr, I think we can just return after this
> call, else we do extra work and mark the stack slots as STACK_MISC
> (3rd case in the if statement).
>

That is the intention here. The destroy_stack_slots_dynptr overwrites two slots,
while we still simulate the write to the slot being written to.

[?][d][d]
 2  1  0

If I wrote to spi = 1, it would now be [?][m][?].
Earlier it would have been [?][m][d].

Any stray write (either fixed or variable offset) to a dynptr slot ends the
lifetime of the dynptr object, so both slots representing the dynptr object need
to be invalidated.

But the write itself needs to happen, and its state has to be reflected in the
stack state for those particular slot(s).

The main point here is to prevent partial destruction, which allows manifesting
the case described in the commit log. Writing to one slot of the two
representing a dynptr invalidates both.

> > +
> >         mark_stack_slot_scratched(env, spi);
> >         if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
> >             !register_is_null(reg) && env->bpf_capable) {
> > @@ -3296,6 +3341,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
> >         if (err)
> >                 return err;
> >
> > +       for (i = min_off; i < max_off; i++) {
> > +               int slot, spi;
> > +
> > +               slot = -i - 1;
> > +               spi = slot / BPF_REG_SIZE;
> > +               destroy_stack_slots_dynptr(env, state, spi);
> > +       }
> >
>
> Instead of calling destroy_stack_slots_dynptr() in
> check_stack_write_fixed_off() and check_stack_write_var_off(), I think
> calling it from check_stack_write() would be a better place. I think
> that'd be more efficient as well where if it is a write to a dynptr,
> we can directly return after invalidating the stack slot.
>

We cannot directly return, as explained above.

> >         /* Variable offset writes destroy any spilled pointers in range. */
> >         for (i = min_off; i < max_off; i++) {
> > @@ -5257,6 +5309,30 @@ static int check_stack_range_initialized(
> >         }
> >
> >         if (meta && meta->raw_mode) {
> > +               /* Ensure we won't be overwriting dynptrs when simulating byte
> > +                * by byte access in check_helper_call using meta.access_size.
> > +                * This would be a problem if we have a helper in the future
> > +                * which takes:
> > +                *
> > +                *      helper(uninit_mem, len, dynptr)
> > +                *
> > +                * Now, uninint_mem may overlap with dynptr pointer. Hence, it
> > +                * may end up writing to dynptr itself when touching memory from
> > +                * arg 1. This can be relaxed on a case by case basis for known
> > +                * safe cases, but reject due to the possibilitiy of aliasing by
> > +                * default.
> > +                */
> > +               for (i = min_off; i < max_off + access_size; i++) {
> > +                       slot = -i - 1;
> > +                       spi = slot / BPF_REG_SIZE;
>
> I think we can just use get_spi(i) here
>

Ack.

> > +                       /* raw_mode may write past allocated_stack */
> > +                       if (state->allocated_stack <= slot)
> > +                               continue;
>
> break?
>

I think you realised why it's continue in your other reply :).
Joanne Koong Nov. 3, 2022, 2:07 p.m. UTC | #4
On Sat, Oct 22, 2022 at 5:08 AM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> On Sat, Oct 22, 2022 at 04:20:28AM IST, Joanne Koong wrote:
> > On Tue, Oct 18, 2022 at 6:59 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > Currently, while reads are disallowed for dynptr stack slots, writes are
> > > not. Reads don't work from both direct access and helpers, while writes
> > > do work in both cases, but have the effect of overwriting the slot_type.
> > >
> > > While this is fine, handling for a few edge cases is missing. Firstly,
> > > a user can overwrite the stack slots of dynptr partially.
> > >
> > > Consider the following layout:
> > > spi: [d][d][?]
> > >       2  1  0
> > >
> > > First slot is at spi 2, second at spi 1.
> > > Now, do a write of 1 to 8 bytes for spi 1.
> > >
> > > This will essentially either write STACK_MISC for all slot_types or
> > > STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write
> > > of zeroes). The end result is that slot is scrubbed.
> > >
> > > Now, the layout is:
> > > spi: [d][m][?]
> > >       2  1  0
> > >
> > > Suppose if user initializes spi = 1 as dynptr.
> > > We get:
> > > spi: [d][d][d]
> > >       2  1  0
> > >
> > > But this time, both spi 2 and spi 1 have first_slot = true.
> > >
> > > Now, when passing spi 2 to dynptr helper, it will consider it as
> > > initialized as it does not check whether second slot has first_slot ==
> > > false. And spi 1 should already work as normal.
> > >
> > > This effectively replaced size + offset of first dynptr, hence allowing
> > > invalid OOB reads and writes.
> > >
> > > Make a few changes to protect against this:
> > > When writing to PTR_TO_STACK using BPF insns, when we touch spi of a
> > > STACK_DYNPTR type, mark both first and second slot (regardless of which
> > > slot we touch) as STACK_INVALID. Reads are already prevented.
> > >
> > > Second, prevent writing to stack memory from helpers if the range may
> > > contain any STACK_DYNPTR slots. Reads are already prevented.
> > >
> > > For helpers, we cannot allow it to destroy dynptrs from the writes as
> > > depending on arguments, helper may take uninit_mem and dynptr both at
> > > the same time. This would mean that helper may write to uninit_mem
> > > before it reads the dynptr, which would be bad.
> > >
> > > PTR_TO_MEM: [?????dd]
> > >
> > > Depending on the code inside the helper, it may end up overwriting the
> > > dynptr contents first and then read those as the dynptr argument.
> > >
> > > Verifier would only simulate destruction when it does byte by byte
> > > access simulation in check_helper_call for meta.access_size, and
> > > fail to catch this case, as it happens after argument checks.
> > >
> > > The same would need to be done for any other non-trivial objects created
> > > on the stack in the future, such as bpf_list_head on stack, or
> > > bpf_rb_root on stack.
> > >
> > > A common misunderstanding in the current code is that MEM_UNINIT means
> > > writes, but note that writes may also be performed even without
> > > MEM_UNINIT in case of helpers, in that case the code after handling meta
> > > && meta->raw_mode will complain when it sees STACK_DYNPTR. So that
> > > invalid read case also covers writes to potential STACK_DYNPTR slots.
> > > The only loophole was in case of meta->raw_mode which simulated writes
> > > through instructions which could overwrite them.
> > >
> > > A future series sequenced after this will focus on the clean up of
> > > helper access checks and bugs around that.
> >
> > thanks for your work on this (and on the rest of the stack, which I'm
> > still working on reviewing)
> >
> > Regarding writes leading to partial dynptr stack slots, I'm regretting
> > not having the verifier flat-out reject this in the first place
> > (instead of it being allowed but internally the stack slot gets marked
> > as invalid) - I think it overall ends up being more confusing to end
> > users, where there it's not obvious at all that writing to the dynptr
> > on the stack automatically invalidates it. I'm not sure whether it's
> > too late from a public API behavior perspective to change this or not.
>
> It would be incorrect to reject writes into dynptrs whose reference is not
> tracked by the verifier (so bpf_dynptr_from_mem), because the compiler would be
> free to reuse the stack space for some other variable when the local dynptr
> variable's lifetime ends, and the verifier would have no way to know when the
> variable went out of scope.
>
> I feel it is also incorrect to refuse bpf_dynptr_from_mem where unref dynptr
> already exists as well. Right now it sees STACK_DYNPTR in the slot_type and
> fails. But consider something like this:
>
> void prog(void)
> {
>         {
>                 struct bpf_dynptr ptr;
>                 bpf_dynptr_from_mem(...);
>                 ...
>         }
>
>         ...
>
>         {
>                 struct bpf_dynptr ptr;
>                 bpf_dynptr_from_mem(...);
>         }
> }
>
> The program is valid, but if ptr in both scopes share the same stack slots, the
> call in the second scope would fail because verifier would see STACK_DYNPTR in
> slot_type.
>
> It is fine though to simply reject writes in case of dynptrs obtained from
> bpf_ringbuf_reserve_dynptr, because if they are overwritten before being
> released, it will end up being an error later due to unreleased reference state.
> The lifetime of the object in this case is being controlled using BPF helpers
> explicitly.
>
> So I think it is ok to do in the second case, and it is unaffected by backward
> compatibility constraints. It wouldn't have been possible for the unref case
> even when you started out with this.

I see! I didn't realize the compiler can reuse the stack slot for
different variables within the same stack frame. I agree with your
thoughts.

>
> > ANyways, assuming it is too late, I left a few comments below.
> >
> > >
> > > Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs")
> > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > ---
> > >  kernel/bpf/verifier.c | 76 +++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 76 insertions(+)
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index 0fd73f96c5e2..89ae384ea6a7 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -740,6 +740,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg1,
> > >         __mark_dynptr_regs(reg1, NULL, type);
> > >  }
> > >
> > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
> > > +                                      struct bpf_func_state *state, int spi);
> > >
> > >  static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > >                                    enum bpf_arg_type arg_type, int insn_idx)
> > > @@ -755,6 +757,9 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
> > >         if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > >                 return -EINVAL;
> > >
> > > +       destroy_stack_slots_dynptr(env, state, spi);
> > > +       destroy_stack_slots_dynptr(env, state, spi - 1);
> >
> > I don't think we need these two lines. mark_stack_slots_dynptr() is
> > called only in the case where an uninitialized dynptr is getting
> > initialized; is_dynptr_reg_valid_uninit() will have already been
> > called prior to this (in check_func_arg()), where
> > is_dynptr_reg_valid_uninit() will have checked that for any
> > uninitialized dynptr, the stack slot has not already been marked as
> > STACK_DYNTPR. Maybe I'm missing something in this analysis? What are
> > your thoughts?
> >
>
> You're right, it shouldn't be needed here now.
> In case of insn writes we already destroy both slots of a pair.
>
> If we decide to allow mark_stack_slots_dynptr on STACK_DYNPTR that is
> unreferenced, per the discussion above, I will keep it, because it would be
> needed then, otherwise I will drop it.

I think we should remove these two lines from this patch and have the
code for allowing mark_stack_slots_dynptr on unreferenced
STACK_DYNPTRs as a separate patch since that will also require changes
to is_dynptr_reg_valid_uninit().

>
> > > +
> > >         for (i = 0; i < BPF_REG_SIZE; i++) {
> > >                 state->stack[spi].slot_type[i] = STACK_DYNPTR;
> > >                 state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
> > > @@ -829,6 +834,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
> > >         return 0;
> > >  }
> > >
> > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
> > > +                                      struct bpf_func_state *state, int spi)
> > > +{
> > > +       int i;
> > > +
> > > +       /* We always ensure that STACK_DYNPTR is never set partially,
> > > +        * hence just checking for slot_type[0] is enough. This is
> > > +        * different for STACK_SPILL, where it may be only set for
> > > +        * 1 byte, so code has to use is_spilled_reg.
> > > +        */
> > > +       if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
> > > +               return;
> > > +       /* Reposition spi to first slot */
> > > +       if (!state->stack[spi].spilled_ptr.dynptr.first_slot)
> > > +               spi = spi + 1;
> > > +
> > > +       mark_stack_slot_scratched(env, spi);
> > > +       mark_stack_slot_scratched(env, spi - 1);
> > > +
> > > +       /* Writing partially to one dynptr stack slot destroys both. */
> > > +       for (i = 0; i < BPF_REG_SIZE; i++) {
> > > +               state->stack[spi].slot_type[i] = STACK_INVALID;
> > > +               state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > > +       }
> > > +
> > > +       /* Do not release reference state, we are destroying dynptr on stack,
> > > +        * not using some helper to release it. Just reset register.
> > > +        */
> > > +       __mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
> > > +       __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
> > > +
> > > +       /* Same reason as unmark_stack_slots_dynptr above */
> > > +       state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > > +       state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > > +
> > > +       return;
> > > +}
> >
> > I think it'd be cleaner if we combined this and
> > unmark_stack_slots_dynptr() into one function. The logic is pretty
> > much the same except for if the reference state should be released.
> >
>
> Ack, will do. I can put this logic in a common function and both could be
> callers of that, passing true/false, so it remains readable while avoiding the
> duplication.
>
> > > +
> > >  static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > >  {
> > >         struct bpf_func_state *state = func(env, reg);
> > > @@ -3183,6 +3226,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> > >                         env->insn_aux_data[insn_idx].sanitize_stack_spill = true;
> > >         }
> > >
> > > +       destroy_stack_slots_dynptr(env, state, spi);
> >
> > If the stack slot is a dynptr, I think we can just return after this
> > call, else we do extra work and mark the stack slots as STACK_MISC
> > (3rd case in the if statement).
> >
>
> That is the intention here. The destroy_stack_slots_dynptr overwrites two slots,
> while we still simulate the write to the slot being written to.
>
> [?][d][d]
>  2  1  0
>
> If I wrote to spi = 1, it would now be [?][m][?].
> Earlier it would have been [?][m][d].
>
> Any stray write (either fixed or variable offset) to a dynptr slot ends the
> lifetime of the dynptr object, so both slots representing the dynptr object need
> to be invalidated.
>
> But the write itself needs to happen, and its state has to be reflected in the
> stack state for those particular slot(s).
>
> The main point here is to prevent partial destruction, which allows manifesting
> the case described in the commit log. Writing to one slot of the two
> representing a dynptr invalidates both.

Returning after the destroy_stack_slots_dynptr call would overwrite
both slots; my previous point was that we could return after calling
this instead of also going through the 3rd if statement below. But I
just realized that we do need to go through the 3rd if statement since
the stack slots need to be STACK_MISC, not STACK_INVALID.

>
> > > +
> > >         mark_stack_slot_scratched(env, spi);
> > >         if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
> > >             !register_is_null(reg) && env->bpf_capable) {
> > > @@ -3296,6 +3341,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
> > >         if (err)
> > >                 return err;
> > >
> > > +       for (i = min_off; i < max_off; i++) {
> > > +               int slot, spi;
> > > +
> > > +               slot = -i - 1;
> > > +               spi = slot / BPF_REG_SIZE;
> > > +               destroy_stack_slots_dynptr(env, state, spi);
> > > +       }
> > >
> >
> > Instead of calling destroy_stack_slots_dynptr() in
> > check_stack_write_fixed_off() and check_stack_write_var_off(), I think
> > calling it from check_stack_write() would be a better place. I think
> > that'd be more efficient as well where if it is a write to a dynptr,
> > we can directly return after invalidating the stack slot.
> >
>
> We cannot directly return, as explained above.
>
> > >         /* Variable offset writes destroy any spilled pointers in range. */
> > >         for (i = min_off; i < max_off; i++) {
> > > @@ -5257,6 +5309,30 @@ static int check_stack_range_initialized(
> > >         }
> > >
> > >         if (meta && meta->raw_mode) {
> > > +               /* Ensure we won't be overwriting dynptrs when simulating byte
> > > +                * by byte access in check_helper_call using meta.access_size.
> > > +                * This would be a problem if we have a helper in the future
> > > +                * which takes:
> > > +                *
> > > +                *      helper(uninit_mem, len, dynptr)
> > > +                *
> > > +                * Now, uninint_mem may overlap with dynptr pointer. Hence, it
> > > +                * may end up writing to dynptr itself when touching memory from
> > > +                * arg 1. This can be relaxed on a case by case basis for known
> > > +                * safe cases, but reject due to the possibilitiy of aliasing by
> > > +                * default.
> > > +                */
> > > +               for (i = min_off; i < max_off + access_size; i++) {
> > > +                       slot = -i - 1;
> > > +                       spi = slot / BPF_REG_SIZE;
> >
> > I think we can just use get_spi(i) here
> >
>
> Ack.
>
> > > +                       /* raw_mode may write past allocated_stack */
> > > +                       if (state->allocated_stack <= slot)
> > > +                               continue;
> >
> > break?
> >
>
> I think you realised why it's continue in your other reply :).
Andrii Nakryiko Nov. 4, 2022, 10:14 p.m. UTC | #5
On Thu, Nov 3, 2022 at 7:07 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Sat, Oct 22, 2022 at 5:08 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Sat, Oct 22, 2022 at 04:20:28AM IST, Joanne Koong wrote:
> > > On Tue, Oct 18, 2022 at 6:59 AM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > Currently, while reads are disallowed for dynptr stack slots, writes are
> > > > not. Reads don't work from both direct access and helpers, while writes
> > > > do work in both cases, but have the effect of overwriting the slot_type.
> > > >
> > > > While this is fine, handling for a few edge cases is missing. Firstly,
> > > > a user can overwrite the stack slots of dynptr partially.
> > > >
> > > > Consider the following layout:
> > > > spi: [d][d][?]
> > > >       2  1  0
> > > >
> > > > First slot is at spi 2, second at spi 1.
> > > > Now, do a write of 1 to 8 bytes for spi 1.
> > > >
> > > > This will essentially either write STACK_MISC for all slot_types or
> > > > STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write
> > > > of zeroes). The end result is that slot is scrubbed.
> > > >
> > > > Now, the layout is:
> > > > spi: [d][m][?]
> > > >       2  1  0
> > > >
> > > > Suppose if user initializes spi = 1 as dynptr.
> > > > We get:
> > > > spi: [d][d][d]
> > > >       2  1  0
> > > >
> > > > But this time, both spi 2 and spi 1 have first_slot = true.
> > > >
> > > > Now, when passing spi 2 to dynptr helper, it will consider it as
> > > > initialized as it does not check whether second slot has first_slot ==
> > > > false. And spi 1 should already work as normal.
> > > >
> > > > This effectively replaced size + offset of first dynptr, hence allowing
> > > > invalid OOB reads and writes.
> > > >
> > > > Make a few changes to protect against this:
> > > > When writing to PTR_TO_STACK using BPF insns, when we touch spi of a
> > > > STACK_DYNPTR type, mark both first and second slot (regardless of which
> > > > slot we touch) as STACK_INVALID. Reads are already prevented.
> > > >
> > > > Second, prevent writing to stack memory from helpers if the range may
> > > > contain any STACK_DYNPTR slots. Reads are already prevented.
> > > >
> > > > For helpers, we cannot allow it to destroy dynptrs from the writes as
> > > > depending on arguments, helper may take uninit_mem and dynptr both at
> > > > the same time. This would mean that helper may write to uninit_mem
> > > > before it reads the dynptr, which would be bad.
> > > >
> > > > PTR_TO_MEM: [?????dd]
> > > >
> > > > Depending on the code inside the helper, it may end up overwriting the
> > > > dynptr contents first and then read those as the dynptr argument.
> > > >
> > > > Verifier would only simulate destruction when it does byte by byte
> > > > access simulation in check_helper_call for meta.access_size, and
> > > > fail to catch this case, as it happens after argument checks.
> > > >
> > > > The same would need to be done for any other non-trivial objects created
> > > > on the stack in the future, such as bpf_list_head on stack, or
> > > > bpf_rb_root on stack.
> > > >
> > > > A common misunderstanding in the current code is that MEM_UNINIT means
> > > > writes, but note that writes may also be performed even without
> > > > MEM_UNINIT in case of helpers, in that case the code after handling meta
> > > > && meta->raw_mode will complain when it sees STACK_DYNPTR. So that
> > > > invalid read case also covers writes to potential STACK_DYNPTR slots.
> > > > The only loophole was in case of meta->raw_mode which simulated writes
> > > > through instructions which could overwrite them.
> > > >
> > > > A future series sequenced after this will focus on the clean up of
> > > > helper access checks and bugs around that.
> > >
> > > thanks for your work on this (and on the rest of the stack, which I'm
> > > still working on reviewing)
> > >
> > > Regarding writes leading to partial dynptr stack slots, I'm regretting
> > > not having the verifier flat-out reject this in the first place
> > > (instead of it being allowed but internally the stack slot gets marked
> > > as invalid) - I think it overall ends up being more confusing to end
> > > users, where there it's not obvious at all that writing to the dynptr
> > > on the stack automatically invalidates it. I'm not sure whether it's
> > > too late from a public API behavior perspective to change this or not.
> >
> > It would be incorrect to reject writes into dynptrs whose reference is not
> > tracked by the verifier (so bpf_dynptr_from_mem), because the compiler would be
> > free to reuse the stack space for some other variable when the local dynptr
> > variable's lifetime ends, and the verifier would have no way to know when the
> > variable went out of scope.
> >
> > I feel it is also incorrect to refuse bpf_dynptr_from_mem where unref dynptr
> > already exists as well. Right now it sees STACK_DYNPTR in the slot_type and
> > fails. But consider something like this:
> >
> > void prog(void)
> > {
> >         {
> >                 struct bpf_dynptr ptr;
> >                 bpf_dynptr_from_mem(...);
> >                 ...
> >         }
> >
> >         ...
> >
> >         {
> >                 struct bpf_dynptr ptr;
> >                 bpf_dynptr_from_mem(...);
> >         }
> > }
> >
> > The program is valid, but if ptr in both scopes share the same stack slots, the
> > call in the second scope would fail because verifier would see STACK_DYNPTR in
> > slot_type.

I don't think compiler is allowed to reuse the same stack slot for
those two ptrs, because we are passing a pointer to it into a
black-box bpf_dynptr_from_mem() function, so kernel can't assume that
this slot is free to be reused just because no one is accessing it
after bpf_dynptr_from_mem (I think?)

Would it make sense to allow *optional* bpf_dynptr_free (of is it
bpf_dynptr_put, not sure) for non-reference-tracked dynptrs if indeed
we wanted to reuse the same stack variable for multiple dynptrs,
though?

> >
> > It is fine though to simply reject writes in case of dynptrs obtained from
> > bpf_ringbuf_reserve_dynptr, because if they are overwritten before being
> > released, it will end up being an error later due to unreleased reference state.
> > The lifetime of the object in this case is being controlled using BPF helpers
> > explicitly.
> >
> > So I think it is ok to do in the second case, and it is unaffected by backward
> > compatibility constraints. It wouldn't have been possible for the unref case
> > even when you started out with this.
>
> I see! I didn't realize the compiler can reuse the stack slot for
> different variables within the same stack frame. I agree with your
> thoughts.
>
> >
> > > ANyways, assuming it is too late, I left a few comments below.
> > >
> > > >
> > > > Fixes: 97e03f521050 ("bpf: Add verifier support for dynptrs")
> > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > > > ---
> > > >  kernel/bpf/verifier.c | 76 +++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 76 insertions(+)
> > > >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 0fd73f96c5e2..89ae384ea6a7 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -740,6 +740,8 @@ static void mark_dynptr_cb_reg(struct bpf_reg_state *reg1,
> > > >         __mark_dynptr_regs(reg1, NULL, type);
> > > >  }
> > > >
> > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
> > > > +                                      struct bpf_func_state *state, int spi);
> > > >
> > > >  static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
> > > >                                    enum bpf_arg_type arg_type, int insn_idx)
> > > > @@ -755,6 +757,9 @@ static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
> > > >         if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
> > > >                 return -EINVAL;
> > > >
> > > > +       destroy_stack_slots_dynptr(env, state, spi);
> > > > +       destroy_stack_slots_dynptr(env, state, spi - 1);
> > >
> > > I don't think we need these two lines. mark_stack_slots_dynptr() is
> > > called only in the case where an uninitialized dynptr is getting
> > > initialized; is_dynptr_reg_valid_uninit() will have already been
> > > called prior to this (in check_func_arg()), where
> > > is_dynptr_reg_valid_uninit() will have checked that for any
> > > uninitialized dynptr, the stack slot has not already been marked as
> > > STACK_DYNTPR. Maybe I'm missing something in this analysis? What are
> > > your thoughts?
> > >
> >
> > You're right, it shouldn't be needed here now.
> > In case of insn writes we already destroy both slots of a pair.
> >
> > If we decide to allow mark_stack_slots_dynptr on STACK_DYNPTR that is
> > unreferenced, per the discussion above, I will keep it, because it would be
> > needed then, otherwise I will drop it.
>
> I think we should remove these two lines from this patch and have the
> code for allowing mark_stack_slots_dynptr on unreferenced
> STACK_DYNPTRs as a separate patch since that will also require changes
> to is_dynptr_reg_valid_uninit().
>
> >
> > > > +
> > > >         for (i = 0; i < BPF_REG_SIZE; i++) {
> > > >                 state->stack[spi].slot_type[i] = STACK_DYNPTR;
> > > >                 state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
> > > > @@ -829,6 +834,44 @@ static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
> > > >         return 0;
> > > >  }
> > > >
> > > > +static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
> > > > +                                      struct bpf_func_state *state, int spi)
> > > > +{
> > > > +       int i;
> > > > +
> > > > +       /* We always ensure that STACK_DYNPTR is never set partially,
> > > > +        * hence just checking for slot_type[0] is enough. This is
> > > > +        * different for STACK_SPILL, where it may be only set for
> > > > +        * 1 byte, so code has to use is_spilled_reg.
> > > > +        */
> > > > +       if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
> > > > +               return;
> > > > +       /* Reposition spi to first slot */
> > > > +       if (!state->stack[spi].spilled_ptr.dynptr.first_slot)
> > > > +               spi = spi + 1;
> > > > +
> > > > +       mark_stack_slot_scratched(env, spi);
> > > > +       mark_stack_slot_scratched(env, spi - 1);
> > > > +
> > > > +       /* Writing partially to one dynptr stack slot destroys both. */
> > > > +       for (i = 0; i < BPF_REG_SIZE; i++) {
> > > > +               state->stack[spi].slot_type[i] = STACK_INVALID;
> > > > +               state->stack[spi - 1].slot_type[i] = STACK_INVALID;
> > > > +       }
> > > > +
> > > > +       /* Do not release reference state, we are destroying dynptr on stack,
> > > > +        * not using some helper to release it. Just reset register.
> > > > +        */
> > > > +       __mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
> > > > +       __mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
> > > > +
> > > > +       /* Same reason as unmark_stack_slots_dynptr above */
> > > > +       state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > > > +       state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
> > > > +
> > > > +       return;
> > > > +}
> > >
> > > I think it'd be cleaner if we combined this and
> > > unmark_stack_slots_dynptr() into one function. The logic is pretty
> > > much the same except for if the reference state should be released.
> > >
> >
> > Ack, will do. I can put this logic in a common function and both could be
> > callers of that, passing true/false, so it remains readable while avoiding the
> > duplication.
> >
> > > > +
> > > >  static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > > >  {
> > > >         struct bpf_func_state *state = func(env, reg);
> > > > @@ -3183,6 +3226,8 @@ static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
> > > >                         env->insn_aux_data[insn_idx].sanitize_stack_spill = true;
> > > >         }
> > > >
> > > > +       destroy_stack_slots_dynptr(env, state, spi);
> > >
> > > If the stack slot is a dynptr, I think we can just return after this
> > > call, else we do extra work and mark the stack slots as STACK_MISC
> > > (3rd case in the if statement).
> > >
> >
> > That is the intention here. The destroy_stack_slots_dynptr overwrites two slots,
> > while we still simulate the write to the slot being written to.
> >
> > [?][d][d]
> >  2  1  0
> >
> > If I wrote to spi = 1, it would now be [?][m][?].
> > Earlier it would have been [?][m][d].
> >
> > Any stray write (either fixed or variable offset) to a dynptr slot ends the
> > lifetime of the dynptr object, so both slots representing the dynptr object need
> > to be invalidated.
> >
> > But the write itself needs to happen, and its state has to be reflected in the
> > stack state for those particular slot(s).
> >
> > The main point here is to prevent partial destruction, which allows manifesting
> > the case described in the commit log. Writing to one slot of the two
> > representing a dynptr invalidates both.
>
> Returning after the destroy_stack_slots_dynptr call would overwrite
> both slots; my previous point was that we could return after calling
> this instead of also going through the 3rd if statement below. But I
> just realized that we do need to go through the 3rd if statement since
> the stack slots need to be STACK_MISC, not STACK_INVALID.
>
> >
> > > > +
> > > >         mark_stack_slot_scratched(env, spi);
> > > >         if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
> > > >             !register_is_null(reg) && env->bpf_capable) {
> > > > @@ -3296,6 +3341,13 @@ static int check_stack_write_var_off(struct bpf_verifier_env *env,
> > > >         if (err)
> > > >                 return err;
> > > >
> > > > +       for (i = min_off; i < max_off; i++) {
> > > > +               int slot, spi;
> > > > +
> > > > +               slot = -i - 1;
> > > > +               spi = slot / BPF_REG_SIZE;
> > > > +               destroy_stack_slots_dynptr(env, state, spi);
> > > > +       }
> > > >
> > >
> > > Instead of calling destroy_stack_slots_dynptr() in
> > > check_stack_write_fixed_off() and check_stack_write_var_off(), I think
> > > calling it from check_stack_write() would be a better place. I think
> > > that'd be more efficient as well where if it is a write to a dynptr,
> > > we can directly return after invalidating the stack slot.
> > >
> >
> > We cannot directly return, as explained above.
> >
> > > >         /* Variable offset writes destroy any spilled pointers in range. */
> > > >         for (i = min_off; i < max_off; i++) {
> > > > @@ -5257,6 +5309,30 @@ static int check_stack_range_initialized(
> > > >         }
> > > >
> > > >         if (meta && meta->raw_mode) {
> > > > +               /* Ensure we won't be overwriting dynptrs when simulating byte
> > > > +                * by byte access in check_helper_call using meta.access_size.
> > > > +                * This would be a problem if we have a helper in the future
> > > > +                * which takes:
> > > > +                *
> > > > +                *      helper(uninit_mem, len, dynptr)
> > > > +                *
> > > > +                * Now, uninint_mem may overlap with dynptr pointer. Hence, it
> > > > +                * may end up writing to dynptr itself when touching memory from
> > > > +                * arg 1. This can be relaxed on a case by case basis for known
> > > > +                * safe cases, but reject due to the possibilitiy of aliasing by
> > > > +                * default.
> > > > +                */
> > > > +               for (i = min_off; i < max_off + access_size; i++) {
> > > > +                       slot = -i - 1;
> > > > +                       spi = slot / BPF_REG_SIZE;
> > >
> > > I think we can just use get_spi(i) here
> > >
> >
> > Ack.
> >
> > > > +                       /* raw_mode may write past allocated_stack */
> > > > +                       if (state->allocated_stack <= slot)
> > > > +                               continue;
> > >
> > > break?
> > >
> >
> > I think you realised why it's continue in your other reply :).
Kumar Kartikeya Dwivedi Nov. 4, 2022, 11:02 p.m. UTC | #6
On Sat, Nov 05, 2022 at 03:44:53AM IST, Andrii Nakryiko wrote:
> On Thu, Nov 3, 2022 at 7:07 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > On Sat, Oct 22, 2022 at 5:08 AM Kumar Kartikeya Dwivedi
> > <memxor@gmail.com> wrote:
> > >
> > > On Sat, Oct 22, 2022 at 04:20:28AM IST, Joanne Koong wrote:
> > > > [...]
> > > >
> > > > thanks for your work on this (and on the rest of the stack, which I'm
> > > > still working on reviewing)
> > > >
> > > > Regarding writes leading to partial dynptr stack slots, I'm regretting
> > > > not having the verifier flat-out reject this in the first place
> > > > (instead of it being allowed but internally the stack slot gets marked
> > > > as invalid) - I think it overall ends up being more confusing to end
> > > > users, where there it's not obvious at all that writing to the dynptr
> > > > on the stack automatically invalidates it. I'm not sure whether it's
> > > > too late from a public API behavior perspective to change this or not.
> > >
> > > It would be incorrect to reject writes into dynptrs whose reference is not
> > > tracked by the verifier (so bpf_dynptr_from_mem), because the compiler would be
> > > free to reuse the stack space for some other variable when the local dynptr
> > > variable's lifetime ends, and the verifier would have no way to know when the
> > > variable went out of scope.
> > >
> > > I feel it is also incorrect to refuse bpf_dynptr_from_mem where unref dynptr
> > > already exists as well. Right now it sees STACK_DYNPTR in the slot_type and
> > > fails. But consider something like this:
> > >
> > > void prog(void)
> > > {
> > >         {
> > >                 struct bpf_dynptr ptr;
> > >                 bpf_dynptr_from_mem(...);
> > >                 ...
> > >         }
> > >
> > >         ...
> > >
> > >         {
> > >                 struct bpf_dynptr ptr;
> > >                 bpf_dynptr_from_mem(...);
> > >         }
> > > }
> > >
> > > The program is valid, but if ptr in both scopes share the same stack slots, the
> > > call in the second scope would fail because verifier would see STACK_DYNPTR in
> > > slot_type.
>
> I don't think compiler is allowed to reuse the same stack slot for
> those two ptrs, because we are passing a pointer to it into a
> black-box bpf_dynptr_from_mem() function, so kernel can't assume that
> this slot is free to be reused just because no one is accessing it
> after bpf_dynptr_from_mem (I think?)
>

At the C level, once the lifetime of the object ends upon execution going out of
its enclosing scope, even if its pointer escaped, the ending of the lifetime
implicitly invalidates any such pointers (as per the abstract machine), so the
compiler is well within its rights (because using such a pointer any more is UB)
to reuse the same storage for another object.

E.g. https://godbolt.org/z/Wcvzfbfbr

For the following:

struct bpf_dynptr {
	unsigned long a, b;
};

extern void bpf_dynptr_func(struct bpf_dynptr *);
extern void bpf_dynptr_ro(const struct bpf_dynptr *);

int main(void)
{
	{
		struct bpf_dynptr d2;

		bpf_dynptr_func(&d2);
		bpf_dynptr_ro(&d2);
	}
	{
		struct bpf_dynptr d3;

		bpf_dynptr_func(&d3);
		bpf_dynptr_ro(&d3);
	}
}

clang produces:

main:                                   # @main
        r6 = r10
        r6 += -16
        call bpf_dynptr_func
        call bpf_dynptr_ro
        r6 = r10
        r6 += -16
        call bpf_dynptr_func
        call bpf_dynptr_ro
        exit

> Would it make sense to allow *optional* bpf_dynptr_free (of is it
> bpf_dynptr_put, not sure) for non-reference-tracked dynptrs if indeed
> we wanted to reuse the same stack variable for multiple dynptrs,
> though?

I think it's fine to simply overwrite the type of object when reusing the same
stack slot.

For some precedence:

This is what compilers essentially do for effective(C)/dynamic(C++) types, for
instance GCC will simply overwrite the effective type of the object (even for
declared objects, even though standard only permits overwriting it for allocated
objects) whenever you do a store using a lvalue of some type T or memcpy
(transferring the effective type to dst).

For a more concrete analogy, Storage Reuse in
https://en.cppreference.com/w/cpp/language/lifetime describes how placement new
simply reuses storage of trivially destructible objects without requiring the
destructor to be called. Since it is C++ if you replace storage of non-trivial
object it must be switched back to the same type before the runtime calls the
original destructors emitted implicitly for declared type.

So in BPF terms, local dynptr (dynptr_from_mem) is trivially destructible, but
ringbuf dynptr requires its 'destructor' to be called before storage is reused.

There are two choices in the second case, either complain when such a ringbuf
dynptr's storage is reused without calling its release function, or the verifier
will complain later when seeing an unreleased reference. I think complaining
early is better as later the user has to correlate insn_idx of leaked reference.
The program is going to fail in both cases anyway, so it makes sense to give a
better error back to the user.
Andrii Nakryiko Nov. 4, 2022, 11:08 p.m. UTC | #7
On Fri, Nov 4, 2022 at 4:02 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Sat, Nov 05, 2022 at 03:44:53AM IST, Andrii Nakryiko wrote:
> > On Thu, Nov 3, 2022 at 7:07 AM Joanne Koong <joannelkoong@gmail.com> wrote:
> > >
> > > On Sat, Oct 22, 2022 at 5:08 AM Kumar Kartikeya Dwivedi
> > > <memxor@gmail.com> wrote:
> > > >
> > > > On Sat, Oct 22, 2022 at 04:20:28AM IST, Joanne Koong wrote:
> > > > > [...]
> > > > >
> > > > > thanks for your work on this (and on the rest of the stack, which I'm
> > > > > still working on reviewing)
> > > > >
> > > > > Regarding writes leading to partial dynptr stack slots, I'm regretting
> > > > > not having the verifier flat-out reject this in the first place
> > > > > (instead of it being allowed but internally the stack slot gets marked
> > > > > as invalid) - I think it overall ends up being more confusing to end
> > > > > users, where there it's not obvious at all that writing to the dynptr
> > > > > on the stack automatically invalidates it. I'm not sure whether it's
> > > > > too late from a public API behavior perspective to change this or not.
> > > >
> > > > It would be incorrect to reject writes into dynptrs whose reference is not
> > > > tracked by the verifier (so bpf_dynptr_from_mem), because the compiler would be
> > > > free to reuse the stack space for some other variable when the local dynptr
> > > > variable's lifetime ends, and the verifier would have no way to know when the
> > > > variable went out of scope.
> > > >
> > > > I feel it is also incorrect to refuse bpf_dynptr_from_mem where unref dynptr
> > > > already exists as well. Right now it sees STACK_DYNPTR in the slot_type and
> > > > fails. But consider something like this:
> > > >
> > > > void prog(void)
> > > > {
> > > >         {
> > > >                 struct bpf_dynptr ptr;
> > > >                 bpf_dynptr_from_mem(...);
> > > >                 ...
> > > >         }
> > > >
> > > >         ...
> > > >
> > > >         {
> > > >                 struct bpf_dynptr ptr;
> > > >                 bpf_dynptr_from_mem(...);
> > > >         }
> > > > }
> > > >
> > > > The program is valid, but if ptr in both scopes share the same stack slots, the
> > > > call in the second scope would fail because verifier would see STACK_DYNPTR in
> > > > slot_type.
> >
> > I don't think compiler is allowed to reuse the same stack slot for
> > those two ptrs, because we are passing a pointer to it into a
> > black-box bpf_dynptr_from_mem() function, so kernel can't assume that
> > this slot is free to be reused just because no one is accessing it
> > after bpf_dynptr_from_mem (I think?)
> >
>
> At the C level, once the lifetime of the object ends upon execution going out of
> its enclosing scope, even if its pointer escaped, the ending of the lifetime
> implicitly invalidates any such pointers (as per the abstract machine), so the
> compiler is well within its rights (because using such a pointer any more is UB)
> to reuse the same storage for another object.
>
> E.g. https://godbolt.org/z/Wcvzfbfbr
>
> For the following:
>
> struct bpf_dynptr {
>         unsigned long a, b;
> };
>
> extern void bpf_dynptr_func(struct bpf_dynptr *);
> extern void bpf_dynptr_ro(const struct bpf_dynptr *);
>
> int main(void)
> {
>         {
>                 struct bpf_dynptr d2;
>
>                 bpf_dynptr_func(&d2);
>                 bpf_dynptr_ro(&d2);
>         }
>         {
>                 struct bpf_dynptr d3;
>
>                 bpf_dynptr_func(&d3);
>                 bpf_dynptr_ro(&d3);
>         }
> }

oh, I completely missed that it's nested lexical scopes, doh. Never
mind what I said then.

>
> clang produces:
>
> main:                                   # @main
>         r6 = r10
>         r6 += -16
>         call bpf_dynptr_func
>         call bpf_dynptr_ro
>         r6 = r10
>         r6 += -16
>         call bpf_dynptr_func
>         call bpf_dynptr_ro
>         exit
>
> > Would it make sense to allow *optional* bpf_dynptr_free (of is it
> > bpf_dynptr_put, not sure) for non-reference-tracked dynptrs if indeed
> > we wanted to reuse the same stack variable for multiple dynptrs,
> > though?
>
> I think it's fine to simply overwrite the type of object when reusing the same
> stack slot.
>
> For some precedence:
>
> This is what compilers essentially do for effective(C)/dynamic(C++) types, for
> instance GCC will simply overwrite the effective type of the object (even for
> declared objects, even though standard only permits overwriting it for allocated
> objects) whenever you do a store using a lvalue of some type T or memcpy
> (transferring the effective type to dst).
>
> For a more concrete analogy, Storage Reuse in
> https://en.cppreference.com/w/cpp/language/lifetime describes how placement new
> simply reuses storage of trivially destructible objects without requiring the
> destructor to be called. Since it is C++ if you replace storage of non-trivial
> object it must be switched back to the same type before the runtime calls the
> original destructors emitted implicitly for declared type.
>
> So in BPF terms, local dynptr (dynptr_from_mem) is trivially destructible, but
> ringbuf dynptr requires its 'destructor' to be called before storage is reused.
>
> There are two choices in the second case, either complain when such a ringbuf
> dynptr's storage is reused without calling its release function, or the verifier
> will complain later when seeing an unreleased reference. I think complaining
> early is better as later the user has to correlate insn_idx of leaked reference.
> The program is going to fail in both cases anyway, so it makes sense to give a
> better error back to the user.

Agreed.
diff mbox series

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0fd73f96c5e2..89ae384ea6a7 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -740,6 +740,8 @@  static void mark_dynptr_cb_reg(struct bpf_reg_state *reg1,
 	__mark_dynptr_regs(reg1, NULL, type);
 }
 
+static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
+				       struct bpf_func_state *state, int spi);
 
 static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_state *reg,
 				   enum bpf_arg_type arg_type, int insn_idx)
@@ -755,6 +757,9 @@  static int mark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_reg_
 	if (!is_spi_bounds_valid(state, spi, BPF_DYNPTR_NR_SLOTS))
 		return -EINVAL;
 
+	destroy_stack_slots_dynptr(env, state, spi);
+	destroy_stack_slots_dynptr(env, state, spi - 1);
+
 	for (i = 0; i < BPF_REG_SIZE; i++) {
 		state->stack[spi].slot_type[i] = STACK_DYNPTR;
 		state->stack[spi - 1].slot_type[i] = STACK_DYNPTR;
@@ -829,6 +834,44 @@  static int unmark_stack_slots_dynptr(struct bpf_verifier_env *env, struct bpf_re
 	return 0;
 }
 
+static void destroy_stack_slots_dynptr(struct bpf_verifier_env *env,
+				       struct bpf_func_state *state, int spi)
+{
+	int i;
+
+	/* We always ensure that STACK_DYNPTR is never set partially,
+	 * hence just checking for slot_type[0] is enough. This is
+	 * different for STACK_SPILL, where it may be only set for
+	 * 1 byte, so code has to use is_spilled_reg.
+	 */
+	if (state->stack[spi].slot_type[0] != STACK_DYNPTR)
+		return;
+	/* Reposition spi to first slot */
+	if (!state->stack[spi].spilled_ptr.dynptr.first_slot)
+		spi = spi + 1;
+
+	mark_stack_slot_scratched(env, spi);
+	mark_stack_slot_scratched(env, spi - 1);
+
+	/* Writing partially to one dynptr stack slot destroys both. */
+	for (i = 0; i < BPF_REG_SIZE; i++) {
+		state->stack[spi].slot_type[i] = STACK_INVALID;
+		state->stack[spi - 1].slot_type[i] = STACK_INVALID;
+	}
+
+	/* Do not release reference state, we are destroying dynptr on stack,
+	 * not using some helper to release it. Just reset register.
+	 */
+	__mark_reg_not_init(env, &state->stack[spi].spilled_ptr);
+	__mark_reg_not_init(env, &state->stack[spi - 1].spilled_ptr);
+
+	/* Same reason as unmark_stack_slots_dynptr above */
+	state->stack[spi].spilled_ptr.live |= REG_LIVE_WRITTEN;
+	state->stack[spi - 1].spilled_ptr.live |= REG_LIVE_WRITTEN;
+
+	return;
+}
+
 static bool is_dynptr_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
 	struct bpf_func_state *state = func(env, reg);
@@ -3183,6 +3226,8 @@  static int check_stack_write_fixed_off(struct bpf_verifier_env *env,
 			env->insn_aux_data[insn_idx].sanitize_stack_spill = true;
 	}
 
+	destroy_stack_slots_dynptr(env, state, spi);
+
 	mark_stack_slot_scratched(env, spi);
 	if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) &&
 	    !register_is_null(reg) && env->bpf_capable) {
@@ -3296,6 +3341,13 @@  static int check_stack_write_var_off(struct bpf_verifier_env *env,
 	if (err)
 		return err;
 
+	for (i = min_off; i < max_off; i++) {
+		int slot, spi;
+
+		slot = -i - 1;
+		spi = slot / BPF_REG_SIZE;
+		destroy_stack_slots_dynptr(env, state, spi);
+	}
 
 	/* Variable offset writes destroy any spilled pointers in range. */
 	for (i = min_off; i < max_off; i++) {
@@ -5257,6 +5309,30 @@  static int check_stack_range_initialized(
 	}
 
 	if (meta && meta->raw_mode) {
+		/* Ensure we won't be overwriting dynptrs when simulating byte
+		 * by byte access in check_helper_call using meta.access_size.
+		 * This would be a problem if we have a helper in the future
+		 * which takes:
+		 *
+		 *	helper(uninit_mem, len, dynptr)
+		 *
+		 * Now, uninint_mem may overlap with dynptr pointer. Hence, it
+		 * may end up writing to dynptr itself when touching memory from
+		 * arg 1. This can be relaxed on a case by case basis for known
+		 * safe cases, but reject due to the possibilitiy of aliasing by
+		 * default.
+		 */
+		for (i = min_off; i < max_off + access_size; i++) {
+			slot = -i - 1;
+			spi = slot / BPF_REG_SIZE;
+			/* raw_mode may write past allocated_stack */
+			if (state->allocated_stack <= slot)
+				continue;
+			if (state->stack[spi].slot_type[slot % BPF_REG_SIZE] == STACK_DYNPTR) {
+				verbose(env, "potential write to dynptr at off=%d disallowed\n", i);
+				return -EACCES;
+			}
+		}
 		meta->access_size = access_size;
 		meta->regno = regno;
 		return 0;