Message ID | 20250219125117.1956939-2-memxor@gmail.com (mailing list archive)
---|---
State | RFC
Delegated to: | BPF
Series | Fix bpf_dynptr_slice_rdwr stack state
On Wed, 19 Feb 2025 at 13:51, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> For the bpf_dynptr_slice_rdwr kfunc, the verifier may return a pointer
> to the underlying packet (if the requested slice is linear), or copy out
> the data to the buffer passed into the kfunc. The verifier performs
> symbolic execution assuming the returned value is a PTR_TO_MEM of a
> certain size (passed into the kfunc), and ensures reads and writes are
> within bounds.
>
> A complication arises when the passed in buffer is a stack pointer. The
> returned pointer may be used to perform writes (unlike bpf_dynptr_slice),
> but the verifier will be unaware of which stack slots such writes may
> end up overwriting. As such, a program may end up overwriting stack data
> (such as spilled pointers) through such return values by accident, which
> may cause undefined behavior.
>
> Fix this by exploring an additional path whenever the passed in argument
> is a PTR_TO_STACK, and explore a path where the returned buffer is the
> same as this stack pointer. This allows us to ensure that the program
> doesn't introduce unsafe behavior when this condition is triggered at
> runtime.
>
> The push_stack() call is performed after kfunc processing is over,
> simply fixing up the return type to PTR_TO_STACK with proper frameno,
> off, and var_off.

I just sent this out to have a discussion with code and context (in the selftest).

The current approach has two problems:

It fails the xdp_dynptr selftest with a "misaligned stack access (-72+30 = -42) size 4"
error. This was happening before as well, but it is surfaced now because we actually
see the writes to the stack.

It also leads to a veristat regression (+80-100% in states) in two selftests.

We probably want to avoid doing push_stack due to the states increase, and instead
mark the stack slot whenever the returned PTR_TO_MEM is used for writing, but we'll
have to keep remarking whenever writes happen, so it requires stashing some stack
slot state in the register.
The other option is invalidating the returned PTR_TO_MEM when the buffer on the
stack is written to (i.e. the stack location gets reused).
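For readers following the thread, here is a minimal BPF C sketch of the pattern
under discussion. It is an illustrative assumption rather than the actual
reproducer (the real test uses inline asm, quoted later in the thread), and the
kfunc prototypes are hand-declared here; in-tree programs pull them from the
selftests' bpf_kfuncs.h.

/* Sketch of the hazard: when the data is copied out instead of being returned
 * as a packet pointer, "slice" aliases the on-stack buffer, but the verifier
 * only tracks it as a PTR_TO_MEM of sizeof(buf) bytes.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

extern int bpf_dynptr_from_xdp(struct xdp_md *xdp, __u64 flags,
			       struct bpf_dynptr *ptr__uninit) __ksym;
extern void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr *ptr, __u32 offset,
				   void *buffer__opt, __u32 buffer__szk) __ksym;

SEC("xdp")
int slice_aliases_stack(struct xdp_md *ctx)
{
	struct bpf_dynptr ptr;
	__u64 buf[2];	/* stack buffer handed to the kfunc */
	__u64 *slice;

	bpf_dynptr_from_xdp(ctx, 0, &ptr);
	slice = bpf_dynptr_slice_rdwr(&ptr, 0, buf, sizeof(buf));
	if (!slice)
		return XDP_DROP;

	/* If the requested range was not linear, slice == buf here, i.e. it
	 * points into this frame's stack. A later pointer spill into those
	 * same slots, followed by a store through slice, clobbers the spill
	 * without the verifier noticing.
	 */
	*slice = 0;
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";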
On Wed, 19 Feb 2025 at 13:51, Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> [...]
>
> Fixes: 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr")
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> ---
> [...]
> 	}
>
> +	/* Push a state for exploration where the returned buffer is pointing to
> +	 * the stack buffer passed into bpf_dynptr_slice_rdwr, otherwise we
> +	 * cannot see writes to the stack solely through marking it as
> +	 * PTR_TO_MEM. We don't do the same for bpf_dynptr_slice, because the
> +	 * returned pointer is MEM_RDONLY, hence cannot be used to write to the
> +	 * stack.
> +	 */
> +	if (!insn->off && meta.arg_stack.found &&
> +	    insn->imm == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
> +		struct bpf_verifier_state *branch;
> +		struct bpf_reg_state *regs, *r0;
> +
> +		branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false);
> +		if (!branch) {
> +			verbose(env, "failed to push state to explore stack buffer in r0\n");
> +			return -ENOMEM;
> +		}
> +
> +		regs = branch->frame[branch->curframe]->regs;
> +		r0 = &regs[BPF_REG_0];
> +
> +		r0->type = PTR_TO_STACK;

I forgot to fold in a hunk. This needs to be marked PTR_MAYBE_NULL (preserve the
set reg->id).

> [...]
On Wed, Feb 19, 2025 at 4:51 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> For the bpf_dynptr_slice_rdwr kfunc, the verifier may return a pointer
> to the underlying packet (if the requested slice is linear), or copy out
> the data to the buffer passed into the kfunc. The verifier performs
> symbolic execution assuming the returned value is a PTR_TO_MEM of a
> certain size (passed into the kfunc), and ensures reads and writes are
> within bounds.

sounds like

check_kfunc_mem_size_reg() -> check_mem_size_reg() -> check_helper_mem_access()
  case PTR_TO_STACK:
    check_stack_range_initialized()
      clobber = true
      if (clobber) {
        __mark_reg_unknown(env, &state->stack[spi].spilled_ptr);

is somehow broken?

ohh. It might be:

  || !is_kfunc_arg_optional(meta->btf, buff_arg)

This bit is wrong then.
When arg is not-null check_kfunc_mem_size_reg() should be called.
The PTR_TO_STACK abuse is a small subset of issues
if check_kfunc_mem_size_reg() is indeed not called.
On Wed, 19 Feb 2025 at 18:41, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Feb 19, 2025 at 4:51 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > [...]
>
> sounds like
>
> check_kfunc_mem_size_reg() -> check_mem_size_reg() -> check_helper_mem_access()
>   case PTR_TO_STACK:
>     check_stack_range_initialized()
>       clobber = true
>       if (clobber) {
>         __mark_reg_unknown(env, &state->stack[spi].spilled_ptr);
>
> is somehow broken?
>
> ohh. It might be:
>
>   || !is_kfunc_arg_optional(meta->btf, buff_arg)
>
> This bit is wrong then.
> When arg is not-null check_kfunc_mem_size_reg() should be called.
> The PTR_TO_STACK abuse is a small subset of issues
> if check_kfunc_mem_size_reg() is indeed not called.

The condition looks ok to me.

The condition to do check_mem_size_reg is !null || !opt.
So when it's null, and it's opt, it will be skipped.
When it's null, and it's not opt, the check will happen.
When the arg is not-null, the said function is called; opt does not matter then.
So the stack slots are marked misc.

In our case we're not passing a NULL pointer in the selftest.

The problem occurs once we spill to that slot _after_ the call, and then do a
write through the returned mem pointer.

The final few lines from the selftest do the dirty thing, where r0 is aliasing
fp-8, and r1 = 0.

+	*(u64 *)(r10 - 8) = r8;		\
+	*(u64 *)(r0 + 0) = r1;		\
+	r8 = *(u64 *)(r10 - 8);		\
+	r0 = *(u64 *)(r8 + 0);		\

The write through r0 must re-mark the stack, but the verifier doesn't know it's
pointing to the stack.
push_stack was the conceptually cleaner/simpler fix, but it apparently isn't
good enough.

Remarking the stack on a write to the PTR_TO_MEM, or invalidating the PTR_TO_MEM
when r8 is spilled to fp-8 for the first time after the call, are two options.
Both have some concerns (with the first, the misaligned stack access is not
caught; with the second, the PTR_TO_MEM may outlive the stack frame).

I don't recall if there was a hardware/JIT specific reason to care about stack
access alignment (on some architectures), but otherwise we can conservatively
mark at 8-byte granularity any slot(s) that overlap with the buffer to cover
such a case. The second problem is slightly trickier, which makes me lean
towards invalidating the returned PTR_TO_MEM when the stack slot is overwritten
or the frame is destroyed.
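As a purely illustrative, host-side C model of what those four instructions do
at runtime when r0 ends up aliasing fp-8 (the names here are made up; the real
test is BPF inline asm):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t pkt = 42;
	uint64_t fp_minus_8;			/* the stack slot at fp-8        */
	uint64_t *r0 = &fp_minus_8;		/* slice copied out to the stack */
	uint64_t *r8 = &pkt;			/* a valid pointer               */
	uint64_t r1 = 0;

	fp_minus_8 = (uint64_t)(uintptr_t)r8;	/* *(u64 *)(r10 - 8) = r8 */
	*r0 = r1;				/* *(u64 *)(r0 + 0) = r1  */
	r8 = (uint64_t *)(uintptr_t)fp_minus_8;	/* r8 = *(u64 *)(r10 - 8) */
	if (!r8) {
		/* r0 = *(u64 *)(r8 + 0) in the BPF program would be a NULL
		 * dereference here, because the spill was silently
		 * overwritten with 0 through the aliasing slice.
		 */
		printf("spilled pointer was clobbered through the slice\n");
		return 1;
	}
	printf("%lu\n", (unsigned long)*r8);
	return 0;
}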
On Wed, 2025-02-19 at 13:56 +0100, Kumar Kartikeya Dwivedi wrote:

[...]

> It also leads to a veristat regression (+80-100% in states) in two selftests.
>
> We probably want to avoid doing push_stack due to the states increase,
> and instead mark the stack slot whenever the returned
> PTR_TO_MEM is used for writing, but we'll have to keep remarking
> whenever writes happen, so it requires stashing some stack slot state
> in the register.
> The other option is invalidating the returned PTR_TO_MEM when the
> buffer on the stack is written to (i.e. the stack location gets
> reused).

Would it be wrong to always consider r0 to be a pointer to stack if a buffer is
provided? It's like modelling the PTR_TO_MEM with some additional precision.

Otherwise, I think push_stack() is fine, as it keeps the implementation simple.
On Wed, Feb 19, 2025 at 1:20 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2025-02-19 at 13:56 +0100, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> Would it be wrong to always consider r0 to be a pointer to stack if a buffer is
> provided? It's like modelling the PTR_TO_MEM with some additional precision.
>
> Otherwise, I think push_stack() is fine, as it keeps the implementation simple.

Discussed plenty of options offline with Kumar.
The next step is to experiment with returning PTR_TO_STACK with mem_size.
On Wed, Feb 19, 2025 at 10:10 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Wed, 19 Feb 2025 at 18:41, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > [...]
>
> [...]
>
> The problem occurs once we spill to that slot _after_ the call, and then do a
> write through the returned mem pointer.
>
> The final few lines from the selftest do the dirty thing, where r0 is aliasing
> fp-8, and r1 = 0.
>
> +	*(u64 *)(r10 - 8) = r8;		\
> +	*(u64 *)(r0 + 0) = r1;		\
> +	r8 = *(u64 *)(r10 - 8);		\
> +	r0 = *(u64 *)(r8 + 0);		\
>
> The write through r0 must re-mark the stack, but the verifier doesn't know it's
> pointing to the stack.
> push_stack was the conceptually cleaner/simpler fix, but it apparently isn't
> good enough.
>
> Remarking the stack on a write to the PTR_TO_MEM, or invalidating the PTR_TO_MEM
> when r8 is spilled to fp-8 for the first time after the call, are two options.
> Both have some concerns (with the first, the misaligned stack access is not
> caught; with the second, the PTR_TO_MEM may outlive the stack frame).

Reading the description of the problem my first instinct was to have stack slots
with an associated obj_ref_id for such cases, and when something writes into such
a stack slot, invalidate everything with that obj_ref_id. So that's probably what
you mean by invalidating PTR_TO_MEM, right?

Not sure I understand what "PTR_TO_STACK with mem_size" (that Alexei mentioned in
another email) means, though, so it's hard to compare.

> I don't recall if there was a hardware/JIT specific reason to care about stack
> access alignment (on some architectures), but otherwise we can conservatively
> mark at 8-byte granularity any slot(s) that overlap with the buffer to cover
> such a case. The second problem is slightly trickier, which makes me lean
> towards invalidating the returned PTR_TO_MEM when the stack slot is overwritten
> or the frame is destroyed.
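A rough sketch of that invalidate-on-write idea follows. It assumes (this is not
existing verifier code) that the returned register and the stack slots backing
the buffer were stamped with a common ref id when the kfunc call was verified;
the stamping glue and the exact hook in the stack-write path are omitted.

/* Walk all registers and spilled registers in the current verifier state and
 * invalidate every one that still carries the slice's ref id.
 */
static void invalidate_slice_refs(struct bpf_verifier_env *env, u32 ref_obj_id)
{
	struct bpf_func_state *state;
	struct bpf_reg_state *reg;

	bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
		if (reg->ref_obj_id == ref_obj_id)
			__mark_reg_unknown(env, reg);	/* slice no longer usable */
	}));
}

/* Hypothetical call site, when a stack write touches a stamped slot:
 *
 *	if (state->stack[spi].spilled_ptr.ref_obj_id)
 *		invalidate_slice_refs(env, state->stack[spi].spilled_ptr.ref_obj_id);
 */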
On Thu, 20 Feb 2025 at 01:13, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Feb 19, 2025 at 10:10 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > [...]
>
> Reading the description of the problem my first instinct was to have stack slots
> with an associated obj_ref_id for such cases, and when something writes into such
> a stack slot, invalidate everything with that obj_ref_id. So that's probably what
> you mean by invalidating PTR_TO_MEM, right?
>
> Not sure I understand what "PTR_TO_STACK with mem_size" (that Alexei mentioned in
> another email) means, though, so it's hard to compare.

Invalidation is certainly one option. The one Alexei mentioned was where we
discussed bounding how much data can be read through the PTR_TO_STACK (similar
to PTR_TO_MEM), and marking r0 as PTR_TO_STACK. This ends up combining the
constraints of both types of pointers (it may as well be called
PTR_TO_STACK_OR_MEM) without forking paths.

The benefit over the push_stack approach is that we avoid the states regression
for cls_redirect and balancer_ingress.

For the selftest failure, I plan to just silence the error by changing it.

> > I don't recall if there was a hardware/JIT specific reason to care about stack
> > access alignment (on some architectures), but otherwise we can conservatively
> > mark at 8-byte granularity any slot(s) that overlap with the buffer to cover
> > such a case. The second problem is slightly trickier, which makes me lean
> > towards invalidating the returned PTR_TO_MEM when the stack slot is overwritten
> > or the frame is destroyed.
On Thu, Feb 20, 2025 at 7:41 AM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Thu, 20 Feb 2025 at 01:13, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > [...]
> >
> > Not sure I understand what "PTR_TO_STACK with mem_size" (that Alexei mentioned in
> > another email) means, though, so it's hard to compare.
>
> Invalidation is certainly one option. The one Alexei mentioned was where we
> discussed bounding how much data can be read through the PTR_TO_STACK (similar
> to PTR_TO_MEM), and marking r0 as PTR_TO_STACK. This ends up combining the
> constraints of both types of pointers (it may as well be called
> PTR_TO_STACK_OR_MEM) without forking paths.

Yeah, PTR_TO_STACK_OR_MEM would be more precise. But how does that help with this
problem? Not sure I follow the idea of the solution (but I can wait for patches
to be posted).

> The benefit over the push_stack approach is that we avoid the states regression
> for cls_redirect and balancer_ingress.

yeah, of course, I don't particularly like the push_stack approach.

> For the selftest failure, I plan to just silence the error by changing it.
>
> [...]
On Fri, 21 Feb 2025 at 01:27, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 7:41 AM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > [...]
> >
> > Invalidation is certainly one option. The one Alexei mentioned was where we
> > discussed bounding how much data can be read through the PTR_TO_STACK (similar
> > to PTR_TO_MEM), and marking r0 as PTR_TO_STACK. This ends up combining the
> > constraints of both types of pointers (it may as well be called
> > PTR_TO_STACK_OR_MEM) without forking paths.
>
> Yeah, PTR_TO_STACK_OR_MEM would be more precise. But how does that help with this
> problem? Not sure I follow the idea of the solution (but I can wait for patches
> to be posted).

The reason for push_stack was to ensure writes through the returned pointer take
effect on the stack state.
By keeping it PTR_TO_STACK, we get that behavior.
However, in the other path of this patch, the verifier would verify the pointer
as PTR_TO_MEM, with a bounded mem_size.
Thus it would not allow writing beyond a certain size.
If we simply set r0 to PTR_TO_STACK now, it would possibly allow going beyond the
size that was passed to the kfunc.
E.g. say the buffer was fp-24 and the size was 8, we could then also modify fp-8
through it, and not just up to fp-16.
Adding an additional mem_size field and checking it (when set) for PTR_TO_STACK
allows us to ensure we cannot write beyond 8 bytes through r0=fp-24.

I hope this is clearer.

> [...]
On Thu, Feb 20, 2025 at 4:37 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Fri, 21 Feb 2025 at 01:27, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > [...]
> >
> > Yeah, PTR_TO_STACK_OR_MEM would be more precise. But how does that help with this
> > problem? Not sure I follow the idea of the solution (but I can wait for patches
> > to be posted).
>
> The reason for push_stack was to ensure writes through the returned pointer take
> effect on the stack state.
> By keeping it PTR_TO_STACK, we get that behavior.
> However, in the other path of this patch, the verifier would verify the pointer
> as PTR_TO_MEM, with a bounded mem_size.
> Thus it would not allow writing beyond a certain size.
> If we simply set r0 to PTR_TO_STACK now, it would possibly allow going beyond the
> size that was passed to the kfunc.
> E.g. say the buffer was fp-24 and the size was 8, we could then also modify fp-8
> through it, and not just up to fp-16.
> Adding an additional mem_size field and checking it (when set) for PTR_TO_STACK
> allows us to ensure we cannot write beyond 8 bytes through r0=fp-24.

yeah, this part is clear, of course, but I was actually more worried about
handling the following situation:

struct bpf_dynptr dptr;
void *slice;
union {
    char buf[32];
    void *map_value;
} blah;

bpf_dynptr_from_skb(..., &dptr);

slice = bpf_dynptr_slice_rdwr(&dptr, 0, &blah.buf, sizeof(blah.buf));
if (!slice) return 0; /* whatever */

/* now slice points to blah.{buf,map_value}, right? */

map_value = bpf_map_lookup_elem(&some_map, ...);
if (!map_value) return 0;

/* we shouldn't allow working with slice at this point */
*(u64 *)slice = 0xdeadbeef; /* overwrite map_value */

*(u64 *)map_value = 123; /* BOOM */

Would this PTR_TO_STACK_OR_MEM approach handle this? If so, can you briefly walk
me through how? Thanks!

> I hope this is clearer.
>
> [...]
On Fri, 21 Feb 2025 at 02:02, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Feb 20, 2025 at 4:37 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > [...]
>
> yeah, this part is clear, of course, but I was actually more worried about
> handling the following situation:

For simplicity, let's just forget about stack_or_mem, and assume a stack pointer
with this extra mem_size. I haven't written the code yet, but the idea would be
to limit writing to [fp-N, fp-N+mem_size) through it. A screening check before we
pass through to the normal stack access handling code.

> struct bpf_dynptr dptr;
> void *slice;
> union {
>     char buf[32];
>     void *map_value;
> } blah;
>
> bpf_dynptr_from_skb(..., &dptr);

At this point we mark dptr.

> slice = bpf_dynptr_slice_rdwr(&dptr, 0, &blah.buf, sizeof(blah.buf));
> if (!slice) return 0; /* whatever */

slice is PTR_TO_STACK w/ mem_size = 32. We mark the bytes for buf as STACK_MISC
at this point.

> /* now slice points to blah.{buf,map_value}, right? */
>
> map_value = bpf_map_lookup_elem(&some_map, ...);
> if (!map_value) return 0;

The 8-byte slot for map_value in these 32 bytes becomes map_value. The rest
remains misc.

> /* we shouldn't allow working with slice at this point */
> *(u64 *)slice = 0xdeadbeef; /* overwrite map_value */

And because slice is PTR_TO_STACK, we will notice the overwrite of the map value
with a scalar.

> *(u64 *)map_value = 123; /* BOOM */

So this would fail as a deref of a scalar.

Let me know if I missed something you were trying to point out.
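A minimal sketch of what such a screening check could look like. The mem_size
field on a PTR_TO_STACK register is the hypothetical addition discussed here
(today mem_size is only meaningful for PTR_TO_MEM), the function name is made up,
and only constant access offsets are handled; variable offsets and the exact call
site in the stack access path are left out.

/* Bound the access the same way a PTR_TO_MEM of mem_size bytes would be
 * bounded, before falling through to the usual PTR_TO_STACK handling.
 * "off" is the access offset relative to where r0 pointed at kfunc return.
 */
static int check_slice_stack_bounds(struct bpf_verifier_env *env, u32 regno,
				    const struct bpf_reg_state *reg,
				    int off, int size)
{
	if (!reg->mem_size)	/* not a slice-backed stack pointer */
		return 0;

	if (off < 0 || (u64)off + size > reg->mem_size) {
		verbose(env, "R%d invalid slice access: off=%d size=%d mem_size=%u\n",
			regno, off, size, reg->mem_size);
		return -EACCES;
	}
	/* In bounds: the normal stack access checks still run and re-mark the
	 * touched slots (e.g. as STACK_MISC on a scalar write), which is what
	 * catches the map_value overwrite in the example above.
	 */
	return 0;
}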
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e57b7c949860..ad57144f044c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -329,6 +329,12 @@ struct bpf_kfunc_call_arg_meta {
 	struct {
 		struct btf_field *field;
 	} arg_rbtree_root;
+	struct {
+		u32 frameno;
+		s32 off;
+		struct tnum var_off;
+		bool found;
+	} arg_stack;
 	struct {
 		enum bpf_dynptr_type type;
 		u32 id;
@@ -7287,6 +7293,7 @@ static int check_stack_access_within_bounds(
 		min_off = (s64)reg->var_off.value + off;
 		max_off = min_off + access_size;
 	} else {
+
 		if (reg->smax_value >= BPF_MAX_VAR_OFF ||
 		    reg->smin_value <= -BPF_MAX_VAR_OFF) {
 			verbose(env, "invalid unbounded variable-offset%s stack R%d\n",
@@ -13017,6 +13024,22 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				meta->arg_constant.value = size_reg->var_off.value;
 			}
 
+			/* We need to simulate a path where return value is the
+			 * stack buffer passed into the kfunc, therefore store
+			 * its metadata here.
+			 */
+			if (buff_reg->type == PTR_TO_STACK &&
+			    meta->func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
+				if (meta->arg_stack.found) {
+					verbose(env, "verifier internal error: only one stack argument permitted\n");
+					return -EFAULT;
+				}
+				meta->arg_stack.found = true;
+				meta->arg_stack.frameno = buff_reg->frameno;
+				meta->arg_stack.off = buff_reg->off;
+				meta->arg_stack.var_off = buff_reg->var_off;
+			}
+
 			/* Skip next '__sz' or '__szk' argument */
 			i++;
 			break;
@@ -13598,6 +13621,34 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			return err;
 	}
 
+	/* Push a state for exploration where the returned buffer is pointing to
+	 * the stack buffer passed into bpf_dynptr_slice_rdwr, otherwise we
+	 * cannot see writes to the stack solely through marking it as
+	 * PTR_TO_MEM. We don't do the same for bpf_dynptr_slice, because the
+	 * returned pointer is MEM_RDONLY, hence cannot be used to write to the
+	 * stack.
+	 */
+	if (!insn->off && meta.arg_stack.found &&
+	    insn->imm == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
+		struct bpf_verifier_state *branch;
+		struct bpf_reg_state *regs, *r0;
+
+		branch = push_stack(env, env->insn_idx + 1, env->insn_idx, false);
+		if (!branch) {
+			verbose(env, "failed to push state to explore stack buffer in r0\n");
+			return -ENOMEM;
+		}
+
+		regs = branch->frame[branch->curframe]->regs;
+		r0 = &regs[BPF_REG_0];
+
+		r0->type = PTR_TO_STACK;
+		mark_reg_known_zero(env, regs, BPF_REG_0);
+		r0->frameno = meta.arg_stack.frameno;
+		r0->off = meta.arg_stack.off;
+		r0->var_off = meta.arg_stack.var_off;
+	}
+
 	return 0;
 }
For the bpf_dynptr_slice_rdwr kfunc, the verifier may return a pointer
to the underlying packet (if the requested slice is linear), or copy out
the data to the buffer passed into the kfunc. The verifier performs
symbolic execution assuming the returned value is a PTR_TO_MEM of a
certain size (passed into the kfunc), and ensures reads and writes are
within bounds.

A complication arises when the passed in buffer is a stack pointer. The
returned pointer may be used to perform writes (unlike bpf_dynptr_slice),
but the verifier will be unaware of which stack slots such writes may
end up overwriting. As such, a program may end up overwriting stack data
(such as spilled pointers) through such return values by accident, which
may cause undefined behavior.

Fix this by exploring an additional path whenever the passed in argument
is a PTR_TO_STACK, and explore a path where the returned buffer is the
same as this stack pointer. This allows us to ensure that the program
doesn't introduce unsafe behavior when this condition is triggered at
runtime.

The push_stack() call is performed after kfunc processing is over,
simply fixing up the return type to PTR_TO_STACK with proper frameno,
off, and var_off.

Fixes: 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/verifier.c | 51 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)