
[RFC,v1,05/14] bpf: Implement BPF exception frame descriptor generation

Message ID 20240201042109.1150490-6-memxor@gmail.com (mailing list archive)
State RFC
Delegated to: BPF
Series Exceptions - Resource Cleanup

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
netdev/tree_selection success Guessing tree name failed - patch did not apply, async
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-9 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-8 fail Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-13 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-17 pending Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-25 pending Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-32 pending Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-33 pending Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-15 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-18 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18

Commit Message

Kumar Kartikeya Dwivedi Feb. 1, 2024, 4:21 a.m. UTC
Introduce support to the verifier to generate a set of descriptors for
each BPF frame to describe the register and stack state, which can be
used to reason about the resources acquired by the program at that
particular program point, and subsequent patches can introduce support
to unwind a given program when it throws an exception while holding
ownership of such resources.

Descriptors generated for each frame are then tied back to the subprog
they belong to, and attached to the bpf_prog instance during the JIT
phase, with the program counter serving as a key to find the descriptor
of interest for a given subprog.

Logically, during the unwinding phase, for each frame, we will use the
program counter and bpf_prog object to figure out how we should release
acquired resources if any in the frame.
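
For illustration, the runtime side of that lookup could look roughly
like the sketch below. This is not part of this patch (the unwinder
itself comes in later patches), and find_frame_desc is a made-up name;
it only shows how the pc keys into the per-subprog descriptor table:

static struct bpf_exception_frame_desc *
find_frame_desc(struct bpf_exception_frame_desc_tab *fdtab, u64 pc)
{
	/* Linear scan keyed by program counter; each entry describes the
	 * registers and stack slots holding resources at that pc.
	 */
	for (u32 i = 0; i < fdtab->cnt; i++) {
		if (fdtab->desc[i]->pc == pc)
			return fdtab->desc[i];
	}
	return NULL;
}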

Let's study how the frame descriptor generation algorithm works.
Whenever an exception throwing instruction is encountered, i.e. a call
to a global subprog which is throw reachable, or the bpf_throw kfunc,
we call gen_exception_frame_descs.

This function will start with the current frame, and explore the
registers and other objects on the current stack. We work at 8-byte
granularity, since all registers spilled on the stack and objects like
dynptr and iter are 8-byte aligned. For each such stack entry, we
inspect the slot_type and figure out whether it is a spilled register
or a dynptr/iter object.

For any acquired resources on the stack, we insert entries representing
them into a frame descriptor table for the current subprog at the
current instruction index.
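
As an example, a fragment like the following (illustrative only;
ringbuf is a hypothetical ringbuf map, cond is an arbitrary condition,
and whether f is spilled or kept in a register is up to the compiler)
would produce one entry for the acquired object pointer and one for
the refcounted dynptr slots at the bpf_throw program counter:

struct foo *f = bpf_obj_new(typeof(*f));
struct bpf_dynptr dp;

bpf_ringbuf_reserve_dynptr(&ringbuf, 8, 0, &dp);
if (cond)
	bpf_throw(0);
bpf_ringbuf_discard_dynptr(&dp, 0);
if (f)
	bpf_obj_drop(f);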

The same steps are repeated for registers that are callee saved, as
these may be spilled on the stack of one of the frames in the call
chain and would have to be located in order to be freed.

In case of registers (spilled or callee saved), we make a special
provision for register_is_null scalar values, to increase the chances of
merging frame descriptors where the only divergence is NULL in one state
being replaced with a valid pointer in another.

The next important step is the logic to merge the frame descriptors. It
is possible that the verifier reaches the same instruction index in a
program from multiple paths, and has to generate frame descriptors for
them at that program counter. In such a case, we always ensure that
after generating the frame descriptor, we attempt to "merge" it with an
existing one.

The merging rules are fairly simple except for a few caveats. First, if
the layout and type of objects on the stack and in registers is the
same, we have a successful merge. Next, in case of registers (spilled or
callee saved), there is a special case where if the old entry has NULL,
the new type (non-NULL) replaces it, and if the new entry has NULL, it
satisfies the merge rules with the old entry (which can be of any type).
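
To make the rules concrete, the per-entry decision boils down to
roughly the following. This is a condensed sketch of merge_frame_desc()
from the patch below, with the uninitialized-entry handling and
logging dropped; merge_rule is not a real function name:

static int merge_rule(struct bpf_frame_desc_reg_entry *old,
		      struct bpf_frame_desc_reg_entry *new)
{
	if (!memcmp(old, new, sizeof(*old)))
		return 0;		/* identical entries always merge */
	if (old->spill_type != new->spill_type)
		return -EINVAL;		/* stack layout differs between paths */
	if (old->spill_type != STACK_SPILL && old->spill_type != STACK_INVALID)
		return -EINVAL;		/* dynptr/iter entries must match exactly */
	if (old->type == new->type)
		return 0;		/* same type seen on both paths */
	if (old->type == SCALAR_VALUE) {
		*old = *new;		/* old path had NULL, take the pointer type */
		return 0;
	}
	if (new->type == SCALAR_VALUE)
		return 0;		/* new path has NULL, keep the pointer type */
	return -EINVAL;			/* two incompatible resource types */
}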

This helps in cases where we have an optional value held in a register
or stack slot in one program path, which is replaced by the actual value
in the other program path. This can also happen with conditionals, where
the verifier may only see acquired references in the verifier state if a
condition is true (therefore, not in all of the program paths traversing
the same instruction).

To illustrate with an example, in the following program:

struct foo *p = NULL;
if (x)
	p = bpf_obj_new(typeof(*p));
if (y)
	bpf_throw(0);
if (p)
	bpf_obj_drop(p);

In such a case, bpf_throw may be reached for x == 0, y == 1 and x == 1,
y == 1, with two possible values of p. As long as both can be passed
into the release function (i.e. NULL or a valid pointer value), we can
satisfy the merge.
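
Concretely, the two entries the verifier tries to merge for the slot
holding p at the bpf_throw site would look roughly like this (the
offset, btf fields and extra type flags are hypothetical and depend on
where the compiler keeps p):

/* Path x == 0: p is a known-NULL scalar spilled at fp-8. */
struct bpf_frame_desc_reg_entry null_entry = {
	.type = SCALAR_VALUE, .spill_type = STACK_SPILL, .off = -8,
};
/* Path x == 1: p is the object acquired from bpf_obj_new(). */
struct bpf_frame_desc_reg_entry ptr_entry = {
	.type = PTR_TO_BTF_ID | MEM_ALLOC, .spill_type = STACK_SPILL, .off = -8,
	.btf_id = 42, /* hypothetical btf_id of struct foo */
};
/* The merged descriptor keeps ptr_entry; the release handler passes
 * whatever sits at that slot to the destructor, and it must deal with
 * both NULL and a valid pointer.
 */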

TODO: We need to reserve a slot for STACK_ZERO as well.
TODO: Improve the error message in case we have pointer and misc instead of zero.

Currently, we only consider resources which are modelled as acquired
references in verifier state. In particular, this excludes resources
like held spinlocks and RCU read sections. For now, both of these will
not be handled, and the verifier will continue to complain when
exceptions are thrown in their presence.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf.h                           |  27 ++
 include/linux/bpf_verifier.h                  |   2 +
 kernel/bpf/core.c                             |  13 +
 kernel/bpf/verifier.c                         | 368 ++++++++++++++++++
 .../selftests/bpf/progs/exceptions_fail.c     |   4 +-
 5 files changed, 412 insertions(+), 2 deletions(-)

Comments

Eduard Zingerman Feb. 15, 2024, 6:24 p.m. UTC | #1
On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:

Question: are there any real-life programs adapted to use exceptions
with cleanup feature? It would be interesting to see how robust
one-descriptor-per-pc is in practice, and also how it affects memory
consumption during verification.

The algorithm makes sense to me, a few comments/nits below.

[...]

> +static int find_and_merge_frame_desc(struct bpf_verifier_env *env, struct bpf_exception_frame_desc_tab *fdtab, u64 pc, struct bpf_frame_desc_reg_entry *fd)
> +{
> +	struct bpf_exception_frame_desc **descs = NULL, *desc = NULL, *p;
> +	int ret = 0;
> +
> +	for (int i = 0; i < fdtab->cnt; i++) {
> +		if (pc != fdtab->desc[i]->pc)
> +			continue;
> +		descs = &fdtab->desc[i];
> +		desc = fdtab->desc[i];
> +		break;
> +	}
> +
> +	if (!desc) {
> +		verbose(env, "frame_desc: find_and_merge: cannot find frame descriptor for pc=%llu, creating new entry\n", pc);
> +		return -ENOENT;
> +	}
> +
> +	if (fd->off < 0)
> +		goto stack;

Nit: maybe write it down as

	if (fd->off >= 0)
		return merge_frame_desc(...);

     and avoid goto?

[...]

> +static int gen_exception_frame_desc_stack_entry(struct bpf_verifier_env *env, struct bpf_func_state *frame, int stack_off)
> +{
> +	int spi = stack_off / BPF_REG_SIZE, off = -stack_off - 1;
> +	struct bpf_reg_state *reg, not_init_reg, null_reg;
> +	int slot_type, ret;
> +
> +	__mark_reg_not_init(env, &not_init_reg);
> +	__mark_reg_known_zero(&null_reg);

__mark_reg_known_zero() does not set .type field,
thus null_reg.type value is undefined.

> +
> +	slot_type = frame->stack[spi].slot_type[BPF_REG_SIZE - 1];
> +	reg = &frame->stack[spi].spilled_ptr;
> +
> +	switch (slot_type) {
> +	case STACK_SPILL:
> +		/* We skip all kinds of scalar registers, except NULL values, which consume a slot. */
> +		if (is_spilled_scalar_reg(&frame->stack[spi]) && !register_is_null(&frame->stack[spi].spilled_ptr))
> +			break;
> +		ret = gen_exception_frame_desc_reg_entry(env, reg, off, frame->frameno);
> +		if (ret < 0)
> +			return ret;
> +		break;
> +	case STACK_DYNPTR:
> +		/* Keep iterating until we find the first slot. */
> +		if (!reg->dynptr.first_slot)
> +			break;
> +		ret = gen_exception_frame_desc_dynptr_entry(env, reg, off, frame->frameno);
> +		if (ret < 0)
> +			return ret;
> +		break;
> +	case STACK_ITER:
> +		/* Keep iterating until we find the first slot. */
> +		if (!reg->ref_obj_id)
> +			break;
> +		ret = gen_exception_frame_desc_iter_entry(env, reg, off, frame->frameno);
> +		if (ret < 0)
> +			return ret;
> +		break;
> +	case STACK_MISC:
> +	case STACK_INVALID:
> +		/* Create an invalid entry for MISC and INVALID */
> +		ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
> +		if (ret < 0)
> +			return 0;

No tests are failing if I comment out this block.
Looking at the merge_frame_desc() logic it appears to me that fd
entries with fd->type == NOT_INIT would only be merged with other
NOT_INIT entries. What is the point of having such entries at all?

> +		break;
> +	case STACK_ZERO:
> +		reg = &null_reg;
> +		for (int i = BPF_REG_SIZE - 1; i >= 0; i--) {
> +			if (frame->stack[spi].slot_type[i] != STACK_ZERO)
> +				reg = &not_init_reg;
> +		}
> +		ret = gen_exception_frame_desc_reg_entry(env, &null_reg, off, frame->frameno);
> +		if (ret < 0)
> +			return ret;

Same here, no tests are failing if STACK_ZERO block is commented.
In general, what is the point of adding STACK_ZERO entries?
There is logic in place to merge NULL and non-NULL entries,
but how is it different from not adding NULL entries in the first place?
find_and_merge_frame_desc() does a linear scan over bpf_exception_frame_desc->stack
and does not rely on entries being sorted by .off field.

> +		break;
> +	default:
> +		verbose(env, "verifier internal error: frame%d stack off=%d slot_type=%d missing handling for exception frame generation\n",
> +			frame->frameno, off, slot_type);
> +		return -EFAULT;
> +	}
> +	return 0;
> +}
Eduard Zingerman Feb. 16, 2024, 11:23 a.m. UTC | #2
On Thu, 2024-02-15 at 20:24 +0200, Eduard Zingerman wrote:
[...]

> > +	case STACK_MISC:
> > +	case STACK_INVALID:
> > +		/* Create an invalid entry for MISC and INVALID */
> > +		ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
> > +		if (ret < 0)
> > +			return 0;
> 
> No tests are failing if I comment out this block.
> Looking at the merge_frame_desc() logic it appears to me that fd
> entries with fd->type == NOT_INIT would only be merged with other
> NOT_INIT entries. What is the point of having such entries at all?

Oh, I got it, it's a mark that ensures that no merge would happen with
e.g. some resource pointer. Makes sense, sorry for the noise.

Still, I think that my STACK_ZERO remark was correct.
Kumar Kartikeya Dwivedi Feb. 16, 2024, 10:06 p.m. UTC | #3
On Fri, 16 Feb 2024 at 12:23, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-15 at 20:24 +0200, Eduard Zingerman wrote:
> [...]
>
> > > +   case STACK_MISC:
> > > +   case STACK_INVALID:
> > > +           /* Create an invalid entry for MISC and INVALID */
> > > +           ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
> > > +           if (ret < 0)
> > > +                   return 0;
> >
> > No tests are failing if I comment out this block.
> > Looking at the merge_frame_desc() logic it appears to me that fd
> > entries with fd->type == NOT_INIT would only be merged with other
> > NOT_INIT entries. What is the point of having such entries at all?
>
> Oh, I got it, it's a mark that ensures that no merge would happen with
> e.g. some resource pointer. Makes sense, sorry for the noise.
>
> Still, I think that my STACK_ZERO remark was correct.

I will add tests to exercise these cases, but the idea for STACK_ZERO
was to treat it as if we had a NULL value in that stack slot, thus
allowing merging with other resource pointers. Sometimes when NULL
initializing something, it can be marked as STACK_ZERO in the verifier
state. Therefore, we would prefer to treat it same as the case where a
scalar reg known to be zero is spilled to the stack (that is what we
do by using a fake struct bpf_reg_state).
Kumar Kartikeya Dwivedi Feb. 16, 2024, 10:24 p.m. UTC | #4
On Thu, 15 Feb 2024 at 19:24, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Thu, 2024-02-01 at 04:21 +0000, Kumar Kartikeya Dwivedi wrote:
>
> Question: are there any real-life programs adapted to use exceptions
> with cleanup feature? It would be interesting to see how robust
> one-descriptor-per-pc is in practice, and also how it affects memory
> consumption during verification.
>

I tried it on sched-ext schedulers with this series with some helper
macros and it worked well.
I can post stats on memory consumption with the next version (or make
it part of veristat, whichever is more desirable).

I think the downside of this approach shows up when you have different
resources in the same register/slot from two different paths which
then hit the same bpf_throw. It will not work in that case the way
things are set up right now. There are ways to address that (ranging
from compiler passes that hoist the throw call into the predecessor
basic blocks of the separate paths when such a pattern is detected, to
emitting path hints of resource types during JIT which bpf_throw can
pick up), but I haven't encountered any examples so far where this
came up, and when it does, you can mostly rewrite things a bit
differently to make it work.

One of the changes though that I plan to make in the next posting is
keeping duplicate entries for the same resource around when the first
frame descriptor is created for a pc. This increases the chances of
merging correctly in case the second path intersects with us in some
slots, but for the purposes of releasing resources, that intersection
is sufficient.

For example, you could have the same resource in R8 and R9, while the
second path may only have it in R9 and not R8; both can merge if we
erase R8 (or any other duplicates) in the original frame desc.

Right now, we simply do a release_resource so whenever we encounter
R8, we release its ref_obj_id and mark everything sharing the same id
invalid, so we never encounter R9 in the verifier state. In the above
case, these frame descs would not match but they could have if we had
discovered R9 before R8.

If you think this is not sufficient, please let me know. It is
certainly tricky and I might be underestimating the difficulty of
getting this right.

> The algorithm makes sense to me, a few comments/nits below.
>
> [...]
>
> > +static int find_and_merge_frame_desc(struct bpf_verifier_env *env, struct bpf_exception_frame_desc_tab *fdtab, u64 pc, struct bpf_frame_desc_reg_entry *fd)
> > +{
> > +     struct bpf_exception_frame_desc **descs = NULL, *desc = NULL, *p;
> > +     int ret = 0;
> > +
> > +     for (int i = 0; i < fdtab->cnt; i++) {
> > +             if (pc != fdtab->desc[i]->pc)
> > +                     continue;
> > +             descs = &fdtab->desc[i];
> > +             desc = fdtab->desc[i];
> > +             break;
> > +     }
> > +
> > +     if (!desc) {
> > +             verbose(env, "frame_desc: find_and_merge: cannot find frame descriptor for pc=%llu, creating new entry\n", pc);
> > +             return -ENOENT;
> > +     }
> > +
> > +     if (fd->off < 0)
> > +             goto stack;
>
> Nit: maybe write it down as
>
>         if (fd->off >= 0)
>                 return merge_frame_desc(...);
>
>      and avoid goto?
>

Ack, will fix.

> [...]
>
> > +static int gen_exception_frame_desc_stack_entry(struct bpf_verifier_env *env, struct bpf_func_state *frame, int stack_off)
> > +{
> > +     int spi = stack_off / BPF_REG_SIZE, off = -stack_off - 1;
> > +     struct bpf_reg_state *reg, not_init_reg, null_reg;
> > +     int slot_type, ret;
> > +
> > +     __mark_reg_not_init(env, &not_init_reg);
> > +     __mark_reg_known_zero(&null_reg);
>
> __mark_reg_known_zero() does not set .type field,
> thus null_reg.type value is undefined.
>

Hmm, good catch, will fix it.

> > +
> > +     slot_type = frame->stack[spi].slot_type[BPF_REG_SIZE - 1];
> > +     reg = &frame->stack[spi].spilled_ptr;
> > +
> > +     switch (slot_type) {
> > +     case STACK_SPILL:
> > +             /* We skip all kinds of scalar registers, except NULL values, which consume a slot. */
> > +             if (is_spilled_scalar_reg(&frame->stack[spi]) && !register_is_null(&frame->stack[spi].spilled_ptr))
> > +                     break;
> > +             ret = gen_exception_frame_desc_reg_entry(env, reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return ret;
> > +             break;
> > +     case STACK_DYNPTR:
> > +             /* Keep iterating until we find the first slot. */
> > +             if (!reg->dynptr.first_slot)
> > +                     break;
> > +             ret = gen_exception_frame_desc_dynptr_entry(env, reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return ret;
> > +             break;
> > +     case STACK_ITER:
> > +             /* Keep iterating until we find the first slot. */
> > +             if (!reg->ref_obj_id)
> > +                     break;
> > +             ret = gen_exception_frame_desc_iter_entry(env, reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return ret;
> > +             break;
> > +     case STACK_MISC:
> > +     case STACK_INVALID:
> > +             /* Create an invalid entry for MISC and INVALID */
> > +             ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return 0;
>
> No tests are failing if I comment out this block.
> Looking at the merge_frame_desc() logic it appears to me that fd
> entries with fd->type == NOT_INIT would only be merged with other
> NOT_INIT entries. What is the point of having such entries at all?
>

I assume you figured it out based on the other email.
Basically, we create entries so that no merge can occur for the slot.

> > +             break;
> > +     case STACK_ZERO:
> > +             reg = &null_reg;
> > +             for (int i = BPF_REG_SIZE - 1; i >= 0; i--) {
> > +                     if (frame->stack[spi].slot_type[i] != STACK_ZERO)
> > +                             reg = &not_init_reg;
> > +             }
> > +             ret = gen_exception_frame_desc_reg_entry(env, &null_reg, off, frame->frameno);
> > +             if (ret < 0)
> > +                     return ret;
>
> Same here, no tests are failing if STACK_ZERO block is commented.
> In general, what is the point of adding STACK_ZERO entries?
> There is logic in place to merge NULL and non-NULL entries,
> but how is it different from not adding NULL entries in the first place?
> find_and_merge_frame_desc() does a linear scan over bpf_exception_frame_desc->stack
> and does not rely on entries being sorted by .off field.
>

Answered on the other email as well. I will add more tests for this case.
STACK_ZERO also creates NULL entries; we just treat them the same as a
NULL reg and insert an entry using a fake reg initialized to zero.
Also, I think we need to pass reg, not &null_reg, in this case. Will fix
this part of the code and add tests.

While it does not rely on entries being sorted, we still find the one
with the same offset.
I can sort for a binary search at runtime; I was mostly worried that
constant re-sorting during verification may end up costing more when
the number of frame descriptors is low, but I think it makes sense at
least for runtime, when the array is only searched.
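
For illustration, a sorted lookup at runtime could look something like
this rough sketch (names are made up and not part of this series;
sorting would happen once when the table is attached to the prog, not
during verification):

#include <linux/bsearch.h>

static int frame_desc_cmp(const void *key, const void *elem)
{
	u64 pc = *(const u64 *)key;
	const struct bpf_exception_frame_desc *desc = *(struct bpf_exception_frame_desc * const *)elem;

	if (pc < desc->pc)
		return -1;
	return pc > desc->pc;
}

static struct bpf_exception_frame_desc *
find_frame_desc_sorted(struct bpf_exception_frame_desc_tab *fdtab, u64 pc)
{
	struct bpf_exception_frame_desc **res;

	res = bsearch(&pc, fdtab->desc, fdtab->cnt, sizeof(*fdtab->desc), frame_desc_cmp);
	return res ? *res : NULL;
}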

> > +             break;
> > +     default:
> > +             verbose(env, "verifier internal error: frame%d stack off=%d slot_type=%d missing handling for exception frame generation\n",
> > +                     frame->frameno, off, slot_type);
> > +             return -EFAULT;
> > +     }
> > +     return 0;
> > +}
Eduard Zingerman Feb. 17, 2024, 5:14 p.m. UTC | #5
On Fri, 2024-02-16 at 23:06 +0100, Kumar Kartikeya Dwivedi wrote:
[...]

> I will add tests to exercise these cases, but the idea for STACK_ZERO
> was to treat it as if we had a NULL value in that stack slot, thus
> allowing merging with other resource pointers. Sometimes when NULL
> initializing something, it can be marked as STACK_ZERO in the verifier
> state. Therefore, we would prefer to treat it same as the case where a
> scalar reg known to be zero is spilled to the stack (that is what we
> do by using a fake struct bpf_reg_state).

Agree that it is useful to merge 0/NULL/STACK_ZERO with PTR_TO_SOMETHING.
What I meant is that merging with 0 is a noop and there is no need to
add a new fd entry. However, I missed the following check:

+ static int gen_exception_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
+ {
+ 	struct bpf_frame_desc_reg_entry fd = {};
+ 
+ 	if ((!reg->ref_obj_id && reg->type != NOT_INIT) || reg->type == SCALAR_VALUE)
+ 		return 0;

So, the intent is to skip adding fd entry if register has scalar type, right?
I tried the following test case:

SEC("?tc")
__success
int test(struct __sk_buff *ctx)
{
    asm volatile (
       "r7 = *(u32 *)(r1 + 0);		\
	r1 = %[ringbuf] ll;		\
	r2 = 8;				\
	r3 = 0;				\
	r0 = 0;				\
	if r7 > 42 goto +1;		\
	call %[bpf_ringbuf_reserve];	\
	*(u64 *)(r10 - 8) = r0;		\
	r0 = 0;				\
	r1 = 0;				\
	call bpf_throw;			\
    "	:
	: __imm(bpf_ringbuf_reserve),
	  __imm_addr(ringbuf)
	: "memory");
    return 0;
}

And it adds fp[-8] entry for one branch and skips fp[-8] for the other.
However, the following test passes as well (note 'r0 = 7'):

SEC("?tc")
__success
int same_resource_many_regs(struct __sk_buff *ctx)
{
    asm volatile (
       "r7 = *(u32 *)(r1 + 0);		\
	r1 = %[ringbuf] ll;		\
	r2 = 8;				\
	r3 = 0;				\
	r0 = 7;	/* !!! */		\
	if r7 > 42 goto +1;		\
	call %[bpf_ringbuf_reserve];	\
	*(u64 *)(r10 - 8) = r0;		\
	r0 = 0;				\
	r1 = 0;				\
	call bpf_throw;			\
    "	:
	: __imm(bpf_ringbuf_reserve),
	  __imm_addr(ringbuf)
	: "memory");
    return 0;
}

Which is probably not correct, as scalar 7 would be used as a
parameter for ringbuf destructor, right?
Kumar Kartikeya Dwivedi Feb. 20, 2024, 9:58 p.m. UTC | #6
On Sat, 17 Feb 2024 at 18:14, Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Fri, 2024-02-16 at 23:06 +0100, Kumar Kartikeya Dwivedi wrote:
> [...]
>
> > I will add tests to exercise these cases, but the idea for STACK_ZERO
> > was to treat it as if we had a NULL value in that stack slot, thus
> > allowing merging with other resource pointers. Sometimes when NULL
> > initializing something, it can be marked as STACK_ZERO in the verifier
> > state. Therefore, we would prefer to treat it same as the case where a
> > scalar reg known to be zero is spilled to the stack (that is what we
> > do by using a fake struct bpf_reg_state).
>
> Agree that it is useful to merge 0/NULL/STACK_ZERO with PTR_TO_SOMETHING.
> What I meant is that merging with 0 is a noop and there is no need to
> add a new fd entry. However, I missed the following check:
>
> + static int gen_exception_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
> + {
> +       struct bpf_frame_desc_reg_entry fd = {};
> +
> +       if ((!reg->ref_obj_id && reg->type != NOT_INIT) || reg->type == SCALAR_VALUE)
> +               return 0;
>
> So, the intent is to skip adding fd entry if register has scalar type, right?
> I tried the following test case:
>
> SEC("?tc")
> __success
> int test(struct __sk_buff *ctx)
> {
>     asm volatile (
>        "r7 = *(u32 *)(r1 + 0);          \
>         r1 = %[ringbuf] ll;             \
>         r2 = 8;                         \
>         r3 = 0;                         \
>         r0 = 0;                         \
>         if r7 > 42 goto +1;             \
>         call %[bpf_ringbuf_reserve];    \
>         *(u64 *)(r10 - 8) = r0;         \
>         r0 = 0;                         \
>         r1 = 0;                         \
>         call bpf_throw;                 \
>     "   :
>         : __imm(bpf_ringbuf_reserve),
>           __imm_addr(ringbuf)
>         : "memory");
>     return 0;
> }
>
> And it adds fp[-8] entry for one branch and skips fp[-8] for the other.
> However, the following test passes as well (note 'r0 = 7'):
>
> SEC("?tc")
> __success
> int same_resource_many_regs(struct __sk_buff *ctx)
> {
>     asm volatile (
>        "r7 = *(u32 *)(r1 + 0);          \
>         r1 = %[ringbuf] ll;             \
>         r2 = 8;                         \
>         r3 = 0;                         \
>         r0 = 7; /* !!! */               \
>         if r7 > 42 goto +1;             \
>         call %[bpf_ringbuf_reserve];    \
>         *(u64 *)(r10 - 8) = r0;         \
>         r0 = 0;                         \
>         r1 = 0;                         \
>         call bpf_throw;                 \
>     "   :
>         : __imm(bpf_ringbuf_reserve),
>           __imm_addr(ringbuf)
>         : "memory");
>     return 0;
> }
>
> Which is probably not correct, as scalar 7 would be used as a
> parameter for ringbuf destructor, right?

I think you are right, we probably need to create an unmergeable slot
in case we find a non-zero scalar value in the stack as well.
I will fix this and add tests as well.

Thanks a lot for your thorough reviews! Really appreciate it.

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1ebbee1d648e..463c8d22ad72 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1424,6 +1424,7 @@  struct btf_mod_pair {
 };
 
 struct bpf_kfunc_desc_tab;
+struct bpf_exception_frame_desc_tab;
 
 struct bpf_prog_aux {
 	atomic64_t refcnt;
@@ -1518,6 +1519,7 @@  struct bpf_prog_aux {
 	struct module *mod;
 	u32 num_exentries;
 	struct exception_table_entry *extable;
+	struct bpf_exception_frame_desc_tab *fdtab;
 	union {
 		struct work_struct work;
 		struct rcu_head	rcu;
@@ -3367,4 +3369,29 @@  static inline bool bpf_is_subprog(const struct bpf_prog *prog)
 	return prog->aux->func_idx != 0;
 }
 
+struct bpf_frame_desc_reg_entry {
+	u32 type;
+	s16 spill_type;
+	union {
+		s16 off;
+		u16 regno;
+	};
+	struct btf *btf;
+	u32 btf_id;
+};
+
+struct bpf_exception_frame_desc {
+	u64 pc;
+	u32 stack_cnt;
+	struct bpf_frame_desc_reg_entry regs[4];
+	struct bpf_frame_desc_reg_entry stack[];
+};
+
+struct bpf_exception_frame_desc_tab {
+	u32 cnt;
+	struct bpf_exception_frame_desc **desc;
+};
+
+void bpf_exception_frame_desc_tab_free(struct bpf_exception_frame_desc_tab *fdtab);
+
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 5482701e6ad9..0113a3a940e2 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -631,6 +631,8 @@  struct bpf_subprog_info {
 
 	u8 arg_cnt;
 	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
+
+	struct bpf_exception_frame_desc_tab *fdtab;
 };
 
 struct bpf_verifier_env;
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 71c459a51d9e..995a4dcfa970 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2734,6 +2734,14 @@  static void bpf_free_used_btfs(struct bpf_prog_aux *aux)
 	kfree(aux->used_btfs);
 }
 
+void bpf_exception_frame_desc_tab_free(struct bpf_exception_frame_desc_tab *fdtab)
+{
+	if (!fdtab)
+		return;
+	for (int i = 0; i < fdtab->cnt; i++)
+		kfree(fdtab->desc[i]);
+}
+
 static void bpf_prog_free_deferred(struct work_struct *work)
 {
 	struct bpf_prog_aux *aux;
@@ -2747,6 +2755,11 @@  static void bpf_prog_free_deferred(struct work_struct *work)
 	if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID)
 		bpf_cgroup_atype_put(aux->cgroup_atype);
 #endif
+	/* Free all exception frame descriptors */
+	for (int i = 0; i < aux->func_cnt; i++) {
+		bpf_exception_frame_desc_tab_free(aux->func[i]->aux->fdtab);
+		aux->func[i]->aux->fdtab = NULL;
+	}
 	bpf_free_used_maps(aux);
 	bpf_free_used_btfs(aux);
 	if (bpf_prog_is_dev_bound(aux))
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 03ad9a9d47c9..27233c308d83 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10004,6 +10004,366 @@  record_func_key(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 	return 0;
 }
 
+static void print_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_frame_desc_reg_entry *fd, const char *pfx)
+{
+	const char *type = fd->off < 0 ? "stack" : "reg";
+	const char *key = fd->off < 0 ? "off" : "regno";
+	const char *spill_type;
+
+	switch (fd->spill_type) {
+	case STACK_INVALID:
+		spill_type = "<unknown>";
+		break;
+	case STACK_SPILL:
+		spill_type = "reg";
+		break;
+	case STACK_DYNPTR:
+		spill_type = "dynptr";
+		break;
+	case STACK_ITER:
+		spill_type = "iter";
+		break;
+	default:
+		spill_type = "???";
+		break;
+	}
+	verbose(env, "frame_desc: %s%s fde: %s=%d spill_type=%s ", pfx, type, key, fd->off, spill_type);
+	if (fd->btf) {
+		const struct btf_type *t = btf_type_by_id(fd->btf, fd->btf_id);
+
+		verbose(env, "type=%s%s btf=%s btf_id=%d\n", fd->off < 0 ? "" : "ptr_",
+			btf_name_by_offset(fd->btf, t->name_off), btf_get_name(fd->btf), fd->btf_id);
+	} else {
+		verbose(env, "type=%s\n", fd->spill_type == STACK_DYNPTR ? "ringbuf" : reg_type_str(env, fd->type));
+	}
+}
+
+static int merge_frame_desc(struct bpf_verifier_env *env, struct bpf_frame_desc_reg_entry *ofd, struct bpf_frame_desc_reg_entry *fd)
+{
+	int ofd_type, fd_type;
+
+	/* If ofd->off/regno is 0, this is uninitialized reg entry, just merge new entry. */
+	if (!ofd->off)
+		goto merge_new;
+	/* Exact merge for dynptr, iter, reg, stack. */
+	if (!memcmp(ofd, fd, sizeof(*ofd)))
+		goto none;
+	/* First, for a successful merge, both spill_type should be same.*/
+	if (ofd->spill_type != fd->spill_type)
+		goto fail;
+	/* Then, both should correspond to a reg or stack entry for non-exact merge. */
+	if (ofd->spill_type != STACK_SPILL && ofd->spill_type != STACK_INVALID)
+		goto fail;
+	ofd_type = ofd->type;
+	fd_type = fd->type;
+	/* One of the old or new entry must be NULL, if both are not same. */
+	if (ofd_type == fd_type)
+		goto none;
+	if (ofd_type != SCALAR_VALUE && fd_type != SCALAR_VALUE)
+		goto fail;
+	if (fd_type == SCALAR_VALUE)
+		goto none;
+	verbose(env, "frame_desc: merge: merging new frame desc entry into old\n");
+	print_frame_desc_reg_entry(env, ofd, "old ");
+merge_new:
+	if (!ofd->off)
+		verbose(env, "frame_desc: merge: creating new frame desc entry\n");
+	print_frame_desc_reg_entry(env, fd, "new ");
+	*ofd = *fd;
+	return 0;
+none:
+	verbose(env, "frame_desc: merge: no merging needed of new frame desc entry into old\n");
+	print_frame_desc_reg_entry(env, ofd, "old ");
+	print_frame_desc_reg_entry(env, fd, "new ");
+	return 0;
+fail:
+	verbose(env, "frame_desc: merge: failed to merge old and new frame desc entry\n");
+	print_frame_desc_reg_entry(env, ofd, "old ");
+	print_frame_desc_reg_entry(env, fd, "new ");
+	return -EINVAL;
+}
+
+static int find_and_merge_frame_desc(struct bpf_verifier_env *env, struct bpf_exception_frame_desc_tab *fdtab, u64 pc, struct bpf_frame_desc_reg_entry *fd)
+{
+	struct bpf_exception_frame_desc **descs = NULL, *desc = NULL, *p;
+	int ret = 0;
+
+	for (int i = 0; i < fdtab->cnt; i++) {
+		if (pc != fdtab->desc[i]->pc)
+			continue;
+		descs = &fdtab->desc[i];
+		desc = fdtab->desc[i];
+		break;
+	}
+
+	if (!desc) {
+		verbose(env, "frame_desc: find_and_merge: cannot find frame descriptor for pc=%llu, creating new entry\n", pc);
+		return -ENOENT;
+	}
+
+	if (fd->off < 0)
+		goto stack;
+	/* We didn't find a match for regno or offset, fill it into the frame descriptor. */
+	return merge_frame_desc(env, &desc->regs[fd->regno - BPF_REG_6], fd);
+
+stack:
+	for (int i = 0; i < desc->stack_cnt; i++) {
+		struct bpf_frame_desc_reg_entry *ofd = desc->stack + i;
+
+		if (ofd->off != fd->off)
+			continue;
+		ret = merge_frame_desc(env, ofd, fd);
+		if (ret < 0)
+			return ret;
+		return 0;
+	}
+	p = krealloc(desc, offsetof(typeof(*desc), stack[desc->stack_cnt + 1]), GFP_USER | __GFP_ZERO);
+	if (!p) {
+		return -ENOMEM;
+	}
+	verbose(env, "frame_desc: merge: creating new frame desc entry\n");
+	print_frame_desc_reg_entry(env, fd, "new ");
+	desc = p;
+	desc->stack[desc->stack_cnt] = *fd;
+	desc->stack_cnt++;
+	*descs = desc;
+	return 0;
+}
+
+/* Implementation details:
+ * This function is responsible for pushing a prepared bpf_frame_desc_reg_entry
+ * into the frame descriptor array tied to each subprog. The first step is
+ * ensuring the array is allocated and has enough capacity. Second, we must find
+ * if there is an existing descriptor already for the program counter under
+ * consideration, and try to report an error if we see conflicting frame
+ * descriptor generation requests for the same instruction in the program.
+ * Note that by default, we let NULL registers and stack slots occupy an entry.
+ * This is done so that any future non-NULL registers or stack slots at the same
+ * regno or offset can be satisfied by changing the type of entry to a "stronger"
+ * pointer type. The release handler can deal with NULL or valid values,
+ * therefore such a logic allows handling cases where the program may only have
+ * a pointer in some of the program paths and NULL in others while reaching the
+ * same instruction that causes an exception to be thrown.
+ * Likewise, a NULL entry merges into the stronger pointer type entry when a
+ * frame descriptor already exists before pushing a new one.
+ */
+static int push_exception_frame_desc(struct bpf_verifier_env *env, int frameno, struct bpf_frame_desc_reg_entry *fd)
+{
+	struct bpf_func_state *frame = env->cur_state->frame[frameno], *curframe = cur_func(env);
+	struct bpf_subprog_info *si = subprog_info(env, frame->subprogno);
+	struct bpf_exception_frame_desc_tab *fdtab = si->fdtab;
+	struct bpf_exception_frame_desc **desc;
+	u64 pc = env->insn_idx;
+	int ret;
+
+	/* If this is not the current frame, then we need to figure out the callsite
+	 * for its callee to identify the pc.
+	 */
+	if (frameno != curframe->frameno)
+		pc = env->cur_state->frame[frameno + 1]->callsite;
+
+	if (!fdtab) {
+		fdtab = kzalloc(sizeof(*si->fdtab), GFP_USER);
+		if (!fdtab)
+			return -ENOMEM;
+		fdtab->desc = kzalloc(sizeof(*fdtab->desc), GFP_USER);
+		if (!fdtab->desc) {
+			kfree(fdtab);
+			return -ENOMEM;
+		}
+		si->fdtab = fdtab;
+	}
+
+	ret = find_and_merge_frame_desc(env, fdtab, pc, fd);
+	if (!ret)
+		return 0;
+	if (ret < 0 && ret != -ENOENT)
+		return ret;
+	/* We didn't find a frame descriptor for pc, grow the array and insert it. */
+	desc = realloc_array(fdtab->desc, fdtab->cnt ?: 1, fdtab->cnt + 1, sizeof(*fdtab->desc));
+	if (!desc) {
+		return -ENOMEM;
+	}
+	fdtab->desc = desc;
+	fdtab->desc[fdtab->cnt] = kzalloc(sizeof(*fdtab->desc[0]), GFP_USER);
+	if (!fdtab->desc[fdtab->cnt])
+		return -ENOMEM;
+	fdtab->desc[fdtab->cnt]->pc = pc;
+	fdtab->cnt++;
+	return find_and_merge_frame_desc(env, fdtab, pc, fd);
+}
+
+static int gen_exception_frame_desc_reg_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
+{
+	struct bpf_frame_desc_reg_entry fd = {};
+
+	if ((!reg->ref_obj_id && reg->type != NOT_INIT) || reg->type == SCALAR_VALUE)
+		return 0;
+	if (base_type(reg->type) == PTR_TO_BTF_ID)
+		fd.btf = reg->btf;
+	fd.type = reg->type & ~PTR_MAYBE_NULL;
+	fd.btf_id = fd.btf ? reg->btf_id : 0;
+	fd.spill_type = off < 0 ? STACK_SPILL : STACK_INVALID;
+	fd.off = off;
+	verbose(env, "frame_desc: frame%d: insn_idx=%d %s=%d size=%d ref_obj_id=%d type=%s\n",
+		frameno, env->insn_idx, off < 0 ? "off" : "regno", off, BPF_REG_SIZE, reg->ref_obj_id, reg_type_str(env, reg->type));
+	return push_exception_frame_desc(env, frameno, &fd);
+}
+
+static int gen_exception_frame_desc_dynptr_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
+{
+	struct bpf_frame_desc_reg_entry fd = {};
+	int type = reg->dynptr.type;
+
+	/* We only need to generate an entry when the dynptr is refcounted,
+	 * otherwise it encapsulates no resource that needs to be released.
+	 */
+	if (!dynptr_type_refcounted(type))
+		return 0;
+	switch (type) {
+	case BPF_DYNPTR_TYPE_RINGBUF:
+		fd.type = BPF_DYNPTR_TYPE_RINGBUF;
+		fd.spill_type = STACK_DYNPTR;
+		fd.off = off;
+		verbose(env, "frame_desc: frame%d: insn_idx=%d off=%d size=%lu dynptr_ringbuf\n", frameno, env->insn_idx, off,
+			BPF_DYNPTR_NR_SLOTS * BPF_REG_SIZE);
+		break;
+	default:
+		verbose(env, "verifier internal error: refcounted dynptr type unhandled for exception frame descriptor entry\n");
+		return -EFAULT;
+	}
+	return push_exception_frame_desc(env, frameno, &fd);
+}
+
+static int add_used_btf(struct bpf_verifier_env *env, struct btf *btf);
+
+static int gen_exception_frame_desc_iter_entry(struct bpf_verifier_env *env, struct bpf_reg_state *reg, int off, int frameno)
+{
+	struct bpf_frame_desc_reg_entry fd = {};
+	struct btf *btf = reg->iter.btf;
+	u32 btf_id = reg->iter.btf_id;
+	const struct btf_type *t;
+	int ret;
+
+	fd.btf = btf;
+	fd.type = reg->type;
+	fd.btf_id = btf_id;
+	fd.spill_type = STACK_ITER;
+	fd.off = off;
+	t = btf_type_by_id(btf, btf_id);
+	verbose(env, "frame_desc: frame%d: insn_idx=%d off=%d size=%u ref_obj_id=%d iter_%s\n",
+		frameno, env->insn_idx, off, t->size, reg->ref_obj_id, btf_name_by_offset(btf, t->name_off));
+	btf_get(btf);
+	ret = add_used_btf(env, btf);
+	if (ret < 0) {
+		btf_put(btf);
+		return ret;
+	}
+	return push_exception_frame_desc(env, frameno, &fd);
+}
+
+static int gen_exception_frame_desc_stack_entry(struct bpf_verifier_env *env, struct bpf_func_state *frame, int stack_off)
+{
+	int spi = stack_off / BPF_REG_SIZE, off = -stack_off - 1;
+	struct bpf_reg_state *reg, not_init_reg, null_reg;
+	int slot_type, ret;
+
+	__mark_reg_not_init(env, &not_init_reg);
+	__mark_reg_known_zero(&null_reg);
+
+	slot_type = frame->stack[spi].slot_type[BPF_REG_SIZE - 1];
+	reg = &frame->stack[spi].spilled_ptr;
+
+	switch (slot_type) {
+	case STACK_SPILL:
+		/* We skip all kinds of scalar registers, except NULL values, which consume a slot. */
+		if (is_spilled_scalar_reg(&frame->stack[spi]) && !register_is_null(&frame->stack[spi].spilled_ptr))
+			break;
+		ret = gen_exception_frame_desc_reg_entry(env, reg, off, frame->frameno);
+		if (ret < 0)
+			return ret;
+		break;
+	case STACK_DYNPTR:
+		/* Keep iterating until we find the first slot. */
+		if (!reg->dynptr.first_slot)
+			break;
+		ret = gen_exception_frame_desc_dynptr_entry(env, reg, off, frame->frameno);
+		if (ret < 0)
+			return ret;
+		break;
+	case STACK_ITER:
+		/* Keep iterating until we find the first slot. */
+		if (!reg->ref_obj_id)
+			break;
+		ret = gen_exception_frame_desc_iter_entry(env, reg, off, frame->frameno);
+		if (ret < 0)
+			return ret;
+		break;
+	case STACK_MISC:
+	case STACK_INVALID:
+		/* Create an invalid entry for MISC and INVALID */
+		ret = gen_exception_frame_desc_reg_entry(env, &not_init_reg, off, frame->frameno);
+		if (ret < 0)
+			return 0;
+		break;
+	case STACK_ZERO:
+		reg = &null_reg;
+		for (int i = BPF_REG_SIZE - 1; i >= 0; i--) {
+			if (frame->stack[spi].slot_type[i] != STACK_ZERO)
+				reg = &not_init_reg;
+		}
+		ret = gen_exception_frame_desc_reg_entry(env, &null_reg, off, frame->frameno);
+		if (ret < 0)
+			return ret;
+		break;
+	default:
+		verbose(env, "verifier internal error: frame%d stack off=%d slot_type=%d missing handling for exception frame generation\n",
+			frame->frameno, off, slot_type);
+		return -EFAULT;
+	}
+	return 0;
+}
+
+/* We generate exception descriptors for all frames at the current program
+ * counter.  For caller frames, we use their callsite as their program counter,
+ * and we go on generating it until the main frame.
+ *
+ * It's necessary to detect whether the stack layout is different, in that case
+ * frame descriptor generation should fail and we cannot really support runtime
+ * unwinding in that case.
+ */
+static int gen_exception_frame_descs(struct bpf_verifier_env *env)
+{
+	struct bpf_reg_state not_init_reg;
+	int ret;
+
+	__mark_reg_not_init(env, &not_init_reg);
+
+	for (int frameno = env->cur_state->curframe; frameno >= 0; frameno--) {
+		struct bpf_func_state *frame = env->cur_state->frame[frameno];
+
+		verbose(env, "frame_desc: frame%d: Stack:\n", frameno);
+		for (int i = BPF_REG_SIZE - 1; i < frame->allocated_stack; i += BPF_REG_SIZE) {
+			ret = gen_exception_frame_desc_stack_entry(env, frame, i);
+			if (ret < 0)
+				return ret;
+		}
+
+		verbose(env, "frame_desc: frame%d: Registers:\n", frameno);
+		for (int i = BPF_REG_6; i < BPF_REG_FP; i++) {
+			struct bpf_reg_state *reg = &frame->regs[i];
+
+			/* Treat havoc scalars as incompatible type. */
+			if (reg->type == SCALAR_VALUE && !register_is_null(reg))
+				reg = &not_init_reg;
+			ret = gen_exception_frame_desc_reg_entry(env, reg, i, frame->frameno);
+			if (ret < 0)
+				return ret;
+		}
+	}
+	return 0;
+}
+
 static int check_reference_leak(struct bpf_verifier_env *env, bool exception_exit)
 {
 	struct bpf_func_state *state = cur_func(env);
@@ -17694,12 +18054,18 @@  static int do_check(struct bpf_verifier_env *env)
 					err = check_func_call(env, insn, &env->insn_idx);
 					if (!err && env->cur_state->global_subprog_call_exception) {
 						env->cur_state->global_subprog_call_exception = false;
+						err = gen_exception_frame_descs(env);
+						if (err < 0)
+							return err;
 						exception_exit = true;
 						goto process_bpf_exit_full;
 					}
 				} else if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) {
 					err = check_kfunc_call(env, insn, &env->insn_idx);
 					if (!err && is_bpf_throw_kfunc(insn)) {
+						err = gen_exception_frame_descs(env);
+						if (err < 0)
+							return err;
 						exception_exit = true;
 						goto process_bpf_exit_full;
 					}
@@ -21184,6 +21550,8 @@  int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 		mutex_unlock(&bpf_verifier_lock);
 	vfree(env->insn_aux_data);
 err_free_env:
+	for (int i = 0; i < env->subprog_cnt; i++)
+		bpf_exception_frame_desc_tab_free(env->subprog_info[i].fdtab);
 	kfree(env);
 	return ret;
 }
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
index 28602f905d7d..5a517065b4e6 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -213,7 +213,7 @@  __noinline static int subprog_cb_ref(u32 i, void *ctx)
 }
 
 SEC("?tc")
-__failure __msg("Unreleased reference")
+__failure __msg("cannot be called from callback subprog 1")
 int reject_with_cb_reference(void *ctx)
 {
 	struct foo *f;
@@ -235,7 +235,7 @@  int reject_with_cb(void *ctx)
 }
 
 SEC("?tc")
-__failure __msg("Unreleased reference")
+__success
 int reject_with_subprog_reference(void *ctx)
 {
 	return subprog_ref(ctx) + 1;