
[RFC,bpf-next,v1,4/9] bpf: Handle throwing BPF callbacks in helpers and kfuncs

Message ID 20230405004239.1375399-5-memxor@gmail.com (mailing list archive)
State RFC
Delegated to: BPF
Series Exceptions - 1/2

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-4 pending Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for aarch64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for build for x86_64 with llvm-16
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 135 this patch: 135
netdev/cc_maintainers warning 8 maintainers not CCed: song@kernel.org sdf@google.com haoluo@google.com yhs@fb.com john.fastabend@gmail.com kpsingh@kernel.org jolsa@kernel.org martin.lau@linux.dev
netdev/build_clang success Errors and warnings before: 32 this patch: 32
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 135 this patch: 135
netdev/checkpatch warning CHECK: multiple assignments should be avoided WARNING: line length of 100 exceeds 80 columns WARNING: line length of 104 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns WARNING: line length of 91 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Kumar Kartikeya Dwivedi April 5, 2023, 12:42 a.m. UTC
Enable handling of callbacks that throw within helpers and kfuncs taking
them. The problem with helpers is that the exception state can be
manipulated by a tracing program which is invoked between the check for
an exception after calling the callback and the return back to the
program. Hence, helpers always return -EJUKEBOX whenever they detect an
exception. Only one kfunc takes a callback (bpf_rbtree_add), so it is
marked notrace to avoid this pitfall.

This allows us to use bpf_get_exception to detect the thrown case, and
in case we miss the exception event, we fall back on the return code to
unwind.
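
For context, a minimal, hypothetical program-side sketch of the situation being
handled: a bpf_loop callback that throws via the bpf_throw() kfunc introduced
earlier in this series. The zero-argument bpf_throw() declaration and the
program itself are assumptions for illustration, not taken from the patch.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

/* Assumed declaration of the kfunc added earlier in this series. */
extern void bpf_throw(void) __ksym;

static long cb(u64 index, void *ctx)
{
	long *sum = ctx;

	if (*sum > 1000)	/* arbitrary assertion, for illustration only */
		bpf_throw();	/* sets exception state; bpf_loop must notice it */
	*sum += index;
	return 0;		/* 0 means continue iterating */
}

SEC("tc")
int throwing_loop(struct __sk_buff *skb)
{
	long sum = 0;

	/* If cb() throws, bpf_loop() now detects it via bpf_get_exception()
	 * and returns -EJUKEBOX so the caller can be unwound as well.
	 */
	bpf_loop(100, cb, &sum, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";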

TODO: It might be possible to simply check the return code for helpers
or kfuncs taking callbacks when check_helper_ret_code = true, and not
bother with exception state. This should lead to less code being
generated per call site. For all other cases, we can rely on
bpf_get_exception, and ensure that the helper/kfunc uses notrace to
avoid invocation of tracing programs that clobber exception state on the
return path. But make this change in v2 after ensuring the
current->bpf_exception_thrown approach is acceptable.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 include/linux/bpf_verifier.h |   1 +
 kernel/bpf/arraymap.c        |   4 +-
 kernel/bpf/bpf_iter.c        |   2 +
 kernel/bpf/hashtab.c         |   4 +-
 kernel/bpf/helpers.c         |  18 +++--
 kernel/bpf/ringbuf.c         |   4 ++
 kernel/bpf/task_iter.c       |   2 +
 kernel/bpf/verifier.c        | 129 +++++++++++++++++++++++++++++++++++
 8 files changed, 156 insertions(+), 8 deletions(-)

Comments

Alexei Starovoitov April 6, 2023, 2:21 a.m. UTC | #1
On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
>  
>  	for (i = 0; i < nr_loops; i++) {
>  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> +		if (bpf_get_exception())
> +			return -EJUKEBOX;

This is too slow.
We cannot afford a call and conditional here.
Some time ago folks tried bpf_loop() and went back to bounded loops, because
the overhead of the indirect call was not acceptable.
After that we added inlining of bpf_loop() to bring the overhead to a minimum.
With the prog->aux->exception[] approach it might be ok-ish,
but my preference would be to disallow throw in callbacks.
timer cb and rbtree_add cb are typically small.
bpf_loop cb can be big, but we have open-coded iterators now.
So disabling asserts in cb-s is probably an acceptable trade-off.

The choice of error name is odd, tbh.
Kumar Kartikeya Dwivedi April 7, 2023, 12:07 a.m. UTC | #2
On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> >
> >  	for (i = 0; i < nr_loops; i++) {
> >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > +		if (bpf_get_exception())
> > +			return -EJUKEBOX;
>
> This is too slow.
> We cannot afford a call and conditional here.

There are two more options here: have two variants, one with and one without
the check (an always_inline template, with bpf_loop vs bpf_loop_except calling
functions which pass false/true), and dispatch to the appropriate one based on
whether the callback throws or not (so the cost is not paid for current users
at all). Secondly, we can avoid repeated calls by hoisting the call out and
saving the pointer to exception state, which makes it a bit less costly.
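
A minimal sketch of the first option, just to make the dispatch idea concrete.
The bpf_loop_except name and the may_throw plumbing are made up here, the real
helper's flags/nr_loops error checking is omitted, and bpf_get_exception is the
interface from this series:

static __always_inline long __bpf_loop_impl(u32 nr_loops, bpf_callback_t callback,
					    void *callback_ctx, bool may_throw)
{
	u64 ret;
	u32 i;

	for (i = 0; i < nr_loops; i++) {
		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
		/* may_throw is a compile-time constant, so this check is
		 * compiled out entirely for the non-throwing variant.
		 */
		if (may_throw && bpf_get_exception())
			return -EJUKEBOX;
		/* return value: 0 - continue, 1 - stop and return */
		if (ret)
			return i + 1;
	}
	return i;
}

BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
	   u64, flags)
{
	return __bpf_loop_impl(nr_loops, (bpf_callback_t)callback_fn, callback_ctx, false);
}

/* Hypothetical variant the verifier would dispatch to when the callback can throw. */
BPF_CALL_4(bpf_loop_except, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
	   u64, flags)
{
	return __bpf_loop_impl(nr_loops, (bpf_callback_t)callback_fn, callback_ctx, true);
}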

> Some time ago folks tried bpf_loop() and went back to bounded loop, because
> the overhead of indirect call was not acceptable.
> After that we've added inlining of bpf_loop() to make overhead to the minimum.
> With prog->aux->exception[] approach it might be ok-ish,
> but my preference would be to disallow throw in callbacks.
> timer cb, rbtree_add cb are typically small.
> bpf_loop cb can be big, but we have open coded iterators now.
> So disabling asserts in cb-s is probably acceptable trade-off.

If the only reason to avoid them is the added performance cost, we can work
towards eliminating that when bpf_throw is not used (see above). I agree that
supporting it everywhere means thinking about a lot more corner cases, but I
feel it would be less surprising if bpf_assert simply worked everywhere.
Another reason is that if it's used within a shared static function that both
the main program and callbacks call into, it will be a bit annoying that it
doesn't work in one context.

>
> The choice of error name is odd, tbh.

I was trying to choose something that helpers will never pass back to the
program to avoid conflicts. Open to other suggestions (but it may not matter
depending on the discussion above).
Alexei Starovoitov April 7, 2023, 2:15 a.m. UTC | #3
On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > >
> > >  	for (i = 0; i < nr_loops; i++) {
> > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > +		if (bpf_get_exception())
> > > +			return -EJUKEBOX;
> >
> > This is too slow.
> > We cannot afford a call and conditional here.
> 
> There are two more options here: have two variants, one with and without the
> check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> which pass false/true) and dispatch to the appropriate one based on if callback
> throws or not (so the cost is not paid for current users at all). Secondly, we
> can avoid repeated calls by hoisting the call out and save the pointer to
> exception state, then it's a bit less costly.
> 
> > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > the overhead of indirect call was not acceptable.
> > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > With prog->aux->exception[] approach it might be ok-ish,
> > but my preference would be to disallow throw in callbacks.
> > timer cb, rbtree_add cb are typically small.
> > bpf_loop cb can be big, but we have open coded iterators now.
> > So disabling asserts in cb-s is probably acceptable trade-off.
> 
> If the only reason to avoid them is the added performance cost, we can work
> towards eliminating that when bpf_throw is not used (see above). I agree that
> supporting it everywhere means thinking about a lot more corner cases, but I
> feel it would be less surprising if doing bpf_assert simply worked everywhere.
> One of the other reasons is that if it's being used within a shared static
> function that both main program and callbacks call into, it will be a bit
> annoying that it doesn't work in one context.

I hope that with open-coded iterators the only use cases for callbacks will be
timers, exception cb and rbtree_add. All three are special cases.
There is nothing to unwind in the timer case.
Certainly not allowed to rethrow in exception cb.
The less-like cb of rbtree_add should be tiny. And it's got to be fast.
Kumar Kartikeya Dwivedi April 7, 2023, 2:57 a.m. UTC | #4
On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > >
> > > >  	for (i = 0; i < nr_loops; i++) {
> > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > +		if (bpf_get_exception())
> > > > +			return -EJUKEBOX;
> > >
> > > This is too slow.
> > > We cannot afford a call and conditional here.
> >
> > There are two more options here: have two variants, one with and without the
> > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > which pass false/true) and dispatch to the appropriate one based on if callback
> > throws or not (so the cost is not paid for current users at all). Secondly, we
> > can avoid repeated calls by hoisting the call out and save the pointer to
> > exception state, then it's a bit less costly.
> >
> > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > the overhead of indirect call was not acceptable.
> > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > With prog->aux->exception[] approach it might be ok-ish,
> > > but my preference would be to disallow throw in callbacks.
> > > timer cb, rbtree_add cb are typically small.
> > > bpf_loop cb can be big, but we have open coded iterators now.
> > > So disabling asserts in cb-s is probably acceptable trade-off.
> >
> > If the only reason to avoid them is the added performance cost, we can work
> > towards eliminating that when bpf_throw is not used (see above). I agree that
> > supporting it everywhere means thinking about a lot more corner cases, but I
> > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > One of the other reasons is that if it's being used within a shared static
> > function that both main program and callbacks call into, it will be a bit
> > annoying that it doesn't work in one context.
>
> I hope with open coded iterators the only use case for callbacks will be timers,
> exception cb and rbtree_add. All three are special cases.

There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.

> There is nothing to unwind in the timer case.
> Certinaly not allowed to rethrow in exception cb.
> less-like rbtree_add should be tiny. And it's gotta to be fast.

I agree that for some of the cases above they do not matter too much. The main
one was bpf_loop, and the ones I listed (like for_each_map) seem to be those
where people may do significant work in the callback.

I think we can make it zero cost for programs that don't use it (even for the
kernel code); I was just hoping to be able to support it everywhere as a generic
helper to abort program execution without any special corner cases during usage.
The other main point was about code sharing of functions which make use of
assertions. But I'm ok with dropping support for callbacks if you think it's not
worth it in the end.
Alexei Starovoitov April 12, 2023, 7:43 p.m. UTC | #5
On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote:
> On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > > >
> > > > >  	for (i = 0; i < nr_loops; i++) {
> > > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > > +		if (bpf_get_exception())
> > > > > +			return -EJUKEBOX;
> > > >
> > > > This is too slow.
> > > > We cannot afford a call and conditional here.
> > >
> > > There are two more options here: have two variants, one with and without the
> > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > > which pass false/true) and dispatch to the appropriate one based on if callback
> > > throws or not (so the cost is not paid for current users at all). Secondly, we
> > > can avoid repeated calls by hoisting the call out and save the pointer to
> > > exception state, then it's a bit less costly.
> > >
> > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > > the overhead of indirect call was not acceptable.
> > > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > > With prog->aux->exception[] approach it might be ok-ish,
> > > > but my preference would be to disallow throw in callbacks.
> > > > timer cb, rbtree_add cb are typically small.
> > > > bpf_loop cb can be big, but we have open coded iterators now.
> > > > So disabling asserts in cb-s is probably acceptable trade-off.
> > >
> > > If the only reason to avoid them is the added performance cost, we can work
> > > towards eliminating that when bpf_throw is not used (see above). I agree that
> > > supporting it everywhere means thinking about a lot more corner cases, but I
> > > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > > One of the other reasons is that if it's being used within a shared static
> > > function that both main program and callbacks call into, it will be a bit
> > > annoying that it doesn't work in one context.
> >
> > I hope with open coded iterators the only use case for callbacks will be timers,
> > exception cb and rbtree_add. All three are special cases.
> 
> There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.
> 
> > There is nothing to unwind in the timer case.
> > Certinaly not allowed to rethrow in exception cb.
> > less-like rbtree_add should be tiny. And it's gotta to be fast.
> 
> I agree for some of the cases above they do not matter too much. The main one
> was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
> people may do siginificant work in the callback.
> 
> I think we can make it zero cost for programs that don't use it (even for the
> kernel code), I was just hoping to be able to support it everywhere as a generic
> helper to abort program execution without any special corner cases during usage.
> The other main point was about code sharing of functions which makes use of
> assertions. But I'm ok with dropping support for callbacks if you think it's not
> worth it in the end.

I think the unexpected run-time slowdown due to checks is a bigger problem.
Since bpf_assert is a 'bad program detector' it should be as close to zero
run-time cost as possible. Hence I'm advocating for the single load+jmp approach
of prog->aux->exception.
The run-time matters more than the ability to use assert in all conditions.
I think we'd need two flavors of bpf_assert() asm macros: with and without bpf_throw().
They seem to be useful without an actual throw. They will help the verifier understand
the ranges of variables in both cases. The macros are 99% of the feature.
The actual throw mechanism is 1%. The unwinding and release of resources is a cost
of using macros without explicit control flow in the bpf programs.
The macros without throw might be good enough in many cases.
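
A hedged illustration of the two flavors being discussed. The real macros in
this series are asm-based so the verifier sees the comparison directly; the
names and plain-C bodies below are assumptions only:

/* Throwing flavor: aborts the program if the condition is false. */
#define bpf_assert_throw(cond)			\
	do {					\
		if (!(cond))			\
			bpf_throw();		\
	} while (0)

/* Non-throwing flavor: the false branch stays under the program's own
 * control (here, an early return), so no unwinding or automatic resource
 * release is needed, yet the verifier still learns the range of the tested
 * variable on the fall-through path.
 */
#define bpf_assert_or_ret(cond, retval)		\
	do {					\
		if (!(cond))			\
			return retval;		\
	} while (0)
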
Kumar Kartikeya Dwivedi April 13, 2023, 5:13 p.m. UTC | #6
On Wed, Apr 12, 2023 at 09:43:06PM CEST, Alexei Starovoitov wrote:
> On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> > > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > > > >
> > > > > >  	for (i = 0; i < nr_loops; i++) {
> > > > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > > > +		if (bpf_get_exception())
> > > > > > +			return -EJUKEBOX;
> > > > >
> > > > > This is too slow.
> > > > > We cannot afford a call and conditional here.
> > > >
> > > > There are two more options here: have two variants, one with and without the
> > > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > > > which pass false/true) and dispatch to the appropriate one based on if callback
> > > > throws or not (so the cost is not paid for current users at all). Secondly, we
> > > > can avoid repeated calls by hoisting the call out and save the pointer to
> > > > exception state, then it's a bit less costly.
> > > >
> > > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > > > the overhead of indirect call was not acceptable.
> > > > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > > > With prog->aux->exception[] approach it might be ok-ish,
> > > > > but my preference would be to disallow throw in callbacks.
> > > > > timer cb, rbtree_add cb are typically small.
> > > > > bpf_loop cb can be big, but we have open coded iterators now.
> > > > > So disabling asserts in cb-s is probably acceptable trade-off.
> > > >
> > > > If the only reason to avoid them is the added performance cost, we can work
> > > > towards eliminating that when bpf_throw is not used (see above). I agree that
> > > > supporting it everywhere means thinking about a lot more corner cases, but I
> > > > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > > > One of the other reasons is that if it's being used within a shared static
> > > > function that both main program and callbacks call into, it will be a bit
> > > > annoying that it doesn't work in one context.
> > >
> > > I hope with open coded iterators the only use case for callbacks will be timers,
> > > exception cb and rbtree_add. All three are special cases.
> >
> > There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.
> >
> > > There is nothing to unwind in the timer case.
> > > Certinaly not allowed to rethrow in exception cb.
> > > less-like rbtree_add should be tiny. And it's gotta to be fast.
> >
> > I agree for some of the cases above they do not matter too much. The main one
> > was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
> > people may do siginificant work in the callback.
> >
> > I think we can make it zero cost for programs that don't use it (even for the
> > kernel code), I was just hoping to be able to support it everywhere as a generic
> > helper to abort program execution without any special corner cases during usage.
> > The other main point was about code sharing of functions which makes use of
> > assertions. But I'm ok with dropping support for callbacks if you think it's not
> > worth it in the end.
>
> I think the unexpected run-time slowdown due to checks is a bigger problem.
> Since bpf_assert is a 'bad program detector' it should be as zero cost to run-time
> as possible. Hence I'm advocating for single load+jmp approach of prog->aux->exception.

I 100% agree, the slowdown is a big problem and a downside of the current
version. The checks need to be as lightweight as possible. It's too costly
right now in this set.

> The run-time matter more than ability to use assert in all conditions. I think
> we'd need two flavors of bpf_assert() asm macros: with and without
> bpf_throw(). They seem to be useful without actual throw. They will help the
> verifier understand the ranges of variables in both cases. The macros are the
> 99% of the feature. The actual throw mechanism is 1%. The unwinding and
> release of resource is a cost of using macros without explicit control flow in
> the bpf programs. The macros without throw might be good enough in many cases.

Ok, I'll update to allow for both variants. But just to confirm, do you want to
shelve automatic cleanup (part 2 for now), or want to add it? I'm not clear on
whether you agree or disagree it's necessary.

I'll try the prog->aux->exception route, but I must say that we're sort of
trading off simpler semantics/behavior for speedup in that case (which is
necessary, but it is a cost IMO). I'll post a version using that, but I'll also
add comparisons with the variant where we spill ptr to exception state and
load+jmp. It's an extra instruction but I will try to benchmark and see how much
difference it causes in practice (probably over the XDP benchmark using such
exceptions, since that's one of the most important performance-critical use
cases). If you agree, let's double down on whatever approach we choose after
analysing the difference?
Alexei Starovoitov April 13, 2023, 11:41 p.m. UTC | #7
On Thu, Apr 13, 2023 at 07:13:22PM +0200, Kumar Kartikeya Dwivedi wrote:
> On Wed, Apr 12, 2023 at 09:43:06PM CEST, Alexei Starovoitov wrote:
> > On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> > > > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > > > > >
> > > > > > >  	for (i = 0; i < nr_loops; i++) {
> > > > > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > > > > +		if (bpf_get_exception())
> > > > > > > +			return -EJUKEBOX;
> > > > > >
> > > > > > This is too slow.
> > > > > > We cannot afford a call and conditional here.
> > > > >
> > > > > There are two more options here: have two variants, one with and without the
> > > > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > > > > which pass false/true) and dispatch to the appropriate one based on if callback
> > > > > throws or not (so the cost is not paid for current users at all). Secondly, we
> > > > > can avoid repeated calls by hoisting the call out and save the pointer to
> > > > > exception state, then it's a bit less costly.
> > > > >
> > > > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > > > > the overhead of indirect call was not acceptable.
> > > > > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > > > > With prog->aux->exception[] approach it might be ok-ish,
> > > > > > but my preference would be to disallow throw in callbacks.
> > > > > > timer cb, rbtree_add cb are typically small.
> > > > > > bpf_loop cb can be big, but we have open coded iterators now.
> > > > > > So disabling asserts in cb-s is probably acceptable trade-off.
> > > > >
> > > > > If the only reason to avoid them is the added performance cost, we can work
> > > > > towards eliminating that when bpf_throw is not used (see above). I agree that
> > > > > supporting it everywhere means thinking about a lot more corner cases, but I
> > > > > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > > > > One of the other reasons is that if it's being used within a shared static
> > > > > function that both main program and callbacks call into, it will be a bit
> > > > > annoying that it doesn't work in one context.
> > > >
> > > > I hope with open coded iterators the only use case for callbacks will be timers,
> > > > exception cb and rbtree_add. All three are special cases.
> > >
> > > There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.
> > >
> > > > There is nothing to unwind in the timer case.
> > > > Certinaly not allowed to rethrow in exception cb.
> > > > less-like rbtree_add should be tiny. And it's gotta to be fast.
> > >
> > > I agree for some of the cases above they do not matter too much. The main one
> > > was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
> > > people may do siginificant work in the callback.
> > >
> > > I think we can make it zero cost for programs that don't use it (even for the
> > > kernel code), I was just hoping to be able to support it everywhere as a generic
> > > helper to abort program execution without any special corner cases during usage.
> > > The other main point was about code sharing of functions which makes use of
> > > assertions. But I'm ok with dropping support for callbacks if you think it's not
> > > worth it in the end.
> >
> > I think the unexpected run-time slowdown due to checks is a bigger problem.
> > Since bpf_assert is a 'bad program detector' it should be as zero cost to run-time
> > as possible. Hence I'm advocating for single load+jmp approach of prog->aux->exception.
> 
> I 100% agree, slow down is a big problem and a downside of the current version.
> They need to be as lightweight as possible. It's too costly right now in this
> set.
> 
> > The run-time matter more than ability to use assert in all conditions. I think
> > we'd need two flavors of bpf_assert() asm macros: with and without
> > bpf_throw(). They seem to be useful without actual throw. They will help the
> > verifier understand the ranges of variables in both cases. The macros are the
> > 99% of the feature. The actual throw mechanism is 1%. The unwinding and
> > release of resource is a cost of using macros without explicit control flow in
> > the bpf programs. The macros without throw might be good enough in many cases.
> 
> Ok, I'll update to allow for both variants. But just to confirm, do you want to
> shelve automatic cleanup (part 2 for now), or want to add it? I'm not clear on
> whether you agree or disagree it's necessary.
> 
> I'll try the prog->aux->exception route, but I must say that we're sort of
> trading off simpler semantics/behavior for speedup in that case (which is
> necessary, but it is a cost IMO). I'll post a version using that, but I'll also
> add comparisons with the variant where we spill ptr to exception state and
> load+jmp. It's an extra instruction but I will try to benchmark and see how much
> difference it causes in practice (probably over the XDP benchmark using such
> exceptions, since that's one of the most important performance-critical use
> cases). If you agree, let's double down on whatever approach we choose after
> analysing the difference?

I think performance considerations dominate implementation and ease of use.
Could you describe what 'spill to exception state' would look like?
I think the check after bpf_call insn has to be no more than LD + JMP.
I was thinking whether we could do static_key-like patching of the code.
bpf_throw will know all locations that should be converted from nop into check
and will do text_poke_bp before throwing.
Maybe we can consider offline unwind and release too. The verifier will prep
release tables and throw will execute them. BPF progs always have frame pointers,
so walking the stack back is relatively easy. Release per callsite is hard.

As far as benchmarking I'd use selftests/bpf/bench. No need for real network XDP.
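
A rough sketch of what the offline unwind could look like on x86-64, purely to
illustrate the frame-pointer walk mentioned above. find_release_table() and
run_release_entries() are hypothetical names for the verifier-prepared release
tables; is_bpf_text_address() is the existing kernel helper:

#include <linux/filter.h>	/* is_bpf_text_address() */

struct bpf_release_table;	/* hypothetical, prepared by the verifier */
struct bpf_release_table *find_release_table(unsigned long ip);
void run_release_entries(struct bpf_release_table *tbl, unsigned long fp);

static void bpf_unwind_and_release(unsigned long fp)
{
	/* Assumed x86-64 frame layout: [fp] = caller's fp, [fp + 8] = return
	 * address. BPF progs are always JITed with frame pointers, so the
	 * chain is walkable.
	 */
	while (1) {
		unsigned long ret_ip = *(unsigned long *)(fp + 8);
		struct bpf_release_table *tbl;

		if (!is_bpf_text_address(ret_ip))
			break;				/* left the BPF part of the stack */
		tbl = find_release_table(ret_ip);	/* hypothetical lookup */
		if (tbl)
			run_release_entries(tbl, fp);	/* hypothetical release */
		fp = *(unsigned long *)fp;		/* step to the caller's frame */
	}
}
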
Kumar Kartikeya Dwivedi April 16, 2023, 4:45 a.m. UTC | #8
On Fri, Apr 14, 2023 at 01:41:52AM CEST, Alexei Starovoitov wrote:
> On Thu, Apr 13, 2023 at 07:13:22PM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Wed, Apr 12, 2023 at 09:43:06PM CEST, Alexei Starovoitov wrote:
> > > On Fri, Apr 07, 2023 at 04:57:48AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > On Fri, Apr 07, 2023 at 04:15:19AM CEST, Alexei Starovoitov wrote:
> > > > > On Fri, Apr 07, 2023 at 02:07:06AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > On Thu, Apr 06, 2023 at 04:21:39AM CEST, Alexei Starovoitov wrote:
> > > > > > > On Wed, Apr 05, 2023 at 02:42:34AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > > > > > > @@ -759,6 +759,8 @@ BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
> > > > > > > >
> > > > > > > >  	for (i = 0; i < nr_loops; i++) {
> > > > > > > >  		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
> > > > > > > > +		if (bpf_get_exception())
> > > > > > > > +			return -EJUKEBOX;
> > > > > > >
> > > > > > > This is too slow.
> > > > > > > We cannot afford a call and conditional here.
> > > > > >
> > > > > > There are two more options here: have two variants, one with and without the
> > > > > > check (always_inline template and bpf_loop vs bpf_loop_except calling functions
> > > > > > which pass false/true) and dispatch to the appropriate one based on if callback
> > > > > > throws or not (so the cost is not paid for current users at all). Secondly, we
> > > > > > can avoid repeated calls by hoisting the call out and save the pointer to
> > > > > > exception state, then it's a bit less costly.
> > > > > >
> > > > > > > Some time ago folks tried bpf_loop() and went back to bounded loop, because
> > > > > > > the overhead of indirect call was not acceptable.
> > > > > > > After that we've added inlining of bpf_loop() to make overhead to the minimum.
> > > > > > > With prog->aux->exception[] approach it might be ok-ish,
> > > > > > > but my preference would be to disallow throw in callbacks.
> > > > > > > timer cb, rbtree_add cb are typically small.
> > > > > > > bpf_loop cb can be big, but we have open coded iterators now.
> > > > > > > So disabling asserts in cb-s is probably acceptable trade-off.
> > > > > >
> > > > > > If the only reason to avoid them is the added performance cost, we can work
> > > > > > towards eliminating that when bpf_throw is not used (see above). I agree that
> > > > > > supporting it everywhere means thinking about a lot more corner cases, but I
> > > > > > feel it would be less surprising if doing bpf_assert simply worked everywhere.
> > > > > > One of the other reasons is that if it's being used within a shared static
> > > > > > function that both main program and callbacks call into, it will be a bit
> > > > > > annoying that it doesn't work in one context.
> > > > >
> > > > > I hope with open coded iterators the only use case for callbacks will be timers,
> > > > > exception cb and rbtree_add. All three are special cases.
> > > >
> > > > There's also bpf_for_each_map_elem, bpf_find_vma, bpf_user_ringbuf_drain.
> > > >
> > > > > There is nothing to unwind in the timer case.
> > > > > Certinaly not allowed to rethrow in exception cb.
> > > > > less-like rbtree_add should be tiny. And it's gotta to be fast.
> > > >
> > > > I agree for some of the cases above they do not matter too much. The main one
> > > > was bpf_loop, and the ones I listed (like for_each_map) seem to be those where
> > > > people may do siginificant work in the callback.
> > > >
> > > > I think we can make it zero cost for programs that don't use it (even for the
> > > > kernel code), I was just hoping to be able to support it everywhere as a generic
> > > > helper to abort program execution without any special corner cases during usage.
> > > > The other main point was about code sharing of functions which makes use of
> > > > assertions. But I'm ok with dropping support for callbacks if you think it's not
> > > > worth it in the end.
> > >
> > > I think the unexpected run-time slowdown due to checks is a bigger problem.
> > > Since bpf_assert is a 'bad program detector' it should be as zero cost to run-time
> > > as possible. Hence I'm advocating for single load+jmp approach of prog->aux->exception.
> >
> > I 100% agree, slow down is a big problem and a downside of the current version.
> > They need to be as lightweight as possible. It's too costly right now in this
> > set.
> >
> > > The run-time matter more than ability to use assert in all conditions. I think
> > > we'd need two flavors of bpf_assert() asm macros: with and without
> > > bpf_throw(). They seem to be useful without actual throw. They will help the
> > > verifier understand the ranges of variables in both cases. The macros are the
> > > 99% of the feature. The actual throw mechanism is 1%. The unwinding and
> > > release of resource is a cost of using macros without explicit control flow in
> > > the bpf programs. The macros without throw might be good enough in many cases.
> >
> > Ok, I'll update to allow for both variants. But just to confirm, do you want to
> > shelve automatic cleanup (part 2 for now), or want to add it? I'm not clear on
> > whether you agree or disagree it's necessary.
> >
> > I'll try the prog->aux->exception route, but I must say that we're sort of
> > trading off simpler semantics/behavior for speedup in that case (which is
> > necessary, but it is a cost IMO). I'll post a version using that, but I'll also
> > add comparisons with the variant where we spill ptr to exception state and
> > load+jmp. It's an extra instruction but I will try to benchmark and see how much
> > difference it causes in practice (probably over the XDP benchmark using such
> > exceptions, since that's one of the most important performance-critical use
> > cases). If you agree, let's double down on whatever approach we choose after
> > analysing the difference?
>
> I think performance considerations dominate implementation and ease of use.
> Could you describe how 'spill to exception state' will look like?

It was about spilling a pointer to the exception state on entry to the program
(so just the main subprog) at a known offset on the stack; then, instead of
LD + JMP, it becomes LD from stack + LD + JMP where the check is needed.
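
To make that concrete, a hedged sketch of the post-call check this would expand
to, using the insn macros from include/linux/filter.h in the same style as the
patch below. The register choice, stack slot and jump offset are illustrative
values only:

#define EXC_PTR_OFF	(-8)	/* illustrative stack slot holding the spilled pointer */
#define THROW_PATH_OFF	3	/* illustrative offset of the generated throw path */

static const struct bpf_insn exc_check[] = {
	/* r1 = *(u64 *)(r10 + EXC_PTR_OFF): reload the pointer spilled at entry */
	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_10, EXC_PTR_OFF),
	/* r1 = *(u64 *)(r1 + 0): load the exception state itself */
	BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, 0),
	/* if r1 != 0 goto throw_path: taken only when something threw */
	BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, THROW_PATH_OFF),
};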

> I think the check after bpf_call insn has to be no more than LD + JMP.
> I was thinking whether we can do static_key like patching of the code.
> bpf_throw will know all locations that should be converted from nop into check
> and will do text_poke_bp before throwing.
> Maybe we can consider offline unwind and release too. The verifier will prep
> release tables and throw will execute them. BPF progs always have frame pointers,
> so walking the stack back is relatively easy. Release per callsite is hard.
>

After some thought, I think offline unwinding is the way to go. That means no
rewrites for the existing code, and we just offload all the cost to the slow
path (bpf_throw call) as it should be. There would be no cost at runtime (except
the conditional branch, which should be well predicted). The current approach
was a bit simpler so I gave it a shot first but I think it's not the way to go.
I will rework the set.

> As far as benchmarking I'd use selftests/bpf/bench. No need for real network XDP.

Got it.
Alexei Starovoitov April 17, 2023, 11:20 p.m. UTC | #9
On Sat, Apr 15, 2023 at 9:45 PM Kumar Kartikeya Dwivedi
<memxor@gmail.com> wrote:
>
> > I think the check after bpf_call insn has to be no more than LD + JMP.
> > I was thinking whether we can do static_key like patching of the code.
> > bpf_throw will know all locations that should be converted from nop into check
> > and will do text_poke_bp before throwing.
> > Maybe we can consider offline unwind and release too. The verifier will prep
> > release tables and throw will execute them. BPF progs always have frame pointers,
> > so walking the stack back is relatively easy. Release per callsite is hard.
> >
>
> After some thought, I think offline unwinding is the way to go. That means no
> rewrites for the existing code, and we just offload all the cost to the slow
> path (bpf_throw call) as it should be. There would be no cost at runtime (except
> the conditional branch, which should be well predicted). The current approach
> was a bit simpler so I gave it a shot first but I think it's not the way to go.
> I will rework the set.

It seems so indeed. Offline unwinding is more complex for sure.
The challenge is to make it mostly arch-independent.
Something like get_perf_callchain() followed by a lookup of IP->release_table
and a final setjmp() to the bpf callback in the last bpf frame.
is_bpf_text_address() will help.
We shouldn't gate it on HAVE_RELIABLE_STACKTRACE,
since bpf prog stack walking is reliable on all archs where the JIT is enabled.
Unwinding won't work reliably in interpreted mode, though, and that's ok.
bpf_throw is a kfunc and it needs prog->jit_requested anyway.

Patch

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index bc067223d3ee..a5346a2b7e68 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -485,6 +485,7 @@  struct bpf_insn_aux_data {
 	bool zext_dst; /* this insn zero extends dst reg */
 	bool storage_get_func_atomic; /* bpf_*_storage_get() with atomic memory alloc */
 	bool is_iter_next; /* bpf_iter_<type>_next() kfunc call */
+	bool skip_patch_call_imm; /* Skip patch_call_imm phase in do_misc_fixups */
 	u8 alu_state; /* used in combination with alu_limit */
 
 	/* below fields are initialized once */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index de0eadf8706f..6c0c5e726ebf 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -711,6 +711,8 @@  static long bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback
 		key = i;
 		ret = callback_fn((u64)(long)map, (u64)(long)&key,
 				  (u64)(long)val, (u64)(long)callback_ctx, 0);
+		if (bpf_get_exception())
+			ret = -EJUKEBOX;
 		/* return value: 0 - continue, 1 - stop and return */
 		if (ret)
 			break;
@@ -718,7 +720,7 @@  static long bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback
 
 	if (is_percpu)
 		migrate_enable();
-	return num_elems;
+	return ret == -EJUKEBOX ? ret : num_elems;
 }
 
 static u64 array_map_mem_usage(const struct bpf_map *map)
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 96856f130cbf..6e4e4b6213f8 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -759,6 +759,8 @@  BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_ctx,
 
 	for (i = 0; i < nr_loops; i++) {
 		ret = callback((u64)i, (u64)(long)callback_ctx, 0, 0, 0);
+		if (bpf_get_exception())
+			return -EJUKEBOX;
 		/* return value: 0 - continue, 1 - stop and return */
 		if (ret)
 			return i + 1;
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 00c253b84bf5..5e70151e0414 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2178,6 +2178,8 @@  static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
 			num_elems++;
 			ret = callback_fn((u64)(long)map, (u64)(long)key,
 					  (u64)(long)val, (u64)(long)callback_ctx, 0);
+			if (bpf_get_exception())
+				ret = -EJUKEBOX;
 			/* return value: 0 - continue, 1 - stop and return */
 			if (ret) {
 				rcu_read_unlock();
@@ -2189,7 +2191,7 @@  static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_
 out:
 	if (is_percpu)
 		migrate_enable();
-	return num_elems;
+	return ret == -EJUKEBOX ? ret : num_elems;
 }
 
 static u64 htab_map_mem_usage(const struct bpf_map *map)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 89e70907257c..82db3a64fa3f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1982,10 +1982,11 @@  __bpf_kfunc struct bpf_rb_node *bpf_rbtree_remove(struct bpf_rb_root *root,
 }
 
 /* Need to copy rbtree_add_cached's logic here because our 'less' is a BPF
- * program
+ * program.
+ * Marked notrace to avoid clobbering of exception state in current by BPF
+ * programs.
  */
-static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
-			     void *less)
+static notrace void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node, void *less)
 {
 	struct rb_node **link = &((struct rb_root_cached *)root)->rb_root.rb_node;
 	bpf_callback_t cb = (bpf_callback_t)less;
@@ -1993,8 +1994,13 @@  static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
 	bool leftmost = true;
 
 	while (*link) {
+		u64 cb_res;
+
 		parent = *link;
-		if (cb((uintptr_t)node, (uintptr_t)parent, 0, 0, 0)) {
+		cb_res = cb((uintptr_t)node, (uintptr_t)parent, 0, 0, 0);
+		if (bpf_get_exception())
+			return;
+		if (cb_res) {
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
@@ -2007,8 +2013,8 @@  static void __bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
 			       (struct rb_root_cached *)root, leftmost);
 }
 
-__bpf_kfunc void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
-				bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
+__bpf_kfunc notrace void bpf_rbtree_add(struct bpf_rb_root *root, struct bpf_rb_node *node,
+					bool (less)(struct bpf_rb_node *a, const struct bpf_rb_node *b))
 {
 	__bpf_rbtree_add(root, node, (void *)less);
 }
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 875ac9b698d9..7f6764ae4fff 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -766,6 +766,10 @@  BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map,
 
 		bpf_dynptr_init(&dynptr, sample, BPF_DYNPTR_TYPE_LOCAL, 0, size);
 		ret = callback((uintptr_t)&dynptr, (uintptr_t)callback_ctx, 0, 0, 0);
+		if (bpf_get_exception()) {
+			ret = -EJUKEBOX;
+			goto schedule_work_return;
+		}
 		__bpf_user_ringbuf_sample_release(rb, size, flags);
 	}
 	ret = samples - discarded_samples;
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index c4ab9d6cdbe9..6e8667f03784 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -807,6 +807,8 @@  BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
 		callback_fn((u64)(long)task, (u64)(long)vma,
 			    (u64)(long)callback_ctx, 0, 0);
 		ret = 0;
+		if (bpf_get_exception())
+			ret = -EJUKEBOX;
 	}
 	bpf_mmap_unlock_mm(work, mm);
 	return ret;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6981d8817c71..07d808b05044 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9053,6 +9053,24 @@  static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		}
 	}
 
+	/* For each helper call which invokes a callback which may throw, it
+	 * will propagate the thrown exception to us. For helpers, we check the
+	 * return code in addition to exception state, as it may be reset
+	 * between detection and return within kernel. Note that we don't
+	 * include async callbacks (passed to bpf_timer_set_callback) because
+	 * exceptions won't be propagated.
+	 */
+	if (is_callback_calling_function(meta.func_id) &&
+	    meta.func_id != BPF_FUNC_timer_set_callback) {
+		struct bpf_throw_state *ts = &env->insn_aux_data[insn_idx].throw_state;
+		/* Check for -EJUKEBOX in case exception state is clobbered by
+		 * some other program executing between bpf_get_exception and
+		 * return from helper.
+		 */
+		if (base_type(fn->ret_type) == RET_INTEGER)
+			ts->check_helper_ret_code = true;
+	}
+
 	switch (func_id) {
 	case BPF_FUNC_tail_call:
 		err = check_reference_leak(env, false);
@@ -17691,6 +17709,9 @@  static int do_misc_fixups(struct bpf_verifier_env *env)
 			continue;
 		}
 
+		if (env->insn_aux_data[i + delta].skip_patch_call_imm)
+			continue;
+
 		if (insn->imm == BPF_FUNC_get_route_realm)
 			prog->dst_needed = 1;
 		if (insn->imm == BPF_FUNC_get_prandom_u32)
@@ -18177,6 +18198,94 @@  static bool is_bpf_throw_call(struct bpf_insn *insn)
 	       insn->off == 0 && insn->imm == special_kfunc_list[KF_bpf_throw];
 }
 
+static struct bpf_prog *rewrite_bpf_call(struct bpf_verifier_env *env,
+					 int position,
+					 s32 stack_base,
+					 struct bpf_throw_state *tstate,
+					 u32 *cnt)
+{
+	s32 r0_offset = stack_base + 0 * BPF_REG_SIZE;
+	struct bpf_insn_aux_data *aux_data;
+	struct bpf_insn insn_buf[] = {
+		env->prog->insnsi[position],
+		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, r0_offset),
+		BPF_EMIT_CALL(bpf_get_exception),
+		BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, r0_offset),
+		BPF_JMP32_IMM(BPF_JNE, BPF_REG_0, -EJUKEBOX, 3),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	struct bpf_prog *new_prog;
+	int type, tsubprog = -1;
+	u32 callback_start;
+	u32 call_insn_offset;
+	s32 callback_offset;
+	bool ret_code;
+
+	type = tstate->type;
+	ret_code = tstate->check_helper_ret_code;
+	if (type == BPF_THROW_OUTER)
+		insn_buf[4] = insn_buf[9] = BPF_EMIT_CALL(bpf_reset_exception);
+	if (type == BPF_THROW_INNER)
+		insn_buf[9] = BPF_EMIT_CALL(bpf_throw);
+
+	/* We need to fix offset of the pseudo call after patching.
+	 * Note: The actual call instruction is at insn_buf[0]
+	 */
+	if (bpf_pseudo_call(&insn_buf[0])) {
+		tsubprog = find_subprog(env, position + insn_buf[0].imm + 1);
+		if (WARN_ON_ONCE(tsubprog < 0))
+			return NULL;
+	}
+	/* For helpers, the code path between checking bpf_get_exception and
+	 * returning may involve invocation of tracing progs which reset
+	 * exception state, so also use the return value to invoke exception
+	 * path. Otherwise, exception event from callback is lost.
+	 */
+	if (ret_code)
+		*cnt = ARRAY_SIZE(insn_buf);
+	else
+		*cnt = ARRAY_SIZE(insn_buf) - 4;
+	new_prog = bpf_patch_insn_data(env, position, insn_buf, *cnt);
+	if (!new_prog)
+		return new_prog;
+
+	/* Note: The actual call instruction is at insn_buf[0] */
+	if (bpf_pseudo_call(&insn_buf[0])) {
+		callback_start = env->subprog_info[tsubprog].start;
+		call_insn_offset = position + 0;
+		callback_offset = callback_start - call_insn_offset - 1;
+		new_prog->insnsi[call_insn_offset].imm = callback_offset;
+	}
+
+	aux_data = env->insn_aux_data;
+	/* Note: We already patched in call at insn_buf[2], insn_buf[9]. */
+	aux_data[position + 2].skip_patch_call_imm = true;
+	if (ret_code)
+		aux_data[position + 9].skip_patch_call_imm = true;
+	/* Note: For BPF_THROW_OUTER, we already patched in call at insn_buf[4] */
+	if (type == BPF_THROW_OUTER)
+		aux_data[position + 4].skip_patch_call_imm = true;
+	return new_prog;
+}
+
+static bool is_throwing_bpf_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
+				 struct bpf_insn_aux_data *insn_aux)
+{
+	if (insn->code != (BPF_JMP | BPF_CALL))
+		return false;
+	if (insn->src_reg == BPF_PSEUDO_CALL ||
+	    insn->src_reg == BPF_PSEUDO_KFUNC_CALL ||
+	    insn->src_reg == 0)
+		return insn_aux->throw_state.type != BPF_THROW_NONE;
+	return false;
+}
+
 /* For all sub-programs in the program (including main) check
  * insn_aux_data to see if there are any instructions that need to be
  * transformed into an instruction sequence. E.g. bpf_loop calls that
@@ -18228,6 +18337,26 @@  static int do_misc_rewrites(struct bpf_verifier_env *env)
 			new_prog = rewrite_bpf_throw_call(env, i + delta, throw_state, &cnt);
 			if (!new_prog)
 				return -ENOMEM;
+		} else if (is_throwing_bpf_call(env, insn, insn_aux)) {
+			struct bpf_throw_state *throw_state = &insn_aux->throw_state;
+
+			stack_depth_extra = max_t(u16, stack_depth_extra,
+						  BPF_REG_SIZE * 1 + stack_depth_roundup);
+
+			/* The verifier was able to prove that the throwing call
+			 * was unreachable, hence it must have not been seen and
+			 * will be removed by opt_remove_dead_code.
+			 */
+			if (throw_state->type == BPF_THROW_NONE) {
+				WARN_ON_ONCE(insn_aux->seen);
+				goto skip;
+			}
+
+			new_prog = rewrite_bpf_call(env, i + delta,
+						    -(stack_depth + stack_depth_extra),
+						    throw_state, &cnt);
+			if (!new_prog)
+				return -ENOMEM;
 		}
 
 skip: