diff mbox series

[Bug,Report] Packet drops after trimming skb using bpf_skb_adjust_room

Message ID CAP01T74=OChcu8V2u3bSMas=Lh-y72Y+7my1djn-uSKygBoX+w@mail.gmail.com (mailing list archive)
State RFC
Delegated to: BPF
Headers show
Series [Bug,Report] Packet drops after trimming skb using bpf_skb_adjust_room | expand

Checks

Context Check Description
netdev/tree_selection success Guessing tree name failed - patch did not apply
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-VM_Test-14 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-VM_Test-19 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-VM_Test-15 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-VM_Test-21 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-13 success Logs for s390x-gcc / test (test_maps, false, 360) / test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-42 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-18 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-VM_Test-25 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-36 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization
bpf/vmtest-bpf-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-VM_Test-16 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-29 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-VM_Test-35 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-VM_Test-22 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-20 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-VM_Test-28 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-34 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-VM_Test-27 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-33 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-VM_Test-41 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18

Commit Message

Kumar Kartikeya Dwivedi April 4, 2024, 12:22 a.m. UTC
Hello John,

For background, I have a sk_skb program where for certain flows, a
subset of request handling logic (on Rx) that is typically done in
user space is handed over to a BPF program for doing it in-kernel.

Since the protocol is over TCP, I'm using sk_skb programs to handle
these requests and generate responses from within the kernel. The user
space application will decide which flow should be added to the
sockmap on socket accept and then let it be handled in the kernel for
some of the request types.

However, when generating replies and trimming away extra bytes in the
packet by using bpf_skb_adjust_room, I see packet drops because the
strp_msg->full_len is not updated unless tls_sw_has_ctx_rx is true for
the socket. Removing that condition makes everything work, but I am
not sure that is the right fix.

My understanding so far is that in sk_psock_backlog,
sk_psock_handle_skb returns an error since the stm->full_len is not
correct (and passed into it as len), and SK_PSOCK_TX_ENABLED gets
cleared due to this error. Then, on the redirect side
(sk_psock_skb_redirect), it does sock_drop because the test for the
SK_PSOCK_TX_ENABLED flag fails, which does kfree_skb.

I have attached a patch with a selftest (skb_adjust_room) that you can
run to verify the problem. It should basically hang on the second read
as the packet would be dropped by the kernel. The first read goes
through fine. Applying the diff below allows the selftest to pass.

I am not familiar with the code to know whether the fix below is
correct, but it seems to resolve the problem for me (and I carried it
for a while and ran my stuff on it until I got around to reporting
this now).

In case gmail screws up the formatting, I'm just removing the
tls_sw_has_ctx_rx(skb->sk) conditional on the strp_msg->full_len
update in sk_skb_adjust_room.

-               rxm->full_len += len_diff;
-       }
+       strp_msg(skb)->full_len += len_diff;
        return ret;
 }

--

Thanks!

Comments

John Fastabend April 5, 2024, 4:52 a.m. UTC | #1
Kumar Kartikeya Dwivedi wrote:
> Hello John,
> 
> For background, I have a sk_skb program where for certain flows, a
> subset of request handling logic (on Rx) that is typically done in
> user space is handed over to a BPF program for doing it in-kernel.
> 
> Since the protocol is over TCP, I'm using sk_skb programs to handle
> these requests and generate responses from within the kernel. The user
> space application will decide which flow should be added to the
> sockmap on socket accept and then let it be handled in the kernel for
> some of the request types.
> 
> However, when generating replies and trimming away extra bytes in the
> packet by using bpf_skb_adjust_room, I see packet drops because the
> strp_msg->full_len is not updated unless tls_sw_has_ctx_rx is true for
> the socket. Removing that condition makes everything work, but I am
> not sure that is the right fix.
> 
> My understanding so far is that in sk_psock_backlog,
> sk_psock_handle_skb returns an error since the stm->full_len is not
> correct (and passed into it as len), and SK_PSOCK_TX_ENABLED gets
> cleared due to this error. Then, on the redirect side
> (sk_psock_skb_redirect), it does sock_drop because the test for the
> SK_PSOCK_TX_ENABLED flag fails, which does kfree_skb.
> 
> I have attached a patch with a selftest (skb_adjust_room) that you can
> run to verify the problem. It should basically hang on the second read
> as the packet would be dropped by the kernel. The first read goes
> through fine. Applying the diff below allows the selftest to pass.
> 
> I am not familiar with the code to know whether the fix below is
> correct, but it seems to resolve the problem for me (and I carried it
> for a while and ran my stuff on it until I got around to reporting
> this now).
> 
> In case gmail screws up the formatting, I'm just removing the
> tls_sw_has_ctx_rx(skb->sk) conditional on the strp_msg->full_len
> update in sk_skb_adjust_room.

I think your analysis is correct and patch looks good. I'll study
a bit more tomorrow.  It looks like I just messed it up there and
didn't consider the case of redirect in non-TLS case.

Gmail did fine pushing code through.

> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 524adf1fa6d0..e3009d0f3352 100644
> 
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3622,11 +3622,7 @@ BPF_CALL_4(sk_skb_adjust_room, struct sk_buff
> *, skb, s32, len_diff,
> 
>                         return -ENOMEM;
> 
>                 __skb_pull(skb, len_diff_abs);
> 
>         }
> 
> -       if (tls_sw_has_ctx_rx(skb->sk)) {
> -               struct strp_msg *rxm = strp_msg(skb);
> -
> -               rxm->full_len += len_diff;
> -       }
> +       strp_msg(skb)->full_len += len_diff;
>         return ret;
>  }
> 
> --
> 
> Thanks!
Kumar Kartikeya Dwivedi April 5, 2024, 10:08 p.m. UTC | #2
+ Fix email in Cc

On Fri, 5 Apr 2024 at 06:52, John Fastabend <john.fastabend@gmail.com> wrote:
>
> Kumar Kartikeya Dwivedi wrote:
> > Hello John,
> >
> > For background, I have a sk_skb program where for certain flows, a
> > subset of request handling logic (on Rx) that is typically done in
> > user space is handed over to a BPF program for doing it in-kernel.
> >
> > Since the protocol is over TCP, I'm using sk_skb programs to handle
> > these requests and generate responses from within the kernel. The user
> > space application will decide which flow should be added to the
> > sockmap on socket accept and then let it be handled in the kernel for
> > some of the request types.
> >
> > However, when generating replies and trimming away extra bytes in the
> > packet by using bpf_skb_adjust_room, I see packet drops because the
> > strp_msg->full_len is not updated unless tls_sw_has_ctx_rx is true for
> > the socket. Removing that condition makes everything work, but I am
> > not sure that is the right fix.
> >
> > My understanding so far is that in sk_psock_backlog,
> > sk_psock_handle_skb returns an error since the stm->full_len is not
> > correct (and passed into it as len), and SK_PSOCK_TX_ENABLED gets
> > cleared due to this error. Then, on the redirect side
> > (sk_psock_skb_redirect), it does sock_drop because the test for the
> > SK_PSOCK_TX_ENABLED flag fails, which does kfree_skb.
> >
> > I have attached a patch with a selftest (skb_adjust_room) that you can
> > run to verify the problem. It should basically hang on the second read
> > as the packet would be dropped by the kernel. The first read goes
> > through fine. Applying the diff below allows the selftest to pass.
> >
> > I am not familiar with the code to know whether the fix below is
> > correct, but it seems to resolve the problem for me (and I carried it
> > for a while and ran my stuff on it until I got around to reporting
> > this now).
> >
> > In case gmail screws up the formatting, I'm just removing the
> > tls_sw_has_ctx_rx(skb->sk) conditional on the strp_msg->full_len
> > update in sk_skb_adjust_room.
>
> I think your analysis is correct and patch looks good. I'll study
> a bit more tomorrow.  It looks like I just messed it up there and
> didn't consider the case of redirect in non-TLS case.

Great. Let me know your conclusion after you take a closer look, and I
can post the fix as an official patch with the selftest.
Thanks!


>
> Gmail did fine pushing code through.
>
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 524adf1fa6d0..e3009d0f3352 100644
> >
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3622,11 +3622,7 @@ BPF_CALL_4(sk_skb_adjust_room, struct sk_buff
> > *, skb, s32, len_diff,
> >
> >                         return -ENOMEM;
> >
> >                 __skb_pull(skb, len_diff_abs);
> >
> >         }
> >
> > -       if (tls_sw_has_ctx_rx(skb->sk)) {
> > -               struct strp_msg *rxm = strp_msg(skb);
> > -
> > -               rxm->full_len += len_diff;
> > -       }
> > +       strp_msg(skb)->full_len += len_diff;
> >         return ret;
> >  }
> >
> > --
> >
> > Thanks!
>
>
diff mbox series

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index 524adf1fa6d0..e3009d0f3352 100644

--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3622,11 +3622,7 @@  BPF_CALL_4(sk_skb_adjust_room, struct sk_buff
*, skb, s32, len_diff,

                        return -ENOMEM;

                __skb_pull(skb, len_diff_abs);

        }

-       if (tls_sw_has_ctx_rx(skb->sk)) {
-               struct strp_msg *rxm = strp_msg(skb);
-