Context |
Check |
Description |
bpf/vmtest-bpf-PR |
success
|
PR summary
|
netdev/series_format |
success
|
Posting correctly formatted
|
netdev/tree_selection |
success
|
Clearly marked for bpf, async
|
netdev/ynl |
success
|
Generated files up to date;
no warnings/errors;
no diff in generated;
|
netdev/fixes_present |
success
|
Fixes tag present in non-next series
|
netdev/header_inline |
success
|
No static functions without inline keyword in header files
|
netdev/build_32bit |
success
|
Errors and warnings before: 7 this patch: 7
|
netdev/build_tools |
success
|
Errors and warnings before: 0 (+0) this patch: 0 (+0)
|
netdev/cc_maintainers |
success
|
CCed 11 of 11 maintainers
|
netdev/build_clang |
success
|
Errors and warnings before: 542 this patch: 542
|
netdev/verify_signedoff |
success
|
Signed-off-by tag matches author and committer
|
netdev/deprecated_api |
success
|
None detected
|
netdev/check_selftest |
success
|
No net selftest shell script
|
netdev/verify_fixes |
success
|
Fixes tag looks correct
|
netdev/build_allmodconfig_warn |
success
|
Errors and warnings before: 908 this patch: 908
|
netdev/checkpatch |
success
|
total: 0 errors, 0 warnings, 0 checks, 210 lines checked
|
netdev/build_clang_rust |
success
|
No Rust files in patch. Skipping build
|
netdev/kdoc |
success
|
Errors and warnings before: 3 this patch: 3
|
netdev/source_inline |
success
|
Was 0 now: 0
|
bpf/vmtest-bpf-VM_Test-0 |
success
|
Logs for Lint
|
bpf/vmtest-bpf-VM_Test-1 |
success
|
Logs for ShellCheck
|
bpf/vmtest-bpf-VM_Test-2 |
success
|
Logs for Unittests
|
bpf/vmtest-bpf-VM_Test-3 |
success
|
Logs for Validate matrix.py
|
bpf/vmtest-bpf-VM_Test-5 |
success
|
Logs for aarch64-gcc / build-release
|
bpf/vmtest-bpf-VM_Test-4 |
success
|
Logs for aarch64-gcc / build / build for aarch64 with gcc
|
bpf/vmtest-bpf-VM_Test-9 |
success
|
Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
|
bpf/vmtest-bpf-VM_Test-10 |
success
|
Logs for aarch64-gcc / veristat-kernel
|
bpf/vmtest-bpf-VM_Test-11 |
success
|
Logs for aarch64-gcc / veristat-meta
|
bpf/vmtest-bpf-VM_Test-12 |
success
|
Logs for s390x-gcc / build / build for s390x with gcc
|
bpf/vmtest-bpf-VM_Test-16 |
success
|
Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
|
bpf/vmtest-bpf-VM_Test-17 |
success
|
Logs for s390x-gcc / veristat-kernel
|
bpf/vmtest-bpf-VM_Test-13 |
success
|
Logs for s390x-gcc / build-release
|
bpf/vmtest-bpf-VM_Test-19 |
success
|
Logs for set-matrix
|
bpf/vmtest-bpf-VM_Test-18 |
success
|
Logs for s390x-gcc / veristat-meta
|
bpf/vmtest-bpf-VM_Test-21 |
success
|
Logs for x86_64-gcc / build-release
|
bpf/vmtest-bpf-VM_Test-20 |
success
|
Logs for x86_64-gcc / build / build for x86_64 with gcc
|
bpf/vmtest-bpf-VM_Test-30 |
success
|
Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
|
bpf/vmtest-bpf-VM_Test-31 |
success
|
Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
|
bpf/vmtest-bpf-VM_Test-37 |
success
|
Logs for x86_64-llvm-17 / veristat-meta
|
bpf/vmtest-bpf-VM_Test-38 |
success
|
Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
|
bpf/vmtest-bpf-VM_Test-39 |
success
|
Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
|
bpf/vmtest-bpf-VM_Test-36 |
success
|
Logs for x86_64-llvm-17 / veristat-kernel
|
bpf/vmtest-bpf-VM_Test-45 |
success
|
Logs for x86_64-llvm-18 / veristat-kernel
|
bpf/vmtest-bpf-VM_Test-46 |
success
|
Logs for x86_64-llvm-18 / veristat-meta
|
bpf/vmtest-bpf-VM_Test-6 |
success
|
Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
|
bpf/vmtest-bpf-VM_Test-7 |
success
|
Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
|
bpf/vmtest-bpf-VM_Test-8 |
success
|
Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
|
bpf/vmtest-bpf-VM_Test-15 |
success
|
Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
|
bpf/vmtest-bpf-VM_Test-22 |
success
|
Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
|
bpf/vmtest-bpf-VM_Test-27 |
success
|
Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
|
bpf/vmtest-bpf-VM_Test-29 |
success
|
Logs for x86_64-gcc / veristat-meta / x86_64-gcc veristat_meta
|
bpf/vmtest-bpf-VM_Test-28 |
success
|
Logs for x86_64-gcc / veristat-kernel / x86_64-gcc veristat_kernel
|
bpf/vmtest-bpf-VM_Test-32 |
success
|
Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
|
bpf/vmtest-bpf-VM_Test-35 |
success
|
Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
|
bpf/vmtest-bpf-VM_Test-44 |
success
|
Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
|
bpf/vmtest-bpf-VM_Test-14 |
success
|
Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
|
bpf/vmtest-bpf-VM_Test-23 |
success
|
Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
|
bpf/vmtest-bpf-VM_Test-24 |
success
|
Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
|
bpf/vmtest-bpf-VM_Test-25 |
success
|
Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
|
bpf/vmtest-bpf-VM_Test-26 |
success
|
Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
|
bpf/vmtest-bpf-VM_Test-33 |
success
|
Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
|
bpf/vmtest-bpf-VM_Test-34 |
success
|
Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
|
bpf/vmtest-bpf-VM_Test-40 |
success
|
Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
|
bpf/vmtest-bpf-VM_Test-41 |
success
|
Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
|
bpf/vmtest-bpf-VM_Test-42 |
success
|
Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
|
bpf/vmtest-bpf-VM_Test-43 |
success
|
Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
|
@@ -91,6 +91,8 @@ struct sk_psock {
struct sk_psock_progs progs;
#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
struct strparser strp;
+ u32 copied_seq;
+ u32 ingress_bytes;
#endif
struct sk_buff_head ingress_skb;
struct list_head ingress_msg;
@@ -43,6 +43,8 @@ struct strparser;
struct strp_callbacks {
int (*parse_msg)(struct strparser *strp, struct sk_buff *skb);
void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb);
+ int (*read_sock)(struct strparser *strp, read_descriptor_t *desc,
+ sk_read_actor_t recv_actor);
int (*read_sock_done)(struct strparser *strp, int err);
void (*abort_parser)(struct strparser *strp, int err);
void (*lock)(struct strparser *strp);
@@ -729,6 +729,9 @@ void tcp_get_info(struct sock *, struct tcp_info *);
/* Read 'sendfile()'-style from a TCP socket */
int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
sk_read_actor_t recv_actor);
+int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t recv_actor, bool noack,
+ u32 *copied_seq);
int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor);
struct sk_buff *tcp_recv_skb(struct sock *sk, u32 seq, u32 *off);
void tcp_read_done(struct sock *sk, size_t len);
@@ -2609,6 +2612,11 @@ static inline void tcp_eat_skb(struct sock *sk, struct sk_buff *skb)
}
#endif
+#ifdef CONFIG_BPF_STREAM_PARSER
+int tcp_bpf_strp_read_sock(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t recv_actor);
+#endif /* CONFIG_BPF_STREAM_PARSER */
+
int tcp_bpf_sendmsg_redir(struct sock *sk, bool ingress,
struct sk_msg *msg, u32 bytes, int flags);
#endif /* CONFIG_NET_SOCK_MSG */
@@ -548,7 +548,9 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
if (unlikely(num_sge < 0))
return num_sge;
}
-
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
+ psock->ingress_bytes += len;
+#endif
copied = len;
msg->sg.start = 0;
msg->sg.size = copied;
@@ -1092,6 +1094,20 @@ static int sk_psock_strp_read_done(struct strparser *strp, int err)
return err;
}
+static int sk_psock_strp_read_sock(struct strparser *strp,
+ read_descriptor_t *desc,
+ sk_read_actor_t recv_actor)
+{
+ struct sock *sk = strp->sk;
+
+ if (WARN_ON(!sk_is_tcp(sk))) {
+ desc->error = -EINVAL;
+ return 0;
+ }
+
+ return tcp_bpf_strp_read_sock(sk, desc, recv_actor);
+}
+
static int sk_psock_strp_parse(struct strparser *strp, struct sk_buff *skb)
{
struct sk_psock *psock = container_of(strp, struct sk_psock, strp);
@@ -1136,6 +1152,7 @@ int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock)
static const struct strp_callbacks cb = {
.rcv_msg = sk_psock_strp_read,
+ .read_sock = sk_psock_strp_read_sock,
.read_sock_done = sk_psock_strp_read_done,
.parse_msg = sk_psock_strp_parse,
};
@@ -1144,6 +1161,9 @@ int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock)
if (!ret)
sk_psock_set_state(psock, SK_PSOCK_RX_STRP_ENABLED);
+ if (sk_is_tcp(sk))
+ psock->copied_seq = tcp_sk(sk)->copied_seq;
+
return ret;
}
@@ -1565,12 +1565,13 @@ EXPORT_SYMBOL(tcp_recv_skb);
* or for 'peeking' the socket using this routine
* (although both would be easy to implement).
*/
-int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
- sk_read_actor_t recv_actor)
+static int __tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t recv_actor, bool noack,
+ u32 *copied_seq)
{
struct sk_buff *skb;
struct tcp_sock *tp = tcp_sk(sk);
- u32 seq = tp->copied_seq;
+ u32 seq = *copied_seq;
u32 offset;
int copied = 0;
@@ -1624,9 +1625,12 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
tcp_eat_recv_skb(sk, skb);
if (!desc->count)
break;
- WRITE_ONCE(tp->copied_seq, seq);
+ WRITE_ONCE(*copied_seq, seq);
}
- WRITE_ONCE(tp->copied_seq, seq);
+ WRITE_ONCE(*copied_seq, seq);
+
+ if (noack)
+ goto out;
tcp_rcv_space_adjust(sk);
@@ -1635,10 +1639,25 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
tcp_recv_skb(sk, seq, &offset);
tcp_cleanup_rbuf(sk, copied);
}
+out:
return copied;
}
+
+int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t recv_actor)
+{
+ return __tcp_read_sock(sk, desc, recv_actor, false,
+ &tcp_sk(sk)->copied_seq);
+}
EXPORT_SYMBOL(tcp_read_sock);
+int tcp_read_sock_noack(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t recv_actor, bool noack,
+ u32 *copied_seq)
+{
+ return __tcp_read_sock(sk, desc, recv_actor, noack, copied_seq);
+}
+
int tcp_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
{
struct sk_buff *skb;
@@ -698,3 +698,44 @@ void tcp_bpf_clone(const struct sock *sk, struct sock *newsk)
newsk->sk_prot = sk->sk_prot_creator;
}
#endif /* CONFIG_BPF_SYSCALL */
+
+#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
+int tcp_bpf_strp_read_sock(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t recv_actor)
+{
+ struct sk_psock *psock;
+ struct tcp_sock *tp;
+ int copied = 0;
+
+ tp = tcp_sk(sk);
+ rcu_read_lock();
+ psock = sk_psock(sk);
+ if (WARN_ON(!psock)) {
+ desc->error = -EINVAL;
+ goto out;
+ }
+
+ psock->ingress_bytes = 0;
+ /* We could easily add copied_seq and noack into desc then call
+ * ops->read_sock without calling symbol directly. But unfortunately
+ * most descriptors used by other modules are not inited with zero.
+ * Also replacing ops->read_sock can't be workd without introducing
+ * new ops as ops itself is located in rodata segment.
+ */
+ copied = tcp_read_sock_noack(sk, desc, recv_actor, true,
+ &psock->copied_seq);
+ if (copied < 0)
+ goto out;
+ /* recv_actor may redirect skb to another socket(SK_REDIRECT) or
+ * just put skb into ingress queue of current socket(SK_PASS).
+ * For SK_REDIRECT, we need 'ack' the frame immediately but for
+ * SK_PASS, the 'ack' was delay to tcp_bpf_recvmsg_parser()
+ */
+ tp->copied_seq = psock->copied_seq - psock->ingress_bytes;
+ tcp_rcv_space_adjust(sk);
+ __tcp_cleanup_rbuf(sk, copied - psock->ingress_bytes);
+out:
+ rcu_read_unlock();
+ return copied;
+}
+#endif /* CONFIG_BPF_STREAM_PARSER */
@@ -347,7 +347,10 @@ static int strp_read_sock(struct strparser *strp)
struct socket *sock = strp->sk->sk_socket;
read_descriptor_t desc;
- if (unlikely(!sock || !sock->ops || !sock->ops->read_sock))
+ if (unlikely(!sock || !sock->ops))
+ return -EBUSY;
+
+ if (unlikely(!strp->cb.read_sock && !sock->ops->read_sock))
return -EBUSY;
desc.arg.data = strp;
@@ -355,7 +358,10 @@ static int strp_read_sock(struct strparser *strp)
desc.count = 1; /* give more than one skb per call */
/* sk should be locked here, so okay to do read_sock */
- sock->ops->read_sock(strp->sk, &desc, strp_recv);
+ if (strp->cb.read_sock)
+ strp->cb.read_sock(strp, &desc, strp_recv);
+ else
+ sock->ops->read_sock(strp->sk, &desc, strp_recv);
desc.error = strp->cb.read_sock_done(strp, desc.error);
@@ -468,6 +474,7 @@ int strp_init(struct strparser *strp, struct sock *sk,
strp->cb.unlock = cb->unlock ? : strp_sock_unlock;
strp->cb.rcv_msg = cb->rcv_msg;
strp->cb.parse_msg = cb->parse_msg;
+ strp->cb.read_sock = cb->read_sock;
strp->cb.read_sock_done = cb->read_sock_done ? : default_read_sock_done;
strp->cb.abort_parser = cb->abort_parser ? : strp_abort_strp;
'sk->copied_seq' was updated in the tcp_eat_skb() function when the action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS, the update logic for 'sk->copied_seq' was moved to tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature. It works for a single stream_verdict scenario, as it also modified 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb' to remove updating 'sk->copied_seq'. However, for programs where both stream_parser and stream_verdict are active(strparser purpose), tcp_read_sock() was used instead of tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock) tcp_read_sock() now still update 'sk->copied_seq', leading to duplicated updates. In summary, for strparser + SK_PASS, copied_seq is redundantly calculated in both tcp_read_sock() and tcp_bpf_recvmsg_parser(). The issue causes incorrect copied_seq calculations, which prevent correct data reads from the recv() interface in user-land. We do not want to add new proto_ops to implement a new version of tcp_read_sock, as this would introduce code complexity [1]. We add new callback for strparser for customized read operation. [1]: https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com Fixes: e5c6de5fa025 ("bpf, sockmap: Incorrectly handling copied_seq") Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Jiayuan Chen <mrpre@163.com> --- include/linux/skmsg.h | 2 ++ include/net/strparser.h | 2 ++ include/net/tcp.h | 8 ++++++++ net/core/skmsg.c | 22 ++++++++++++++++++++- net/ipv4/tcp.c | 29 ++++++++++++++++++++++----- net/ipv4/tcp_bpf.c | 41 +++++++++++++++++++++++++++++++++++++++ net/strparser/strparser.c | 11 +++++++++-- 7 files changed, 107 insertions(+), 8 deletions(-)