Message ID | 20211002003706.11237-4-xiyou.wangcong@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | sock_map: fix ->poll() and update selftests | expand |
On 10/2/21 2:37 AM, Cong Wang wrote: > From: Cong Wang <cong.wang@bytedance.com> > > Yucong noticed we can't poll() sockets in sockmap even > when they are the destination sockets of redirections. > This is because we never poll any psock queues in ->poll(), > except for TCP. With ->sock_is_readable() now we can > overwrite >sock_is_readable(), invoke and implement it for > both UDP and AF_UNIX sockets. > > Reported-by: Yucong Sun <sunyucong@gmail.com> > Cc: John Fastabend <john.fastabend@gmail.com> > Cc: Daniel Borkmann <daniel@iogearbox.net> > Cc: Jakub Sitnicki <jakub@cloudflare.com> > Cc: Lorenz Bauer <lmb@cloudflare.com> > Signed-off-by: Cong Wang <cong.wang@bytedance.com> > --- > net/ipv4/udp.c | 2 ++ > net/ipv4/udp_bpf.c | 1 + > net/unix/af_unix.c | 4 ++++ > net/unix/unix_bpf.c | 2 ++ > 4 files changed, 9 insertions(+) > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index 2a7825a5b842..4a7e15a43a68 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -2866,6 +2866,8 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait) > !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1) > mask &= ~(EPOLLIN | EPOLLRDNORM); > > + if (sk_is_readable(sk)) > + mask |= EPOLLIN | EPOLLRDNORM; udp_poll() has this extra logic around first_packet_length() which drops all bad csum'ed skbs. How does this stand in relation to sk_msg_is_readable()? Is this a concern as well there? Maybe makes sense to elaborate a bit more in the commit message for context / future reference. Thanks, Daniel > return mask; > > }
On Thu, Oct 7, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 10/2/21 2:37 AM, Cong Wang wrote: > > From: Cong Wang <cong.wang@bytedance.com> > > > > Yucong noticed we can't poll() sockets in sockmap even > > when they are the destination sockets of redirections. > > This is because we never poll any psock queues in ->poll(), > > except for TCP. With ->sock_is_readable() now we can > > overwrite >sock_is_readable(), invoke and implement it for > > both UDP and AF_UNIX sockets. > > > > Reported-by: Yucong Sun <sunyucong@gmail.com> > > Cc: John Fastabend <john.fastabend@gmail.com> > > Cc: Daniel Borkmann <daniel@iogearbox.net> > > Cc: Jakub Sitnicki <jakub@cloudflare.com> > > Cc: Lorenz Bauer <lmb@cloudflare.com> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com> > > --- > > net/ipv4/udp.c | 2 ++ > > net/ipv4/udp_bpf.c | 1 + > > net/unix/af_unix.c | 4 ++++ > > net/unix/unix_bpf.c | 2 ++ > > 4 files changed, 9 insertions(+) > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > index 2a7825a5b842..4a7e15a43a68 100644 > > --- a/net/ipv4/udp.c > > +++ b/net/ipv4/udp.c > > @@ -2866,6 +2866,8 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait) > > !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1) > > mask &= ~(EPOLLIN | EPOLLRDNORM); > > > > + if (sk_is_readable(sk)) > > + mask |= EPOLLIN | EPOLLRDNORM; > > udp_poll() has this extra logic around first_packet_length() which drops all bad csum'ed > skbs. How does this stand in relation to sk_msg_is_readable()? Is this a concern as well > there? Maybe makes sense to elaborate a bit more in the commit message for context / future > reference. We don't validate UDP checksums on sockmap RX path, so it is okay to leave it as it is, but it is worth a comment like you suggest. I will add a comment in this code. If we really need to validate the checksum, it should be addressed in a separate patch(set), not in this one. Thanks.
Cong Wang wrote: > On Thu, Oct 7, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > > > On 10/2/21 2:37 AM, Cong Wang wrote: > > > From: Cong Wang <cong.wang@bytedance.com> > > > > > > Yucong noticed we can't poll() sockets in sockmap even > > > when they are the destination sockets of redirections. > > > This is because we never poll any psock queues in ->poll(), > > > except for TCP. With ->sock_is_readable() now we can > > > overwrite >sock_is_readable(), invoke and implement it for > > > both UDP and AF_UNIX sockets. > > > > > > Reported-by: Yucong Sun <sunyucong@gmail.com> > > > Cc: John Fastabend <john.fastabend@gmail.com> > > > Cc: Daniel Borkmann <daniel@iogearbox.net> > > > Cc: Jakub Sitnicki <jakub@cloudflare.com> > > > Cc: Lorenz Bauer <lmb@cloudflare.com> > > > Signed-off-by: Cong Wang <cong.wang@bytedance.com> > > > --- > > > net/ipv4/udp.c | 2 ++ > > > net/ipv4/udp_bpf.c | 1 + > > > net/unix/af_unix.c | 4 ++++ > > > net/unix/unix_bpf.c | 2 ++ > > > 4 files changed, 9 insertions(+) > > > > > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > > > index 2a7825a5b842..4a7e15a43a68 100644 > > > --- a/net/ipv4/udp.c > > > +++ b/net/ipv4/udp.c > > > @@ -2866,6 +2866,8 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait) > > > !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1) > > > mask &= ~(EPOLLIN | EPOLLRDNORM); > > > > > > + if (sk_is_readable(sk)) > > > + mask |= EPOLLIN | EPOLLRDNORM; > > > > udp_poll() has this extra logic around first_packet_length() which drops all bad csum'ed > > skbs. How does this stand in relation to sk_msg_is_readable()? Is this a concern as well > > there? Maybe makes sense to elaborate a bit more in the commit message for context / future > > reference. > > We don't validate UDP checksums on sockmap RX path, so > it is okay to leave it as it is, but it is worth a comment like > you suggest. I will add a comment in this code. > > If we really need to validate the checksum, it should be addressed > in a separate patch(set), not in this one. > > Thanks. We should validate the checksum before creating the sk_msg so that any parsers running on top of UDP don't parse a bad checksum payload or pass a bad checksum up to user space. I thought there would be a check in the read_sock side, but I didn't see it.
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 2a7825a5b842..4a7e15a43a68 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2866,6 +2866,8 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait) !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1) mask &= ~(EPOLLIN | EPOLLRDNORM); + if (sk_is_readable(sk)) + mask |= EPOLLIN | EPOLLRDNORM; return mask; } diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c index 7a1d5f473878..bbe6569c9ad3 100644 --- a/net/ipv4/udp_bpf.c +++ b/net/ipv4/udp_bpf.c @@ -114,6 +114,7 @@ static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base) *prot = *base; prot->close = sock_map_close; prot->recvmsg = udp_bpf_recvmsg; + prot->sock_is_readable = sk_msg_is_readable; } static void udp_bpf_check_v6_needs_rebuild(struct proto *ops) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index f505b89bda6a..3e65d9f5531d 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -3029,6 +3029,8 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa /* readable? */ if (!skb_queue_empty_lockless(&sk->sk_receive_queue)) mask |= EPOLLIN | EPOLLRDNORM; + if (sk_is_readable(sk)) + mask |= EPOLLIN | EPOLLRDNORM; /* Connection-based need to check for termination and startup */ if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) && @@ -3068,6 +3070,8 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock, /* readable? */ if (!skb_queue_empty_lockless(&sk->sk_receive_queue)) mask |= EPOLLIN | EPOLLRDNORM; + if (sk_is_readable(sk)) + mask |= EPOLLIN | EPOLLRDNORM; /* Connection-based need to check for termination and startup */ if (sk->sk_type == SOCK_SEQPACKET) { diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c index b927e2baae50..452376c6f419 100644 --- a/net/unix/unix_bpf.c +++ b/net/unix/unix_bpf.c @@ -102,6 +102,7 @@ static void unix_dgram_bpf_rebuild_protos(struct proto *prot, const struct proto *prot = *base; prot->close = sock_map_close; prot->recvmsg = unix_bpf_recvmsg; + prot->sock_is_readable = sk_msg_is_readable; } static void unix_stream_bpf_rebuild_protos(struct proto *prot, @@ -110,6 +111,7 @@ static void unix_stream_bpf_rebuild_protos(struct proto *prot, *prot = *base; prot->close = sock_map_close; prot->recvmsg = unix_bpf_recvmsg; + prot->sock_is_readable = sk_msg_is_readable; prot->unhash = sock_map_unhash; }