diff mbox series

[bpf,v3,3/4] net: implement ->sock_is_readable() for UDP and AF_UNIX

Message ID 20211002003706.11237-4-xiyou.wangcong@gmail.com (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series sock_map: fix ->poll() and update selftests | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present fail Series targets non-next tree, but doesn't contain any Fixes tags
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf
netdev/subject_prefix success Link
netdev/cc_maintainers warning 15 maintainers not CCed: kafai@fb.com Rao.Shoaib@oracle.com songliubraving@fb.com kpsingh@kernel.org andrii@kernel.org dsahern@kernel.org yoshfuji@linux-ipv6.org yhs@fb.com jiang.wang@bytedance.com ast@kernel.org christian.brauner@ubuntu.com kuba@kernel.org davem@davemloft.net viro@zeniv.linux.org.uk kuniyu@amazon.co.jp
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 11 this patch: 11
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 45 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 11 this patch: 11
netdev/header_inline success Link
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf success VM_Test

Commit Message

Cong Wang Oct. 2, 2021, 12:37 a.m. UTC
From: Cong Wang <cong.wang@bytedance.com>

Yucong noticed we can't poll() sockets in sockmap even
when they are the destination sockets of redirections.
This is because we never poll any psock queues in ->poll(),
except for TCP. With ->sock_is_readable() now we can
overwrite >sock_is_readable(), invoke and implement it for
both UDP and AF_UNIX sockets.

Reported-by: Yucong Sun <sunyucong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/ipv4/udp.c      | 2 ++
 net/ipv4/udp_bpf.c  | 1 +
 net/unix/af_unix.c  | 4 ++++
 net/unix/unix_bpf.c | 2 ++
 4 files changed, 9 insertions(+)

Comments

Daniel Borkmann Oct. 7, 2021, 8 p.m. UTC | #1
On 10/2/21 2:37 AM, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> Yucong noticed we can't poll() sockets in sockmap even
> when they are the destination sockets of redirections.
> This is because we never poll any psock queues in ->poll(),
> except for TCP. With ->sock_is_readable() now we can
> overwrite >sock_is_readable(), invoke and implement it for
> both UDP and AF_UNIX sockets.
> 
> Reported-by: Yucong Sun <sunyucong@gmail.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Jakub Sitnicki <jakub@cloudflare.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> ---
>   net/ipv4/udp.c      | 2 ++
>   net/ipv4/udp_bpf.c  | 1 +
>   net/unix/af_unix.c  | 4 ++++
>   net/unix/unix_bpf.c | 2 ++
>   4 files changed, 9 insertions(+)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 2a7825a5b842..4a7e15a43a68 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2866,6 +2866,8 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
>   	    !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
>   		mask &= ~(EPOLLIN | EPOLLRDNORM);
>   
> +	if (sk_is_readable(sk))
> +		mask |= EPOLLIN | EPOLLRDNORM;

udp_poll() has this extra logic around first_packet_length() which drops all bad csum'ed
skbs. How does this stand in relation to sk_msg_is_readable()? Is this a concern as well
there? Maybe makes sense to elaborate a bit more in the commit message for context / future
reference.

Thanks,
Daniel

>   	return mask;
>   
>   }
Cong Wang Oct. 7, 2021, 8:53 p.m. UTC | #2
On Thu, Oct 7, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 10/2/21 2:37 AM, Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > Yucong noticed we can't poll() sockets in sockmap even
> > when they are the destination sockets of redirections.
> > This is because we never poll any psock queues in ->poll(),
> > except for TCP. With ->sock_is_readable() now we can
> > overwrite >sock_is_readable(), invoke and implement it for
> > both UDP and AF_UNIX sockets.
> >
> > Reported-by: Yucong Sun <sunyucong@gmail.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
> >   net/ipv4/udp.c      | 2 ++
> >   net/ipv4/udp_bpf.c  | 1 +
> >   net/unix/af_unix.c  | 4 ++++
> >   net/unix/unix_bpf.c | 2 ++
> >   4 files changed, 9 insertions(+)
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 2a7825a5b842..4a7e15a43a68 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -2866,6 +2866,8 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
> >           !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
> >               mask &= ~(EPOLLIN | EPOLLRDNORM);
> >
> > +     if (sk_is_readable(sk))
> > +             mask |= EPOLLIN | EPOLLRDNORM;
>
> udp_poll() has this extra logic around first_packet_length() which drops all bad csum'ed
> skbs. How does this stand in relation to sk_msg_is_readable()? Is this a concern as well
> there? Maybe makes sense to elaborate a bit more in the commit message for context / future
> reference.

We don't validate UDP checksums on sockmap RX path, so
it is okay to leave it as it is, but it is worth a comment like
you suggest. I will add a comment in this code.

If we really need to validate the checksum, it should be addressed
in a separate patch(set), not in this one.

Thanks.
John Fastabend Oct. 8, 2021, 3:26 p.m. UTC | #3
Cong Wang wrote:
> On Thu, Oct 7, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 10/2/21 2:37 AM, Cong Wang wrote:
> > > From: Cong Wang <cong.wang@bytedance.com>
> > >
> > > Yucong noticed we can't poll() sockets in sockmap even
> > > when they are the destination sockets of redirections.
> > > This is because we never poll any psock queues in ->poll(),
> > > except for TCP. With ->sock_is_readable() now we can
> > > overwrite >sock_is_readable(), invoke and implement it for
> > > both UDP and AF_UNIX sockets.
> > >
> > > Reported-by: Yucong Sun <sunyucong@gmail.com>
> > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > ---
> > >   net/ipv4/udp.c      | 2 ++
> > >   net/ipv4/udp_bpf.c  | 1 +
> > >   net/unix/af_unix.c  | 4 ++++
> > >   net/unix/unix_bpf.c | 2 ++
> > >   4 files changed, 9 insertions(+)
> > >
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index 2a7825a5b842..4a7e15a43a68 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -2866,6 +2866,8 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
> > >           !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
> > >               mask &= ~(EPOLLIN | EPOLLRDNORM);
> > >
> > > +     if (sk_is_readable(sk))
> > > +             mask |= EPOLLIN | EPOLLRDNORM;
> >
> > udp_poll() has this extra logic around first_packet_length() which drops all bad csum'ed
> > skbs. How does this stand in relation to sk_msg_is_readable()? Is this a concern as well
> > there? Maybe makes sense to elaborate a bit more in the commit message for context / future
> > reference.
> 
> We don't validate UDP checksums on sockmap RX path, so
> it is okay to leave it as it is, but it is worth a comment like
> you suggest. I will add a comment in this code.
> 
> If we really need to validate the checksum, it should be addressed
> in a separate patch(set), not in this one.
> 
> Thanks.

We should validate the checksum before creating the sk_msg so that any parsers running
on top of UDP don't parse a bad checksum payload or pass a bad checksum up to user
space. I thought there would be a check in the read_sock side, but I didn't see it.
diff mbox series

Patch

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 2a7825a5b842..4a7e15a43a68 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2866,6 +2866,8 @@  __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	    !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
 		mask &= ~(EPOLLIN | EPOLLRDNORM);
 
+	if (sk_is_readable(sk))
+		mask |= EPOLLIN | EPOLLRDNORM;
 	return mask;
 
 }
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 7a1d5f473878..bbe6569c9ad3 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -114,6 +114,7 @@  static void udp_bpf_rebuild_protos(struct proto *prot, const struct proto *base)
 	*prot        = *base;
 	prot->close  = sock_map_close;
 	prot->recvmsg = udp_bpf_recvmsg;
+	prot->sock_is_readable = sk_msg_is_readable;
 }
 
 static void udp_bpf_check_v6_needs_rebuild(struct proto *ops)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index f505b89bda6a..3e65d9f5531d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -3029,6 +3029,8 @@  static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
 	/* readable? */
 	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
+	if (sk_is_readable(sk))
+		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
 	if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) &&
@@ -3068,6 +3070,8 @@  static __poll_t unix_dgram_poll(struct file *file, struct socket *sock,
 	/* readable? */
 	if (!skb_queue_empty_lockless(&sk->sk_receive_queue))
 		mask |= EPOLLIN | EPOLLRDNORM;
+	if (sk_is_readable(sk))
+		mask |= EPOLLIN | EPOLLRDNORM;
 
 	/* Connection-based need to check for termination and startup */
 	if (sk->sk_type == SOCK_SEQPACKET) {
diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
index b927e2baae50..452376c6f419 100644
--- a/net/unix/unix_bpf.c
+++ b/net/unix/unix_bpf.c
@@ -102,6 +102,7 @@  static void unix_dgram_bpf_rebuild_protos(struct proto *prot, const struct proto
 	*prot        = *base;
 	prot->close  = sock_map_close;
 	prot->recvmsg = unix_bpf_recvmsg;
+	prot->sock_is_readable = sk_msg_is_readable;
 }
 
 static void unix_stream_bpf_rebuild_protos(struct proto *prot,
@@ -110,6 +111,7 @@  static void unix_stream_bpf_rebuild_protos(struct proto *prot,
 	*prot        = *base;
 	prot->close  = sock_map_close;
 	prot->recvmsg = unix_bpf_recvmsg;
+	prot->sock_is_readable = sk_msg_is_readable;
 	prot->unhash  = sock_map_unhash;
 }