Message ID | 20211227062035.3224982-1-imagedong@tencent.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Series | net: bpf: handle return value of BPF_CGROUP_RUN_PROG_INET4_POST_BIND() |
On Mon, 27 Dec 2021 14:20:35 +0800 menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
>
> The return value of BPF_CGROUP_RUN_PROG_INET4_POST_BIND() in
> __inet_bind() is not handled properly. When the return value
> is non-zero, it sets inet_saddr and inet_rcv_saddr to 0 and
> exits:
>
>         err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
>         if (err) {
>                 inet->inet_saddr = inet->inet_rcv_saddr = 0;
>                 goto out_release_sock;
>         }
>
> Let's take UDP as an example and see what happens. A UDP
> socket is added to 'udp_prot.h.udp_table->hash' and
> 'udp_prot.h.udp_table->hash2' after sk->sk_prot->get_port()
> succeeds. If 'inet->inet_rcv_saddr' is specified here,
> then 'sk' will be in an 'hslot2' of 'hash2' that it doesn't belong
> to (because inet_saddr is changed to 0), and received UDP packets
> will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
> specified here, the sock will work fine, as it can receive packets
> properly, which is weird, as the 'bind()' has already failed.
>
> I'm not sure what should be done here. Maybe we should unhash the
> sock for UDP, so that the user can try to bind another port?

Enumerating the L4 unwind paths in L3 code seems like a fairly clear
layering violation. A new callback to undo ->sk_prot->get_port() may
be better.

Does IPv6 not need a similar change?

You need to provide a selftest to validate the expected behavior.

> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 04067b249bf3..9e5710f40a39 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -530,7 +530,14 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
>  		if (!(flags & BIND_FROM_BPF)) {
>  			err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
>  			if (err) {
> +				if (sk->sk_prot == &udp_prot)
> +					sk->sk_prot->unhash(sk);
> +				else if (sk->sk_prot == &tcp_prot)
> +					inet_put_port(sk);
> +
>  				inet->inet_saddr = inet->inet_rcv_saddr = 0;
> +				err = -EPERM;
> +
>  				goto out_release_sock;
>  			}
>  		}
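For readers following the hash2 argument in the quoted commit message, the failure mode can be reproduced in a small userspace model. This is only a sketch under stated assumptions: `toy_sock`, `toy_hash()`, `toy_hash_insert()`, `toy_lookup()` and `toy_demo_stale_slot()` are hypothetical names, the hash function is not the kernel's, and the table is a plain array of chains rather than the real UDP table. It shows only the general effect: a socket hashed under (address, port) becomes unreachable once its address field is cleared without rehashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 8u

/* Toy stand-in for a socket: only the fields the lookup key depends on. */
struct toy_sock {
	uint32_t rcv_saddr;     /* bound local address, 0 means wildcard */
	uint16_t port;          /* bound local port */
	struct toy_sock *next;  /* chain within one slot */
};

static struct toy_sock *slots[NSLOTS];

/* Hypothetical hash over (address, port); NOT the kernel's hash. */
static unsigned int toy_hash(uint32_t addr, uint16_t port)
{
	return (addr ^ port) % NSLOTS;
}

/* Insert the socket into the slot chosen by its current address and port. */
static void toy_hash_insert(struct toy_sock *sk)
{
	unsigned int slot = toy_hash(sk->rcv_saddr, sk->port);

	assert(slot < NSLOTS);
	sk->next = slots[slot];
	slots[slot] = sk;
}

/* Walk only the slot that the requested key hashes to. */
static struct toy_sock *toy_lookup(uint32_t addr, uint16_t port)
{
	struct toy_sock *sk;

	for (sk = slots[toy_hash(addr, port)]; sk; sk = sk->next)
		if (sk->rcv_saddr == addr && sk->port == port)
			return sk;
	return NULL;
}

/* Returns 1 if the socket can no longer be found under either key. */
static int toy_demo_stale_slot(void)
{
	static struct toy_sock sk;
	const uint32_t addr = 0x0a000001u; /* 10.0.0.1, arbitrary */
	const uint16_t port = 5000;        /* arbitrary */

	memset(slots, 0, sizeof(slots));
	sk.rcv_saddr = addr;
	sk.port = port;
	toy_hash_insert(&sk);   /* models a successful get_port() */

	sk.rcv_saddr = 0;       /* models the error path clearing the address */

	/* Specific-address lookup: right slot, but the address no longer matches.
	 * Wildcard lookup: address matches, but the socket is in another slot. */
	return toy_lookup(addr, port) == NULL && toy_lookup(0, port) == NULL;
}
```

Calling `toy_demo_stale_slot()` from a `main()` exercises both lookups. In this model the socket stays chained in its old slot, so it is neither reachable nor released.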
On Thu, Dec 30, 2021 at 5:09 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 27 Dec 2021 14:20:35 +0800 menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > The return value of BPF_CGROUP_RUN_PROG_INET4_POST_BIND() in
> > __inet_bind() is not handled properly. When the return value
> > is non-zero, it sets inet_saddr and inet_rcv_saddr to 0 and
> > exits:
> >
> >         err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
> >         if (err) {
> >                 inet->inet_saddr = inet->inet_rcv_saddr = 0;
> >                 goto out_release_sock;
> >         }
> >
> > Let's take UDP as an example and see what happens. A UDP
> > socket is added to 'udp_prot.h.udp_table->hash' and
> > 'udp_prot.h.udp_table->hash2' after sk->sk_prot->get_port()
> > succeeds. If 'inet->inet_rcv_saddr' is specified here,
> > then 'sk' will be in an 'hslot2' of 'hash2' that it doesn't belong
> > to (because inet_saddr is changed to 0), and received UDP packets
> > will not be passed to this sock. If 'inet->inet_rcv_saddr' is not
> > specified here, the sock will work fine, as it can receive packets
> > properly, which is weird, as the 'bind()' has already failed.
> >
> > I'm not sure what should be done here. Maybe we should unhash the
> > sock for UDP, so that the user can try to bind another port?
>
> Enumerating the L4 unwind paths in L3 code seems like a fairly clear
> layering violation. A new callback to undo ->sk_prot->get_port() may
> be better.

Yeah, it seems there isn't an easier way to solve this problem; a new
callback is needed.

>
> Does IPv6 not need a similar change?
>

IPv6 needs the change too. This patch is just to get some suggestions :/

> You need to provide a selftest to validate the expected behavior.

I'll add it.

Thanks!
Menglong Dong

>
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index 04067b249bf3..9e5710f40a39 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -530,7 +530,14 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
> >  		if (!(flags & BIND_FROM_BPF)) {
> >  			err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
> >  			if (err) {
> > +				if (sk->sk_prot == &udp_prot)
> > +					sk->sk_prot->unhash(sk);
> > +				else if (sk->sk_prot == &tcp_prot)
> > +					inet_put_port(sk);
> > +
> >  				inet->inet_saddr = inet->inet_rcv_saddr = 0;
> > +				err = -EPERM;
> > +
> >  				goto out_release_sock;
> >  			}
> >  		}
>
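The callback approach agreed on in this exchange could look roughly like the userspace sketch below. It is an illustration under assumptions, not the patch that was eventually posted: the `put_port` member name, the `toy_proto` and `toy_sock` structures, and the `post_bind_err` parameter (standing in for the return value of the BPF post-bind hook) are all hypothetical. The point is only that the generic bind path undoes `get_port()` through the protocol's own operations table instead of comparing `sk->sk_prot` against `udp_prot` or `tcp_prot`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_sock;

/* Toy per-protocol operations table, loosely modeled on struct proto. */
struct toy_proto {
	int  (*get_port)(struct toy_sock *sk, unsigned short snum);
	void (*put_port)(struct toy_sock *sk);  /* hypothetical undo hook */
};

struct toy_sock {
	const struct toy_proto *prot;
	unsigned int saddr;     /* bound address, 0 when unbound */
	unsigned short port;    /* bound port, 0 when unbound */
	int hashed;             /* 1 while the socket sits in the protocol's table */
};

/* Toy UDP-like protocol: get_port() hashes, put_port() unhashes. */
static int toy_udp_get_port(struct toy_sock *sk, unsigned short snum)
{
	sk->port = snum;
	sk->hashed = 1;
	return 0;
}

static void toy_udp_put_port(struct toy_sock *sk)
{
	sk->hashed = 0;
	sk->port = 0;
}

static const struct toy_proto toy_udp_prot = {
	.get_port = toy_udp_get_port,
	.put_port = toy_udp_put_port,
};

/* Generic (L3-like) bind path: no knowledge of which L4 protocol is in use. */
static int toy_bind(struct toy_sock *sk, unsigned int addr,
		    unsigned short snum, int post_bind_err)
{
	assert(sk && sk->prot && sk->prot->get_port);

	sk->saddr = addr;
	if (sk->prot->get_port(sk, snum)) {
		sk->saddr = 0;
		return -EADDRINUSE;
	}
	if (post_bind_err) {
		/* Undo get_port() through the protocol's own callback. */
		if (sk->prot->put_port)
			sk->prot->put_port(sk);
		sk->saddr = 0;
		return -EPERM;
	}
	return 0;
}
```

With this shape, a rejected bind leaves the toy socket unhashed with no address or port, so a later `toy_bind()` to a different port starts from a clean state. An IPv6 bind path could call the same hook.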
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 04067b249bf3..9e5710f40a39 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -530,7 +530,14 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len,
 		if (!(flags & BIND_FROM_BPF)) {
 			err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk);
 			if (err) {
+				if (sk->sk_prot == &udp_prot)
+					sk->sk_prot->unhash(sk);
+				else if (sk->sk_prot == &tcp_prot)
+					inet_put_port(sk);
+
 				inet->inet_saddr = inet->inet_rcv_saddr = 0;
+				err = -EPERM;
+
 				goto out_release_sock;
 			}
 		}