Message ID | 1627e0f688c7de7fe291b09c524c7fbb55cfe367.1610669653.git.sdf@google.com (mailing list archive)
---|---
State | RFC
Delegated to | BPF
Series | [RFC,bpf-next] bpf: implement new BPF_CGROUP_INET_SOCK_POST_CONNECT
On Thu, Jan 14, 2021 at 4:19 PM Stanislav Fomichev <sdf@google.com> wrote: > > We are playing with doing hybrid conntrack where BPF generates > connect/disconnect/etc events and puts them into perfbuf (or, later, > new ringbuf). We can get most of the functionality out of > existing hooks: > - BPF_CGROUP_SOCK_OPS fully covers TCP > - BPF_CGROUP_UDP4_SENDMSG covers unconnected UDP (with sampling, etc) > > The only missing bit is connected UDP where we can get some > information from the existing BPF_CGROUP_INET{4,6}_CONNECT if the caller > did explicit bind(); otherwise, in an autobind case, we get > only destination addr/port and no source port because this hook > triggers prior to that. > > We'd really like to avoid the cost of BPF_CGROUP_INET_EGRESS > and filtering UDP (which covers both connected and unconnected UDP, > but loses that connect/disconnect pseudo signal). > > The proposal is to add a new BPF_CGROUP_INET_SOCK_POST_CONNECT which > triggers right before sys_connect exits in the AF_INET{,6} case. > The context is bpf_sock which lets BPF examine the socket state. > There is really no reason for it to trigger for all inet socks, > I've considered adding BPF_CGROUP_UDP_POST_CONNECT, but decided > that it might be better to have a generic inet case. > > New hook triggers right before sys_connect() returns and gives > BPF an opportunity to explore source & destination addresses > as well as ability to return EPERM to the user. > > This is somewhat analogous to the existing BPF_CGROUP_INET{4,6}_POST_BIND > hooks with the intention to log the connection addresses (after autobind). > > Signed-off-by: Stanislav Fomichev <sdf@google.com> > Change-Id: I46d0122f93c58b17bfae5ba5040b0b0343908c19 > --- > include/linux/bpf-cgroup.h | 17 +++++++++++++++++ > include/uapi/linux/bpf.h | 1 + > kernel/bpf/syscall.c | 3 +++ > net/core/filter.c | 4 ++++ > net/ipv4/af_inet.c | 7 ++++++- > 5 files changed, 31 insertions(+), 1 deletion(-) > > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h > index 72e69a0e1e8c..f110935258b9 100644 > --- a/include/linux/bpf-cgroup.h > +++ b/include/linux/bpf-cgroup.h > @@ -213,12 +213,29 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, > __ret; \ > }) > > +#define BPF_CGROUP_RUN_SK_PROG_LOCKED(sk, type) \ > +({ \ > + int __ret = 0; \ > + if (cgroup_bpf_enabled) { \ > + lock_sock(sk); \ > + __ret = __cgroup_bpf_run_filter_sk(sk, type); \ > + release_sock(sk); \ > + } \ > + __ret; \ > +}) > + > #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) \ > BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_CREATE) > > #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) \ > BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_RELEASE) > > +#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sk) \ > + BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_POST_CONNECT) > + > +#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk) \ > + BPF_CGROUP_RUN_SK_PROG_LOCKED(sk, BPF_CGROUP_INET_SOCK_POST_CONNECT) > + > #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) \ > BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET4_POST_BIND) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index a1ad32456f89..3235f7bd131f 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -241,6 +241,7 @@ enum bpf_attach_type { > BPF_XDP_CPUMAP, > BPF_SK_LOOKUP, > BPF_XDP, > + BPF_CGROUP_INET_SOCK_POST_CONNECT, Adding new bpf_attach_type enums keeps blowing up the size of struct cgroup_bpf. 
Right now we have 38 different values, of which 15 values are not related to cgroups (judging by their name). That results in 15 * (8 + 16 + 4) = 420 extra bytes wasted for each struct cgroup_bpf (and thus struct cgroup). Probably not critical, but it would be nice to not waste space unnecessarily. Would anyone be interested in addressing this? Basically, instead of using MAX_BPF_ATTACH_TYPE from enum bpf_attach_type, we'd need to have cgroup-specific enumeration and mapping bpf_attach_type to that bpf_cgroup_attach_type to compactly store information in struct cgroup_bpf. Thoughts? > __MAX_BPF_ATTACH_TYPE > }; > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index c3bb03c8371f..7d6fd1e32d22 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -1958,6 +1958,7 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, > switch (expected_attach_type) { > case BPF_CGROUP_INET_SOCK_CREATE: > case BPF_CGROUP_INET_SOCK_RELEASE: > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > case BPF_CGROUP_INET4_POST_BIND: > case BPF_CGROUP_INET6_POST_BIND: > return 0; > @@ -2910,6 +2911,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) > return BPF_PROG_TYPE_CGROUP_SKB; > case BPF_CGROUP_INET_SOCK_CREATE: > case BPF_CGROUP_INET_SOCK_RELEASE: > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > case BPF_CGROUP_INET4_POST_BIND: > case BPF_CGROUP_INET6_POST_BIND: > return BPF_PROG_TYPE_CGROUP_SOCK; > @@ -3063,6 +3065,7 @@ static int bpf_prog_query(const union bpf_attr *attr, > case BPF_CGROUP_INET_EGRESS: > case BPF_CGROUP_INET_SOCK_CREATE: > case BPF_CGROUP_INET_SOCK_RELEASE: > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > case BPF_CGROUP_INET4_BIND: > case BPF_CGROUP_INET6_BIND: > case BPF_CGROUP_INET4_POST_BIND: > diff --git a/net/core/filter.c b/net/core/filter.c > index 9ab94e90d660..d955321d3415 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -7683,12 +7683,14 @@ static bool __sock_filter_check_attach_type(int off, > switch (attach_type) { > case BPF_CGROUP_INET_SOCK_CREATE: > case BPF_CGROUP_INET_SOCK_RELEASE: > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > goto full_access; > default: > return false; > } > case bpf_ctx_range(struct bpf_sock, src_ip4): > switch (attach_type) { > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > case BPF_CGROUP_INET4_POST_BIND: > goto read_only; > default: > @@ -7696,6 +7698,7 @@ static bool __sock_filter_check_attach_type(int off, > } > case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]): > switch (attach_type) { > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > case BPF_CGROUP_INET6_POST_BIND: > goto read_only; > default: > @@ -7703,6 +7706,7 @@ static bool __sock_filter_check_attach_type(int off, > } > case bpf_ctx_range(struct bpf_sock, src_port): > switch (attach_type) { > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > case BPF_CGROUP_INET4_POST_BIND: > case BPF_CGROUP_INET6_POST_BIND: > goto read_only; > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index b94fa8eb831b..568654cafa48 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -574,7 +574,10 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr, > > if (!inet_sk(sk)->inet_num && inet_autobind(sk)) > return -EAGAIN; > - return sk->sk_prot->connect(sk, uaddr, addr_len); > + err = sk->sk_prot->connect(sk, uaddr, addr_len); > + if (!err) > + err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk); > + return err; > } > EXPORT_SYMBOL(inet_dgram_connect); > Have you tried attaching the fexit program to inet_dgram_connect? 
Doesn't it give all the information you need? > @@ -723,6 +726,8 @@ int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, > > lock_sock(sock->sk); > err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0); Similarly here, attaching fexit to __inet_stream_connect would execute your BPF program at exactly the same time (and then you can check for err value). Or the point here is to have a more "stable" BPF program type? > + if (!err) > + err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sock->sk); > release_sock(sock->sk); > return err; > } > -- > 2.30.0.284.gd98b1dd5eaa7-goog >
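For reference, a minimal sketch of the fexit alternative suggested above: attach to inet_dgram_connect() and read the 4-tuple after autobind has run. It assumes a vmlinux.h generated from the running kernel; the event layout, the ring buffer map, and the particular sk fields read are illustrative assumptions, not part of this series.

// SPDX-License-Identifier: GPL-2.0
/* Sketch (untested): fexit program on inet_dgram_connect(), along the
 * lines suggested above. Requires vmlinux.h for CO-RE; the event layout
 * and map are illustrative only.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);
} events SEC(".maps");

struct connect_event {
	__u32 saddr;
	__u32 daddr;
	__u16 sport;	/* host byte order */
	__u16 dport;	/* network byte order */
};

SEC("fexit/inet_dgram_connect")
int BPF_PROG(udp_post_connect, struct socket *sock, struct sockaddr *uaddr,
	     int addr_len, int flags, int ret)
{
	struct sock *sk = sock->sk;
	struct connect_event ev = {};

	if (ret)	/* connect() failed, nothing to log */
		return 0;

	/* Source address/port are valid here because autobind already ran. */
	ev.saddr = sk->__sk_common.skc_rcv_saddr;
	ev.daddr = sk->__sk_common.skc_daddr;
	ev.sport = sk->__sk_common.skc_num;
	ev.dport = sk->__sk_common.skc_dport;

	bpf_ringbuf_output(&events, &ev, sizeof(ev), 0);
	return 0;
}

char _license[] SEC("license") = "GPL";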
On Thu, Jan 14, 2021 at 6:38 PM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote: > > On Thu, Jan 14, 2021 at 4:19 PM Stanislav Fomichev <sdf@google.com> wrote: > > > > We are playing with doing hybrid conntrack where BPF generates > > connect/disconnect/etc events and puts them into perfbuf (or, later, > > new ringbuf). We can get most of the functionality out of > > existing hooks: > > - BPF_CGROUP_SOCK_OPS fully covers TCP > > - BPF_CGROUP_UDP4_SENDMSG covers unconnected UDP (with sampling, etc) > > > > The only missing bit is connected UDP where we can get some > > information from the existing BPF_CGROUP_INET{4,6}_CONNECT if the caller > > did explicit bind(); otherwise, in an autobind case, we get > > only destination addr/port and no source port because this hook > > triggers prior to that. > > > > We'd really like to avoid the cost of BPF_CGROUP_INET_EGRESS > > and filtering UDP (which covers both connected and unconnected UDP, > > but loses that connect/disconnect pseudo signal). > > > > The proposal is to add a new BPF_CGROUP_INET_SOCK_POST_CONNECT which > > triggers right before sys_connect exits in the AF_INET{,6} case. > > The context is bpf_sock which lets BPF examine the socket state. > > There is really no reason for it to trigger for all inet socks, > > I've considered adding BPF_CGROUP_UDP_POST_CONNECT, but decided > > that it might be better to have a generic inet case. > > > > New hook triggers right before sys_connect() returns and gives > > BPF an opportunity to explore source & destination addresses > > as well as ability to return EPERM to the user. > > > > This is somewhat analogous to the existing BPF_CGROUP_INET{4,6}_POST_BIND > > hooks with the intention to log the connection addresses (after autobind). > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com> > > Change-Id: I46d0122f93c58b17bfae5ba5040b0b0343908c19 > > --- > > include/linux/bpf-cgroup.h | 17 +++++++++++++++++ > > include/uapi/linux/bpf.h | 1 + > > kernel/bpf/syscall.c | 3 +++ > > net/core/filter.c | 4 ++++ > > net/ipv4/af_inet.c | 7 ++++++- > > 5 files changed, 31 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h > > index 72e69a0e1e8c..f110935258b9 100644 > > --- a/include/linux/bpf-cgroup.h > > +++ b/include/linux/bpf-cgroup.h > > @@ -213,12 +213,29 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, > > __ret; \ > > }) > > > > +#define BPF_CGROUP_RUN_SK_PROG_LOCKED(sk, type) \ > > +({ \ > > + int __ret = 0; \ > > + if (cgroup_bpf_enabled) { \ > > + lock_sock(sk); \ > > + __ret = __cgroup_bpf_run_filter_sk(sk, type); \ > > + release_sock(sk); \ > > + } \ > > + __ret; \ > > +}) > > + > > #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) \ > > BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_CREATE) > > > > #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) \ > > BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_RELEASE) > > > > +#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sk) \ > > + BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_POST_CONNECT) > > + > > +#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk) \ > > + BPF_CGROUP_RUN_SK_PROG_LOCKED(sk, BPF_CGROUP_INET_SOCK_POST_CONNECT) > > + > > #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) \ > > BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET4_POST_BIND) > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index a1ad32456f89..3235f7bd131f 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -241,6 +241,7 @@ enum 
bpf_attach_type { > > BPF_XDP_CPUMAP, > > BPF_SK_LOOKUP, > > BPF_XDP, > > + BPF_CGROUP_INET_SOCK_POST_CONNECT, > > Adding new bpf_attach_type enums keeps blowing up the size of struct > cgroup_bpf. Right now we have 38 different values, of which 15 values > are not related to cgroups (judging by their name). That results in 15 > * (8 + 16 + 4) = 420 extra bytes wasted for each struct cgroup_bpf > (and thus struct cgroup). Probably not critical, but it would be nice > to not waste space unnecessarily. > > Would anyone be interested in addressing this? Basically, instead of > using MAX_BPF_ATTACH_TYPE from enum bpf_attach_type, we'd need to have > cgroup-specific enumeration and mapping bpf_attach_type to that > bpf_cgroup_attach_type to compactly store information in struct > cgroup_bpf. Thoughts? Sure, I can get to that at some point if nobody beats me to it. Assuming we have 10k cgroups on the machine, it would save about 4mb, which doesn't look alarming. > > __MAX_BPF_ATTACH_TYPE > > }; > > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > > index c3bb03c8371f..7d6fd1e32d22 100644 > > --- a/kernel/bpf/syscall.c > > +++ b/kernel/bpf/syscall.c > > @@ -1958,6 +1958,7 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, > > switch (expected_attach_type) { > > case BPF_CGROUP_INET_SOCK_CREATE: > > case BPF_CGROUP_INET_SOCK_RELEASE: > > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > > case BPF_CGROUP_INET4_POST_BIND: > > case BPF_CGROUP_INET6_POST_BIND: > > return 0; > > @@ -2910,6 +2911,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) > > return BPF_PROG_TYPE_CGROUP_SKB; > > case BPF_CGROUP_INET_SOCK_CREATE: > > case BPF_CGROUP_INET_SOCK_RELEASE: > > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > > case BPF_CGROUP_INET4_POST_BIND: > > case BPF_CGROUP_INET6_POST_BIND: > > return BPF_PROG_TYPE_CGROUP_SOCK; > > @@ -3063,6 +3065,7 @@ static int bpf_prog_query(const union bpf_attr *attr, > > case BPF_CGROUP_INET_EGRESS: > > case BPF_CGROUP_INET_SOCK_CREATE: > > case BPF_CGROUP_INET_SOCK_RELEASE: > > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > > case BPF_CGROUP_INET4_BIND: > > case BPF_CGROUP_INET6_BIND: > > case BPF_CGROUP_INET4_POST_BIND: > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 9ab94e90d660..d955321d3415 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -7683,12 +7683,14 @@ static bool __sock_filter_check_attach_type(int off, > > switch (attach_type) { > > case BPF_CGROUP_INET_SOCK_CREATE: > > case BPF_CGROUP_INET_SOCK_RELEASE: > > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > > goto full_access; > > default: > > return false; > > } > > case bpf_ctx_range(struct bpf_sock, src_ip4): > > switch (attach_type) { > > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > > case BPF_CGROUP_INET4_POST_BIND: > > goto read_only; > > default: > > @@ -7696,6 +7698,7 @@ static bool __sock_filter_check_attach_type(int off, > > } > > case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]): > > switch (attach_type) { > > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > > case BPF_CGROUP_INET6_POST_BIND: > > goto read_only; > > default: > > @@ -7703,6 +7706,7 @@ static bool __sock_filter_check_attach_type(int off, > > } > > case bpf_ctx_range(struct bpf_sock, src_port): > > switch (attach_type) { > > + case BPF_CGROUP_INET_SOCK_POST_CONNECT: > > case BPF_CGROUP_INET4_POST_BIND: > > case BPF_CGROUP_INET6_POST_BIND: > > goto read_only; > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > > index b94fa8eb831b..568654cafa48 100644 > > --- 
a/net/ipv4/af_inet.c > > +++ b/net/ipv4/af_inet.c > > @@ -574,7 +574,10 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr, > > > > if (!inet_sk(sk)->inet_num && inet_autobind(sk)) > > return -EAGAIN; > > - return sk->sk_prot->connect(sk, uaddr, addr_len); > > + err = sk->sk_prot->connect(sk, uaddr, addr_len); > > + if (!err) > > + err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk); > > + return err; > > } > > EXPORT_SYMBOL(inet_dgram_connect); > > > > Have you tried attaching the fexit program to inet_dgram_connect? > Doesn't it give all the information you need? > > > @@ -723,6 +726,8 @@ int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, > > > > lock_sock(sock->sk); > > err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0); > > Similarly here, attaching fexit to __inet_stream_connect would execute > your BPF program at exactly the same time (and then you can check for > err value). > > Or the point here is to have a more "stable" BPF program type? Good suggestion, I can try to play with it, I think it should give me all the info I need (I only need sock). But yeah, I'd rather prefer a stable interface against stable __sk_buff, but maybe fexit will also work.
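To make the struct cgroup_bpf compaction idea above concrete, here is a rough sketch; the enum and helper names are invented for illustration and are not existing kernel identifiers.

/* Sketch: a cgroup-only attach type enumeration. The arrays inside
 * struct cgroup_bpf would be sized by MAX_CGROUP_BPF_ATTACH_TYPE instead
 * of MAX_BPF_ATTACH_TYPE, so the ~420 bytes per cgroup spent on
 * non-cgroup attach types go away. Names are hypothetical.
 */
#include <errno.h>
#include <linux/bpf.h>

enum bpf_cgroup_attach_type {
	CGROUP_INET_INGRESS = 0,
	CGROUP_INET_EGRESS,
	CGROUP_INET_SOCK_CREATE,
	/* ... only the cgroup-related subset of bpf_attach_type ... */
	MAX_CGROUP_BPF_ATTACH_TYPE,
};

/* Translate the UAPI attach type into the compact internal one. */
static int to_cgroup_bpf_attach_type(enum bpf_attach_type type)
{
	switch (type) {
	case BPF_CGROUP_INET_INGRESS:
		return CGROUP_INET_INGRESS;
	case BPF_CGROUP_INET_EGRESS:
		return CGROUP_INET_EGRESS;
	case BPF_CGROUP_INET_SOCK_CREATE:
		return CGROUP_INET_SOCK_CREATE;
	/* ... */
	default:
		return -EINVAL;
	}
}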
On Thu, Jan 14, 2021 at 7:51 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > lock_sock(sock->sk);
> > > err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
> >
> > Similarly here, attaching fexit to __inet_stream_connect would execute
> > your BPF program at exactly the same time (and then you can check for
> > err value).
> >
> > Or the point here is to have a more "stable" BPF program type?
> Good suggestion, I can try to play with it, I think it should give me
> all the info I need (I only need sock).
> But yeah, I'd rather prefer a stable interface against stable
> __sk_buff, but maybe fexit will also work.

Maybe we can add an extension to fentry/fexit that are cgroup scoped?
I think this will solve many such cases.
On 1/14/21 7:59 PM, Alexei Starovoitov wrote:
> On Thu, Jan 14, 2021 at 7:51 PM Stanislav Fomichev <sdf@google.com> wrote:
>>>>
>>>> lock_sock(sock->sk);
>>>> err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
>>>
>>> Similarly here, attaching fexit to __inet_stream_connect would execute
>>> your BPF program at exactly the same time (and then you can check for
>>> err value).
>>>
>>> Or the point here is to have a more "stable" BPF program type?
>> Good suggestion, I can try to play with it, I think it should give me
>> all the info I need (I only need sock).
>> But yeah, I'd rather prefer a stable interface against stable
>> __sk_buff, but maybe fexit will also work.
>
> Maybe we can add an extension to fentry/fexit that are cgroup scoped?
> I think this will solve many such cases.

Currently, Google is pushing an LTO build of the kernel. If this happens,
it is possible that a global function in one file (say a.c) might be
inlined into another file (say b.c). So in this particular case, if the
global function is inlined, the fentry/fexit approach might miss some
cases. We could mark certain *special purpose* functions as non-inline,
but I am not sure whether that is scalable.
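On the narrower point of keeping a particular function attachable, the existing kernel annotation is enough to prevent inlining; a one-line sketch (the function name is just a placeholder, not a proposal):

/* 'noinline' expands to __attribute__((__noinline__)) in the kernel
 * (include/linux/compiler_types.h) and keeps the function as a distinct
 * symbol that fentry/fexit can attach to, even under LTO.
 */
noinline int example_udp_connect_hook(struct sock *sk)
{
	return 0;
}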
Hi Stanislav, Thank you for the patch! Yet something to improve: [auto build test ERROR on bpf-next/master] url: https://github.com/0day-ci/linux/commits/Stanislav-Fomichev/bpf-implement-new-BPF_CGROUP_INET_SOCK_POST_CONNECT/20210115-112524 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master config: arm-randconfig-r014-20210115 (attached as .config) compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/342141c74fe4ece77f9d9753918a77e66d9d3316 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Stanislav-Fomichev/bpf-implement-new-BPF_CGROUP_INET_SOCK_POST_CONNECT/20210115-112524 git checkout 342141c74fe4ece77f9d9753918a77e66d9d3316 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): net/ipv4/af_inet.c: In function 'inet_dgram_connect': >> net/ipv4/af_inet.c:579:9: error: implicit declaration of function 'BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED'; did you mean 'BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK'? [-Werror=implicit-function-declaration] 579 | err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK net/ipv4/af_inet.c: In function 'inet_stream_connect': >> net/ipv4/af_inet.c:730:9: error: implicit declaration of function 'BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT'; did you mean 'BPF_CGROUP_RUN_PROG_INET6_CONNECT'? [-Werror=implicit-function-declaration] 730 | err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sock->sk); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | BPF_CGROUP_RUN_PROG_INET6_CONNECT cc1: some warnings being treated as errors vim +579 net/ipv4/af_inet.c 557 558 int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr, 559 int addr_len, int flags) 560 { 561 struct sock *sk = sock->sk; 562 int err; 563 564 if (addr_len < sizeof(uaddr->sa_family)) 565 return -EINVAL; 566 if (uaddr->sa_family == AF_UNSPEC) 567 return sk->sk_prot->disconnect(sk, flags); 568 569 if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) { 570 err = sk->sk_prot->pre_connect(sk, uaddr, addr_len); 571 if (err) 572 return err; 573 } 574 575 if (!inet_sk(sk)->inet_num && inet_autobind(sk)) 576 return -EAGAIN; 577 err = sk->sk_prot->connect(sk, uaddr, addr_len); 578 if (!err) > 579 err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk); 580 return err; 581 } 582 EXPORT_SYMBOL(inet_dgram_connect); 583 584 static long inet_wait_for_connect(struct sock *sk, long timeo, int writebias) 585 { 586 DEFINE_WAIT_FUNC(wait, woken_wake_function); 587 588 add_wait_queue(sk_sleep(sk), &wait); 589 sk->sk_write_pending += writebias; 590 591 /* Basic assumption: if someone sets sk->sk_err, he _must_ 592 * change state of the socket from TCP_SYN_*. 593 * Connect() does not allow to get error notifications 594 * without closing the socket. 
595 */ 596 while ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { 597 release_sock(sk); 598 timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, timeo); 599 lock_sock(sk); 600 if (signal_pending(current) || !timeo) 601 break; 602 } 603 remove_wait_queue(sk_sleep(sk), &wait); 604 sk->sk_write_pending -= writebias; 605 return timeo; 606 } 607 608 /* 609 * Connect to a remote host. There is regrettably still a little 610 * TCP 'magic' in here. 611 */ 612 int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, 613 int addr_len, int flags, int is_sendmsg) 614 { 615 struct sock *sk = sock->sk; 616 int err; 617 long timeo; 618 619 /* 620 * uaddr can be NULL and addr_len can be 0 if: 621 * sk is a TCP fastopen active socket and 622 * TCP_FASTOPEN_CONNECT sockopt is set and 623 * we already have a valid cookie for this socket. 624 * In this case, user can call write() after connect(). 625 * write() will invoke tcp_sendmsg_fastopen() which calls 626 * __inet_stream_connect(). 627 */ 628 if (uaddr) { 629 if (addr_len < sizeof(uaddr->sa_family)) 630 return -EINVAL; 631 632 if (uaddr->sa_family == AF_UNSPEC) { 633 err = sk->sk_prot->disconnect(sk, flags); 634 sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED; 635 goto out; 636 } 637 } 638 639 switch (sock->state) { 640 default: 641 err = -EINVAL; 642 goto out; 643 case SS_CONNECTED: 644 err = -EISCONN; 645 goto out; 646 case SS_CONNECTING: 647 if (inet_sk(sk)->defer_connect) 648 err = is_sendmsg ? -EINPROGRESS : -EISCONN; 649 else 650 err = -EALREADY; 651 /* Fall out of switch with err, set for this state */ 652 break; 653 case SS_UNCONNECTED: 654 err = -EISCONN; 655 if (sk->sk_state != TCP_CLOSE) 656 goto out; 657 658 if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) { 659 err = sk->sk_prot->pre_connect(sk, uaddr, addr_len); 660 if (err) 661 goto out; 662 } 663 664 err = sk->sk_prot->connect(sk, uaddr, addr_len); 665 if (err < 0) 666 goto out; 667 668 sock->state = SS_CONNECTING; 669 670 if (!err && inet_sk(sk)->defer_connect) 671 goto out; 672 673 /* Just entered SS_CONNECTING state; the only 674 * difference is that return value in non-blocking 675 * case is EINPROGRESS, rather than EALREADY. 676 */ 677 err = -EINPROGRESS; 678 break; 679 } 680 681 timeo = sock_sndtimeo(sk, flags & O_NONBLOCK); 682 683 if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { 684 int writebias = (sk->sk_protocol == IPPROTO_TCP) && 685 tcp_sk(sk)->fastopen_req && 686 tcp_sk(sk)->fastopen_req->data ? 1 : 0; 687 688 /* Error code is set above */ 689 if (!timeo || !inet_wait_for_connect(sk, timeo, writebias)) 690 goto out; 691 692 err = sock_intr_errno(timeo); 693 if (signal_pending(current)) 694 goto out; 695 } 696 697 /* Connection was closed by RST, timeout, ICMP error 698 * or another process disconnected us. 699 */ 700 if (sk->sk_state == TCP_CLOSE) 701 goto sock_error; 702 703 /* sk->sk_err may be not zero now, if RECVERR was ordered by user 704 * and error was received after socket entered established state. 705 * Hence, it is handled normally after connect() return successfully. 706 */ 707 708 sock->state = SS_CONNECTED; 709 err = 0; 710 out: 711 return err; 712 713 sock_error: 714 err = sock_error(sk) ? 
: -ECONNABORTED; 715 sock->state = SS_UNCONNECTED; 716 if (sk->sk_prot->disconnect(sk, flags)) 717 sock->state = SS_DISCONNECTING; 718 goto out; 719 } 720 EXPORT_SYMBOL(__inet_stream_connect); 721 722 int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, 723 int addr_len, int flags) 724 { 725 int err; 726 727 lock_sock(sock->sk); 728 err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0); 729 if (!err) > 730 err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sock->sk); 731 release_sock(sock->sk); 732 return err; 733 } 734 EXPORT_SYMBOL(inet_stream_connect); 735 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
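These implicit-declaration errors show up on randconfig builds, which suggests the new macros are only defined when CONFIG_CGROUP_BPF is set and the stub section of include/linux/bpf-cgroup.h was not updated; a likely (untested) fix would mirror the existing no-op stubs:

/* In the !CONFIG_CGROUP_BPF branch of include/linux/bpf-cgroup.h, the new
 * hooks would need the same no-op fallbacks the other
 * BPF_CGROUP_RUN_PROG_* macros already have:
 */
#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sk)		({ 0; })
#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk)	({ 0; })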
Hi Stanislav, Thank you for the patch! Yet something to improve: [auto build test ERROR on bpf-next/master] url: https://github.com/0day-ci/linux/commits/Stanislav-Fomichev/bpf-implement-new-BPF_CGROUP_INET_SOCK_POST_CONNECT/20210115-112524 base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master config: powerpc64-randconfig-r021-20210115 (attached as .config) compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 5b42fd8dd4e7e29125a09a41a33af7c9cb57d144) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # install powerpc64 cross compiling tool for clang build # apt-get install binutils-powerpc64-linux-gnu # https://github.com/0day-ci/linux/commit/342141c74fe4ece77f9d9753918a77e66d9d3316 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Stanislav-Fomichev/bpf-implement-new-BPF_CGROUP_INET_SOCK_POST_CONNECT/20210115-112524 git checkout 342141c74fe4ece77f9d9753918a77e66d9d3316 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): __do_insb ^ arch/powerpc/include/asm/io.h:556:56: note: expanded from macro '__do_insb' #define __do_insb(p, b, n) readsb((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/ipv4/af_inet.c:81: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:100:1: note: expanded from here __do_insw ^ arch/powerpc/include/asm/io.h:557:56: note: expanded from macro '__do_insw' #define __do_insw(p, b, n) readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/ipv4/af_inet.c:81: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:102:1: note: expanded from here __do_insl ^ arch/powerpc/include/asm/io.h:558:56: note: expanded from macro '__do_insl' #define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/ipv4/af_inet.c:81: In file 
included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:104:1: note: expanded from here __do_outsb ^ arch/powerpc/include/asm/io.h:559:58: note: expanded from macro '__do_outsb' #define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/ipv4/af_inet.c:81: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:106:1: note: expanded from here __do_outsw ^ arch/powerpc/include/asm/io.h:560:58: note: expanded from macro '__do_outsw' #define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ In file included from net/ipv4/af_inet.c:81: In file included from include/linux/interrupt.h:11: In file included from include/linux/hardirq.h:10: In file included from arch/powerpc/include/asm/hardirq.h:6: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/powerpc/include/asm/io.h:619: arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c), ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/include/asm/io.h:616:3: note: expanded from macro 'DEF_PCI_AC_NORET' __do_##name al; \ ^~~~~~~~~~~~~~ <scratch space>:108:1: note: expanded from here __do_outsl ^ arch/powerpc/include/asm/io.h:561:58: note: expanded from macro '__do_outsl' #define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n)) ~~~~~~~~~~~~~~~~~~~~~^ >> net/ipv4/af_inet.c:579:9: error: implicit declaration of function 'BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED' [-Werror,-Wimplicit-function-declaration] err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk); ^ >> net/ipv4/af_inet.c:730:9: error: implicit declaration of function 'BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT' [-Werror,-Wimplicit-function-declaration] err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sock->sk); ^ 6 warnings and 2 errors generated. 
vim +/BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED +579 net/ipv4/af_inet.c 557 558 int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr, 559 int addr_len, int flags) 560 { 561 struct sock *sk = sock->sk; 562 int err; 563 564 if (addr_len < sizeof(uaddr->sa_family)) 565 return -EINVAL; 566 if (uaddr->sa_family == AF_UNSPEC) 567 return sk->sk_prot->disconnect(sk, flags); 568 569 if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) { 570 err = sk->sk_prot->pre_connect(sk, uaddr, addr_len); 571 if (err) 572 return err; 573 } 574 575 if (!inet_sk(sk)->inet_num && inet_autobind(sk)) 576 return -EAGAIN; 577 err = sk->sk_prot->connect(sk, uaddr, addr_len); 578 if (!err) > 579 err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk); 580 return err; 581 } 582 EXPORT_SYMBOL(inet_dgram_connect); 583 584 static long inet_wait_for_connect(struct sock *sk, long timeo, int writebias) 585 { 586 DEFINE_WAIT_FUNC(wait, woken_wake_function); 587 588 add_wait_queue(sk_sleep(sk), &wait); 589 sk->sk_write_pending += writebias; 590 591 /* Basic assumption: if someone sets sk->sk_err, he _must_ 592 * change state of the socket from TCP_SYN_*. 593 * Connect() does not allow to get error notifications 594 * without closing the socket. 595 */ 596 while ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { 597 release_sock(sk); 598 timeo = wait_woken(&wait, TASK_INTERRUPTIBLE, timeo); 599 lock_sock(sk); 600 if (signal_pending(current) || !timeo) 601 break; 602 } 603 remove_wait_queue(sk_sleep(sk), &wait); 604 sk->sk_write_pending -= writebias; 605 return timeo; 606 } 607 608 /* 609 * Connect to a remote host. There is regrettably still a little 610 * TCP 'magic' in here. 611 */ 612 int __inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, 613 int addr_len, int flags, int is_sendmsg) 614 { 615 struct sock *sk = sock->sk; 616 int err; 617 long timeo; 618 619 /* 620 * uaddr can be NULL and addr_len can be 0 if: 621 * sk is a TCP fastopen active socket and 622 * TCP_FASTOPEN_CONNECT sockopt is set and 623 * we already have a valid cookie for this socket. 624 * In this case, user can call write() after connect(). 625 * write() will invoke tcp_sendmsg_fastopen() which calls 626 * __inet_stream_connect(). 627 */ 628 if (uaddr) { 629 if (addr_len < sizeof(uaddr->sa_family)) 630 return -EINVAL; 631 632 if (uaddr->sa_family == AF_UNSPEC) { 633 err = sk->sk_prot->disconnect(sk, flags); 634 sock->state = err ? SS_DISCONNECTING : SS_UNCONNECTED; 635 goto out; 636 } 637 } 638 639 switch (sock->state) { 640 default: 641 err = -EINVAL; 642 goto out; 643 case SS_CONNECTED: 644 err = -EISCONN; 645 goto out; 646 case SS_CONNECTING: 647 if (inet_sk(sk)->defer_connect) 648 err = is_sendmsg ? -EINPROGRESS : -EISCONN; 649 else 650 err = -EALREADY; 651 /* Fall out of switch with err, set for this state */ 652 break; 653 case SS_UNCONNECTED: 654 err = -EISCONN; 655 if (sk->sk_state != TCP_CLOSE) 656 goto out; 657 658 if (BPF_CGROUP_PRE_CONNECT_ENABLED(sk)) { 659 err = sk->sk_prot->pre_connect(sk, uaddr, addr_len); 660 if (err) 661 goto out; 662 } 663 664 err = sk->sk_prot->connect(sk, uaddr, addr_len); 665 if (err < 0) 666 goto out; 667 668 sock->state = SS_CONNECTING; 669 670 if (!err && inet_sk(sk)->defer_connect) 671 goto out; 672 673 /* Just entered SS_CONNECTING state; the only 674 * difference is that return value in non-blocking 675 * case is EINPROGRESS, rather than EALREADY. 
676 */ 677 err = -EINPROGRESS; 678 break; 679 } 680 681 timeo = sock_sndtimeo(sk, flags & O_NONBLOCK); 682 683 if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) { 684 int writebias = (sk->sk_protocol == IPPROTO_TCP) && 685 tcp_sk(sk)->fastopen_req && 686 tcp_sk(sk)->fastopen_req->data ? 1 : 0; 687 688 /* Error code is set above */ 689 if (!timeo || !inet_wait_for_connect(sk, timeo, writebias)) 690 goto out; 691 692 err = sock_intr_errno(timeo); 693 if (signal_pending(current)) 694 goto out; 695 } 696 697 /* Connection was closed by RST, timeout, ICMP error 698 * or another process disconnected us. 699 */ 700 if (sk->sk_state == TCP_CLOSE) 701 goto sock_error; 702 703 /* sk->sk_err may be not zero now, if RECVERR was ordered by user 704 * and error was received after socket entered established state. 705 * Hence, it is handled normally after connect() return successfully. 706 */ 707 708 sock->state = SS_CONNECTED; 709 err = 0; 710 out: 711 return err; 712 713 sock_error: 714 err = sock_error(sk) ? : -ECONNABORTED; 715 sock->state = SS_UNCONNECTED; 716 if (sk->sk_prot->disconnect(sk, flags)) 717 sock->state = SS_DISCONNECTING; 718 goto out; 719 } 720 EXPORT_SYMBOL(__inet_stream_connect); 721 722 int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, 723 int addr_len, int flags) 724 { 725 int err; 726 727 lock_sock(sock->sk); 728 err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0); 729 if (!err) > 730 err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sock->sk); 731 release_sock(sock->sk); 732 return err; 733 } 734 EXPORT_SYMBOL(inet_stream_connect); 735 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Thu, Jan 14, 2021 at 8:27 PM Yonghong Song <yhs@fb.com> wrote:
>
>
> On 1/14/21 7:59 PM, Alexei Starovoitov wrote:
> > On Thu, Jan 14, 2021 at 7:51 PM Stanislav Fomichev <sdf@google.com> wrote:
> >>>>
> >>>> lock_sock(sock->sk);
> >>>> err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
> >>>
> >>> Similarly here, attaching fexit to __inet_stream_connect would execute
> >>> your BPF program at exactly the same time (and then you can check for
> >>> err value).
> >>>
> >>> Or the point here is to have a more "stable" BPF program type?
> >> Good suggestion, I can try to play with it, I think it should give me
> >> all the info I need (I only need sock).
> >> But yeah, I'd rather prefer a stable interface against stable
> >> __sk_buff, but maybe fexit will also work.
> >
> > Maybe we can add an extension to fentry/fexit that are cgroup scoped?
> > I think this will solve many such cases.
>
> Currently, Google is pushing an LTO build of the kernel. If this happens,
> it is possible that a global function in one file (say a.c) might be
> inlined into another file (say b.c). So in this particular case, if the
> global function is inlined, the fentry/fexit approach might miss some
> cases. We could mark certain *special purpose* functions as non-inline,
> but I am not sure whether that is scalable.

For this particular case I don't think it matters, right?
I'd like to fexit ip4_datagram_connect, which is an exported symbol;
it's accessed via proto->connect and there is no way it's going to be
inlined. Unless our indirect call macros give clang a hint :-/

I'm in general a bit concerned about using tracing calls for stuff
like that and depending on the non-uapi, but it's probably time to
give it a try and see how CO-RE works :-)
On 1/15/21 8:39 AM, Stanislav Fomichev wrote:
> On Thu, Jan 14, 2021 at 8:27 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> On 1/14/21 7:59 PM, Alexei Starovoitov wrote:
>>> On Thu, Jan 14, 2021 at 7:51 PM Stanislav Fomichev <sdf@google.com> wrote:
>>>>>>
>>>>>> lock_sock(sock->sk);
>>>>>> err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
>>>>>
>>>>> Similarly here, attaching fexit to __inet_stream_connect would execute
>>>>> your BPF program at exactly the same time (and then you can check for
>>>>> err value).
>>>>>
>>>>> Or the point here is to have a more "stable" BPF program type?
>>>> Good suggestion, I can try to play with it, I think it should give me
>>>> all the info I need (I only need sock).
>>>> But yeah, I'd rather prefer a stable interface against stable
>>>> __sk_buff, but maybe fexit will also work.
>>>
>>> Maybe we can add an extension to fentry/fexit that are cgroup scoped?
>>> I think this will solve many such cases.
>>
>> Currently, Google is pushing an LTO build of the kernel. If this happens,
>> it is possible that a global function in one file (say a.c) might be
>> inlined into another file (say b.c). So in this particular case, if the
>> global function is inlined, the fentry/fexit approach might miss some
>> cases. We could mark certain *special purpose* functions as non-inline,
>> but I am not sure whether that is scalable.
> For this particular case I don't think it matters, right?
> I'd like to fexit ip4_datagram_connect, which is an exported symbol;
> it's accessed via proto->connect and there is no way it's going to be
> inlined. Unless our indirect call macros give clang a hint :-/

You are right. It is called through an indirect call, and by default the
compiler won't be able to do inlining. One possibility is profile-guided
optimization, which often profiles indirect calls as well. It may find
that the indirect call has one dominant target (say 80% of calls),
special-case that one, and inline it. Not sure. I guess the kernel build
in general is not that advanced, but just keep in mind that this could
happen in the distant future.

> I'm in general a bit concerned about using tracing calls for stuff
> like that and depending on the non-uapi, but it's probably time to
> give it a try and see how CO-RE works :-)

You can filter based on cgroup id in the bpf program. I guess fexit
should work in your use case.
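A sketch of the cgroup-id filtering mentioned above, layered on the fexit idea from the previous message; the single target_cgroup_id global (set by the loader) is an assumption about how the scoping would be configured:

/* Sketch (untested): emulate cgroup scoping inside an fexit program by
 * comparing the current task's cgroup v2 id against one provided by the
 * loader before the skeleton is loaded.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

const volatile __u64 target_cgroup_id; /* set by user space */

SEC("fexit/ip4_datagram_connect")
int BPF_PROG(udp_connect_scoped, struct sock *sk, struct sockaddr *uaddr,
	     int addr_len, int ret)
{
	/* Skip events from tasks outside the cgroup we care about. */
	if (bpf_get_current_cgroup_id() != target_cgroup_id)
		return 0;
	if (ret)
		return 0;

	/* ... log sk's 4-tuple as in the earlier sketch ... */
	return 0;
}

char _license[] SEC("license") = "GPL";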
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 72e69a0e1e8c..f110935258b9 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -213,12 +213,29 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, __ret; \ }) +#define BPF_CGROUP_RUN_SK_PROG_LOCKED(sk, type) \ +({ \ + int __ret = 0; \ + if (cgroup_bpf_enabled) { \ + lock_sock(sk); \ + __ret = __cgroup_bpf_run_filter_sk(sk, type); \ + release_sock(sk); \ + } \ + __ret; \ +}) + #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) \ BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_CREATE) #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) \ BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_RELEASE) +#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sk) \ + BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET_SOCK_POST_CONNECT) + +#define BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk) \ + BPF_CGROUP_RUN_SK_PROG_LOCKED(sk, BPF_CGROUP_INET_SOCK_POST_CONNECT) + #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) \ BPF_CGROUP_RUN_SK_PROG(sk, BPF_CGROUP_INET4_POST_BIND) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index a1ad32456f89..3235f7bd131f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -241,6 +241,7 @@ enum bpf_attach_type { BPF_XDP_CPUMAP, BPF_SK_LOOKUP, BPF_XDP, + BPF_CGROUP_INET_SOCK_POST_CONNECT, __MAX_BPF_ATTACH_TYPE }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index c3bb03c8371f..7d6fd1e32d22 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1958,6 +1958,7 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type, switch (expected_attach_type) { case BPF_CGROUP_INET_SOCK_CREATE: case BPF_CGROUP_INET_SOCK_RELEASE: + case BPF_CGROUP_INET_SOCK_POST_CONNECT: case BPF_CGROUP_INET4_POST_BIND: case BPF_CGROUP_INET6_POST_BIND: return 0; @@ -2910,6 +2911,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) return BPF_PROG_TYPE_CGROUP_SKB; case BPF_CGROUP_INET_SOCK_CREATE: case BPF_CGROUP_INET_SOCK_RELEASE: + case BPF_CGROUP_INET_SOCK_POST_CONNECT: case BPF_CGROUP_INET4_POST_BIND: case BPF_CGROUP_INET6_POST_BIND: return BPF_PROG_TYPE_CGROUP_SOCK; @@ -3063,6 +3065,7 @@ static int bpf_prog_query(const union bpf_attr *attr, case BPF_CGROUP_INET_EGRESS: case BPF_CGROUP_INET_SOCK_CREATE: case BPF_CGROUP_INET_SOCK_RELEASE: + case BPF_CGROUP_INET_SOCK_POST_CONNECT: case BPF_CGROUP_INET4_BIND: case BPF_CGROUP_INET6_BIND: case BPF_CGROUP_INET4_POST_BIND: diff --git a/net/core/filter.c b/net/core/filter.c index 9ab94e90d660..d955321d3415 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -7683,12 +7683,14 @@ static bool __sock_filter_check_attach_type(int off, switch (attach_type) { case BPF_CGROUP_INET_SOCK_CREATE: case BPF_CGROUP_INET_SOCK_RELEASE: + case BPF_CGROUP_INET_SOCK_POST_CONNECT: goto full_access; default: return false; } case bpf_ctx_range(struct bpf_sock, src_ip4): switch (attach_type) { + case BPF_CGROUP_INET_SOCK_POST_CONNECT: case BPF_CGROUP_INET4_POST_BIND: goto read_only; default: @@ -7696,6 +7698,7 @@ static bool __sock_filter_check_attach_type(int off, } case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]): switch (attach_type) { + case BPF_CGROUP_INET_SOCK_POST_CONNECT: case BPF_CGROUP_INET6_POST_BIND: goto read_only; default: @@ -7703,6 +7706,7 @@ static bool __sock_filter_check_attach_type(int off, } case bpf_ctx_range(struct bpf_sock, src_port): switch (attach_type) { + case BPF_CGROUP_INET_SOCK_POST_CONNECT: case BPF_CGROUP_INET4_POST_BIND: case BPF_CGROUP_INET6_POST_BIND: goto 
read_only; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index b94fa8eb831b..568654cafa48 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -574,7 +574,10 @@ int inet_dgram_connect(struct socket *sock, struct sockaddr *uaddr, if (!inet_sk(sk)->inet_num && inet_autobind(sk)) return -EAGAIN; - return sk->sk_prot->connect(sk, uaddr, addr_len); + err = sk->sk_prot->connect(sk, uaddr, addr_len); + if (!err) + err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT_LOCKED(sk); + return err; } EXPORT_SYMBOL(inet_dgram_connect); @@ -723,6 +726,8 @@ int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr, lock_sock(sock->sk); err = __inet_stream_connect(sock, uaddr, addr_len, flags, 0); + if (!err) + err = BPF_CGROUP_RUN_PROG_INET_SOCK_POST_CONNECT(sock->sk); release_sock(sock->sk); return err; }
We are playing with doing hybrid conntrack where BPF generates
connect/disconnect/etc events and puts them into perfbuf (or, later, the
new ringbuf). We can get most of the functionality out of existing hooks:

- BPF_CGROUP_SOCK_OPS fully covers TCP
- BPF_CGROUP_UDP4_SENDMSG covers unconnected UDP (with sampling, etc)

The only missing bit is connected UDP, where we can get some information
from the existing BPF_CGROUP_INET{4,6}_CONNECT hooks if the caller did an
explicit bind(); otherwise, in the autobind case, we get only the
destination addr/port and no source port, because those hooks trigger
before autobind runs.

We'd really like to avoid the cost of BPF_CGROUP_INET_EGRESS plus
filtering for UDP (which covers both connected and unconnected UDP, but
loses the connect/disconnect pseudo signal).

The proposal is to add a new BPF_CGROUP_INET_SOCK_POST_CONNECT hook which
triggers right before sys_connect() exits in the AF_INET{,6} case. The
context is bpf_sock, which lets BPF examine the socket state. My use case
doesn't really need it to trigger for all inet socks; I considered adding
BPF_CGROUP_UDP_POST_CONNECT instead, but decided that it might be better
to have a generic inet case.

The new hook triggers right before sys_connect() returns and gives BPF
the opportunity to examine the source & destination addresses, as well as
the ability to return EPERM to the user.

This is somewhat analogous to the existing BPF_CGROUP_INET{4,6}_POST_BIND
hooks, with the intention of logging the connection addresses (after
autobind).

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Change-Id: I46d0122f93c58b17bfae5ba5040b0b0343908c19
---
 include/linux/bpf-cgroup.h | 17 +++++++++++++++++
 include/uapi/linux/bpf.h   |  1 +
 kernel/bpf/syscall.c       |  3 +++
 net/core/filter.c          |  4 ++++
 net/ipv4/af_inet.c         |  7 ++++++-
 5 files changed, 31 insertions(+), 1 deletion(-)
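To make the intended usage concrete, a minimal sketch of a program for the proposed attach type follows; the loader would have to set the expected attach type explicitly (libbpf has no section name for the new type), and the ring buffer output simply mirrors the conntrack-style logging described above.

// SPDX-License-Identifier: GPL-2.0
/* Sketch (untested): BPF_PROG_TYPE_CGROUP_SOCK program for the proposed
 * BPF_CGROUP_INET_SOCK_POST_CONNECT attach type. The loader would call
 * bpf_program__set_expected_attach_type() before load and then
 * bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_SOCK_POST_CONNECT, 0).
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_RINGBUF);
	__uint(max_entries, 256 * 1024);
} events SEC(".maps");

struct connect_event {
	__u32 saddr;
	__u32 daddr;
	__u32 sport;	/* host byte order */
	__u32 dport;	/* network byte order */
};

SEC("cgroup/sock")
int log_post_connect(struct bpf_sock *sk)
{
	struct connect_event ev = {
		/* src_ip4/src_port are readable here per the
		 * __sock_filter_check_attach_type() changes above and are
		 * valid because autobind has already run.
		 */
		.saddr = sk->src_ip4,
		.daddr = sk->dst_ip4,
		.sport = sk->src_port,
		.dport = sk->dst_port,
	};

	bpf_ringbuf_output(&events, &ev, sizeof(ev), 0);

	/* Returning 0 here would make connect() fail with -EPERM. */
	return 1;
}

char _license[] SEC("license") = "GPL";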