Message ID | 20230703175329.3259672-1-i.maximets@ovn.org (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | BPF |
Headers | show |
Series | [bpf-next] xsk: honor SO_BINDTODEVICE on bind | expand |
Ilya Maximets wrote: > Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. > A privileged process might create the socket and pass it to a > non-privileged process for later use. However, that process will be > able to bind the socket to any network interface. Even though it will > not be able to receive any traffic without modification of the BPF map, > the situation is not ideal. > > Sockets already have a mechanism that can be used to restrict what > interface they can be attached to. That is SO_BINDTODEVICE. > > To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW. > > Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer > workflow when non-privileged process is using AF_XDP. > > The intended workflow is following: > > 1. First process creates a bare socket with socket(AF_XDP, ...). > 2. First process loads the XSK program to the interface. > 3. First process adds the socket fd to a BPF map. > 4. First process ties socket fd to a particular interface using > SO_BINDTODEVICE. > 5. First process sends socket fd to a second process. > 6. Second process allocates UMEM. > 7. Second process binds socket to the interface with bind(...). > 8. Second process sends/receives the traffic. > > All the steps above are possible today if the first process is > privileged and the second one has sufficient RLIMIT_MEMLOCK and no > capabilities. However, the second process will be able to bind the > socket to any interface it wants on step 7 and send traffic from it. > With the proposed change, the second process will be able to bind > the socket only to a specific interface chosen by the first process > at step 4. > > Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- LGTM. Acked-by: John Fastabend <john.fastabend@gmail.com>
On Tue, Jul 4, 2023 at 1:53 AM Ilya Maximets <i.maximets@ovn.org> wrote: > > Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. > A privileged process might create the socket and pass it to a > non-privileged process for later use. However, that process will be > able to bind the socket to any network interface. Even though it will > not be able to receive any traffic without modification of the BPF map, > the situation is not ideal. > > Sockets already have a mechanism that can be used to restrict what > interface they can be attached to. That is SO_BINDTODEVICE. > > To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW. > > Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer > workflow when non-privileged process is using AF_XDP. > > The intended workflow is following: > > 1. First process creates a bare socket with socket(AF_XDP, ...). > 2. First process loads the XSK program to the interface. > 3. First process adds the socket fd to a BPF map. > 4. First process ties socket fd to a particular interface using > SO_BINDTODEVICE. > 5. First process sends socket fd to a second process. > 6. Second process allocates UMEM. > 7. Second process binds socket to the interface with bind(...). > 8. Second process sends/receives the traffic. > > All the steps above are possible today if the first process is > privileged and the second one has sufficient RLIMIT_MEMLOCK and no > capabilities. However, the second process will be able to bind the > socket to any interface it wants on step 7 and send traffic from it. > With the proposed change, the second process will be able to bind > the socket only to a specific interface chosen by the first process > at step 4. > > Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Acked-by: Jason Wang <jasowang@redhat.com> Is this a stable material or not? Thanks > --- > > RFC --> PATCH: > * Better explained intended workflow in a commit message. > * Added ACK from Magnus. > > Documentation/networking/af_xdp.rst | 9 +++++++++ > net/xdp/xsk.c | 6 ++++++ > 2 files changed, 15 insertions(+) > > diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst > index 247c6c4127e9..1cc35de336a4 100644 > --- a/Documentation/networking/af_xdp.rst > +++ b/Documentation/networking/af_xdp.rst > @@ -433,6 +433,15 @@ start N bytes into the buffer leaving the first N bytes for the > application to use. The final option is the flags field, but it will > be dealt with in separate sections for each UMEM flag. > > +SO_BINDTODEVICE setsockopt > +-------------------------- > + > +This is a generic SOL_SOCKET option that can be used to tie AF_XDP > +socket to a particular network interface. It is useful when a socket > +is created by a privileged process and passed to a non-privileged one. > +Once the option is set, kernel will refuse attempts to bind that socket > +to a different interface. Updating the value requires CAP_NET_RAW. > + > XDP_STATISTICS getsockopt > ------------------------- > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index 5a8c0dd250af..386ff641db0f 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -886,6 +886,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) > struct sock *sk = sock->sk; > struct xdp_sock *xs = xdp_sk(sk); > struct net_device *dev; > + int bound_dev_if; > u32 flags, qid; > int err = 0; > > @@ -899,6 +900,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) > XDP_USE_NEED_WAKEUP)) > return -EINVAL; > > + bound_dev_if = READ_ONCE(sk->sk_bound_dev_if); > + > + if (bound_dev_if && bound_dev_if != sxdp->sxdp_ifindex) > + return -EINVAL; > + > rtnl_lock(); > mutex_lock(&xs->mutex); > if (xs->state != XSK_READY) { > -- > 2.40.1 >
On 7/4/23 4:31 AM, Jason Wang wrote: > On Tue, Jul 4, 2023 at 1:53 AM Ilya Maximets <i.maximets@ovn.org> wrote: >> >> Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. >> A privileged process might create the socket and pass it to a >> non-privileged process for later use. However, that process will be >> able to bind the socket to any network interface. Even though it will >> not be able to receive any traffic without modification of the BPF map, >> the situation is not ideal. >> >> Sockets already have a mechanism that can be used to restrict what >> interface they can be attached to. That is SO_BINDTODEVICE. >> >> To change the SO_BINDTODEVICE binding the process will need CAP_NET_RAW. >> >> Make xsk_bind() honor the SO_BINDTODEVICE in order to allow safer >> workflow when non-privileged process is using AF_XDP. >> >> The intended workflow is following: >> >> 1. First process creates a bare socket with socket(AF_XDP, ...). >> 2. First process loads the XSK program to the interface. >> 3. First process adds the socket fd to a BPF map. >> 4. First process ties socket fd to a particular interface using >> SO_BINDTODEVICE. >> 5. First process sends socket fd to a second process. >> 6. Second process allocates UMEM. >> 7. Second process binds socket to the interface with bind(...). >> 8. Second process sends/receives the traffic. >> >> All the steps above are possible today if the first process is >> privileged and the second one has sufficient RLIMIT_MEMLOCK and no >> capabilities. However, the second process will be able to bind the >> socket to any interface it wants on step 7 and send traffic from it. >> With the proposed change, the second process will be able to bind >> the socket only to a specific interface chosen by the first process >> at step 4. >> >> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > > Acked-by: Jason Wang <jasowang@redhat.com> > > Is this a stable material or not? To me this is a bug rather than 'feature', so I applied it to bpf tree and also added Fixes tag. Thanks everyone!
diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst index 247c6c4127e9..1cc35de336a4 100644 --- a/Documentation/networking/af_xdp.rst +++ b/Documentation/networking/af_xdp.rst @@ -433,6 +433,15 @@ start N bytes into the buffer leaving the first N bytes for the application to use. The final option is the flags field, but it will be dealt with in separate sections for each UMEM flag. +SO_BINDTODEVICE setsockopt +-------------------------- + +This is a generic SOL_SOCKET option that can be used to tie AF_XDP +socket to a particular network interface. It is useful when a socket +is created by a privileged process and passed to a non-privileged one. +Once the option is set, kernel will refuse attempts to bind that socket +to a different interface. Updating the value requires CAP_NET_RAW. + XDP_STATISTICS getsockopt ------------------------- diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 5a8c0dd250af..386ff641db0f 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -886,6 +886,7 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) struct sock *sk = sock->sk; struct xdp_sock *xs = xdp_sk(sk); struct net_device *dev; + int bound_dev_if; u32 flags, qid; int err = 0; @@ -899,6 +900,11 @@ static int xsk_bind(struct socket *sock, struct sockaddr *addr, int addr_len) XDP_USE_NEED_WAKEUP)) return -EINVAL; + bound_dev_if = READ_ONCE(sk->sk_bound_dev_if); + + if (bound_dev_if && bound_dev_if != sxdp->sxdp_ifindex) + return -EINVAL; + rtnl_lock(); mutex_lock(&xs->mutex); if (xs->state != XSK_READY) {