[bpf-next,v11,0/9] Add cgroup sockaddr hooks for unix sockets

Message ID 20231011185113.140426-1-daan.j.demeyer@gmail.com (mailing list archive)

Message

Daan De Meyer Oct. 11, 2023, 6:51 p.m. UTC
Changes since v10:

* Removed extra check from bpf_sock_addr_set_sun_path() again in favor of
  calling unix_validate_addr() everywhere in af_unix.c before calling the hooks.
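
For readers unfamiliar with the check mentioned above, the following userspace
sketch approximates what unix_validate_addr() is assumed to enforce: the
length must extend past sun_family (so an empty or unnamed address is
rejected), must fit in struct sockaddr_un, and the family must be AF_UNIX.
validate_unix_addr_sketch() and example_validate() are hypothetical names used
only for illustration; this is not the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Userspace approximation of the checks assumed for unix_validate_addr().
 * Hypothetical helper for illustration, not kernel code.
 */
static int validate_unix_addr_sketch(const struct sockaddr_un *addr,
				     size_t addr_len)
{
	/* Reject lengths that carry no sun_path bytes or overflow the struct. */
	if (addr_len <= offsetof(struct sockaddr_un, sun_path) ||
	    addr_len > sizeof(*addr))
		return -EINVAL;

	/* Only AF_UNIX addresses are accepted. */
	if (addr->sun_family != AF_UNIX)
		return -EINVAL;

	return 0;
}

/* Small usage example: a pathname address passes, an empty one does not. */
static void example_validate(void)
{
	struct sockaddr_un addr;
	size_t base = offsetof(struct sockaddr_un, sun_path);

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, "/dev/log");

	assert(validate_unix_addr_sketch(&addr, base + strlen("/dev/log") + 1) == 0);
	assert(validate_unix_addr_sketch(&addr, base) == -EINVAL);
}
```

Running this kind of check before the hooks are invoked means a bpf program
never sees an address with zero sun_path bytes, which is why the extra check
in the kfunc could be dropped.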

Changes since v9:

* Renamed bpf_sock_addr_set_unix_addr() to bpf_sock_addr_set_sun_path() and
  renamed arguments to match the new name.
* Added an extra check to bpf_sock_addr_set_sun_path() to disallow changing the
  address of an unnamed unix socket.
* Removed unnecessary NULL check on uaddrlen in
  __cgroup_bpf_run_filter_sock_addr().

Changes since v8:

* Added missing test programs to last patch

Changes since v7:

* Fixed formatting nit in comment
* Renamed from cgroup/connectun to cgroup/connect_unix (and similar for all
  other hooks)

Changes since v6:

* Actually removed bpf_bind() helper for AF_UNIX hooks.
* Fixed merge conflict
* Updated comment to mention uaddrlen is read-only for AF_INET[6]
* Removed unnecessary forward declaration of struct sock_addr_test
* Removed unused BPF_CGROUP_RUN_PROG_UNIX_CONNECT()
* Fixed formatting nit reported by checkpatch
* Added more information to commit message about recvmsg() on connected socket

Changes since v5:

* Fixed kernel version in bpftool documentation (6.3 => 6.7).
* Added connection mode socket recvmsg() test.
* Removed bpf_bind() helper for AF_UNIX hooks.
* Added missing getpeernameun and getsocknameun BPF test programs.
* Added note for bind() test being unused currently.

Changes since v4:

* Dropped support for intercepting bind(). When bind() is used with a unix
  socket and a pathname sockaddr, it creates an inode in the filesystem that
  needs to be cleaned up. If the address is rewritten, users might try to clean
  up the wrong file and leak the actual socket file in the filesystem.
* Changed bpf_sock_addr_set_unix_addr() to use BTF_KFUNC_HOOK_CGROUP_SKB instead
  of BTF_KFUNC_HOOK_COMMON.
* Removed unix socket related changes from BPF_CGROUP_PRE_CONNECT_ENABLED() as
  unix sockets do not support pre-connect.
* Added tests for getpeernameun and getsocknameun hooks.
* We now disallow an empty sockaddr in bpf_sock_addr_set_unix_addr() similar to
  unix_validate_addr().
* Removed unnecessary cgroup_bpf_enabled() checks
* Removed unnecessary error checks

Changes since v3:

* Renamed bpf_sock_addr_set_addr() to bpf_sock_addr_set_unix_addr() and
  made it only operate on AF_UNIX sockaddrs. This is because for the other
  families, users usually want to configure more than just the address so
  a generic interface will not fit the bill here. e.g. for AF_INET and AF_INET6,
  users would generally also want to be able to configure the port which the
  current interface doesn't support. So we expose an AF_UNIX specific function
  instead.
* Made the tests in the new sock addr tests more generic (similar to test_sock_addr.c),
  this should make it easier to migrate the other sock addr tests in the future.
* Removed the new kfunc hook and attached to BTF_KFUNC_HOOK_COMMON instead
* Set uaddrlen to 0 when the family is AF_UNSPEC
* Pass in the addrlen to the hook from IPv6 code
* Fixed mount directory mkdir() to ignore EEXIST

Changes since v2:

* Configuring the sock addr is now done via a new kfunc bpf_sock_addr_set()
* The addrlen is exposed as u32 in bpf_sock_addr_kern
* Selftests are updated to use the new kfunc
* Selftests are now added as a new sock_addr test in prog_tests/
* Added BTF_KFUNC_HOOK_SOCK_ADDR for BPF_PROG_TYPE_CGROUP_SOCK_ADDR
* __cgroup_bpf_run_filter_sock_addr() now returns the modified addrlen

Changes since v1:

* Split into multiple patches instead of one single patch
* Added unix support for all socket address hooks instead of only connect()
* Switched approach to expose the socket address length to the bpf hook
  instead of recalculating the socket address length in kernelspace to
  properly support abstract unix socket addresses
* Modified socket address hook tests to calculate the socket address length
  once and pass it around everywhere instead of recalculating the actual unix
  socket address length on demand.
* Added some missing section name tests for getpeername()/getsockname()
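
To illustrate why the address length has to be passed along rather than
recalculated, here is a small userspace sketch. Abstract unix socket addresses
start with a NUL byte in sun_path, so treating sun_path as a C string yields
the wrong length. The helper names are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SUN_PATH_OFFSET offsetof(struct sockaddr_un, sun_path)

/* Fill addr with a pathname address; the length counts the trailing NUL. */
static socklen_t unix_path_addr(struct sockaddr_un *addr, const char *path)
{
	size_t len = strlen(path);

	if (len >= sizeof(addr->sun_path))
		return 0;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path, path, len + 1);
	return SUN_PATH_OFFSET + len + 1;
}

/* Fill addr with an abstract address: a leading NUL, then name_len bytes. */
static socklen_t unix_abstract_addr(struct sockaddr_un *addr,
				    const void *name, size_t name_len)
{
	if (name_len + 1 > sizeof(addr->sun_path))
		return 0;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	memcpy(addr->sun_path + 1, name, name_len);
	return SUN_PATH_OFFSET + 1 + name_len;
}

/* Naive recalculation that assumes sun_path is a NUL-terminated string. */
static socklen_t naive_addr_len(const struct sockaddr_un *addr)
{
	return SUN_PATH_OFFSET + strlen(addr->sun_path) + 1;
}

/* The naive length matches for pathnames but not for abstract addresses. */
static void compare_lengths(void)
{
	struct sockaddr_un addr;
	socklen_t len;

	len = unix_path_addr(&addr, "/dev/log");
	assert(naive_addr_len(&addr) == len);

	len = unix_abstract_addr(&addr, "journal", 7);
	assert(naive_addr_len(&addr) != len);
}
```

For the abstract address the naive helper stops at the leading NUL byte and
reports a length that covers only one byte of sun_path, dropping the name
entirely.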

This patch series extends the cgroup sockaddr hooks to include support for unix
sockets. To add support for unix sockets, struct bpf_sock_addr_kern is extended
to expose the socket address length to the bpf program. Along with that, a new
kfunc bpf_sock_addr_set_sun_path() is added to safely allow modifying an
AF_UNIX sockaddr from bpf programs.

I intend to use these new hooks in systemd to reimplement the LogNamespace=
feature, which allows running multiple instances of systemd-journald to
process the logs of different services. systemd-journald also processes
syslog messages. Currently, systemd mounts the journal namespace's syslog
socket over /dev/log so that syslog messages from all services in a log
namespace reach the correct systemd-journald instance, which means that all
services in the same log namespace have to live in the same private mount
namespace. We want to relax this requirement so that processes running in
disjoint mount namespaces can still run in the same log namespace. To achieve
this, we can use these new hooks to rewrite the destination address of any
connect(), sendto(), ... syscall that targets /dev/log to the address of the
journal namespace's syslog socket. This does the redirection transparently,
without requiring a mount namespace or mounting over /dev/log.
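
The rewrite described above can be modelled in plain userspace C. This is not
a bpf program and does not use the new kfunc; it is only a sketch of the
decision a hook would make, and it shows why the modified address length has
to be propagated back together with the new path. redirect_syslog_addr() and
the target path used in the example are hypothetical, and the model assumes
the caller's length includes the terminating NUL byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

#define SYSLOG_PATH "/dev/log"

/*
 * Userspace model of the address rewrite. Returns 1 if the address was
 * rewritten, 0 if it was left untouched, -1 if target does not fit.
 * Assumption: *addrlen counts the terminating NUL of the pathname.
 */
static int redirect_syslog_addr(struct sockaddr_un *addr, socklen_t *addrlen,
				const char *target)
{
	size_t base = offsetof(struct sockaddr_un, sun_path);
	size_t target_len = strlen(target);

	/* Only addresses that are exactly "/dev/log" are redirected. */
	if (addr->sun_family != AF_UNIX ||
	    *addrlen != base + sizeof(SYSLOG_PATH) ||
	    memcmp(addr->sun_path, SYSLOG_PATH, sizeof(SYSLOG_PATH)) != 0)
		return 0;

	if (target_len >= sizeof(addr->sun_path))
		return -1;

	/* Replace the path and report the new length to the caller. */
	memset(addr->sun_path, 0, sizeof(addr->sun_path));
	memcpy(addr->sun_path, target, target_len);
	*addrlen = (socklen_t)(base + target_len + 1);
	return 1;
}

/* Usage example with a made-up per-namespace socket path. */
static void example_redirect(void)
{
	const char *target = "/run/example/journal.ns/dev-log"; /* hypothetical */
	struct sockaddr_un addr;
	socklen_t len;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, SYSLOG_PATH);
	len = offsetof(struct sockaddr_un, sun_path) + sizeof(SYSLOG_PATH);

	assert(redirect_syslog_addr(&addr, &len, target) == 1);
	assert(strcmp(addr.sun_path, target) == 0);
}
```

Because the replacement path is longer than /dev/log, a caller that kept
using the old length would truncate the new address, which is the situation
the uaddrlen propagation in this series is meant to avoid.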

Aside from the above use case, these hooks can more generally be used to
transparently redirect unix sockets to different addresses as required by
services.

Daan De Meyer (9):
  selftests/bpf: Add missing section name tests for
    getpeername/getsockname
  bpf: Propagate modified uaddrlen from cgroup sockaddr programs
  bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr
    from bpf
  bpf: Implement cgroup sockaddr hooks for unix sockets
  libbpf: Add support for cgroup unix socket address hooks
  bpftool: Add support for cgroup unix socket address hooks
  documentation/bpf: Document cgroup unix socket address hooks
  selftests/bpf: Make sure mount directory exists
  selftests/bpf: Add tests for cgroup unix socket address hooks

 Documentation/bpf/libbpf/program_types.rst    |  10 +
 include/linux/bpf-cgroup-defs.h               |   5 +
 include/linux/bpf-cgroup.h                    |  90 +--
 include/linux/filter.h                        |   1 +
 include/uapi/linux/bpf.h                      |  13 +-
 kernel/bpf/btf.c                              |   1 +
 kernel/bpf/cgroup.c                           |  29 +-
 kernel/bpf/syscall.c                          |  15 +
 kernel/bpf/verifier.c                         |   5 +-
 net/core/filter.c                             |  50 +-
 net/ipv4/af_inet.c                            |   7 +-
 net/ipv4/ping.c                               |   2 +-
 net/ipv4/tcp_ipv4.c                           |   2 +-
 net/ipv4/udp.c                                |   9 +-
 net/ipv6/af_inet6.c                           |   9 +-
 net/ipv6/ping.c                               |   2 +-
 net/ipv6/tcp_ipv6.c                           |   2 +-
 net/ipv6/udp.c                                |   6 +-
 net/unix/af_unix.c                            |  35 +-
 .../bpftool/Documentation/bpftool-cgroup.rst  |  16 +-
 .../bpftool/Documentation/bpftool-prog.rst    |   8 +-
 tools/bpf/bpftool/bash-completion/bpftool     |  14 +-
 tools/bpf/bpftool/cgroup.c                    |  16 +-
 tools/bpf/bpftool/prog.c                      |   7 +-
 tools/include/uapi/linux/bpf.h                |  13 +-
 tools/lib/bpf/libbpf.c                        |  10 +
 tools/testing/selftests/bpf/bpf_kfuncs.h      |  14 +
 tools/testing/selftests/bpf/cgroup_helpers.c  |   5 +
 tools/testing/selftests/bpf/network_helpers.c |  34 +
 tools/testing/selftests/bpf/network_helpers.h |   1 +
 .../selftests/bpf/prog_tests/section_names.c  |  45 ++
 .../selftests/bpf/prog_tests/sock_addr.c      | 612 ++++++++++++++++++
 .../selftests/bpf/progs/connect_unix_prog.c   |  40 ++
 .../bpf/progs/getpeername_unix_prog.c         |  39 ++
 .../bpf/progs/getsockname_unix_prog.c         |  39 ++
 .../selftests/bpf/progs/recvmsg_unix_prog.c   |  39 ++
 .../selftests/bpf/progs/sendmsg_unix_prog.c   |  40 ++
 37 files changed, 1192 insertions(+), 93 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sock_addr.c
 create mode 100644 tools/testing/selftests/bpf/progs/connect_unix_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/getpeername_unix_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/getsockname_unix_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/recvmsg_unix_prog.c
 create mode 100644 tools/testing/selftests/bpf/progs/sendmsg_unix_prog.c

--
2.41.0

Comments

patchwork-bot+netdevbpf@kernel.org Oct. 12, 2023, 12:40 a.m. UTC | #1
Hello:

This series was applied to bpf/bpf-next.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:

On Wed, 11 Oct 2023 20:51:02 +0200 you wrote:
> Changes since v10:
> 
> * Removed extra check from bpf_sock_addr_set_sun_path() again in favor of
>   calling unix_validate_addr() everywhere in af_unix.c before calling the hooks.
> 
> Changes since v9:
> 
> [...]

Here is the summary with links:
  - [bpf-next,v11,1/9] selftests/bpf: Add missing section name tests for getpeername/getsockname
    https://git.kernel.org/bpf/bpf-next/c/feba7b634ef0
  - [bpf-next,v11,2/9] bpf: Propagate modified uaddrlen from cgroup sockaddr programs
    https://git.kernel.org/bpf/bpf-next/c/fefba7d1ae19
  - [bpf-next,v11,3/9] bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf
    https://git.kernel.org/bpf/bpf-next/c/53e380d21441
  - [bpf-next,v11,4/9] bpf: Implement cgroup sockaddr hooks for unix sockets
    https://git.kernel.org/bpf/bpf-next/c/859051dd165e
  - [bpf-next,v11,5/9] libbpf: Add support for cgroup unix socket address hooks
    https://git.kernel.org/bpf/bpf-next/c/bf90438c78df
  - [bpf-next,v11,6/9] bpftool: Add support for cgroup unix socket address hooks
    https://git.kernel.org/bpf/bpf-next/c/8b3cba987e6d
  - [bpf-next,v11,7/9] documentation/bpf: Document cgroup unix socket address hooks
    https://git.kernel.org/bpf/bpf-next/c/3243fef6a4c0
  - [bpf-next,v11,8/9] selftests/bpf: Make sure mount directory exists
    https://git.kernel.org/bpf/bpf-next/c/af2752ed450e
  - [bpf-next,v11,9/9] selftests/bpf: Add tests for cgroup unix socket address hooks
    https://git.kernel.org/bpf/bpf-next/c/82ab6b505e81

You are awesome, thank you!

Martin KaFai Lau Oct. 12, 2023, 12:41 a.m. UTC | #2
On 10/11/23 11:51 AM, Daan De Meyer wrote:
> Changes since v10:
> 
> * Removed extra check from bpf_sock_addr_set_sun_path() again in favor of
>    calling unix_validate_addr() everywhere in af_unix.c before calling the hooks.
> 
> Changes since v9:
> 
> * Renamed bpf_sock_addr_set_unix_addr() to bpf_sock_addr_set_sun_path() and
>    renamed arguments to match the new name.
> * Added an extra check to bpf_sock_addr_set_sun_path() to disallow changing the
>    address of an unnamed unix socket.
> * Removed unnecessary NULL check on uaddrlen in
>    __cgroup_bpf_run_filter_sock_addr().
> 

[ ... ]

> This patch series extends the cgroup sockaddr hooks to include support for unix
> sockets. To add support for unix sockets, struct bpf_sock_addr_kern is extended
> to expose the socket address length to the bpf program. Along with that, a new
> kfunc bpf_sock_addr_set_sun_path() is added to safely allow modifying an
> AF_UNIX sockaddr from bpf programs.
> 
> I intend to use these new hooks in systemd to reimplement the LogNamespace=
> feature, which allows running multiple instances of systemd-journald to
> process the logs of different services. systemd-journald also processes
> syslog messages, so currently, using log namespaces means all services running
> in the same log namespace have to live in the same private mount namespace
> so that systemd can mount the journal namespace's associated syslog socket
> over /dev/log to properly direct syslog messages from all services running
> in that log namespace to the correct systemd-journald instance. We want to
> relax this requirement so that processes running in disjoint mount namespaces
> can still run in the same log namespace. To achieve this, we can use these
> new hooks to rewrite the socket address of any connect(), sendto(), ...
> syscalls to /dev/log to the socket address of the journal namespace's syslog
> socket instead, which will transparently do the redirection without requiring
> use of a mount namespace and mounting over /dev/log.
> 
> Aside from the above usecase, these hooks can more generally be used to
> transparently redirect unix sockets to different addresses as required by
> services.

I have changed to use the "uaddr" test in patch 2 per the discussion in v10.
Patch 4 in v11 was changed based on the discussion in v10 (call bpf after 
unix_validate_addr), so I carried Kuniyuki's reviewed-by tag from v9.

Applied. Thanks.