Message ID | 20250309132821.103046-1-aleksandr.mikhalitsyn@canonical.com (mailing list archive)
---|---
Series | Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retrieve socket's peer cgroup id
On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn wrote:
> 1. Add the socket cgroup id and the socket's peer cgroup id to the socket's fdinfo
> 2. Add SO_PEERCGROUPID, which allows retrieving the socket's peer cgroup id
> 3. Add an SO_PEERCGROUPID kselftest
>
> Generally speaking, this API allows race-free resolution of a socket's peer
> cgroup id. Currently the racy SCM_CREDENTIALS/SCM_PIDFD -> pid ->
> /proc/<pid>/cgroup sequence is used for that.
>
> As we don't add any new state to the socket itself, there are no potential
> locking issues or performance problems. We use the already existing
> sk->sk_cgrp_data.
>
> We already have analogous interfaces to retrieve this information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
>
> Having a getsockopt() interface makes sense for many applications, because
> using eBPF is not always an option, while inet_diag has obvious complexity
> and performance drawbacks if we only want this specific info for one
> specific socket.
>
> The idea comes from the UAPI kernel group:
> https://uapi-group.org/kernel-features/
>
> Huge thanks to Christian Brauner, Lennart Poettering and Luca Boccassi for
> proposing and exchanging ideas about this.
>
> Git tree:
> https://github.com/mihalicyn/linux/tree/so_peercgroupid

From the cgroup POV:

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.
On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn wrote:
> 1. Add the socket cgroup id and the socket's peer cgroup id to the socket's fdinfo
> 2. Add SO_PEERCGROUPID, which allows retrieving the socket's peer cgroup id
> 3. Add an SO_PEERCGROUPID kselftest
>
> Generally speaking, this API allows race-free resolution of a socket's peer
> cgroup id. Currently the racy SCM_CREDENTIALS/SCM_PIDFD -> pid ->
> /proc/<pid>/cgroup sequence is used for that.
>
> As we don't add any new state to the socket itself, there are no potential
> locking issues or performance problems. We use the already existing
> sk->sk_cgrp_data.
>
> We already have analogous interfaces to retrieve this information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
>
> Having a getsockopt() interface makes sense for many applications, because
> using eBPF is not always an option, while inet_diag has obvious complexity
> and performance drawbacks if we only want this specific info for one
> specific socket.
>
> The idea comes from the UAPI kernel group:
> https://uapi-group.org/kernel-features/
>
> Huge thanks to Christian Brauner, Lennart Poettering and Luca Boccassi for
> proposing and exchanging ideas about this.

Seems fine to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>
On Mon, Mar 10, 2025 at 09:52:31AM +0100, Christian Brauner wrote:
> On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn wrote:
> > 1. Add the socket cgroup id and the socket's peer cgroup id to the socket's fdinfo
> > 2. Add SO_PEERCGROUPID, which allows retrieving the socket's peer cgroup id
> > 3. Add an SO_PEERCGROUPID kselftest
> >
> > Generally speaking, this API allows race-free resolution of a socket's
> > peer cgroup id. Currently the racy SCM_CREDENTIALS/SCM_PIDFD -> pid ->
> > /proc/<pid>/cgroup sequence is used for that.
> >
> > As we don't add any new state to the socket itself, there are no
> > potential locking issues or performance problems. We use the already
> > existing sk->sk_cgrp_data.
> >
> > We already have analogous interfaces to retrieve this information:
> > - inet_diag: INET_DIAG_CGROUP_ID
> > - eBPF: bpf_sk_cgroup_id
> >
> > Having a getsockopt() interface makes sense for many applications,
> > because using eBPF is not always an option, while inet_diag has obvious
> > complexity and performance drawbacks if we only want this specific info
> > for one specific socket.
> >
> > The idea comes from the UAPI kernel group:
> > https://uapi-group.org/kernel-features/
> >
> > Huge thanks to Christian Brauner, Lennart Poettering and Luca Boccassi
> > for proposing and exchanging ideas about this.
>
> Seems fine to me,
> Reviewed-by: Christian Brauner <brauner@kernel.org>

One wider conceptual comment. Starting with v6.15 it is possible to
retrieve exit information from pidfds even after the task has been reaped.
So if someone opens a pidfd via pidfd_open() and that task gets reaped by
the parent, it is possible to call PIDFD_INFO_EXIT and retrieve the exit
status and the cgroup id of the task that was reaped. That works even after
all task linkage has been removed from struct pid.

The system call API doesn't allow the creation of pidfds for reaped
processes. It wouldn't be possible, as the pid number will have already
been released.

Both SO_PEERPIDFD and SO_PASSPIDFD also don't allow the creation of pidfds
for already reaped peers or senders. But that doesn't have to be the case,
since we always have the struct pid available. So it's entirely possible to
hand out a pidfd to a reaped process if it's guaranteed that exit
information is available. If it's not, then that would be a bug.

The trick is that when a struct pid is stashed it needs to also allocate a
pidfd inode. That could simply be done by a helper get_pidfs_pid() which
takes a reference to the struct pid and ensures that space for recording
exit information is available.

With that done, SO_PEERCGROUPID isn't needed per se, as it will be possible
to get the cgroup id and exit status from the pidfd. From a cursory look,
that should be possible to do without too much work. I'm just pointing this
out as an alternative.

There's one restriction that this would be subject to that SO_PEERCGROUPID
isn't: SO_PEERCGROUPID is exposed to any process, whereas the
PIDFD_GET_INFO ioctls (which include the PIDFD_INFO_EXIT option) are only
available to processes within the receiver's pid namespace hierarchy. But
in any case, enabling pidfds for such reaped processes might still be
useful, since it would mean receivers could get exit information for pidfds
within their pid namespace hierarchy.
From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Date: Sun, 9 Mar 2025 14:28:11 +0100
> 1. Add the socket cgroup id and the socket's peer cgroup id to the socket's fdinfo

Why do you want to add yet another racy interface?

> 2. Add SO_PEERCGROUPID, which allows retrieving the socket's peer cgroup id
> 3. Add an SO_PEERCGROUPID kselftest
>
> Generally speaking, this API allows race-free resolution of a socket's peer
> cgroup id. Currently the racy SCM_CREDENTIALS/SCM_PIDFD -> pid ->
> /proc/<pid>/cgroup sequence is used for that.

A few more words about the race (pid recycling?) would be appreciated.

I somewhat assumed a pid is not recycled until all of its pidfds are
close()d, but it sounds like that's not the case?

> As we don't add any new state to the socket itself, there are no potential
> locking issues or performance problems. We use the already existing
> sk->sk_cgrp_data.
>
> We already have analogous interfaces to retrieve this information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
>
> Having a getsockopt() interface makes sense for many applications, because
> using eBPF is not always an option, while inet_diag has obvious complexity
> and performance drawbacks if we only want this specific info for one
> specific socket.

If it's limited to the connect()ed peer, I'd add UNIX_DIAG_CGROUP_ID and
UNIX_DIAG_PEER_CGROUP_ID instead. Then ss can also use that easily.
On Tue, Mar 11, 2025 at 12:33:48AM -0700, Kuniyuki Iwashima wrote:
> From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> Date: Sun, 9 Mar 2025 14:28:11 +0100
> > 1. Add the socket cgroup id and the socket's peer cgroup id to the socket's fdinfo
>
> Why do you want to add yet another racy interface?
>
> > 2. Add SO_PEERCGROUPID, which allows retrieving the socket's peer cgroup id
> > 3. Add an SO_PEERCGROUPID kselftest
> >
> > Generally speaking, this API allows race-free resolution of a socket's
> > peer cgroup id. Currently the racy SCM_CREDENTIALS/SCM_PIDFD -> pid ->
> > /proc/<pid>/cgroup sequence is used for that.
>
> A few more words about the race (pid recycling?) would be appreciated.
>
> I somewhat assumed a pid is not recycled until all of its pidfds are
> close()d, but it sounds like that's not the case?

No, that would allow starving the kernel of pid numbers. pidfds don't pin
struct task_struct, for a multitude of reasons, similar to how cred->peer
or scm->pid don't stash a task_struct but a struct pid.

> > As we don't add any new state to the socket itself, there are no
> > potential locking issues or performance problems. We use the already
> > existing sk->sk_cgrp_data.
> >
> > We already have analogous interfaces to retrieve this information:
> > - inet_diag: INET_DIAG_CGROUP_ID
> > - eBPF: bpf_sk_cgroup_id
> >
> > Having a getsockopt() interface makes sense for many applications,
> > because using eBPF is not always an option, while inet_diag has obvious
> > complexity and performance drawbacks if we only want this specific info
> > for one specific socket.
>
> If it's limited to the connect()ed peer, I'd add UNIX_DIAG_CGROUP_ID and
> UNIX_DIAG_PEER_CGROUP_ID instead. Then ss can also use that easily.
Hello.

On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn
<aleksandr.mikhalitsyn@canonical.com> wrote:
> As we don't add any new state to the socket itself, there are no potential
> locking issues or performance problems. We use the already existing
> sk->sk_cgrp_data.

This is the cgroup the socket was created in. If such a socket is fd-passed
to another cgroup, SO_PEERCGROUPID may not return what's expected.

> We already have analogous interfaces to retrieve this information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
>
> Having a getsockopt() interface makes sense for many applications, because
> using eBPF is not always an option, while inet_diag has obvious complexity
> and performance drawbacks if we only want this specific info for one
> specific socket.

I'm not that familiar with INET_DIAG_CGROUP_ID, but that one sounds like a
fit for the purpose of obtaining the socket creator's cgroup, whereas what
is desired here is a slightly different thing -- the cgroup of the actual
sender through the socket.

> The idea comes from the UAPI kernel group:
> https://uapi-group.org/kernel-features/

In theory a short-lived (sending) program may reside in a short-lived
cgroup, and the consumer of SO_PEERCGROUPID (even if it is the real sender)
would only have that number to work with. It doesn't guarantee the
existence of the original cgroup or a stable translation to a cgroup path.

I think having something like this could be useful, but consumers must
still be aware of the limitations (and a possible switch from path-oriented
to id-oriented work with cgroups).

Thanks,
Michal