mbox series

[v2,00/15] pidfs: provide information after task has been reaped

Message ID 20250304-work-pidfs-kill_on_last_close-v2-0-44fdacfaa7b7@kernel.org (mailing list archive)
Headers show
Series pidfs: provide information after task has been reaped | expand

Message

Christian Brauner March 4, 2025, 9:41 a.m. UTC
Hey,

Various tools need access to information about a process/task even after
it has already been reaped. For example, systemd's journal logs
and uses such information as the cgroup id and exit status to deal with
processes that have been sent via SCM_PIDFD or SCM_PEERPIDFD. By the
time the pidfd is received the process might have already been reaped.

This series aims to provide information by extending the PIDFD_GET_INFO
ioctl to retrieve the exit code and cgroup id. There might be other
stuff that we would want in the future.

Pidfd
polling allows waiting on either task exit or for a task to have been
reaped. The contract for PIDFD_INFO_EXIT is simply that EPOLLHUP must
be observed before exit information can be retrieved, i.e., exit
information is only provided once the task has been reaped. 

Note, that if a thread-group leader exits before other threads in the
thread-group then exit information will only be available once the
thread-group is empty. This aligns with wait() as well, where reaping of
a thread-group leader that exited before the thread-group was empty is
delayed until the thread-group is empty.

With PIDFD_INFO_EXIT autoreaping might actually become usable because it
means a parent can ignore SIGCHLD or set SA_NOCLDWAIT and simply use
pidfd polling and PIDFD_INFO_EXIT to get get status information for its
children. The kernel will autocleanup right away instead of delaying.

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Changes in v2:
- Call pidfs_exit() from release_task().
- Don't provide exit information once the task has exited but once the
  task has been reaped. This makes for simpler semantics. Thus, call
  pidfs_exit() from release_task().
- Link to v1: https://lore.kernel.org/r/20250228-work-pidfs-kill_on_last_close-v1-0-5bd7e6bb428e@kernel.org

---
Christian Brauner (15):
      pidfs: switch to copy_struct_to_user()
      pidfd: rely on automatic cleanup in __pidfd_prepare()
      pidfs: move setting flags into pidfs_alloc_file()
      pidfs: add inode allocation
      pidfs: record exit code and cgroupid at exit
      pidfs: allow to retrieve exit information
      selftests/pidfd: fix header inclusion
      pidfs/selftests: ensure correct headers for ioctl handling
      selftests/pidfd: move more defines to common header
      selftests/pidfd: add first PIDFD_INFO_EXIT selftest
      selftests/pidfd: add second PIDFD_INFO_EXIT selftest
      selftests/pidfd: add third PIDFD_INFO_EXIT selftest
      selftests/pidfd: add fourth PIDFD_INFO_EXIT selftest
      selftests/pidfd: add fifth PIDFD_INFO_EXIT selftest
      selftests/pidfd: add sixth PIDFD_INFO_EXIT selftest

 fs/internal.h                                     |   1 +
 fs/libfs.c                                        |   4 +-
 fs/pidfs.c                                        | 182 +++++++++--
 include/linux/pidfs.h                             |   1 +
 include/uapi/linux/pidfd.h                        |   3 +-
 kernel/exit.c                                     |   2 +
 kernel/fork.c                                     |  15 +-
 tools/testing/selftests/pidfd/.gitignore          |   1 +
 tools/testing/selftests/pidfd/Makefile            |   2 +-
 tools/testing/selftests/pidfd/pidfd.h             |  86 +++++
 tools/testing/selftests/pidfd/pidfd_fdinfo_test.c |   1 +
 tools/testing/selftests/pidfd/pidfd_info_test.c   | 372 ++++++++++++++++++++++
 tools/testing/selftests/pidfd/pidfd_open_test.c   |  26 --
 tools/testing/selftests/pidfd/pidfd_setns_test.c  |  45 ---
 14 files changed, 636 insertions(+), 105 deletions(-)
---
base-commit: b1e809e7f64ad47dd232ff072d8ef59c1fe414c5
change-id: 20250227-work-pidfs-kill_on_last_close-a23ddf21db47