[v7,0/6] introduce PIDFD_SELF* sentinels

Message ID cover.1738268370.git.lorenzo.stoakes@oracle.com (mailing list archive)

Message

Lorenzo Stoakes Jan. 30, 2025, 8:40 p.m. UTC
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:

	int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);

	...

	close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process or thread.

This series introduces sentinels which can be passed as the pidfd in this
instance, rather than having to establish a dummy fd for this purpose.

It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.

There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in the kernel, and a userland process is a thread group (more
specifically the thread group leader) from the kernel perspective. We
therefore alias things thusly:

* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP aliased by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.

In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
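
As a rough userland model of the aliasing described above (not the kernel
implementation), the mapping from sentinel to pid type can be sketched as
follows. The numeric sentinel values, the MODEL_* enum and the
resolve_self_sentinel() helper are assumptions made for illustration only;
the authoritative definitions are in include/uapi/linux/pidfd.h.

```c
/* Assumed values for illustration; consult the UAPI header for the real ones. */
#ifndef PIDFD_SELF_THREAD
#define PIDFD_SELF_THREAD	-10000 /* current thread */
#endif
#ifndef PIDFD_SELF_THREAD_GROUP
#define PIDFD_SELF_THREAD_GROUP	-10001 /* current thread group leader */
#endif

/* The user-facing names are plain aliases of the kernel-facing ones. */
#ifndef PIDFD_SELF
#define PIDFD_SELF		PIDFD_SELF_THREAD
#endif
#ifndef PIDFD_SELF_PROCESS
#define PIDFD_SELF_PROCESS	PIDFD_SELF_THREAD_GROUP
#endif

/* Hypothetical userland stand-ins for PIDTYPE_PID and PIDTYPE_TGID. */
enum model_pid_type {
	MODEL_PIDTYPE_NONE = -1,	/* not a sentinel: treat as a real fd */
	MODEL_PIDTYPE_PID,		/* individual thread */
	MODEL_PIDTYPE_TGID,		/* thread group, i.e. userland process */
};

/* Hypothetical helper: classify a pidfd argument by sentinel. */
static enum model_pid_type resolve_self_sentinel(int pidfd)
{
	switch (pidfd) {
	case PIDFD_SELF_THREAD:
		return MODEL_PIDTYPE_PID;
	case PIDFD_SELF_THREAD_GROUP:
		return MODEL_PIDTYPE_TGID;
	default:
		return MODEL_PIDTYPE_NONE;
	}
}
```

Anything that is not one of the two sentinels falls through to the ordinary
"look up this fd" path, which is why the sentinels need to sit outside the
range of valid fds and error values.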

This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.

For now we only adjust pidfd_get_task() and the pidfd_send_signal() system
call with specific handling for this, implementing this functionality for
process_madvise(), process_mrelease() (albeit, using it here wouldn't
really make sense) and pidfd_send_signal().

We defer making further changes, as this would require a significant rework
of the pidfd mechanism.

The motivating case here is to support PIDFD_SELF in process_madvise(), so
this suffices for immediate uses. Moving forward, this can be further
expanded to other uses.

v7:
* Reworked implementation according to Christian's requirements. We now
  only support process_madvise() and pidfd_send_signal() system calls with
  PIDFD_SELF as specified.
* Updated tests to account for broken pidfd_open_test.c implementation.
* Fixed missing includes in pidfd self tests.
* Removed tests relating to functionality no longer supported.
* Update guard pages test to use PIDFD_SELF.

v6:
* Avoid static inline in UAPI header as suggested by Pedro.
* Place PIDFD_SELF values out of range of errors and any other sentinel as
  suggested by Pedro.
https://lore.kernel.org/linux-mm/cover.1729926229.git.lorenzo.stoakes@oracle.com/

v5:
* Fixup self test dependencies on pidfd/pidfd.h.
https://lore.kernel.org/linux-mm/cover.1729848252.git.lorenzo.stoakes@oracle.com/

v4:
* Avoid returning an fd in the __pidfd_get_pid() function as pointed out by
  Christian, instead simply always pin the pid and maintain fd scope in the
  helper alone.
* Add wrapper header file in tools/include/linux to allow for import of
  UAPI pidfd.h header without encountering the collision between system
  fcntl.h and linux/fcntl.h as discussed with Shuah and John.
* Fixup tests to import the UAPI pidfd.h header working around conflicts
  between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports,
  as reported by Shuah.
* Use an int for pidfd_is_self_sentinel() to avoid any dependency on
  stdbool.h in userland.
https://lore.kernel.org/linux-mm/cover.1729198898.git.lorenzo.stoakes@oracle.com/

v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.
https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoakes@oracle.com/

v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracle.com/

Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
  a good idea, but perhaps some debate to be had on implementation. It
  seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
  PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracle.com/

RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracle.com/


Lorenzo Stoakes (6):
  pidfd: add PIDFD_SELF* sentinels to refer to own thread/process
  selftests/pidfd: add missing system header imcludes to pidfd tests
  tools: testing: separate out wait_for_pid() into helper header
  selftests: pidfd: add pidfd.h UAPI wrapper
  selftests: pidfd: add tests for PIDFD_SELF_*
  selftests/mm: use PIDFD_SELF in guard pages test

 include/uapi/linux/pidfd.h                    |  24 ++++
 kernel/pid.c                                  |  24 +++-
 kernel/signal.c                               | 106 +++++++++++-------
 tools/include/linux/pidfd.h                   |  14 +++
 tools/testing/selftests/cgroup/test_kill.c    |   2 +-
 tools/testing/selftests/mm/Makefile           |   4 +
 tools/testing/selftests/mm/guard-pages.c      |  15 +--
 .../pid_namespace/regression_enomem.c         |   2 +-
 tools/testing/selftests/pidfd/Makefile        |   3 +-
 tools/testing/selftests/pidfd/pidfd.h         |  28 +----
 .../selftests/pidfd/pidfd_fdinfo_test.c       |   1 +
 tools/testing/selftests/pidfd/pidfd_helpers.h |  39 +++++++
 .../testing/selftests/pidfd/pidfd_open_test.c |   6 +-
 .../selftests/pidfd/pidfd_setns_test.c        |   1 +
 tools/testing/selftests/pidfd/pidfd_test.c    |  76 +++++++++++--
 15 files changed, 242 insertions(+), 103 deletions(-)
 create mode 100644 tools/include/linux/pidfd.h
 create mode 100644 tools/testing/selftests/pidfd/pidfd_helpers.h

--
2.48.1

Comments

Andrew Morton Jan. 30, 2025, 10:37 p.m. UTC | #1
On Thu, 30 Jan 2025 20:40:25 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> If you wish to utilise a pidfd interface to refer to the current process or
> thread it is rather cumbersome, requiring something like:
> 
> 	int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
> 
> 	...
> 
> 	close(pidfd);
> 
> Or the equivalent call opening /proc/self. It is more convenient to use a
> sentinel value to indicate to an interface that accepts a pidfd that we
> simply wish to refer to the current process thread.
> 

The above code sequence doesn't seem at all onerous.  I'm not
understanding why it's worth altering the kernel to permit this little
shortcut?
Lorenzo Stoakes Jan. 30, 2025, 10:53 p.m. UTC | #2
On Thu, Jan 30, 2025 at 02:37:54PM -0800, Andrew Morton wrote:
> On Thu, 30 Jan 2025 20:40:25 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > If you wish to utilise a pidfd interface to refer to the current process or
> > thread it is rather cumbersome, requiring something like:
> >
> > 	int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
> >
> > 	...
> >
> > 	close(pidfd);
> >
> > Or the equivalent call opening /proc/self. It is more convenient to use a
> > sentinel value to indicate to an interface that accepts a pidfd that we
> > simply wish to refer to the current process thread.
> >
>
> The above code sequence doesn't seem at all onerous.  I'm not
> understanding why it's worth altering the kernel to permit this little
> shortcut?

In practice it adds quite a bit of overhead for something that whatever
mechanism is using the pidfd can avoid.

It was specifically intended for a real case of utilising
process_madvise(), using the newly extended ability to batch _any_
madvise() operations for the current process, like:

	if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0) < 0) {
	    ... error handling ...
	}

vs.

	pid_t pid = getpid();
	int pidfd = pidfd_open(pid, PIDFD_THREAD);

	if (pidfd < 0) {
	   ... error handling ...
	}

	if (process_madvise(pidfd, iovec, 10, MADV_GUARD_INSTALL, 0) < 0) {
	   ... cleanup pidfd ...
	   ... error handling ...
	}

	...

	... cleanup pidfd ...

So in practice, it's actually a lot more ceremony and noise. Suren has been
working with this code in practice and found this to be useful.

The suggestion to embed it as PIDFD_SELF rather than to pass it as a
process_madvise() flag was made on the original series where I extended its
functionality.

So in practice I think it's onerous enough to justify this, plus it allows
for a more fluent use of pidfd's in other cases where one is referring to
the same process/thread, to the extent that I've seen people commenting on
supporting it while sending series relating to pidfd.

Also Christian and others appear to support this idea.
Pedro Falcato Jan. 30, 2025, 11:10 p.m. UTC | #3
On Thu, Jan 30, 2025 at 10:53 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Thu, Jan 30, 2025 at 02:37:54PM -0800, Andrew Morton wrote:
> > On Thu, 30 Jan 2025 20:40:25 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> >
> > > If you wish to utilise a pidfd interface to refer to the current process or
> > > thread it is rather cumbersome, requiring something like:
> > >
> > >     int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
> > >
> > >     ...
> > >
> > >     close(pidfd);
> > >
> > > Or the equivalent call opening /proc/self. It is more convenient to use a
> > > sentinel value to indicate to an interface that accepts a pidfd that we
> > > simply wish to refer to the current process thread.
> > >
> >
> > The above code sequence doesn't seem at all onerous.  I'm not
> > understanding why it's worth altering the kernel to permit this little
> > shortcut?
>
> In practice it adds quite a bit of overhead for something that whatever
> mechanism is using the pidfd can avoid.
>
> It was specifically intended for a real case of utilising
> process_madvise(), using the newly extended ability to batch _any_
> madvise() operations for the current process, like:
>
>         if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) {
>             ... error handling ...
>         }
>
> vs.
>
>         pid_t pid = getpid();
>         int pidfd = pidfd_open(pid, PIDFD_THREAD);
>
>         if (pidfd < 0) {
>            ... error handling ...
>         }
>
>         if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) {
>            ... cleanup pidfd ...
>            ... error handling ...
>         }
>
>         ...
>
>         ... cleanup pidfd ...
>
> So in practice, it's actually a lot more ceremony and noise. Suren has been
> working with this code in practice and found this to be useful.

It's also nice to add that people on the libc/allocator side should
also appreciate skipping pidfd_open's reliability concerns (mostly,
that RLIMIT_NOFILE Should Not(tm) ever affect thread spawning or a
malloc[1]). Besides the big syscall reduction and nice speedup, that
is.

[1] whether this is already the case is an exercise left to the
reader, but at the very least we should not add onto existing problems
Andrew Morton Jan. 30, 2025, 11:32 p.m. UTC | #4
On Thu, 30 Jan 2025 23:10:53 +0000 Pedro Falcato <pedro.falcato@gmail.com> wrote:

> On Thu, Jan 30, 2025 at 10:53 PM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > > The above code sequence doesn't seem at all onerous.  I'm not
> > > understanding why it's worth altering the kernel to permit this little
> > > shortcut?
> >
> > In practice it adds quite a bit of overhead for something that whatever
> > mechanism is using the pidfd can avoid.
> >
> > It was specifically intended for a real case of utilising
> > process_madvise(), using the newly extended ability to batch _any_
> > madvise() operations for the current process, like:
> >
> >         if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) {
> >             ... error handling ...
> >         }
> >
> > vs.
> >
> >         pid_t pid = getpid();
> >         int pidfd = pidfd_open(pid, PIDFD_THREAD);
> >
> >         if (pidfd < 0) {
> >            ... error handling ...
> >         }
> >
> >         if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) {
> >            ... cleanup pidfd ...
> >            ... error handling ...
> >         }
> >
> >         ...
> >
> >         ... cleanup pidfd ...
> >
> > So in practice, it's actually a lot more ceremony and noise. Suren has been
> > working with this code in practice and found this to be useful.
> 
> It's also nice to add that people on the libc/allocator side should
> also appreciate skipping pidfd_open's reliability concerns (mostly,
> that RLIMIT_NOFILE Should Not(tm) ever affect thread spawning or a
> malloc[1]). Besides the big syscall reduction and nice speedup, that
> is.
> 
> [1] whether this is the already case is an exercise left to the
> reader, but at the very least we should not add onto existing problems

Thanks.

Could we please get all the above spelled out much more thoroughly in
the [0/n] description (aka Patch Series Sales Brochure)?
Lorenzo Stoakes Jan. 31, 2025, 10:21 a.m. UTC | #5
On Thu, Jan 30, 2025 at 03:32:36PM -0800, Andrew Morton wrote:
> On Thu, 30 Jan 2025 23:10:53 +0000 Pedro Falcato <pedro.falcato@gmail.com> wrote:
>
> > On Thu, Jan 30, 2025 at 10:53 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > > The above code sequence doesn't seem at all onerous.  I'm not
> > > > understanding why it's worth altering the kernel to permit this little
> > > > shortcut?
> > >
> > > In practice it adds quite a bit of overhead for something that whatever
> > > mechanism is using the pidfd can avoid.
> > >
> > > It was specifically intended for a real case of utilising
> > > process_madvise(), using the newly extended ability to batch _any_
> > > madvise() operations for the current process, like:
> > >
> > >         if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) {
> > >             ... error handling ...
> > >         }
> > >
> > > vs.
> > >
> > >         pid_t pid = getpid();
> > >         int pidfd = pidfd_open(pid, PIDFD_THREAD);
> > >
> > >         if (pidfd < 0) {
> > >            ... error handling ...
> > >         }
> > >
> > >         if (process_madvise(PIDFD_SELF, iovec, 10, MADV_GUARD_INSTALL, 0)) {
> > >            ... cleanup pidfd ...
> > >            ... error handling ...
> > >         }
> > >
> > >         ...
> > >
> > >         ... cleanup pidfd ...
> > >
> > > So in practice, it's actually a lot more ceremony and noise. Suren has been
> > > working with this code in practice and found this to be useful.
> >
> > It's also nice to add that people on the libc/allocator side should
> > also appreciate skipping pidfd_open's reliability concerns (mostly,
> > that RLIMIT_NOFILE Should Not(tm) ever affect thread spawning or a
> > malloc[1]). Besides the big syscall reduction and nice speedup, that
> > is.
> >
> > [1] whether this is the already case is an exercise left to the
> > reader, but at the very least we should not add onto existing problems
>
> Thanks.
>
> Could we please get all the above spelled out much more thoroughly in
> the [0/n] description (aka Patch Series Sales Brochure)?

Ack, will expand if there's a respin, or Christian - perhaps you could fold
the above explanation into the cover letter?

Intent is for Christian to take this in his tree (if he so wishes) to be
clear!

Cheers
Christian Brauner Feb. 1, 2025, 11:12 a.m. UTC | #6
> Intent is for Christian to take this in his tree (if he so wishes) to be
> clear!

If you send me an updated blurb I can fold it.
Lorenzo Stoakes Feb. 1, 2025, 4:38 p.m. UTC | #7
On Sat, Feb 01, 2025 at 12:12:46PM +0100, Christian Brauner wrote:
> > Intent is for Christian to take this in his tree (if he so wishes) to be
> > clear!
>
> If you send me an updated blurb I can fold it.

Thanks, I therefore propose replacing the cover letter blurb with the below:

----8<----
If you wish to utilise a pidfd interface to refer to the current process or
thread then there is a lot of ceremony involved, looking something like:

	pid_t pid = getpid();
	int pidfd = pidfd_open(pid, PIDFD_THREAD);

	if (pidfd < 0) {
		... error handling ...
	}

	if (process_madvise(pidfd, iovec, 8, MADV_GUARD_INSTALL, 0) < 0) {
		... cleanup pidfd ...
		... error handling ...
	}

	...

	... cleanup pidfd ...

This adds unnecessary overhead + system calls, complicated error handling
and in addition pidfd_open() is subject to RLIMIT_NOFILE (i.e. the limit of
per-process number of open file descriptors), so the call may fail
spuriously on this basis.

Rather than doing this we can make use of sentinels for this purpose which can
be passed as the pidfd instead. This looks like:

	if (process_madvise(PIDFD_SELF, iovec, 8, MADV_GUARD_INSTALL, 0) < 0) {
		... error handling ...
	}

And avoids all of the aforementioned issues. This series introduces such
sentinels.

It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.

There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in the kernel, and a userland process is a thread group (more
specifically the thread group leader) from the kernel perspective. We
therefore alias things thusly:

* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP aliased by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.

In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.

This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.

For now we only adjust pidfd_get_task() and the pidfd_send_signal() system
call with specific handling for this, implementing this functionality for
process_madvise(), process_mrelease() (albeit, using it here wouldn't
really make sense) and pidfd_send_signal().

We defer making further changes, as this would require a significant rework
of the pidfd mechanism.

The motivating case here is to support PIDFD_SELF in process_madvise(), so
this suffices for immediate uses. Moving forward, this can be further
expanded to other uses.
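
The RLIMIT_NOFILE point in the blurb above can be demonstrated with a small
sketch. The helper name pidfd_open_errno_at_fd_limit() and the fallback
syscall number are assumptions for illustration. It temporarily drops the
soft fd limit to zero, attempts pidfd_open() on the calling process and
reports the resulting errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open	434	/* assumed: generic syscall number */
#endif

/*
 * Hypothetical demo helper: returns the errno produced by pidfd_open()
 * while no new fds may be allocated, or 0 if the call succeeded anyway.
 */
static int pidfd_open_errno_at_fd_limit(void)
{
	struct rlimit old, tight;
	int pidfd, err = 0;

	if (getrlimit(RLIMIT_NOFILE, &old))
		return errno;

	/* Lower only the soft limit so that it can be restored below. */
	tight = old;
	tight.rlim_cur = 0;
	if (setrlimit(RLIMIT_NOFILE, &tight))
		return errno;

	pidfd = (int)syscall(SYS_pidfd_open, getpid(), 0);
	if (pidfd < 0)
		err = errno;
	else
		close(pidfd);

	setrlimit(RLIMIT_NOFILE, &old);
	return err;
}
```

On a kernel that implements pidfd_open() the expected result is EMFILE,
a failure that has nothing to do with the madvise operation the caller
actually wanted. A call that passes PIDFD_SELF allocates no fd and so
cannot fail this way.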
Christian Brauner Feb. 4, 2025, 9:46 a.m. UTC | #8
On Thu, 30 Jan 2025 20:40:25 +0000, Lorenzo Stoakes wrote:
> If you wish to utilise a pidfd interface to refer to the current process or
> thread it is rather cumbersome, requiring something like:
> 
> 	int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
> 
> 	...
> 
> [...]

Updated merge message. I've slightly rearranged pidfd_send_signal() so
we don't have to call CLASS(fd, f)(pidfd) unconditionally anymore.

---

Applied to the vfs-6.15.pidfs branch of the vfs/vfs.git tree.
Patches in the vfs-6.15.pidfs branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs-6.15.pidfs

[1/6] pidfd: add PIDFD_SELF* sentinels to refer to own thread/process
      https://git.kernel.org/vfs/vfs/c/e6e4ed42f8d8
[2/6] selftests/pidfd: add missing system header imcludes to pidfd tests
      https://git.kernel.org/vfs/vfs/c/c9f04f4a251d
[3/6] tools: testing: separate out wait_for_pid() into helper header
      https://git.kernel.org/vfs/vfs/c/fb67fe44116e
[4/6] selftests: pidfd: add pidfd.h UAPI wrapper
      https://git.kernel.org/vfs/vfs/c/ac331e56724d
[5/6] selftests: pidfd: add tests for PIDFD_SELF_*
      https://git.kernel.org/vfs/vfs/c/881a3515c191
[6/6] selftests/mm: use PIDFD_SELF in guard pages test
      https://git.kernel.org/vfs/vfs/c/b4703f056f42
Lorenzo Stoakes Feb. 4, 2025, 10:01 a.m. UTC | #9
On Tue, Feb 04, 2025 at 10:46:35AM +0100, Christian Brauner wrote:
> On Thu, 30 Jan 2025 20:40:25 +0000, Lorenzo Stoakes wrote:
> > If you wish to utilise a pidfd interface to refer to the current process or
> > thread it is rather cumbersome, requiring something like:
> >
> > 	int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
> >
> > 	...
> >
> > [...]
>
> Updated merge message. I've slightly rearranged pidfd_send_signal() so
> we don't have to call CLASS(fd, f)(pidfd) unconditionally anymore.

Sounds good and thank you! Glad to get this in :)

Suren Baghdasaryan Feb. 4, 2025, 5:43 p.m. UTC | #10
On Tue, Feb 4, 2025 at 2:01 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Tue, Feb 04, 2025 at 10:46:35AM +0100, Christian Brauner wrote:
> > On Thu, 30 Jan 2025 20:40:25 +0000, Lorenzo Stoakes wrote:
> > > If you wish to utilise a pidfd interface to refer to the current process or
> > > thread it is rather cumbersome, requiring something like:
> > >
> > >     int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
> > >
> > >     ...
> > >
> > > [...]
> >
> > Updated merge message. I've slightly rearranged pidfd_send_signal() so
> > we don't have to call CLASS(fd, f)(pidfd) unconditionally anymore.
>
> Sounds good and thank you! Glad to get this in :)

Sorry, a bit late to the party...

We were discussing MADV_GUARD_INSTALL use with Android Bionic team and
the possibility of caching pidfd_open() result for reuse when
installing multiple guards, however doing that in libraries would pose
issues as we can't predict the user behavior, which can fork() in
between such calls. That would be an additional reason why having
these sentinels is beneficial.
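
To make the fork() hazard concrete, here is a minimal sketch of what a
library would have to do to cache a pidfd safely. The helper name
self_pidfd_cached() and the fallback syscall number are assumptions for
illustration; the sketch is not thread-safe and does not claim to cover
every way a process can be cloned.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open	434	/* assumed: generic syscall number */
#endif

static int cached_pidfd = -1;
static pid_t cached_owner = -1;

/*
 * Hypothetical library helper: return a pidfd for the calling process,
 * reopening it if the cached one was inherited across fork().
 * Returns -1 with errno set on failure.
 */
static int self_pidfd_cached(void)
{
	pid_t me = getpid();

	if (cached_pidfd >= 0 && cached_owner == me)
		return cached_pidfd;

	/* An inherited fd still refers to the parent, so it is stale here. */
	if (cached_pidfd >= 0)
		close(cached_pidfd);

	cached_pidfd = (int)syscall(SYS_pidfd_open, me, 0);
	cached_owner = cached_pidfd >= 0 ? me : -1;

	return cached_pidfd;
}
```

Even this small amount of bookkeeping costs a getpid() per call and leaves
an fd permanently open in the host application, neither of which is needed
when the caller can simply pass PIDFD_SELF.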


Christian Brauner Feb. 5, 2025, 9:29 a.m. UTC | #11
On Tue, Feb 04, 2025 at 09:43:31AM -0800, Suren Baghdasaryan wrote:
> On Tue, Feb 4, 2025 at 2:01 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > On Tue, Feb 04, 2025 at 10:46:35AM +0100, Christian Brauner wrote:
> > > On Thu, 30 Jan 2025 20:40:25 +0000, Lorenzo Stoakes wrote:
> > > > If you wish to utilise a pidfd interface to refer to the current process or
> > > > thread it is rather cumbersome, requiring something like:
> > > >
> > > >     int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
> > > >
> > > >     ...
> > > >
> > > > [...]
> > >
> > > Updated merge message. I've slightly rearranged pidfd_send_signal() so
> > > we don't have to call CLASS(fd, f)(pidfd) unconditionally anymore.
> >
> > Sounds good and thank you! Glad to get this in :)
> 
> Sorry, a bit late to the party...
> 
> We were discussing MADV_GUARD_INSTALL use with Android Bionic team and
> the possibility of caching pidfd_open() result for reuse when
> installing multiple guards, however doing that in libraries would pose
> issues as we can't predict the user behavior, which can fork() in
> between such calls. That would be an additional reason why having
> these sentinels is beneficial.

Ok, added this to the cover letter as well.

Note that starting with v6.14 pidfs supports file handles.
This works because pidfs provides each pidfd with a unique 64bit inode
number that is exposed in statx(). On 64-bit the ->st_ino simply is the
inode number. On 32-bit the unique identifier can be reconstructed using
->st_ino and the inode generation number which can be retrieved via the
FS_IOC_GETVERSION ioctl. So the 64-bit identifier on 32-bit is
reconstructed by using ->st_ino as the lower 32-bits and the 32-bit
generation number as the upper 32-bits.
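
The 32-bit reconstruction described above amounts to the following. The
helper name pidfs_id_from_32bit() is an assumption for illustration; the
generation value is taken to be the 32-bit number obtained from
FS_IOC_GETVERSION.

```c
#include <stdint.h>

/*
 * Hypothetical helper: rebuild the 64-bit pidfs identifier on 32-bit,
 * with ->st_ino as the lower half and the generation as the upper half.
 */
static uint64_t pidfs_id_from_32bit(uint32_t st_ino, uint32_t generation)
{
	return ((uint64_t)generation << 32) | st_ino;
}
```

For instance, st_ino 0x89abcdef with generation 0x01234567 gives
0x0123456789abcdef.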

Also note that since the introduction of pidfs each struct pid will
refer to a different inode but the same struct pid will refer to the
same inode if it's opened multiple times. This is in contrast to pre-pidfs
pidfds, where each struct pid referred to the same inode.

IOW, with pidfs statx() is sufficient to compare two pidfds to see whether
they refer to the same process. On 64-bit it's sufficient to do the usual
st1->st_dev == st2->st_dev && st1->st_ino == st2->st_ino and on 32-bit
you will want to also compare the generation number:

TEST_F(pidfd_bind_mount, reopen)
{
        int pidfd;
        char proc_path[PATH_MAX];

        sprintf(proc_path, "/proc/self/fd/%d", self->pidfd);
        pidfd = open(proc_path, O_RDONLY | O_NOCTTY | O_CLOEXEC);
        ASSERT_GE(pidfd, 0);

        ASSERT_GE(fstat(self->pidfd, &self->st2), 0);
        ASSERT_EQ(ioctl(self->pidfd, FS_IOC_GETVERSION, &self->gen2), 0);

        ASSERT_TRUE(self->st1.st_dev == self->st2.st_dev && self->st1.st_ino == self->st2.st_ino);
        ASSERT_TRUE(self->gen1 == self->gen2);

        ASSERT_EQ(close(pidfd), 0);
}

Plus, you can bind-mount them now.

In any case, this allows us to create file handles that are unique for
the lifetime of the system. Please see

tools/testing/selftests/pidfd/pidfd_file_handle_test.c

for how that works. The gist is that decoding and encoding for pidfs is
unprivileged and the only requirement we have is that the process the
file handle resolves to must be valid in the caller's pid namespace
hierarchy:

TEST_F(file_handle, file_handle_child_pidns)
{
        int mnt_id;
        struct file_handle *fh;
        int pidfd = -EBADF;
        struct stat st1, st2;

        fh = malloc(sizeof(struct file_handle) + MAX_HANDLE_SZ);
        ASSERT_NE(fh, NULL);
        memset(fh, 0, sizeof(struct file_handle) + MAX_HANDLE_SZ);
        fh->handle_bytes = MAX_HANDLE_SZ;

        ASSERT_EQ(name_to_handle_at(self->child_pidfd2, "", fh, &mnt_id, AT_EMPTY_PATH), 0);

        ASSERT_EQ(fstat(self->child_pidfd2, &st1), 0);

        pidfd = open_by_handle_at(self->pidfd, fh, 0);
        ASSERT_GE(pidfd, 0);

        ASSERT_EQ(fstat(pidfd, &st2), 0);
        ASSERT_TRUE(st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino);

        ASSERT_EQ(close(pidfd), 0);

        pidfd = open_by_handle_at(self->pidfd, fh, O_CLOEXEC);
        ASSERT_GE(pidfd, 0);

        ASSERT_EQ(fstat(pidfd, &st2), 0);
        ASSERT_TRUE(st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino);

        ASSERT_EQ(close(pidfd), 0);

        pidfd = open_by_handle_at(self->pidfd, fh, O_NONBLOCK);
        ASSERT_GE(pidfd, 0);

        ASSERT_EQ(fstat(pidfd, &st2), 0);
        ASSERT_TRUE(st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino);

        ASSERT_EQ(close(pidfd), 0);

        free(fh);
}

So you don't need to keep the fd open.
