mbox series

[RFC,0/3] introduce PIDFD_SELF

Message ID cover.1727644404.git.lorenzo.stoakes@oracle.com (mailing list archive)
Headers show
Series introduce PIDFD_SELF | expand

Message

Lorenzo Stoakes Sept. 30, 2024, 9:22 a.m. UTC
If you wish to utilise a pidfd interface to refer to the current process
(from the point of view of userland - from the kernel point of view - the
thread group leader), it is rather cumbersome, requiring something like:

	int pidfd = pidfd_open(getpid(), 0);

	...

	close(pidfd);

Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process.

This series introduces such a sentinel, PIDFD_SELF, which can be passed as
the pidfd in this instance rather than having to establish a dummy fd for
this purpose.

The only pidfd interface where this is particularly useful is
process_madvise(), which provides the motivation for this series. However,
as this is a general interface, we ensure that all pidfd interfaces can
handle this correctly.

We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.

In the case of setns() we explicitly disallow use of PIDFD_SELF as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.

We also do not provide the ability to utilise PIDFD_SELF in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.

Lorenzo Stoakes (3):
  pidfd: refactor pidfd_get_pid/to_pid() and de-duplicate pid lookup
  pidfd: add PIDFD_SELF sentinel to refer to own process
  selftests: pidfd: add tests for PIDFD_SELF

 include/linux/pid.h                           | 43 +++++++++++-
 include/uapi/linux/pidfd.h                    |  3 +
 kernel/exit.c                                 |  3 +-
 kernel/nsproxy.c                              |  1 +
 kernel/pid.c                                  | 70 +++++++++++++------
 kernel/signal.c                               | 26 ++-----
 tools/testing/selftests/pidfd/pidfd.h         |  5 ++
 .../selftests/pidfd/pidfd_getfd_test.c        | 38 ++++++++++
 .../selftests/pidfd/pidfd_setns_test.c        |  6 ++
 tools/testing/selftests/pidfd/pidfd_test.c    | 13 ++++
 10 files changed, 165 insertions(+), 43 deletions(-)

Comments

Florian Weimer Sept. 30, 2024, 10:33 a.m. UTC | #1
* Lorenzo Stoakes:

> If you wish to utilise a pidfd interface to refer to the current process
> (from the point of view of userland - from the kernel point of view - the
> thread group leader), it is rather cumbersome, requiring something like:
>
> 	int pidfd = pidfd_open(getpid(), 0);
>
> 	...
>
> 	close(pidfd);
>
> Or the equivalent call opening /proc/self. It is more convenient to use a
> sentinel value to indicate to an interface that accepts a pidfd that we
> simply wish to refer to the current process.

The descriptor will refer to the current thread, not process, right?

The distinction matters for pidfd_getfd if a process contains multiple
threads with different file descriptor tables, and probably for
pidfd_send_signal as well.

Thanks,
Florian
Lorenzo Stoakes Sept. 30, 2024, 10:39 a.m. UTC | #2
On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:
> * Lorenzo Stoakes:
>
> > If you wish to utilise a pidfd interface to refer to the current process
> > (from the point of view of userland - from the kernel point of view - the
> > thread group leader), it is rather cumbersome, requiring something like:
> >
> > 	int pidfd = pidfd_open(getpid(), 0);
> >
> > 	...
> >
> > 	close(pidfd);
> >
> > Or the equivalent call opening /proc/self. It is more convenient to use a
> > sentinel value to indicate to an interface that accepts a pidfd that we
> > simply wish to refer to the current process.
>
> The descriptor will refer to the current thread, not process, right?

No it refers to the current process (i.e. thread group leader from kernel
perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.

>
> The distinction matters for pidfd_getfd if a process contains multiple
> threads with different file descriptor tables, and probably for
> pidfd_send_signal as well.

You mean if you did a strange set of flags to clone()? Otherwise these are
shared right?

Again, we are explicitly looking at process not thread from userland
perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try
to implement that.

>
> Thanks,
> Florian
>
Christian Brauner Sept. 30, 2024, 12:34 p.m. UTC | #3
On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:
> On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:
> > * Lorenzo Stoakes:
> >
> > > If you wish to utilise a pidfd interface to refer to the current process
> > > (from the point of view of userland - from the kernel point of view - the
> > > thread group leader), it is rather cumbersome, requiring something like:
> > >
> > > 	int pidfd = pidfd_open(getpid(), 0);
> > >
> > > 	...
> > >
> > > 	close(pidfd);
> > >
> > > Or the equivalent call opening /proc/self. It is more convenient to use a
> > > sentinel value to indicate to an interface that accepts a pidfd that we
> > > simply wish to refer to the current process.
> >
> > The descriptor will refer to the current thread, not process, right?
> 
> No it refers to the current process (i.e. thread group leader from kernel
> perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.
> 
> >
> > The distinction matters for pidfd_getfd if a process contains multiple
> > threads with different file descriptor tables, and probably for
> > pidfd_send_signal as well.
> 
> You mean if you did a strange set of flags to clone()? Otherwise these are
> shared right?
> 
> Again, we are explicitly looking at process not thread from userland
> perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try
> to implement that.

Florian raises a good point. Currently we have:

(1) int pidfd_tgid = pidfd_open(getpid(), 0);
(2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);

and this instructs:

pidfd_send_signal()
pidfd_getfd()

to do different things. For pidfd_send_signal() it's whether the
operation has thread-group scope or thread-scope for pidfd_send_signal()
and for pidfd_getfd() it determines the fdtable to use.

The thing is that if you pass:

pidfd_getfd(PDIFD_SELF)

and you have:

TGID

T1 {
    clone(CLONE_THREAD)
    unshare(CLONE_FILES)
}

T2 {
    clone(CLONE_THREAD)
    unshare(CLONE_FILES)
}

You have 3 threads in the same thread-group that all have distinct file
descriptor tables from each other.

So if T1 did:

pidfd_getfd(PIDFD_SELF, ...)

and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect
to get the fd from its file descriptor table. IOW, its reasonable to
expect that T1 is interested in their very own resource, not someone
else's even if it is the thread-group leader.

But what T1 will get in reality is an fd from TGID's file descriptor
table (and similar for T2).

Iirc, yes that confusion exists already with /proc/self. But the
question is whether we should add the same confusion to the pidfd api or
whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual
calling thread.

My thinking is that if you have the reasonable suspicion that you're
multi-threaded and that you're interested in the thread-group resource
then you should be using:

int pidfd = pidfd_open(getpid(), 0)

and hand that thread-group leader pidfd around since you're interested
in another thread. But if you're really just interested in your own
resource then pidfd_open(getpid(), 0) makes no sense and you would want
PIDFD_SELF.

Thoughts?
Lorenzo Stoakes Sept. 30, 2024, 1:10 p.m. UTC | #4
On Mon, Sep 30, 2024 at 02:34:33PM GMT, Christian Brauner wrote:
> On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:
> > On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:
> > > * Lorenzo Stoakes:
> > >
> > > > If you wish to utilise a pidfd interface to refer to the current process
> > > > (from the point of view of userland - from the kernel point of view - the
> > > > thread group leader), it is rather cumbersome, requiring something like:
> > > >
> > > > 	int pidfd = pidfd_open(getpid(), 0);
> > > >
> > > > 	...
> > > >
> > > > 	close(pidfd);
> > > >
> > > > Or the equivalent call opening /proc/self. It is more convenient to use a
> > > > sentinel value to indicate to an interface that accepts a pidfd that we
> > > > simply wish to refer to the current process.
> > >
> > > The descriptor will refer to the current thread, not process, right?
> >
> > No it refers to the current process (i.e. thread group leader from kernel
> > perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.
> >
> > >
> > > The distinction matters for pidfd_getfd if a process contains multiple
> > > threads with different file descriptor tables, and probably for
> > > pidfd_send_signal as well.
> >
> > You mean if you did a strange set of flags to clone()? Otherwise these are
> > shared right?
> >
> > Again, we are explicitly looking at process not thread from userland
> > perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try
> > to implement that.
>
> Florian raises a good point. Currently we have:
>
> (1) int pidfd_tgid = pidfd_open(getpid(), 0);
> (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);
>
> and this instructs:
>
> pidfd_send_signal()
> pidfd_getfd()
>
> to do different things. For pidfd_send_signal() it's whether the
> operation has thread-group scope or thread-scope for pidfd_send_signal()
> and for pidfd_getfd() it determines the fdtable to use.
>
> The thing is that if you pass:
>
> pidfd_getfd(PDIFD_SELF)
>
> and you have:
>
> TGID
>
> T1 {
>     clone(CLONE_THREAD)
>     unshare(CLONE_FILES)
> }
>
> T2 {
>     clone(CLONE_THREAD)
>     unshare(CLONE_FILES)
> }
>
> You have 3 threads in the same thread-group that all have distinct file
> descriptor tables from each other.
>
> So if T1 did:
>
> pidfd_getfd(PIDFD_SELF, ...)
>
> and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect
> to get the fd from its file descriptor table. IOW, its reasonable to
> expect that T1 is interested in their very own resource, not someone
> else's even if it is the thread-group leader.
>
> But what T1 will get in reality is an fd from TGID's file descriptor
> table (and similar for T2).
>
> Iirc, yes that confusion exists already with /proc/self. But the
> question is whether we should add the same confusion to the pidfd api or
> whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual
> calling thread.
>
> My thinking is that if you have the reasonable suspicion that you're
> multi-threaded and that you're interested in the thread-group resource
> then you should be using:
>
> int pidfd = pidfd_open(getpid(), 0)
>
> and hand that thread-group leader pidfd around since you're interested
> in another thread. But if you're really just interested in your own
> resource then pidfd_open(getpid(), 0) makes no sense and you would want
> PIDFD_SELF.
>
> Thoughts?

I mean from my perspective, my aim is to get current->mm for
process_madvise() so both work for me :) however you both raise a very good
point here (sorry Florian, perhaps I was a little too dismissive as to your
point, you're absolutely right).

My intent was for PIDFD_SELF to simply mirror the pidfd_open(getpid(), 0)
behaviour, but you and Florian make a strong case that you'd _probably_
find this very confusing had you unshared in this fashion.

I mean in general this confusion already exists, and is for what
PIDFD_THREAD was created, but I suspect ideally if you could go back you
might actually do this by default Christian + let the TGL behaviour be the
optional thing?

For most users this will not be an issue, but for those they'd get the same
result whichever they used, but yes actually I think you're both right -
PIDFD_SELF should in effect imply PIDFD_THREAD.

We can adjust the pidfd_send_signal() call to infer the correct scope
(actually nicely we can do that without any change there, by having
__pidfd_get_pid() set f_flags accordingly).

So TL;DR: I agree, I will respin with PIDFD_SELF referring to the thread.

My question in return here then is - should we introduce PIDFD_SELF_PROCESS
also (do advise if you feel this naming isn't quite right) - to provide
thread group leader behaviour?

Thanks!
Aleksa Sarai Sept. 30, 2024, 2:21 p.m. UTC | #5
On 2024-09-30, Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> On Mon, Sep 30, 2024 at 02:34:33PM GMT, Christian Brauner wrote:
> > On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:
> > > On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:
> > > > * Lorenzo Stoakes:
> > > >
> > > > > If you wish to utilise a pidfd interface to refer to the current process
> > > > > (from the point of view of userland - from the kernel point of view - the
> > > > > thread group leader), it is rather cumbersome, requiring something like:
> > > > >
> > > > > 	int pidfd = pidfd_open(getpid(), 0);
> > > > >
> > > > > 	...
> > > > >
> > > > > 	close(pidfd);
> > > > >
> > > > > Or the equivalent call opening /proc/self. It is more convenient to use a
> > > > > sentinel value to indicate to an interface that accepts a pidfd that we
> > > > > simply wish to refer to the current process.
> > > >
> > > > The descriptor will refer to the current thread, not process, right?
> > >
> > > No it refers to the current process (i.e. thread group leader from kernel
> > > perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.
> > >
> > > >
> > > > The distinction matters for pidfd_getfd if a process contains multiple
> > > > threads with different file descriptor tables, and probably for
> > > > pidfd_send_signal as well.
> > >
> > > You mean if you did a strange set of flags to clone()? Otherwise these are
> > > shared right?
> > >
> > > Again, we are explicitly looking at process not thread from userland
> > > perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try
> > > to implement that.
> >
> > Florian raises a good point. Currently we have:
> >
> > (1) int pidfd_tgid = pidfd_open(getpid(), 0);
> > (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);
> >
> > and this instructs:
> >
> > pidfd_send_signal()
> > pidfd_getfd()
> >
> > to do different things. For pidfd_send_signal() it's whether the
> > operation has thread-group scope or thread-scope for pidfd_send_signal()
> > and for pidfd_getfd() it determines the fdtable to use.
> >
> > The thing is that if you pass:
> >
> > pidfd_getfd(PDIFD_SELF)
> >
> > and you have:
> >
> > TGID
> >
> > T1 {
> >     clone(CLONE_THREAD)
> >     unshare(CLONE_FILES)
> > }
> >
> > T2 {
> >     clone(CLONE_THREAD)
> >     unshare(CLONE_FILES)
> > }
> >
> > You have 3 threads in the same thread-group that all have distinct file
> > descriptor tables from each other.
> >
> > So if T1 did:
> >
> > pidfd_getfd(PIDFD_SELF, ...)
> >
> > and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect
> > to get the fd from its file descriptor table. IOW, its reasonable to
> > expect that T1 is interested in their very own resource, not someone
> > else's even if it is the thread-group leader.
> >
> > But what T1 will get in reality is an fd from TGID's file descriptor
> > table (and similar for T2).
> >
> > Iirc, yes that confusion exists already with /proc/self. But the
> > question is whether we should add the same confusion to the pidfd api or
> > whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual
> > calling thread.
> >
> > My thinking is that if you have the reasonable suspicion that you're
> > multi-threaded and that you're interested in the thread-group resource
> > then you should be using:
> >
> > int pidfd = pidfd_open(getpid(), 0)
> >
> > and hand that thread-group leader pidfd around since you're interested
> > in another thread. But if you're really just interested in your own
> > resource then pidfd_open(getpid(), 0) makes no sense and you would want
> > PIDFD_SELF.
> >
> > Thoughts?
> 
> I mean from my perspective, my aim is to get current->mm for
> process_madvise() so both work for me :) however you both raise a very good
> point here (sorry Florian, perhaps I was a little too dismissive as to your
> point, you're absolutely right).
> 
> My intent was for PIDFD_SELF to simply mirror the pidfd_open(getpid(), 0)
> behaviour, but you and Florian make a strong case that you'd _probably_
> find this very confusing had you unshared in this fashion.
> 
> I mean in general this confusion already exists, and is for what
> PIDFD_THREAD was created, but I suspect ideally if you could go back you
> might actually do this by default Christian + let the TGL behaviour be the
> optional thing?
> 
> For most users this will not be an issue, but for those they'd get the same
> result whichever they used, but yes actually I think you're both right -
> PIDFD_SELF should in effect imply PIDFD_THREAD.

Funnily enough we ran into issues with this when running Go code in runc
that did precisely this -- /proc/self gave you the wrong fd table in
very specific circumstances that were annoying to debug. For languages
with green-threading you can't turn off (like Go) these kinds of issues
pop up surprisingly often.

> We can adjust the pidfd_send_signal() call to infer the correct scope
> (actually nicely we can do that without any change there, by having
> __pidfd_get_pid() set f_flags accordingly).
> 
> So TL;DR: I agree, I will respin with PIDFD_SELF referring to the thread.
> 
> My question in return here then is - should we introduce PIDFD_SELF_PROCESS
> also (do advise if you feel this naming isn't quite right) - to provide
> thread group leader behaviour?

Sorry to bike-shed, but to match /proc/self and /proc/thread-self, maybe
they should be called PIDFD_SELF (for tgid) and PIDFD_THREAD_SELF (for
current's tid)? In principle I guess users might use PIDFD_SELF by
accident but if we mirror the naming with /proc/{,thread-}self that
might not be that big of an issue?

Just a thought.

> 
> Thanks!
>
Lorenzo Stoakes Sept. 30, 2024, 2:32 p.m. UTC | #6
On Mon, Sep 30, 2024 at 04:21:23PM GMT, Aleksa Sarai wrote:
> On 2024-09-30, Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> > On Mon, Sep 30, 2024 at 02:34:33PM GMT, Christian Brauner wrote:
> > > On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:
> > > > On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:
> > > > > * Lorenzo Stoakes:
> > > > >
> > > > > > If you wish to utilise a pidfd interface to refer to the current process
> > > > > > (from the point of view of userland - from the kernel point of view - the
> > > > > > thread group leader), it is rather cumbersome, requiring something like:
> > > > > >
> > > > > > 	int pidfd = pidfd_open(getpid(), 0);
> > > > > >
> > > > > > 	...
> > > > > >
> > > > > > 	close(pidfd);
> > > > > >
> > > > > > Or the equivalent call opening /proc/self. It is more convenient to use a
> > > > > > sentinel value to indicate to an interface that accepts a pidfd that we
> > > > > > simply wish to refer to the current process.
> > > > >
> > > > > The descriptor will refer to the current thread, not process, right?
> > > >
> > > > No it refers to the current process (i.e. thread group leader from kernel
> > > > perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.
> > > >
> > > > >
> > > > > The distinction matters for pidfd_getfd if a process contains multiple
> > > > > threads with different file descriptor tables, and probably for
> > > > > pidfd_send_signal as well.
> > > >
> > > > You mean if you did a strange set of flags to clone()? Otherwise these are
> > > > shared right?
> > > >
> > > > Again, we are explicitly looking at process not thread from userland
> > > > perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try
> > > > to implement that.
> > >
> > > Florian raises a good point. Currently we have:
> > >
> > > (1) int pidfd_tgid = pidfd_open(getpid(), 0);
> > > (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);
> > >
> > > and this instructs:
> > >
> > > pidfd_send_signal()
> > > pidfd_getfd()
> > >
> > > to do different things. For pidfd_send_signal() it's whether the
> > > operation has thread-group scope or thread-scope for pidfd_send_signal()
> > > and for pidfd_getfd() it determines the fdtable to use.
> > >
> > > The thing is that if you pass:
> > >
> > > pidfd_getfd(PDIFD_SELF)
> > >
> > > and you have:
> > >
> > > TGID
> > >
> > > T1 {
> > >     clone(CLONE_THREAD)
> > >     unshare(CLONE_FILES)
> > > }
> > >
> > > T2 {
> > >     clone(CLONE_THREAD)
> > >     unshare(CLONE_FILES)
> > > }
> > >
> > > You have 3 threads in the same thread-group that all have distinct file
> > > descriptor tables from each other.
> > >
> > > So if T1 did:
> > >
> > > pidfd_getfd(PIDFD_SELF, ...)
> > >
> > > and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect
> > > to get the fd from its file descriptor table. IOW, its reasonable to
> > > expect that T1 is interested in their very own resource, not someone
> > > else's even if it is the thread-group leader.
> > >
> > > But what T1 will get in reality is an fd from TGID's file descriptor
> > > table (and similar for T2).
> > >
> > > Iirc, yes that confusion exists already with /proc/self. But the
> > > question is whether we should add the same confusion to the pidfd api or
> > > whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual
> > > calling thread.
> > >
> > > My thinking is that if you have the reasonable suspicion that you're
> > > multi-threaded and that you're interested in the thread-group resource
> > > then you should be using:
> > >
> > > int pidfd = pidfd_open(getpid(), 0)
> > >
> > > and hand that thread-group leader pidfd around since you're interested
> > > in another thread. But if you're really just interested in your own
> > > resource then pidfd_open(getpid(), 0) makes no sense and you would want
> > > PIDFD_SELF.
> > >
> > > Thoughts?
> >
> > I mean from my perspective, my aim is to get current->mm for
> > process_madvise() so both work for me :) however you both raise a very good
> > point here (sorry Florian, perhaps I was a little too dismissive as to your
> > point, you're absolutely right).
> >
> > My intent was for PIDFD_SELF to simply mirror the pidfd_open(getpid(), 0)
> > behaviour, but you and Florian make a strong case that you'd _probably_
> > find this very confusing had you unshared in this fashion.
> >
> > I mean in general this confusion already exists, and is for what
> > PIDFD_THREAD was created, but I suspect ideally if you could go back you
> > might actually do this by default Christian + let the TGL behaviour be the
> > optional thing?
> >
> > For most users this will not be an issue, but for those they'd get the same
> > result whichever they used, but yes actually I think you're both right -
> > PIDFD_SELF should in effect imply PIDFD_THREAD.
>
> Funnily enough we ran into issues with this when running Go code in runc
> that did precisely this -- /proc/self gave you the wrong fd table in
> very specific circumstances that were annoying to debug. For languages
> with green-threading you can't turn off (like Go) these kinds of issues
> pop up surprisingly often.

Yeah, damn, useful insight that such things do happen in the wild.

>
> > We can adjust the pidfd_send_signal() call to infer the correct scope
> > (actually nicely we can do that without any change there, by having
> > __pidfd_get_pid() set f_flags accordingly).
> >
> > So TL;DR: I agree, I will respin with PIDFD_SELF referring to the thread.
> >
> > My question in return here then is - should we introduce PIDFD_SELF_PROCESS
> > also (do advise if you feel this naming isn't quite right) - to provide
> > thread group leader behaviour?
>
> Sorry to bike-shed, but to match /proc/self and /proc/thread-self, maybe
> they should be called PIDFD_SELF (for tgid) and PIDFD_THREAD_SELF (for
> current's tid)? In principle I guess users might use PIDFD_SELF by
> accident but if we mirror the naming with /proc/{,thread-}self that
> might not be that big of an issue?

Lol, you know I wasn't even aware /proc/thread-self existed...

Yeah, that actually makes sense and is consistent, though obviously the
concern is people will reflexively use PIDFD_SELF and end up with
potentially confusing results.

I will obviously be doing a manpage patch for this so we can spell it out
there very clearly and also in the header - so I'd actually lean towards
doing this myself.

Christian, Florian? Thoughts?

Thanks!

>
> Just a thought.
>
> >
> > Thanks!
> >
>
> --
> Aleksa Sarai
> Senior Software Engineer (Containers)
> SUSE Linux GmbH
> <https://www.cyphar.com/>
Christian Brauner Oct. 1, 2024, 10:21 a.m. UTC | #7
On Mon, Sep 30, 2024 at 03:32:25PM GMT, Lorenzo Stoakes wrote:
> On Mon, Sep 30, 2024 at 04:21:23PM GMT, Aleksa Sarai wrote:
> > On 2024-09-30, Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> > > On Mon, Sep 30, 2024 at 02:34:33PM GMT, Christian Brauner wrote:
> > > > On Mon, Sep 30, 2024 at 11:39:49AM GMT, Lorenzo Stoakes wrote:
> > > > > On Mon, Sep 30, 2024 at 12:33:18PM GMT, Florian Weimer wrote:
> > > > > > * Lorenzo Stoakes:
> > > > > >
> > > > > > > If you wish to utilise a pidfd interface to refer to the current process
> > > > > > > (from the point of view of userland - from the kernel point of view - the
> > > > > > > thread group leader), it is rather cumbersome, requiring something like:
> > > > > > >
> > > > > > > 	int pidfd = pidfd_open(getpid(), 0);
> > > > > > >
> > > > > > > 	...
> > > > > > >
> > > > > > > 	close(pidfd);
> > > > > > >
> > > > > > > Or the equivalent call opening /proc/self. It is more convenient to use a
> > > > > > > sentinel value to indicate to an interface that accepts a pidfd that we
> > > > > > > simply wish to refer to the current process.
> > > > > >
> > > > > > The descriptor will refer to the current thread, not process, right?
> > > > >
> > > > > No it refers to the current process (i.e. thread group leader from kernel
> > > > > perspective). Unless you specify PIDFD_THREAD, this is the same if you did the above.
> > > > >
> > > > > >
> > > > > > The distinction matters for pidfd_getfd if a process contains multiple
> > > > > > threads with different file descriptor tables, and probably for
> > > > > > pidfd_send_signal as well.
> > > > >
> > > > > You mean if you did a strange set of flags to clone()? Otherwise these are
> > > > > shared right?
> > > > >
> > > > > Again, we are explicitly looking at process not thread from userland
> > > > > perspective. A PIDFD_SELF_THREAD might be possible, but this series doesn't try
> > > > > to implement that.
> > > >
> > > > Florian raises a good point. Currently we have:
> > > >
> > > > (1) int pidfd_tgid = pidfd_open(getpid(), 0);
> > > > (2) int pidfd_thread = pidfd_open(getpid(), PIDFD_THREAD);
> > > >
> > > > and this instructs:
> > > >
> > > > pidfd_send_signal()
> > > > pidfd_getfd()
> > > >
> > > > to do different things. For pidfd_send_signal() it's whether the
> > > > operation has thread-group scope or thread-scope for pidfd_send_signal()
> > > > and for pidfd_getfd() it determines the fdtable to use.
> > > >
> > > > The thing is that if you pass:
> > > >
> > > > pidfd_getfd(PDIFD_SELF)
> > > >
> > > > and you have:
> > > >
> > > > TGID
> > > >
> > > > T1 {
> > > >     clone(CLONE_THREAD)
> > > >     unshare(CLONE_FILES)
> > > > }
> > > >
> > > > T2 {
> > > >     clone(CLONE_THREAD)
> > > >     unshare(CLONE_FILES)
> > > > }
> > > >
> > > > You have 3 threads in the same thread-group that all have distinct file
> > > > descriptor tables from each other.
> > > >
> > > > So if T1 did:
> > > >
> > > > pidfd_getfd(PIDFD_SELF, ...)
> > > >
> > > > and we mirror the PIDTYPE_TGID behavior then T1 will very likely expect
> > > > to get the fd from its file descriptor table. IOW, its reasonable to
> > > > expect that T1 is interested in their very own resource, not someone
> > > > else's even if it is the thread-group leader.
> > > >
> > > > But what T1 will get in reality is an fd from TGID's file descriptor
> > > > table (and similar for T2).
> > > >
> > > > Iirc, yes that confusion exists already with /proc/self. But the
> > > > question is whether we should add the same confusion to the pidfd api or
> > > > whether we make PIDFD_SELF actually mean PIDTYPE_PID aka the actual
> > > > calling thread.
> > > >
> > > > My thinking is that if you have the reasonable suspicion that you're
> > > > multi-threaded and that you're interested in the thread-group resource
> > > > then you should be using:
> > > >
> > > > int pidfd = pidfd_open(getpid(), 0)
> > > >
> > > > and hand that thread-group leader pidfd around since you're interested
> > > > in another thread. But if you're really just interested in your own
> > > > resource then pidfd_open(getpid(), 0) makes no sense and you would want
> > > > PIDFD_SELF.
> > > >
> > > > Thoughts?
> > >
> > > I mean from my perspective, my aim is to get current->mm for
> > > process_madvise() so both work for me :) however you both raise a very good
> > > point here (sorry Florian, perhaps I was a little too dismissive as to your
> > > point, you're absolutely right).
> > >
> > > My intent was for PIDFD_SELF to simply mirror the pidfd_open(getpid(), 0)
> > > behaviour, but you and Florian make a strong case that you'd _probably_
> > > find this very confusing had you unshared in this fashion.
> > >
> > > I mean in general this confusion already exists, and is for what
> > > PIDFD_THREAD was created, but I suspect ideally if you could go back you
> > > might actually do this by default Christian + let the TGL behaviour be the
> > > optional thing?
> > >
> > > For most users this will not be an issue, but for those they'd get the same
> > > result whichever they used, but yes actually I think you're both right -
> > > PIDFD_SELF should in effect imply PIDFD_THREAD.
> >
> > Funnily enough we ran into issues with this when running Go code in runc
> > that did precisely this -- /proc/self gave you the wrong fd table in
> > very specific circumstances that were annoying to debug. For languages
> > with green-threading you can't turn off (like Go) these kinds of issues
> > pop up surprisingly often.
> 
> Yeah, damn, useful insight that such things do happen in the wild.
> 
> >
> > > We can adjust the pidfd_send_signal() call to infer the correct scope
> > > (actually nicely we can do that without any change there, by having
> > > __pidfd_get_pid() set f_flags accordingly).
> > >
> > > So TL;DR: I agree, I will respin with PIDFD_SELF referring to the thread.
> > >
> > > My question in return here then is - should we introduce PIDFD_SELF_PROCESS
> > > also (do advise if you feel this naming isn't quite right) - to provide
> > > thread group leader behaviour?
> >
> > Sorry to bike-shed, but to match /proc/self and /proc/thread-self, maybe
> > they should be called PIDFD_SELF (for tgid) and PIDFD_THREAD_SELF (for
> > current's tid)? In principle I guess users might use PIDFD_SELF by
> > accident but if we mirror the naming with /proc/{,thread-}self that
> > might not be that big of an issue?
> 
> Lol, you know I wasn't even aware /proc/thread-self existed...

Wait until you learn that /proc/$TID thread entries exist but aren't
shown when you do ls -al /proc, only when you explicitly access them.

> 
> Yeah, that actually makes sense and is consistent, though obviously the
> concern is people will reflexively use PIDFD_SELF and end up with
> potentially confusing results.
> 
> I will obviously be doing a manpage patch for this so we can spell it out
> there very clearly and also in the header - so I'd actually lean towards
> doing this myself.
> 
> Christian, Florian? Thoughts?

I think adding both would be potentially useful. How about:

#define PIDFD_SELF		-100
#define PIDFD_THREAD_GROUP	-200

This will make PIDFD_SELF mean current and PIDFD_THREAD_GROUP mean
current->pid_links[PIDTYPE_TGID]. I don't think we need to or should
mirror procfs in any way. pidfds are intended to be usable without
procfs at all.

I want to leave one comment on a bit of quirkiness in the api when we
add this. I don't consider it a big deal but it should be pointed out.

It can be useful to allow PIDFD_SELF or PIDFD_THREAD_GROUP to refer to
the calling thread even for pidfd_open() to avoid an additional getpid()
system call:

(1) pidfd_open(PIDFD_SELF, PIDFD_THREAD)
(2) pidfd_open(PIDFD_SELF, 0)

So if we allow this (Should we allow it?) then (1) is just redundant but
whathever. But (2) is at least worth discussing: Should we reject (2) on
the grounds of contradictory requests or allow it and document that it's
equivalent to pidfd_open(getpid(), PIDFD_THREAD)? It feels like the
latter would be ok.

Similar for pidfd_send_signal():

// redundant but ok:
(1) pidfd_send_signal(PIDFD_SELF,         PIDFD_SIGNAL_THREAD)

// redundant but ok:
(2) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD_GROUP)

// weird way to spell pidfd_send_signal(PIDFD_THREAD_GROUP, 0)
(3) pidfd_send_signal(PIDFD_SELF,         PIDFD_SIGNAL_THREAD_GROUP)

// weird way to spell pidfd_send_signal(PIDFD_SELF, 0)
(4) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD)

I think all of this is ok but does anyone else have a strong opinion?
Lorenzo Stoakes Oct. 1, 2024, 2:31 p.m. UTC | #8
On Tue, Oct 01, 2024 at 12:21:32PM GMT, Christian Brauner wrote:
> On Mon, Sep 30, 2024 at 03:32:25PM GMT, Lorenzo Stoakes wrote:
> > On Mon, Sep 30, 2024 at 04:21:23PM GMT, Aleksa Sarai wrote:

[snip]

> > > Sorry to bike-shed, but to match /proc/self and /proc/thread-self, maybe
> > > they should be called PIDFD_SELF (for tgid) and PIDFD_THREAD_SELF (for
> > > current's tid)? In principle I guess users might use PIDFD_SELF by
> > > accident but if we mirror the naming with /proc/{,thread-}self that
> > > might not be that big of an issue?
> >
> > Lol, you know I wasn't even aware /proc/thread-self existed...
>
> Wait until you learn that /proc/$TID thread entries exist but aren't
> shown when you do ls -al /proc, only when you explicitly access them.

My God... You're right, that's crazy... :)

>
> >
> > Yeah, that actually makes sense and is consistent, though obviously the
> > concern is people will reflexively use PIDFD_SELF and end up with
> > potentially confusing results.
> >
> > I will obviously be doing a manpage patch for this so we can spell it out
> > there very clearly and also in the header - so I'd actually lean towards
> > doing this myself.
> >
> > Christian, Florian? Thoughts?
>
> I think adding both would be potentially useful. How about:
>
> #define PIDFD_SELF		-100
> #define PIDFD_THREAD_GROUP	-200

Sure, makes sense to add both.

>
> This will make PIDFD_SELF mean current and PIDFD_THREAD_GROUP mean
> current->pid_links[PIDTYPE_TGID]. I don't think we need to or should
> mirror procfs in any way. pidfds are intended to be usable without
> procfs at all.

Yeah, I think it's important to ensure the _default_ choice, so in this
case PIDFD_SELF clearly, is one that will be least surprising.

The proc thing is sort of pleasing from an aesthetic point of view, but if
you followed it you'd have to _clearly_ document PIDFD_THREAD_SELF as the
default.

Happy to go along with this. PIDFD_THREAD_GROUP is also clearer as it is
distinct from PIDFD_SELF (doesn't reference 'self' at all).

>
> I want to leave one comment on a bit of quirkiness in the api when we
> add this. I don't consider it a big deal but it should be pointed out.
>
> It can be useful to allow PIDFD_SELF or PIDFD_THREAD_GROUP to refer to
> the calling thread even for pidfd_open() to avoid an additional getpid()
> system call:
>
> (1) pidfd_open(PIDFD_SELF, PIDFD_THREAD)
> (2) pidfd_open(PIDFD_SELF, 0)
>

Hm, this is a bit weird, as these are pid_t's and PIDFD_SELF and
PIDFD_THREAD_GROUP are otherwise (pid)fd's.

Being dummy values sort of allows us to put them into service here also,
but it is just weird, we pass what is usually a pidfd to receive a pidfd,
only this time it's an actually concrete one?

I'm not sure I like this, even though as you say it avoids a getpid().

If we did this I'd prefer it to be a separate name, even if it has the same
numeric value (I guess we also might want to check for anything that uses a
negative pid_t to refer to an error or something else too).

Perhaps PID_SELF and PID_THREAD_GROUP?

> So if we allow this (Should we allow it?) then (1) is just redundant but
> whathever. But (2) is at least worth discussing: Should we reject (2) on
> the grounds of contradictory requests or allow it and document that it's
> equivalent to pidfd_open(getpid(), PIDFD_THREAD)? It feels like the
> latter would be ok.
>
> Similar for pidfd_send_signal():
>
> // redundant but ok:
> (1) pidfd_send_signal(PIDFD_SELF,         PIDFD_SIGNAL_THREAD)
>
> // redundant but ok:
> (2) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD_GROUP)
>
> // weird way to spell pidfd_send_signal(PIDFD_THREAD_GROUP, 0)
> (3) pidfd_send_signal(PIDFD_SELF,         PIDFD_SIGNAL_THREAD_GROUP)
>
> // weird way to spell pidfd_send_signal(PIDFD_SELF, 0)
> (4) pidfd_send_signal(PIDFD_THREAD_GROUP, PIDFD_SIGNAL_THREAD)
>
> I think all of this is ok but does anyone else have a strong opinion?

I think it's fine to allow all 4 and we should get this behaviour by
default (if we have no flags we use the f_flags as a hint, which will be
set correctly).

I think people might find contradictory ones, i.e. 3 and 4, strange, but it
makes sense for the flags to override the pidfd (as they would for a
non-sentinel pidfd) and it makes everything consistent vs. if you were not
using a sentinel value.

So yes I think that's fine.