diff mbox series

pidfd: prevent creation of pidfds for kthreads

Message ID 20240731-gleis-mehreinnahmen-6bbadd128383@brauner (mailing list archive)
State New
Headers show
Series pidfd: prevent creation of pidfds for kthreads | expand

Commit Message

Christian Brauner July 31, 2024, 10:01 a.m. UTC
It's currently possible to create pidfds for kthreads but it is unclear
what that is supposed to mean. Until we have use-cases for it and we
figured out what behavior we want block the creation of pidfds for
kthreads.

Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 kernel/fork.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

Comments

Oleg Nesterov July 31, 2024, 2:51 p.m. UTC | #1
On 07/31, Christian Brauner wrote:
>
> It's currently possible to create pidfds for kthreads but it is unclear
> what that is supposed to mean. Until we have use-cases for it and we
> figured out what behavior we want block the creation of pidfds for
> kthreads.

Hmm... could you explain your concerns? Why do you think we should disallow
pidfd_open(pid-of-kthread) ?

> @@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
>  	if (clone_flags & CLONE_PIDFD) {
>  		int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
>  
> +		/* Don't create pidfds for kernel threads for now. */
> +		if (args->kthread) {
> +			retval = -EINVAL;
> +			goto bad_fork_free_pid;

Do we really need this check? Userspace can't use args->kthread != NULL,
the kernel users should not use CLONE_PIDFD.

Oleg.
Christian Brauner Aug. 1, 2024, 6:58 a.m. UTC | #2
On Wed, Jul 31, 2024 at 04:51:33PM GMT, Oleg Nesterov wrote:
> On 07/31, Christian Brauner wrote:
> >
> > It's currently possible to create pidfds for kthreads but it is unclear
> > what that is supposed to mean. Until we have use-cases for it and we
> > figured out what behavior we want block the creation of pidfds for
> > kthreads.
> 
> Hmm... could you explain your concerns? Why do you think we should disallow
> pidfd_open(pid-of-kthread) ?

It basically just works now and it's not intentional - at least not on
my part. You can't send signals to them, you may or may not get notified
via poll when a kthread exits. If we ever want this to be useful I would
like to enable it explicitly.

Plus, this causes confusion in userspace. When you have qemu running
with kvm support then kvm creates several kthreads (that inherit the
cgroup of the calling process). If you try to kill those instances via
systemctl kill or systemctl stop then pidfds for these kthreads are
opened but sending a signal to them is meaningless.

(So imho this causes more confusion then it is actually helpful. If we
add supports for kthreads I'd also like pidfs to gain a way to identify
them via statx() or fdinfo.)

> > @@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
> >  	if (clone_flags & CLONE_PIDFD) {
> >  		int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
> >  
> > +		/* Don't create pidfds for kernel threads for now. */
> > +		if (args->kthread) {
> > +			retval = -EINVAL;
> > +			goto bad_fork_free_pid;
> 
> Do we really need this check? Userspace can't use args->kthread != NULL,
> the kernel users should not use CLONE_PIDFD.

Yeah, I know. That's really just proactive so that user of e.g.,
copy_process() such as vhost or so on don't start handing out pidfds for
stuff without requring changes to the helper itself.
Oleg Nesterov Aug. 1, 2024, 8:01 a.m. UTC | #3
OK, I won't argue, but ....

On 08/01, Christian Brauner wrote:
>
> On Wed, Jul 31, 2024 at 04:51:33PM GMT, Oleg Nesterov wrote:
> > On 07/31, Christian Brauner wrote:
> > >
> > > It's currently possible to create pidfds for kthreads but it is unclear
> > > what that is supposed to mean. Until we have use-cases for it and we
> > > figured out what behavior we want block the creation of pidfds for
> > > kthreads.
> >
> > Hmm... could you explain your concerns? Why do you think we should disallow
> > pidfd_open(pid-of-kthread) ?
>
> It basically just works now and it's not intentional - at least not on
> my part. You can't send signals to them,

Yes, you can't send signals to kthread. So what?

You can't send signals to the normal processes if check_kill_permission()
fails. And even if you are root, you can't send an unhandled signal via
pidfd = pidfd_open(1).

> you may or may not get notified
> via poll when a kthread exits.

Why? the exiting kthread should not differ in this respect?

> (So imho this causes more confusion then it is actually helpful. If we
> add supports for kthreads I'd also like pidfs to gain a way to identify
> them via statx() or fdinfo.)

/proc/$pid/status has a "Kthread" field...

> > > @@ -2403,6 +2416,12 @@ __latent_entropy struct task_struct *copy_process(
> > >  	if (clone_flags & CLONE_PIDFD) {
> > >  		int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
> > >
> > > +		/* Don't create pidfds for kernel threads for now. */
> > > +		if (args->kthread) {
> > > +			retval = -EINVAL;
> > > +			goto bad_fork_free_pid;
> >
> > Do we really need this check? Userspace can't use args->kthread != NULL,
> > the kernel users should not use CLONE_PIDFD.
>
> Yeah, I know. That's really just proactive so that user of e.g.,
> copy_process() such as vhost or so on don't start handing out pidfds for
> stuff without requring changes to the helper itself.

Then I'd suggest WARN_ON_ONCE(args->kthread).

But as I said I won't argue. I see nothing wrong in this patch.

Oleg.
Christian Brauner Aug. 1, 2024, 1:48 p.m. UTC | #4
On Thu, Aug 01, 2024 at 10:01:20AM GMT, Oleg Nesterov wrote:
> OK, I won't argue, but ....
> 
> On 08/01, Christian Brauner wrote:
> >
> > On Wed, Jul 31, 2024 at 04:51:33PM GMT, Oleg Nesterov wrote:
> > > On 07/31, Christian Brauner wrote:
> > > >
> > > > It's currently possible to create pidfds for kthreads but it is unclear
> > > > what that is supposed to mean. Until we have use-cases for it and we
> > > > figured out what behavior we want block the creation of pidfds for
> > > > kthreads.
> > >
> > > Hmm... could you explain your concerns? Why do you think we should disallow
> > > pidfd_open(pid-of-kthread) ?
> >
> > It basically just works now and it's not intentional - at least not on
> > my part. You can't send signals to them,
> 
> Yes, you can't send signals to kthread. So what?
> 
> You can't send signals to the normal processes if check_kill_permission()
> fails. And even if you are root, you can't send an unhandled signal via
> pidfd = pidfd_open(1).
> 
> > you may or may not get notified
> > via poll when a kthread exits.
> 
> Why? the exiting kthread should not differ in this respect?

Why do you want to allow it? I see zero reason to get a reference to a
kthread if there's no use-case for it. kthreads are mostly a kernel
thing so why give userspace handles to it. And as I said before, there's
userspace out there that's already confused why they can get references
to them in the first place.

> 
> > (So imho this causes more confusion then it is actually helpful. If we
> > add supports for kthreads I'd also like pidfs to gain a way to identify
> > them via statx() or fdinfo.)
> 
> /proc/$pid/status has a "Kthread" field...

Going forward, I don't want to force people to parse basic stuff out of
procfs. Ideally, they'll be able to mostly rely on pidfd operations
only.
Oleg Nesterov Aug. 1, 2024, 1:59 p.m. UTC | #5
On 08/01, Christian Brauner wrote:
>
> On Thu, Aug 01, 2024 at 10:01:20AM GMT, Oleg Nesterov wrote:
> > OK, I won't argue, but ....
> >
> > > you may or may not get notified
> > > via poll when a kthread exits.
> >
> > Why? the exiting kthread should not differ in this respect?
>
> Why do you want to allow it?

Again, I didn't try to "nack" your patch. Just tried to understand
your motivation.

And "may not get notified" above doesn't look right to me...

> > /proc/$pid/status has a "Kthread" field...
>
> Going forward, I don't want to force people to parse basic stuff out of
> procfs. Ideally, they'll be able to mostly rely on pidfd operations
> only.

Agreed.

Oleg.
Eric Biggers Aug. 18, 2024, 3:58 a.m. UTC | #6
Hi Christian,

On Wed, Jul 31, 2024 at 12:01:12PM +0200, Christian Brauner wrote:
> It's currently possible to create pidfds for kthreads but it is unclear
> what that is supposed to mean. Until we have use-cases for it and we
> figured out what behavior we want block the creation of pidfds for
> kthreads.
> 
> Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
>  kernel/fork.c | 25 ++++++++++++++++++++++---
>  1 file changed, 22 insertions(+), 3 deletions(-)

Unfortunately this commit broke systemd-shutdown's ability to kill processes,
which makes some filesystems no longer get unmounted at shutdown.

It looks like systemd-shutdown relies on being able to create a pidfd for any
process listed in /proc (even a kthread), and if it gets EINVAL it treats it a
fatal error and stops looking for more processes...

This is what shows up in the system log:

    systemd[1]: Shutting down.
    systemd-shutdown[1]: Syncing filesystems and block devices.
    systemd-shutdown[1]: Sending SIGTERM to remaining processes...
    systemd-shutdown[1]: Failed to enumerate /proc/: Invalid argument
    systemd-shutdown[1]: Sending SIGKILL to remaining processes...
    systemd-shutdown[1]: Failed to enumerate /proc/: Invalid argument
    systemd-shutdown[1]: Unmounting file systems.
    (sd-umount)[17359]: Unmounting '/run/credentials/systemd-vconsole-setup.service'.
    (sd-umount)[17360]: Unmounting '/run/credentials/systemd-journald.service'.
    (sd-remount)[17361]: Remounting '/' read-only with options ''.
    (sd-remount)[17361]: Failed to remount '/' read-only: Device or resource busy
    (sd-remount)[17362]: Remounting '/' read-only with options ''.
    (sd-remount)[17362]: Failed to remount '/' read-only: Device or resource busy
    systemd-shutdown[1]: Not all file systems unmounted, 1 left.

- Eric
Christian Brauner Aug. 19, 2024, 8:41 a.m. UTC | #7
On Sat, Aug 17, 2024 at 08:58:18PM GMT, Eric Biggers wrote:
> Hi Christian,
> 
> On Wed, Jul 31, 2024 at 12:01:12PM +0200, Christian Brauner wrote:
> > It's currently possible to create pidfds for kthreads but it is unclear
> > what that is supposed to mean. Until we have use-cases for it and we
> > figured out what behavior we want block the creation of pidfds for
> > kthreads.
> > 
> > Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > ---
> >  kernel/fork.c | 25 ++++++++++++++++++++++---
> >  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> Unfortunately this commit broke systemd-shutdown's ability to kill processes,
> which makes some filesystems no longer get unmounted at shutdown.
> 
> It looks like systemd-shutdown relies on being able to create a pidfd for any
> process listed in /proc (even a kthread), and if it gets EINVAL it treats it a
> fatal error and stops looking for more processes...

Thanks for the report!
I talked to Daan De Meyer who made that change and he said that this
must a systemd version that hasn't gotten his fixes yet. In any case, if
this causes regression then I'll revert it right now. See the appended
revert.
Eric Biggers Aug. 20, 2024, 7:34 p.m. UTC | #8
On Mon, Aug 19, 2024 at 10:41:15AM +0200, Christian Brauner wrote:
> On Sat, Aug 17, 2024 at 08:58:18PM GMT, Eric Biggers wrote:
> > Hi Christian,
> > 
> > On Wed, Jul 31, 2024 at 12:01:12PM +0200, Christian Brauner wrote:
> > > It's currently possible to create pidfds for kthreads but it is unclear
> > > what that is supposed to mean. Until we have use-cases for it and we
> > > figured out what behavior we want block the creation of pidfds for
> > > kthreads.
> > > 
> > > Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > ---
> > >  kernel/fork.c | 25 ++++++++++++++++++++++---
> > >  1 file changed, 22 insertions(+), 3 deletions(-)
> > 
> > Unfortunately this commit broke systemd-shutdown's ability to kill processes,
> > which makes some filesystems no longer get unmounted at shutdown.
> > 
> > It looks like systemd-shutdown relies on being able to create a pidfd for any
> > process listed in /proc (even a kthread), and if it gets EINVAL it treats it a
> > fatal error and stops looking for more processes...
> 
> Thanks for the report!
> I talked to Daan De Meyer who made that change and he said that this
> must a systemd version that hasn't gotten his fixes yet. In any case, if
> this causes regression then I'll revert it right now. See the appended
> revert.

Thanks for queueing up a revert.

This was on systemd 256.4 which was released less than a month ago.

I'm not sure what systemd fix you are talking about.  Looking at killall() in
src/shared/killall.c on the latest "main" branch of systemd, it calls
proc_dir_read_pidref() => pidref_set_pid() => pidfd_open(), and EINVAL gets
passed back up to killall() and treated as a fatal error.  ignore_proc() skips
kernel threads but is executed too late.  I didn't test it, so I could be wrong,
but based on the code it does not appear to be fixed.

- Erici
Christian Brauner Aug. 21, 2024, 7:41 a.m. UTC | #9
On Tue, Aug 20, 2024 at 12:34:14PM GMT, Eric Biggers wrote:
> On Mon, Aug 19, 2024 at 10:41:15AM +0200, Christian Brauner wrote:
> > On Sat, Aug 17, 2024 at 08:58:18PM GMT, Eric Biggers wrote:
> > > Hi Christian,
> > > 
> > > On Wed, Jul 31, 2024 at 12:01:12PM +0200, Christian Brauner wrote:
> > > > It's currently possible to create pidfds for kthreads but it is unclear
> > > > what that is supposed to mean. Until we have use-cases for it and we
> > > > figured out what behavior we want block the creation of pidfds for
> > > > kthreads.
> > > > 
> > > > Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > > ---
> > > >  kernel/fork.c | 25 ++++++++++++++++++++++---
> > > >  1 file changed, 22 insertions(+), 3 deletions(-)
> > > 
> > > Unfortunately this commit broke systemd-shutdown's ability to kill processes,
> > > which makes some filesystems no longer get unmounted at shutdown.
> > > 
> > > It looks like systemd-shutdown relies on being able to create a pidfd for any
> > > process listed in /proc (even a kthread), and if it gets EINVAL it treats it a
> > > fatal error and stops looking for more processes...
> > 
> > Thanks for the report!
> > I talked to Daan De Meyer who made that change and he said that this
> > must a systemd version that hasn't gotten his fixes yet. In any case, if
> > this causes regression then I'll revert it right now. See the appended
> > revert.
> 
> Thanks for queueing up a revert.
> 
> This was on systemd 256.4 which was released less than a month ago.
> 
> I'm not sure what systemd fix you are talking about.  Looking at killall() in
> src/shared/killall.c on the latest "main" branch of systemd, it calls
> proc_dir_read_pidref() => pidref_set_pid() => pidfd_open(), and EINVAL gets
> passed back up to killall() and treated as a fatal error.  ignore_proc() skips
> kernel threads but is executed too late.  I didn't test it, so I could be wrong,
> but based on the code it does not appear to be fixed.

Yeah, I think you're right. What they fixed is
ead48ec35c86 ("cgroup-util: Don't try to open pidfd for kernel threads")
when reading pids from cgroup.procs. Daan is currently prepping a fix
for reading pids from /proc as well.
Daan De Meyer Aug. 21, 2024, 7:47 a.m. UTC | #10
Hi,

I just prepped https://github.com/systemd/systemd/pull/34058 which
will apply the same fix from ead48ec35c86 ("cgroup-util: Don't try to
open pidfd for kernel threads") for pids read from /proc as well.

Cheers,

Daan

On Wed, 21 Aug 2024 at 09:41, Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Aug 20, 2024 at 12:34:14PM GMT, Eric Biggers wrote:
> > On Mon, Aug 19, 2024 at 10:41:15AM +0200, Christian Brauner wrote:
> > > On Sat, Aug 17, 2024 at 08:58:18PM GMT, Eric Biggers wrote:
> > > > Hi Christian,
> > > >
> > > > On Wed, Jul 31, 2024 at 12:01:12PM +0200, Christian Brauner wrote:
> > > > > It's currently possible to create pidfds for kthreads but it is unclear
> > > > > what that is supposed to mean. Until we have use-cases for it and we
> > > > > figured out what behavior we want block the creation of pidfds for
> > > > > kthreads.
> > > > >
> > > > > Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
> > > > > Cc: stable@vger.kernel.org
> > > > > Signed-off-by: Christian Brauner <brauner@kernel.org>
> > > > > ---
> > > > >  kernel/fork.c | 25 ++++++++++++++++++++++---
> > > > >  1 file changed, 22 insertions(+), 3 deletions(-)
> > > >
> > > > Unfortunately this commit broke systemd-shutdown's ability to kill processes,
> > > > which makes some filesystems no longer get unmounted at shutdown.
> > > >
> > > > It looks like systemd-shutdown relies on being able to create a pidfd for any
> > > > process listed in /proc (even a kthread), and if it gets EINVAL it treats it a
> > > > fatal error and stops looking for more processes...
> > >
> > > Thanks for the report!
> > > I talked to Daan De Meyer who made that change and he said that this
> > > must a systemd version that hasn't gotten his fixes yet. In any case, if
> > > this causes regression then I'll revert it right now. See the appended
> > > revert.
> >
> > Thanks for queueing up a revert.
> >
> > This was on systemd 256.4 which was released less than a month ago.
> >
> > I'm not sure what systemd fix you are talking about.  Looking at killall() in
> > src/shared/killall.c on the latest "main" branch of systemd, it calls
> > proc_dir_read_pidref() => pidref_set_pid() => pidfd_open(), and EINVAL gets
> > passed back up to killall() and treated as a fatal error.  ignore_proc() skips
> > kernel threads but is executed too late.  I didn't test it, so I could be wrong,
> > but based on the code it does not appear to be fixed.
>
> Yeah, I think you're right. What they fixed is
> ead48ec35c86 ("cgroup-util: Don't try to open pidfd for kernel threads")
> when reading pids from cgroup.procs. Daan is currently prepping a fix
> for reading pids from /proc as well.
Thorsten Leemhuis Aug. 23, 2024, 5:23 a.m. UTC | #11
On 19.08.24 10:41, Christian Brauner wrote:
> On Sat, Aug 17, 2024 at 08:58:18PM GMT, Eric Biggers wrote:
>> On Wed, Jul 31, 2024 at 12:01:12PM +0200, Christian Brauner wrote:
>>> It's currently possible to create pidfds for kthreads but it is unclear
>>> what that is supposed to mean. Until we have use-cases for it and we
>>> figured out what behavior we want block the creation of pidfds for
>>> kthreads.
>>>
>>> Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Christian Brauner <brauner@kernel.org>
>>> ---
>>>  kernel/fork.c | 25 ++++++++++++++++++++++---
>>>  1 file changed, 22 insertions(+), 3 deletions(-)
>>
>> Unfortunately this commit broke systemd-shutdown's ability to kill processes,
>> which makes some filesystems no longer get unmounted at shutdown.
>>
>> It looks like systemd-shutdown relies on being able to create a pidfd for any
>> process listed in /proc (even a kthread), and if it gets EINVAL it treats it a
>> fatal error and stops looking for more processes...
> 
> Thanks for the report!
> I talked to Daan De Meyer who made that change and he said that this
> must a systemd version that hasn't gotten his fixes yet. In any case, if
> this causes regression then I'll revert it right now. See the appended
> revert.

Greg, Sasha, JFYI in case you are not already aware of it: I by
chance[1] noticed that the patch Christian plans to revert is still in
the 6.10-queue. You might want to drop it (or apply the revert as well,
which is in -next, but not yet in mainline afaics).

Ciao, Thorsten

[1] after I noticed this thread a third time, this time via some SELinux
problems it caused in Fedora land:
https://bugzilla.redhat.com/show_bug.cgi?id=2306818
https://bugzilla.redhat.com/show_bug.cgi?id=2305270
Greg KH Aug. 23, 2024, 6:12 a.m. UTC | #12
On Fri, Aug 23, 2024 at 07:23:12AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 19.08.24 10:41, Christian Brauner wrote:
> > On Sat, Aug 17, 2024 at 08:58:18PM GMT, Eric Biggers wrote:
> >> On Wed, Jul 31, 2024 at 12:01:12PM +0200, Christian Brauner wrote:
> >>> It's currently possible to create pidfds for kthreads but it is unclear
> >>> what that is supposed to mean. Until we have use-cases for it and we
> >>> figured out what behavior we want block the creation of pidfds for
> >>> kthreads.
> >>>
> >>> Fixes: 32fcb426ec00 ("pid: add pidfd_open()")
> >>> Cc: stable@vger.kernel.org
> >>> Signed-off-by: Christian Brauner <brauner@kernel.org>
> >>> ---
> >>>  kernel/fork.c | 25 ++++++++++++++++++++++---
> >>>  1 file changed, 22 insertions(+), 3 deletions(-)
> >>
> >> Unfortunately this commit broke systemd-shutdown's ability to kill processes,
> >> which makes some filesystems no longer get unmounted at shutdown.
> >>
> >> It looks like systemd-shutdown relies on being able to create a pidfd for any
> >> process listed in /proc (even a kthread), and if it gets EINVAL it treats it a
> >> fatal error and stops looking for more processes...
> > 
> > Thanks for the report!
> > I talked to Daan De Meyer who made that change and he said that this
> > must a systemd version that hasn't gotten his fixes yet. In any case, if
> > this causes regression then I'll revert it right now. See the appended
> > revert.
> 
> Greg, Sasha, JFYI in case you are not already aware of it: I by
> chance[1] noticed that the patch Christian plans to revert is still in
> the 6.10-queue. You might want to drop it (or apply the revert as well,
> which is in -next, but not yet in mainline afaics).

I was hoping it would get into Linus's tree "soon" so I could take the
revert too.  As it's in -next, I'll grab it from there when I get a
chance.

thanks,

greg k-h
diff mbox series

Patch

diff --git a/kernel/fork.c b/kernel/fork.c
index cc760491f201..18bdc87209d0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2053,11 +2053,24 @@  static int __pidfd_prepare(struct pid *pid, unsigned int flags, struct file **re
  */
 int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret)
 {
-	bool thread = flags & PIDFD_THREAD;
-
-	if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID))
+	if (!pid)
 		return -EINVAL;
 
+	scoped_guard(rcu) {
+		struct task_struct *tsk;
+
+		if (flags & PIDFD_THREAD)
+			tsk = pid_task(pid, PIDTYPE_PID);
+		else
+			tsk = pid_task(pid, PIDTYPE_TGID);
+		if (!tsk)
+			return -EINVAL;
+
+		/* Don't create pidfds for kernel threads for now. */
+		if (tsk->flags & PF_KTHREAD)
+			return -EINVAL;
+	}
+
 	return __pidfd_prepare(pid, flags, ret);
 }
 
@@ -2403,6 +2416,12 @@  __latent_entropy struct task_struct *copy_process(
 	if (clone_flags & CLONE_PIDFD) {
 		int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
 
+		/* Don't create pidfds for kernel threads for now. */
+		if (args->kthread) {
+			retval = -EINVAL;
+			goto bad_fork_free_pid;
+		}
+
 		/* Note that no task has been attached to @pid yet. */
 		retval = __pidfd_prepare(pid, flags, &pidfile);
 		if (retval < 0)