From patchwork Wed Jan  2 16:16:53 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Brauner <christian@brauner.io>
X-Patchwork-Id: 10746535
Return-Path: <linux-fsdevel-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
 [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DAA3A14DE
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
 Wed,  2 Jan 2019 16:17:50 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C51862896A
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
 Wed,  2 Jan 2019 16:17:50 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id B8F3C28AA5; Wed,  2 Jan 2019 16:17:50 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham
	version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F0FFD2896A
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
 Wed,  2 Jan 2019 16:17:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1730388AbfABQRj (ORCPT
        <rfc822;patchwork-linux-fsdevel@patchwork.kernel.org>);
        Wed, 2 Jan 2019 11:17:39 -0500
Received: from mail-ed1-f68.google.com ([209.85.208.68]:41200 "EHLO
        mail-ed1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1729071AbfABQRj (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Wed, 2 Jan 2019 11:17:39 -0500
Received: by mail-ed1-f68.google.com with SMTP id a20so19837328edc.8
        for <linux-fsdevel@vger.kernel.org>;
 Wed, 02 Jan 2019 08:17:36 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=brauner.io; s=google;
        h=from:to:cc:subject:date:message-id:mime-version
         :content-transfer-encoding;
        bh=7Aq+4n6o2/CD6gMHh6uU13NVsAiGiMjKk7gpjMmUMoQ=;
        b=Z2Q8Dfrx9Zp08srRnORq0Je8jgdcRwEUzpwtcIdGhRleIVT4DoDh/9QdcciGHLDNim
         2NkEOflkPJgXZuOcf2FPPDjpTwTpvt3ksmvQUNDZw5WSzB0gUe9+eT+JNdAw/Kw85dxm
         kE0Z07GoK1aNDlxh48LerMPgQ+Dh1GQVIcsMpU6XO0fWOnx9BxCThXnzw4rHIavUKlD3
         PcaHP8wX0aBZfdatrnMEZvH+OaR/EL/rI5nfEnAp3akTJTlVDXpvdfdzoMFqGRMqkPwN
         qZV+3POd73YyVHCl5+B3h+crjtHyAfjswLR8cx/WAycf5oQTWp2IuBgePVcjD5mSH21i
         l8MQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version
         :content-transfer-encoding;
        bh=7Aq+4n6o2/CD6gMHh6uU13NVsAiGiMjKk7gpjMmUMoQ=;
        b=VFTiGbHu6ZHrKEpV3snQ+H3iJjMMjj3fePCONPhEfZjm4ceKg5lhA81Xagf9QsGBDd
         sTyC/k1CeONGE6p/B3f2uBVF8SkGMYQuY2o4oH0kEGsj+cGXRjz6adGeZp9ngI3+atQO
         qVqSVeO/x06ahdMrTdUT8hnblz6f8cFI8GbNixdXkJ6bctomrpnLQGUT7zyKflUncOsz
         74ttrj87ilZDmtO4jBYca1SEHhA7EDDpOHpTNbdJVEpziZsuIk7y3oBc9m7oBKMt1naX
         kxFP7K/L63sg9saC0Lo9poTF525L3fgt/k/qOUMjm+NGxKpmCR8xw1BA0rR1ixE3Vs9L
         lYhA==
X-Gm-Message-State: AA+aEWYC9T0Xhqp72mngTiMvHgFmACOgi69mLwpY0udarvO3i0hNaGHe
        aLzDMXi3O8kwkI8FpVdUElb23Q==
X-Google-Smtp-Source: 
 AFSGD/UnF2EiShEQy+ZWv05w3BUHOwWtBTnrp988jN0/jXVQ6nrzMZd00hSlzVv4qMEylSToGb5Yng==
X-Received: by 2002:a17:906:590a:: with SMTP id
 h10-v6mr33839641ejq.102.1546445855449;
        Wed, 02 Jan 2019 08:17:35 -0800 (PST)
Received: from localhost.localdomain
 ([2a02:8109:b6c0:d6c:700b:b2fa:de8a:175d])
        by smtp.gmail.com with ESMTPSA id
 k20sm22513972eda.71.2019.01.02.08.17.34
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Wed, 02 Jan 2019 08:17:34 -0800 (PST)
From: Christian Brauner <christian@brauner.io>
To: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
        luto@kernel.org, arnd@arndb.de, serge@hallyn.com,
        keescook@chromium.org, akpm@linux-foundation.org
Cc: jannh@google.com, oleg@redhat.com, cyphar@cyphar.com,
        viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
        dancol@google.com, timmurray@google.com, fweimer@redhat.com,
        tglx@linutronix.de, x86@kernel.org, ebiederm@xmission.com,
        Christian Brauner <christian@brauner.io>
Subject: [PATCH v7 1/2] signal: add pidfd_send_signal() syscall
Date: Wed,  2 Jan 2019 17:16:53 +0100
Message-Id: <20190102161654.9093-1-christian@brauner.io>
X-Mailer: git-send-email 2.19.1
MIME-Version: 1.0
X-Patchwork-Bot: notify
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

The kill() syscall operates on process identifiers (pid). After a process
has exited its pid can be reused by another process. If a caller sends a
signal to a reused pid it will end up signaling the wrong process. This
issue has often surfaced and there has been a push to address this problem [1].

This patch uses file descriptors (fd) from proc/<pid> as stable handles on
struct pid. Even if a pid is recycled the handle will not change. The fd
can be used to send signals to the process it refers to.
Thus, the new syscall pidfd_send_signal() is introduced to solve this
problem. Instead of pids it operates on process fds (pidfd).

/* prototype and argument /*
long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags);

In addition to the pidfd and signal argument it takes an additional
siginfo_t and flags argument. If the siginfo_t argument is NULL then
pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it
is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo().
The flags argument is added to allow for future extensions of this syscall.
It currently needs to be passed as 0. Failing to do so will cause EINVAL.

/* pidfd_send_signal() replaces multiple pid-based syscalls */
The pidfd_send_signal() syscall currently takes on the job of
rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a
positive pid is passed to kill(2). It will however be possible to also
replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended.

/* sending signals to threads (tid) and process groups (pgid) */
Specifically, the pidfd_send_signal() syscall does currently not operate on
process groups or threads. This is left for future extensions.
In order to extend the syscall to allow sending signal to threads and
process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and
PIDFD_TYPE_TID) should be added. This implies that the flags argument will
determine what is signaled and not the file descriptor itself. Put in other
words, grouping in this api is a property of the flags argument not a
property of the file descriptor (cf. [13]). Clarification for this has been
requested by Eric (cf. [19]).
When appropriate extensions through the flags argument are added then
pidfd_send_signal() can additionally replace the part of kill(2) which
operates on process groups as well as the tgkill(2) and
rt_tgsigqueueinfo(2) syscalls.
How such an extension could be implemented has been very roughly sketched
in [14], [15], and [16]. However, this should not be taken as a commitment
to a particular implementation. There might be better ways to do it.
Right now this is intentionally left out to keep this patchset as simple as
possible (cf. [4]). For example, if a pidfd for a tid from
/proc/<pid>/task/<tid> is passed EOPNOTSUPP will be returned to give
userspace a way to detect when I add support for signaling to threads (cf. [10]).

/* naming */
The syscall had various names throughout iterations of this patchset:
- procfd_signal()
- procfd_send_signal()
- taskfd_send_signal()
In the last round of reviews it was pointed out that given that if the
flags argument decides the scope of the signal instead of different types
of fds it might make sense to either settle for "procfd_" or "pidfd_" as
prefix. The community was willing to accept either (cf. [17] and [18]).
Given that one developer expressed strong preference for the "pidfd_"
prefix (cf. [13] and with other developers less opinionated about the name
we should settle for "pidfd_" to avoid further bikeshedding.

The  "_send_signal" suffix was chosen to reflect the fact that the syscall
takes on the job of multiple syscalls. It is therefore intentional that the
name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the
fomer because it might imply that pidfd_send_signal() is a replacement for
kill(2), and not the latter because it is a hassle to remember the correct
spelling - especially for non-native speakers - and because it is not
descriptive enough of what the syscall actually does. The name
"pidfd_send_signal" makes it very clear that its job is to send signals.

/* zombies */
Zombies can be signaled just as any other process. No special error will be
reported since a zombie state is an unreliable state (cf. [3]). However,
this can be added as an extension through the @flags argument if the need
ever arises.

/* cross-namespace signals */
The patch currently enforces that the signaler and signalee either are in
the same pid namespace or that the signaler's pid namespace is an ancestor
of the signalee's pid namespace. This is done for the sake of simplicity
and because it is unclear to what values certain members of struct
siginfo_t would need to be set to (cf. [5], [6]).

/* compat syscalls */
It became clear that we would like to avoid adding compat syscalls
(cf. [7]).  The compat syscall handling is now done in kernel/signal.c
itself by adding __copy_siginfo_from_user_generic() which lets us avoid
compat syscalls (cf. [8]). It should be noted that the addition of
__copy_siginfo_from_user_any() is caused by a bug in the original
implementation of rt_sigqueueinfo(2) (cf. 12).
With upcoming rework for syscall handling things might improve
significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain
any additional callers.

/* testing */
This patch was tested on x64 and x86.

/* userspace usage */
An asciinema recording for the basic functionality can be found under [9].
With this patch a process can be killed via:

 #define _GNU_SOURCE
 #include <errno.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <unistd.h>

 static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                         unsigned int flags)
 {
 #ifdef __NR_pidfd_send_signal
         return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
 #else
         return -ENOSYS;
 #endif
 }

 int main(int argc, char *argv[])
 {
         int fd, ret, saved_errno, sig;

         if (argc < 3)
                 exit(EXIT_FAILURE);

         fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
         if (fd < 0) {
                 printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]);
                 exit(EXIT_FAILURE);
         }

         sig = atoi(argv[2]);

         printf("Sending signal %d to process %s\n", sig, argv[1]);
         ret = do_pidfd_send_signal(fd, sig, NULL, 0);

         saved_errno = errno;
         close(fd);
         errno = saved_errno;

         if (ret < 0) {
                 printf("%s - Failed to send signal %d to process %s\n",
                        strerror(errno), sig, argv[1]);
                 exit(EXIT_FAILURE);
         }

         exit(EXIT_SUCCESS);
 }

/* Q&A
 * Given that it seems the same questions get asked again by people who are
 * late to the party it makes sense to add a Q&A section to the commit
 * message so it's hopefully easier to avoid duplicate threads.
 *
 * For the sake of progress please consider these arguments settled unless
 * there is a new point that desperately needs to be addressed. Please make
 * sure to check the links to the threads in this commit message whether
 * this has not already been covered.
 */
Q-01: (Florian Weimer [20], Andrew Morton [21])
      What happens when the target process has exited?
A-01: Sending the signal will fail with ESRCH (cf. [22]).

Q-02:  (Andrew Morton [21])
       Is the task_struct pinned by the fd?
A-02:  No. A reference to struct pid is kept. struct pid - as far as I
       understand - was created exactly for the reason to not require to
       pin struct task_struct (cf. [22]).

Q-03: (Andrew Morton [21])
      Does the entire procfs directory remain visible? Just one entry
      within it?
A-03: The same thing that happens right now when you hold a file descriptor
      to /proc/<pid> open (cf. [22]).

Q-04: (Andrew Morton [21])
      Does the pid remain reserved?
A-04: No. This patchset guarantees a stable handle not that pids are not
      recycled (cf. [22]).

Q-05: (Andrew Morton [21])
      Do attempts to signal that fd return errors?
A-05: See {Q,A}-01.

Q-06: (Andrew Morton [22])
      Is there a cleaner way of obtaining the fd? Another syscall perhaps.
A-06: Userspace can already trivially retrieve file descriptors from procfs
      so this is something that we will need to support anyway. Hence,
      there's no immediate need to add another syscalls just to make
      pidfd_send_signal() not dependent on the presence of procfs. However,
      adding a syscalls to get such file descriptors is planned for a
      future patchset (cf. [22]).

Q-07: (Andrew Morton [21] and others)
      This fd-for-a-process sounds like a handy thing and people may well
      think up other uses for it in the future, probably unrelated to
      signals. Are the code and the interface designed to permit such
      future applications?
A-07: Yes (cf. [22]).

Q-08: (Andrew Morton [21] and others)
      Now I think about it, why a new syscall? This thing is looking
      rather like an ioctl?
A-08: This has been extensively discussed. It was agreed that a syscall is
      preferred for a variety or reasons. Here are just a few taken from
      prior threads. Syscalls are safer than ioctl()s especially when
      signaling to fds. Processes are a core kernel concept so a syscall
      seems more appropriate. The layout of the syscall with its four
      arguments would require the addition of a custom struct for the
      ioctl() thereby causing at least the same amount or even more
      complexity for userspace than a simple syscall. The new syscall will
      replace multiple other pid-based syscalls (see description above).
      The file-descriptors-for-processes concept introduced with this
      syscall will be extended with other syscalls in the future. See also
      [22], [23] and various other threads already linked in here.

Q-09: (Florian Weimer [24])
      What happens if you use the new interface with an O_PATH descriptor?
A-09:
      pidfds opened as O_PATH fds cannot be used to send signals to a
      process (cf. [2]). Signaling processes through pidfds is the
      equivalent of writing to a file. Thus, this is not an operation that
      operates "purely at the file descriptor level" as required by the
      open(2) manpage. See also [4].

/* References */
[1]:  https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/
[2]:  https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/
[3]:  https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/
[4]:  https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
[5]:  https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/
[6]:  https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/
[7]:  https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/
[8]:  https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/
[9]:  https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy
[10]: https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
[11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/
[12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/
[13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/
[14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/
[15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/
[16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/
[17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/
[18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/
[19]: https://lore.kernel.org/lkml/20181208054059.19813-1-christian@brauner.io/
[20]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/
[21]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
[22]: https://lore.kernel.org/lkml/20181228233725.722tdfgijxcssg76@brauner.io/
[23]: https://lwn.net/Articles/773459/
[24]: https://lore.kernel.org/lkml/8736rebl9s.fsf@oldenburg.str.redhat.com/

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
---
/* Changelog */
v7:
- patch unchanged
v6:
- Given that it seems the same questions get asked multiple times it made
  sense to add a Q&A section to the commit message so it's hopefully easier
  to avoid duplicate threads.
- Since we settled on adding flags when extending the syscalls to allow
  signaling to threads and process groups it doesn't make sense anymore to
  report EOPNOTSUPP when a file descriptor to /proc/<pid>/task/<tid> is
  passed. This means we can also remove the tgid_pidfd_to_pid() helper from
  proc_fs.h and simplify the code. We will now always return EBADF when a
  file descriptor is passed that does not refer to /proc/<pid>.
- add CONFIG_PROC_FS ifdefs for pidfd_send_signal() and add
  COND_SYSCALL(pidfd_send_signal) definition as suggested by Andrew Morgan
  in [changelog-1].
v5:
- s/may_signal_taskfd/access_taskfd_pidns/g
- make it clear that process grouping is a property of the @flags argument
  Eric has argued that he would like to know when we add thread and process
  group signal support whether grouping will be a property of the file
  descriptor or the flag argument and he would oppose this until a
  commitment has been made. It seems that the cleanest strategy is to make
  grouping a property of the @flags argument.
  He also argued that in this case the prefix of the syscall should be
  "pidfd_" (cf. https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/).
- use "pidfd_" as prefix for the syscall since grouping will be a property
  of the @flags argument
- substantial rewrite of the commit message to reflect the discussion
v4:
- updated asciinema to use "taskfd_" prefix
- s/procfd_send_signal/taskfd_send_signal/g
- s/proc_is_tgid_procfd/tgid_taskfd_to_pid/b
- s/proc_is_tid_procfd/tid_taskfd_to_pid/b
- s/__copy_siginfo_from_user_generic/__copy_siginfo_from_user_any/g
- make it clear that __copy_siginfo_from_user_any() is a workaround caused
  by a bug in the original implementation of rt_sigqueueinfo()
- when spoofing signals turn them into regular kill signals if si_code is
  set to SI_USER
- make proc_is_t{g}id_procfd() return struct pid to allow proc_pid() to
  stay private to fs/proc/
v3:
- add __copy_siginfo_from_user_generic() to avoid adding compat syscalls
- s/procfd_signal/procfd_send_signal/g
- change type of flags argument from int to unsigned int
- add comment about what happens to zombies
- add proc_is_tid_procfd()
- return EOPNOTSUPP when /proc/<pid>/task/<tid> fd is passed so userspace
  has a way of knowing that tidfds are not supported currently.
v2:
- define __NR_procfd_signal in unistd.h
- wire up compat syscall
- s/proc_is_procfd/proc_is_tgid_procfd/g
- provide stubs when CONFIG_PROC_FS=n
- move proc_pid() to linux/proc_fs.h header
- use proc_pid() to grab struct pid from /proc/<pid> fd
v1:
- patch introduced

/* Changelog references */
[changelog-1]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/proc/base.c                         |   9 ++
 include/linux/proc_fs.h                |   6 ++
 include/linux/syscalls.h               |   3 +
 include/uapi/asm-generic/unistd.h      |   4 +-
 kernel/signal.c                        | 133 +++++++++++++++++++++++--
 kernel/sys_ni.c                        |   1 +
 8 files changed, 151 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 3cf7b533b3d1..6804c1e84b36 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -398,3 +398,4 @@
 384	i386	arch_prctl		sys_arch_prctl			__ia32_compat_sys_arch_prctl
 385	i386	io_pgetevents		sys_io_pgetevents		__ia32_compat_sys_io_pgetevents
 386	i386	rseq			sys_rseq			__ia32_sys_rseq
+387	i386	pidfd_send_signal	sys_pidfd_send_signal		__ia32_sys_pidfd_send_signal
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index f0b1709a5ffb..aa4b858fa0f1 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -343,6 +343,7 @@
 332	common	statx			__x64_sys_statx
 333	common	io_pgetevents		__x64_sys_io_pgetevents
 334	common	rseq			__x64_sys_rseq
+335	common	pidfd_send_signal	__x64_sys_pidfd_send_signal
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/proc/base.c b/fs/proc/base.c
index d7fd1ca807d2..714bfc844897 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3038,6 +3038,15 @@ static const struct file_operations proc_tgid_base_operations = {
 	.llseek		= generic_file_llseek,
 };
 
+struct pid *tgid_pidfd_to_pid(const struct file *file)
+{
+	if (!d_is_dir(file->f_path.dentry) ||
+	    (file->f_op != &proc_tgid_base_operations))
+		return ERR_PTR(-EBADF);
+
+	return proc_pid(file_inode(file));
+}
+
 static struct dentry *proc_tgid_base_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
 {
 	return proc_pident_lookup(dir, dentry,
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index d0e1f1522a78..52a283ba0465 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -73,6 +73,7 @@ struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mo
 						    int (*show)(struct seq_file *, void *),
 						    proc_write_t write,
 						    void *data);
+extern struct pid *tgid_pidfd_to_pid(const struct file *file);
 
 #else /* CONFIG_PROC_FS */
 
@@ -114,6 +115,11 @@ static inline int remove_proc_subtree(const char *name, struct proc_dir_entry *p
 #define proc_create_net(name, mode, parent, state_size, ops) ({NULL;})
 #define proc_create_net_single(name, mode, parent, show, data) ({NULL;})
 
+static inline struct pid *tgid_pidfd_to_pid(const struct file *file)
+{
+	return ERR_PTR(-EBADF);
+}
+
 #endif /* CONFIG_PROC_FS */
 
 struct net;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 251979d2e709..b0d2a12bb523 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -926,6 +926,9 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
 			  unsigned mask, struct statx __user *buffer);
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
 			 int flags, uint32_t sig);
+asmlinkage long sys_pidfd_send_signal(int pidfd, int sig,
+				       siginfo_t __user *info,
+				       unsigned int flags);
 
 /*
  * Architecture-specific system calls
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index d90127298f12..b77538af7aca 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -740,9 +740,11 @@ __SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
 __SYSCALL(__NR_rseq, sys_rseq)
 #define __NR_kexec_file_load 294
 __SYSCALL(__NR_kexec_file_load,     sys_kexec_file_load)
+#define __NR_pidfd_send_signal 295
+__SYSCALL(__NR_pidfd_send_signal, sys_pidfd_send_signal)
 
 #undef __NR_syscalls
-#define __NR_syscalls 295
+#define __NR_syscalls 296
 
 /*
  * 32 bit systems traditionally used different
diff --git a/kernel/signal.c b/kernel/signal.c
index 53e07d97ffe0..61b12c518fbc 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -19,7 +19,9 @@
 #include <linux/sched/task.h>
 #include <linux/sched/task_stack.h>
 #include <linux/sched/cputime.h>
+#include <linux/file.h>
 #include <linux/fs.h>
+#include <linux/proc_fs.h>
 #include <linux/tty.h>
 #include <linux/binfmts.h>
 #include <linux/coredump.h>
@@ -3429,6 +3431,16 @@ COMPAT_SYSCALL_DEFINE4(rt_sigtimedwait, compat_sigset_t __user *, uthese,
 #endif
 #endif
 
+static inline void prepare_kill_siginfo(int sig, struct kernel_siginfo *info)
+{
+	clear_siginfo(info);
+	info->si_signo = sig;
+	info->si_errno = 0;
+	info->si_code = SI_USER;
+	info->si_pid = task_tgid_vnr(current);
+	info->si_uid = from_kuid_munged(current_user_ns(), current_uid());
+}
+
 /**
  *  sys_kill - send a signal to a process
  *  @pid: the PID of the process
@@ -3438,16 +3450,125 @@ SYSCALL_DEFINE2(kill, pid_t, pid, int, sig)
 {
 	struct kernel_siginfo info;
 
-	clear_siginfo(&info);
-	info.si_signo = sig;
-	info.si_errno = 0;
-	info.si_code = SI_USER;
-	info.si_pid = task_tgid_vnr(current);
-	info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+	prepare_kill_siginfo(sig, &info);
 
 	return kill_something_info(sig, &info, pid);
 }
 
+#ifdef CONFIG_PROC_FS
+/*
+ * Verify that the signaler and signalee either are in the same pid namespace
+ * or that the signaler's pid namespace is an ancestor of the signalee's pid
+ * namespace.
+ */
+static bool access_pidfd_pidns(struct pid *pid)
+{
+	struct pid_namespace *active = task_active_pid_ns(current);
+	struct pid_namespace *p = ns_of_pid(pid);
+
+	for (;;) {
+		if (!p)
+			return false;
+		if (p == active)
+			break;
+		p = p->parent;
+	}
+
+	return true;
+}
+
+static int copy_siginfo_from_user_any(kernel_siginfo_t *kinfo, siginfo_t *info)
+{
+#ifdef CONFIG_COMPAT
+	/*
+	 * Avoid hooking up compat syscalls and instead handle necessary
+	 * conversions here. Note, this is a stop-gap measure and should not be
+	 * considered a generic solution.
+	 */
+	if (in_compat_syscall())
+		return copy_siginfo_from_user32(
+			kinfo, (struct compat_siginfo __user *)info);
+#endif
+	return copy_siginfo_from_user(kinfo, info);
+}
+
+/**
+ * sys_pidfd_send_signal - send a signal to a process through a task file
+ *                          descriptor
+ * @pidfd:  the file descriptor of the process
+ * @sig:    signal to be sent
+ * @info:   the signal info
+ * @flags:  future flags to be passed
+ *
+ * The syscall currently only signals via PIDTYPE_PID which covers
+ * kill(<positive-pid>, <signal>. It does not signal threads or process
+ * groups.
+ * In order to extend the syscall to threads and process groups the @flags
+ * argument should be used. In essence, the @flags argument will determine
+ * what is signaled and not the file descriptor itself. Put in other words,
+ * grouping is a property of the flags argument not a property of the file
+ * descriptor.
+ *
+ * Return: 0 on success, negative errno on failure
+ */
+SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
+		siginfo_t __user *, info, unsigned int, flags)
+{
+	int ret;
+	struct fd f;
+	struct pid *pid;
+	kernel_siginfo_t kinfo;
+
+	/* Enforce flags be set to 0 until we add an extension. */
+	if (flags)
+		return -EINVAL;
+
+	f = fdget_raw(pidfd);
+	if (!f.file)
+		return -EBADF;
+
+	/* Is this a pidfd? */
+	pid = tgid_pidfd_to_pid(f.file);
+	if (IS_ERR(pid)) {
+		ret = PTR_ERR(pid);
+		goto err;
+	}
+
+	ret = -EINVAL;
+	if (!access_pidfd_pidns(pid))
+		goto err;
+
+	if (info) {
+		ret = copy_siginfo_from_user_any(&kinfo, info);
+		if (unlikely(ret))
+			goto err;
+
+		ret = -EINVAL;
+		if (unlikely(sig != kinfo.si_signo))
+			goto err;
+
+		if ((task_pid(current) != pid) &&
+		    (kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL)) {
+			/* Only allow sending arbitrary signals to yourself. */
+			ret = -EPERM;
+			if (kinfo.si_code != SI_USER)
+				goto err;
+
+			/* Turn this into a regular kill signal. */
+			prepare_kill_siginfo(sig, &kinfo);
+		}
+	} else {
+		prepare_kill_siginfo(sig, &kinfo);
+	}
+
+	ret = kill_pid_info(sig, &kinfo, pid);
+
+err:
+	fdput(f);
+	return ret;
+}
+#endif /* CONFIG_PROC_FS */
+
 static int
 do_send_specific(pid_t tgid, pid_t pid, int sig, struct kernel_siginfo *info)
 {
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index ab9d0e3c6d50..f905f4f9f677 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -163,6 +163,7 @@ COND_SYSCALL(syslog);
 /* kernel/sched/core.c */
 
 /* kernel/signal.c */
+COND_SYSCALL(pidfd_send_signal);
 
 /* kernel/sys.c */
 COND_SYSCALL(setregid);

From patchwork Wed Jan  2 16:16:54 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christian Brauner <christian@brauner.io>
X-Patchwork-Id: 10746533
Return-Path: <linux-fsdevel-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
 [172.30.200.125])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4A3D56C5
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
 Wed,  2 Jan 2019 16:17:47 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 35B7E2896A
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
 Wed,  2 Jan 2019 16:17:47 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id 26E6328AA5; Wed,  2 Jan 2019 16:17:47 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham
	version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 32A112896A
	for <patchwork-linux-fsdevel@patchwork.kernel.org>;
 Wed,  2 Jan 2019 16:17:46 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1730401AbfABQRl (ORCPT
        <rfc822;patchwork-linux-fsdevel@patchwork.kernel.org>);
        Wed, 2 Jan 2019 11:17:41 -0500
Received: from mail-ed1-f65.google.com ([209.85.208.65]:33578 "EHLO
        mail-ed1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1730383AbfABQRk (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Wed, 2 Jan 2019 11:17:40 -0500
Received: by mail-ed1-f65.google.com with SMTP id p6so26681972eds.0
        for <linux-fsdevel@vger.kernel.org>;
 Wed, 02 Jan 2019 08:17:37 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=brauner.io; s=google;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=PVewjK+WZgXG8aJrPOjn6YwGTbcieMK5eAf0Dz5fQac=;
        b=OkqaDaYSelEprsGFnUz0xRO96uDCGGd9iPWGS4yEx90Frae6HtCHX9VHS8Rp5zu2r3
         nwoXZ/XcQkTpDtMi4O5dogbipjaYUZ6F6DwwyDS1JMj8SaPll6GJoQXsomWzl8uScHv3
         Z+/FeX+GgLNcLXlHC72bauzNlCIXiyGLTR60p9o5HW3ILV+YdVjiaoL2yADj+fJuL4zD
         cbduHW8cP1aH97nMXMxx2Avm9HLPU7GcA/TEL3aDPJ19xceOH9kCLlyHpbwwIi3UP+NG
         hYmpmJvuqGx/sgLgU+8hUqlYVVvK+ZA5klof+TfvFci/l/hYRHHLgFwoaAG3rzZUS914
         YTNA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
         :references:mime-version:content-transfer-encoding;
        bh=PVewjK+WZgXG8aJrPOjn6YwGTbcieMK5eAf0Dz5fQac=;
        b=rKRy8U0A6Qwt2isnsNJJW/DX7H258P1LL19S9/jpgQDQ6TLnjkgiBpVbqbrnsoeQy0
         AkIP7dJSGdTIWu9vg7ORONweELwrqyPwqOmKt/sMmjYZQlCZA1N10WiM1st5675RsdUg
         9oY9eCOAUAla+7qaCDCxrW9G4r/uVHO3ug3lZqW5BgJTbgjhdU0PQ6z9vZ/W8kufj79g
         0NAn1kmQeTMVV8F2fxI6A3CE2Ii9ER+gGxXJ0m+QNpn8D2pIF4A1ibQIE3ggDl3DtICx
         ip3ZeXJ6U7BdKZy7aCYQcMZecfm+6RFLT9a/I1b5y3Z0LKtnaxH2AgJWGWQDQ6LD0adm
         Kw0Q==
X-Gm-Message-State: AA+aEWZSM42MAURTOwvB+80jYvWF9ATS0kTMezUjmO4GJESAGLItFV+z
        2WgyGPuECe1c6vZytF8jhsM4ww==
X-Google-Smtp-Source: 
 AFSGD/WNTkGnf/b4HBy1byYq0T4kN2jqP3ohworK+UOisbrRAAVWog6xXCGSO6d6BhrnUMK/bPgZiw==
X-Received: by 2002:a50:8ad6:: with SMTP id
 k22mr40458600edk.189.1546445856887;
        Wed, 02 Jan 2019 08:17:36 -0800 (PST)
Received: from localhost.localdomain
 ([2a02:8109:b6c0:d6c:700b:b2fa:de8a:175d])
        by smtp.gmail.com with ESMTPSA id
 k20sm22513972eda.71.2019.01.02.08.17.35
        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
        Wed, 02 Jan 2019 08:17:36 -0800 (PST)
From: Christian Brauner <christian@brauner.io>
To: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
        luto@kernel.org, arnd@arndb.de, serge@hallyn.com,
        keescook@chromium.org, akpm@linux-foundation.org
Cc: jannh@google.com, oleg@redhat.com, cyphar@cyphar.com,
        viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org,
        dancol@google.com, timmurray@google.com, fweimer@redhat.com,
        tglx@linutronix.de, x86@kernel.org, ebiederm@xmission.com,
        Christian Brauner <christian@brauner.io>
Subject: [PATCH v7 2/2] selftests: add tests for pidfd_send_signal()
Date: Wed,  2 Jan 2019 17:16:54 +0100
Message-Id: <20190102161654.9093-2-christian@brauner.io>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20190102161654.9093-1-christian@brauner.io>
References: <20190102161654.9093-1-christian@brauner.io>
MIME-Version: 1.0
X-Patchwork-Bot: notify
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

As suggested by Andrew Morton in [1] add selftests for the new
sys_pidfd_send_signal() syscall:

/* test_pidfd_send_signal_syscall_support */
Test whether the pidfd_send_signal() syscall is supported and the tests can
be run or need to be skipped.

/* test_pidfd_send_signal_simple_success */
Test whether sending a signal via a pidfd works.

/* test_pidfd_send_signal_exited_fail */
Verify that sending a signal to an already exited process fails with ESRCH.

/* test_pidfd_send_signal_recycled_pid_fail */
Verify that a recycled pid cannot be signaled via a pidfd referring to an
already exited process that had the same pid (cf. [2], [3]).

[1]: https://lore.kernel.org/lkml/20181228152012.dbf0508c2508138efc5f2bbe@linux-foundation.org/
[2]: https://lore.kernel.org/lkml/20181230210245.GA30252@mail.hallyn.com/
[3]: https://lore.kernel.org/lkml/20181230232711.7aayb7vnhogbv4co@brauner.io/

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Acked-by: Serge Hallyn <serge@hallyn.com>
---
/* Changelog */
v7:
- add test for pid recycling (cf. [2] and [3]) as suggested by Serge
- add test for pidfd_send_signal() syscalls support so we can skip tests if
  it isn't
v6:
- patch introduced
v5..v0:
- patch not present
---
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/pidfd/Makefile     |   6 +
 tools/testing/selftests/pidfd/pidfd_test.c | 381 +++++++++++++++++++++
 3 files changed, 388 insertions(+)
 create mode 100644 tools/testing/selftests/pidfd/Makefile
 create mode 100644 tools/testing/selftests/pidfd/pidfd_test.c

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index eb54df682d56..84dd78a684e4 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -29,6 +29,7 @@ TARGETS += net
 TARGETS += netfilter
 TARGETS += networking/timestamping
 TARGETS += nsfs
+TARGETS += pidfd
 TARGETS += powerpc
 TARGETS += proc
 TARGETS += pstore
diff --git a/tools/testing/selftests/pidfd/Makefile b/tools/testing/selftests/pidfd/Makefile
new file mode 100644
index 000000000000..deaf8073bc06
--- /dev/null
+++ b/tools/testing/selftests/pidfd/Makefile
@@ -0,0 +1,6 @@
+CFLAGS += -g -I../../../../usr/include/
+
+TEST_GEN_PROGS := pidfd_test
+
+include ../lib.mk
+
diff --git a/tools/testing/selftests/pidfd/pidfd_test.c b/tools/testing/selftests/pidfd/pidfd_test.c
new file mode 100644
index 000000000000..d59378a93782
--- /dev/null
+++ b/tools/testing/selftests/pidfd/pidfd_test.c
@@ -0,0 +1,381 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <linux/types.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syscall.h>
+#include <sys/mount.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+#include "../kselftest.h"
+
+static inline int sys_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
+					unsigned int flags)
+{
+	return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
+}
+
+static int signal_received;
+
+static void set_signal_received_on_sigusr1(int sig)
+{
+	if (sig == SIGUSR1)
+		signal_received = 1;
+}
+
+/*
+ * Straightforward test to see whether pidfd_send_signal() works is to send
+ * a signal to ourself.
+ */
+static int test_pidfd_send_signal_simple_success(void)
+{
+	int pidfd, ret;
+	const char *test_name = "pidfd_send_signal send SIGUSR1";
+
+	pidfd = open("/proc/self", O_DIRECTORY | O_CLOEXEC);
+	if (pidfd < 0)
+		ksft_exit_fail_msg(
+			"%s test: Failed to open process file descriptor\n",
+			test_name);
+
+	signal(SIGUSR1, set_signal_received_on_sigusr1);
+
+	ret = sys_pidfd_send_signal(pidfd, SIGUSR1, NULL, 0);
+	close(pidfd);
+	if (ret < 0)
+		ksft_exit_fail_msg("%s test: Failed to send signal\n",
+				   test_name);
+
+	if (signal_received != 1)
+		ksft_exit_fail_msg("%s test: Failed to receive signal\n",
+				   test_name);
+
+	signal_received = 0;
+	ksft_test_result_pass("%s test: Sent signal\n", test_name);
+	return 0;
+}
+
+static int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (ret != pid)
+		goto again;
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+static int test_pidfd_send_signal_exited_fail(void)
+{
+	int pidfd, ret, saved_errno;
+	char buf[256];
+	pid_t pid;
+	const char *test_name = "pidfd_send_signal signal exited process";
+
+	pid = fork();
+	if (pid < 0)
+		ksft_exit_fail_msg("%s test: Failed to create new process\n",
+				   test_name);
+
+	if (pid == 0)
+		_exit(EXIT_SUCCESS);
+
+	snprintf(buf, sizeof(buf), "/proc/%d", pid);
+
+	pidfd = open(buf, O_DIRECTORY | O_CLOEXEC);
+
+	(void)wait_for_pid(pid);
+
+	if (pidfd < 0)
+		ksft_exit_fail_msg(
+			"%s test: Failed to open process file descriptor\n",
+			test_name);
+
+	ret = sys_pidfd_send_signal(pidfd, 0, NULL, 0);
+	saved_errno = errno;
+	close(pidfd);
+	if (ret == 0)
+		ksft_exit_fail_msg(
+			"%s test: Managed to send signal to process even though it should have failed\n",
+			test_name);
+
+	if (saved_errno != ESRCH)
+		ksft_exit_fail_msg(
+			"%s test: Expected to receive ESRCH as errno value but received %d instead\n",
+			test_name, saved_errno);
+
+	ksft_test_result_pass("%s test: Failed to send signal as expected\n",
+			      test_name);
+	return 0;
+}
+
+/*
+ * The kernel reserves 300 pids via RESERVED_PIDS in kernel/pid.c
+ * That means, when it wraps around any pid < 300 will be skipped.
+ * So we need to use a pid > 300 in order to test recycling.
+ */
+#define PID_RECYCLE 1000
+
+/*
+ * Maximum number of cycles we allow. This is equivalent to PID_MAX_DEFAULT.
+ * If users set a higher limit or we have cycled PIDFD_MAX_DEFAULT number of
+ * times then we skip the test to not go into an infinite loop or block for a
+ * long time.
+ */
+#define PIDFD_MAX_DEFAULT 0x8000
+
+/*
+ * Define a few custom error codes for the child process to clearly indicate
+ * what is happening. This way we can tell the difference between a system
+ * error, a test error, etc.
+ */
+#define PIDFD_PASS 0
+#define PIDFD_FAIL 1
+#define PIDFD_ERROR 2
+#define PIDFD_SKIP 3
+#define PIDFD_XFAIL 4
+
+static int test_pidfd_send_signal_recycled_pid_fail(void)
+{
+	int i, ret;
+	pid_t pid1;
+	const char *test_name = "pidfd_send_signal signal recycled pid";
+
+	ret = unshare(CLONE_NEWPID);
+	if (ret < 0)
+		ksft_exit_fail_msg("%s test: Failed to unshare pid namespace\n",
+				   test_name);
+
+	ret = unshare(CLONE_NEWNS);
+	if (ret < 0)
+		ksft_exit_fail_msg(
+			"%s test: Failed to unshare mount namespace\n",
+			test_name);
+
+	ret = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
+	if (ret < 0)
+		ksft_exit_fail_msg("%s test: Failed to remount / private\n",
+				   test_name);
+
+	/* pid 1 in new pid namespace */
+	pid1 = fork();
+	if (pid1 < 0)
+		ksft_exit_fail_msg("%s test: Failed to create new process\n",
+				   test_name);
+
+	if (pid1 == 0) {
+		char buf[256];
+		pid_t pid2;
+		int pidfd = -1;
+
+		(void)umount2("/proc", MNT_DETACH);
+		ret = mount("proc", "/proc", "proc", 0, NULL);
+		if (ret < 0)
+			_exit(PIDFD_ERROR);
+
+		/* grab pid PID_RECYCLE */
+		for (i = 0; i <= PIDFD_MAX_DEFAULT; i++) {
+			pid2 = fork();
+			if (pid2 < 0)
+				_exit(PIDFD_ERROR);
+
+			if (pid2 == 0)
+				_exit(PIDFD_PASS);
+
+			if (pid2 == PID_RECYCLE) {
+				snprintf(buf, sizeof(buf), "/proc/%d", pid2);
+				ksft_print_msg("pid to recycle is %d\n", pid2);
+				pidfd = open(buf, O_DIRECTORY | O_CLOEXEC);
+			}
+
+			if (wait_for_pid(pid2))
+				_exit(PIDFD_ERROR);
+
+			if (pid2 >= PID_RECYCLE)
+				break;
+		}
+
+		/*
+		 * We want to be as predictable as we can so if we haven't been
+		 * able to grab pid PID_RECYCLE skip the test.
+		 */
+		if (pid2 != PID_RECYCLE) {
+			/* skip test */
+			close(pidfd);
+			_exit(PIDFD_SKIP);
+		}
+
+		if (pidfd < 0)
+			_exit(PIDFD_ERROR);
+
+		for (i = 0; i <= PIDFD_MAX_DEFAULT; i++) {
+			char c;
+			int pipe_fds[2];
+			pid_t recycled_pid;
+			int child_ret = PIDFD_PASS;
+
+			ret = pipe2(pipe_fds, O_CLOEXEC);
+			if (ret < 0)
+				_exit(PIDFD_ERROR);
+
+			recycled_pid = fork();
+			if (recycled_pid < 0)
+				_exit(PIDFD_ERROR);
+
+			if (recycled_pid == 0) {
+				close(pipe_fds[1]);
+				(void)read(pipe_fds[0], &c, 1);
+				close(pipe_fds[0]);
+
+				_exit(PIDFD_PASS);
+			}
+
+			/*
+			 * Stop the child so we can inspect whether we have
+			 * recycled pid PID_RECYCLE.
+			 */
+			close(pipe_fds[0]);
+			ret = kill(recycled_pid, SIGSTOP);
+			close(pipe_fds[1]);
+			if (ret) {
+				(void)wait_for_pid(recycled_pid);
+				_exit(PIDFD_ERROR);
+			}
+
+			/*
+			 * We have recycled the pid. Try to signal it. This
+			 * needs to fail since this is a different process than
+			 * the one the pidfd refers to.
+			 */
+			if (recycled_pid == PID_RECYCLE) {
+				ret = sys_pidfd_send_signal(pidfd, SIGCONT,
+							    NULL, 0);
+				if (ret && errno == ESRCH)
+					child_ret = PIDFD_XFAIL;
+				else
+					child_ret = PIDFD_FAIL;
+			}
+
+			/* let the process move on */
+			ret = kill(recycled_pid, SIGCONT);
+			if (ret)
+				(void)kill(recycled_pid, SIGKILL);
+
+			if (wait_for_pid(recycled_pid))
+				_exit(PIDFD_ERROR);
+
+			switch (child_ret) {
+			case PIDFD_FAIL:
+				/* fallthrough */
+			case PIDFD_XFAIL:
+				_exit(child_ret);
+			case PIDFD_PASS:
+				break;
+			default:
+				/* not reached */
+				_exit(PIDFD_ERROR);
+			}
+
+			/*
+			 * If the user set a custom pid_max limit we could be
+			 * in the millions.
+			 * Skip the test in this case.
+			 */
+			if (recycled_pid > PIDFD_MAX_DEFAULT)
+				_exit(PIDFD_SKIP);
+		}
+
+		/* failed to recycle pid */
+		_exit(PIDFD_SKIP);
+	}
+
+	ret = wait_for_pid(pid1);
+	switch (ret) {
+	case PIDFD_FAIL:
+		ksft_exit_fail_msg(
+			"%s test: Managed to signal recycled pid %d\n",
+			test_name, PID_RECYCLE);
+	case PIDFD_PASS:
+		ksft_exit_fail_msg("%s test: Failed to recycle pid %d\n",
+				   test_name, PID_RECYCLE);
+	case PIDFD_SKIP:
+		ksft_print_msg("%s test: Skipping test\n", test_name);
+		ret = 0;
+		break;
+	case PIDFD_XFAIL:
+		ksft_test_result_pass(
+			"%s test: Failed to signal recycled pid as expected\n",
+			test_name);
+		ret = 0;
+		break;
+	default /* PIDFD_ERROR */:
+		ksft_exit_fail_msg("%s test: Error while running tests\n",
+				   test_name);
+	}
+
+	return ret;
+}
+
+static int test_pidfd_send_signal_syscall_support(void)
+{
+	int pidfd, ret;
+	const char *test_name = "pidfd_send_signal check for support";
+
+	pidfd = open("/proc/self", O_DIRECTORY | O_CLOEXEC);
+	if (pidfd < 0)
+		ksft_exit_fail_msg(
+			"%s test: Failed to open process file descriptor\n",
+			test_name);
+
+	ret = sys_pidfd_send_signal(pidfd, 0, NULL, 0);
+	if (ret < 0) {
+		/*
+		 * pidfd_send_signal() will currently return ENOSYS when
+		 * CONFIG_PROC_FS is not set.
+		 */
+		if (errno == ENOSYS)
+			ksft_exit_skip(
+				"%s test: pidfd_send_signal() syscall not supported (Ensure that CONFIG_PROC_FS=y is set)\n",
+				test_name);
+
+		ksft_exit_fail_msg("%s test: Failed to send signal\n",
+				   test_name);
+	}
+
+	close(pidfd);
+	ksft_test_result_pass(
+		"%s test: pidfd_send_signal() syscall is supported. Tests can be executed\n",
+		test_name);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	ksft_print_header();
+
+	test_pidfd_send_signal_syscall_support();
+	test_pidfd_send_signal_simple_success();
+	test_pidfd_send_signal_exited_fail();
+	test_pidfd_send_signal_recycled_pid_fail();
+
+	return ksft_exit_pass();
+}