[v5,1/1] signal: add pidfd_send_signal() syscall

The kill() syscall operates on process identifiers (pid). After a process
has exited its pid can be reused by another process. If a caller sends a
signal to a reused pid it will end up signaling the wrong process. This
issue has often surfaced and there has been a push to address this problem [1].

This patch uses file descriptors (fd) from proc/<pid> as stable handles on
struct pid. Even if a pid is recycled the handle will not change. The fd
can be used to send signals to the process it refers to.
Thus, the new syscall pidfd_send_signal() is introduced to solve this
problem. Instead of pids it operates on process fds (pidfd).

/* prototype and argument /*
long pidfd_send_signal(int pidfd, int sig, siginfo_t *info, unsigned int flags);

In addition to the pidfd and signal argument it takes an additional
siginfo_t and flags argument. If the siginfo_t argument is NULL then
pidfd_send_signal() is equivalent to kill(<positive-pid>, <signal>). If it
is not NULL pidfd_send_signal() is equivalent to rt_sigqueueinfo().
The flags argument is added to allow for future extensions of this syscall.
It currently needs to be passed as 0. Failing to do so will cause EINVAL.

/* pidfd_send_signal() replaces multiple pid-based syscalls */
The pidfd_send_signal() syscall currently takes on the job of
rt_sigqueueinfo(2) and parts of the functionality of kill(2), Namely, when a
positive pid is passed to kill(2). It will however be possible to also
replace tgkill(2) and rt_tgsigqueueinfo(2) if this syscall is extended.

/* sending signals to threads (tid) and process groups (pgid) */
Specifically, the pidfd_send_signal() syscall does currently not operate on
process groups or threads. This is left for future extensions.
In order to extend the syscall to allow sending signal to threads and
process groups appropriately named flags (e.g. PIDFD_TYPE_PGID, and
PIDFD_TYPE_TID) should be added. This implies that the flags argument will
determine what is signaled and not the file descriptor itself. Put in other
words, grouping in this api is a property of the flags argument not a
property of the file descriptor (cf. [13]).
When appropriate extensions through the flags argument are added then
pidfd_send_signal() can additionally replace the part of kill(2) which
operates on process groups as well as the tgkill(2) and
rt_tgsigqueueinfo(2) syscalls.
How such an extension could be implemented has been very roughly sketched
in [14], [15], and [16]. However, this should not be taken as a commitment
to a particular implementation. There might be better ways to do it.
Right now this is intentionally left out to keep this patchset as simple as
possible (cf. [4]). For example, if a pidfd for a tid from
/proc/<pid>/task/<tid> is passed EOPNOTSUPP will be returned to give
userspace a way to detect when I add support for signaling to threads (cf. [10]).

/* naming */
The syscall had various names throughout iterations of this patchset:
- procfd_signal()
- procfd_send_signal()
- taskfd_send_signal()
In the last round of reviews it was pointed out that given that if the
flags argument decides the scope of the signal instead of different types
of fds it might make sense to either settle for "procfd_" or "pidfd_" as
prefix. The community was willing to accept either (cf. [17] and [18]).
Given that one developer expressed strong preference for the "pidfd_"
prefix (cf. [13] and with other developers less opinionated about the name
we should settle for "pidfd_" to avoid further bikeshedding.

The  "_send_signal" suffix was chosen to reflect the fact that the syscall
takes on the job of multiple syscalls. It is therefore intentional that the
name is not reminiscent of neither kill(2) nor rt_sigqueueinfo(2). Not the
fomer because it might imply that pidfd_send_signal() is a replacement for
kill(2), and not the latter because it is a hassle to remember the correct
spelling - especially for non-native speakers - and because it is not
descriptive enough of what the syscall actually does. The name
"pidfd_send_signal" makes it very clear that its job is to send signals.

/* O_PATH file descriptors */
pidfds opened as O_PATH fds cannot be used to send signals to a process
(cf. [2]). Signaling processes through pidfds is the equivalent of writing
to a file. Thus, this is not an operation that operates "purely at the file
descriptor level" as required by the open(2) manpage.

/* zombies */
Zombies can be signaled just as any other process. No special error will be
reported since a zombie state is an unreliable state (cf. [3]). However,
this can be added as an extension through the @flags argument if the need
ever arises.

/* cross-namespace signals */
The patch currently enforces that the signaler and signalee either are in
the same pid namespace or that the signaler's pid namespace is an ancestor
of the signalee's pid namespace. This is done for the sake of simplicity
and because it is unclear to what values certain members of struct
siginfo_t would need to be set to (cf. [5], [6]).

/* compat syscalls */
It became clear that we would like to avoid adding compat syscalls
(cf. [7]).  The compat syscall handling is now done in kernel/signal.c
itself by adding __copy_siginfo_from_user_generic() which lets us avoid
compat syscalls (cf. [8]). It should be noted that the addition of
__copy_siginfo_from_user_any() is caused by a bug in the original
implementation of rt_sigqueueinfo(2) (cf. 12).
With upcoming rework for syscall handling things might improve
significantly (cf. [11]) and __copy_siginfo_from_user_any() will not gain
any additional callers.

/* testing */
This patch was tested on x64 and x86.

/* userspace usage */
An asciinema recording for the basic functionality can be found under [9].
With this patch a process can be killed via:

 #define _GNU_SOURCE
 #include <errno.h>
 #include <fcntl.h>
 #include <signal.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sys/stat.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <unistd.h>

 static inline int do_pidfd_send_signal(int pidfd, int sig, siginfo_t *info,
                                         unsigned int flags)
 {
 #ifdef __NR_pidfd_send_signal
         return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags);
 #else
         return -ENOSYS;
 #endif
 }

 int main(int argc, char *argv[])
 {
         int fd, ret, saved_errno, sig;

         if (argc < 3)
                 exit(EXIT_FAILURE);

         fd = open(argv[1], O_DIRECTORY | O_CLOEXEC);
         if (fd < 0) {
                 printf("%s - Failed to open \"%s\"\n", strerror(errno), argv[1]);
                 exit(EXIT_FAILURE);
         }

         sig = atoi(argv[2]);

         printf("Sending signal %d to process %s\n", sig, argv[1]);
         ret = do_pidfd_send_signal(fd, sig, NULL, 0);

         saved_errno = errno;
         close(fd);
         errno = saved_errno;

         if (ret < 0) {
                 printf("%s - Failed to send signal %d to process %s\n",
                        strerror(errno), sig, argv[1]);
                 exit(EXIT_FAILURE);
         }

         exit(EXIT_SUCCESS);
 }

[1]:  https://lore.kernel.org/lkml/20181029221037.87724-1-dancol@google.com/
[2]:  https://lore.kernel.org/lkml/874lbtjvtd.fsf@oldenburg2.str.redhat.com/
[3]:  https://lore.kernel.org/lkml/20181204132604.aspfupwjgjx6fhva@brauner.io/
[4]:  https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
[5]:  https://lore.kernel.org/lkml/20181121213946.GA10795@mail.hallyn.com/
[6]:  https://lore.kernel.org/lkml/20181120103111.etlqp7zop34v6nv4@brauner.io/
[7]:  https://lore.kernel.org/lkml/36323361-90BD-41AF-AB5B-EE0D7BA02C21@amacapital.net/
[8]:  https://lore.kernel.org/lkml/87tvjxp8pc.fsf@xmission.com/
[9]:  https://asciinema.org/a/IQjuCHew6bnq1cr78yuMv16cy
[10]: https://lore.kernel.org/lkml/20181203180224.fkvw4kajtbvru2ku@brauner.io/
[11]: https://lore.kernel.org/lkml/F53D6D38-3521-4C20-9034-5AF447DF62FF@amacapital.net/
[12]: https://lore.kernel.org/lkml/87zhtjn8ck.fsf@xmission.com/
[13]: https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/
[14]: https://lore.kernel.org/lkml/20181206231742.xxi4ghn24z4h2qki@brauner.io/
[15]: https://lore.kernel.org/lkml/20181207003124.GA11160@mail.hallyn.com/
[16]: https://lore.kernel.org/lkml/20181207015423.4miorx43l3qhppfz@brauner.io/
[17]: https://lore.kernel.org/lkml/CAGXu5jL8PciZAXvOvCeCU3wKUEB_dU-O3q0tDw4uB_ojMvDEew@mail.gmail.com/
[18]: https://lore.kernel.org/lkml/20181206222746.GB9224@mail.hallyn.com/

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Christian Brauner <christian@brauner.io>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Serge Hallyn <serge@hallyn.com>
Acked-by: Aleksa Sarai <cyphar@cyphar.com>
---
Changelog:
v5:
- s/may_signal_taskfd/access_taskfd_pidns/g
- make it clear that process grouping is a property of the @flags argument
  Eric has argued that he would like to know when we add thread and process
  group signal support whether grouping will be a property of the file
  descriptor or the flag argument and he would oppose this until a
  commitment has been made. It seems that the cleanest strategy is to make
  grouping a property of the @flags argument.
  He also argued that in this case the prefix of the syscall should be
  "pidfd_" (cf. https://lore.kernel.org/lkml/871s6u9z6u.fsf@xmission.com/).
- use "pidfd_" as prefix for the syscall since grouping will be a property
  of the @flags argument
- substantial rewrite of the commit message to reflect the discussion
v4:
- updated asciinema to use "taskfd_" prefix
- s/procfd_send_signal/taskfd_send_signal/g
- s/proc_is_tgid_procfd/tgid_taskfd_to_pid/b
- s/proc_is_tid_procfd/tid_taskfd_to_pid/b
- s/__copy_siginfo_from_user_generic/__copy_siginfo_from_user_any/g
- make it clear that __copy_siginfo_from_user_any() is a workaround caused
  by a bug in the original implementation of rt_sigqueueinfo()
- when spoofing signals turn them into regular kill signals if si_code is
  set to SI_USER
- make proc_is_t{g}id_procfd() return struct pid to allow proc_pid() to
  stay private to fs/proc/
v3:
- add __copy_siginfo_from_user_generic() to avoid adding compat syscalls
- s/procfd_signal/procfd_send_signal/g
- change type of flags argument from int to unsigned int
- add comment about what happens to zombies
- add proc_is_tid_procfd()
- return EOPNOTSUPP when /proc/<pid>/task/<tid> fd is passed so userspace
  has a way of knowing that tidfds are not supported currently.
v2:
- define __NR_procfd_signal in unistd.h
- wire up compat syscall
- s/proc_is_procfd/proc_is_tgid_procfd/g
- provide stubs when CONFIG_PROC_FS=n
- move proc_pid() to linux/proc_fs.h header
- use proc_pid() to grab struct pid from /proc/<pid> fd
v1:
- patch introduced
---
 arch/x86/entry/syscalls/syscall_32.tbl |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl |   1 +
 fs/proc/base.c                         |  20 +++-
 include/linux/proc_fs.h                |  12 +++
 include/linux/syscalls.h               |   3 +
 include/uapi/asm-generic/unistd.h      |   4 +-
 kernel/signal.c                        | 141 +++++++++++++++++++++++--
 7 files changed, 173 insertions(+), 9 deletions(-)

Message ID	20181208054059.19813-2-christian@brauner.io (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-fsdevel-owner@kernel.org> Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E3D6109C for <patchwork-linux-fsdevel@patchwork.kernel.org>; Sat, 8 Dec 2018 05:41:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1D1122D1A0 for <patchwork-linux-fsdevel@patchwork.kernel.org>; Sat, 8 Dec 2018 05:41:39 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0EFC32D1A9; Sat, 8 Dec 2018 05:41:39 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.0 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8FF5F2D1A7 for <patchwork-linux-fsdevel@patchwork.kernel.org>; Sat, 8 Dec 2018 05:41:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726206AbeLHFla (ORCPT <rfc822;patchwork-linux-fsdevel@patchwork.kernel.org>); Sat, 8 Dec 2018 00:41:30 -0500 Received: from mail-pl1-f193.google.com ([209.85.214.193]:36144 "EHLO mail-pl1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726182AbeLHFla (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>); Sat, 8 Dec 2018 00:41:30 -0500 Received: by mail-pl1-f193.google.com with SMTP id g9so2781323plo.3 for <linux-fsdevel@vger.kernel.org>; Fri, 07 Dec 2018 21:41:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=brauner.io; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=0wcwV+ADdWYdNgeENO19WNJQovZabc2cDfiUOdCJpn8=; b=AxJt8ImQyT//+l3F1hGfnh/amfcFPr0+4PCpmiSfddGz2oOqhVXUeR+vjUe0RE7Qzp VvIXRBSYWYRi8FJX9BaABI+w792Nk9/MG3tnMB3FBBXy0rGKLUreuzAqpgAefuqCoaU+ ABojBN2lYUI8HvoaSJcUyPt/S8QdXrRwT8jSL+i4ptia1GfVQ6gpHPQi6D3mg9bYkojr w2xptyJ+4tBC9GZAr4d35LJh0xtDR8RZC4vpmmDQAbNLqlte/QqIJQCwIgxHxPeb4dPK 1CjsSteVxFTyps62NTG0BEXMn1Xe1hBadnaaxXUW8K2KZfqYp7L6pTSkOCgmFDZV0Jy7 vRrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=0wcwV+ADdWYdNgeENO19WNJQovZabc2cDfiUOdCJpn8=; b=P7Wly7EuQSFoZ5J9ybWS28gnm6D+FF51ibAZrXXWQjdNJoWi6jstfFQpjg/cH3tRa8 964osu52yKLN3i9YNCCe7m1Z+K31wVvYvnPoCR/5yGVquyu0DhWQr9c5zpC44KFe4wCS fC0i0aB6H9edahp4W6XsNt+beXUoTZrAmVnlQTehyhyACwzo3lJ4JhTECOZr1uJjUBLK YXXRoDbDFzdi4pYht8lwC629T/KiHotms8/nRZaJAJ5RPbSvEvnhL+9AZXefFzRGbrPO FaBZfnHC4NCdibjmZq71AomH6oJvg1NOM05PnRCVbWXAM26Bg9Elsfn8PWXSWOaVXVU2 NzLw== X-Gm-Message-State: AA+aEWYu56OJss0YwWpTa9Nf6BL3biwzRKjZm15czOmTf+/aeQLzFup0 rou5XtwOC0y7BTROxYC7Kois/Q== X-Google-Smtp-Source: AFSGD/VC0mBJzcWGniQoyDTCc9cuWFpFfT5kA3b/o+p4IG/F8ZYIjua2KllW2onL7t4ZX/Qd9Nk9vA== X-Received: by 2002:a17:902:8e8a:: with SMTP id bg10mr4763095plb.192.1544247688992; Fri, 07 Dec 2018 21:41:28 -0800 (PST) Received: from localhost.localdomain ([208.181.63.202]) by smtp.gmail.com with ESMTPSA id b10sm8249946pfj.183.2018.12.07.21.41.27 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Dec 2018 21:41:28 -0800 (PST) From: Christian Brauner <christian@brauner.io> To: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, luto@kernel.org, arnd@arndb.de, ebiederm@xmission.com, serge@hallyn.com, keescook@chromium.org Cc: jannh@google.com, akpm@linux-foundation.org, oleg@redhat.com, cyphar@cyphar.com, viro@zeniv.linux.org.uk, linux-fsdevel@vger.kernel.org, dancol@google.com, timmurray@google.com, fweimer@redhat.com, tglx@linutronix.de, x86@kernel.org, Christian Brauner <christian@brauner.io> Subject: [PATCH v5 1/1] signal: add pidfd_send_signal() syscall Date: Sat, 8 Dec 2018 06:40:59 +0100 Message-Id: <20181208054059.19813-2-christian@brauner.io> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181208054059.19813-1-christian@brauner.io> References: <20181208054059.19813-1-christian@brauner.io> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: <linux-fsdevel.vger.kernel.org> X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP
Series	signaling processes through pidfds \| expand [v5,0/1] signaling processes through pidfds [v5,1/1] signal: add pidfd_send_signal() syscall

[v5,1/1] signal: add pidfd_send_signal() syscall

Commit Message

Comments

Patch