From patchwork Thu Sep 27 15:11:14 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tycho Andersen
X-Patchwork-Id: 10618147
From: Tycho Andersen
To: Kees Cook
Cc: linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, linux-api@vger.kernel.org, Andy Lutomirski, Oleg Nesterov, "Eric W . Biederman", "Serge E . 
Hallyn", Christian Brauner, Tyler Hicks, Akihiro Suda, Jann Horn, linux-fsdevel@vger.kernel.org, Tycho Andersen
Subject: [PATCH v7 1/6] seccomp: add a return code to trap to userspace
Date: Thu, 27 Sep 2018 09:11:14 -0600
Message-Id: <20180927151119.9989-2-tycho@tycho.ws>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20180927151119.9989-1-tycho@tycho.ws>
References: <20180927151119.9989-1-tycho@tycho.ws>

This patch introduces a means for syscalls matched in seccomp to notify
some other task that a particular filter has been triggered.

The motivation for this is primarily for use with containers. For
example, if a container does an init_module(), we obviously don't want
to load this untrusted code, which may be compiled for the wrong version
of the kernel anyway. Instead, we could parse the module image, figure
out which module the container is trying to load and load it on the
host.

As another example, containers cannot mknod(), since this checks
capable(CAP_MKNOD). However, harmless devices like /dev/null or
/dev/zero should be ok for containers to mknod, but we'd like to avoid
hard coding some whitelist in the kernel. Another example is mount(),
which has many security restrictions for good reason, but configuration
or runtime knowledge could potentially be used to relax these
restrictions.

This patch adds functionality that is already possible via at least two
other means that I know about, both of which involve ptrace(): first,
one could ptrace attach, and then iterate through syscalls via
PTRACE_SYSCALL. Unfortunately this is slow, so a faster version would be
to install a filter that does SECCOMP_RET_TRACE, which triggers a
PTRACE_EVENT_SECCOMP.
Since ptrace allows only one tracer, if the container runtime is that
tracer, users inside the container (or outside) trying to debug it will
not be able to use ptrace, which is annoying. It also means that older
distributions based on Upstart cannot boot inside containers using
ptrace, since Upstart itself uses ptrace to start services.

The actual implementation of this is fairly small, although getting the
synchronization right was/is slightly complex.

Finally, it's worth noting that the classic seccomp TOCTOU of reading
memory data from the task still applies here, but can be avoided with
careful design of the userspace handler: if the userspace handler reads
all of the task memory that is necessary before applying its security
policy, the tracee's subsequent memory edits will not be read by the
tracer.

v2: * make id a u64; the idea here being that it will never overflow,
      because 64 bits is huge (one syscall every nanosecond => wrap
      every 584 years) (Andy)
    * prevent nesting of user notifications: if someone is already
      attached to the tree in one place, nobody else can attach to the
      tree (Andy)
    * notify the listener of signals the tracee receives as well (Andy)
    * implement poll
v3: * lockdep fix (Oleg)
    * drop unnecessary WARN()s (Christian)
    * rearrange error returns to be more pretty (Christian)
    * fix build in !CONFIG_SECCOMP_USER_NOTIFICATION case
v4: * fix implementation of poll to use poll_wait() (Jann)
    * change listener's fd flags to be 0 (Jann)
    * hoist filter initialization out of ifdefs to its own function
      init_user_notification()
    * add some more testing around poll() and closing the listener while
      a syscall is in action
    * s/GET_LISTENER/NEW_LISTENER, since you can't _get_ a listener, but
      it creates a new one (Matthew)
    * correctly handle pid namespaces, add some testcases (Matthew)
    * use EINPROGRESS instead of EINVAL when a notification response is
      written twice (Matthew)
    * fix comment typo from older version (SEND vs READ) (Matthew)
    * whitespace and logic
      simplification (Tobin)
    * add some Documentation/ bits on userspace trapping
v5: * fix documentation typos (Jann)
    * add signalled field to struct seccomp_notif (Jann)
    * switch to using ioctls instead of read()/write() for struct
      passing (Jann)
    * add an ioctl to ensure an id is still valid
v6: * docs typo fixes, update docs for ioctl() change (Christian)
v7: * switch struct seccomp_knotif's id member to a u64 (derp :)
    * use notify_lock in IS_ID_VALID query to avoid racing
    * s/signalled/signaled (Tyler)
    * fix docs to reflect that ids are not globally unique (Tyler)
    * add a test to check -ERESTARTSYS behavior (Tyler)
    * drop CONFIG_SECCOMP_USER_NOTIFICATION (Tyler)
    * reorder USER_NOTIF in seccomp return codes list (Tyler)
    * return size instead of sizeof(struct user_notif) (Tyler)
    * ENOENT instead of EINVAL when invalid id is passed (Tyler)
    * drop CONFIG_SECCOMP_USER_NOTIFICATION guards (Tyler)
    * s/IS_ID_VALID/ID_VALID and switch ioctl to be "well behaved"
      (Tyler)
    * add a new struct notification to minimize the additions to struct
      seccomp_filter, also pack the necessary additions a bit more
      cleverly (Tyler)
    * switch to keeping track of the task itself instead of the pid
      (we'll use this for implementing PUT_FD)

Signed-off-by: Tycho Andersen
CC: Kees Cook
CC: Andy Lutomirski
CC: Oleg Nesterov
CC: Eric W. Biederman
CC: "Serge E. 
Hallyn" CC: Christian Brauner CC: Tyler Hicks CC: Akihiro Suda --- Documentation/ioctl/ioctl-number.txt | 1 + .../userspace-api/seccomp_filter.rst | 73 +++ include/linux/seccomp.h | 7 +- include/uapi/linux/seccomp.h | 33 +- kernel/seccomp.c | 436 +++++++++++++++++- tools/testing/selftests/seccomp/seccomp_bpf.c | 413 ++++++++++++++++- 6 files changed, 954 insertions(+), 9 deletions(-) diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index 13a7c999c04a..31e9707f7e06 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt @@ -345,4 +345,5 @@ Code Seq#(hex) Include File Comments 0xF6 all LTTng Linux Trace Toolkit Next Generation +0xF7 00-1F uapi/linux/seccomp.h 0xFD all linux/dm-ioctl.h diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst index 82a468bc7560..d2e61f1c0a0b 100644 --- a/Documentation/userspace-api/seccomp_filter.rst +++ b/Documentation/userspace-api/seccomp_filter.rst @@ -122,6 +122,11 @@ In precedence order, they are: Results in the lower 16-bits of the return value being passed to userland as the errno without executing the system call. +``SECCOMP_RET_USER_NOTIF``: + Results in a ``struct seccomp_notif`` message sent on the userspace + notification fd, if it is attached, or ``-ENOSYS`` if it is not. See below + on discussion of how to handle user notifications. + ``SECCOMP_RET_TRACE``: When returned, this value will cause the kernel to attempt to notify a ``ptrace()``-based tracer prior to executing the system @@ -183,6 +188,74 @@ The ``samples/seccomp/`` directory contains both an x86-specific example and a more generic example of a higher level macro interface for BPF program generation. +Userspace Notification +====================== + +The ``SECCOMP_RET_USER_NOTIF`` return code lets seccomp filters pass a +particular syscall to userspace to be handled. 
This may be useful for
+applications like container managers, which wish to intercept particular
+syscalls (``mount()``, ``finit_module()``, etc.) and change their behavior.
+
+There are currently two APIs to acquire a userspace notification fd for a
+particular filter. The first is when the filter is installed, the task
+installing the filter can ask the ``seccomp()`` syscall:
+
+.. code-block::
+
+    fd = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
+
+which (on success) will return a listener fd for the filter, which can then be
+passed around via ``SCM_RIGHTS`` or similar. Alternatively, a filter fd can be
+acquired via:
+
+.. code-block::
+
+    fd = ptrace(PTRACE_SECCOMP_NEW_LISTENER, pid, 0);
+
+which grabs the 0th filter for some task which the tracer has privilege over.
+Note that filter fds correspond to a particular filter, and not a particular
+task. So if this task then forks, notifications from both tasks will appear on
+the same filter fd. Reads and writes to/from a filter fd are also synchronized,
+so a filter fd can safely have many readers.
+
+The interface for a seccomp notification fd consists of two structures:
+
+.. code-block::
+
+    struct seccomp_notif {
+        __u16 len;
+        __u64 id;
+        __u32 pid;
+        __u8 signaled;
+        struct seccomp_data data;
+    };
+
+    struct seccomp_notif_resp {
+        __u16 len;
+        __u64 id;
+        __s32 error;
+        __s64 val;
+    };
+
+Users can read via ``ioctl(SECCOMP_NOTIF_RECV)`` (or ``poll()``) on a seccomp
+notification fd to receive a ``struct seccomp_notif``, which contains five
+members: the input length of the structure, a unique-per-filter ``id``, the
+``pid`` of the task which triggered this request (which may be 0 if the task is
+in a pid ns not visible from the listener's pid namespace), a flag representing
+whether or not the notification is a result of a non-fatal signal, and the
+``data`` passed to seccomp. 
Userspace can then make a decision based on this +information about what to do, and ``ioctl(SECCOMP_NOTIF_SEND)`` a response, +indicating what should be returned to userspace. The ``id`` member of ``struct +seccomp_notif_resp`` should be the same ``id`` as in ``struct seccomp_notif``. + +It is worth noting that ``struct seccomp_data`` contains the values of register +arguments to the syscall, but does not contain pointers to memory. The task's +memory is accessible to suitably privileged tracers via ``ptrace()`` or +``/proc/pid/map_files/``. However, care should be taken to avoid the TOCTOU +mentioned above in this document: all arguments being read from the tracee's +memory should be read into the tracer's memory before any policy decisions are +made. This allows for an atomic decision on syscall arguments. + Sysctls ======= diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index e5320f6c8654..017444b5efed 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -4,9 +4,10 @@ #include -#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \ - SECCOMP_FILTER_FLAG_LOG | \ - SECCOMP_FILTER_FLAG_SPEC_ALLOW) +#define SECCOMP_FILTER_FLAG_MASK (SECCOMP_FILTER_FLAG_TSYNC | \ + SECCOMP_FILTER_FLAG_LOG | \ + SECCOMP_FILTER_FLAG_SPEC_ALLOW | \ + SECCOMP_FILTER_FLAG_NEW_LISTENER) #ifdef CONFIG_SECCOMP diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 9efc0e73d50b..d4ccb32fe089 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -17,9 +17,10 @@ #define SECCOMP_GET_ACTION_AVAIL 2 /* Valid flags for SECCOMP_SET_MODE_FILTER */ -#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0) -#define SECCOMP_FILTER_FLAG_LOG (1UL << 1) -#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2) +#define SECCOMP_FILTER_FLAG_TSYNC (1UL << 0) +#define SECCOMP_FILTER_FLAG_LOG (1UL << 1) +#define SECCOMP_FILTER_FLAG_SPEC_ALLOW (1UL << 2) +#define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3) /* * All BPF programs must return a 
32-bit value. @@ -35,6 +36,7 @@ #define SECCOMP_RET_KILL SECCOMP_RET_KILL_THREAD #define SECCOMP_RET_TRAP 0x00030000U /* disallow and force a SIGSYS */ #define SECCOMP_RET_ERRNO 0x00050000U /* returns an errno */ +#define SECCOMP_RET_USER_NOTIF 0x7fc00000U /* notifies userspace */ #define SECCOMP_RET_TRACE 0x7ff00000U /* pass to a tracer or disallow */ #define SECCOMP_RET_LOG 0x7ffc0000U /* allow after logging */ #define SECCOMP_RET_ALLOW 0x7fff0000U /* allow */ @@ -60,4 +62,29 @@ struct seccomp_data { __u64 args[6]; }; +struct seccomp_notif { + __u16 len; + __u64 id; + __u32 pid; + __u8 signaled; + struct seccomp_data data; +}; + +struct seccomp_notif_resp { + __u16 len; + __u64 id; + __s32 error; + __s64 val; +}; + +#define SECCOMP_IOC_MAGIC 0xF7 + +/* Flags for seccomp notification fd ioctl. */ +#define SECCOMP_NOTIF_RECV _IOWR(SECCOMP_IOC_MAGIC, 0, \ + struct seccomp_notif) +#define SECCOMP_NOTIF_SEND _IOWR(SECCOMP_IOC_MAGIC, 1, \ + struct seccomp_notif_resp) +#define SECCOMP_NOTIF_ID_VALID _IOR(SECCOMP_IOC_MAGIC, 2, \ + __u64) + #endif /* _UAPI_LINUX_SECCOMP_H */ diff --git a/kernel/seccomp.c b/kernel/seccomp.c index fd023ac24e10..fa6fe9756c80 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -33,12 +33,78 @@ #endif #ifdef CONFIG_SECCOMP_FILTER +#include #include #include #include #include #include #include +#include + +enum notify_state { + SECCOMP_NOTIFY_INIT, + SECCOMP_NOTIFY_SENT, + SECCOMP_NOTIFY_REPLIED, +}; + +struct seccomp_knotif { + /* The struct pid of the task whose filter triggered the notification */ + struct task_struct *task; + + /* The "cookie" for this request; this is unique for this filter. */ + u64 id; + + /* Whether or not this task has been given an interruptible signal. */ + bool signaled; + + /* + * The seccomp data. This pointer is valid the entire time this + * notification is active, since it comes from __seccomp_filter which + * eclipses the entire lifecycle here. 
+ */ + const struct seccomp_data *data; + + /* + * Notification states. When SECCOMP_RET_USER_NOTIF is returned, a + * struct seccomp_knotif is created and starts out in INIT. Once the + * handler reads the notification off of an FD, it transitions to SENT. + * If a signal is received the state transitions back to INIT and + * another message is sent. When the userspace handler replies, state + * transitions to REPLIED. + */ + enum notify_state state; + + /* The return values, only valid when in SECCOMP_NOTIFY_REPLIED */ + int error; + long val; + + /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */ + struct completion ready; + + struct list_head list; +}; + +/** + * struct notification - container for seccomp userspace notifications. Since + * most seccomp filters will not have notification listeners attached and this + * structure is fairly large, we store the notification-specific stuff in a + * separate structure. + * + * @request: A semaphore that users of this notification can wait on for + * changes. Actual reads and writes are still controlled with + * filter->notify_lock. + * @notify_lock: A lock for all notification-related accesses. + * @next_id: The id of the next request. + * @notifications: A list of struct seccomp_knotif elements. + * @wqh: A wait queue for poll. + */ +struct notification { + struct semaphore request; + u64 next_id; + struct list_head notifications; + wait_queue_head_t wqh; +}; /** * struct seccomp_filter - container for seccomp BPF programs @@ -66,6 +132,8 @@ struct seccomp_filter { bool log; struct seccomp_filter *prev; struct bpf_prog *prog; + struct notification *notif; + struct mutex notify_lock; }; /* Limit any path through the tree to 256KB worth of instructions. 
*/ @@ -392,6 +460,7 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog) if (!sfilter) return ERR_PTR(-ENOMEM); + mutex_init(&sfilter->notify_lock); ret = bpf_prog_create_from_user(&sfilter->prog, fprog, seccomp_check_filter, save_orig); if (ret < 0) { @@ -556,11 +625,13 @@ static void seccomp_send_sigsys(int syscall, int reason) #define SECCOMP_LOG_TRACE (1 << 4) #define SECCOMP_LOG_LOG (1 << 5) #define SECCOMP_LOG_ALLOW (1 << 6) +#define SECCOMP_LOG_USER_NOTIF (1 << 7) static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS | SECCOMP_LOG_KILL_THREAD | SECCOMP_LOG_TRAP | SECCOMP_LOG_ERRNO | + SECCOMP_LOG_USER_NOTIF | SECCOMP_LOG_TRACE | SECCOMP_LOG_LOG; @@ -581,6 +652,9 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action, case SECCOMP_RET_TRACE: log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE; break; + case SECCOMP_RET_USER_NOTIF: + log = requested && seccomp_actions_logged & SECCOMP_LOG_USER_NOTIF; + break; case SECCOMP_RET_LOG: log = seccomp_actions_logged & SECCOMP_LOG_LOG; break; @@ -652,6 +726,73 @@ void secure_computing_strict(int this_syscall) #else #ifdef CONFIG_SECCOMP_FILTER +static u64 seccomp_next_notify_id(struct seccomp_filter *filter) +{ + /* Note: overflow is ok here, the id just needs to be unique */ + return filter->notif->next_id++; +} + +static void seccomp_do_user_notification(int this_syscall, + struct seccomp_filter *match, + const struct seccomp_data *sd) +{ + int err; + long ret = 0; + struct seccomp_knotif n = {}; + + mutex_lock(&match->notify_lock); + err = -ENOSYS; + if (!match->notif) + goto out; + + n.task = current; + n.state = SECCOMP_NOTIFY_INIT; + n.data = sd; + n.id = seccomp_next_notify_id(match); + init_completion(&n.ready); + + list_add(&n.list, &match->notif->notifications); + wake_up_poll(&match->notif->wqh, EPOLLIN | EPOLLRDNORM); + + mutex_unlock(&match->notify_lock); + up(&match->notif->request); + + err = 
wait_for_completion_interruptible(&n.ready); + mutex_lock(&match->notify_lock); + + /* + * Here it's possible we got a signal and then had to wait on the mutex + * while the reply was sent, so let's be sure there wasn't a response + * in the meantime. + */ + if (err < 0 && n.state != SECCOMP_NOTIFY_REPLIED) { + /* + * We got a signal. Let's tell userspace about it (potentially + * again, if we had already notified them about the first one). + */ + n.signaled = true; + if (n.state == SECCOMP_NOTIFY_SENT) { + n.state = SECCOMP_NOTIFY_INIT; + up(&match->notif->request); + } + mutex_unlock(&match->notify_lock); + err = wait_for_completion_killable(&n.ready); + mutex_lock(&match->notify_lock); + if (err < 0) + goto remove_list; + } + + ret = n.val; + err = n.error; + +remove_list: + list_del(&n.list); +out: + mutex_unlock(&match->notify_lock); + syscall_set_return_value(current, task_pt_regs(current), + err, ret); +} + static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, const bool recheck_after_trace) { @@ -728,6 +869,9 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd, return 0; + case SECCOMP_RET_USER_NOTIF: + seccomp_do_user_notification(this_syscall, match, sd); + goto skip; case SECCOMP_RET_LOG: seccomp_log(this_syscall, 0, action, true); return 0; @@ -834,6 +978,9 @@ static long seccomp_set_mode_strict(void) } #ifdef CONFIG_SECCOMP_FILTER +static struct file *init_listener(struct task_struct *, + struct seccomp_filter *); + /** * seccomp_set_mode_filter: internal function for setting seccomp filter * @flags: flags to change filter behavior @@ -853,6 +1000,8 @@ static long seccomp_set_mode_filter(unsigned int flags, const unsigned long seccomp_mode = SECCOMP_MODE_FILTER; struct seccomp_filter *prepared = NULL; long ret = -EINVAL; + int listener = 0; + struct file *listener_f = NULL; /* Validate flags. 
*/ if (flags & ~SECCOMP_FILTER_FLAG_MASK) @@ -863,13 +1012,28 @@ static long seccomp_set_mode_filter(unsigned int flags, if (IS_ERR(prepared)) return PTR_ERR(prepared); + if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) { + listener = get_unused_fd_flags(0); + if (listener < 0) { + ret = listener; + goto out_free; + } + + listener_f = init_listener(current, prepared); + if (IS_ERR(listener_f)) { + put_unused_fd(listener); + ret = PTR_ERR(listener_f); + goto out_free; + } + } + /* * Make sure we cannot change seccomp or nnp state via TSYNC * while another thread is in the middle of calling exec. */ if (flags & SECCOMP_FILTER_FLAG_TSYNC && mutex_lock_killable(&current->signal->cred_guard_mutex)) - goto out_free; + goto out_put_fd; spin_lock_irq(&current->sighand->siglock); @@ -887,6 +1051,16 @@ static long seccomp_set_mode_filter(unsigned int flags, spin_unlock_irq(&current->sighand->siglock); if (flags & SECCOMP_FILTER_FLAG_TSYNC) mutex_unlock(&current->signal->cred_guard_mutex); +out_put_fd: + if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) { + if (ret < 0) { + fput(listener_f); + put_unused_fd(listener); + } else { + fd_install(listener, listener_f); + ret = listener; + } + } out_free: seccomp_filter_free(prepared); return ret; @@ -911,6 +1085,7 @@ static long seccomp_get_action_avail(const char __user *uaction) case SECCOMP_RET_KILL_THREAD: case SECCOMP_RET_TRAP: case SECCOMP_RET_ERRNO: + case SECCOMP_RET_USER_NOTIF: case SECCOMP_RET_TRACE: case SECCOMP_RET_LOG: case SECCOMP_RET_ALLOW: @@ -1111,6 +1286,7 @@ long seccomp_get_metadata(struct task_struct *task, #define SECCOMP_RET_KILL_THREAD_NAME "kill_thread" #define SECCOMP_RET_TRAP_NAME "trap" #define SECCOMP_RET_ERRNO_NAME "errno" +#define SECCOMP_RET_USER_NOTIF_NAME "user_notif" #define SECCOMP_RET_TRACE_NAME "trace" #define SECCOMP_RET_LOG_NAME "log" #define SECCOMP_RET_ALLOW_NAME "allow" @@ -1120,6 +1296,7 @@ static const char seccomp_actions_avail[] = SECCOMP_RET_KILL_THREAD_NAME " " SECCOMP_RET_TRAP_NAME " " SECCOMP_RET_ERRNO_NAME " " 
+ SECCOMP_RET_USER_NOTIF_NAME " " SECCOMP_RET_TRACE_NAME " " SECCOMP_RET_LOG_NAME " " SECCOMP_RET_ALLOW_NAME; @@ -1134,6 +1311,7 @@ static const struct seccomp_log_name seccomp_log_names[] = { { SECCOMP_LOG_KILL_THREAD, SECCOMP_RET_KILL_THREAD_NAME }, { SECCOMP_LOG_TRAP, SECCOMP_RET_TRAP_NAME }, { SECCOMP_LOG_ERRNO, SECCOMP_RET_ERRNO_NAME }, + { SECCOMP_LOG_USER_NOTIF, SECCOMP_RET_USER_NOTIF_NAME }, { SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME }, { SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME }, { SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME }, @@ -1342,3 +1520,259 @@ static int __init seccomp_sysctl_init(void) device_initcall(seccomp_sysctl_init) #endif /* CONFIG_SYSCTL */ + +#ifdef CONFIG_SECCOMP_FILTER +static int seccomp_notify_release(struct inode *inode, struct file *file) +{ + struct seccomp_filter *filter = file->private_data; + struct seccomp_knotif *knotif; + + mutex_lock(&filter->notify_lock); + + /* + * If this file is being closed because e.g. the task who owned it + * died, let's wake everyone up who was waiting on us. 
+ */ + list_for_each_entry(knotif, &filter->notif->notifications, list) { + if (knotif->state == SECCOMP_NOTIFY_REPLIED) + continue; + + knotif->state = SECCOMP_NOTIFY_REPLIED; + knotif->error = -ENOSYS; + knotif->val = 0; + + complete(&knotif->ready); + } + + wake_up_all(&filter->notif->wqh); + kfree(filter->notif); + filter->notif = NULL; + mutex_unlock(&filter->notify_lock); + __put_seccomp_filter(filter); + return 0; +} + +static long seccomp_notify_recv(struct seccomp_filter *filter, + unsigned long arg) +{ + struct seccomp_knotif *knotif = NULL, *cur; + struct seccomp_notif unotif = {}; + ssize_t ret; + u16 size; + void __user *buf = (void __user *)arg; + + if (copy_from_user(&size, buf, sizeof(size))) + return -EFAULT; + + ret = down_interruptible(&filter->notif->request); + if (ret < 0) + return ret; + + mutex_lock(&filter->notify_lock); + list_for_each_entry(cur, &filter->notif->notifications, list) { + if (cur->state == SECCOMP_NOTIFY_INIT) { + knotif = cur; + break; + } + } + + /* + * If we didn't find a notification, it could be that the task was + * interrupted between the time we were woken and when we were able to + * acquire the rw lock. 
+ */ + if (!knotif) { + ret = -ENOENT; + goto out; + } + + size = min_t(size_t, size, sizeof(unotif)); + + unotif.len = size; + unotif.id = knotif->id; + unotif.pid = task_pid_vnr(knotif->task); + unotif.signaled = knotif->signaled; + unotif.data = *(knotif->data); + + if (copy_to_user(buf, &unotif, size)) { + ret = -EFAULT; + goto out; + } + + ret = size; + knotif->state = SECCOMP_NOTIFY_SENT; + wake_up_poll(&filter->notif->wqh, EPOLLOUT | EPOLLWRNORM); + + +out: + mutex_unlock(&filter->notify_lock); + return ret; +} + +static long seccomp_notify_send(struct seccomp_filter *filter, + unsigned long arg) +{ + struct seccomp_notif_resp resp = {}; + struct seccomp_knotif *knotif = NULL; + long ret; + u16 size; + void __user *buf = (void __user *)arg; + + if (copy_from_user(&size, buf, sizeof(size))) + return -EFAULT; + size = min_t(size_t, size, sizeof(resp)); + if (copy_from_user(&resp, buf, size)) + return -EFAULT; + + ret = mutex_lock_interruptible(&filter->notify_lock); + if (ret < 0) + return ret; + + list_for_each_entry(knotif, &filter->notif->notifications, list) { + if (knotif->id == resp.id) + break; + } + + if (!knotif || knotif->id != resp.id) { + ret = -ENOENT; + goto out; + } + + /* Allow exactly one reply. 
*/ + if (knotif->state != SECCOMP_NOTIFY_SENT) { + ret = -EINPROGRESS; + goto out; + } + + ret = size; + knotif->state = SECCOMP_NOTIFY_REPLIED; + knotif->error = resp.error; + knotif->val = resp.val; + complete(&knotif->ready); +out: + mutex_unlock(&filter->notify_lock); + return ret; +} + +static long seccomp_notify_id_valid(struct seccomp_filter *filter, + unsigned long arg) +{ + struct seccomp_knotif *knotif = NULL; + void __user *buf = (void __user *)arg; + u64 id; + long ret; + + if (copy_from_user(&id, buf, sizeof(id))) + return -EFAULT; + + ret = mutex_lock_interruptible(&filter->notify_lock); + if (ret < 0) + return ret; + + ret = -1; + list_for_each_entry(knotif, &filter->notif->notifications, list) { + if (knotif->id == id) { + ret = 0; + goto out; + } + } + +out: + mutex_unlock(&filter->notify_lock); + return ret; +} + +static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) +{ + struct seccomp_filter *filter = file->private_data; + + switch (cmd) { + case SECCOMP_NOTIF_RECV: + return seccomp_notify_recv(filter, arg); + case SECCOMP_NOTIF_SEND: + return seccomp_notify_send(filter, arg); + case SECCOMP_NOTIF_ID_VALID: + return seccomp_notify_id_valid(filter, arg); + default: + return -EINVAL; + } +} + +static __poll_t seccomp_notify_poll(struct file *file, + struct poll_table_struct *poll_tab) +{ + struct seccomp_filter *filter = file->private_data; + __poll_t ret = 0; + struct seccomp_knotif *cur; + + poll_wait(file, &filter->notif->wqh, poll_tab); + + ret = mutex_lock_interruptible(&filter->notify_lock); + if (ret < 0) + return ret; + + list_for_each_entry(cur, &filter->notif->notifications, list) { + if (cur->state == SECCOMP_NOTIFY_INIT) + ret |= EPOLLIN | EPOLLRDNORM; + if (cur->state == SECCOMP_NOTIFY_SENT) + ret |= EPOLLOUT | EPOLLWRNORM; + if (ret & EPOLLIN && ret & EPOLLOUT) + break; + } + + mutex_unlock(&filter->notify_lock); + + return ret; +} + +static const struct file_operations seccomp_notify_ops = { + 
.poll = seccomp_notify_poll, + .release = seccomp_notify_release, + .unlocked_ioctl = seccomp_notify_ioctl, +}; + +static struct file *init_listener(struct task_struct *task, + struct seccomp_filter *filter) +{ + struct file *ret = ERR_PTR(-EBUSY); + struct seccomp_filter *cur, *last_locked = NULL; + int filter_nesting = 0; + + for (cur = task->seccomp.filter; cur; cur = cur->prev) { + mutex_lock_nested(&cur->notify_lock, filter_nesting); + filter_nesting++; + last_locked = cur; + if (cur->notif) + goto out; + } + + ret = ERR_PTR(-ENOMEM); + filter->notif = kzalloc(sizeof(*(filter->notif)), GFP_KERNEL); + if (!filter->notif) + goto out; + + sema_init(&filter->notif->request, 0); + INIT_LIST_HEAD(&filter->notif->notifications); + filter->notif->next_id = get_random_u64(); + init_waitqueue_head(&filter->notif->wqh); + + ret = anon_inode_getfile("seccomp notify", &seccomp_notify_ops, + filter, O_RDWR); + if (IS_ERR(ret)) + goto out; + + + /* The file has a reference to it now */ + __get_seccomp_filter(filter); + +out: + for (cur = task->seccomp.filter; cur; cur = cur->prev) { + mutex_unlock(&cur->notify_lock); + if (cur == last_locked) + break; + } + + return ret; +} +#endif diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index e1473234968d..5f4b836a6792 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -5,6 +5,7 @@ * Test code for seccomp bpf. 
 */

+#define _GNU_SOURCE
 #include

 /*
@@ -40,10 +41,12 @@
 #include
 #include
 #include
+#include
+#include

-#define _GNU_SOURCE
 #include
 #include
+#include

 #include "../kselftest_harness.h"

@@ -154,6 +157,34 @@ struct seccomp_metadata {
 };
 #endif

+#ifndef SECCOMP_FILTER_FLAG_NEW_LISTENER
+#define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3)
+
+#define SECCOMP_RET_USER_NOTIF 0x7fc00000U
+
+#define SECCOMP_IOC_MAGIC		0xF7
+#define SECCOMP_NOTIF_RECV	_IOWR(SECCOMP_IOC_MAGIC, 0,	\
+					struct seccomp_notif)
+#define SECCOMP_NOTIF_SEND	_IOWR(SECCOMP_IOC_MAGIC, 1,	\
+					struct seccomp_notif_resp)
+#define SECCOMP_NOTIF_ID_VALID	_IOR(SECCOMP_IOC_MAGIC, 2,	\
+					__u64)
+struct seccomp_notif {
+	__u16 len;
+	__u64 id;
+	__u32 pid;
+	__u8 signaled;
+	struct seccomp_data data;
+};
+
+struct seccomp_notif_resp {
+	__u16 len;
+	__u64 id;
+	__s32 error;
+	__s64 val;
+};
+#endif
+
 #ifndef seccomp
 int seccomp(unsigned int op, unsigned int flags, void *args)
 {
@@ -2077,7 +2108,8 @@ TEST(detect_seccomp_filter_flags)
 {
 	unsigned int flags[] = { SECCOMP_FILTER_FLAG_TSYNC,
 				 SECCOMP_FILTER_FLAG_LOG,
-				 SECCOMP_FILTER_FLAG_SPEC_ALLOW };
+				 SECCOMP_FILTER_FLAG_SPEC_ALLOW,
+				 SECCOMP_FILTER_FLAG_NEW_LISTENER };
 	unsigned int flag, all_flags;
 	int i;
 	long ret;
@@ -2933,6 +2965,383 @@ TEST(get_metadata)
 	ASSERT_EQ(0, kill(pid, SIGKILL));
 }

+static int user_trap_syscall(int nr, unsigned int flags)
+{
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_USER_NOTIF),
+		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
+	};
+
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+
+	return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog);
+}
+
+static int read_notif(int listener, struct seccomp_notif *req)
+{
+	int ret;
+
+	do {
+		errno = 0;
+		req->len = sizeof(*req);
+		ret = ioctl(listener, SECCOMP_NOTIF_RECV, req);
+	} while (ret == -1 && errno == ENOENT);
+	return ret;
+}
+
+static void signal_handler(int signal)
+{
+}
+
+#define USER_NOTIF_MAGIC 116983961184613L
+TEST(get_user_notification_syscall)
+{
+	pid_t pid;
+	long ret;
+	int status, listener;
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+	struct pollfd pollfd;
+
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	/* Check that we get -ENOSYS with no listener attached */
+	if (pid == 0) {
+		if (user_trap_syscall(__NR_getpid, 0) < 0)
+			exit(1);
+		ret = syscall(__NR_getpid);
+		exit(ret >= 0 || errno != ENOSYS);
+	}
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	/* Add some no-op filters so that we (don't) trigger lockdep. */
+	EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0);
+	EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0);
+	EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0);
+	EXPECT_EQ(seccomp(SECCOMP_SET_MODE_FILTER, 0, &prog), 0);
+
+	/* Check that the basic notification machinery works */
+	listener = user_trap_syscall(__NR_getpid,
+				     SECCOMP_FILTER_FLAG_NEW_LISTENER);
+	EXPECT_GE(listener, 0);
+
+	/* Installing a second listener in the chain should EBUSY */
+	EXPECT_EQ(user_trap_syscall(__NR_getpid,
+				    SECCOMP_FILTER_FLAG_NEW_LISTENER),
+		  -1);
+	EXPECT_EQ(errno, EBUSY);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		ret = syscall(__NR_getpid);
+		exit(ret != USER_NOTIF_MAGIC);
+	}
+
+	pollfd.fd = listener;
+	pollfd.events = POLLIN | POLLOUT;
+
+	EXPECT_GT(poll(&pollfd, 1, -1), 0);
+	EXPECT_EQ(pollfd.revents, POLLIN);
+
+	req.len = sizeof(req);
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_RECV, &req), sizeof(req));
+
+	pollfd.fd = listener;
+	pollfd.events = POLLIN | POLLOUT;
+
+	EXPECT_GT(poll(&pollfd, 1, -1), 0);
+	EXPECT_EQ(pollfd.revents, POLLOUT);
+
+	EXPECT_EQ(req.data.nr, __NR_getpid);
+
+	resp.len = sizeof(resp);
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp));
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	/*
+	 * Check that nothing bad happens when we kill the task in the middle
+	 * of a syscall.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		ret = syscall(__NR_getpid);
+		exit(ret != USER_NOTIF_MAGIC);
+	}
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_RECV, &req), sizeof(req));
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_ID_VALID, &req.id), 0);
+
+	EXPECT_EQ(kill(pid, SIGKILL), 0);
+	EXPECT_EQ(waitpid(pid, NULL, 0), pid);
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_ID_VALID, &req.id), -1);
+
+	resp.id = req.id;
+	ret = ioctl(listener, SECCOMP_NOTIF_SEND, &resp);
+	EXPECT_EQ(ret, -1);
+	EXPECT_EQ(errno, ENOENT);
+
+	/*
+	 * Check that we get another notification about a signal in the middle
+	 * of a syscall.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		if (signal(SIGUSR1, signal_handler) == SIG_ERR) {
+			perror("signal");
+			exit(1);
+		}
+		ret = syscall(__NR_getpid);
+		exit(ret != USER_NOTIF_MAGIC);
+	}
+
+	ret = read_notif(listener, &req);
+	EXPECT_EQ(ret, sizeof(req));
+	EXPECT_EQ(errno, 0);
+
+	EXPECT_EQ(kill(pid, SIGUSR1), 0);
+
+	ret = read_notif(listener, &req);
+	EXPECT_EQ(req.signaled, 1);
+	EXPECT_EQ(ret, sizeof(req));
+	EXPECT_EQ(errno, 0);
+
+	resp.len = sizeof(resp);
+	resp.id = req.id;
+	resp.error = -512; /* -ERESTARTSYS */
+	resp.val = 0;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp));
+
+	ret = read_notif(listener, &req);
+	resp.len = sizeof(resp);
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+	ret = ioctl(listener, SECCOMP_NOTIF_SEND, &resp);
+	EXPECT_EQ(ret, sizeof(resp));
+	EXPECT_EQ(errno, 0);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	/*
+	 * Check that we get an ENOSYS when the listener is closed.
+	 */
+	pid = fork();
+	ASSERT_GE(pid, 0);
+	if (pid == 0) {
+		close(listener);
+		ret = syscall(__NR_getpid);
+		/* Fail unless the call returned -1 with errno == ENOSYS. */
+		exit(ret != -1 || errno != ENOSYS);
+	}
+
+	close(listener);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
+/*
+ * Check that a pid in a child namespace still shows up as valid in ours.
+ */
+TEST(user_notification_child_pid_ns)
+{
+	pid_t pid;
+	int status, listener;
+	int sk_pair[2];
+	char c;
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+
+	ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+	ASSERT_EQ(unshare(CLONE_NEWPID), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		EXPECT_EQ(user_trap_syscall(__NR_getpid, 0), 0);
+
+		/* Signal we're ready and have installed the filter. */
+		EXPECT_EQ(write(sk_pair[1], "J", 1), 1);
+
+		EXPECT_EQ(read(sk_pair[1], &c, 1), 1);
+		EXPECT_EQ(c, 'H');
+
+		exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC);
+	}
+
+	EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
+	EXPECT_EQ(c, 'J');
+
+	EXPECT_EQ(ptrace(PTRACE_ATTACH, pid), 0);
+	EXPECT_EQ(waitpid(pid, NULL, 0), pid);
+	listener = ptrace(PTRACE_SECCOMP_NEW_LISTENER, pid, 0);
+	EXPECT_GE(listener, 0);
+	EXPECT_EQ(ptrace(PTRACE_DETACH, pid, NULL, 0), 0);
+
+	/* Now signal we are done and respond with magic */
+	EXPECT_EQ(write(sk_pair[0], "H", 1), 1);
+
+	req.len = sizeof(req);
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_RECV, &req), sizeof(req));
+	EXPECT_EQ(req.pid, pid);
+
+	resp.len = sizeof(resp);
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp));
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+	close(listener);
+}
+
+/*
+ * Check that a pid in a sibling (i.e. unrelated) namespace shows up as 0, i.e.
+ * invalid.
+ */
+TEST(user_notification_sibling_pid_ns)
+{
+	pid_t pid, pid2;
+	int status, listener;
+	int sk_pair[2];
+	char c;
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+
+	ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		int child_pair[2];
+
+		ASSERT_EQ(unshare(CLONE_NEWPID), 0);
+
+		ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, child_pair), 0);
+
+		pid2 = fork();
+		ASSERT_GE(pid2, 0);
+
+		if (pid2 == 0) {
+			close(child_pair[0]);
+			EXPECT_EQ(user_trap_syscall(__NR_getpid, 0), 0);
+
+			/* Signal we're ready and have installed the filter. */
+			EXPECT_EQ(write(child_pair[1], "J", 1), 1);
+
+			EXPECT_EQ(read(child_pair[1], &c, 1), 1);
+			EXPECT_EQ(c, 'H');
+
+			exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC);
+		}
+
+		/* check that child has installed the filter */
+		EXPECT_EQ(read(child_pair[0], &c, 1), 1);
+		EXPECT_EQ(c, 'J');
+
+		/* tell parent who child is */
+		EXPECT_EQ(write(sk_pair[1], &pid2, sizeof(pid2)), sizeof(pid2));
+
+		/* parent has installed listener, tell child to call syscall */
+		EXPECT_EQ(read(sk_pair[1], &c, 1), 1);
+		EXPECT_EQ(c, 'H');
+		EXPECT_EQ(write(child_pair[0], "H", 1), 1);
+
+		EXPECT_EQ(waitpid(pid2, &status, 0), pid2);
+		EXPECT_EQ(true, WIFEXITED(status));
+		EXPECT_EQ(0, WEXITSTATUS(status));
+		exit(WEXITSTATUS(status));
+	}
+
+	EXPECT_EQ(read(sk_pair[0], &pid2, sizeof(pid2)), sizeof(pid2));
+
+	EXPECT_EQ(ptrace(PTRACE_ATTACH, pid2), 0);
+	EXPECT_EQ(waitpid(pid2, NULL, 0), pid2);
+	listener = ptrace(PTRACE_SECCOMP_NEW_LISTENER, pid2, 0);
+	EXPECT_GE(listener, 0);
+	EXPECT_EQ(errno, 0);
+	EXPECT_EQ(ptrace(PTRACE_DETACH, pid2, NULL, 0), 0);
+
+	/* Create the sibling ns, and sibling in it. */
+	EXPECT_EQ(unshare(CLONE_NEWPID), 0);
+	EXPECT_EQ(errno, 0);
+
+	pid2 = fork();
+	EXPECT_GE(pid2, 0);
+
+	if (pid2 == 0) {
+		req.len = sizeof(req);
+		ASSERT_EQ(ioctl(listener, SECCOMP_NOTIF_RECV, &req), sizeof(req));
+		/*
+		 * The pid should be 0, i.e. the task is in some namespace that
+		 * we can't "see".
+		 */
+		ASSERT_EQ(req.pid, 0);
+
+		resp.len = sizeof(resp);
+		resp.id = req.id;
+		resp.error = 0;
+		resp.val = USER_NOTIF_MAGIC;
+
+		ASSERT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp));
+		exit(0);
+	}
+
+	close(listener);
+
+	/* Now signal we are done setting up sibling listener.
*/
+	EXPECT_EQ(write(sk_pair[0], "H", 1), 1);
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	EXPECT_EQ(waitpid(pid2, &status, 0), pid2);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+}
+
+
 /*
  * TODO:
  * - add microbenchmarks

From patchwork Thu Sep 27 15:11:15 2018
X-Patchwork-Submitter: Tycho Andersen
X-Patchwork-Id: 10618145
From: Tycho Andersen
To: Kees Cook
Cc: linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org,
 linux-api@vger.kernel.org, Andy Lutomirski, Oleg Nesterov,
 "Eric W . Biederman", "Serge E . Hallyn", Christian Brauner,
 Tyler Hicks, Akihiro Suda, Jann Horn, linux-fsdevel@vger.kernel.org,
 Tycho Andersen
Subject: [PATCH v7 2/6] seccomp: make get_nth_filter available outside of CHECKPOINT_RESTORE
Date: Thu, 27 Sep 2018 09:11:15 -0600
Message-Id: <20180927151119.9989-3-tycho@tycho.ws>
In-Reply-To: <20180927151119.9989-1-tycho@tycho.ws>
References: <20180927151119.9989-1-tycho@tycho.ws>

In the next commit we'll use this same mnemonic to get a listener for the
nth filter, so we need it available outside of CHECKPOINT_RESTORE in the
USER_NOTIFICATION case as well.

v2: new in v2
v3: no changes
v4: no changes
v5: switch to CHECKPOINT_RESTORE || USER_NOTIFICATION to avoid warning when
    only CONFIG_SECCOMP_FILTER is enabled.
v7: drop USER_NOTIFICATION bits

Signed-off-by: Tycho Andersen
CC: Kees Cook
CC: Andy Lutomirski
CC: Oleg Nesterov
CC: Eric W. Biederman
CC: "Serge E.
Hallyn"
CC: Christian Brauner
CC: Tyler Hicks
CC: Akihiro Suda
Reviewed-by: Jann Horn
Acked-by: Christian Brauner
---
 kernel/seccomp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index fa6fe9756c80..44a31ac8373a 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1158,7 +1158,7 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 	return do_seccomp(op, 0, uargs);
 }

-#if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
+#if defined(CONFIG_SECCOMP_FILTER)
 static struct seccomp_filter *get_nth_filter(struct task_struct *task,
 					     unsigned long filter_off)
 {
@@ -1205,6 +1205,7 @@ static struct seccomp_filter *get_nth_filter(struct task_struct *task,
 	return filter;
 }

+#if defined(CONFIG_CHECKPOINT_RESTORE)
 long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
 			void __user *data)
 {
@@ -1277,7 +1278,8 @@ long seccomp_get_metadata(struct task_struct *task,
 	__put_seccomp_filter(filter);
 	return ret;
 }
-#endif
+#endif /* CONFIG_CHECKPOINT_RESTORE */
+#endif /* CONFIG_SECCOMP_FILTER */

 #ifdef CONFIG_SYSCTL

From patchwork Thu Sep 27 15:11:16 2018
X-Patchwork-Submitter: Tycho Andersen
X-Patchwork-Id: 10618143
From: Tycho Andersen
To: Kees Cook
Cc: linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org,
 linux-api@vger.kernel.org, Andy Lutomirski, Oleg Nesterov,
 "Eric W . Biederman", "Serge E . Hallyn", Christian Brauner,
 Tyler Hicks, Akihiro Suda, Jann Horn, linux-fsdevel@vger.kernel.org,
 Tycho Andersen
Subject: [PATCH v7 3/6] seccomp: add a way to get a listener fd from ptrace
Date: Thu, 27 Sep 2018 09:11:16 -0600
Message-Id: <20180927151119.9989-4-tycho@tycho.ws>
In-Reply-To: <20180927151119.9989-1-tycho@tycho.ws>
References: <20180927151119.9989-1-tycho@tycho.ws>

As an alternative to SECCOMP_FILTER_FLAG_NEW_LISTENER, perhaps a ptrace()
version which can acquire a listener for an already-installed filter is
useful. There are at least two reasons this is preferable, even though it
uses ptrace:

1. You can control tasks that aren't cooperating with you
2. You can control tasks whose filters block sendmsg() and socket(); if the
   task installs a filter which blocks these calls, there's no way with
   SECCOMP_FILTER_FLAG_NEW_LISTENER to get the fd out to the privileged task.
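
Point 2 rests on how a cooperating task would normally hand the listener fd
to its supervisor: sendmsg() with an SCM_RIGHTS control message over a unix
socket. The sketch below is not part of this series; it only illustrates
that hand-off, with a pipe fd standing in for the listener fd. The names
send_fd(), recv_fd() and demo_roundtrip() are made up for this example. A
filter that denies sendmsg() would make send_fd() fail, which is the gap the
ptrace path is meant to close.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: pass one fd over a connected AF_UNIX socket. */
int send_fd(int sock, int fd)
{
	char byte = 'F';	/* one byte of ordinary data carries the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	/* This is the call a restrictive seccomp filter could deny. */
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = u.buf,
		.msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg;
	int fd;

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Send a pipe's write end across a socketpair and use the received copy. */
int demo_roundtrip(void)
{
	int sk[2], p[2], passed;
	char c = 0;
	int ok;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sk) < 0)
		return -1;
	if (pipe(p) < 0)
		return -1;
	if (send_fd(sk[0], p[1]) < 0)
		return -1;
	passed = recv_fd(sk[1]);
	if (passed < 0)
		return -1;

	/* Writing through the received fd must show up on the pipe. */
	ok = write(passed, "J", 1) == 1 && read(p[0], &c, 1) == 1 && c == 'J';

	close(passed);
	close(p[0]);
	close(p[1]);
	close(sk[0]);
	close(sk[1]);
	return ok ? 0 : -1;
}
```

demo_roundtrip() returns 0 when the descriptor survives the trip. With the
ptrace request proposed here the supervisor creates the listener in its own
fd table, so the target never has to make these calls.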

v2: fix a bug where listener mode was not unset when an unused fd was not
    available
v3: fix refcounting bug (Oleg)
v4: * change the listener's fd flags to be 0
    * rename GET_LISTENER to NEW_LISTENER (Matthew)
v5: * add capable(CAP_SYS_ADMIN) requirement
v7: * point the new listener at the right filter (Jann)

Signed-off-by: Tycho Andersen
CC: Kees Cook
CC: Andy Lutomirski
CC: Oleg Nesterov
CC: Eric W. Biederman
CC: "Serge E. Hallyn"
CC: Christian Brauner
CC: Tyler Hicks
CC: Akihiro Suda
Reviewed-by: Jann Horn
---
 include/linux/seccomp.h                       |  7 ++
 include/uapi/linux/ptrace.h                   |  2 +
 kernel/ptrace.c                               |  4 ++
 kernel/seccomp.c                              | 31 +++++++++
 tools/testing/selftests/seccomp/seccomp_bpf.c | 68 +++++++++++++++++++
 5 files changed, 112 insertions(+)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 017444b5efed..234c61b37405 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -83,6 +83,8 @@ static inline int seccomp_mode(struct seccomp *s)
 #ifdef CONFIG_SECCOMP_FILTER
 extern void put_seccomp_filter(struct task_struct *tsk);
 extern void get_seccomp_filter(struct task_struct *tsk);
+extern long seccomp_new_listener(struct task_struct *task,
+				 unsigned long filter_off);
 #else /* CONFIG_SECCOMP_FILTER */
 static inline void put_seccomp_filter(struct task_struct *tsk)
 {
@@ -92,6 +94,11 @@ static inline void get_seccomp_filter(struct task_struct *tsk)
 {
 	return;
 }
+static inline long seccomp_new_listener(struct task_struct *task,
+					unsigned long filter_off)
+{
+	return -EINVAL;
+}
 #endif /* CONFIG_SECCOMP_FILTER */

 #if defined(CONFIG_SECCOMP_FILTER) && defined(CONFIG_CHECKPOINT_RESTORE)
diff --git a/include/uapi/linux/ptrace.h b/include/uapi/linux/ptrace.h
index d5a1b8a492b9..e80ecb1bd427 100644
--- a/include/uapi/linux/ptrace.h
+++ b/include/uapi/linux/ptrace.h
@@ -73,6 +73,8 @@ struct seccomp_metadata {
 	__u64 flags;		/* Output: filter's flags */
 };

+#define PTRACE_SECCOMP_NEW_LISTENER	0x420e
+
 /* Read signals from a shared (process wide) queue */
 #define PTRACE_PEEKSIGINFO_SHARED	(1 << 0)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 21fec73d45d4..289960ac181b 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1096,6 +1096,10 @@ int ptrace_request(struct task_struct *child, long request,
 		ret = seccomp_get_metadata(child, addr, datavp);
 		break;

+	case PTRACE_SECCOMP_NEW_LISTENER:
+		ret = seccomp_new_listener(child, addr);
+		break;
+
 	default:
 		break;
 	}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 44a31ac8373a..17685803a2af 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1777,4 +1777,35 @@ static struct file *init_listener(struct task_struct *task,

 	return ret;
 }
+
+long seccomp_new_listener(struct task_struct *task,
+			  unsigned long filter_off)
+{
+	struct seccomp_filter *filter;
+	struct file *listener;
+	int fd;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	filter = get_nth_filter(task, filter_off);
+	if (IS_ERR(filter))
+		return PTR_ERR(filter);
+
+	fd = get_unused_fd_flags(0);
+	if (fd < 0) {
+		__put_seccomp_filter(filter);
+		return fd;
+	}
+
+	listener = init_listener(task, filter);
+	__put_seccomp_filter(filter);
+	if (IS_ERR(listener)) {
+		put_unused_fd(fd);
+		return PTR_ERR(listener);
+	}
+
+	fd_install(fd, listener);
+	return fd;
+}
 #endif
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 5f4b836a6792..c6ba3ed5392e 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -193,6 +193,10 @@ int seccomp(unsigned int op, unsigned int flags, void *args)
 }
 #endif

+#ifndef PTRACE_SECCOMP_NEW_LISTENER
+#define PTRACE_SECCOMP_NEW_LISTENER 0x420e
+#endif
+
 #if __BYTE_ORDER == __LITTLE_ENDIAN
 #define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
 #elif __BYTE_ORDER == __BIG_ENDIAN
@@ -3175,6 +3179,70 @@ TEST(get_user_notification_syscall)
 	EXPECT_EQ(0, WEXITSTATUS(status));
 }

+TEST(get_user_notification_ptrace)
+{
+	pid_t pid;
+	int status, listener;
+	int sk_pair[2];
+	char c;
+	struct seccomp_notif req = {};
+	struct seccomp_notif_resp resp = {};
+
+	ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0);
+
+	pid = fork();
+	ASSERT_GE(pid, 0);
+
+	if (pid == 0) {
+		EXPECT_EQ(user_trap_syscall(__NR_getpid, 0), 0);
+
+		/* Test that we get ENOSYS while not attached */
+		EXPECT_EQ(syscall(__NR_getpid), -1);
+		EXPECT_EQ(errno, ENOSYS);
+
+		/* Signal we're ready and have installed the filter. */
+		EXPECT_EQ(write(sk_pair[1], "J", 1), 1);
+
+		EXPECT_EQ(read(sk_pair[1], &c, 1), 1);
+		EXPECT_EQ(c, 'H');
+
+		exit(syscall(__NR_getpid) != USER_NOTIF_MAGIC);
+	}
+
+	EXPECT_EQ(read(sk_pair[0], &c, 1), 1);
+	EXPECT_EQ(c, 'J');
+
+	EXPECT_EQ(ptrace(PTRACE_ATTACH, pid), 0);
+	EXPECT_EQ(waitpid(pid, NULL, 0), pid);
+	listener = ptrace(PTRACE_SECCOMP_NEW_LISTENER, pid, 0);
+	EXPECT_GE(listener, 0);
+
+	/* EBUSY for second listener */
+	EXPECT_EQ(ptrace(PTRACE_SECCOMP_NEW_LISTENER, pid, 0), -1);
+	EXPECT_EQ(errno, EBUSY);
+
+	EXPECT_EQ(ptrace(PTRACE_DETACH, pid, NULL, 0), 0);
+
+	/* Now signal we are done and respond with magic */
+	EXPECT_EQ(write(sk_pair[0], "H", 1), 1);
+
+	req.len = sizeof(req);
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_RECV, &req), sizeof(req));
+
+	resp.len = sizeof(resp);
+	resp.id = req.id;
+	resp.error = 0;
+	resp.val = USER_NOTIF_MAGIC;
+
+	EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp));
+
+	EXPECT_EQ(waitpid(pid, &status, 0), pid);
+	EXPECT_EQ(true, WIFEXITED(status));
+	EXPECT_EQ(0, WEXITSTATUS(status));
+
+	close(listener);
+}
+
 /*
  * Check that a pid in a child namespace still shows up as valid in ours.
  */

From patchwork Thu Sep 27 15:11:17 2018
X-Patchwork-Submitter: Tycho Andersen
X-Patchwork-Id: 10618141
From: Tycho Andersen
To: Kees Cook
Cc: linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org,
 linux-api@vger.kernel.org, Andy Lutomirski, Oleg Nesterov,
 "Eric W . Biederman", "Serge E . Hallyn", Christian Brauner,
 Tyler Hicks, Akihiro Suda, Jann Horn, linux-fsdevel@vger.kernel.org,
 Tycho Andersen, Alexander Viro
Subject: [PATCH v7 4/6] files: add a replace_fd_files() function
Date: Thu, 27 Sep 2018 09:11:17 -0600
Message-Id: <20180927151119.9989-5-tycho@tycho.ws>
In-Reply-To: <20180927151119.9989-1-tycho@tycho.ws>
References: <20180927151119.9989-1-tycho@tycho.ws>

Similar to fd_install/__fd_install, we want to be able to replace an fd of an
arbitrary struct files_struct, not just current's. We'll use this in the next
patch to implement the seccomp ioctl that allows inserting fds into a stopped
process' context.

v7: new in v7

Signed-off-by: Tycho Andersen
CC: Alexander Viro
CC: Kees Cook
CC: Andy Lutomirski
CC: Oleg Nesterov
CC: Eric W. Biederman
CC: "Serge E. Hallyn"
CC: Christian Brauner
CC: Tyler Hicks
CC: Akihiro Suda
---
 fs/file.c            | 22 +++++++++++++++-------
 include/linux/file.h |  8 ++++++++
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 7ffd6e9d103d..3b3c5aadaadb 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -850,24 +850,32 @@ __releases(&files->file_lock)
 }

 int replace_fd(unsigned fd, struct file *file, unsigned flags)
+{
+	return replace_fd_task(current, fd, file, flags);
+}
+
+/*
+ * Same warning as __alloc_fd()/__fd_install() here.
+ */
+int replace_fd_task(struct task_struct *task, unsigned fd,
+		    struct file *file, unsigned flags)
 {
 	int err;
-	struct files_struct *files = current->files;

 	if (!file)
-		return __close_fd(files, fd);
+		return __close_fd(task->files, fd);

-	if (fd >= rlimit(RLIMIT_NOFILE))
+	if (fd >= task_rlimit(task, RLIMIT_NOFILE))
 		return -EBADF;

-	spin_lock(&files->file_lock);
-	err = expand_files(files, fd);
+	spin_lock(&task->files->file_lock);
+	err = expand_files(task->files, fd);
 	if (unlikely(err < 0))
 		goto out_unlock;
-	return do_dup2(files, file, fd, flags);
+	return do_dup2(task->files, file, fd, flags);

 out_unlock:
-	spin_unlock(&files->file_lock);
+	spin_unlock(&task->files->file_lock);
 	return err;
 }

diff --git a/include/linux/file.h b/include/linux/file.h
index 6b2fb032416c..f94277fee038 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -11,6 +11,7 @@
 #include

 struct file;
+struct task_struct;

 extern void fput(struct file *);

@@ -79,6 +80,13 @@ static inline void fdput_pos(struct fd f)

 extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
 extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
+/*
+ * Warning! This is only safe if you know the owner of the files_struct is
+ * stopped outside syscall context. It's a very bad idea to use this unless you
+ * have similar guarantees in your code.
+ */
+extern int replace_fd_task(struct task_struct *task, unsigned fd,
+			   struct file *file, unsigned flags);
 extern void set_close_on_exec(unsigned int fd, int flag);
 extern bool get_close_on_exec(unsigned int fd);
 extern int get_unused_fd_flags(unsigned flags);

From patchwork Thu Sep 27 15:11:18 2018
X-Patchwork-Submitter: Tycho Andersen
X-Patchwork-Id: 10618137
From: Tycho Andersen
To: Kees Cook
Cc: linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org,
 linux-api@vger.kernel.org, Andy Lutomirski, Oleg Nesterov,
 "Eric W . Biederman", "Serge E . Hallyn", Christian Brauner,
 Tyler Hicks, Akihiro Suda, Jann Horn, linux-fsdevel@vger.kernel.org,
 Tycho Andersen
Subject: [PATCH v7 5/6] seccomp: add a way to pass FDs via a notification fd
Date: Thu, 27 Sep 2018 09:11:18 -0600
Message-Id: <20180927151119.9989-6-tycho@tycho.ws>
In-Reply-To: <20180927151119.9989-1-tycho@tycho.ws>
References: <20180927151119.9989-1-tycho@tycho.ws>

This patch adds a way to insert FDs into the tracee's process (also
close/overwrite fds for the tracee). This functionality is necessary to mock
things like socketpair() or dup2() or similar, but since it depends on
external (vfs) patches, I've left it as a separate patch as before so the
core functionality can still be merged while we argue about this. Except this
time it doesn't add any ugliness to the API :)

v7: new in v7

Signed-off-by: Tycho Andersen
CC: Kees Cook
CC: Andy Lutomirski
CC: Oleg Nesterov
CC: Eric W. Biederman
CC: "Serge E.
Hallyn" CC: Christian Brauner CC: Tyler Hicks CC: Akihiro Suda --- .../userspace-api/seccomp_filter.rst | 16 +++ include/uapi/linux/seccomp.h | 9 ++ kernel/seccomp.c | 54 ++++++++ tools/testing/selftests/seccomp/seccomp_bpf.c | 126 ++++++++++++++++++ 4 files changed, 205 insertions(+) diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst index d2e61f1c0a0b..383a8dbae304 100644 --- a/Documentation/userspace-api/seccomp_filter.rst +++ b/Documentation/userspace-api/seccomp_filter.rst @@ -237,6 +237,13 @@ The interface for a seccomp notification fd consists of two structures: __s64 val; }; + struct seccomp_notif_put_fd { + __u64 id; + __s32 fd; + __u32 fd_flags; + __s32 to_replace; + }; + Users can read via ``ioctl(SECCOMP_NOTIF_RECV)`` (or ``poll()``) on a seccomp notification fd to receive a ``struct seccomp_notif``, which contains five members: the input length of the structure, a unique-per-filter ``id``, the @@ -256,6 +263,15 @@ mentioned above in this document: all arguments being read from the tracee's memory should be read into the tracer's memory before any policy decisions are made. This allows for an atomic decision on syscall arguments. +Userspace can also insert (or overwrite) file descriptors of the tracee using +``ioctl(SECCOMP_NOTIF_PUT_FD)``. The ``id`` member is the request/pid to insert +the fd into. The ``fd`` is the fd in the listener's table to send or ``-1`` if +an fd should be closed instead. The ``to_replace`` fd is the fd in the tracee's +table that should be overwritten, or -1 if a new fd is installed. ``fd_flags`` +should be the flags that the fd in the tracee's table is opened with (e.g. +``O_CLOEXEC`` or similar). The return value from this ioctl is the fd number +that was installed. 
+ Sysctls ======= diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index d4ccb32fe089..91d77f041fbb 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -77,6 +77,13 @@ struct seccomp_notif_resp { __s64 val; }; +struct seccomp_notif_put_fd { + __u64 id; + __s32 fd; + __u32 fd_flags; + __s32 to_replace; +}; + #define SECCOMP_IOC_MAGIC 0xF7 /* Flags for seccomp notification fd ioctl. */ @@ -86,5 +93,7 @@ struct seccomp_notif_resp { struct seccomp_notif_resp) #define SECCOMP_NOTIF_ID_VALID _IOR(SECCOMP_IOC_MAGIC, 2, \ __u64) +#define SECCOMP_NOTIF_PUT_FD _IOR(SECCOMP_IOC_MAGIC, 3, \ + struct seccomp_notif_put_fd) #endif /* _UAPI_LINUX_SECCOMP_H */ diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 17685803a2af..07a05ad59731 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -41,6 +41,8 @@ #include #include #include +#include +#include enum notify_state { SECCOMP_NOTIFY_INIT, @@ -1684,6 +1686,56 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter, return ret; } +static long seccomp_notify_put_fd(struct seccomp_filter *filter, + unsigned long arg) +{ + struct seccomp_notif_put_fd req; + void __user *buf = (void __user *)arg; + struct seccomp_knotif *knotif = NULL; + long ret; + + if (copy_from_user(&req, buf, sizeof(req))) + return -EFAULT; + + if (req.fd < 0 && req.to_replace < 0) + return -EINVAL; + + ret = mutex_lock_interruptible(&filter->notify_lock); + if (ret < 0) + return ret; + + ret = -ENOENT; + list_for_each_entry(knotif, &filter->notif->notifications, list) { + struct file *file = NULL; + + if (knotif->id != req.id) + continue; + + if (req.fd >= 0) + file = fget(req.fd); + + if (req.to_replace >= 0) { + ret = replace_fd_task(knotif->task, req.to_replace, + file, req.fd_flags); + } else { + unsigned long max_files; + + max_files = task_rlimit(knotif->task, RLIMIT_NOFILE); + ret = __alloc_fd(knotif->task->files, 0, max_files, + req.fd_flags); + if (ret < 0) + break; + + 
__fd_install(knotif->task->files, ret, file); + } + + break; + } + + mutex_unlock(&filter->notify_lock); + return ret; +} + static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -1696,6 +1748,8 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, return seccomp_notify_send(filter, arg); case SECCOMP_NOTIF_ID_VALID: return seccomp_notify_id_valid(filter, arg); + case SECCOMP_NOTIF_PUT_FD: + return seccomp_notify_put_fd(filter, arg); default: return -EINVAL; } diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index c6ba3ed5392e..cd1322c02b92 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -43,6 +43,7 @@ #include #include #include +#include #include #include @@ -169,6 +170,9 @@ struct seccomp_metadata { struct seccomp_notif_resp) #define SECCOMP_NOTIF_ID_VALID _IOR(SECCOMP_IOC_MAGIC, 2, \ __u64) +#define SECCOMP_NOTIF_PUT_FD _IOR(SECCOMP_IOC_MAGIC, 3, \ + struct seccomp_notif_put_fd) + struct seccomp_notif { __u16 len; __u64 id; @@ -183,6 +187,13 @@ struct seccomp_notif_resp { __s32 error; __s64 val; }; + +struct seccomp_notif_put_fd { + __u64 id; + __s32 fd; + __u32 fd_flags; + __s32 to_replace; +}; #endif #ifndef seccomp @@ -193,6 +204,14 @@ int seccomp(unsigned int op, unsigned int flags, void *args) } #endif +#ifndef kcmp +int kcmp(pid_t pid1, pid_t pid2, int type, unsigned long idx1, + unsigned long idx2) +{ + return syscall(__NR_kcmp, pid1, pid2, type, idx1, idx2); +} +#endif + #ifndef PTRACE_SECCOMP_NEW_LISTENER #define PTRACE_SECCOMP_NEW_LISTENER 0x420e #endif @@ -3243,6 +3262,113 @@ TEST(get_user_notification_ptrace) close(listener); } +TEST(user_notification_pass_fd) +{ + pid_t pid; + int status, listener, fd; + int sk_pair[2]; + char c; + struct seccomp_notif req = {}; + struct seccomp_notif_resp resp = {}; + struct seccomp_notif_put_fd putfd = {}; + long ret; + + 
ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0); + + pid = fork(); + ASSERT_GE(pid, 0); + + if (pid == 0) { + int fd; + char buf[16]; + + EXPECT_EQ(user_trap_syscall(__NR_getpid, 0), 0); + + /* Signal we're ready and have installed the filter. */ + EXPECT_EQ(write(sk_pair[1], "J", 1), 1); + + EXPECT_EQ(read(sk_pair[1], &c, 1), 1); + EXPECT_EQ(c, 'H'); + close(sk_pair[1]); + + /* An fd from getpid(). Let the games begin. */ + fd = syscall(__NR_getpid); + EXPECT_GT(fd, 0); + EXPECT_EQ(read(fd, buf, sizeof(buf)), 12); + close(fd); + + exit(strcmp("hello world", buf)); + } + + EXPECT_EQ(read(sk_pair[0], &c, 1), 1); + EXPECT_EQ(c, 'J'); + + EXPECT_EQ(ptrace(PTRACE_ATTACH, pid), 0); + EXPECT_EQ(waitpid(pid, NULL, 0), pid); + listener = ptrace(PTRACE_SECCOMP_NEW_LISTENER, pid, 0); + EXPECT_GE(listener, 0); + EXPECT_EQ(ptrace(PTRACE_DETACH, pid, NULL, 0), 0); + + /* Now signal we are done installing so it can do a getpid */ + EXPECT_EQ(write(sk_pair[0], "H", 1), 1); + close(sk_pair[0]); + + /* Make a new socket pair so we can send half across */ + EXPECT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0); + + ret = read_notif(listener, &req); + EXPECT_EQ(ret, sizeof(req)); + EXPECT_EQ(errno, 0); + + resp.len = sizeof(resp); + resp.id = req.id; + + putfd.id = req.id; + putfd.fd_flags = 0; + + /* First, let's just create a new fd with our stdin (fd 0). */ + putfd.fd = 0; + putfd.to_replace = -1; + fd = ioctl(listener, SECCOMP_NOTIF_PUT_FD, &putfd); + EXPECT_GE(fd, 0); + EXPECT_EQ(kcmp(req.pid, getpid(), KCMP_FILE, fd, 0), 0); + + /* Dup something else over the top of it. */ + putfd.fd = sk_pair[1]; + putfd.to_replace = fd; + fd = ioctl(listener, SECCOMP_NOTIF_PUT_FD, &putfd); + EXPECT_GE(fd, 0); + EXPECT_EQ(kcmp(req.pid, getpid(), KCMP_FILE, fd, sk_pair[1]), 0); + + /* Now, try to close it.
*/ + putfd.fd = -1; + putfd.to_replace = fd; + fd = ioctl(listener, SECCOMP_NOTIF_PUT_FD, &putfd); + EXPECT_GE(fd, 0); + EXPECT_EQ(kcmp(req.pid, getpid(), KCMP_FILE, fd, sk_pair[1]), 1); + + /* Ok, we tried the three cases, now let's do what we really want. */ + putfd.fd = sk_pair[1]; + putfd.to_replace = -1; + fd = ioctl(listener, SECCOMP_NOTIF_PUT_FD, &putfd); + EXPECT_GE(fd, 0); + EXPECT_EQ(kcmp(req.pid, getpid(), KCMP_FILE, fd, sk_pair[1]), 0); + + resp.val = fd; + resp.error = 0; + + EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp)); + close(sk_pair[1]); + + EXPECT_EQ(write(sk_pair[0], "hello world\0", 12), 12); + close(sk_pair[0]); + + EXPECT_EQ(waitpid(pid, &status, 0), pid); + EXPECT_EQ(true, WIFEXITED(status)); + EXPECT_EQ(0, WEXITSTATUS(status)); + close(listener); +} + /* * Check that a pid in a child namespace still shows up as valid in ours. */ From patchwork Thu Sep 27 15:11:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tycho Andersen X-Patchwork-Id: 10618139 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6BC7116B1 for ; Thu, 27 Sep 2018 15:11:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 57FB42BA06 for ; Thu, 27 Sep 2018 15:11:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4BC1A2BA10; Thu, 27 Sep 2018 15:11:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 42C042BA06 for ; 
Thu, 27 Sep 2018 15:11:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728319AbeI0Vac (ORCPT ); Thu, 27 Sep 2018 17:30:32 -0400 Received: from mail-pg1-f195.google.com ([209.85.215.195]:34300 "EHLO mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728305AbeI0Vab (ORCPT ); Thu, 27 Sep 2018 17:30:31 -0400 Received: by mail-pg1-f195.google.com with SMTP id d19-v6so2207091pgv.1 for ; Thu, 27 Sep 2018 08:11:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tycho-ws.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=MDvXFHbBVPCiaxPFAEYKrLG36j1/sMJfgiFFw3qx+M4=; b=zc2Zb8zY/53XgpSjwTSMxedGAC5xVOqjcalnX29zw8pUq0UDiXkiDyE8CHVQaWSTjP lYSt2QWGIMlt7VN3RYzTYAumpuMlYfkbzUa4HNnzNlgcw92NSoXQhrIghTO7BIh5XnT7 q6Dd5vr+Icwi7r4C4BsL7duXgglSMcu/HiyZkK2IwHMyUMRS72T8EfjAFdd/wljahvU2 BZu/KSZheJn5fsdlfDSeSezou8PbHgfeMCnGeCRhZGwn6Yvqqa7mvrKUKUf586C1Omar 4aH2F4h3LTcbJYBpA3rhvCmgVbzps1sn13x7jsOQpP+4exPRd9boJeWrbKN4Rdh2dyog haeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=MDvXFHbBVPCiaxPFAEYKrLG36j1/sMJfgiFFw3qx+M4=; b=Lq7/UFJWr1uoGQy9boV2kXi1rfZ3f1BGRLqQrQRUJcKkw/UF1xSOrvCIrZlWFrI+gM Fh+bzCUnZXjfhxjy+IdKFUwTsSB4I2tTlhjHY1PAZwpQKvY2rOAyHS9CRC7HBvLqSCId faBJeNTqpdJu/UlNGXGMlrMDBFTWagXWNTgpVa5r7j4zbfvnxhV1vYeBW8o3kdISHnpZ onHsPbToIAKTuRLXbTzKvUYxjBUm3lDMdwEztTMZ/T/oIbGoHcsIUoWfPDmGY3xRgGI/ Mn4I9GkYVJEH1onG6FkernAPLBgKDz1gqLs1tFL2FlXpijoLAM+vy7Wa3pqkZElBocTZ nl8g== X-Gm-Message-State: ABuFfoizFtSPqk1wP6Km1TBoOWxLj2YJjbqflYoBF9q6r5MfoFx+ebhQ p/c3yzgBRzW2nzzdf4w9W/nPow== X-Google-Smtp-Source: ACcGV63sFsqE15D4ZM/TdgCpNeXPPx+KBHHPkitUKR50tbI/pQuP2JX+ewkOiM9KpgozNbqdZQy9rg== X-Received: by 2002:a17:902:599d:: with SMTP id p29-v6mr11636456pli.74.1538061108239; Thu, 27 Sep 2018 08:11:48 -0700 (PDT) Received: from 
localhost.localdomain ([128.107.241.178]) by smtp.gmail.com with ESMTPSA id y19-v6sm5429610pff.14.2018.09.27.08.11.46 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 27 Sep 2018 08:11:47 -0700 (PDT) From: Tycho Andersen To: Kees Cook Cc: linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, linux-api@vger.kernel.org, Andy Lutomirski , Oleg Nesterov , "Eric W . Biederman" , "Serge E . Hallyn" , Christian Brauner , Tyler Hicks , Akihiro Suda , Jann Horn , linux-fsdevel@vger.kernel.org, Tycho Andersen Subject: [PATCH v7 6/6] samples: add an example of seccomp user trap Date: Thu, 27 Sep 2018 09:11:19 -0600 Message-Id: <20180927151119.9989-7-tycho@tycho.ws> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180927151119.9989-1-tycho@tycho.ws> References: <20180927151119.9989-1-tycho@tycho.ws> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP The idea here is just to give a demonstration of how one could safely use the SECCOMP_RET_USER_NOTIF feature to do mount policies. This particular policy is (as noted in the comment) not very interesting, but it serves to illustrate how one might apply a policy dodging the various TOCTOU issues. v5: new in v5 v7: updates for v7 API changes Signed-off-by: Tycho Andersen CC: Kees Cook CC: Andy Lutomirski CC: Oleg Nesterov CC: Eric W. Biederman CC: "Serge E. 
Hallyn" CC: Christian Brauner CC: Tyler Hicks CC: Akihiro Suda --- samples/seccomp/.gitignore | 1 + samples/seccomp/Makefile | 7 +- samples/seccomp/user-trap.c | 312 ++++++++++++++++++++++++++++++++++++ 3 files changed, 319 insertions(+), 1 deletion(-) diff --git a/samples/seccomp/.gitignore b/samples/seccomp/.gitignore index 78fb78184291..d1e2e817d556 100644 --- a/samples/seccomp/.gitignore +++ b/samples/seccomp/.gitignore @@ -1,3 +1,4 @@ bpf-direct bpf-fancy dropper +user-trap diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile index cf34ff6b4065..4920903c8009 100644 --- a/samples/seccomp/Makefile +++ b/samples/seccomp/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 ifndef CROSS_COMPILE -hostprogs-$(CONFIG_SAMPLE_SECCOMP) := bpf-fancy dropper bpf-direct +hostprogs-$(CONFIG_SAMPLE_SECCOMP) := bpf-fancy dropper bpf-direct user-trap HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include @@ -16,6 +16,10 @@ HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include bpf-direct-objs := bpf-direct.o +HOSTCFLAGS_user-trap.o += -I$(objtree)/usr/include +HOSTCFLAGS_user-trap.o += -idirafter $(objtree)/include +user-trap-objs := user-trap.o + # Try to match the kernel target. 
ifndef CONFIG_64BIT @@ -33,6 +37,7 @@ HOSTCFLAGS_bpf-fancy.o += $(MFLAG) HOSTLDLIBS_bpf-direct += $(MFLAG) HOSTLDLIBS_bpf-fancy += $(MFLAG) HOSTLDLIBS_dropper += $(MFLAG) +HOSTLDLIBS_user-trap += $(MFLAG) endif always := $(hostprogs-m) endif diff --git a/samples/seccomp/user-trap.c b/samples/seccomp/user-trap.c new file mode 100644 index 000000000000..63c9a5994dc1 --- /dev/null +++ b/samples/seccomp/user-trap.c @@ -0,0 +1,312 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* + * Because of some grossness, we can't include linux/ptrace.h here, so we + * re-define PTRACE_SECCOMP_NEW_LISTENER. + */ +#ifndef PTRACE_SECCOMP_NEW_LISTENER +#define PTRACE_SECCOMP_NEW_LISTENER 0x420e +#endif + +#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x))) + +static int seccomp(unsigned int op, unsigned int flags, void *args) +{ + errno = 0; + return syscall(__NR_seccomp, op, flags, args); +} + +static int user_trap_syscall(int nr, unsigned int flags) +{ + struct sock_filter filter[] = { + BPF_STMT(BPF_LD+BPF_W+BPF_ABS, + offsetof(struct seccomp_data, nr)), + BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1), + BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_USER_NOTIF), + BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), + }; + + struct sock_fprog prog = { + .len = (unsigned short)ARRAY_SIZE(filter), + .filter = filter, + }; + + return seccomp(SECCOMP_SET_MODE_FILTER, flags, &prog); +} + +static int handle_req(struct seccomp_notif *req, + struct seccomp_notif_resp *resp, int listener) +{ + char path[PATH_MAX], source[PATH_MAX], target[PATH_MAX]; + int ret = -1, mem; + + resp->len = sizeof(*resp); + resp->id = req->id; + resp->error = -EPERM; + resp->val = 0; + + if (req->data.nr != __NR_mount) { + fprintf(stderr, "huh? trapped something besides mount? %d\n", req->data.nr); + return -1; + } + + /* Only allow bind mounts.
*/ + if (!(req->data.args[3] & MS_BIND)) + return 0; + + /* + * Ok, let's read the task's memory to see where they wanted their + * mount to go. + */ + snprintf(path, sizeof(path), "/proc/%d/mem", req->pid); + mem = open(path, O_RDONLY); + if (mem < 0) { + perror("open mem"); + return -1; + } + + /* + * Now we avoid a TOCTOU: we referred to a task by its pid, but since + * the task that made the syscall may have died, we need to confirm that + * the pid is still valid after we open its /proc/pid/mem file. We can + * ask the listener fd this as follows. + * + * Note that this check should occur *after* any task-specific + * resources are opened, to make sure that the task has not died and + * we're not wrongly reading someone else's state in order to make + * decisions. + */ + if (ioctl(listener, SECCOMP_NOTIF_ID_VALID, &req->id) < 0) { + fprintf(stderr, "task died before we could map its memory\n"); + goto out; + } + + /* + * Phew, we've got the right /proc/pid/mem. Now we can read it. Note + * that to avoid another TOCTOU, we should read all of the pointer args + * before we decide to allow the syscall. + */ + if (lseek(mem, req->data.args[0], SEEK_SET) < 0) { + perror("seek"); + goto out; + } + + ret = read(mem, source, sizeof(source)); + if (ret < 0) { + perror("read"); + goto out; + } + + if (lseek(mem, req->data.args[1], SEEK_SET) < 0) { + perror("seek"); + goto out; + } + + ret = read(mem, target, sizeof(target)); + if (ret < 0) { + perror("read"); + goto out; + } + + /* + * Our policy is to only allow bind mounts inside /tmp. This isn't very + * interesting, because we could do unprivileged bind mounts with user + * namespaces already, but you get the idea.
+ */ + if (!strncmp(source, "/tmp", 4) && !strncmp(target, "/tmp", 4)) { + if (mount(source, target, NULL, req->data.args[3], NULL) < 0) { + ret = -1; + perror("actual mount"); + goto out; + } + resp->error = 0; + } + + /* Even if we didn't allow it because of policy, generating the + * response was a success, because we want to tell the worker EPERM. + */ + ret = 0; + +out: + close(mem); + return ret; +} + +int main(void) +{ + int sk_pair[2], ret = 1, status, listener; + pid_t worker = 0, tracer = 0; + char c; + + if (socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair) < 0) { + perror("socketpair"); + return 1; + } + + worker = fork(); + if (worker < 0) { + perror("fork"); + goto close_pair; + } + + if (worker == 0) { + if (user_trap_syscall(__NR_mount, 0) < 0) { + perror("seccomp"); + exit(1); + } + + if (setuid(1000) < 0) { + perror("setuid"); + exit(1); + } + + if (write(sk_pair[1], "a", 1) != 1) { + perror("write"); + exit(1); + } + + if (read(sk_pair[1], &c, 1) != 1) { + perror("read"); + exit(1); + } + + if (mkdir("/tmp/foo", 0755) < 0) { + perror("mkdir"); + exit(1); + } + + if (mount("/dev/sda", "/tmp/foo", NULL, 0, NULL) != -1) { + fprintf(stderr, "huh?
mounted /dev/sda?\n"); + exit(1); + } + + if (errno != EPERM) { + perror("bad error from mount"); + exit(1); + } + + if (mount("/tmp/foo", "/tmp/foo", NULL, MS_BIND, NULL) < 0) { + perror("mount"); + exit(1); + } + + exit(0); + } + + if (read(sk_pair[0], &c, 1) != 1) { + perror("read ready signal"); + goto out_kill; + } + + if (ptrace(PTRACE_ATTACH, worker) < 0) { + perror("ptrace"); + goto out_kill; + } + + if (waitpid(worker, NULL, 0) != worker) { + perror("waitpid"); + goto out_kill; + } + + listener = ptrace(PTRACE_SECCOMP_NEW_LISTENER, worker, 0); + if (listener < 0) { + perror("ptrace get listener"); + goto out_kill; + } + + if (ptrace(PTRACE_DETACH, worker, NULL, 0) < 0) { + perror("ptrace detach"); + goto out_kill; + } + + if (write(sk_pair[0], "a", 1) != 1) { + perror("write"); + exit(1); + } + + tracer = fork(); + if (tracer < 0) { + perror("fork"); + goto out_kill; + } + + if (tracer == 0) { + while (1) { + struct seccomp_notif req = {}; + struct seccomp_notif_resp resp = {}; + + req.len = sizeof(req); + if (ioctl(listener, SECCOMP_NOTIF_RECV, &req) != sizeof(req)) { + perror("ioctl recv"); + goto out_close; + } + + if (handle_req(&req, &resp, listener) < 0) + goto out_close; + + if (ioctl(listener, SECCOMP_NOTIF_SEND, &resp) != sizeof(resp)) { + perror("ioctl send"); + goto out_close; + } + } +out_close: + close(listener); + exit(1); + } + + close(listener); + + if (waitpid(worker, &status, 0) != worker) { + perror("waitpid"); + goto out_kill; + } + + if (umount2("/tmp/foo", MNT_DETACH) < 0 && errno != EINVAL) { + perror("umount2"); + goto out_kill; + } + + if (remove("/tmp/foo") < 0 && errno != ENOENT) { + perror("remove"); + exit(1); + } + + if (!WIFEXITED(status) || WEXITSTATUS(status)) { + fprintf(stderr, "worker exited nonzero\n"); + goto out_kill; + } + + ret = 0; + +out_kill: + if (tracer > 0) + kill(tracer, SIGKILL); + if (worker > 0) + kill(worker, SIGKILL); + +close_pair: + close(sk_pair[0]); + close(sk_pair[1]); + return ret; +}
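
Reading aid for the SECCOMP_NOTIF_PUT_FD semantics documented in patch 5/6: the sketch below shows how the fd / to_replace pair selects between install, replace and close, and which combination the kernel side rejects with -EINVAL. It assumes the struct layout proposed in this series. The names put_fd_req, put_fd_make() and put_fd_classify() are hypothetical helpers invented for illustration; they are not part of the patch or of any kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical local mirror of struct seccomp_notif_put_fd as proposed in
 * this series (id, fd, fd_flags, to_replace). Not a released uapi type.
 */
struct put_fd_req {
	uint64_t id;         /* notification id taken from struct seccomp_notif */
	int32_t  fd;         /* fd in the listener's table, or -1 to close */
	uint32_t fd_flags;   /* flags for the tracee's fd, e.g. O_CLOEXEC */
	int32_t  to_replace; /* fd in the tracee's table, or -1 for a new fd */
};

/* The id is the first 8 bytes, so fd always starts at offset 8. */
static_assert(offsetof(struct put_fd_req, fd) == 8, "unexpected layout");

enum put_fd_op {
	PUT_FD_INSTALL, /* fd >= 0, to_replace < 0: allocate a new fd */
	PUT_FD_REPLACE, /* fd >= 0, to_replace >= 0: overwrite to_replace */
	PUT_FD_CLOSE,   /* fd < 0, to_replace >= 0: close to_replace */
	PUT_FD_INVALID, /* both negative: the patch returns -EINVAL */
};

/* Mirror of the argument check at the top of seccomp_notify_put_fd(). */
static enum put_fd_op put_fd_classify(const struct put_fd_req *req)
{
	if (req->fd < 0 && req->to_replace < 0)
		return PUT_FD_INVALID;
	if (req->to_replace < 0)
		return PUT_FD_INSTALL;
	if (req->fd < 0)
		return PUT_FD_CLOSE;
	return PUT_FD_REPLACE;
}

/* Build a zeroed request so that padding bytes are not left uninitialized. */
static struct put_fd_req put_fd_make(uint64_t id, int32_t fd,
				     uint32_t fd_flags, int32_t to_replace)
{
	struct put_fd_req req;

	memset(&req, 0, sizeof(req));
	req.id = id;
	req.fd = fd;
	req.fd_flags = fd_flags;
	req.to_replace = to_replace;
	return req;
}
```

A tracer following the selftest would build the request with something like put_fd_make(req.id, sk_pair[1], 0, -1) and hand it to ioctl(listener, SECCOMP_NOTIF_PUT_FD, ...). put_fd_classify() only restates the kernel-side check, so a bad combination can be caught before issuing the ioctl.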