From patchwork Wed Jul 15 14:49:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adrian Reber X-Patchwork-Id: 11665591 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2423F13B1 for ; Wed, 15 Jul 2020 14:51:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 07262206F5 for ; Wed, 15 Jul 2020 14:51:26 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="XdKoDTxq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726661AbgGOOvX (ORCPT ); Wed, 15 Jul 2020 10:51:23 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:57454 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726629AbgGOOvW (ORCPT ); Wed, 15 Jul 2020 10:51:22 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1594824680; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qgiW7ncb5vtKY1UKZKCcWEcewAM6ZIUsPgJWUmthSs0=; b=XdKoDTxqNo1VUE9IdszMmhz3SsfBTu7wSn8MWieVqFCJWzZkX/PLjGHvMn07GhoAVLbtvo DXRIFAhR/4qeEWSEdqGpy1yYfu164KZDwlFVDPJ1BSVdIYDY8YIPmwfIP/vIEq1mabBvy3 1wci1rUnXnNd868wz9IaqvORZceopdA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-107-_EpRdUJqN-GvLVGG7SHwHQ-1; Wed, 15 Jul 2020 10:51:15 -0400 X-MC-Unique: _EpRdUJqN-GvLVGG7SHwHQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client 
certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E4A01100960F; Wed, 15 Jul 2020 14:51:11 +0000 (UTC) Received: from dcbz.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id A43E360BF1; Wed, 15 Jul 2020 14:51:04 +0000 (UTC) From: Adrian Reber To: Christian Brauner , Eric Biederman , Pavel Emelyanov , Oleg Nesterov , Dmitry Safonov <0x7f454c46@gmail.com>, Andrei Vagin , Nicolas Viennot , =?utf-8?b?TWljaGHFgiBDxYJh?= =?utf-8?b?cGnFhHNraQ==?= , Kamil Yurtsever , Dirk Petersen , Christine Flood , Casey Schaufler Cc: Mike Rapoport , Radostin Stoyanov , Adrian Reber , Cyrill Gorcunov , Serge Hallyn , Stephen Smalley , Sargun Dhillon , Arnd Bergmann , linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org, Eric Paris , Jann Horn , linux-fsdevel@vger.kernel.org Subject: [PATCH v5 1/6] capabilities: Introduce CAP_CHECKPOINT_RESTORE Date: Wed, 15 Jul 2020 16:49:49 +0200 Message-Id: <20200715144954.1387760-2-areber@redhat.com> In-Reply-To: <20200715144954.1387760-1-areber@redhat.com> References: <20200715144954.1387760-1-areber@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: selinux-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: selinux@vger.kernel.org

This patch introduces CAP_CHECKPOINT_RESTORE, a new capability facilitating
checkpoint/restore for non-root users.

Over the last years, the CRIU (Checkpoint/Restore In Userspace) team has
been asked numerous times if it is possible to checkpoint/restore a
process as non-root. The answer usually was: 'almost'.

The main blocker to restoring a process as non-root was controlling the
PID of the restored process. This feature, available via the clone3 system
call or via /proc/sys/kernel/ns_last_pid, is unfortunately guarded by
CAP_SYS_ADMIN.
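
For readers unfamiliar with the ns_last_pid interface mentioned above, the
following is a minimal sketch of how a restorer might use it. It assumes that
ns_last_pid holds the most recently allocated PID, so the value written is the
wanted PID minus one. The function names and the path parameter are
hypothetical and chosen only for illustration; they are not CRIU's code.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format the value to write to ns_last_pid.
 * The file stores the last PID handed out, so the next fork()/clone()
 * is expected to receive that value plus one.
 * Returns the string length, or -1 on invalid input or a short buffer.
 */
int format_ns_last_pid(pid_t wanted, char *buf, size_t len)
{
	int n;

	if (wanted < 1)
		return -1;
	n = snprintf(buf, len, "%d", (int)(wanted - 1));
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Hypothetical helper: write the value to the given path, normally
 * /proc/sys/kernel/ns_last_pid. The path is a parameter so the helper
 * can be tried against a scratch file. Returns 0 on success, -1 on error.
 */
int write_ns_last_pid(const char *path, pid_t wanted)
{
	char buf[32];
	FILE *f;

	if (format_ns_last_pid(wanted, buf, sizeof(buf)) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(buf, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

Another process can allocate a PID between the write and the following
fork(), so this approach is inherently racy; clone3() with set_tid avoids
that window.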

In the past two years, requests for non-root checkpoint/restore have
increased due to the following use cases:
* Checkpoint/Restore in an HPC environment in combination with a
  resource manager distributing jobs where users are always running as
  non-root. There is a desire to provide a way to checkpoint and
  restore long running jobs.
* Container migration as non-root
* We have been in contact with JVM developers who are integrating
  CRIU into a Java VM to decrease the startup time. These
  checkpoint/restore applications are not meant to be running with
  CAP_SYS_ADMIN.

We have seen the following workarounds:
* Use a setuid wrapper around CRIU:
  See https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
* Use a setuid helper that writes to ns_last_pid.
  Unfortunately, this helper delegation technique is impossible to use
  with clone3, and is thus prone to races.
  See https://github.com/twosigma/set_ns_last_pid
* Cycle through PIDs with fork() until the desired PID is reached:
  This has been demonstrated to work with cycling rates of 100,000 PIDs/s.
  See https://github.com/twosigma/set_ns_last_pid
* Patch out the CAP_SYS_ADMIN check from the kernel
* Run the desired application in a new user and PID namespace to provide
  a local CAP_SYS_ADMIN for controlling PIDs. This technique has limited
  use in typical container environments (e.g., Kubernetes) as /proc is
  typically protected with read-only layers (e.g., /proc/sys) for
  hardening purposes. Read-only layers prevent additional /proc mounts
  (due to proc's SB_I_USERNS_VISIBLE property), making the use of new
  PID namespaces limited as certain applications need access to /proc
  matching their PID namespace.

The introduced capability allows a process to:
* Control PIDs when the current user is CAP_CHECKPOINT_RESTORE capable
  for the corresponding PID namespace via ns_last_pid/clone3.
* Open files in /proc/pid/map_files when the current user is
  CAP_CHECKPOINT_RESTORE capable in the root namespace, useful for
  recovering files that are unreachable via the file system such as
  deleted files, or memfd files.

See corresponding selftest for an example with clone3().

Signed-off-by: Adrian Reber
Signed-off-by: Nicolas Viennot
Acked-by: Christian Brauner
---
 include/linux/capability.h          | 6 ++++++
 include/uapi/linux/capability.h     | 9 ++++++++-
 security/selinux/include/classmap.h | 5 +++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index b4345b38a6be..1e7fe311cabe 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -261,6 +261,12 @@ static inline bool bpf_capable(void)
 	return capable(CAP_BPF) || capable(CAP_SYS_ADMIN);
 }
 
+static inline bool checkpoint_restore_ns_capable(struct user_namespace *ns)
+{
+	return ns_capable(ns, CAP_CHECKPOINT_RESTORE) ||
+		ns_capable(ns, CAP_SYS_ADMIN);
+}
+
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
 
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 48ff0757ae5e..395dd0df8d08 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -408,7 +408,14 @@ struct vfs_ns_cap_data {
  */
 
 #define CAP_BPF			39
 
-#define CAP_LAST_CAP         CAP_BPF
+
+/* Allow checkpoint/restore related operations */
+/* Allow PID selection during clone3() */
+/* Allow writing to ns_last_pid */
+
+#define CAP_CHECKPOINT_RESTORE	40
+
+#define CAP_LAST_CAP         CAP_CHECKPOINT_RESTORE
 
 #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP)
 
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index e54d62d529f1..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -27,9 +27,10 @@
 	    "audit_control", "setfcap"
 
 #define COMMON_CAP2_PERMS  "mac_override", "mac_admin", "syslog", \
-		"wake_alarm", "block_suspend", "audit_read", "perfmon", "bpf"
+		"wake_alarm", "block_suspend", "audit_read", "perfmon", "bpf", \
+		"checkpoint_restore"
 
-#if CAP_LAST_CAP > CAP_BPF
+#if CAP_LAST_CAP > CAP_CHECKPOINT_RESTORE
 #error New capability defined, please update COMMON_CAP2_PERMS.
 #endif
 
From patchwork Wed Jul 15 14:49:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adrian Reber X-Patchwork-Id: 11665595 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4C6F1392 for ; Wed, 15 Jul 2020 14:51:32 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8DA3D2071B for ; Wed, 15 Jul 2020 14:51:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="YUEhM2Ed" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726769AbgGOOvc (ORCPT ); Wed, 15 Jul 2020 10:51:32 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:35146 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726758AbgGOOvb (ORCPT ); Wed, 15 Jul 2020 10:51:31 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1594824690; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cFqxEO5GCd1e1dZ9OjJhtlzwIsLHCJufCpGCyw2ZCfQ=; b=YUEhM2Ed+WCNg2/gpGqE/4HGmNv1rMNfh6157qn1WeJPApuz3FWKDbI8JRpRHF/iv7rmLc Zng9mgHsfColh2SR/7kKDXr77yjKQrD3b7WmHBf4zjn1wjVLNjUBneRzl4ykfXilG/Sdbf IMMP4KJ6SBaH761aK/cgQtdgJe/iW7g= Received: from mimecast-mx01.redhat.com
(mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-202-TmwZIEFRPpGGho2HLUOd2Q-1; Wed, 15 Jul 2020 10:51:26 -0400 X-MC-Unique: TmwZIEFRPpGGho2HLUOd2Q-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 81A891009616; Wed, 15 Jul 2020 14:51:23 +0000 (UTC) Received: from dcbz.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0C09A60BF1; Wed, 15 Jul 2020 14:51:12 +0000 (UTC) From: Adrian Reber To: Christian Brauner , Eric Biederman , Pavel Emelyanov , Oleg Nesterov , Dmitry Safonov <0x7f454c46@gmail.com>, Andrei Vagin , Nicolas Viennot , =?utf-8?b?TWljaGHFgiBDxYJh?= =?utf-8?b?cGnFhHNraQ==?= , Kamil Yurtsever , Dirk Petersen , Christine Flood , Casey Schaufler Cc: Mike Rapoport , Radostin Stoyanov , Adrian Reber , Cyrill Gorcunov , Serge Hallyn , Stephen Smalley , Sargun Dhillon , Arnd Bergmann , linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org, Eric Paris , Jann Horn , linux-fsdevel@vger.kernel.org Subject: [PATCH v5 2/6] pid: use checkpoint_restore_ns_capable() for set_tid Date: Wed, 15 Jul 2020 16:49:50 +0200 Message-Id: <20200715144954.1387760-3-areber@redhat.com> In-Reply-To: <20200715144954.1387760-1-areber@redhat.com> References: <20200715144954.1387760-1-areber@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: selinux-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: selinux@vger.kernel.org Use the newly introduced capability CAP_CHECKPOINT_RESTORE to allow using clone3() with set_tid set. 
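
As an illustration of the userspace side, here is a minimal sketch of
requesting a specific PID through clone3(). The struct below is a
hypothetical local mirror of the first 80 bytes of the kernel's struct
clone_args (up to and including set_tid_size), and the fallback syscall
number 435 is an assumption that holds on x86-64 and most other
architectures. Helper names are made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435	/* assumption: not valid on every architecture */
#endif

/* Hypothetical mirror of struct clone_args up to set_tid_size. */
struct clone3_set_tid_args {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
	uint64_t set_tid;	/* pointer to an array of pid_t */
	uint64_t set_tid_size;	/* number of entries in that array */
};

/* Fill the arguments for a fork-like clone3() with requested PIDs. */
void fill_set_tid_args(struct clone3_set_tid_args *args,
		       const pid_t *set_tid, size_t n)
{
	memset(args, 0, sizeof(*args));
	args->exit_signal = SIGCHLD;
	args->set_tid = (uint64_t)(uintptr_t)set_tid;
	args->set_tid_size = n;
}

/*
 * Returns the child's PID in the parent, 0 in the child, or -1 with
 * errno set (for example EPERM when the caller holds neither
 * CAP_SYS_ADMIN nor CAP_CHECKPOINT_RESTORE in the owning user namespace).
 */
pid_t clone3_with_pid(pid_t wanted)
{
	struct clone3_set_tid_args args;
	pid_t set_tid[1] = { wanted };

	fill_set_tid_args(&args, set_tid, 1);
	return (pid_t)syscall(SYS_clone3, &args, sizeof(args));
}
```

With this series applied, a caller holding only CAP_CHECKPOINT_RESTORE
should see the call succeed where it previously returned EPERM.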

Signed-off-by: Adrian Reber
Signed-off-by: Nicolas Viennot
Acked-by: Christian Brauner
---
 kernel/pid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index de9d29c41d77..a9cbab0194d9 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -199,7 +199,7 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 			if (tid != 1 && !tmp->child_reaper)
 				goto out_free;
 			retval = -EPERM;
-			if (!ns_capable(tmp->user_ns, CAP_SYS_ADMIN))
+			if (!checkpoint_restore_ns_capable(tmp->user_ns))
 				goto out_free;
 			set_tid_size--;
 		}
From patchwork Wed Jul 15 14:49:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adrian Reber X-Patchwork-Id: 11665603 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 57CBA13B4 for ; Wed, 15 Jul 2020 14:53:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3E431206D5 for ; Wed, 15 Jul 2020 14:53:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Wl+nLcwN" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726843AbgGOOvk (ORCPT ); Wed, 15 Jul 2020 10:51:40 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:24730 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726809AbgGOOvk (ORCPT ); Wed, 15 Jul 2020 10:51:40 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1594824698; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=e+GUMkIqOM1756261b5DcpT99uKH+47iX9jCYAeaOt4=;
b=Wl+nLcwN87Dv1+wlHdPwYRN68tHynhmQBAk1gZOLKowth2CIEzcy9yc0as8EMpH0vsW8Fv TIrssDPdoCZ3ZjfmWtuWRb2DAVuOenCIyKXHtvlwlhstj6BB8aXakQHGRK428HVJjNA7bQ z6OarKWlDj2sPOQER5Jwvr467PzlfgI= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-267-1qTtWfGdMCGV4yDZuWsOng-1; Wed, 15 Jul 2020 10:51:34 -0400 X-MC-Unique: 1qTtWfGdMCGV4yDZuWsOng-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C8B5E100AA23; Wed, 15 Jul 2020 14:51:31 +0000 (UTC) Received: from dcbz.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id E795660BF4; Wed, 15 Jul 2020 14:51:24 +0000 (UTC) From: Adrian Reber To: Christian Brauner , Eric Biederman , Pavel Emelyanov , Oleg Nesterov , Dmitry Safonov <0x7f454c46@gmail.com>, Andrei Vagin , Nicolas Viennot , =?utf-8?b?TWljaGHFgiBDxYJh?= =?utf-8?b?cGnFhHNraQ==?= , Kamil Yurtsever , Dirk Petersen , Christine Flood , Casey Schaufler Cc: Mike Rapoport , Radostin Stoyanov , Adrian Reber , Cyrill Gorcunov , Serge Hallyn , Stephen Smalley , Sargun Dhillon , Arnd Bergmann , linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org, Eric Paris , Jann Horn , linux-fsdevel@vger.kernel.org Subject: [PATCH v5 3/6] pid_namespace: use checkpoint_restore_ns_capable() for ns_last_pid Date: Wed, 15 Jul 2020 16:49:51 +0200 Message-Id: <20200715144954.1387760-4-areber@redhat.com> In-Reply-To: <20200715144954.1387760-1-areber@redhat.com> References: <20200715144954.1387760-1-areber@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: selinux-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: selinux@vger.kernel.org Use the newly introduced capability CAP_CHECKPOINT_RESTORE to allow 
writing to ns_last_pid.

Signed-off-by: Adrian Reber
Signed-off-by: Nicolas Viennot
Acked-by: Christian Brauner
---
 kernel/pid_namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 0e5ac162c3a8..ac135bd600eb 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -269,7 +269,7 @@ static int pid_ns_ctl_handler(struct ctl_table *table, int write,
 	struct ctl_table tmp = *table;
 	int ret, next;
 
-	if (write && !ns_capable(pid_ns->user_ns, CAP_SYS_ADMIN))
+	if (write && !checkpoint_restore_ns_capable(pid_ns->user_ns))
 		return -EPERM;
 
 	/*
From patchwork Wed Jul 15 14:49:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adrian Reber X-Patchwork-Id: 11665615 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AC5D813B4 for ; Wed, 15 Jul 2020 14:53:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 96065206D5 for ; Wed, 15 Jul 2020 14:53:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="KCNM5aYq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726844AbgGOOxH (ORCPT ); Wed, 15 Jul 2020 10:53:07 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:55667 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726858AbgGOOxD (ORCPT ); Wed, 15 Jul 2020 10:53:03 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1594824782; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references;
bh=YrIQ7usHt/ICUIVG/4Zzx10RZSX03Ne/OGNOYc1RqJU=; b=KCNM5aYqJ/1VQbSvnu1+TFwUE5vGLQRflnY/U8zdu7Wzr8K7EN2GwFz+/uGaBmoS/SLOIz BV6Gf7Ml1UttQugxItEhIEDQknDx7fk4m47k4HCbjfZuol8ZBwSGYHOpGiLrAxLzp85HDV dp0iI3f5LnOBQLloxgp/1fUyMvId1SM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-417-OOnXwcsxNimB7eMBtWwEKw-1; Wed, 15 Jul 2020 10:51:49 -0400 X-MC-Unique: OOnXwcsxNimB7eMBtWwEKw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id DEEDE18A1DE7; Wed, 15 Jul 2020 14:51:45 +0000 (UTC) Received: from dcbz.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id 03C7E60BF1; Wed, 15 Jul 2020 14:51:32 +0000 (UTC) From: Adrian Reber To: Christian Brauner , Eric Biederman , Pavel Emelyanov , Oleg Nesterov , Dmitry Safonov <0x7f454c46@gmail.com>, Andrei Vagin , Nicolas Viennot , =?utf-8?b?TWljaGHFgiBDxYJh?= =?utf-8?b?cGnFhHNraQ==?= , Kamil Yurtsever , Dirk Petersen , Christine Flood , Casey Schaufler Cc: Mike Rapoport , Radostin Stoyanov , Adrian Reber , Cyrill Gorcunov , Serge Hallyn , Stephen Smalley , Sargun Dhillon , Arnd Bergmann , linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org, Eric Paris , Jann Horn , linux-fsdevel@vger.kernel.org Subject: [PATCH v5 4/6] proc: allow access in init userns for map_files with CAP_CHECKPOINT_RESTORE Date: Wed, 15 Jul 2020 16:49:52 +0200 Message-Id: <20200715144954.1387760-5-areber@redhat.com> In-Reply-To: <20200715144954.1387760-1-areber@redhat.com> References: <20200715144954.1387760-1-areber@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: selinux-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: selinux@vger.kernel.org Opening files 
in /proc/pid/map_files when the current user is CAP_CHECKPOINT_RESTORE
capable in the root namespace is useful for checkpointing and restoring
to recover files that are unreachable via the file system such as
deleted files, or memfd files.

Signed-off-by: Adrian Reber
Signed-off-by: Nicolas Viennot
Reviewed-by: Cyrill Gorcunov
---
 fs/proc/base.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 65893686d1f1..cada783f229e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2194,16 +2194,16 @@ struct map_files_info {
 };
 
 /*
- * Only allow CAP_SYS_ADMIN to follow the links, due to concerns about how the
- * symlinks may be used to bypass permissions on ancestor directories in the
- * path to the file in question.
+ * Only allow CAP_SYS_ADMIN and CAP_CHECKPOINT_RESTORE to follow the links, due
+ * to concerns about how the symlinks may be used to bypass permissions on
+ * ancestor directories in the path to the file in question.
  */
 static const char *
 proc_map_files_get_link(struct dentry *dentry,
 			struct inode *inode,
 		        struct delayed_call *done)
 {
-	if (!capable(CAP_SYS_ADMIN))
+	if (!(capable(CAP_SYS_ADMIN) || capable(CAP_CHECKPOINT_RESTORE)))
 		return ERR_PTR(-EPERM);
 
 	return proc_pid_get_link(dentry, inode, done);
From patchwork Wed Jul 15 14:49:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adrian Reber X-Patchwork-Id: 11665611 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 736761392 for ; Wed, 15 Jul 2020 14:53:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5CAB7206F5 for ; Wed, 15 Jul 2020 14:53:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Bob3yeI4" Received:
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726885AbgGOOxD (ORCPT ); Wed, 15 Jul 2020 10:53:03 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:40134 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726848AbgGOOxC (ORCPT ); Wed, 15 Jul 2020 10:53:02 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1594824781; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=92lte8n3yu9W0SorD40v1+TSi2IDS2o0VxXIGO6eK0g=; b=Bob3yeI4mz3GOodf1yF7IubjILiHG6BJ8qAH9d/DiB8JAaOlMtOAeNzCP8BQa81TPqAuzE xnrWN7YSJiN+QsmeZY0YwU1WhhcAFP5eAbt3JLnzbON3k0CCc9S6Sfo5VewaCZLM7kAJfq N+qIGvKTAcld+LF0z3eRiIoMKf1d5eU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-11-PvOYl-MuNP6OOxhtc3SWSw-1; Wed, 15 Jul 2020 10:51:54 -0400 X-MC-Unique: PvOYl-MuNP6OOxhtc3SWSw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 69FB118A1DE8; Wed, 15 Jul 2020 14:51:51 +0000 (UTC) Received: from dcbz.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id 256A160BF1; Wed, 15 Jul 2020 14:51:47 +0000 (UTC) From: Adrian Reber To: Christian Brauner , Eric Biederman , Pavel Emelyanov , Oleg Nesterov , Dmitry Safonov <0x7f454c46@gmail.com>, Andrei Vagin , Nicolas Viennot , =?utf-8?b?TWljaGHFgiBDxYJh?= =?utf-8?b?cGnFhHNraQ==?= , Kamil Yurtsever , Dirk Petersen , Christine Flood , Casey Schaufler Cc: Mike Rapoport , Radostin Stoyanov , Adrian Reber , Cyrill Gorcunov , Serge Hallyn , Stephen Smalley , Sargun Dhillon , 
Arnd Bergmann , linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org, Eric Paris , Jann Horn , linux-fsdevel@vger.kernel.org Subject: [PATCH v5 5/6] prctl: Allow checkpoint/restore capable processes to change exe link Date: Wed, 15 Jul 2020 16:49:53 +0200 Message-Id: <20200715144954.1387760-6-areber@redhat.com> In-Reply-To: <20200715144954.1387760-1-areber@redhat.com> References: <20200715144954.1387760-1-areber@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: selinux-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: selinux@vger.kernel.org

From: Nicolas Viennot

Allow CAP_CHECKPOINT_RESTORE capable users to change /proc/self/exe.

This commit also changes the permission error code from -EINVAL to
-EPERM for consistency with the rest of the prctl() syscall when
checking capabilities.

Signed-off-by: Nicolas Viennot
Signed-off-by: Adrian Reber
---
 kernel/sys.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 00a96746e28a..dd59b9142b1d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2007,12 +2007,14 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 
 	if (prctl_map.exe_fd != (u32)-1) {
 		/*
-		 * Make sure the caller has the rights to
-		 * change /proc/pid/exe link: only local sys admin should
-		 * be allowed to.
+		 * Check if the current user is checkpoint/restore capable.
+		 * At the time of this writing, it checks for CAP_SYS_ADMIN
+		 * or CAP_CHECKPOINT_RESTORE.
+		 * Note that a user with access to ptrace can masquerade an
+		 * arbitrary program as any executable, even setuid ones.
 		 */
-		if (!ns_capable(current_user_ns(), CAP_SYS_ADMIN))
-			return -EINVAL;
+		if (!checkpoint_restore_ns_capable(current_user_ns()))
+			return -EPERM;
 
 		error = prctl_set_mm_exe_file(mm, prctl_map.exe_fd);
 		if (error)
From patchwork Wed Jul 15 14:49:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adrian Reber X-Patchwork-Id: 11665619 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EC27017C7 for ; Wed, 15 Jul 2020 14:53:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id CE227206D5 for ; Wed, 15 Jul 2020 14:53:20 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="GcBmqaBe" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726971AbgGOOxQ (ORCPT ); Wed, 15 Jul 2020 10:53:16 -0400 Received: from us-smtp-delivery-1.mimecast.com ([207.211.31.120]:42542 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726852AbgGOOxP (ORCPT ); Wed, 15 Jul 2020 10:53:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1594824793; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=d9gLBZH3xrp5bh9Hcfk99uaWsYU1ho/1zy2f3F6QJLA=; b=GcBmqaBeGaE6Xh0fJyDB73bl8SUWYodxOY7pC8VhbuREBT+v3lxrF+AwOfBhEaB/WkMOTf geJ+bGYn7p3aDpWphOffX6onYmB8410QFQ+06tmNvlT++WIgx5bkM2w9Y8NparXQ4WasBL PIp497SJHe0r+NGBcwJ4QNkoDKPpddQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-331-vxqMJMi2PhGPvdwM3iMhzQ-1; Wed, 15 Jul 2020 10:52:03
-0400 X-MC-Unique: vxqMJMi2PhGPvdwM3iMhzQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B5E7F800EB6; Wed, 15 Jul 2020 14:52:00 +0000 (UTC) Received: from dcbz.redhat.com (ovpn-114-113.ams2.redhat.com [10.36.114.113]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7CDEC60BF1; Wed, 15 Jul 2020 14:51:53 +0000 (UTC) From: Adrian Reber To: Christian Brauner , Eric Biederman , Pavel Emelyanov , Oleg Nesterov , Dmitry Safonov <0x7f454c46@gmail.com>, Andrei Vagin , Nicolas Viennot , =?utf-8?b?TWljaGHFgiBDxYJh?= =?utf-8?b?cGnFhHNraQ==?= , Kamil Yurtsever , Dirk Petersen , Christine Flood , Casey Schaufler Cc: Mike Rapoport , Radostin Stoyanov , Adrian Reber , Cyrill Gorcunov , Serge Hallyn , Stephen Smalley , Sargun Dhillon , Arnd Bergmann , linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org, selinux@vger.kernel.org, Eric Paris , Jann Horn , linux-fsdevel@vger.kernel.org Subject: [PATCH v5 6/6] selftests: add clone3() CAP_CHECKPOINT_RESTORE test Date: Wed, 15 Jul 2020 16:49:54 +0200 Message-Id: <20200715144954.1387760-7-areber@redhat.com> In-Reply-To: <20200715144954.1387760-1-areber@redhat.com> References: <20200715144954.1387760-1-areber@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Sender: selinux-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: selinux@vger.kernel.org This adds a test that changes its UID, uses capabilities to get CAP_CHECKPOINT_RESTORE and uses clone3() with set_tid to create a process with a given PID as non-root. 

Signed-off-by: Adrian Reber
Acked-by: Serge Hallyn
---
 tools/testing/selftests/clone3/Makefile       |   4 +-
 .../clone3/clone3_cap_checkpoint_restore.c    | 203 ++++++++++++++++++
 2 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c

diff --git a/tools/testing/selftests/clone3/Makefile b/tools/testing/selftests/clone3/Makefile
index cf976c732906..ef7564cb7abe 100644
--- a/tools/testing/selftests/clone3/Makefile
+++ b/tools/testing/selftests/clone3/Makefile
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
 CFLAGS += -g -I../../../../usr/include/
+LDLIBS += -lcap
 
-TEST_GEN_PROGS := clone3 clone3_clear_sighand clone3_set_tid
+TEST_GEN_PROGS := clone3 clone3_clear_sighand clone3_set_tid \
+	clone3_cap_checkpoint_restore
 
 include ../lib.mk
diff --git a/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
new file mode 100644
index 000000000000..2cc3d57b91f2
--- /dev/null
+++ b/tools/testing/selftests/clone3/clone3_cap_checkpoint_restore.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Based on Christian Brauner's clone3() example.
+ * These tests assume they are running in the host's
+ * PID namespace.
+ */
+
+/* capabilities related code based on selftests/bpf/test_verifier.c */
+
+#define _GNU_SOURCE
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "../kselftest.h"
+#include "clone3_selftests.h"
+
+#ifndef MAX_PID_NS_LEVEL
+#define MAX_PID_NS_LEVEL 32
+#endif
+
+static void child_exit(int ret)
+{
+	fflush(stdout);
+	fflush(stderr);
+	_exit(ret);
+}
+
+static int call_clone3_set_tid(pid_t * set_tid, size_t set_tid_size)
+{
+	int status;
+	pid_t pid = -1;
+
+	struct clone_args args = {
+		.exit_signal = SIGCHLD,
+		.set_tid = ptr_to_u64(set_tid),
+		.set_tid_size = set_tid_size,
+	};
+
+	pid = sys_clone3(&args, sizeof(struct clone_args));
+	if (pid < 0) {
+		ksft_print_msg("%s - Failed to create new process\n",
+			       strerror(errno));
+		return -errno;
+	}
+
+	if (pid == 0) {
+		int ret;
+		char tmp = 0;
+
+		ksft_print_msg
+		    ("I am the child, my PID is %d (expected %d)\n",
+		     getpid(), set_tid[0]);
+
+		if (set_tid[0] != getpid())
+			child_exit(EXIT_FAILURE);
+		child_exit(EXIT_SUCCESS);
+	}
+
+	ksft_print_msg("I am the parent (%d). My child's pid is %d\n",
+		       getpid(), pid);
+
+	if (waitpid(pid, &status, 0) < 0) {
+		ksft_print_msg("Child returned %s\n", strerror(errno));
+		return -errno;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+static int test_clone3_set_tid(pid_t * set_tid,
+			       size_t set_tid_size, int expected)
+{
+	int ret;
+
+	ksft_print_msg("[%d] Trying clone3() with CLONE_SET_TID to %d\n",
+		       getpid(), set_tid[0]);
+	ret = call_clone3_set_tid(set_tid, set_tid_size);
+
+	ksft_print_msg
+	    ("[%d] clone3() with CLONE_SET_TID %d says :%d - expected %d\n",
+	     getpid(), set_tid[0], ret, expected);
+	if (ret != expected) {
+		ksft_test_result_fail
+		    ("[%d] Result (%d) is different than expected (%d)\n",
+		     getpid(), ret, expected);
+		return -1;
+	}
+	ksft_test_result_pass
+	    ("[%d] Result (%d) matches expectation (%d)\n", getpid(), ret,
+	     expected);
+
+	return 0;
+}
+
+struct libcap {
+	struct __user_cap_header_struct hdr;
+	struct __user_cap_data_struct data[2];
+};
+
+static int set_capability(void)
+{
+	cap_value_t cap_values[] = { CAP_SETUID, CAP_SETGID };
+	struct libcap *cap;
+	int ret = -1;
+	cap_t caps;
+
+	caps = cap_get_proc();
+	if (!caps) {
+		perror("cap_get_proc");
+		return -1;
+	}
+
+	/* Drop all capabilities */
+	if (cap_clear(caps)) {
+		perror("cap_clear");
+		goto out;
+	}
+
+	cap_set_flag(caps, CAP_EFFECTIVE, 2, cap_values, CAP_SET);
+	cap_set_flag(caps, CAP_PERMITTED, 2, cap_values, CAP_SET);
+
+	cap = (struct libcap *) caps;
+
+	/* 40 -> CAP_CHECKPOINT_RESTORE */
+	cap->data[1].effective |= 1 << (40 - 32);
+	cap->data[1].permitted |= 1 << (40 - 32);
+
+	if (cap_set_proc(caps)) {
+		perror("cap_set_proc");
+		goto out;
+	}
+	ret = 0;
+out:
+	if (cap_free(caps))
+		perror("cap_free");
+	return ret;
+}
+
+int main(int argc, char *argv[])
+{
+	pid_t pid;
+	int status;
+	int ret = 0;
+	pid_t set_tid[1];
+	uid_t uid = getuid();
+
+	ksft_print_header();
+	test_clone3_supported();
+	ksft_set_plan(2);
+
+	if (uid != 0) {
+		ksft_cnt.ksft_xskip = ksft_plan;
+		ksft_print_msg("Skipping all tests as non-root\n");
+		return ksft_exit_pass();
+	}
+
+	memset(&set_tid, 0, sizeof(set_tid));
+
+	/* Find the current active PID */
+	pid = fork();
+	if (pid == 0) {
+		ksft_print_msg("Child has PID %d\n", getpid());
+		child_exit(EXIT_SUCCESS);
+	}
+	if (waitpid(pid, &status, 0) < 0)
+		ksft_exit_fail_msg("Waiting for child %d failed", pid);
+
+	/* After the child has finished, its PID should be free. */
+	set_tid[0] = pid;
+
+	if (set_capability())
+		ksft_test_result_fail
+		    ("Could not set CAP_CHECKPOINT_RESTORE\n");
+	prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);
+	/* This would fail without CAP_CHECKPOINT_RESTORE */
+	setgid(1000);
+	setuid(1000);
+	set_tid[0] = pid;
+	ret |= test_clone3_set_tid(set_tid, 1, -EPERM);
+	if (set_capability())
+		ksft_test_result_fail
+		    ("Could not set CAP_CHECKPOINT_RESTORE\n");
+	/* This should work as we have CAP_CHECKPOINT_RESTORE as non-root */
+	ret |= test_clone3_set_tid(set_tid, 1, 0);
+
+	return !ret ? ksft_exit_pass() : ksft_exit_fail();
+}
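
The selftest raises the new capability by setting bit `1 << (40 - 32)` in
`data[1]` directly. The arithmetic behind that can be sketched as below,
assuming the usual layout of capability sets as consecutive 32-bit words.
The helper names and the SKETCH_ prefix are hypothetical and exist only for
this example.

```c
#include <stdint.h>

/* Value introduced by patch 1/6 of this series. */
#define SKETCH_CAP_CHECKPOINT_RESTORE 40

/* Index of the 32-bit word that holds capability 'cap'. */
unsigned int cap_word(unsigned int cap)
{
	return cap >> 5;
}

/* Bit mask of capability 'cap' inside its 32-bit word. */
uint32_t cap_mask(unsigned int cap)
{
	return UINT32_C(1) << (cap & 31);
}

/*
 * Test a capability in a 64-bit mask, such as the hexadecimal CapEff
 * value shown in /proc/<pid>/status.
 */
int cap_is_set(uint64_t caps, unsigned int cap)
{
	return (int)((caps >> cap) & 1);
}
```

For capability 40 this yields word 1 and mask 0x100, which matches the
`data[1]` and `1 << (40 - 32)` used in set_capability().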