From patchwork Fri Apr 28 05:18:31 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrey Vagin
X-Patchwork-Id: 9703885
From: Andrei Vagin
To: Alexander Viro, "Eric W . Biederman"
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-api@vger.kernel.org, criu@openvz.org, Andrei Vagin
Subject: [PATCH] mnt: allow to add a mount into an existing group
Date: Thu, 27 Apr 2017 22:18:31 -0700
Message-Id: <20170428051831.20084-1-avagin@openvz.org>
X-Mailer: git-send-email 2.9.3

Currently a shared group can only be inherited from a source mount.
This patch adds the ability to add a mount to an existing shared
group.

mount(source, target, NULL, MS_SET_GROUP, NULL)

mount() with the MS_SET_GROUP flag adds the "target" mount to the group
of the "source" mount.
The calling process has to have the CAP_SYS_ADMIN capability in the
namespaces of these mounts. The source and the target mounts have to
have the same super block.

This new functionality, together with "mnt: Tuck mounts under others
instead of creating shadow/side mounts.", allows CRIU to dump and
restore any set of mount namespaces.

Currently we have a lot of issues with dumping and restoring mount
namespaces. The biggest problem is that we can't construct mount trees
directly, for several reasons:

* groups can't be set, they can only be inherited
* file systems have to be mounted from the specified user namespaces
* the mount() syscall doesn't just create one mount; the mount is
  also propagated to all members of a parent group
* umount() doesn't detach mounts from all members of a group
  (mounts with children are not umounted)
* mounts are propagated underneath existing mounts
* mount() doesn't allow making bind-mounts between two namespaces
* processes can have open file descriptors to overmounted files

All these operations are non-trivial, making the task of restoring a
mount namespace practically unsolvable in reasonable time. The proposed
change allows restoring a mount namespace directly, without any overly
complex logic.

Cc: Eric W. Biederman
Cc: Alexander Viro
Signed-off-by: Andrei Vagin
---
 fs/namespace.c          | 66 ++++++++++++++++++++++++++++++++++++++++++++++---
 include/uapi/linux/fs.h |  6 +++++
 2 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index cc1375ef..3bf0cd2 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2355,6 +2355,57 @@ static inline int tree_contains_unbindable(struct mount *mnt)
 	return 0;
 }
 
+static int do_set_group(struct path *path, const char *sibling_name)
+{
+	struct mount *sibling, *mnt;
+	struct path sibling_path;
+	int err;
+
+	if (!sibling_name || !*sibling_name)
+		return -EINVAL;
+
+	err = kern_path(sibling_name, LOOKUP_FOLLOW, &sibling_path);
+	if (err)
+		return err;
+
+	sibling = real_mount(sibling_path.mnt);
+	mnt = real_mount(path->mnt);
+
+	namespace_lock();
+
+	err = -EPERM;
+	if (!sibling->mnt_ns ||
+	    !ns_capable(sibling->mnt_ns->user_ns, CAP_SYS_ADMIN))
+		goto out_unlock;
+
+	err = -EINVAL;
+	if (sibling->mnt.mnt_sb != mnt->mnt.mnt_sb)
+		goto out_unlock;
+
+	if (IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt))
+		goto out_unlock;
+
+	if (IS_MNT_SLAVE(sibling)) {
+		struct mount *m = sibling->mnt_master;
+
+		list_add(&mnt->mnt_slave, &m->mnt_slave_list);
+		mnt->mnt_master = m;
+	}
+
+	if (IS_MNT_SHARED(sibling)) {
+		mnt->mnt_group_id = sibling->mnt_group_id;
+		list_add(&mnt->mnt_share, &sibling->mnt_share);
+		set_mnt_shared(mnt);
+	}
+
+	err = 0;
+out_unlock:
+	namespace_unlock();
+
+	path_put(&sibling_path);
+	return err;
+}
+
 static int do_move_mount(struct path *path, const char *old_name)
 {
 	struct path old_path, parent_path;
@@ -2769,6 +2820,7 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 	struct path path;
 	int retval = 0;
 	int mnt_flags = 0;
+	unsigned long cmd;
 
 	/* Discard magic */
 	if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
@@ -2820,19 +2872,25 @@ long do_mount(const char *dev_name, const char __user *dir_name,
 			mnt_flags |= path.mnt->mnt_flags & MNT_ATIME_MASK;
 	}
 
+	cmd = flags & (MS_REMOUNT | MS_BIND |
+		       MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE |
+		       MS_MOVE | MS_SET_GROUP);
+
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE | MS_BORN |
 		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
 		   MS_STRICTATIME | MS_NOREMOTELOCK | MS_SUBMOUNT);
 
-	if (flags & MS_REMOUNT)
+	if (cmd & MS_REMOUNT)
 		retval = do_remount(&path, flags & ~MS_REMOUNT, mnt_flags,
 				    data_page);
-	else if (flags & MS_BIND)
+	else if (cmd & MS_BIND)
 		retval = do_loopback(&path, dev_name, flags & MS_REC);
-	else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
+	else if (cmd & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE))
 		retval = do_change_type(&path, flags);
-	else if (flags & MS_MOVE)
+	else if (cmd & MS_MOVE)
 		retval = do_move_mount(&path, dev_name);
+	else if (cmd & MS_SET_GROUP)
+		retval = do_set_group(&path, dev_name);
 	else
 		retval = do_new_mount(&path, type_page, flags, mnt_flags,
 				      dev_name, data_page);
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 048a85e..33423aa 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -131,6 +131,12 @@ struct inodes_stat_t {
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
 #define MS_LAZYTIME	(1<<25) /* Update the on-disk [acm]times lazily */
 
+/*
+ * Here are commands and flags. Commands are handled in do_mount()
+ * and can intersect with kernel internal flags.
+ */
+#define MS_SET_GROUP	(1<<26) /* Add a mount into a shared group */
+
 /* These sb flags are internal to the kernel */
 #define MS_SUBMOUNT     (1<<26)
 #define MS_NOREMOTELOCK	(1<<27)
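
For illustration only, the mount(source, target, NULL, MS_SET_GROUP, NULL)
call from the description could be wrapped in userspace roughly as below.
This is a minimal sketch and not part of the patch. The helper name
set_mount_group() is hypothetical. MS_SET_GROUP is defined locally with
the value this patch proposes, because it is assumed to be absent from
installed headers. On a kernel without this patch the flag does not have
this meaning, so the call should not be expected to work there.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Value proposed by this patch. It is assumed to be missing from the
 * installed uapi headers, so it is defined here for the sketch.
 */
#ifndef MS_SET_GROUP
#define MS_SET_GROUP (1UL << 26)
#endif

/*
 * Hypothetical helper (not part of the patch): ask the kernel to add
 * the mount at `target` to the propagation group of the mount at
 * `source`. Returns 0 on success, -1 with errno set on failure.
 */
static int set_mount_group(const char *source, const char *target)
{
	/*
	 * Mirror the first check in do_set_group(): a missing or empty
	 * source name is rejected with EINVAL before anything else.
	 */
	if (source == NULL || *source == '\0') {
		errno = EINVAL;
		return -1;
	}

	/* The filesystem type and data arguments are unused here. */
	return mount(source, target, NULL, MS_SET_GROUP, NULL);
}
```

A caller would use it as set_mount_group("/mnt/a", "/mnt/b"), where both
paths are placeholders. Going by the checks in do_set_group(), the two
mounts must share a super block and the target must be neither shared nor
slave, otherwise the call fails with EINVAL.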