From patchwork Thu Nov 1 21:48:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Seth Forshee X-Patchwork-Id: 10664563 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BC2D417D5 for ; Thu, 1 Nov 2018 21:49:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AA2B42C3D1 for ; Thu, 1 Nov 2018 21:49:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9E0CD2C3D5; Thu, 1 Nov 2018 21:49:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1BDC62C3D1 for ; Thu, 1 Nov 2018 21:49:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727695AbeKBGxy (ORCPT ); Fri, 2 Nov 2018 02:53:54 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:58249 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726318AbeKBGxy (ORCPT ); Fri, 2 Nov 2018 02:53:54 -0400 Received: from mail-io1-f72.google.com ([209.85.166.72]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1gIKpt-0008JD-9Q for linux-fsdevel@vger.kernel.org; Thu, 01 Nov 2018 21:49:05 +0000 Received: by mail-io1-f72.google.com with SMTP id q22-v6so2561292iog.9 for ; Thu, 01 Nov 2018 14:49:05 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=iV6f50cfbE2+B8qKD5+AQCH/eJ9IChkzB4fSfQjBUEU=; b=e8OupJjraC+FNiXxQr/AebOhNZ5wRpTDNfgA+elxCMTHZgue1bxe+iXyEDQuCauv3b OzCjs+/fJjZvi4Ii4hehbBQAQRPmsdFUO1oc5pr8OlUGzZGXdVnAZQxW1MZBwl/LR1Uy FnD9/rkyimQ6KBcWXOZ1ocOKVMEJokybrDazwDMgwilxbPDbEcGMMGloAqCxz6kHyDy7 9YO8V8hdzYOwTJeRjgSjM8jpMQq78b7//LzM2ilMhZrd8KbjFwmrsNcHFwLPOfvSvKla /nz7nhzbKx085TvoVx/EVq7iqJdp9nVfQpwtj6KN6QBT5hHEKf00eCqmHWDS7WGhdGgC Du2g== X-Gm-Message-State: AGRZ1gJNeRi0XTHV3kWGD69rgkHQZSOprygZP8t5UU187LHj62cGJt8O k1QvP9Kal7UvgEZuYtEvWIRpSqj1shP+yMISW5uD0w85jq9vtYRkucYB4bW5mhAM2rUQGqmqJrx YDUpJPgE9fwDUU/DpOkdQJ3VtjJ+41KkdsDAp2Mq9+ww= X-Received: by 2002:a24:68f:: with SMTP id 137-v6mr7506045itv.54.1541108943322; Thu, 01 Nov 2018 14:49:03 -0700 (PDT) X-Google-Smtp-Source: AJdET5fXd/FCTpAfiXJMRR3K072buiL2OSq1KhS9C2+M8kQWTSAZt8QLwmZBhiTYcIbpVmYuhHcrkA== X-Received: by 2002:a24:68f:: with SMTP id 137-v6mr7506011itv.54.1541108942726; Thu, 01 Nov 2018 14:49:02 -0700 (PDT) Received: from localhost ([2605:a601:ac7:2a20:7c8b:4047:a2ef:69cd]) by smtp.gmail.com with ESMTPSA id j19-v6sm11506360itb.25.2018.11.01.14.49.01 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 01 Nov 2018 14:49:01 -0700 (PDT) From: Seth Forshee To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, James Bottomley Subject: [RFC PATCH 1/6] shiftfs: uid/gid shifting bind mount Date: Thu, 1 Nov 2018 16:48:51 -0500 Message-Id: <20181101214856.4563-2-seth.forshee@canonical.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181101214856.4563-1-seth.forshee@canonical.com> References: <20181101214856.4563-1-seth.forshee@canonical.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: James Bottomley This allows any subtree to be uid/gid shifted and bound elsewhere. 
It does this by operating similarly to overlayfs. Its primary use is for shifting the underlying uids of filesystems used to support unprivileged (uid shifted) containers. The usual use case here is that the container is operating with a uid shifted unprivileged root but sometimes needs to make use of or work with a filesystem image that has root at real uid 0. The mechanism is to allow any subordinate mount namespace to mount a shiftfs filesystem (by marking it FS_USERNS_MOUNT) but to allow it to mount only marked subtrees (using the -o mark option as root). Once mounted, the subtree is mapped via the super block user namespace so that the interior ids of the mounting user namespace are the ids written to the filesystem. Signed-off-by: James Bottomley [ saf: use designated initializers for path declarations to fix errors with struct randomization ] Signed-off-by: Seth Forshee --- v3 - update to 4.14 (d_real changes) v1 - based on original shiftfs with uid mappings now done via s_user_ns v2 - fix revalidation of dentries add inode aliasing --- fs/Kconfig | 8 + fs/Makefile | 1 + fs/shiftfs.c | 783 +++++++++++++++++++++++++++++++++++++ include/uapi/linux/magic.h | 2 + 4 files changed, 794 insertions(+) create mode 100644 fs/shiftfs.c diff --git a/fs/Kconfig b/fs/Kconfig index ac474a61be37..392c5a41a9f9 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -113,6 +113,14 @@ source "fs/autofs/Kconfig" source "fs/fuse/Kconfig" source "fs/overlayfs/Kconfig" +config SHIFT_FS + tristate "UID/GID shifting overlay filesystem for containers" + help + This filesystem can overlay any mounted filesystem and shift + the uid/gid the files appear at. The idea is that + unprivileged containers can use this to mount root volumes + using this technique. 
+ menu "Caches" source "fs/fscache/Kconfig" diff --git a/fs/Makefile b/fs/Makefile index 293733f61594..d0222f3816bd 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -128,3 +128,4 @@ obj-y += exofs/ # Multiple modules obj-$(CONFIG_CEPH_FS) += ceph/ obj-$(CONFIG_PSTORE) += pstore/ obj-$(CONFIG_EFIVAR_FS) += efivarfs/ +obj-$(CONFIG_SHIFT_FS) += shiftfs.o diff --git a/fs/shiftfs.c b/fs/shiftfs.c new file mode 100644 index 000000000000..6028244c2f42 --- /dev/null +++ b/fs/shiftfs.c @@ -0,0 +1,783 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct shiftfs_super_info { + struct vfsmount *mnt; + struct user_namespace *userns; + bool mark; +}; + +static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode, + struct dentry *dentry); + +enum { + OPT_MARK, + OPT_LAST, +}; + +/* global filesystem options */ +static const match_table_t tokens = { + { OPT_MARK, "mark" }, + { OPT_LAST, NULL } +}; + +static const struct cred *shiftfs_get_up_creds(struct super_block *sb) +{ + struct shiftfs_super_info *ssi = sb->s_fs_info; + struct cred *cred = prepare_creds(); + + if (!cred) + return NULL; + + cred->fsuid = KUIDT_INIT(from_kuid(sb->s_user_ns, cred->fsuid)); + cred->fsgid = KGIDT_INIT(from_kgid(sb->s_user_ns, cred->fsgid)); + put_user_ns(cred->user_ns); + cred->user_ns = get_user_ns(ssi->userns); + + return cred; +} + +static const struct cred *shiftfs_new_creds(const struct cred **newcred, + struct super_block *sb) +{ + const struct cred *cred = shiftfs_get_up_creds(sb); + + *newcred = cred; + + if (cred) + cred = override_creds(cred); + else + printk(KERN_ERR "shiftfs: Credential override failed: no memory\n"); + + return cred; +} + +static void shiftfs_old_creds(const struct cred *oldcred, + const struct cred **newcred) +{ + if (!*newcred) + return; + + revert_creds(oldcred); + put_cred(*newcred); +} + +static int shiftfs_parse_options(struct 
shiftfs_super_info *ssi, char *options) +{ + char *p; + substring_t args[MAX_OPT_ARGS]; + + ssi->mark = false; + + while ((p = strsep(&options, ",")) != NULL) { + int token; + + if (!*p) + continue; + + token = match_token(p, tokens, args); + switch (token) { + case OPT_MARK: + ssi->mark = true; + break; + default: + return -EINVAL; + } + } + return 0; +} + +static void shiftfs_d_release(struct dentry *dentry) +{ + struct dentry *real = dentry->d_fsdata; + + dput(real); +} + +static struct dentry *shiftfs_d_real(struct dentry *dentry, + const struct inode *inode, + unsigned int open_flags, + unsigned int dreal_flags) +{ + struct dentry *real = dentry->d_fsdata; + + if (unlikely(real->d_flags & DCACHE_OP_REAL)) + return real->d_op->d_real(real, real->d_inode, + open_flags, dreal_flags); + + return real; +} + +static int shiftfs_d_weak_revalidate(struct dentry *dentry, unsigned int flags) +{ + struct dentry *real = dentry->d_fsdata; + + if (d_unhashed(real)) + return 0; + + if (!(real->d_flags & DCACHE_OP_WEAK_REVALIDATE)) + return 1; + + return real->d_op->d_weak_revalidate(real, flags); +} + +static int shiftfs_d_revalidate(struct dentry *dentry, unsigned int flags) +{ + struct dentry *real = dentry->d_fsdata; + int ret; + + if (d_unhashed(real)) + return 0; + + /* + * inode state of underlying changed from positive to negative + * or vice versa; force a lookup to update our view + */ + if (d_is_negative(real) != d_is_negative(dentry)) + return 0; + + if (!(real->d_flags & DCACHE_OP_REVALIDATE)) + return 1; + + ret = real->d_op->d_revalidate(real, flags); + + if (ret == 0 && !(flags & LOOKUP_RCU)) + d_invalidate(real); + + return ret; +} + +static const struct dentry_operations shiftfs_dentry_ops = { + .d_release = shiftfs_d_release, + .d_real = shiftfs_d_real, + .d_revalidate = shiftfs_d_revalidate, + .d_weak_revalidate = shiftfs_d_weak_revalidate, +}; + +static int shiftfs_readlink(struct dentry *dentry, char __user *data, + int flags) +{ + struct dentry *real = 
dentry->d_fsdata; + const struct inode_operations *iop = real->d_inode->i_op; + + if (iop->readlink) + return iop->readlink(real, data, flags); + + return -EINVAL; +} + +static const char *shiftfs_get_link(struct dentry *dentry, struct inode *inode, + struct delayed_call *done) +{ + if (dentry) { + struct dentry *real = dentry->d_fsdata; + struct inode *reali = real->d_inode; + const struct inode_operations *iop = reali->i_op; + const char *res = ERR_PTR(-EPERM); + + if (iop->get_link) + res = iop->get_link(real, reali, done); + + return res; + } else { + /* RCU lookup not supported */ + return ERR_PTR(-ECHILD); + } +} + +static int shiftfs_setxattr(struct dentry *dentry, struct inode *inode, + const char *name, const void *value, + size_t size, int flags) +{ + struct dentry *real = dentry->d_fsdata; + int err = -EOPNOTSUPP; + const struct cred *oldcred, *newcred; + + oldcred = shiftfs_new_creds(&newcred, dentry->d_sb); + err = vfs_setxattr(real, name, value, size, flags); + shiftfs_old_creds(oldcred, &newcred); + + return err; +} + +static int shiftfs_xattr_get(const struct xattr_handler *handler, + struct dentry *dentry, struct inode *inode, + const char *name, void *value, size_t size) +{ + struct dentry *real = dentry->d_fsdata; + int err; + const struct cred *oldcred, *newcred; + + oldcred = shiftfs_new_creds(&newcred, dentry->d_sb); + err = vfs_getxattr(real, name, value, size); + shiftfs_old_creds(oldcred, &newcred); + + return err; +} + +static ssize_t shiftfs_listxattr(struct dentry *dentry, char *list, + size_t size) +{ + struct dentry *real = dentry->d_fsdata; + int err; + const struct cred *oldcred, *newcred; + + oldcred = shiftfs_new_creds(&newcred, dentry->d_sb); + err = vfs_listxattr(real, list, size); + shiftfs_old_creds(oldcred, &newcred); + + return err; +} + +static int shiftfs_removexattr(struct dentry *dentry, const char *name) +{ + struct dentry *real = dentry->d_fsdata; + int err; + const struct cred *oldcred, *newcred; + + oldcred = 
shiftfs_new_creds(&newcred, dentry->d_sb); + err = vfs_removexattr(real, name); + shiftfs_old_creds(oldcred, &newcred); + + return err; +} + +static int shiftfs_xattr_set(const struct xattr_handler *handler, + struct dentry *dentry, struct inode *inode, + const char *name, const void *value, size_t size, + int flags) +{ + if (!value) + return shiftfs_removexattr(dentry, name); + return shiftfs_setxattr(dentry, inode, name, value, size, flags); +} + +static void shiftfs_fill_inode(struct inode *inode, struct dentry *dentry) +{ + struct inode *reali; + + if (!dentry) + return; + + reali = dentry->d_inode; + + if (!reali->i_op->get_link) + inode->i_opflags |= IOP_NOFOLLOW; + + inode->i_mapping = reali->i_mapping; + inode->i_private = dentry; +} + +static int shiftfs_make_object(struct inode *dir, struct dentry *dentry, + umode_t mode, const char *symlink, + struct dentry *hardlink, bool excl) +{ + struct dentry *real = dir->i_private, *new = dentry->d_fsdata; + struct inode *reali = real->d_inode, *newi; + const struct inode_operations *iop = reali->i_op; + int err; + const struct cred *oldcred, *newcred; + bool op_ok = false; + + if (hardlink) { + op_ok = iop->link; + } else { + switch (mode & S_IFMT) { + case S_IFDIR: + op_ok = iop->mkdir; + break; + case S_IFREG: + op_ok = iop->create; + break; + case S_IFLNK: + op_ok = iop->symlink; + } + } + if (!op_ok) + return -EINVAL; + + + newi = shiftfs_new_inode(dentry->d_sb, mode, NULL); + if (!newi) + return -ENOMEM; + + oldcred = shiftfs_new_creds(&newcred, dentry->d_sb); + + inode_lock_nested(reali, I_MUTEX_PARENT); + + err = -EINVAL; /* shut gcc up about uninit var */ + if (hardlink) { + struct dentry *realhardlink = hardlink->d_fsdata; + + err = vfs_link(realhardlink, reali, new, NULL); + } else { + switch (mode & S_IFMT) { + case S_IFDIR: + err = vfs_mkdir(reali, new, mode); + break; + case S_IFREG: + err = vfs_create(reali, new, mode, excl); + break; + case S_IFLNK: + err = vfs_symlink(reali, new, symlink); + } + } 
+ + shiftfs_old_creds(oldcred, &newcred); + + if (err) + goto out_dput; + + shiftfs_fill_inode(newi, new); + + d_instantiate(dentry, newi); + + new = NULL; + newi = NULL; + + out_dput: + dput(new); + iput(newi); + inode_unlock(reali); + + return err; +} + +static int shiftfs_create(struct inode *dir, struct dentry *dentry, + umode_t mode, bool excl) +{ + mode |= S_IFREG; + + return shiftfs_make_object(dir, dentry, mode, NULL, NULL, excl); +} + +static int shiftfs_mkdir(struct inode *dir, struct dentry *dentry, + umode_t mode) +{ + mode |= S_IFDIR; + + return shiftfs_make_object(dir, dentry, mode, NULL, NULL, false); +} + +static int shiftfs_link(struct dentry *hardlink, struct inode *dir, + struct dentry *dentry) +{ + return shiftfs_make_object(dir, dentry, 0, NULL, hardlink, false); +} + +static int shiftfs_symlink(struct inode *dir, struct dentry *dentry, + const char *symlink) +{ + return shiftfs_make_object(dir, dentry, S_IFLNK, symlink, NULL, false); +} + +static int shiftfs_rm(struct inode *dir, struct dentry *dentry, bool rmdir) +{ + struct dentry *real = dir->i_private, *new = dentry->d_fsdata; + struct inode *reali = real->d_inode; + int err; + const struct cred *oldcred, *newcred; + + inode_lock_nested(reali, I_MUTEX_PARENT); + + oldcred = shiftfs_new_creds(&newcred, dentry->d_sb); + + if (rmdir) + err = vfs_rmdir(reali, new); + else + err = vfs_unlink(reali, new, NULL); + + shiftfs_old_creds(oldcred, &newcred); + inode_unlock(reali); + + return err; +} + +static int shiftfs_unlink(struct inode *dir, struct dentry *dentry) +{ + return shiftfs_rm(dir, dentry, false); +} + +static int shiftfs_rmdir(struct inode *dir, struct dentry *dentry) +{ + return shiftfs_rm(dir, dentry, true); +} + +static int shiftfs_rename(struct inode *olddir, struct dentry *old, + struct inode *newdir, struct dentry *new, + unsigned int flags) +{ + struct dentry *rodd = olddir->i_private, *rndd = newdir->i_private, + *realold = old->d_fsdata, + *realnew = new->d_fsdata, *trap; + 
struct inode *realolddir = rodd->d_inode, *realnewdir = rndd->d_inode; + int err = -EINVAL; + const struct cred *oldcred, *newcred; + + trap = lock_rename(rndd, rodd); + + if (trap == realold || trap == realnew) + goto out_unlock; + + oldcred = shiftfs_new_creds(&newcred, old->d_sb); + + err = vfs_rename(realolddir, realold, realnewdir, + realnew, NULL, flags); + + shiftfs_old_creds(oldcred, &newcred); + + out_unlock: + unlock_rename(rndd, rodd); + + return err; +} + +static struct dentry *shiftfs_lookup(struct inode *dir, struct dentry *dentry, + unsigned int flags) +{ + struct dentry *real = dir->i_private, *new; + struct inode *reali = real->d_inode, *newi; + const struct cred *oldcred, *newcred; + + inode_lock(reali); + oldcred = shiftfs_new_creds(&newcred, dentry->d_sb); + new = lookup_one_len(dentry->d_name.name, real, dentry->d_name.len); + shiftfs_old_creds(oldcred, &newcred); + inode_unlock(reali); + + if (IS_ERR(new)) + return new; + + dentry->d_fsdata = new; + + newi = NULL; + if (!new->d_inode) + goto out; + + newi = shiftfs_new_inode(dentry->d_sb, new->d_inode->i_mode, new); + if (!newi) { + dput(new); + return ERR_PTR(-ENOMEM); + } + + out: + return d_splice_alias(newi, dentry); +} + +static int shiftfs_permission(struct inode *inode, int mask) +{ + struct dentry *real = inode->i_private; + struct inode *reali = real->d_inode; + const struct inode_operations *iop = reali->i_op; + int err; + const struct cred *oldcred, *newcred; + + if (mask & MAY_NOT_BLOCK) + return -ECHILD; + + oldcred = shiftfs_new_creds(&newcred, inode->i_sb); + if (iop->permission) + err = iop->permission(reali, mask); + else + err = generic_permission(reali, mask); + shiftfs_old_creds(oldcred, &newcred); + + return err; +} + +static int shiftfs_setattr(struct dentry *dentry, struct iattr *attr) +{ + struct dentry *real = dentry->d_fsdata; + struct inode *reali = real->d_inode; + const struct inode_operations *iop = reali->i_op; + struct iattr newattr = *attr; + const struct cred 
*oldcred, *newcred; + struct super_block *sb = dentry->d_sb; + int err; + + newattr.ia_uid = KUIDT_INIT(from_kuid(sb->s_user_ns, attr->ia_uid)); + newattr.ia_gid = KGIDT_INIT(from_kgid(sb->s_user_ns, attr->ia_gid)); + + oldcred = shiftfs_new_creds(&newcred, dentry->d_sb); + inode_lock(reali); + if (iop->setattr) + err = iop->setattr(real, &newattr); + else + err = simple_setattr(real, &newattr); + inode_unlock(reali); + shiftfs_old_creds(oldcred, &newcred); + + if (err) + return err; + + /* all OK, reflect the change on our inode */ + setattr_copy(d_inode(dentry), attr); + return 0; +} + +static int shiftfs_getattr(const struct path *path, struct kstat *stat, + u32 request_mask, unsigned int query_flags) +{ + struct inode *inode = path->dentry->d_inode; + struct dentry *real = path->dentry->d_fsdata; + struct inode *reali = real->d_inode; + const struct inode_operations *iop = reali->i_op; + struct path newpath = { .mnt = path->dentry->d_sb->s_fs_info, .dentry = real }; + int err = 0; + + if (iop->getattr) + err = iop->getattr(&newpath, stat, request_mask, query_flags); + else + generic_fillattr(reali, stat); + + if (err) + return err; + + /* transform the underlying id */ + stat->uid = make_kuid(inode->i_sb->s_user_ns, __kuid_val(stat->uid)); + stat->gid = make_kgid(inode->i_sb->s_user_ns, __kgid_val(stat->gid)); + return 0; +} + +static const struct inode_operations shiftfs_inode_ops = { + .lookup = shiftfs_lookup, + .getattr = shiftfs_getattr, + .setattr = shiftfs_setattr, + .permission = shiftfs_permission, + .mkdir = shiftfs_mkdir, + .symlink = shiftfs_symlink, + .get_link = shiftfs_get_link, + .readlink = shiftfs_readlink, + .unlink = shiftfs_unlink, + .rmdir = shiftfs_rmdir, + .rename = shiftfs_rename, + .link = shiftfs_link, + .create = shiftfs_create, + .mknod = NULL, /* no special files currently */ + .listxattr = shiftfs_listxattr, +}; + +static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode, + struct dentry *dentry) +{ + struct 
inode *inode; + + inode = new_inode(sb); + if (!inode) + return NULL; + + /* + * our inode is completely vestigial. All lookups, getattr + * and permission checks are done on the underlying inode, so + * what the user sees is entirely from the underlying inode. + */ + mode &= S_IFMT; + + inode->i_ino = get_next_ino(); + inode->i_mode = mode; + inode->i_flags |= S_NOATIME | S_NOCMTIME; + + inode->i_op = &shiftfs_inode_ops; + + shiftfs_fill_inode(inode, dentry); + + return inode; +} + +static int shiftfs_show_options(struct seq_file *m, struct dentry *dentry) +{ + struct super_block *sb = dentry->d_sb; + struct shiftfs_super_info *ssi = sb->s_fs_info; + + if (ssi->mark) + seq_show_option(m, "mark", NULL); + + return 0; +} + +static int shiftfs_statfs(struct dentry *dentry, struct kstatfs *buf) +{ + struct super_block *sb = dentry->d_sb; + struct shiftfs_super_info *ssi = sb->s_fs_info; + struct dentry *root = sb->s_root; + struct dentry *realroot = root->d_fsdata; + struct path realpath = { .mnt = ssi->mnt, .dentry = realroot }; + int err; + + err = vfs_statfs(&realpath, buf); + if (err) + return err; + + buf->f_type = sb->s_magic; + + return 0; +} + +static void shiftfs_put_super(struct super_block *sb) +{ + struct shiftfs_super_info *ssi = sb->s_fs_info; + + mntput(ssi->mnt); + put_user_ns(ssi->userns); + kfree(ssi); +} + +static const struct xattr_handler shiftfs_xattr_handler = { + .prefix = "", + .get = shiftfs_xattr_get, + .set = shiftfs_xattr_set, +}; + +const struct xattr_handler *shiftfs_xattr_handlers[] = { + &shiftfs_xattr_handler, + NULL +}; + +static const struct super_operations shiftfs_super_ops = { + .put_super = shiftfs_put_super, + .show_options = shiftfs_show_options, + .statfs = shiftfs_statfs, +}; + +struct shiftfs_data { + void *data; + const char *path; +}; + +static int shiftfs_fill_super(struct super_block *sb, void *raw_data, + int silent) +{ + struct shiftfs_data *data = raw_data; + char *name = kstrdup(data->path, GFP_KERNEL); + int err = 
-ENOMEM; + struct shiftfs_super_info *ssi = NULL; + struct path path; + struct dentry *dentry; + + if (!name) + goto out; + + ssi = kzalloc(sizeof(*ssi), GFP_KERNEL); + if (!ssi) + goto out; + + err = shiftfs_parse_options(ssi, data->data); + if (err) + goto out; + + err = -EPERM; + /* to mark a mount point, must be real root */ + if (ssi->mark && !capable(CAP_SYS_ADMIN)) + goto out; + + /* else to mount a mark, must be userns admin */ + if (!ssi->mark && !ns_capable(current_user_ns(), CAP_SYS_ADMIN)) + goto out; + + err = kern_path(name, LOOKUP_FOLLOW, &path); + if (err) + goto out; + + err = -EPERM; + + if (!S_ISDIR(path.dentry->d_inode->i_mode)) { + err = -ENOTDIR; + goto out_put; + } + + sb->s_stack_depth = path.dentry->d_sb->s_stack_depth + 1; + if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) { + printk(KERN_ERR "shiftfs: maximum stacking depth exceeded\n"); + err = -EINVAL; + goto out_put; + } + + if (ssi->mark) { + /* + * this part is visible unshifted, so make sure no + * executables that could be used to give suid + * privileges + */ + sb->s_iflags = SB_I_NOEXEC; + ssi->mnt = path.mnt; + dentry = path.dentry; + } else { + struct shiftfs_super_info *mp_ssi; + + /* + * this leg executes if we're admin capable in + * the namespace, so be very careful + */ + if (path.dentry->d_sb->s_magic != SHIFTFS_MAGIC) + goto out_put; + mp_ssi = path.dentry->d_sb->s_fs_info; + if (!mp_ssi->mark) + goto out_put; + ssi->mnt = mntget(mp_ssi->mnt); + dentry = dget(path.dentry->d_fsdata); + path_put(&path); + } + ssi->userns = get_user_ns(dentry->d_sb->s_user_ns); + sb->s_fs_info = ssi; + sb->s_magic = SHIFTFS_MAGIC; + sb->s_op = &shiftfs_super_ops; + sb->s_xattr = shiftfs_xattr_handlers; + sb->s_d_op = &shiftfs_dentry_ops; + sb->s_root = d_make_root(shiftfs_new_inode(sb, S_IFDIR, dentry)); + sb->s_root->d_fsdata = dentry; + + return 0; + + out_put: + path_put(&path); + out: + kfree(name); + kfree(ssi); + return err; +} + +static struct dentry *shiftfs_mount(struct 
file_system_type *fs_type, + int flags, const char *dev_name, void *data) +{ + struct shiftfs_data d = { data, dev_name }; + + return mount_nodev(fs_type, flags, &d, shiftfs_fill_super); +} + +static struct file_system_type shiftfs_type = { + .owner = THIS_MODULE, + .name = "shiftfs", + .mount = shiftfs_mount, + .kill_sb = kill_anon_super, + .fs_flags = FS_USERNS_MOUNT, +}; + +static int __init shiftfs_init(void) +{ + return register_filesystem(&shiftfs_type); +} + +static void __exit shiftfs_exit(void) +{ + unregister_filesystem(&shiftfs_type); +} + +MODULE_ALIAS_FS("shiftfs"); +MODULE_AUTHOR("James Bottomley"); +MODULE_DESCRIPTION("uid/gid shifting bind filesystem"); +MODULE_LICENSE("GPL v2"); +module_init(shiftfs_init) +module_exit(shiftfs_exit) diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h index 1a6fee974116..671b0c6d0754 100644 --- a/include/uapi/linux/magic.h +++ b/include/uapi/linux/magic.h @@ -90,4 +90,6 @@ #define BALLOON_KVM_MAGIC 0x13661366 #define ZSMALLOC_MAGIC 0x58295829 +#define SHIFTFS_MAGIC 0x6a656a62 + #endif /* __LINUX_MAGIC_H__ */ From patchwork Thu Nov 1 21:48:52 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Seth Forshee X-Patchwork-Id: 10664565 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E106517D5 for ; Thu, 1 Nov 2018 21:49:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D2FF82C3D1 for ; Thu, 1 Nov 2018 21:49:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C73BA2C3D5; Thu, 1 Nov 2018 21:49:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, 
RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 32B8D2C3D1 for ; Thu, 1 Nov 2018 21:49:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726821AbeKBGyb (ORCPT ); Fri, 2 Nov 2018 02:54:31 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:58254 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727662AbeKBGxy (ORCPT ); Fri, 2 Nov 2018 02:53:54 -0400 Received: from mail-io1-f72.google.com ([209.85.166.72]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1gIKpu-0008JS-H4 for linux-fsdevel@vger.kernel.org; Thu, 01 Nov 2018 21:49:06 +0000 Received: by mail-io1-f72.google.com with SMTP id k3-v6so8832658ioq.8 for ; Thu, 01 Nov 2018 14:49:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=HqO+O/aNpZPwRvx4yB9/pL17OMe9OsdTMxTohpBwNEc=; b=AzLnGGeWYxu9qjE0UxWM8JHIXuZ38YKvCzunC1w2s8HWfUV8GQ5FLhRMChPKSsZiqz xY35/eEj/SQlFs8BjjlE6eUmK16qT7jMTvEW1YxfH6m/34VDdpH9JedLalSjaFZwNAao pFLff64RlljZWQeI6CnS2EBHGblyWNPoqFuLvpR2yp0ulqFBn4vk2+rc8IW+1qYe8NEQ 157PEDl4vj4L3PMXh450XkbBuVyij/60Iq5HzNpm9xeQ1Gp50QzsD4/0Yx1bXa872mvE DFkiLY7TMeU0Np4RZc54wTnY86+aEjnoTHqWTATIKc5zeJGYmh3vHawtD0isN2GL3XGq Fc3g== X-Gm-Message-State: AGRZ1gKy6TtGzr4rOhFudrQzJu6drz2YlmB94wQIHIyggbUrGVmDSsVS O0AFmoM3I5YUa6piuYZKYf0a/A5qjHyY6d3UKAFVIyflxyRLgX3FyyJhflCdqssgHNvARAC1Y6P f7uW4IBRP1EcmMCKM1AehWnEmPm3HHzFdJ71OuogwEiA= X-Received: by 2002:a5e:980f:: with SMTP id s15-v6mr6702574ioj.87.1541108944831; Thu, 01 Nov 2018 14:49:04 -0700 (PDT) X-Google-Smtp-Source: AJdET5ePUeRP/Exu0eFfpjFT6LSRHDYkH5G6cJyaQ+egHUuzEBr7b+uaZWetTlGcUMc0BptAWoumag== X-Received: by 2002:a5e:980f:: with SMTP id 
s15-v6mr6702555ioj.87.1541108944308; Thu, 01 Nov 2018 14:49:04 -0700 (PDT) Received: from localhost ([2605:a601:ac7:2a20:7c8b:4047:a2ef:69cd]) by smtp.gmail.com with ESMTPSA id z9-v6sm9648532iom.12.2018.11.01.14.49.03 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 01 Nov 2018 14:49:03 -0700 (PDT) From: Seth Forshee To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, James Bottomley Subject: [RFC PATCH 2/6] shiftfs: map inodes to lower fs inodes instead of dentries Date: Thu, 1 Nov 2018 16:48:52 -0500 Message-Id: <20181101214856.4563-3-seth.forshee@canonical.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181101214856.4563-1-seth.forshee@canonical.com> References: <20181101214856.4563-1-seth.forshee@canonical.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Since shiftfs inodes map to dentries in the lower fs, two links to the same lower fs inode create separate inodes in shiftfs. This causes problems for inotify, as a watch on one of these files in shiftfs will not see changes made to the underlying inode via the other file. Fix this by updating shiftfs to map its inodes to corresponding inodes in the lower fs. Inodes are cached using the pointer to the lower fs inode as the hash value. This also fixes a second inotify problem whereby a watch is set on an inode, the dentry is evicted from the cache, and events on a new dentry are not reported back to the watch on the original inode. 
Signed-off-by: Seth Forshee --- fs/shiftfs.c | 105 ++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 79 insertions(+), 26 deletions(-) diff --git a/fs/shiftfs.c b/fs/shiftfs.c index 6028244c2f42..b179a1be7bc1 100644 --- a/fs/shiftfs.c +++ b/fs/shiftfs.c @@ -22,6 +22,7 @@ struct shiftfs_super_info { static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode, struct dentry *dentry); +static void shiftfs_init_inode(struct inode *inode, umode_t mode); enum { OPT_MARK, @@ -278,15 +279,27 @@ static void shiftfs_fill_inode(struct inode *inode, struct dentry *dentry) inode->i_opflags |= IOP_NOFOLLOW; inode->i_mapping = reali->i_mapping; - inode->i_private = dentry; + inode->i_private = reali; + set_nlink(inode, reali->i_nlink); +} + +static int shiftfs_inode_test(struct inode *inode, void *data) +{ + return inode->i_private == data; +} + +static int shiftfs_inode_set(struct inode *inode, void *data) +{ + inode->i_private = data; + return 0; } static int shiftfs_make_object(struct inode *dir, struct dentry *dentry, umode_t mode, const char *symlink, struct dentry *hardlink, bool excl) { - struct dentry *real = dir->i_private, *new = dentry->d_fsdata; - struct inode *reali = real->d_inode, *newi; + struct dentry *new = dentry->d_fsdata; + struct inode *reali = dir->i_private, *inode, *newi; const struct inode_operations *iop = reali->i_op; int err; const struct cred *oldcred, *newcred; @@ -310,9 +323,14 @@ static int shiftfs_make_object(struct inode *dir, struct dentry *dentry, return -EINVAL; - newi = shiftfs_new_inode(dentry->d_sb, mode, NULL); - if (!newi) - return -ENOMEM; + if (hardlink) { + inode = d_inode(hardlink); + ihold(inode); + } else { + inode = shiftfs_new_inode(dentry->d_sb, mode, NULL); + if (!inode) + return -ENOMEM; + } oldcred = shiftfs_new_creds(&newcred, dentry->d_sb); @@ -341,16 +359,33 @@ static int shiftfs_make_object(struct inode *dir, struct dentry *dentry, if (err) goto out_dput; - shiftfs_fill_inode(newi, new); 
+ if (hardlink) { + WARN_ON(inode->i_private != new->d_inode); + inc_nlink(inode); + } else { + shiftfs_fill_inode(inode, new); + + newi = inode_insert5(inode, (unsigned long)new->d_inode, + shiftfs_inode_test, shiftfs_inode_set, + new->d_inode); + if (newi != inode) { + pr_warn_ratelimited("shiftfs: newly created inode found in cache\n"); + iput(inode); + inode = newi; + } + } + + if (inode->i_state & I_NEW) + unlock_new_inode(inode); - d_instantiate(dentry, newi); + d_instantiate(dentry, inode); new = NULL; - newi = NULL; + inode = NULL; out_dput: dput(new); - iput(newi); + iput(inode); inode_unlock(reali); return err; @@ -386,8 +421,8 @@ static int shiftfs_symlink(struct inode *dir, struct dentry *dentry, static int shiftfs_rm(struct inode *dir, struct dentry *dentry, bool rmdir) { - struct dentry *real = dir->i_private, *new = dentry->d_fsdata; - struct inode *reali = real->d_inode; + struct dentry *new = dentry->d_fsdata; + struct inode *reali = dir->i_private; int err; const struct cred *oldcred, *newcred; @@ -400,6 +435,13 @@ static int shiftfs_rm(struct inode *dir, struct dentry *dentry, bool rmdir) else err = vfs_unlink(reali, new, NULL); + if (!err) { + if (rmdir) + clear_nlink(d_inode(dentry)); + else + drop_nlink(d_inode(dentry)); + } + shiftfs_old_creds(oldcred, &newcred); inode_unlock(reali); @@ -420,7 +462,8 @@ static int shiftfs_rename(struct inode *olddir, struct dentry *old, struct inode *newdir, struct dentry *new, unsigned int flags) { - struct dentry *rodd = olddir->i_private, *rndd = newdir->i_private, + struct dentry *rodd = old->d_parent->d_fsdata, + *rndd = new->d_parent->d_fsdata, *realold = old->d_fsdata, *realnew = new->d_fsdata, *trap; struct inode *realolddir = rodd->d_inode, *realnewdir = rndd->d_inode; @@ -448,8 +491,8 @@ static int shiftfs_rename(struct inode *olddir, struct dentry *old, static struct dentry *shiftfs_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags) { - struct dentry *real = dir->i_private, *new; 
- struct inode *reali = real->d_inode, *newi; + struct dentry *real = dentry->d_parent->d_fsdata, *new; + struct inode *reali = real->d_inode, *newi, *inode; const struct cred *oldcred, *newcred; inode_lock(reali); @@ -463,24 +506,30 @@ static struct dentry *shiftfs_lookup(struct inode *dir, struct dentry *dentry, dentry->d_fsdata = new; - newi = NULL; - if (!new->d_inode) + inode = NULL; + newi = new->d_inode; + if (!newi) goto out; - newi = shiftfs_new_inode(dentry->d_sb, new->d_inode->i_mode, new); - if (!newi) { + inode = iget5_locked(dentry->d_sb, (unsigned long)newi, + shiftfs_inode_test, shiftfs_inode_set, newi); + if (!inode) { dput(new); return ERR_PTR(-ENOMEM); } + if (inode->i_state & I_NEW) { + shiftfs_init_inode(inode, newi->i_mode); + shiftfs_fill_inode(inode, new); + unlock_new_inode(inode); + } out: - return d_splice_alias(newi, dentry); + return d_splice_alias(inode, dentry); } static int shiftfs_permission(struct inode *inode, int mask) { - struct dentry *real = inode->i_private; - struct inode *reali = real->d_inode; + struct inode *reali = inode->i_private; const struct inode_operations *iop = reali->i_op; int err; const struct cred *oldcred, *newcred; @@ -579,6 +628,14 @@ static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode, if (!inode) return NULL; + shiftfs_init_inode(inode, mode); + shiftfs_fill_inode(inode, dentry); + + return inode; +} + +static void shiftfs_init_inode(struct inode *inode, umode_t mode) +{ /* * our inode is completely vestigial. 
All lookups, getattr * and permission checks are done on the underlying inode, so @@ -591,10 +648,6 @@ static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode, inode->i_flags |= S_NOATIME | S_NOCMTIME; inode->i_op = &shiftfs_inode_ops; - - shiftfs_fill_inode(inode, dentry); - - return inode; } static int shiftfs_show_options(struct seq_file *m, struct dentry *dentry) From patchwork Thu Nov 1 21:48:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Seth Forshee X-Patchwork-Id: 10664559 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E851C157A for ; Thu, 1 Nov 2018 21:49:30 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DB3BF2C3D1 for ; Thu, 1 Nov 2018 21:49:30 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id CED652C3D5; Thu, 1 Nov 2018 21:49:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 744172C3D1 for ; Thu, 1 Nov 2018 21:49:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727870AbeKBGx6 (ORCPT ); Fri, 2 Nov 2018 02:53:58 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:58260 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727760AbeKBGx4 (ORCPT ); Fri, 2 Nov 2018 02:53:56 -0400 Received: from mail-io1-f72.google.com ([209.85.166.72]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) 
(envelope-from ) id 1gIKpw-0008Jj-Fr for linux-fsdevel@vger.kernel.org; Thu, 01 Nov 2018 21:49:08 +0000 Received: by mail-io1-f72.google.com with SMTP id q22-v6so2561436iog.9 for ; Thu, 01 Nov 2018 14:49:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=WpsqCbr7w6G1e66H+z4zXCEMfE/WMPxAtYiui+F+0TQ=; b=YIHRKzLRpTrF8Hsb68R8mgTIaQHCUwDqiCLwqthzLObs7780aJae55psa0tSzo2C2t FNV1QUrlJUJ5SGG3B7ib6wMdCezTW5OFyNkg+X/8jsCkPCV3pTjB2IOtcYxhsPO4YeBs /rq2mH3ojOvJWeTmwdziEnEL69D0D2B7bVHPO104AAVpK3/HUVCGQAgH7udyq7iDbkCX RmFBNd5gtKLcAzw8ylVYDDLiiUMe5MMIk+/epmkR0018F0t+t6O59BzasuXf757CVfGR heB5jBYhb/PhD+3FBe1Yeh6V/3IYh2QIt8E/KSoWqsnFf363zyafU6v4wZJbvzXkOLR6 beJA== X-Gm-Message-State: AGRZ1gJNPu6d1Ep2IHonNdDhCCq0Pke1y6L9wYUM0b4GPwcMlvVlytau bN54Wn4ibkFnRqeyN5yuAdhp1tQ0apo2VQfziB9qVC3T2lu/l4vUmfRTSVtI1PmrUDxbHvCp+cD AFYwfywdXj31Vvrj8Hh/X5Y8v2Ia1dssHOD4ljQjs1Kg= X-Received: by 2002:a6b:abc5:: with SMTP id u188-v6mr6751446ioe.211.1541108946962; Thu, 01 Nov 2018 14:49:06 -0700 (PDT) X-Google-Smtp-Source: AJdET5fjfZ7jrynE6AvGM6JEoWp141CvRE5cDt6y0xbNn7gwywf5aXDUTjg2PXVVig5KIofZ+e7igg== X-Received: by 2002:a6b:abc5:: with SMTP id u188-v6mr6751433ioe.211.1541108946470; Thu, 01 Nov 2018 14:49:06 -0700 (PDT) Received: from localhost ([2605:a601:ac7:2a20:7c8b:4047:a2ef:69cd]) by smtp.gmail.com with ESMTPSA id x21-v6sm11574038ita.6.2018.11.01.14.49.05 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 01 Nov 2018 14:49:05 -0700 (PDT) From: Seth Forshee To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, James Bottomley Subject: [RFC PATCH 3/6] shiftfs: copy inode attrs up from underlying fs Date: Thu, 1 Nov 2018 16:48:53 -0500 Message-Id: <20181101214856.4563-4-seth.forshee@canonical.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: 
<20181101214856.4563-1-seth.forshee@canonical.com> References: <20181101214856.4563-1-seth.forshee@canonical.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Not all inode permission checks go through the permission callback, e.g. some checks related to file capabilities. Always copy up the inode attrs to ensure these checks work as expected. Also introduce helpers for shifting kernel ids from one user ns to another, as this is an operation that is going to be repeated. Signed-off-by: Seth Forshee --- fs/shiftfs.c | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/fs/shiftfs.c b/fs/shiftfs.c index b179a1be7bc1..556594988dd2 100644 --- a/fs/shiftfs.c +++ b/fs/shiftfs.c @@ -266,6 +266,33 @@ static int shiftfs_xattr_set(const struct xattr_handler *handler, return shiftfs_setxattr(dentry, inode, name, value, size, flags); } +static kuid_t shift_kuid(struct user_namespace *from, struct user_namespace *to, + kuid_t kuid) +{ + uid_t uid = from_kuid(from, kuid); + return make_kuid(to, uid); +} + +static kgid_t shift_kgid(struct user_namespace *from, struct user_namespace *to, + kgid_t kgid) +{ + gid_t gid = from_kgid(from, kgid); + return make_kgid(to, gid); +} + +static void shiftfs_copyattr(struct inode *from, struct inode *to) +{ + struct user_namespace *from_ns = from->i_sb->s_user_ns; + struct user_namespace *to_ns = to->i_sb->s_user_ns; + + to->i_uid = shift_kuid(from_ns, to_ns, from->i_uid); + to->i_gid = shift_kgid(from_ns, to_ns, from->i_gid); + to->i_mode = from->i_mode; + to->i_atime = from->i_atime; + to->i_mtime = from->i_mtime; + to->i_ctime = from->i_ctime; +} + static void shiftfs_fill_inode(struct inode *inode, struct dentry *dentry) { struct inode *reali; @@ -278,6 +305,7 @@ static void shiftfs_fill_inode(struct inode *inode, struct dentry *dentry) if (!reali->i_op->get_link)
inode->i_opflags |= IOP_NOFOLLOW; + shiftfs_copyattr(reali, inode); inode->i_mapping = reali->i_mapping; inode->i_private = reali; set_nlink(inode, reali->i_nlink); @@ -573,7 +601,7 @@ static int shiftfs_setattr(struct dentry *dentry, struct iattr *attr) return err; /* all OK, reflect the change on our inode */ - setattr_copy(d_inode(dentry), attr); + shiftfs_copyattr(reali, d_inode(dentry)); return 0; } From patchwork Thu Nov 1 21:48:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Seth Forshee X-Patchwork-Id: 10664561 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F1173157A for ; Thu, 1 Nov 2018 21:49:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E31442C3D1 for ; Thu, 1 Nov 2018 21:49:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D6DD52C3D5; Thu, 1 Nov 2018 21:49:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 77C982C3D1 for ; Thu, 1 Nov 2018 21:49:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727830AbeKBGx6 (ORCPT ); Fri, 2 Nov 2018 02:53:58 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:58265 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727650AbeKBGx6 (ORCPT ); Fri, 2 Nov 2018 02:53:58 -0400 Received: from mail-io1-f70.google.com ([209.85.166.70]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) 
(Exim 4.76) (envelope-from ) id 1gIKpy-0008Jz-7o for linux-fsdevel@vger.kernel.org; Thu, 01 Nov 2018 21:49:10 +0000 Received: by mail-io1-f70.google.com with SMTP id k3-v6so8832866ioq.8 for ; Thu, 01 Nov 2018 14:49:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=tfcS/D8E2NVDSMMG2zN/Checw4lvvL0gTuc15nYTfSo=; b=sqOvfEdPeKK3VeF1JUS90PwbUuoRgdqzsERXpXN2JyCq5A9l09DBct2kXtJjue8XTZ GDSaqg5o10h44KEgziZJjMm/6y6ZlOBOlq7qG8IahUhCgXU3kN+Fzzvc/EsWvr0J6KnZ 1dxhnM/ockluH/1J01syX05H5Y/FkalIiR3kJy9K3nYk5QFUiOv0Ag6TD7QT+7R/FzdM adKVXjegD2jhAo/wD5/jwFrglUHvAk8UsGYndsQOv64/aO56vvnWw4oCkjv7CV2g+pNK /nQOZFhuQeJb3feb7SlcLodPJK7ZpuBLqx0Yb5BazmlOVreGqEkbDVF4U8La/Wo732gO 99gA== X-Gm-Message-State: AGRZ1gKEcy4aaaiX9ptyJbgzDYwi/UwAoSYUkF2/b8pvNCJD8L6sNka+ /NtwvdMBeTk9uf8ovwE/zIJFlDK+3zmRo0NNxF5m8GQaXxfy2y0m70JCEEIEiVBbRpLAyLC+Qn0 MmpcmXeYGHJXX9SM/N/76QsYLAGuYVAWUB7KnCZz+Yt0= X-Received: by 2002:a02:a1cb:: with SMTP id o11-v6mr5094054jah.82.1541108948879; Thu, 01 Nov 2018 14:49:08 -0700 (PDT) X-Google-Smtp-Source: AJdET5d3gaLKChjWegS4zR65Vvle9g3lwREu2tQNnuNxjRpAezX5+R1sKV/oUpo/g3vllPc4gpB2/w== X-Received: by 2002:a02:a1cb:: with SMTP id o11-v6mr5094031jah.82.1541108948452; Thu, 01 Nov 2018 14:49:08 -0700 (PDT) Received: from localhost ([2605:a601:ac7:2a20:7c8b:4047:a2ef:69cd]) by smtp.gmail.com with ESMTPSA id k21-v6sm5266844iom.50.2018.11.01.14.49.07 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 01 Nov 2018 14:49:07 -0700 (PDT) From: Seth Forshee To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, James Bottomley Subject: [RFC PATCH 4/6] shiftfs: translate uids using s_user_ns from lower fs Date: Thu, 1 Nov 2018 16:48:54 -0500 Message-Id: <20181101214856.4563-5-seth.forshee@canonical.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: 
<20181101214856.4563-1-seth.forshee@canonical.com> References: <20181101214856.4563-1-seth.forshee@canonical.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Do not assume that ids from the lower filesystem are from init_user_ns. Instead, translate them from that filesystem's s_user_ns and then to the shiftfs user ns. Signed-off-by: Seth Forshee --- fs/shiftfs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fs/shiftfs.c b/fs/shiftfs.c index 556594988dd2..226c03d8588b 100644 --- a/fs/shiftfs.c +++ b/fs/shiftfs.c @@ -613,6 +613,8 @@ static int shiftfs_getattr(const struct path *path, struct kstat *stat, struct inode *reali = real->d_inode; const struct inode_operations *iop = reali->i_op; struct path newpath = { .mnt = path->dentry->d_sb->s_fs_info, .dentry = real }; + struct user_namespace *from_ns = reali->i_sb->s_user_ns; + struct user_namespace *to_ns = inode->i_sb->s_user_ns; int err = 0; if (iop->getattr) @@ -624,8 +626,8 @@ static int shiftfs_getattr(const struct path *path, struct kstat *stat, return err; /* transform the underlying id */ - stat->uid = make_kuid(inode->i_sb->s_user_ns, __kuid_val(stat->uid)); - stat->gid = make_kgid(inode->i_sb->s_user_ns, __kgid_val(stat->gid)); + stat->uid = shift_kuid(from_ns, to_ns, stat->uid); + stat->gid = shift_kgid(from_ns, to_ns, stat->gid); return 0; } From patchwork Thu Nov 1 21:48:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Seth Forshee X-Patchwork-Id: 10664555 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 804FD17D5 for ; Thu, 1 Nov 2018 21:49:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org 
(Postfix) with ESMTP id 720F82C1F6 for ; Thu, 1 Nov 2018 21:49:17 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 65B7F2C3D5; Thu, 1 Nov 2018 21:49:17 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AA4E52C1F6 for ; Thu, 1 Nov 2018 21:49:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727970AbeKBGyB (ORCPT ); Fri, 2 Nov 2018 02:54:01 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:58271 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727883AbeKBGyA (ORCPT ); Fri, 2 Nov 2018 02:54:00 -0400 Received: from mail-io1-f71.google.com ([209.85.166.71]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1gIKq0-0008KG-FE for linux-fsdevel@vger.kernel.org; Thu, 01 Nov 2018 21:49:12 +0000 Received: by mail-io1-f71.google.com with SMTP id c7-v6so18951558iod.1 for ; Thu, 01 Nov 2018 14:49:12 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=yczXKaqAEGqNQkY5lhKY0Xn53759KFuVu2oBmKdF/AE=; b=kNw3n1pwVXGX44TqtHV2bq7UVORPYigouT7PXFA2c2o155sDfmDNVHNO9dWQeJIPOw 0QdQ0wXUbSAaunEGg7885g5kxb09Ga3asp+iDs3jigKRY/SZGCy74A0x+v/asx6dBeZy MJrekSEuxRnQc4cjKiXcIa4BamUeyQa3sg5pnwy2EbV9F/aQSrH2JGiszT1gImzXUPO0 Uu0FrIKSngdzty6GdPEOgyk+ajecZhXi4SUz/sldocXw+4DlrVYZoKpkDkTGQisv+Qhs ymCBECGTEFVlZn8VmP+DqO+HqO3U5MCL8yds9EB10re6BgDdbG3s6SEXZ2avSQb/ZP5E 794g== X-Gm-Message-State: AGRZ1gLHg6N0iDLjOyWAGXPGqsUxiTOvnAxtnB6hzESvNXYO4FHMkGjI 
2Nw80aI2nmI8j2aoYQMvhCLkjYyvR8XPVrHPx9UR2/qt6zDqh3ia33PJEpfiPCpyyQN9ipjBq6W +J+C3zRkHE2sGoZl1+DjYEObq3rfxdWoaQ5lCZ0Pj3Ok= X-Received: by 2002:a24:ef05:: with SMTP id i5-v6mr7685212ith.125.1541108950700; Thu, 01 Nov 2018 14:49:10 -0700 (PDT) X-Google-Smtp-Source: AJdET5e9/x089BHZSYmlcs9moJieKePTQTxwdIXB4C83vdvkGx7MUAoGGTmvMw/ctAgR8d0N/Dn8FQ== X-Received: by 2002:a24:ef05:: with SMTP id i5-v6mr7685177ith.125.1541108950203; Thu, 01 Nov 2018 14:49:10 -0700 (PDT) Received: from localhost ([2605:a601:ac7:2a20:7c8b:4047:a2ef:69cd]) by smtp.gmail.com with ESMTPSA id 10-v6sm16562581itl.32.2018.11.01.14.49.09 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 01 Nov 2018 14:49:09 -0700 (PDT) From: Seth Forshee To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, James Bottomley Subject: [RFC PATCH 5/6] shiftfs: add support for posix acls Date: Thu, 1 Nov 2018 16:48:55 -0500 Message-Id: <20181101214856.4563-6-seth.forshee@canonical.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181101214856.4563-1-seth.forshee@canonical.com> References: <20181101214856.4563-1-seth.forshee@canonical.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Seth Forshee --- fs/Kconfig | 10 +++ fs/shiftfs.c | 185 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 195 insertions(+) diff --git a/fs/Kconfig b/fs/Kconfig index 392c5a41a9f9..691f3c4fc7eb 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -121,6 +121,16 @@ config SHIFT_FS unprivileged containers can use this to mount root volumes using this technique. +config SHIFT_FS_POSIX_ACL + bool "shiftfs POSIX Access Control Lists" + depends on SHIFT_FS + select FS_POSIX_ACL + help + POSIX Access Control Lists (ACLs) support permissions for users and + groups beyond the owner/group/world scheme. 
+ + If you don't know what Access Control Lists are, say N. + menu "Caches" source "fs/fscache/Kconfig" diff --git a/fs/shiftfs.c b/fs/shiftfs.c index 226c03d8588b..b19af7b2fe75 100644 --- a/fs/shiftfs.c +++ b/fs/shiftfs.c @@ -13,6 +13,8 @@ #include #include #include +#include +#include struct shiftfs_super_info { struct vfsmount *mnt; @@ -631,6 +633,183 @@ static int shiftfs_getattr(const struct path *path, struct kstat *stat, return 0; } +#ifdef CONFIG_SHIFT_FS_POSIX_ACL + +static int +shift_acl_ids(struct user_namespace *from, struct user_namespace *to, + struct posix_acl *acl) +{ + int i; + + for (i = 0; i < acl->a_count; i++) { + struct posix_acl_entry *e = &acl->a_entries[i]; + switch(e->e_tag) { + case ACL_USER: + e->e_uid = shift_kuid(from, to, e->e_uid); + if (!uid_valid(e->e_uid)) + return -EOVERFLOW; + break; + case ACL_GROUP: + e->e_gid = shift_kgid(from, to, e->e_gid); + if (!gid_valid(e->e_gid)) + return -EOVERFLOW; + break; + } + } + return 0; +} + +static void +shift_acl_xattr_ids(struct user_namespace *from, struct user_namespace *to, + void *value, size_t size) +{ + struct posix_acl_xattr_header *header = value; + struct posix_acl_xattr_entry *entry = (void *)(header + 1), *end; + int count; + kuid_t kuid; + kgid_t kgid; + + if (!value) + return; + if (size < sizeof(struct posix_acl_xattr_header)) + return; + if (header->a_version != cpu_to_le32(POSIX_ACL_XATTR_VERSION)) + return; + + count = posix_acl_xattr_count(size); + if (count < 0) + return; + if (count == 0) + return; + + for (end = entry + count; entry != end; entry++) { + switch(le16_to_cpu(entry->e_tag)) { + case ACL_USER: + kuid = make_kuid(&init_user_ns, le32_to_cpu(entry->e_id)); + kuid = shift_kuid(from, to, kuid); + entry->e_id = cpu_to_le32(from_kuid(&init_user_ns, kuid)); + break; + case ACL_GROUP: + kgid = make_kgid(&init_user_ns, le32_to_cpu(entry->e_id)); + kgid = shift_kgid(from, to, kgid); + entry->e_id = cpu_to_le32(from_kgid(&init_user_ns, kgid)); + break; + default: + 
break; + } + } +} + +static struct posix_acl *shiftfs_get_acl(struct inode *inode, int type) +{ + struct inode *reali = inode->i_private; + const struct cred *oldcred, *newcred; + struct posix_acl *real_acl, *acl = NULL; + struct user_namespace *from_ns = reali->i_sb->s_user_ns; + struct user_namespace *to_ns = inode->i_sb->s_user_ns; + int size; + int err; + + if (!IS_POSIXACL(reali)) + return NULL; + + oldcred = shiftfs_new_creds(&newcred, inode->i_sb); + real_acl = get_acl(reali, type); + shiftfs_old_creds(oldcred, &newcred); + + if (real_acl && !IS_ERR(real_acl)) { + /* XXX: export posix_acl_clone? */ + size = sizeof(struct posix_acl) + + real_acl->a_count * sizeof(struct posix_acl_entry); + acl = kmemdup(real_acl, size, GFP_KERNEL); + posix_acl_release(real_acl); + + if (!acl) + return ERR_PTR(-ENOMEM); + + refcount_set(&acl->a_refcount, 1); + + err = shift_acl_ids(from_ns, to_ns, acl); + if (err) { + kfree(acl); + return ERR_PTR(err); + } + } + + return acl; +} + +static int +shiftfs_posix_acl_xattr_get(const struct xattr_handler *handler, + struct dentry *dentry, struct inode *inode, + const char *name, void *buffer, size_t size) +{ + struct inode *reali = inode->i_private; + int ret; + + ret = shiftfs_xattr_get(NULL, dentry, inode, handler->name, + buffer, size); + if (ret < 0) + return ret; + + shift_acl_xattr_ids(reali->i_sb->s_user_ns, inode->i_sb->s_user_ns, + buffer, size); + return ret; +} + +static int +shiftfs_posix_acl_xattr_set(const struct xattr_handler *handler, + struct dentry *dentry, struct inode *inode, + const char *name, const void *value, + size_t size, int flags) +{ + struct inode *reali = inode->i_private; + int err; + + if (!IS_POSIXACL(reali) || !reali->i_op->set_acl) + return -EOPNOTSUPP; + if (handler->flags == ACL_TYPE_DEFAULT && !S_ISDIR(inode->i_mode)) + return value ?
-EACCES : 0; + if (!inode_owner_or_capable(inode)) + return -EPERM; + + if (value) { + shift_acl_xattr_ids(inode->i_sb->s_user_ns, + reali->i_sb->s_user_ns, + (void *)value, size); + err = shiftfs_setxattr(dentry, inode, handler->name, value, + size, flags); + } else { + err = shiftfs_removexattr(dentry, handler->name); + } + + if (!err) + shiftfs_copyattr(reali, inode); + return err; +} + +static const struct xattr_handler +shiftfs_posix_acl_access_xattr_handler = { + .name = XATTR_NAME_POSIX_ACL_ACCESS, + .flags = ACL_TYPE_ACCESS, + .get = shiftfs_posix_acl_xattr_get, + .set = shiftfs_posix_acl_xattr_set, +}; + +static const struct xattr_handler +shiftfs_posix_acl_default_xattr_handler = { + .name = XATTR_NAME_POSIX_ACL_DEFAULT, + .flags = ACL_TYPE_DEFAULT, + .get = shiftfs_posix_acl_xattr_get, + .set = shiftfs_posix_acl_xattr_set, +}; + +#else /* !CONFIG_SHIFT_FS_POSIX_ACL */ + +#define shiftfs_get_acl NULL + +#endif /* CONFIG_SHIFT_FS_POSIX_ACL */ + static const struct inode_operations shiftfs_inode_ops = { .lookup = shiftfs_lookup, .getattr = shiftfs_getattr, @@ -647,6 +826,7 @@ static const struct inode_operations shiftfs_inode_ops = { .create = shiftfs_create, .mknod = NULL, /* no special files currently */ .listxattr = shiftfs_listxattr, + .get_acl = shiftfs_get_acl, }; static struct inode *shiftfs_new_inode(struct super_block *sb, umode_t mode, @@ -725,6 +905,10 @@ static const struct xattr_handler shiftfs_xattr_handler = { }; const struct xattr_handler *shiftfs_xattr_handlers[] = { +#ifdef CONFIG_SHIFT_FS_POSIX_ACL + &shiftfs_posix_acl_access_xattr_handler, + &shiftfs_posix_acl_default_xattr_handler, +#endif &shiftfs_xattr_handler, NULL }; @@ -819,6 +1003,7 @@ static int shiftfs_fill_super(struct super_block *sb, void *raw_data, sb->s_op = &shiftfs_super_ops; sb->s_xattr = shiftfs_xattr_handlers; sb->s_d_op = &shiftfs_dentry_ops; + sb->s_flags |= SB_POSIXACL; sb->s_root = d_make_root(shiftfs_new_inode(sb, S_IFDIR, dentry)); sb->s_root->d_fsdata = dentry; 
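For readers following the id translation used in patches 3/6 through 5/6, the following is a minimal userspace sketch of the same idea, not kernel code. It assumes a user namespace can be modelled by a single id-map extent (similar to one line of /proc/PID/uid_map) and, for brevity, uses one map for both uids and gids, whereas the kernel keeps them separate. The names struct idmap, to_ns_id(), to_kernel_id(), shift_id(), struct acl_entry and shift_entries() are hypothetical; they only mimic from_kuid(), make_kuid(), shift_kuid() and shift_acl_ids().

```c
/*
 * Userspace model of the id shifting in this series. All names in this
 * file are hypothetical stand-ins for the kernel helpers they mimic.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_ID ((uint32_t)-1)

/* One extent of a namespace id map (assumption: a single extent per ns). */
struct idmap {
	uint32_t ns_first;	/* first id as seen inside the namespace */
	uint32_t kernel_first;	/* first corresponding global ("kernel") id */
	uint32_t count;		/* number of consecutive ids in the extent */
};

/* Mimics from_kuid(): global id -> id inside the namespace. */
static uint32_t to_ns_id(const struct idmap *ns, uint32_t kid)
{
	if (kid == INVALID_ID || kid < ns->kernel_first ||
	    kid - ns->kernel_first >= ns->count)
		return INVALID_ID;
	return ns->ns_first + (kid - ns->kernel_first);
}

/* Mimics make_kuid(): id inside the namespace -> global id. */
static uint32_t to_kernel_id(const struct idmap *ns, uint32_t id)
{
	if (id == INVALID_ID || id < ns->ns_first ||
	    id - ns->ns_first >= ns->count)
		return INVALID_ID;
	return ns->kernel_first + (id - ns->ns_first);
}

/* Mimics shift_kuid(): read the id in "from", reinterpret it in "to". */
static uint32_t shift_id(const struct idmap *from, const struct idmap *to,
			 uint32_t kid)
{
	assert(from && to);
	return to_kernel_id(to, to_ns_id(from, kid));
}

enum acl_tag { TAG_USER_OBJ, TAG_USER, TAG_GROUP_OBJ, TAG_GROUP, TAG_MASK, TAG_OTHER };

struct acl_entry {
	enum acl_tag tag;
	uint32_t id;		/* only meaningful for TAG_USER and TAG_GROUP */
};

/* Mimics shift_acl_ids(): only named user/group entries carry an id. */
static int shift_entries(const struct idmap *from, const struct idmap *to,
			 struct acl_entry *e, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (e[i].tag != TAG_USER && e[i].tag != TAG_GROUP)
			continue;
		e[i].id = shift_id(from, to, e[i].id);
		if (e[i].id == INVALID_ID)
			return -EOVERFLOW;	/* id has no mapping in "to" */
	}
	return 0;
}
```

With an identity map standing in for init_user_ns and a container map that starts at global id 100000, shift_id() turns global id 1000 into 101000, which is what a process inside that container sees as uid 1000. An id outside the target map comes back as INVALID_ID, which is why the ACL walk reports -EOVERFLOW for it.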
From patchwork Thu Nov 1 21:48:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Seth Forshee X-Patchwork-Id: 10664557 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 37BB917D5 for ; Thu, 1 Nov 2018 21:49:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 285402C1F6 for ; Thu, 1 Nov 2018 21:49:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1C6B62C3D2; Thu, 1 Nov 2018 21:49:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 978C22C1F6 for ; Thu, 1 Nov 2018 21:49:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727977AbeKBGyD (ORCPT ); Fri, 2 Nov 2018 02:54:03 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:58275 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727835AbeKBGyD (ORCPT ); Fri, 2 Nov 2018 02:54:03 -0400 Received: from mail-it1-f200.google.com ([209.85.166.200]) by youngberry.canonical.com with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.76) (envelope-from ) id 1gIKq2-0008KT-Iu for linux-fsdevel@vger.kernel.org; Thu, 01 Nov 2018 21:49:14 +0000 Received: by mail-it1-f200.google.com with SMTP id n135-v6so497750ita.0 for ; Thu, 01 Nov 2018 14:49:14 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=ssHBrVY4nRWUii0FegPmCz0YBObhDZI10ZPtWTEhtfI=; b=OnE4Vh/kQnU2VT6Iotgsrz4WQLSXJCA18X6QwHn4Bnmh/Yn4+fPCdVJYJ5IHUGkIOx CbI1pXUL7GjTiFG+r6huJt67Uv7wKlDJ9jh6xs/npDZlq64+Pl9rB3E3QE609G+Bwuzq Pkbrqys7JtTj0MTdTiIsn4Cv9CbIghmtOnX8nbFKqAhOWwsNpozahjXWsxaUdtK3GiGT APoGoa9Wh64oGGPv7bvlHjjzSp67LNj5FAEvR2UAlIUWmCCgH5lk4mTAp30o+E/87LdR 2Ciibw3sV3ZzwgSs2+bcLbYHRIm+CrJzxBCC5bWAzM0SmjClaT3t+nZIm/WLUjCCEQfI O9jg== X-Gm-Message-State: AGRZ1gLwWEaswUbgvoM48yUzhPTGHPKSjcsRJF9NmFWDdSxw6yw+2tD4 3Lgd3vanyA3mb/F2yqVK680oPyZql6qtvyY9sxLrTB7sEoogWfnh2gdNguKq4onVTBfl+LGnf+i 4ao5INcSW9FHVzCKFXX56ki2U82ilX25n1CUKs0nzK8Y= X-Received: by 2002:a02:b45a:: with SMTP id w26-v6mr7736307jaj.45.1541108953133; Thu, 01 Nov 2018 14:49:13 -0700 (PDT) X-Google-Smtp-Source: AJdET5dDaKJBrDdhw3VjHDkfjLz9RNuSjLwQJnSQr8NDzg7HqjE4SQHD+0DRV/mB/BZFSFmo+fiPXg== X-Received: by 2002:a02:b45a:: with SMTP id w26-v6mr7736291jaj.45.1541108952698; Thu, 01 Nov 2018 14:49:12 -0700 (PDT) Received: from localhost ([2605:a601:ac7:2a20:7c8b:4047:a2ef:69cd]) by smtp.gmail.com with ESMTPSA id o10-v6sm9449349iob.43.2018.11.01.14.49.11 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 01 Nov 2018 14:49:11 -0700 (PDT) From: Seth Forshee To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, James Bottomley Subject: [RFC PATCH 6/6] shiftfs: support nested shiftfs mounts Date: Thu, 1 Nov 2018 16:48:56 -0500 Message-Id: <20181101214856.4563-7-seth.forshee@canonical.com> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181101214856.4563-1-seth.forshee@canonical.com> References: <20181101214856.4563-1-seth.forshee@canonical.com> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP shiftfs mounts cannot be nested for two reasons -- global CAP_SYS_ADMIN is required to set up a mark mount, and a single 
functional shiftfs mount meets the filesystem stacking depth limit. The CAP_SYS_ADMIN requirement can be relaxed. All of the kernel ids in a mount must be within that mount's s_user_ns, so all that is needed is CAP_SYS_ADMIN within that s_user_ns. The stack depth issue can be worked around with a couple of adjustments. First, a mark mount doesn't really need to count against the stacking depth as it doesn't contribute to the call stack depth during filesystem operations. Therefore the mount over the mark mount only needs to count as one more than the lower filesystem's stack depth. Second, when the lower mount is shiftfs we can just skip down to that mount's lower filesystem and shift ids relative to that. There is no reason to shift ids twice, and the lower path has already been marked safe for id shifting by a user privileged towards all ids in that mount's user ns. Signed-off-by: Seth Forshee --- fs/shiftfs.c | 68 +++++++++++++++++++++++++++++++++++----------------- 1 file changed, 46 insertions(+), 22 deletions(-) diff --git a/fs/shiftfs.c b/fs/shiftfs.c index b19af7b2fe75..008ace2842b9 100644 --- a/fs/shiftfs.c +++ b/fs/shiftfs.c @@ -930,7 +930,7 @@ static int shiftfs_fill_super(struct super_block *sb, void *raw_data, struct shiftfs_data *data = raw_data; char *name = kstrdup(data->path, GFP_KERNEL); int err = -ENOMEM; - struct shiftfs_super_info *ssi = NULL; + struct shiftfs_super_info *ssi = NULL, *mp_ssi; struct path path; struct dentry *dentry; @@ -946,11 +946,7 @@ static int shiftfs_fill_super(struct super_block *sb, void *raw_data, if (err) goto out; - /* to mark a mount point, must be real root */ - if (ssi->mark && !capable(CAP_SYS_ADMIN)) - goto out; - - /* else to mount a mark, must be userns admin */ + /* to mount a mark, must be userns admin */ if (!ssi->mark && !ns_capable(current_user_ns(), CAP_SYS_ADMIN)) goto out; @@ -962,41 +958,66 @@ static int shiftfs_fill_super(struct super_block *sb, void *raw_data, if
(!S_ISDIR(path.dentry->d_inode->i_mode)) { err = -ENOTDIR; - goto out_put; - } - - sb->s_stack_depth = path.dentry->d_sb->s_stack_depth + 1; - if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) { - printk(KERN_ERR "shiftfs: maximum stacking depth exceeded\n"); - err = -EINVAL; - goto out_put; + goto out_put_path; } if (ssi->mark) { + struct super_block *lower_sb = path.mnt->mnt_sb; + + /* to mark a mount point, must be root wrt lower s_user_ns */ + if (!ns_capable(lower_sb->s_user_ns, CAP_SYS_ADMIN)) + goto out_put_path; + + /* * this part is visible unshifted, so make sure no * executables that could be used to give suid * privileges */ sb->s_iflags = SB_I_NOEXEC; - ssi->mnt = path.mnt; - dentry = path.dentry; - } else { - struct shiftfs_super_info *mp_ssi; + /* + * Handle nesting of shiftfs mounts by referring this mark + * mount back to the original mark mount. This is more + * efficient and alleviates concerns about stack depth. + */ + if (lower_sb->s_magic == SHIFTFS_MAGIC) { + mp_ssi = lower_sb->s_fs_info; + + /* Doesn't make sense to mark a mark mount */ + if (mp_ssi->mark) { + err = -EINVAL; + goto out_put_path; + } + + ssi->mnt = mntget(mp_ssi->mnt); + dentry = dget(path.dentry->d_fsdata); + } else { + ssi->mnt = mntget(path.mnt); + dentry = dget(path.dentry); + } + } else { /* * this leg executes if we're admin capable in * the namespace, so be very careful */ if (path.dentry->d_sb->s_magic != SHIFTFS_MAGIC) - goto out_put; + goto out_put_path; mp_ssi = path.dentry->d_sb->s_fs_info; if (!mp_ssi->mark) - goto out_put; + goto out_put_path; ssi->mnt = mntget(mp_ssi->mnt); dentry = dget(path.dentry->d_fsdata); - path_put(&path); } + + sb->s_stack_depth = dentry->d_sb->s_stack_depth + 1; + if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) { + printk(KERN_ERR "shiftfs: maximum stacking depth exceeded\n"); + err = -EINVAL; + goto out_put_mnt; + } + + path_put(&path); ssi->userns = get_user_ns(dentry->d_sb->s_user_ns); sb->s_fs_info = ssi; sb->s_magic =
SHIFTFS_MAGIC; @@ -1009,7 +1030,10 @@ static int shiftfs_fill_super(struct super_block *sb, void *raw_data, return 0; - out_put: + out_put_mnt: + mntput(ssi->mnt); + dput(dentry); + out_put_path: path_put(&path); out: kfree(name);
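The way patch 6/6 selects the lower filesystem and computes the stacking depth can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel implementation: struct model_sb and model_fill_super() are hypothetical names, MODEL_MAX_STACK_DEPTH stands in for FILESYSTEM_MAX_STACK_DEPTH, capability checks and reference counting are left out, and the -EPERM returned for a shifting mount over something other than a mark mount is an assumption of this sketch.

```c
/*
 * Userspace model of the lower-filesystem selection in patch 6/6.
 * All names in this file are hypothetical.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_STACK_DEPTH 2

struct model_sb {
	bool is_shiftfs;		/* models s_magic == SHIFTFS_MAGIC */
	bool mark;			/* models the mark mount option */
	const struct model_sb *lower;	/* fs that ids are shifted relative to */
	int stack_depth;		/* models s_stack_depth */
};

/* Set up "sb" as a shiftfs mount over "target"; returns 0 or -errno. */
static int model_fill_super(struct model_sb *sb, const struct model_sb *target,
			    bool mark)
{
	const struct model_sb *lower;

	assert(sb && target);

	if (mark) {
		if (target->is_shiftfs) {
			/* Doesn't make sense to mark a mark mount. */
			if (target->mark)
				return -EINVAL;
			/* Nested case: skip down to the original lower fs. */
			lower = target->lower;
		} else {
			lower = target;
		}
	} else {
		/* A shifting mount is only allowed over a mark mount. */
		if (!target->is_shiftfs || !target->mark)
			return -EPERM;
		lower = target->lower;
	}

	/* Depth is counted from the real lower fs, not from the mark mount. */
	if (lower->stack_depth + 1 > MODEL_MAX_STACK_DEPTH)
		return -EINVAL;

	sb->is_shiftfs = true;
	sb->mark = mark;
	sb->lower = lower;
	sb->stack_depth = lower->stack_depth + 1;
	return 0;
}
```

In this model every shiftfs superblock over a plain filesystem ends up at depth 1 regardless of how many mark and shifting mounts are chained, because each new mount refers back to the same lower filesystem. That matches the intent described in the commit message, where ids are shifted once relative to the original lower filesystem.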