From patchwork Tue Feb 19 17:08:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820427 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B20C814E1 for ; Tue, 19 Feb 2019 17:08:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9AD012D035 for ; Tue, 19 Feb 2019 17:08:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8E7D22D052; Tue, 19 Feb 2019 17:08:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B59F42D041 for ; Tue, 19 Feb 2019 17:08:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728867AbfBSRIe (ORCPT ); Tue, 19 Feb 2019 12:08:34 -0500 Received: from mx1.redhat.com ([209.132.183.28]:60620 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726565AbfBSRIe (ORCPT ); Tue, 19 Feb 2019 12:08:34 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A56697D0CB; Tue, 19 Feb 2019 17:08:33 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id F2C0818669; Tue, 19 Feb 2019 17:08:31 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 01/10] vfs: syscall: Add open_tree(2) to reference or clone a mount From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:08:31 +0000 Message-ID: <155059611122.17079.12534811457233099094.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Tue, 19 Feb 2019 17:08:33 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP From: Al Viro open_tree(dfd, pathname, flags) Returns an O_PATH-opened file descriptor or an error. dfd and pathname specify the location to open, in usual fashion (see e.g. fstatat(2)). flags should be an OR of some of the following: * AT_PATH_EMPTY, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW - same meanings as usual * OPEN_TREE_CLOEXEC - make the resulting descriptor close-on-exec * OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE - instead of opening the location in question, create a detached mount tree matching the subtree rooted at location specified by dfd/pathname. With AT_RECURSIVE the entire subtree is cloned, without it - only the part within in the mount containing the location in question. In other words, the same as mount --rbind or mount --bind would've taken. The detached tree will be dissolved on the final close of obtained file. Creation of such detached trees requires the same capabilities as doing mount --bind. Signed-off-by: Al Viro Signed-off-by: David Howells cc: linux-api@vger.kernel.org Signed-off-by: Al Viro --- arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 fs/file_table.c | 9 +- fs/internal.h | 1 fs/namespace.c | 157 ++++++++++++++++++++++++++++---- include/linux/fs.h | 7 + include/linux/syscalls.h | 1 include/uapi/linux/fcntl.h | 2 include/uapi/linux/mount.h | 6 + 9 files changed, 158 insertions(+), 27 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 3cf7b533b3d1..ea1b413afd47 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -398,3 +398,4 @@ 384 i386 arch_prctl sys_arch_prctl __ia32_compat_sys_arch_prctl 385 i386 io_pgetevents sys_io_pgetevents __ia32_compat_sys_io_pgetevents 386 i386 rseq sys_rseq __ia32_sys_rseq +387 i386 open_tree sys_open_tree __ia32_sys_open_tree diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index f0b1709a5ffb..0545bed581dc 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -343,6 +343,7 @@ 332 common statx __x64_sys_statx 333 common io_pgetevents __x64_sys_io_pgetevents 334 common rseq __x64_sys_rseq +335 common open_tree __x64_sys_open_tree # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/file_table.c b/fs/file_table.c index 5679e7fcb6b0..10e0a3dcea4d 100644 --- a/fs/file_table.c +++ b/fs/file_table.c @@ -255,6 +255,7 @@ static void __fput(struct file *file) struct dentry *dentry = file->f_path.dentry; struct vfsmount *mnt = file->f_path.mnt; struct inode *inode = file->f_inode; + fmode_t mode = file->f_mode; if (unlikely(!(file->f_mode & FMODE_OPENED))) goto out; @@ -277,18 +278,20 @@ static void __fput(struct file *file) if (file->f_op->release) file->f_op->release(inode, file); if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL && - !(file->f_mode & FMODE_PATH))) { + !(mode & FMODE_PATH))) { cdev_put(inode->i_cdev); } fops_put(file->f_op); put_pid(file->f_owner.pid); - if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) + if ((mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ) i_readcount_dec(inode); - if (file->f_mode & FMODE_WRITER) { + if (mode & FMODE_WRITER) { put_write_access(inode); __mnt_drop_write(mnt); } dput(dentry); + if (unlikely(mode & FMODE_NEED_UNMOUNT)) + dissolve_on_fput(mnt); mntput(mnt); out: file_free(file); diff --git a/fs/internal.h b/fs/internal.h index 6a8b71643af4..f3a027c44758 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -94,6 +94,7 @@ extern int __mnt_want_write_file(struct file *); extern void __mnt_drop_write(struct vfsmount *); extern void __mnt_drop_write_file(struct file *); +extern void dissolve_on_fput(struct vfsmount *); /* * fs_struct.c */ diff --git a/fs/namespace.c b/fs/namespace.c index bb9b7db1c66c..112d46f26fc3 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -20,6 +20,7 @@ #include /* init_rootfs */ #include /* get_fs_root et.al. */ #include /* fsnotify_vfsmount_delete */ +#include #include #include #include @@ -1830,6 +1831,21 @@ struct vfsmount *collect_mounts(const struct path *path) return &tree->mnt; } +static void free_mnt_ns(struct mnt_namespace *); +static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *, bool); + +void dissolve_on_fput(struct vfsmount *mnt) +{ + struct mnt_namespace *ns; + namespace_lock(); + lock_mount_hash(); + ns = real_mount(mnt)->mnt_ns; + umount_tree(real_mount(mnt), UMOUNT_CONNECTED); + unlock_mount_hash(); + namespace_unlock(); + free_mnt_ns(ns); +} + void drop_collected_mounts(struct vfsmount *mnt) { namespace_lock(); @@ -2220,6 +2236,30 @@ static bool has_locked_children(struct mount *mnt, struct dentry *dentry) return false; } +static struct mount *__do_loopback(struct path *old_path, int recurse) +{ + struct mount *mnt = ERR_PTR(-EINVAL), *old = real_mount(old_path->mnt); + + if (IS_MNT_UNBINDABLE(old)) + return mnt; + + if (!check_mnt(old) && old_path->dentry->d_op != &ns_dentry_operations) + return mnt; + + if (!recurse && has_locked_children(old, old_path->dentry)) + return mnt; + + if (recurse) + mnt = copy_tree(old, old_path->dentry, CL_COPY_MNT_NS_FILE); + else + mnt = clone_mnt(old, old_path->dentry, 0); + + if (!IS_ERR(mnt)) + mnt->mnt.mnt_flags &= ~MNT_LOCKED; + + return mnt; +} + /* * do loopback mount. */ @@ -2227,7 +2267,7 @@ static int do_loopback(struct path *path, const char *old_name, int recurse) { struct path old_path; - struct mount *mnt = NULL, *old, *parent; + struct mount *mnt = NULL, *parent; struct mountpoint *mp; int err; if (!old_name || !*old_name) @@ -2241,38 +2281,21 @@ static int do_loopback(struct path *path, const char *old_name, goto out; mp = lock_mount(path); - err = PTR_ERR(mp); - if (IS_ERR(mp)) + if (IS_ERR(mp)) { + err = PTR_ERR(mp); goto out; + } - old = real_mount(old_path.mnt); parent = real_mount(path->mnt); - - err = -EINVAL; - if (IS_MNT_UNBINDABLE(old)) - goto out2; - if (!check_mnt(parent)) goto out2; - if (!check_mnt(old) && old_path.dentry->d_op != &ns_dentry_operations) - goto out2; - - if (!recurse && has_locked_children(old, old_path.dentry)) - goto out2; - - if (recurse) - mnt = copy_tree(old, old_path.dentry, CL_COPY_MNT_NS_FILE); - else - mnt = clone_mnt(old, old_path.dentry, 0); - + mnt = __do_loopback(&old_path, recurse); if (IS_ERR(mnt)) { err = PTR_ERR(mnt); goto out2; } - mnt->mnt.mnt_flags &= ~MNT_LOCKED; - err = graft_tree(mnt, parent, mp); if (err) { lock_mount_hash(); @@ -2286,6 +2309,96 @@ static int do_loopback(struct path *path, const char *old_name, return err; } +static struct file *open_detached_copy(struct path *path, bool recursive) +{ + struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns; + struct mnt_namespace *ns = alloc_mnt_ns(user_ns, true); + struct mount *mnt, *p; + struct file *file; + + if (IS_ERR(ns)) + return ERR_CAST(ns); + + namespace_lock(); + mnt = __do_loopback(path, recursive); + if (IS_ERR(mnt)) { + namespace_unlock(); + free_mnt_ns(ns); + return ERR_CAST(mnt); + } + + lock_mount_hash(); + for (p = mnt; p; p = next_mnt(p, mnt)) { + p->mnt_ns = ns; + ns->mounts++; + } + ns->root = mnt; + list_add_tail(&ns->list, &mnt->mnt_list); + mntget(&mnt->mnt); + unlock_mount_hash(); + namespace_unlock(); + + mntput(path->mnt); + path->mnt = &mnt->mnt; + file = dentry_open(path, O_PATH, current_cred()); + if (IS_ERR(file)) + dissolve_on_fput(path->mnt); + else + file->f_mode |= FMODE_NEED_UNMOUNT; + return file; +} + +SYSCALL_DEFINE3(open_tree, int, dfd, const char *, filename, unsigned, flags) +{ + struct file *file; + struct path path; + int lookup_flags = LOOKUP_AUTOMOUNT | LOOKUP_FOLLOW; + bool detached = flags & OPEN_TREE_CLONE; + int error; + int fd; + + BUILD_BUG_ON(OPEN_TREE_CLOEXEC != O_CLOEXEC); + + if (flags & ~(AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_RECURSIVE | + AT_SYMLINK_NOFOLLOW | OPEN_TREE_CLONE | + OPEN_TREE_CLOEXEC)) + return -EINVAL; + + if ((flags & (AT_RECURSIVE | OPEN_TREE_CLONE)) == AT_RECURSIVE) + return -EINVAL; + + if (flags & AT_NO_AUTOMOUNT) + lookup_flags &= ~LOOKUP_AUTOMOUNT; + if (flags & AT_SYMLINK_NOFOLLOW) + lookup_flags &= ~LOOKUP_FOLLOW; + if (flags & AT_EMPTY_PATH) + lookup_flags |= LOOKUP_EMPTY; + + if (detached && !may_mount()) + return -EPERM; + + fd = get_unused_fd_flags(flags & O_CLOEXEC); + if (fd < 0) + return fd; + + error = user_path_at(dfd, filename, lookup_flags, &path); + if (unlikely(error)) { + file = ERR_PTR(error); + } else { + if (detached) + file = open_detached_copy(&path, flags & AT_RECURSIVE); + else + file = dentry_open(&path, O_PATH, current_cred()); + path_put(&path); + } + if (IS_ERR(file)) { + put_unused_fd(fd); + return PTR_ERR(file); + } + fd_install(fd, file); + return fd; +} + /* * Don't allow locked mount flags to be cleared. * diff --git a/include/linux/fs.h b/include/linux/fs.h index 3e85cb8e8c20..c4ef82689cab 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -159,10 +159,13 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, #define FMODE_NONOTIFY ((__force fmode_t)0x4000000) /* File is capable of returning -EAGAIN if I/O will block */ -#define FMODE_NOWAIT ((__force fmode_t)0x8000000) +#define FMODE_NOWAIT ((__force fmode_t)0x8000000) + +/* File represents mount that needs unmounting */ +#define FMODE_NEED_UNMOUNT ((__force fmode_t)0x10000000) /* File does not contribute to nr_files count */ -#define FMODE_NOACCOUNT ((__force fmode_t)0x20000000) +#define FMODE_NOACCOUNT ((__force fmode_t)0x20000000) /* * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 257cccba3062..07bca6796498 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -926,6 +926,7 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags, unsigned mask, struct statx __user *buffer); asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len, int flags, uint32_t sig); +asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags); /* * Architecture-specific system calls diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h index 6448cdd9a350..594b85f7cb86 100644 --- a/include/uapi/linux/fcntl.h +++ b/include/uapi/linux/fcntl.h @@ -90,5 +90,7 @@ #define AT_STATX_FORCE_SYNC 0x2000 /* - Force the attributes to be sync'd with the server */ #define AT_STATX_DONT_SYNC 0x4000 /* - Don't sync attributes with the server */ +#define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */ + #endif /* _UAPI_LINUX_FCNTL_H */ diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index 3f9ec42510b0..fd7ae2e7eccf 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -55,4 +55,10 @@ #define MS_MGC_VAL 0xC0ED0000 #define MS_MGC_MSK 0xffff0000 +/* + * open_tree() flags. + */ +#define OPEN_TREE_CLONE 1 /* Clone the target tree and attach the clone */ +#define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */ + #endif /* _UAPI_LINUX_MOUNT_H */ From patchwork Tue Feb 19 17:08:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820431 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 767FD6C2 for ; Tue, 19 Feb 2019 17:08:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5EC152D035 for ; Tue, 19 Feb 2019 17:08:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 530942D041; Tue, 19 Feb 2019 17:08:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 91DDE2D052 for ; Tue, 19 Feb 2019 17:08:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727303AbfBSRIm (ORCPT ); Tue, 19 Feb 2019 12:08:42 -0500 Received: from mx1.redhat.com ([209.132.183.28]:37872 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726694AbfBSRIm (ORCPT ); Tue, 19 Feb 2019 12:08:42 -0500 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 42292C04BD35; Tue, 19 Feb 2019 17:08:41 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id 96E2C37DE; Tue, 19 Feb 2019 17:08:39 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 02/10] vfs: syscall: Add move_mount(2) to move mounts around From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:08:38 +0000 Message-ID: <155059611887.17079.12991580316407924257.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Tue, 19 Feb 2019 17:08:41 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Add a move_mount() system call that will move a mount from one place to another and, in the next commit, allow to attach an unattached mount tree. The new system call looks like the following: int move_mount(int from_dfd, const char *from_path, int to_dfd, const char *to_path, unsigned int flags); Signed-off-by: David Howells cc: linux-api@vger.kernel.org Signed-off-by: Al Viro Signed-off-by: Tetsuo Handa Acked-by: John Johansen --- arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 fs/namespace.c | 126 ++++++++++++++++++++++++-------- include/linux/lsm_hooks.h | 6 ++ include/linux/security.h | 7 ++ include/linux/syscalls.h | 3 + include/uapi/linux/mount.h | 11 +++ security/security.c | 5 + 8 files changed, 129 insertions(+), 31 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index ea1b413afd47..76d092b7d1b0 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -399,3 +399,4 @@ 385 i386 io_pgetevents sys_io_pgetevents __ia32_compat_sys_io_pgetevents 386 i386 rseq sys_rseq __ia32_sys_rseq 387 i386 open_tree sys_open_tree __ia32_sys_open_tree +388 i386 move_mount sys_move_mount __ia32_sys_move_mount diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 0545bed581dc..37ba4e65eee6 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -344,6 +344,7 @@ 333 common io_pgetevents __x64_sys_io_pgetevents 334 common rseq __x64_sys_rseq 335 common open_tree __x64_sys_open_tree +336 common move_mount __x64_sys_move_mount # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/namespace.c b/fs/namespace.c index 112d46f26fc3..f10122028a11 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2537,72 +2537,81 @@ static inline int tree_contains_unbindable(struct mount *mnt) return 0; } -static int do_move_mount(struct path *path, const char *old_name) +static int do_move_mount(struct path *old_path, struct path *new_path) { - struct path old_path, parent_path; + struct path parent_path = {.mnt = NULL, .dentry = NULL}; struct mount *p; struct mount *old; struct mountpoint *mp; int err; - if (!old_name || !*old_name) - return -EINVAL; - err = kern_path(old_name, LOOKUP_FOLLOW, &old_path); - if (err) - return err; - mp = lock_mount(path); - err = PTR_ERR(mp); + mp = lock_mount(new_path); if (IS_ERR(mp)) - goto out; + return PTR_ERR(mp); - old = real_mount(old_path.mnt); - p = real_mount(path->mnt); + old = real_mount(old_path->mnt); + p = real_mount(new_path->mnt); err = -EINVAL; if (!check_mnt(p) || !check_mnt(old)) - goto out1; + goto out; - if (old->mnt.mnt_flags & MNT_LOCKED) - goto out1; + if (!mnt_has_parent(old)) + goto out; - err = -EINVAL; - if (old_path.dentry != old_path.mnt->mnt_root) - goto out1; + if (old->mnt.mnt_flags & MNT_LOCKED) + goto out; - if (!mnt_has_parent(old)) - goto out1; + if (old_path->dentry != old_path->mnt->mnt_root) + goto out; - if (d_is_dir(path->dentry) != - d_is_dir(old_path.dentry)) - goto out1; + if (d_is_dir(new_path->dentry) != + d_is_dir(old_path->dentry)) + goto out; /* * Don't move a mount residing in a shared parent. */ if (IS_MNT_SHARED(old->mnt_parent)) - goto out1; + goto out; /* * Don't move a mount tree containing unbindable mounts to a destination * mount which is shared. */ if (IS_MNT_SHARED(p) && tree_contains_unbindable(old)) - goto out1; + goto out; err = -ELOOP; for (; mnt_has_parent(p); p = p->mnt_parent) if (p == old) - goto out1; + goto out; - err = attach_recursive_mnt(old, real_mount(path->mnt), mp, &parent_path); + err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp, + &parent_path); if (err) - goto out1; + goto out; /* if the mount is moved, it should no longer be expire * automatically */ list_del_init(&old->mnt_expire); -out1: - unlock_mount(mp); out: + unlock_mount(mp); if (!err) path_put(&parent_path); + return err; +} + +static int do_move_mount_old(struct path *path, const char *old_name) +{ + struct path old_path; + int err; + + if (!old_name || !*old_name) + return -EINVAL; + + err = kern_path(old_name, LOOKUP_FOLLOW, &old_path); + if (err) + return err; + + err = do_move_mount(&old_path, path); path_put(&old_path); return err; } @@ -3050,7 +3059,7 @@ long do_mount(const char *dev_name, const char __user *dir_name, else if (flags & (MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE)) retval = do_change_type(&path, flags); else if (flags & MS_MOVE) - retval = do_move_mount(&path, dev_name); + retval = do_move_mount_old(&path, dev_name); else retval = do_new_mount(&path, type_page, sb_flags, mnt_flags, dev_name, data_page); @@ -3278,6 +3287,61 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name, return ksys_mount(dev_name, dir_name, type, flags, data); } +/* + * Move a mount from one place to another. + * + * Note the flags value is a combination of MOVE_MOUNT_* flags. + */ +SYSCALL_DEFINE5(move_mount, + int, from_dfd, const char *, from_pathname, + int, to_dfd, const char *, to_pathname, + unsigned int, flags) +{ + struct path from_path, to_path; + unsigned int lflags; + int ret = 0; + + if (!may_mount()) + return -EPERM; + + if (flags & ~MOVE_MOUNT__MASK) + return -EINVAL; + + /* If someone gives a pathname, they aren't permitted to move + * from an fd that requires unmount as we can't get at the flag + * to clear it afterwards. + */ + lflags = 0; + if (flags & MOVE_MOUNT_F_SYMLINKS) lflags |= LOOKUP_FOLLOW; + if (flags & MOVE_MOUNT_F_AUTOMOUNTS) lflags |= LOOKUP_AUTOMOUNT; + if (flags & MOVE_MOUNT_F_EMPTY_PATH) lflags |= LOOKUP_EMPTY; + + ret = user_path_at(from_dfd, from_pathname, lflags, &from_path); + if (ret < 0) + return ret; + + lflags = 0; + if (flags & MOVE_MOUNT_T_SYMLINKS) lflags |= LOOKUP_FOLLOW; + if (flags & MOVE_MOUNT_T_AUTOMOUNTS) lflags |= LOOKUP_AUTOMOUNT; + if (flags & MOVE_MOUNT_T_EMPTY_PATH) lflags |= LOOKUP_EMPTY; + + ret = user_path_at(to_dfd, to_pathname, lflags, &to_path); + if (ret < 0) + goto out_from; + + ret = security_move_mount(&from_path, &to_path); + if (ret < 0) + goto out_to; + + ret = do_move_mount(&from_path, &to_path); + +out_to: + path_put(&to_path); +out_from: + path_put(&from_path); + return ret; +} + /* * Return true if path is reachable from root * diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 356e78fe90a8..89892a031ecf 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -160,6 +160,10 @@ * Parse a string of security data filling in the opts structure * @options string containing all mount options known by the LSM * @opts binary data structure usable by the LSM + * @move_mount: + * Check permission before a mount is moved. + * @from_path indicates the mount that is going to be moved. + * @to_path indicates the mountpoint that will be mounted upon. * @dentry_init_security: * Compute a context for a dentry as the inode is not yet available * since NFSv4 has no label backed by an EA anyway. @@ -1500,6 +1504,7 @@ union security_list_options { unsigned long *set_kern_flags); int (*sb_add_mnt_opt)(const char *option, const char *val, int len, void **mnt_opts); + int (*move_mount)(const struct path *from_path, const struct path *to_path); int (*dentry_init_security)(struct dentry *dentry, int mode, const struct qstr *name, void **ctx, u32 *ctxlen); @@ -1835,6 +1840,7 @@ struct security_hook_heads { struct hlist_head sb_set_mnt_opts; struct hlist_head sb_clone_mnt_opts; struct hlist_head sb_add_mnt_opt; + struct hlist_head move_mount; struct hlist_head dentry_init_security; struct hlist_head dentry_create_files_as; #ifdef CONFIG_SECURITY_PATH diff --git a/include/linux/security.h b/include/linux/security.h index f28a1ebfd78e..edc48e61c8b4 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -247,6 +247,7 @@ int security_sb_clone_mnt_opts(const struct super_block *oldsb, unsigned long *set_kern_flags); int security_add_mnt_opt(const char *option, const char *val, int len, void **mnt_opts); +int security_move_mount(const struct path *from_path, const struct path *to_path); int security_dentry_init_security(struct dentry *dentry, int mode, const struct qstr *name, void **ctx, u32 *ctxlen); @@ -609,6 +610,12 @@ static inline int security_add_mnt_opt(const char *option, const char *val, return 0; } +static inline int security_move_mount(const struct path *from_path, + const struct path *to_path) +{ + return 0; +} + static inline int security_inode_alloc(struct inode *inode) { return 0; diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 07bca6796498..13289cdf6b52 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -927,6 +927,9 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags, asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len, int flags, uint32_t sig); asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags); +asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path, + int to_dfd, const char __user *to_path, + unsigned int ms_flags); /* * Architecture-specific system calls diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index fd7ae2e7eccf..3634e065836c 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -61,4 +61,15 @@ #define OPEN_TREE_CLONE 1 /* Clone the target tree and attach the clone */ #define OPEN_TREE_CLOEXEC O_CLOEXEC /* Close the file on execve() */ +/* + * move_mount() flags. + */ +#define MOVE_MOUNT_F_SYMLINKS 0x00000001 /* Follow symlinks on from path */ +#define MOVE_MOUNT_F_AUTOMOUNTS 0x00000002 /* Follow automounts on from path */ +#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004 /* Empty from path permitted */ +#define MOVE_MOUNT_T_SYMLINKS 0x00000010 /* Follow symlinks on to path */ +#define MOVE_MOUNT_T_AUTOMOUNTS 0x00000020 /* Follow automounts on to path */ +#define MOVE_MOUNT_T_EMPTY_PATH 0x00000040 /* Empty to path permitted */ +#define MOVE_MOUNT__MASK 0x00000077 + #endif /* _UAPI_LINUX_MOUNT_H */ diff --git a/security/security.c b/security/security.c index 5759339319dc..6fe31b31dc74 100644 --- a/security/security.c +++ b/security/security.c @@ -476,6 +476,11 @@ int security_add_mnt_opt(const char *option, const char *val, int len, } EXPORT_SYMBOL(security_add_mnt_opt); +int security_move_mount(const struct path *from_path, const struct path *to_path) +{ + return call_int_hook(move_mount, 0, from_path, to_path); +} + int security_inode_alloc(struct inode *inode) { inode->i_security = NULL; From patchwork Tue Feb 19 17:08:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820433 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7794B6C2 for ; Tue, 19 Feb 2019 17:08:51 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5B95C2D076 for ; Tue, 19 Feb 2019 17:08:51 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4F9492D08F; Tue, 19 Feb 2019 17:08:51 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C69D82D076 for ; Tue, 19 Feb 2019 17:08:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728956AbfBSRIu (ORCPT ); Tue, 19 Feb 2019 12:08:50 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38462 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726565AbfBSRIu (ORCPT ); Tue, 19 Feb 2019 12:08:50 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 809D9C007345; Tue, 19 Feb 2019 17:08:49 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id 453D05D970; Tue, 19 Feb 2019 17:08:47 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 03/10] teach move_mount(2) to work with OPEN_TREE_CLONE From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:08:46 +0000 Message-ID: <155059612649.17079.7287713053194562461.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Tue, 19 Feb 2019 17:08:49 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Allow a detached tree created by open_tree(..., OPEN_TREE_CLONE) to be attached by move_mount(2). If by the time of final fput() of OPEN_TREE_CLONE-opened file its tree is not detached anymore, it won't be dissolved. move_mount(2) is adjusted to handle detached source. That gives us equivalents of mount --bind and mount --rbind. Thanks also to Alan Jenkins for providing a whole bunch of ways to break things using this interface. Signed-off-by: Al Viro Signed-off-by: David Howells Signed-off-by: Al Viro --- fs/namespace.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 55 insertions(+), 7 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index f10122028a11..56423c60ac7e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -1840,10 +1840,16 @@ void dissolve_on_fput(struct vfsmount *mnt) namespace_lock(); lock_mount_hash(); ns = real_mount(mnt)->mnt_ns; - umount_tree(real_mount(mnt), UMOUNT_CONNECTED); + if (ns) { + if (is_anon_ns(ns)) + umount_tree(real_mount(mnt), UMOUNT_CONNECTED); + else + ns = NULL; + } unlock_mount_hash(); namespace_unlock(); - free_mnt_ns(ns); + if (ns) + free_mnt_ns(ns); } void drop_collected_mounts(struct vfsmount *mnt) @@ -2079,6 +2085,10 @@ static int attach_recursive_mnt(struct mount *source_mnt, attach_mnt(source_mnt, dest_mnt, dest_mp); touch_mnt_namespace(source_mnt->mnt_ns); } else { + if (source_mnt->mnt_ns) { + /* move from anon - the caller will destroy */ + list_del_init(&source_mnt->mnt_ns->list); + } mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt); commit_tree(source_mnt); } @@ -2537,13 +2547,37 @@ static inline int tree_contains_unbindable(struct mount *mnt) return 0; } +/* + * Check that there aren't references to earlier/same mount namespaces in the + * specified subtree. Such references can act as pins for mount namespaces + * that aren't checked by the mount-cycle checking code, thereby allowing + * cycles to be made. + */ +static bool check_for_nsfs_mounts(struct mount *subtree) +{ + struct mount *p; + bool ret = false; + + lock_mount_hash(); + for (p = subtree; p; p = next_mnt(p, subtree)) + if (mnt_ns_loop(p->mnt.mnt_root)) + goto out; + + ret = true; +out: + unlock_mount_hash(); + return ret; +} + static int do_move_mount(struct path *old_path, struct path *new_path) { struct path parent_path = {.mnt = NULL, .dentry = NULL}; + struct mnt_namespace *ns; struct mount *p; struct mount *old; struct mountpoint *mp; int err; + bool attached; mp = lock_mount(new_path); if (IS_ERR(mp)) @@ -2551,12 +2585,19 @@ static int do_move_mount(struct path *old_path, struct path *new_path) old = real_mount(old_path->mnt); p = real_mount(new_path->mnt); + attached = mnt_has_parent(old); + ns = old->mnt_ns; err = -EINVAL; - if (!check_mnt(p) || !check_mnt(old)) + /* The mountpoint must be in our namespace. */ + if (!check_mnt(p)) goto out; - if (!mnt_has_parent(old)) + /* The thing moved should be either ours or completely unattached. */ + if (attached && !check_mnt(old)) + goto out; + + if (!attached && !is_anon_ns(ns)) goto out; if (old->mnt.mnt_flags & MNT_LOCKED) @@ -2571,7 +2612,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path) /* * Don't move a mount residing in a shared parent. */ - if (IS_MNT_SHARED(old->mnt_parent)) + if (attached && IS_MNT_SHARED(old->mnt_parent)) goto out; /* * Don't move a mount tree containing unbindable mounts to a destination @@ -2580,12 +2621,14 @@ static int do_move_mount(struct path *old_path, struct path *new_path) if (IS_MNT_SHARED(p) && tree_contains_unbindable(old)) goto out; err = -ELOOP; + if (!check_for_nsfs_mounts(old)) + goto out; for (; mnt_has_parent(p); p = p->mnt_parent) if (p == old) goto out; err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp, - &parent_path); + attached ? &parent_path : NULL); if (err) goto out; @@ -2594,8 +2637,11 @@ static int do_move_mount(struct path *old_path, struct path *new_path) list_del_init(&old->mnt_expire); out: unlock_mount(mp); - if (!err) + if (!err) { path_put(&parent_path); + if (!attached) + free_mnt_ns(ns); + } return err; } @@ -3289,6 +3335,8 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name, /* * Move a mount from one place to another. + * In combination with open_tree(OPEN_TREE_CLONE [| AT_RECURSIVE]) it can be + * used to copy a mount subtree. * * Note the flags value is a combination of MOVE_MOUNT_* flags. */ From patchwork Tue Feb 19 17:08:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820437 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9177A14E1 for ; Tue, 19 Feb 2019 17:09:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2B2BE2AD95 for ; Tue, 19 Feb 2019 17:09:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 192F82C9A2; Tue, 19 Feb 2019 17:09:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2CD8829C6B for ; Tue, 19 Feb 2019 17:08:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726565AbfBSRI6 (ORCPT ); Tue, 19 Feb 2019 12:08:58 -0500 Received: from mx1.redhat.com ([209.132.183.28]:37402 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726368AbfBSRI6 (ORCPT ); Tue, 19 Feb 2019 12:08:58 -0500 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id C45853DDB6; Tue, 19 Feb 2019 17:08:57 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7479060FD8; Tue, 19 Feb 2019 17:08:56 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 04/10] Make anon_inodes unconditional From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:08:54 +0000 Message-ID: <155059613474.17079.4339959276727605930.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Tue, 19 Feb 2019 17:08:57 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Make the anon_inodes facility unconditional so that it can be used by core VFS code. Signed-off-by: David Howells Signed-off-by: Al Viro --- arch/arm/kvm/Kconfig | 1 - arch/arm64/kvm/Kconfig | 1 - arch/mips/kvm/Kconfig | 1 - arch/powerpc/kvm/Kconfig | 1 - arch/s390/kvm/Kconfig | 1 - arch/x86/Kconfig | 1 - arch/x86/kvm/Kconfig | 1 - drivers/base/Kconfig | 1 - drivers/char/tpm/Kconfig | 1 - drivers/dma-buf/Kconfig | 1 - drivers/gpio/Kconfig | 1 - drivers/iio/Kconfig | 1 - drivers/infiniband/Kconfig | 1 - drivers/vfio/Kconfig | 1 - fs/Makefile | 2 +- fs/notify/fanotify/Kconfig | 1 - fs/notify/inotify/Kconfig | 1 - init/Kconfig | 10 ---------- 18 files changed, 1 insertion(+), 27 deletions(-) diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig index 3f5320f46de2..f591026347a5 100644 --- a/arch/arm/kvm/Kconfig +++ b/arch/arm/kvm/Kconfig @@ -22,7 +22,6 @@ config KVM bool "Kernel-based Virtual Machine (KVM) support" depends on MMU && OF select PREEMPT_NOTIFIERS - select ANON_INODES select ARM_GIC select ARM_GIC_V3 select ARM_GIC_V3_ITS diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index a3f85624313e..a67121d419a2 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -23,7 +23,6 @@ config KVM depends on OF select MMU_NOTIFIER select PREEMPT_NOTIFIERS - select ANON_INODES select HAVE_KVM_CPU_RELAX_INTERCEPT select HAVE_KVM_ARCH_TLB_FLUSH_ALL select KVM_MMIO diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig index 4528bc9c3cb1..eac25aef21e0 100644 --- a/arch/mips/kvm/Kconfig +++ b/arch/mips/kvm/Kconfig @@ -21,7 +21,6 @@ config KVM depends on MIPS_FP_SUPPORT select EXPORT_UASM select PREEMPT_NOTIFIERS - select ANON_INODES select KVM_GENERIC_DIRTYLOG_READ_PROTECT select HAVE_KVM_VCPU_ASYNC_IOCTL select KVM_MMIO diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig index bfdde04e4905..f53997a8ca62 100644 --- a/arch/powerpc/kvm/Kconfig +++ b/arch/powerpc/kvm/Kconfig @@ -20,7 +20,6 @@ if VIRTUALIZATION config KVM bool select PREEMPT_NOTIFIERS - select ANON_INODES select HAVE_KVM_EVENTFD select HAVE_KVM_VCPU_ASYNC_IOCTL select SRCU diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig index 767453faacfc..1816ee48eadd 100644 --- a/arch/s390/kvm/Kconfig +++ b/arch/s390/kvm/Kconfig @@ -21,7 +21,6 @@ config KVM prompt "Kernel-based Virtual Machine (KVM) support" depends on HAVE_KVM select PREEMPT_NOTIFIERS - select ANON_INODES select HAVE_KVM_CPU_RELAX_INTERCEPT select HAVE_KVM_VCPU_ASYNC_IOCTL select HAVE_KVM_EVENTFD diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 15af091611e2..7dbea4535d9b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -46,7 +46,6 @@ config X86 # select ACPI_LEGACY_TABLES_LOOKUP if ACPI select ACPI_SYSTEM_POWER_STATES_SUPPORT if ACPI - select ANON_INODES select ARCH_CLOCKSOURCE_DATA select ARCH_CLOCKSOURCE_INIT select ARCH_DISCARD_MEMBLOCK diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig index 72fa955f4a15..fc042419e670 100644 --- a/arch/x86/kvm/Kconfig +++ b/arch/x86/kvm/Kconfig @@ -27,7 +27,6 @@ config KVM depends on X86_LOCAL_APIC select PREEMPT_NOTIFIERS select MMU_NOTIFIER - select ANON_INODES select HAVE_KVM_IRQCHIP select HAVE_KVM_IRQFD select IRQ_BYPASS_MANAGER diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 3e63a900b330..ae213ed2a7c8 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -174,7 +174,6 @@ source "drivers/base/regmap/Kconfig" config DMA_SHARED_BUFFER bool default n - select ANON_INODES select IRQ_WORK help This option enables the framework for buffer-sharing between diff --git a/drivers/char/tpm/Kconfig b/drivers/char/tpm/Kconfig index 536e55d3919f..f3e4bc490cf0 100644 --- a/drivers/char/tpm/Kconfig +++ b/drivers/char/tpm/Kconfig @@ -157,7 +157,6 @@ config TCG_CRB config TCG_VTPM_PROXY tristate "VTPM Proxy Interface" depends on TCG_TPM - select ANON_INODES ---help--- This driver proxies for an emulated TPM (vTPM) running in userspace. A device /dev/vtpmx is provided that creates a device pair diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig index 2e5a0faa2cb1..3fc9c2efc583 100644 --- a/drivers/dma-buf/Kconfig +++ b/drivers/dma-buf/Kconfig @@ -3,7 +3,6 @@ menu "DMABUF options" config SYNC_FILE bool "Explicit Synchronization Framework" default n - select ANON_INODES select DMA_SHARED_BUFFER ---help--- The Sync File Framework adds explicit syncronization via diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index b5a2845347ec..07b9b838e929 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -12,7 +12,6 @@ config ARCH_HAVE_CUSTOM_GPIO_H menuconfig GPIOLIB bool "GPIO Support" - select ANON_INODES help This enables GPIO support through the generic GPIO library. You only need to enable this, if you also want to enable diff --git a/drivers/iio/Kconfig b/drivers/iio/Kconfig index d08aeb41cd07..1dec0fecb6ef 100644 --- a/drivers/iio/Kconfig +++ b/drivers/iio/Kconfig @@ -4,7 +4,6 @@ menuconfig IIO tristate "Industrial I/O support" - select ANON_INODES help The industrial I/O subsystem provides a unified framework for drivers for many different types of embedded sensors using a diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig index 0a3ec7c726ec..2793ea38649b 100644 --- a/drivers/infiniband/Kconfig +++ b/drivers/infiniband/Kconfig @@ -25,7 +25,6 @@ config INFINIBAND_USER_MAD config INFINIBAND_USER_ACCESS tristate "InfiniBand userspace access (verbs and CM)" - select ANON_INODES depends on MMU ---help--- Userspace InfiniBand access support. This enables the diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index 9de5ed38da83..3798d77d131c 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -22,7 +22,6 @@ menuconfig VFIO tristate "VFIO Non-Privileged userspace driver framework" depends on IOMMU_API select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64) - select ANON_INODES help VFIO provides a framework for secure userspace device drivers. See Documentation/vfio.txt for more details. diff --git a/fs/Makefile b/fs/Makefile index 9a0b8003f069..ae681523b4b1 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -25,7 +25,7 @@ obj-$(CONFIG_PROC_FS) += proc_namespace.o obj-y += notify/ obj-$(CONFIG_EPOLL) += eventpoll.o -obj-$(CONFIG_ANON_INODES) += anon_inodes.o +obj-y += anon_inodes.o obj-$(CONFIG_SIGNALFD) += signalfd.o obj-$(CONFIG_TIMERFD) += timerfd.o obj-$(CONFIG_EVENTFD) += eventfd.o diff --git a/fs/notify/fanotify/Kconfig b/fs/notify/fanotify/Kconfig index 41355ce74ac0..f5b0b3af32dd 100644 --- a/fs/notify/fanotify/Kconfig +++ b/fs/notify/fanotify/Kconfig @@ -1,7 +1,6 @@ config FANOTIFY bool "Filesystem wide access notification" select FSNOTIFY - select ANON_INODES default n ---help--- Say Y here to enable fanotify support. fanotify is a file access diff --git a/fs/notify/inotify/Kconfig b/fs/notify/inotify/Kconfig index b981fc0c8379..0161c74e76e2 100644 --- a/fs/notify/inotify/Kconfig +++ b/fs/notify/inotify/Kconfig @@ -1,6 +1,5 @@ config INOTIFY_USER bool "Inotify support for userspace" - select ANON_INODES select FSNOTIFY default y ---help--- diff --git a/init/Kconfig b/init/Kconfig index d47cb77a220e..fea23d3f44dd 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1141,9 +1141,6 @@ config LD_DEAD_CODE_DATA_ELIMINATION config SYSCTL bool -config ANON_INODES - bool - config HAVE_UID16 bool @@ -1348,14 +1345,12 @@ config HAVE_FUTEX_CMPXCHG config EPOLL bool "Enable eventpoll support" if EXPERT default y - select ANON_INODES help Disabling this option will cause the kernel to be built without support for epoll family of system calls. config SIGNALFD bool "Enable signalfd() system call" if EXPERT - select ANON_INODES default y help Enable the signalfd() system call that allows to receive signals @@ -1365,7 +1360,6 @@ config SIGNALFD config TIMERFD bool "Enable timerfd() system call" if EXPERT - select ANON_INODES default y help Enable the timerfd() system call that allows to receive timer @@ -1375,7 +1369,6 @@ config TIMERFD config EVENTFD bool "Enable eventfd() system call" if EXPERT - select ANON_INODES default y help Enable the eventfd() system call that allows to receive both @@ -1477,7 +1470,6 @@ config KALLSYMS_BASE_RELATIVE # syscall, maps, verifier config BPF_SYSCALL bool "Enable bpf() system call" - select ANON_INODES select BPF select IRQ_WORK default n @@ -1494,7 +1486,6 @@ config BPF_JIT_ALWAYS_ON config USERFAULTFD bool "Enable userfaultfd() system call" - select ANON_INODES depends on MMU help Enable the userfaultfd() system call that allows to intercept and @@ -1561,7 +1552,6 @@ config PERF_EVENTS bool "Kernel performance events and counters" default y if PROFILING depends on HAVE_PERF_EVENTS - select ANON_INODES select IRQ_WORK select SRCU help From patchwork Tue Feb 19 17:09:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820443 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F2D4F14E1 for ; Tue, 19 Feb 2019 17:09:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0BFF72A187 for ; Tue, 19 Feb 2019 17:09:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F2E7C2C9A2; Tue, 19 Feb 2019 17:09:10 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 376CA2C9A2 for ; Tue, 19 Feb 2019 17:09:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727007AbfBSRJH (ORCPT ); Tue, 19 Feb 2019 12:09:07 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38164 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728201AbfBSRJG (ORCPT ); Tue, 19 Feb 2019 12:09:06 -0500 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2AA9FC0467E8; Tue, 19 Feb 2019 17:09:05 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id B523F19C56; Tue, 19 Feb 2019 17:09:03 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 05/10] vfs: syscall: Add fsopen() to prepare for superblock creation From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:09:03 +0000 Message-ID: <155059614302.17079.1281132139815655428.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Tue, 19 Feb 2019 17:09:05 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Provide an fsopen() system call that starts the process of preparing to create a superblock that will then be mountable, using an fd as a context handle. fsopen() is given the name of the filesystem that will be used: int mfd = fsopen(const char *fsname, unsigned int flags); where flags can be 0 or FSOPEN_CLOEXEC. For example: sfd = fsopen("ext4", FSOPEN_CLOEXEC); fsconfig(sfd, FSCONFIG_SET_PATH, "source", "/dev/sda1", AT_FDCWD); fsconfig(sfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0); fsconfig(sfd, FSCONFIG_SET_FLAG, "acl", NULL, 0); fsconfig(sfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0); fsconfig(sfd, FSCONFIG_SET_STRING, "sb", "1", 0); fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); fsinfo(sfd, NULL, ...); // query new superblock attributes mfd = fsmount(sfd, FSMOUNT_CLOEXEC, MS_RELATIME); move_mount(mfd, "", sfd, AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); sfd = fsopen("afs", -1); fsconfig(fd, FSCONFIG_SET_STRING, "source", "#grand.central.org:root.cell", 0); fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); mfd = fsmount(sfd, 0, MS_NODEV); move_mount(mfd, "", sfd, AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); If an error is reported at any step, an error message may be available to be read() back (ENODATA will be reported if there isn't an error available) in the form: "e :" "e SELinux:Mount on mountpoint not permitted" Once fsmount() has been called, further fsconfig() calls will incur EBUSY, even if the fsmount() fails. read() is still possible to retrieve error information. The fsopen() syscall creates a mount context and hangs it of the fd that it returns. Netlink is not used because it is optional and would make the core VFS dependent on the networking layer and also potentially add network namespace issues. Note that, for the moment, the caller must have SYS_CAP_ADMIN to use fsopen(). Signed-off-by: David Howells cc: linux-api@vger.kernel.org Signed-off-by: Al Viro --- arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 fs/Makefile | 2 - fs/fs_context.c | 4 + fs/fsopen.c | 88 ++++++++++++++++++++++++++++++++ include/linux/fs_context.h | 16 ++++++ include/linux/syscalls.h | 1 include/uapi/linux/mount.h | 5 ++ 8 files changed, 117 insertions(+), 1 deletion(-) create mode 100644 fs/fsopen.c diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 76d092b7d1b0..1647fefd2969 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -400,3 +400,4 @@ 386 i386 rseq sys_rseq __ia32_sys_rseq 387 i386 open_tree sys_open_tree __ia32_sys_open_tree 388 i386 move_mount sys_move_mount __ia32_sys_move_mount +389 i386 fsopen sys_fsopen __ia32_sys_fsopen diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 37ba4e65eee6..235d33dbccb2 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -345,6 +345,7 @@ 334 common rseq __x64_sys_rseq 335 common open_tree __x64_sys_open_tree 336 common move_mount __x64_sys_move_mount +337 common fsopen __x64_sys_fsopen # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/Makefile b/fs/Makefile index ae681523b4b1..e3ea8093b178 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -13,7 +13,7 @@ obj-y := open.o read_write.o file_table.o super.o \ seq_file.o xattr.o libfs.o fs-writeback.o \ pnode.o splice.o sync.o utimes.o d_path.o \ stack.o fs_struct.o statfs.o fs_pin.o nsfs.o \ - fs_context.o fs_parser.o + fs_context.o fs_parser.o fsopen.o ifeq ($(CONFIG_BLOCK),y) obj-y += buffer.o block_dev.o direct-io.o mpage.o diff --git a/fs/fs_context.c b/fs/fs_context.c index 87e3546b9a52..eb806fae3117 100644 --- a/fs/fs_context.c +++ b/fs/fs_context.c @@ -271,6 +271,8 @@ static struct fs_context *alloc_fs_context(struct file_system_type *fs_type, fc->cred = get_current_cred(); fc->net_ns = get_net(current->nsproxy->net_ns); + mutex_init(&fc->uapi_mutex); + switch (purpose) { case FS_CONTEXT_FOR_MOUNT: fc->user_ns = get_user_ns(fc->cred->user_ns); @@ -353,6 +355,8 @@ struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc) if (!fc) return ERR_PTR(-ENOMEM); + mutex_init(&fc->uapi_mutex); + fc->fs_private = NULL; fc->s_fs_info = NULL; fc->source = NULL; diff --git a/fs/fsopen.c b/fs/fsopen.c new file mode 100644 index 000000000000..d256f1ac9ff1 --- /dev/null +++ b/fs/fsopen.c @@ -0,0 +1,88 @@ +/* Filesystem access-by-fd. + * + * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "mount.h" + +static int fscontext_release(struct inode *inode, struct file *file) +{ + struct fs_context *fc = file->private_data; + + if (fc) { + file->private_data = NULL; + put_fs_context(fc); + } + return 0; +} + +const struct file_operations fscontext_fops = { + .release = fscontext_release, + .llseek = no_llseek, +}; + +/* + * Attach a filesystem context to a file and an fd. + */ +static int fscontext_create_fd(struct fs_context *fc, unsigned int o_flags) +{ + int fd; + + fd = anon_inode_getfd("fscontext", &fscontext_fops, fc, + O_RDWR | o_flags); + if (fd < 0) + put_fs_context(fc); + return fd; +} + +/* + * Open a filesystem by name so that it can be configured for mounting. + * + * We are allowed to specify a container in which the filesystem will be + * opened, thereby indicating which namespaces will be used (notably, which + * network namespace will be used for network filesystems). + */ +SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags) +{ + struct file_system_type *fs_type; + struct fs_context *fc; + const char *fs_name; + + if (!ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN)) + return -EPERM; + + if (flags & ~FSOPEN_CLOEXEC) + return -EINVAL; + + fs_name = strndup_user(_fs_name, PAGE_SIZE); + if (IS_ERR(fs_name)) + return PTR_ERR(fs_name); + + fs_type = get_fs_type(fs_name); + kfree(fs_name); + if (!fs_type) + return -ENODEV; + + fc = fs_context_for_mount(fs_type, 0); + put_filesystem(fs_type); + if (IS_ERR(fc)) + return PTR_ERR(fc); + + fc->phase = FS_CONTEXT_CREATE_PARAMS; + return fscontext_create_fd(fc, flags & FSOPEN_CLOEXEC ? O_CLOEXEC : 0); +} diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h index eaca452088fa..7ab8b44fab3e 100644 --- a/include/linux/fs_context.h +++ b/include/linux/fs_context.h @@ -15,6 +15,7 @@ #include #include #include +#include struct cred; struct dentry; @@ -34,6 +35,19 @@ enum fs_context_purpose { FS_CONTEXT_FOR_RECONFIGURE, /* Superblock reconfiguration (remount) */ }; +/* + * Userspace usage phase for fsopen/fspick. + */ +enum fs_context_phase { + FS_CONTEXT_CREATE_PARAMS, /* Loading params for sb creation */ + FS_CONTEXT_CREATING, /* A superblock is being created */ + FS_CONTEXT_AWAITING_MOUNT, /* Superblock created, awaiting fsmount() */ + FS_CONTEXT_AWAITING_RECONF, /* Awaiting initialisation for reconfiguration */ + FS_CONTEXT_RECONF_PARAMS, /* Loading params for reconfiguration */ + FS_CONTEXT_RECONFIGURING, /* Reconfiguring the superblock */ + FS_CONTEXT_FAILED, /* Failed to correctly transition a context */ +}; + /* * Type of parameter value. */ @@ -74,6 +88,7 @@ struct fs_parameter { */ struct fs_context { const struct fs_context_operations *ops; + struct mutex uapi_mutex; /* Userspace access mutex */ struct file_system_type *fs_type; void *fs_private; /* The filesystem's context */ struct dentry *root; /* The root and superblock */ @@ -88,6 +103,7 @@ struct fs_context { unsigned int sb_flags_mask; /* Superblock flags that were changed */ unsigned int lsm_flags; /* Information flags from the fs to the LSM */ enum fs_context_purpose purpose:8; + enum fs_context_phase phase:8; /* The phase the context is in */ bool need_free:1; /* Need to call ops->free() */ bool global:1; /* Goes into &init_user_ns */ }; diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 13289cdf6b52..22ed8a11ef55 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -930,6 +930,7 @@ asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned flags); asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path, int to_dfd, const char __user *to_path, unsigned int ms_flags); +asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index 3634e065836c..7570df43d08f 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -72,4 +72,9 @@ #define MOVE_MOUNT_T_EMPTY_PATH 0x00000040 /* Empty to path permitted */ #define MOVE_MOUNT__MASK 0x00000077 +/* + * fsopen() flags. + */ +#define FSOPEN_CLOEXEC 0x00000001 + #endif /* _UAPI_LINUX_MOUNT_H */ From patchwork Tue Feb 19 17:09:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820445 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 61A0214E1 for ; Tue, 19 Feb 2019 17:09:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E33362A187 for ; Tue, 19 Feb 2019 17:09:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D526C2CA78; Tue, 19 Feb 2019 17:09:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F0B692A187 for ; Tue, 19 Feb 2019 17:09:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729249AbfBSRJN (ORCPT ); Tue, 19 Feb 2019 12:09:13 -0500 Received: from mx1.redhat.com ([209.132.183.28]:33786 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728201AbfBSRJN (ORCPT ); Tue, 19 Feb 2019 12:09:13 -0500 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 748B220B09; Tue, 19 Feb 2019 17:09:12 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1E87F19C58; Tue, 19 Feb 2019 17:09:10 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 06/10] vfs: Implement logging through fs_context From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:09:10 +0000 Message-ID: <155059615036.17079.7832765121171328056.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Tue, 19 Feb 2019 17:09:12 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Implement the ability for filesystems to log error, warning and informational messages through the fs_context. These can be extracted by userspace by reading from an fd created by fsopen(). Error messages are prefixed with "e ", warnings with "w " and informational messages with "i ". Inside the kernel, formatted messages are malloc'd but unformatted messages are not copied if they're either in the core .rodata section or in the .rodata section of the filesystem module pinned by fs_context::fs_type. The messages are only good till the fs_type is released. Note that the logging object is shared between duplicated fs_context structures. This is so that such as NFS which do a mount within a mount can get at least some of the errors from the inner mount. Five logging functions are provided for this: (1) void logfc(struct fs_context *fc, const char *fmt, ...); This logs a message into the context. If the buffer is full, the earliest message is discarded. (2) void errorf(fc, fmt, ...); This wraps logfc() to log an error. (3) void invalf(fc, fmt, ...); This wraps errorf() and returns -EINVAL for convenience. (4) void warnf(fc, fmt, ...); This wraps logfc() to log a warning. (5) void infof(fc, fmt, ...); This wraps logfc() to log an informational message. Signed-off-by: David Howells Signed-off-by: Al Viro --- fs/fs_context.c | 105 ++++++++++++++++++++++++++++++++++++++------ fs/fsopen.c | 67 ++++++++++++++++++++++++++++ include/linux/fs_context.h | 22 ++++++--- include/linux/module.h | 6 +++ 4 files changed, 179 insertions(+), 21 deletions(-) diff --git a/fs/fs_context.c b/fs/fs_context.c index eb806fae3117..dcf3786f90f9 100644 --- a/fs/fs_context.c +++ b/fs/fs_context.c @@ -11,6 +11,7 @@ */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include #include #include #include @@ -23,6 +24,7 @@ #include #include #include +#include #include "mount.h" #include "internal.h" @@ -365,6 +367,8 @@ struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc) get_net(fc->net_ns); get_user_ns(fc->user_ns); get_cred(fc->cred); + if (fc->log) + refcount_inc(&fc->log->usage); /* Can't call put until we've called ->dup */ ret = fc->ops->dup(fc, src_fc); @@ -382,7 +386,6 @@ struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc) } EXPORT_SYMBOL(vfs_dup_fs_context); -#ifdef CONFIG_PRINTK /** * logfc - Log a message to a filesystem context * @fc: The filesystem context to log to. @@ -390,27 +393,100 @@ EXPORT_SYMBOL(vfs_dup_fs_context); */ void logfc(struct fs_context *fc, const char *fmt, ...) { + static const char store_failure[] = "OOM: Can't store error string"; + struct fc_log *log = fc ? fc->log : NULL; + const char *p; va_list va; + char *q; + u8 freeable; va_start(va, fmt); - - switch (fmt[0]) { - case 'w': - vprintk_emit(0, LOGLEVEL_WARNING, NULL, 0, fmt, va); - break; - case 'e': - vprintk_emit(0, LOGLEVEL_ERR, NULL, 0, fmt, va); - break; - default: - vprintk_emit(0, LOGLEVEL_NOTICE, NULL, 0, fmt, va); - break; + if (!strchr(fmt, '%')) { + p = fmt; + goto unformatted_string; } + if (strcmp(fmt, "%s") == 0) { + p = va_arg(va, const char *); + goto unformatted_string; + } + + q = kvasprintf(GFP_KERNEL, fmt, va); +copied_string: + if (!q) + goto store_failure; + freeable = 1; + goto store_string; + +unformatted_string: + if ((unsigned long)p >= (unsigned long)__start_rodata && + (unsigned long)p < (unsigned long)__end_rodata) + goto const_string; + if (log && within_module_core((unsigned long)p, log->owner)) + goto const_string; + q = kstrdup(p, GFP_KERNEL); + goto copied_string; + +store_failure: + p = store_failure; +const_string: + q = (char *)p; + freeable = 0; +store_string: + if (!log) { + switch (fmt[0]) { + case 'w': + printk(KERN_WARNING "%s\n", q + 2); + break; + case 'e': + printk(KERN_ERR "%s\n", q + 2); + break; + default: + printk(KERN_NOTICE "%s\n", q + 2); + break; + } + if (freeable) + kfree(q); + } else { + unsigned int logsize = ARRAY_SIZE(log->buffer); + u8 index; + + index = log->head & (logsize - 1); + BUILD_BUG_ON(sizeof(log->head) != sizeof(u8) || + sizeof(log->tail) != sizeof(u8)); + if ((u8)(log->head - log->tail) == logsize) { + /* The buffer is full, discard the oldest message */ + if (log->need_free & (1 << index)) + kfree(log->buffer[index]); + log->tail++; + } - pr_cont("\n"); + log->buffer[index] = q; + log->need_free &= ~(1 << index); + log->need_free |= freeable << index; + log->head++; + } va_end(va); } EXPORT_SYMBOL(logfc); -#endif + +/* + * Free a logging structure. + */ +static void put_fc_log(struct fs_context *fc) +{ + struct fc_log *log = fc->log; + int i; + + if (log) { + if (refcount_dec_and_test(&log->usage)) { + fc->log = NULL; + for (i = 0; i <= 7; i++) + if (log->need_free & (1 << i)) + kfree(log->buffer[i]); + kfree(log); + } + } +} /** * put_fs_context - Dispose of a superblock configuration context. @@ -435,6 +511,7 @@ void put_fs_context(struct fs_context *fc) put_user_ns(fc->user_ns); put_cred(fc->cred); kfree(fc->subtype); + put_fc_log(fc); put_filesystem(fc->fs_type); kfree(fc->source); kfree(fc); diff --git a/fs/fsopen.c b/fs/fsopen.c index d256f1ac9ff1..5fce6347de7a 100644 --- a/fs/fsopen.c +++ b/fs/fsopen.c @@ -20,6 +20,52 @@ #include #include "mount.h" +/* + * Allow the user to read back any error, warning or informational messages. + */ +static ssize_t fscontext_read(struct file *file, + char __user *_buf, size_t len, loff_t *pos) +{ + struct fs_context *fc = file->private_data; + struct fc_log *log = fc->log; + unsigned int logsize = ARRAY_SIZE(log->buffer); + ssize_t ret; + char *p; + bool need_free; + int index, n; + + ret = mutex_lock_interruptible(&fc->uapi_mutex); + if (ret < 0) + return ret; + + if (log->head == log->tail) { + mutex_unlock(&fc->uapi_mutex); + return -ENODATA; + } + + index = log->tail & (logsize - 1); + p = log->buffer[index]; + need_free = log->need_free & (1 << index); + log->buffer[index] = NULL; + log->need_free &= ~(1 << index); + log->tail++; + mutex_unlock(&fc->uapi_mutex); + + ret = -EMSGSIZE; + n = strlen(p); + if (n > len) + goto err_free; + ret = -EFAULT; + if (copy_to_user(_buf, p, n) != 0) + goto err_free; + ret = n; + +err_free: + if (need_free) + kfree(p); + return ret; +} + static int fscontext_release(struct inode *inode, struct file *file) { struct fs_context *fc = file->private_data; @@ -32,6 +78,7 @@ static int fscontext_release(struct inode *inode, struct file *file) } const struct file_operations fscontext_fops = { + .read = fscontext_read, .release = fscontext_release, .llseek = no_llseek, }; @@ -50,6 +97,16 @@ static int fscontext_create_fd(struct fs_context *fc, unsigned int o_flags) return fd; } +static int fscontext_alloc_log(struct fs_context *fc) +{ + fc->log = kzalloc(sizeof(*fc->log), GFP_KERNEL); + if (!fc->log) + return -ENOMEM; + refcount_set(&fc->log->usage, 1); + fc->log->owner = fc->fs_type->owner; + return 0; +} + /* * Open a filesystem by name so that it can be configured for mounting. * @@ -62,6 +119,7 @@ SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags) struct file_system_type *fs_type; struct fs_context *fc; const char *fs_name; + int ret; if (!ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN)) return -EPERM; @@ -84,5 +142,14 @@ SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags) return PTR_ERR(fc); fc->phase = FS_CONTEXT_CREATE_PARAMS; + + ret = fscontext_alloc_log(fc); + if (ret < 0) + goto err_fc; + return fscontext_create_fd(fc, flags & FSOPEN_CLOEXEC ? O_CLOEXEC : 0); + +err_fc: + put_fs_context(fc); + return ret; } diff --git a/include/linux/fs_context.h b/include/linux/fs_context.h index 7ab8b44fab3e..1f966670c8dc 100644 --- a/include/linux/fs_context.h +++ b/include/linux/fs_context.h @@ -13,6 +13,7 @@ #define _LINUX_FS_CONTEXT_H #include +#include #include #include #include @@ -95,6 +96,7 @@ struct fs_context { struct user_namespace *user_ns; /* The user namespace for this mount */ struct net *net_ns; /* The network namespace for this mount */ const struct cred *cred; /* The mounter's credentials */ + struct fc_log *log; /* Logging buffer */ const char *source; /* The source name (eg. dev path) */ const char *subtype; /* The subtype to set on the superblock */ void *security; /* Linux S&M options */ @@ -151,15 +153,21 @@ extern int vfs_get_super(struct fs_context *fc, extern const struct file_operations fscontext_fops; -#ifdef CONFIG_PRINTK +/* + * Mount error, warning and informational message logging. This structure is + * shareable between a mount and a subordinate mount. + */ +struct fc_log { + refcount_t usage; + u8 head; /* Insertion index in buffer[] */ + u8 tail; /* Removal index in buffer[] */ + u8 need_free; /* Mask of kfree'able items in buffer[] */ + struct module *owner; /* Owner module for strings that don't then need freeing */ + char *buffer[8]; +}; + extern __attribute__((format(printf, 2, 3))) void logfc(struct fs_context *fc, const char *fmt, ...); -#else -static inline __attribute__((format(printf, 2, 3))) -void logfc(struct fs_context *fc, const char *fmt, ...) -{ -} -#endif /** * infof - Store supplementary informational message diff --git a/include/linux/module.h b/include/linux/module.h index 8fa38d3e7538..902cbd0047d4 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -693,6 +693,12 @@ static inline bool is_module_text_address(unsigned long addr) return false; } +static inline bool within_module_core(unsigned long addr, + const struct module *mod) +{ + return false; +} + /* Get/put a kernel symbol (calls should be symmetric) */ #define symbol_get(x) ({ extern typeof(x) x __attribute__((weak)); &(x); }) #define symbol_put(x) do { } while (0) From patchwork Tue Feb 19 17:09:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820449 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3342E6C2 for ; Tue, 19 Feb 2019 17:09:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 979222CE8C for ; Tue, 19 Feb 2019 17:09:23 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8BC682CE96; Tue, 19 Feb 2019 17:09:23 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 67FD32CE8C for ; Tue, 19 Feb 2019 17:09:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726981AbfBSRJW (ORCPT ); Tue, 19 Feb 2019 12:09:22 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38456 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728201AbfBSRJV (ORCPT ); Tue, 19 Feb 2019 12:09:21 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 4A449C0ABF42; Tue, 19 Feb 2019 17:09:20 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7C54C611CD; Tue, 19 Feb 2019 17:09:18 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 07/10] vfs: syscall: Add fsconfig() for configuring and managing a context From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:09:17 +0000 Message-ID: <155059615769.17079.1506747783214959507.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Tue, 19 Feb 2019 17:09:20 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Add a syscall for configuring a filesystem creation context and triggering actions upon it, to be used in conjunction with fsopen, fspick and fsmount. long fsconfig(int fs_fd, unsigned int cmd, const char *key, const void *value, int aux); Where fs_fd indicates the context, cmd indicates the action to take, key indicates the parameter name for parameter-setting actions and, if needed, value points to a buffer containing the value and aux can give more information for the value. The following command IDs are proposed: (*) FSCONFIG_SET_FLAG: No value is specified. The parameter must be boolean in nature. The key may be prefixed with "no" to invert the setting. value must be NULL and aux must be 0. (*) FSCONFIG_SET_STRING: A string value is specified. The parameter can be expecting boolean, integer, string or take a path. A conversion to an appropriate type will be attempted (which may include looking up as a path). value points to a NUL-terminated string and aux must be 0. (*) FSCONFIG_SET_BINARY: A binary blob is specified. value points to the blob and aux indicates its size. The parameter must be expecting a blob. (*) FSCONFIG_SET_PATH: A non-empty path is specified. The parameter must be expecting a path object. value points to a NUL-terminated string that is the path and aux is a file descriptor at which to start a relative lookup or AT_FDCWD. (*) FSCONFIG_SET_PATH_EMPTY: As fsconfig_set_path, but with AT_EMPTY_PATH implied. (*) FSCONFIG_SET_FD: An open file descriptor is specified. value must be NULL and aux indicates the file descriptor. (*) FSCONFIG_CMD_CREATE: Trigger superblock creation. (*) FSCONFIG_CMD_RECONFIGURE: Trigger superblock reconfiguration. For the "set" command IDs, the idea is that the file_system_type will point to a list of parameters and the types of value that those parameters expect to take. The core code can then do the parse and argument conversion and then give the LSM and FS a cooked option or array of options to use. Source specification is also done the same way same way, using special keys "source", "source1", "source2", etc.. [!] Note that, for the moment, the key and value are just glued back together and handed to the filesystem. Every filesystem that uses options uses match_token() and co. to do this, and this will need to be changed - but not all at once. Example usage: fd = fsopen("ext4", FSOPEN_CLOEXEC); fsconfig(fd, fsconfig_set_path, "source", "/dev/sda1", AT_FDCWD); fsconfig(fd, fsconfig_set_path_empty, "journal_path", "", journal_fd); fsconfig(fd, fsconfig_set_fd, "journal_fd", "", journal_fd); fsconfig(fd, fsconfig_set_flag, "user_xattr", NULL, 0); fsconfig(fd, fsconfig_set_flag, "noacl", NULL, 0); fsconfig(fd, fsconfig_set_string, "sb", "1", 0); fsconfig(fd, fsconfig_set_string, "errors", "continue", 0); fsconfig(fd, fsconfig_set_string, "data", "journal", 0); fsconfig(fd, fsconfig_set_string, "context", "unconfined_u:...", 0); fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0); mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC); or: fd = fsopen("ext4", FSOPEN_CLOEXEC); fsconfig(fd, fsconfig_set_string, "source", "/dev/sda1", 0); fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0); mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC); or: fd = fsopen("afs", FSOPEN_CLOEXEC); fsconfig(fd, fsconfig_set_string, "source", "#grand.central.org:root.cell", 0); fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0); mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC); or: fd = fsopen("jffs2", FSOPEN_CLOEXEC); fsconfig(fd, fsconfig_set_string, "source", "mtd0", 0); fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0); mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC); Signed-off-by: David Howells cc: linux-api@vger.kernel.org Signed-off-by: Al Viro --- arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 fs/fs_context.c | 51 ++++++ fs/fsopen.c | 265 ++++++++++++++++++++++++++++++++ fs/internal.h | 3 include/linux/syscalls.h | 2 include/uapi/linux/mount.h | 14 ++ 7 files changed, 337 insertions(+) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 1647fefd2969..f9970310c126 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -401,3 +401,4 @@ 387 i386 open_tree sys_open_tree __ia32_sys_open_tree 388 i386 move_mount sys_move_mount __ia32_sys_move_mount 389 i386 fsopen sys_fsopen __ia32_sys_fsopen +390 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 235d33dbccb2..4185d36e03bb 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -346,6 +346,7 @@ 335 common open_tree __x64_sys_open_tree 336 common move_mount __x64_sys_move_mount 337 common fsopen __x64_sys_fsopen +338 common fsconfig __x64_sys_fsconfig # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/fs_context.c b/fs/fs_context.c index dcf3786f90f9..a47ccd5a4a78 100644 --- a/fs/fs_context.c +++ b/fs/fs_context.c @@ -721,3 +721,54 @@ int parse_monolithic_mount_data(struct fs_context *fc, void *data) return monolithic_mount_data(fc, data); } + +/* + * Clean up a context after performing an action on it and put it into a state + * from where it can be used to reconfigure a superblock. + * + * Note that here we do only the parts that can't fail; the rest is in + * finish_clean_context() below and in between those fs_context is marked + * FS_CONTEXT_AWAITING_RECONF. The reason for splitup is that after + * successful mount or remount we need to report success to userland. + * Trying to do full reinit (for the sake of possible subsequent remount) + * and failing to allocate memory would've put us into a nasty situation. + * So here we only discard the old state and reinitialization is left + * until we actually try to reconfigure. + */ +void vfs_clean_context(struct fs_context *fc) +{ + if (fc->need_free && fc->ops && fc->ops->free) + fc->ops->free(fc); + fc->need_free = false; + fc->fs_private = NULL; + fc->s_fs_info = NULL; + fc->sb_flags = 0; + security_free_mnt_opts(&fc->security); + kfree(fc->subtype); + fc->subtype = NULL; + kfree(fc->source); + fc->source = NULL; + + fc->purpose = FS_CONTEXT_FOR_RECONFIGURE; + fc->phase = FS_CONTEXT_AWAITING_RECONF; +} + +int finish_clean_context(struct fs_context *fc) +{ + int error; + + if (fc->phase != FS_CONTEXT_AWAITING_RECONF) + return 0; + + if (fc->fs_type->init_fs_context) + error = fc->fs_type->init_fs_context(fc); + else + error = legacy_init_fs_context(fc); + if (unlikely(error)) { + fc->phase = FS_CONTEXT_FAILED; + return error; + } + fc->need_free = true; + fc->phase = FS_CONTEXT_RECONF_PARAMS; + return 0; +} diff --git a/fs/fsopen.c b/fs/fsopen.c index 5fce6347de7a..65cc2f68f994 100644 --- a/fs/fsopen.c +++ b/fs/fsopen.c @@ -10,6 +10,7 @@ */ #include +#include #include #include #include @@ -18,6 +19,7 @@ #include #include #include +#include "internal.h" #include "mount.h" /* @@ -153,3 +155,266 @@ SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags) put_fs_context(fc); return ret; } + +/* + * Check the state and apply the configuration. Note that this function is + * allowed to 'steal' the value by setting param->xxx to NULL before returning. + */ +static int vfs_fsconfig_locked(struct fs_context *fc, int cmd, + struct fs_parameter *param) +{ + struct super_block *sb; + int ret; + + ret = finish_clean_context(fc); + if (ret) + return ret; + switch (cmd) { + case FSCONFIG_CMD_CREATE: + if (fc->phase != FS_CONTEXT_CREATE_PARAMS) + return -EBUSY; + fc->phase = FS_CONTEXT_CREATING; + ret = vfs_get_tree(fc); + if (ret) + break; + sb = fc->root->d_sb; + ret = security_sb_kern_mount(sb); + if (unlikely(ret)) { + fc_drop_locked(fc); + break; + } + up_write(&sb->s_umount); + fc->phase = FS_CONTEXT_AWAITING_MOUNT; + return 0; + case FSCONFIG_CMD_RECONFIGURE: + if (fc->phase != FS_CONTEXT_RECONF_PARAMS) + return -EBUSY; + fc->phase = FS_CONTEXT_RECONFIGURING; + sb = fc->root->d_sb; + if (!ns_capable(sb->s_user_ns, CAP_SYS_ADMIN)) { + ret = -EPERM; + break; + } + down_write(&sb->s_umount); + ret = reconfigure_super(fc); + up_write(&sb->s_umount); + if (ret) + break; + vfs_clean_context(fc); + return 0; + default: + if (fc->phase != FS_CONTEXT_CREATE_PARAMS && + fc->phase != FS_CONTEXT_RECONF_PARAMS) + return -EBUSY; + + return vfs_parse_fs_param(fc, param); + } + fc->phase = FS_CONTEXT_FAILED; + return ret; +} + +/** + * sys_fsconfig - Set parameters and trigger actions on a context + * @fd: The filesystem context to act upon + * @cmd: The action to take + * @_key: Where appropriate, the parameter key to set + * @_value: Where appropriate, the parameter value to set + * @aux: Additional information for the value + * + * This system call is used to set parameters on a context, including + * superblock settings, data source and security labelling. + * + * Actions include triggering the creation of a superblock and the + * reconfiguration of the superblock attached to the specified context. + * + * When setting a parameter, @cmd indicates the type of value being proposed + * and @_key indicates the parameter to be altered. + * + * @_value and @aux are used to specify the value, should a value be required: + * + * (*) fsconfig_set_flag: No value is specified. The parameter must be boolean + * in nature. The key may be prefixed with "no" to invert the + * setting. @_value must be NULL and @aux must be 0. + * + * (*) fsconfig_set_string: A string value is specified. The parameter can be + * expecting boolean, integer, string or take a path. A conversion to an + * appropriate type will be attempted (which may include looking up as a + * path). @_value points to a NUL-terminated string and @aux must be 0. + * + * (*) fsconfig_set_binary: A binary blob is specified. @_value points to the + * blob and @aux indicates its size. The parameter must be expecting a + * blob. + * + * (*) fsconfig_set_path: A non-empty path is specified. The parameter must be + * expecting a path object. @_value points to a NUL-terminated string that + * is the path and @aux is a file descriptor at which to start a relative + * lookup or AT_FDCWD. + * + * (*) fsconfig_set_path_empty: As fsconfig_set_path, but with AT_EMPTY_PATH + * implied. + * + * (*) fsconfig_set_fd: An open file descriptor is specified. @_value must be + * NULL and @aux indicates the file descriptor. + */ +SYSCALL_DEFINE5(fsconfig, + int, fd, + unsigned int, cmd, + const char __user *, _key, + const void __user *, _value, + int, aux) +{ + struct fs_context *fc; + struct fd f; + int ret; + + struct fs_parameter param = { + .type = fs_value_is_undefined, + }; + + if (fd < 0) + return -EINVAL; + + switch (cmd) { + case FSCONFIG_SET_FLAG: + if (!_key || _value || aux) + return -EINVAL; + break; + case FSCONFIG_SET_STRING: + if (!_key || !_value || aux) + return -EINVAL; + break; + case FSCONFIG_SET_BINARY: + if (!_key || !_value || aux <= 0 || aux > 1024 * 1024) + return -EINVAL; + break; + case FSCONFIG_SET_PATH: + case FSCONFIG_SET_PATH_EMPTY: + if (!_key || !_value || (aux != AT_FDCWD && aux < 0)) + return -EINVAL; + break; + case FSCONFIG_SET_FD: + if (!_key || _value || aux < 0) + return -EINVAL; + break; + case FSCONFIG_CMD_CREATE: + case FSCONFIG_CMD_RECONFIGURE: + if (_key || _value || aux) + return -EINVAL; + break; + default: + return -EOPNOTSUPP; + } + + f = fdget(fd); + if (!f.file) + return -EBADF; + ret = -EINVAL; + if (f.file->f_op != &fscontext_fops) + goto out_f; + + fc = f.file->private_data; + if (fc->ops == &legacy_fs_context_ops) { + switch (cmd) { + case FSCONFIG_SET_BINARY: + case FSCONFIG_SET_PATH: + case FSCONFIG_SET_PATH_EMPTY: + case FSCONFIG_SET_FD: + ret = -EOPNOTSUPP; + goto out_f; + } + } + + if (_key) { + param.key = strndup_user(_key, 256); + if (IS_ERR(param.key)) { + ret = PTR_ERR(param.key); + goto out_f; + } + } + + switch (cmd) { + case FSCONFIG_SET_FLAG: + param.type = fs_value_is_flag; + break; + case FSCONFIG_SET_STRING: + param.type = fs_value_is_string; + param.string = strndup_user(_value, 256); + if (IS_ERR(param.string)) { + ret = PTR_ERR(param.string); + goto out_key; + } + param.size = strlen(param.string); + break; + case FSCONFIG_SET_BINARY: + param.type = fs_value_is_blob; + param.size = aux; + param.blob = memdup_user_nul(_value, aux); + if (IS_ERR(param.blob)) { + ret = PTR_ERR(param.blob); + goto out_key; + } + break; + case FSCONFIG_SET_PATH: + param.type = fs_value_is_filename; + param.name = getname_flags(_value, 0, NULL); + if (IS_ERR(param.name)) { + ret = PTR_ERR(param.name); + goto out_key; + } + param.dirfd = aux; + param.size = strlen(param.name->name); + break; + case FSCONFIG_SET_PATH_EMPTY: + param.type = fs_value_is_filename_empty; + param.name = getname_flags(_value, LOOKUP_EMPTY, NULL); + if (IS_ERR(param.name)) { + ret = PTR_ERR(param.name); + goto out_key; + } + param.dirfd = aux; + param.size = strlen(param.name->name); + break; + case FSCONFIG_SET_FD: + param.type = fs_value_is_file; + ret = -EBADF; + param.file = fget(aux); + if (!param.file) + goto out_key; + break; + default: + break; + } + + ret = mutex_lock_interruptible(&fc->uapi_mutex); + if (ret == 0) { + ret = vfs_fsconfig_locked(fc, cmd, ¶m); + mutex_unlock(&fc->uapi_mutex); + } + + /* Clean up the our record of any value that we obtained from + * userspace. Note that the value may have been stolen by the LSM or + * filesystem, in which case the value pointer will have been cleared. + */ + switch (cmd) { + case FSCONFIG_SET_STRING: + case FSCONFIG_SET_BINARY: + kfree(param.string); + break; + case FSCONFIG_SET_PATH: + case FSCONFIG_SET_PATH_EMPTY: + if (param.name) + putname(param.name); + break; + case FSCONFIG_SET_FD: + if (param.file) + fput(param.file); + break; + default: + break; + } +out_key: + kfree(param.key); +out_f: + fdput(f); + return ret; +} diff --git a/fs/internal.h b/fs/internal.h index f3a027c44758..95cf7b0af21f 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -55,8 +55,11 @@ extern void __init chrdev_init(void); /* * fs_context.c */ +extern const struct fs_context_operations legacy_fs_context_ops; extern int parse_monolithic_mount_data(struct fs_context *, void *); extern void fc_drop_locked(struct fs_context *); +extern void vfs_clean_context(struct fs_context *fc); +extern int finish_clean_context(struct fs_context *fc); /* * namei.c diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 22ed8a11ef55..2586faf20078 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -931,6 +931,8 @@ asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path, int to_dfd, const char __user *to_path, unsigned int ms_flags); asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags); +asmlinkage long sys_fsconfig(int fs_fd, unsigned int cmd, const char __user *key, + const void __user *value, int aux); /* * Architecture-specific system calls diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index 7570df43d08f..4b90ba9d1770 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -77,4 +77,18 @@ */ #define FSOPEN_CLOEXEC 0x00000001 +/* + * The type of fsconfig() call made. + */ +enum fsconfig_command { + FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */ + FSCONFIG_SET_STRING = 1, /* Set parameter, supplying a string value */ + FSCONFIG_SET_BINARY = 2, /* Set parameter, supplying a binary blob value */ + FSCONFIG_SET_PATH = 3, /* Set parameter, supplying an object by path */ + FSCONFIG_SET_PATH_EMPTY = 4, /* Set parameter, supplying an object by (empty) path */ + FSCONFIG_SET_FD = 5, /* Set parameter, supplying an object by fd */ + FSCONFIG_CMD_CREATE = 6, /* Invoke superblock creation */ + FSCONFIG_CMD_RECONFIGURE = 7, /* Invoke superblock reconfiguration */ +}; + #endif /* _UAPI_LINUX_MOUNT_H */ From patchwork Tue Feb 19 17:09:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820455 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 231EE1805 for ; Tue, 19 Feb 2019 17:09:31 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A0BF2CFE0 for ; Tue, 19 Feb 2019 17:09:31 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F22F32CFEF; Tue, 19 Feb 2019 17:09:30 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 51F892CFEC for ; Tue, 19 Feb 2019 17:09:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729359AbfBSRJ3 (ORCPT ); Tue, 19 Feb 2019 12:09:29 -0500 Received: from mx1.redhat.com ([209.132.183.28]:37898 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728201AbfBSRJ3 (ORCPT ); Tue, 19 Feb 2019 12:09:29 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 05A3623E6E7; Tue, 19 Feb 2019 17:09:28 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id 43BF7101E5B1; Tue, 19 Feb 2019 17:09:26 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 08/10] vfs: syscall: Add fsmount() to create a mount for a superblock From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:09:25 +0000 Message-ID: <155059616551.17079.6086553789074410155.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Tue, 19 Feb 2019 17:09:28 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Provide a system call by which a filesystem opened with fsopen() and configured by a series of fsconfig() calls can have a detached mount object created for it. This mount object can then be attached to the VFS mount hierarchy using move_mount() by passing the returned file descriptor as the from directory fd. The system call looks like: int mfd = fsmount(int fsfd, unsigned int flags, unsigned int attr_flags); where fsfd is the file descriptor returned by fsopen(). flags can be 0 or FSMOUNT_CLOEXEC. attr_flags is a bitwise-OR of the following flags: MOUNT_ATTR_RDONLY Mount read-only MOUNT_ATTR_NOSUID Ignore suid and sgid bits MOUNT_ATTR_NODEV Disallow access to device special files MOUNT_ATTR_NOEXEC Disallow program execution MOUNT_ATTR__ATIME Setting on how atime should be updated MOUNT_ATTR_RELATIME - Update atime relative to mtime/ctime MOUNT_ATTR_NOATIME - Do not update access times MOUNT_ATTR_STRICTATIME - Always perform atime updates MOUNT_ATTR_NODIRATIME Do not update directory access times In the event that fsmount() fails, it may be possible to get an error message by calling read() on fsfd. If no message is available, ENODATA will be reported. Signed-off-by: David Howells cc: linux-api@vger.kernel.org Signed-off-by: Al Viro --- arch/x86/entry/syscalls/syscall_32.tbl | 1 arch/x86/entry/syscalls/syscall_64.tbl | 1 fs/namespace.c | 146 +++++++++++++++++++++++++++++++- include/linux/syscalls.h | 1 include/uapi/linux/mount.h | 18 ++++ 5 files changed, 164 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index f9970310c126..c78b68256f8a 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -402,3 +402,4 @@ 388 i386 move_mount sys_move_mount __ia32_sys_move_mount 389 i386 fsopen sys_fsopen __ia32_sys_fsopen 390 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig +391 i386 fsmount sys_fsmount __ia32_sys_fsmount diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 4185d36e03bb..d44ead5d4368 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -347,6 +347,7 @@ 336 common move_mount __x64_sys_move_mount 337 common fsopen __x64_sys_fsopen 338 common fsconfig __x64_sys_fsconfig +339 common fsmount __x64_sys_fsmount # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/namespace.c b/fs/namespace.c index 56423c60ac7e..43cbce0643ce 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -3334,9 +3334,149 @@ SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name, } /* - * Move a mount from one place to another. - * In combination with open_tree(OPEN_TREE_CLONE [| AT_RECURSIVE]) it can be - * used to copy a mount subtree. + * Create a kernel mount representation for a new, prepared superblock + * (specified by fs_fd) and attach to an open_tree-like file descriptor. + */ +SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags, + unsigned int, attr_flags) +{ + struct mnt_namespace *ns; + struct fs_context *fc; + struct file *file; + struct path newmount; + struct mount *mnt; + struct fd f; + unsigned int mnt_flags = 0; + long ret; + + if (!may_mount()) + return -EPERM; + + if ((flags & ~(FSMOUNT_CLOEXEC)) != 0) + return -EINVAL; + + if (attr_flags & ~(MOUNT_ATTR_RDONLY | + MOUNT_ATTR_NOSUID | + MOUNT_ATTR_NODEV | + MOUNT_ATTR_NOEXEC | + MOUNT_ATTR__ATIME | + MOUNT_ATTR_NODIRATIME)) + return -EINVAL; + + if (attr_flags & MOUNT_ATTR_RDONLY) + mnt_flags |= MNT_READONLY; + if (attr_flags & MOUNT_ATTR_NOSUID) + mnt_flags |= MNT_NOSUID; + if (attr_flags & MOUNT_ATTR_NODEV) + mnt_flags |= MNT_NODEV; + if (attr_flags & MOUNT_ATTR_NOEXEC) + mnt_flags |= MNT_NOEXEC; + if (attr_flags & MOUNT_ATTR_NODIRATIME) + mnt_flags |= MNT_NODIRATIME; + + switch (attr_flags & MOUNT_ATTR__ATIME) { + case MOUNT_ATTR_STRICTATIME: + break; + case MOUNT_ATTR_NOATIME: + mnt_flags |= MNT_NOATIME; + break; + case MOUNT_ATTR_RELATIME: + mnt_flags |= MNT_RELATIME; + break; + default: + return -EINVAL; + } + + f = fdget(fs_fd); + if (!f.file) + return -EBADF; + + ret = -EINVAL; + if (f.file->f_op != &fscontext_fops) + goto err_fsfd; + + fc = f.file->private_data; + + ret = mutex_lock_interruptible(&fc->uapi_mutex); + if (ret < 0) + goto err_fsfd; + + /* There must be a valid superblock or we can't mount it */ + ret = -EINVAL; + if (!fc->root) + goto err_unlock; + + ret = -EPERM; + if (mount_too_revealing(fc->root->d_sb, &mnt_flags)) { + pr_warn("VFS: Mount too revealing\n"); + goto err_unlock; + } + + ret = -EBUSY; + if (fc->phase != FS_CONTEXT_AWAITING_MOUNT) + goto err_unlock; + + ret = -EPERM; + if ((fc->sb_flags & SB_MANDLOCK) && !may_mandlock()) + goto err_unlock; + + newmount.mnt = vfs_create_mount(fc); + if (IS_ERR(newmount.mnt)) { + ret = PTR_ERR(newmount.mnt); + goto err_unlock; + } + newmount.dentry = dget(fc->root); + newmount.mnt->mnt_flags = mnt_flags; + + /* We've done the mount bit - now move the file context into more or + * less the same state as if we'd done an fspick(). We don't want to + * do any memory allocation or anything like that at this point as we + * don't want to have to handle any errors incurred. + */ + vfs_clean_context(fc); + + ns = alloc_mnt_ns(current->nsproxy->mnt_ns->user_ns, true); + if (IS_ERR(ns)) { + ret = PTR_ERR(ns); + goto err_path; + } + mnt = real_mount(newmount.mnt); + mnt->mnt_ns = ns; + ns->root = mnt; + ns->mounts = 1; + list_add(&mnt->mnt_list, &ns->list); + + /* Attach to an apparent O_PATH fd with a note that we need to unmount + * it, not just simply put it. + */ + file = dentry_open(&newmount, O_PATH, fc->cred); + if (IS_ERR(file)) { + dissolve_on_fput(newmount.mnt); + ret = PTR_ERR(file); + goto err_path; + } + file->f_mode |= FMODE_NEED_UNMOUNT; + + ret = get_unused_fd_flags((flags & FSMOUNT_CLOEXEC) ? O_CLOEXEC : 0); + if (ret >= 0) + fd_install(ret, file); + else + fput(file); + +err_path: + path_put(&newmount); +err_unlock: + mutex_unlock(&fc->uapi_mutex); +err_fsfd: + fdput(f); + return ret; +} + +/* + * Move a mount from one place to another. In combination with + * fsopen()/fsmount() this is used to install a new mount and in combination + * with open_tree(OPEN_TREE_CLONE [| AT_RECURSIVE]) it can be used to copy + * a mount subtree. * * Note the flags value is a combination of MOVE_MOUNT_* flags. */ diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 2586faf20078..a332a731a0e4 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -933,6 +933,7 @@ asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path, asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags); asmlinkage long sys_fsconfig(int fs_fd, unsigned int cmd, const char __user *key, const void __user *value, int aux); +asmlinkage long sys_fsmount(int fs_fd, unsigned int flags, unsigned int ms_flags); /* * Architecture-specific system calls diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index 4b90ba9d1770..3888d3b91dc5 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -91,4 +91,22 @@ enum fsconfig_command { FSCONFIG_CMD_RECONFIGURE = 7, /* Invoke superblock reconfiguration */ }; +/* + * fsmount() flags. + */ +#define FSMOUNT_CLOEXEC 0x00000001 + +/* + * Mount attributes. + */ +#define MOUNT_ATTR_RDONLY 0x00000001 /* Mount read-only */ +#define MOUNT_ATTR_NOSUID 0x00000002 /* Ignore suid and sgid bits */ +#define MOUNT_ATTR_NODEV 0x00000004 /* Disallow access to device special files */ +#define MOUNT_ATTR_NOEXEC 0x00000008 /* Disallow program execution */ +#define MOUNT_ATTR__ATIME 0x00000070 /* Setting on how atime should be updated */ +#define MOUNT_ATTR_RELATIME 0x00000000 /* - Update atime relative to mtime/ctime. */ +#define MOUNT_ATTR_NOATIME 0x00000010 /* - Do not update access times. */ +#define MOUNT_ATTR_STRICTATIME 0x00000020 /* - Always perform atime updates */ +#define MOUNT_ATTR_NODIRATIME 0x00000080 /* Do not update directory access times */ + #endif /* _UAPI_LINUX_MOUNT_H */ From patchwork Tue Feb 19 17:09:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820459 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9C6031805 for ; Tue, 19 Feb 2019 17:09:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 841852CFEC for ; Tue, 19 Feb 2019 17:09:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 78B802CFFF; Tue, 19 Feb 2019 17:09:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 014CF2CFEC for ; Tue, 19 Feb 2019 17:09:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729363AbfBSRJg (ORCPT ); Tue, 19 Feb 2019 12:09:36 -0500 Received: from mx1.redhat.com ([209.132.183.28]:39154 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728201AbfBSRJf (ORCPT ); Tue, 19 Feb 2019 12:09:35 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 68E4CC007320; Tue, 19 Feb 2019 17:09:35 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id F3F9560BE8; Tue, 19 Feb 2019 17:09:33 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 09/10] vfs: syscall: Add fspick() to select a superblock for reconfiguration From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:09:33 +0000 Message-ID: <155059617323.17079.2377064615299669557.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Tue, 19 Feb 2019 17:09:35 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Provide an fspick() system call that can be used to pick an existing mountpoint into an fs_context which can thereafter be used to reconfigure a superblock (equivalent of the superblock side of -o remount). This looks like: int fd = fspick(AT_FDCWD, "/mnt", FSPICK_CLOEXEC | FSPICK_NO_AUTOMOUNT); fsconfig(fd, FSCONFIG_SET_FLAG, "intr", NULL, 0); fsconfig(fd, FSCONFIG_SET_FLAG, "noac", NULL, 0); fsconfig(fd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0); At the point of fspick being called, the file descriptor referring to the filesystem context is in exactly the same state as the one that was created by fsopen() after fsmount() has been successfully called. Signed-off-by: David Howells cc: linux-api@vger.kernel.org Signed-off-by: Al Viro --- arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + fs/fsopen.c | 57 ++++++++++++++++++++++++++++++++ include/linux/syscalls.h | 1 + include/uapi/linux/mount.h | 8 ++++ 5 files changed, 68 insertions(+) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index c78b68256f8a..d1eb6c815790 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -403,3 +403,4 @@ 389 i386 fsopen sys_fsopen __ia32_sys_fsopen 390 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig 391 i386 fsmount sys_fsmount __ia32_sys_fsmount +392 i386 fspick sys_fspick __ia32_sys_fspick diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index d44ead5d4368..d3ab703c02bb 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -348,6 +348,7 @@ 337 common fsopen __x64_sys_fsopen 338 common fsconfig __x64_sys_fsconfig 339 common fsmount __x64_sys_fsmount +340 common fspick __x64_sys_fspick # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/fs/fsopen.c b/fs/fsopen.c index 65cc2f68f994..3bb9c0c8cbcc 100644 --- a/fs/fsopen.c +++ b/fs/fsopen.c @@ -156,6 +156,63 @@ SYSCALL_DEFINE2(fsopen, const char __user *, _fs_name, unsigned int, flags) return ret; } +/* + * Pick a superblock into a context for reconfiguration. + */ +SYSCALL_DEFINE3(fspick, int, dfd, const char __user *, path, unsigned int, flags) +{ + struct fs_context *fc; + struct path target; + unsigned int lookup_flags; + int ret; + + if (!ns_capable(current->nsproxy->mnt_ns->user_ns, CAP_SYS_ADMIN)) + return -EPERM; + + if ((flags & ~(FSPICK_CLOEXEC | + FSPICK_SYMLINK_NOFOLLOW | + FSPICK_NO_AUTOMOUNT | + FSPICK_EMPTY_PATH)) != 0) + return -EINVAL; + + lookup_flags = LOOKUP_FOLLOW | LOOKUP_AUTOMOUNT; + if (flags & FSPICK_SYMLINK_NOFOLLOW) + lookup_flags &= ~LOOKUP_FOLLOW; + if (flags & FSPICK_NO_AUTOMOUNT) + lookup_flags &= ~LOOKUP_AUTOMOUNT; + if (flags & FSPICK_EMPTY_PATH) + lookup_flags |= LOOKUP_EMPTY; + ret = user_path_at(dfd, path, lookup_flags, &target); + if (ret < 0) + goto err; + + ret = -EINVAL; + if (target.mnt->mnt_root != target.dentry) + goto err_path; + + fc = fs_context_for_reconfigure(target.dentry, 0, 0); + if (IS_ERR(fc)) { + ret = PTR_ERR(fc); + goto err_path; + } + + fc->phase = FS_CONTEXT_RECONF_PARAMS; + + ret = fscontext_alloc_log(fc); + if (ret < 0) + goto err_fc; + + path_put(&target); + return fscontext_create_fd(fc, flags & FSPICK_CLOEXEC ? O_CLOEXEC : 0); + +err_fc: + put_fs_context(fc); +err_path: + path_put(&target); +err: + return ret; +} + /* * Check the state and apply the configuration. Note that this function is * allowed to 'steal' the value by setting param->xxx to NULL before returning. diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index a332a731a0e4..c3f9d865b91d 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -934,6 +934,7 @@ asmlinkage long sys_fsopen(const char __user *fs_name, unsigned int flags); asmlinkage long sys_fsconfig(int fs_fd, unsigned int cmd, const char __user *key, const void __user *value, int aux); asmlinkage long sys_fsmount(int fs_fd, unsigned int flags, unsigned int ms_flags); +asmlinkage long sys_fspick(int dfd, const char __user *path, unsigned int flags); /* * Architecture-specific system calls diff --git a/include/uapi/linux/mount.h b/include/uapi/linux/mount.h index 3888d3b91dc5..96a0240f23fe 100644 --- a/include/uapi/linux/mount.h +++ b/include/uapi/linux/mount.h @@ -77,6 +77,14 @@ */ #define FSOPEN_CLOEXEC 0x00000001 +/* + * fspick() flags. + */ +#define FSPICK_CLOEXEC 0x00000001 +#define FSPICK_SYMLINK_NOFOLLOW 0x00000002 +#define FSPICK_NO_AUTOMOUNT 0x00000004 +#define FSPICK_EMPTY_PATH 0x00000008 + /* * The type of fsconfig() call made. */ From patchwork Tue Feb 19 17:09:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 10820461 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24AED6C2 for ; Tue, 19 Feb 2019 17:09:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0102C2CFE0 for ; Tue, 19 Feb 2019 17:09:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id E8E652CFF1; Tue, 19 Feb 2019 17:09:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9D42F2CFE0 for ; Tue, 19 Feb 2019 17:09:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729389AbfBSRJp (ORCPT ); Tue, 19 Feb 2019 12:09:45 -0500 Received: from mx1.redhat.com ([209.132.183.28]:34144 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728201AbfBSRJo (ORCPT ); Tue, 19 Feb 2019 12:09:44 -0500 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BCBE436809; Tue, 19 Feb 2019 17:09:42 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-121-129.rdu2.redhat.com [10.10.121.129]) by smtp.corp.redhat.com (Postfix) with ESMTP id 54A345BBCB; Tue, 19 Feb 2019 17:09:41 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH 10/10] vfs: Add a sample program for the new mount API From: David Howells To: viro@zeniv.linux.org.uk Cc: linux-fsdevel@vger.kernel.org, dhowells@redhat.com, torvalds@linux-foundation.org, ebiederm@xmission.com, linux-security-module@vger.kernel.org Date: Tue, 19 Feb 2019 17:09:40 +0000 Message-ID: <155059618062.17079.18093724934271186662.stgit@warthog.procyon.org.uk> In-Reply-To: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> References: <155059610368.17079.2220554006494174417.stgit@warthog.procyon.org.uk> User-Agent: StGit/unknown-version MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Tue, 19 Feb 2019 17:09:42 +0000 (UTC) Sender: owner-linux-security-module@vger.kernel.org Precedence: bulk List-ID: X-Virus-Scanned: ClamAV using ClamSMTP Add a sample program to demonstrate fsopen/fsmount/move_mount to mount something. To make it compile on all arches, irrespective of whether or not syscall numbers are assigned, define the syscall number to -1 if it isn't to cause the kernel to return -ENOSYS. Signed-off-by: David Howells Signed-off-by: Al Viro --- samples/Kconfig | 9 + samples/Makefile | 2 samples/statx/Makefile | 7 - samples/statx/test-statx.c | 258 ------------------------------------------- samples/vfs/Makefile | 10 ++ samples/vfs/test-fsmount.c | 133 ++++++++++++++++++++++ samples/vfs/test-statx.c | 267 ++++++++++++++++++++++++++++++++++++++++++++ 7 files changed, 416 insertions(+), 270 deletions(-) delete mode 100644 samples/statx/Makefile delete mode 100644 samples/statx/test-statx.c create mode 100644 samples/vfs/Makefile create mode 100644 samples/vfs/test-fsmount.c create mode 100644 samples/vfs/test-statx.c diff --git a/samples/Kconfig b/samples/Kconfig index ad1ec7016d4c..dc4eb5355fad 100644 --- a/samples/Kconfig +++ b/samples/Kconfig @@ -147,10 +147,11 @@ config SAMPLE_VFIO_MDEV_MBOCHS Specifically it does *not* include any legacy vga stuff. Device looks a lot like "qemu -device secondary-vga". -config SAMPLE_STATX - bool "Build example extended-stat using code" - depends on BROKEN +config SAMPLE_VFS + bool "Build example programs that use new VFS system calls" help - Build example userspace program to use the new extended-stat syscall. + Build example userspace programs that use new VFS system calls such + as mount API and statx(). Note that this is restricted to the x86 + arch whilst it accesses system calls that aren't yet in all arches. endif # SAMPLES diff --git a/samples/Makefile b/samples/Makefile index bd601c038b86..c5a6175c2d3f 100644 --- a/samples/Makefile +++ b/samples/Makefile @@ -3,4 +3,4 @@ obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ trace_events/ livepatch/ \ hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/ \ configfs/ connector/ v4l/ trace_printk/ \ - vfio-mdev/ statx/ qmi/ + vfio-mdev/ vfs/ qmi/ diff --git a/samples/statx/Makefile b/samples/statx/Makefile deleted file mode 100644 index 59df7c25a9d1..000000000000 --- a/samples/statx/Makefile +++ /dev/null @@ -1,7 +0,0 @@ -# List of programs to build -hostprogs-$(CONFIG_SAMPLE_STATX) := test-statx - -# Tell kbuild to always build the programs -always := $(hostprogs-y) - -HOSTCFLAGS_test-statx.o += -I$(objtree)/usr/include diff --git a/samples/statx/test-statx.c b/samples/statx/test-statx.c deleted file mode 100644 index d4d77b09412c..000000000000 --- a/samples/statx/test-statx.c +++ /dev/null @@ -1,258 +0,0 @@ -/* Test the statx() system call. - * - * Note that the output of this program is intended to look like the output of - * /bin/stat where possible. - * - * Copyright (C) 2015 Red Hat, Inc. All Rights Reserved. - * Written by David Howells (dhowells@redhat.com) - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public Licence - * as published by the Free Software Foundation; either version - * 2 of the Licence, or (at your option) any later version. - */ - -#define _GNU_SOURCE -#define _ATFILE_SOURCE -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#define AT_STATX_SYNC_TYPE 0x6000 -#define AT_STATX_SYNC_AS_STAT 0x0000 -#define AT_STATX_FORCE_SYNC 0x2000 -#define AT_STATX_DONT_SYNC 0x4000 - -static __attribute__((unused)) -ssize_t statx(int dfd, const char *filename, unsigned flags, - unsigned int mask, struct statx *buffer) -{ - return syscall(__NR_statx, dfd, filename, flags, mask, buffer); -} - -static void print_time(const char *field, struct statx_timestamp *ts) -{ - struct tm tm; - time_t tim; - char buffer[100]; - int len; - - tim = ts->tv_sec; - if (!localtime_r(&tim, &tm)) { - perror("localtime_r"); - exit(1); - } - len = strftime(buffer, 100, "%F %T", &tm); - if (len == 0) { - perror("strftime"); - exit(1); - } - printf("%s", field); - fwrite(buffer, 1, len, stdout); - printf(".%09u", ts->tv_nsec); - len = strftime(buffer, 100, "%z", &tm); - if (len == 0) { - perror("strftime2"); - exit(1); - } - fwrite(buffer, 1, len, stdout); - printf("\n"); -} - -static void dump_statx(struct statx *stx) -{ - char buffer[256], ft = '?'; - - printf("results=%x\n", stx->stx_mask); - - printf(" "); - if (stx->stx_mask & STATX_SIZE) - printf(" Size: %-15llu", (unsigned long long)stx->stx_size); - if (stx->stx_mask & STATX_BLOCKS) - printf(" Blocks: %-10llu", (unsigned long long)stx->stx_blocks); - printf(" IO Block: %-6llu", (unsigned long long)stx->stx_blksize); - if (stx->stx_mask & STATX_TYPE) { - switch (stx->stx_mode & S_IFMT) { - case S_IFIFO: printf(" FIFO\n"); ft = 'p'; break; - case S_IFCHR: printf(" character special file\n"); ft = 'c'; break; - case S_IFDIR: printf(" directory\n"); ft = 'd'; break; - case S_IFBLK: printf(" block special file\n"); ft = 'b'; break; - case S_IFREG: printf(" regular file\n"); ft = '-'; break; - case S_IFLNK: printf(" symbolic link\n"); ft = 'l'; break; - case S_IFSOCK: printf(" socket\n"); ft = 's'; break; - default: - printf(" unknown type (%o)\n", stx->stx_mode & S_IFMT); - break; - } - } else { - printf(" no type\n"); - } - - sprintf(buffer, "%02x:%02x", stx->stx_dev_major, stx->stx_dev_minor); - printf("Device: %-15s", buffer); - if (stx->stx_mask & STATX_INO) - printf(" Inode: %-11llu", (unsigned long long) stx->stx_ino); - if (stx->stx_mask & STATX_NLINK) - printf(" Links: %-5u", stx->stx_nlink); - if (stx->stx_mask & STATX_TYPE) { - switch (stx->stx_mode & S_IFMT) { - case S_IFBLK: - case S_IFCHR: - printf(" Device type: %u,%u", - stx->stx_rdev_major, stx->stx_rdev_minor); - break; - } - } - printf("\n"); - - if (stx->stx_mask & STATX_MODE) - printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c) ", - stx->stx_mode & 07777, - ft, - stx->stx_mode & S_IRUSR ? 'r' : '-', - stx->stx_mode & S_IWUSR ? 'w' : '-', - stx->stx_mode & S_IXUSR ? 'x' : '-', - stx->stx_mode & S_IRGRP ? 'r' : '-', - stx->stx_mode & S_IWGRP ? 'w' : '-', - stx->stx_mode & S_IXGRP ? 'x' : '-', - stx->stx_mode & S_IROTH ? 'r' : '-', - stx->stx_mode & S_IWOTH ? 'w' : '-', - stx->stx_mode & S_IXOTH ? 'x' : '-'); - if (stx->stx_mask & STATX_UID) - printf("Uid: %5d ", stx->stx_uid); - if (stx->stx_mask & STATX_GID) - printf("Gid: %5d\n", stx->stx_gid); - - if (stx->stx_mask & STATX_ATIME) - print_time("Access: ", &stx->stx_atime); - if (stx->stx_mask & STATX_MTIME) - print_time("Modify: ", &stx->stx_mtime); - if (stx->stx_mask & STATX_CTIME) - print_time("Change: ", &stx->stx_ctime); - if (stx->stx_mask & STATX_BTIME) - print_time(" Birth: ", &stx->stx_btime); - - if (stx->stx_attributes_mask) { - unsigned char bits, mbits; - int loop, byte; - - static char attr_representation[64 + 1] = - /* STATX_ATTR_ flags: */ - "????????" /* 63-56 */ - "????????" /* 55-48 */ - "????????" /* 47-40 */ - "????????" /* 39-32 */ - "????????" /* 31-24 0x00000000-ff000000 */ - "????????" /* 23-16 0x00000000-00ff0000 */ - "???me???" /* 15- 8 0x00000000-0000ff00 */ - "?dai?c??" /* 7- 0 0x00000000-000000ff */ - ; - - printf("Attributes: %016llx (", stx->stx_attributes); - for (byte = 64 - 8; byte >= 0; byte -= 8) { - bits = stx->stx_attributes >> byte; - mbits = stx->stx_attributes_mask >> byte; - for (loop = 7; loop >= 0; loop--) { - int bit = byte + loop; - - if (!(mbits & 0x80)) - putchar('.'); /* Not supported */ - else if (bits & 0x80) - putchar(attr_representation[63 - bit]); - else - putchar('-'); /* Not set */ - bits <<= 1; - mbits <<= 1; - } - if (byte) - putchar(' '); - } - printf(")\n"); - } -} - -static void dump_hex(unsigned long long *data, int from, int to) -{ - unsigned offset, print_offset = 1, col = 0; - - from /= 8; - to = (to + 7) / 8; - - for (offset = from; offset < to; offset++) { - if (print_offset) { - printf("%04x: ", offset * 8); - print_offset = 0; - } - printf("%016llx", data[offset]); - col++; - if ((col & 3) == 0) { - printf("\n"); - print_offset = 1; - } else { - printf(" "); - } - } - - if (!print_offset) - printf("\n"); -} - -int main(int argc, char **argv) -{ - struct statx stx; - int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW; - - unsigned int mask = STATX_ALL; - - for (argv++; *argv; argv++) { - if (strcmp(*argv, "-F") == 0) { - atflag &= ~AT_STATX_SYNC_TYPE; - atflag |= AT_STATX_FORCE_SYNC; - continue; - } - if (strcmp(*argv, "-D") == 0) { - atflag &= ~AT_STATX_SYNC_TYPE; - atflag |= AT_STATX_DONT_SYNC; - continue; - } - if (strcmp(*argv, "-L") == 0) { - atflag &= ~AT_SYMLINK_NOFOLLOW; - continue; - } - if (strcmp(*argv, "-O") == 0) { - mask &= ~STATX_BASIC_STATS; - continue; - } - if (strcmp(*argv, "-A") == 0) { - atflag |= AT_NO_AUTOMOUNT; - continue; - } - if (strcmp(*argv, "-R") == 0) { - raw = 1; - continue; - } - - memset(&stx, 0xbf, sizeof(stx)); - ret = statx(AT_FDCWD, *argv, atflag, mask, &stx); - printf("statx(%s) = %d\n", *argv, ret); - if (ret < 0) { - perror(*argv); - exit(1); - } - - if (raw) - dump_hex((unsigned long long *)&stx, 0, sizeof(stx)); - - dump_statx(&stx); - } - return 0; -} diff --git a/samples/vfs/Makefile b/samples/vfs/Makefile new file mode 100644 index 000000000000..4ac9690fb3c4 --- /dev/null +++ b/samples/vfs/Makefile @@ -0,0 +1,10 @@ +# List of programs to build +hostprogs-$(CONFIG_SAMPLE_VFS) := \ + test-fsmount \ + test-statx + +# Tell kbuild to always build the programs +always := $(hostprogs-y) + +HOSTCFLAGS_test-fsmount.o += -I$(objtree)/usr/include +HOSTCFLAGS_test-statx.o += -I$(objtree)/usr/include diff --git a/samples/vfs/test-fsmount.c b/samples/vfs/test-fsmount.c new file mode 100644 index 000000000000..266d72b3dce4 --- /dev/null +++ b/samples/vfs/test-fsmount.c @@ -0,0 +1,133 @@ +/* fd-based mount test. + * + * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define E(x) do { if ((x) == -1) { perror(#x); exit(1); } } while(0) + +static void check_messages(int fd) +{ + char buf[4096]; + int err, n; + + err = errno; + + for (;;) { + n = read(fd, buf, sizeof(buf)); + if (n < 0) + break; + n -= 2; + + switch (buf[0]) { + case 'e': + fprintf(stderr, "Error: %*.*s\n", n, n, buf + 2); + break; + case 'w': + fprintf(stderr, "Warning: %*.*s\n", n, n, buf + 2); + break; + case 'i': + fprintf(stderr, "Info: %*.*s\n", n, n, buf + 2); + break; + } + } + + errno = err; +} + +static __attribute__((noreturn)) +void mount_error(int fd, const char *s) +{ + check_messages(fd); + fprintf(stderr, "%s: %m\n", s); + exit(1); +} + +/* Hope -1 isn't a syscall */ +#ifndef __NR_fsopen +#define __NR_fsopen -1 +#endif +#ifndef __NR_fsmount +#define __NR_fsmount -1 +#endif +#ifndef __NR_fsconfig +#define __NR_fsconfig -1 +#endif +#ifndef __NR_move_mount +#define __NR_move_mount -1 +#endif + + +static inline int fsopen(const char *fs_name, unsigned int flags) +{ + return syscall(__NR_fsopen, fs_name, flags); +} + +static inline int fsmount(int fsfd, unsigned int flags, unsigned int ms_flags) +{ + return syscall(__NR_fsmount, fsfd, flags, ms_flags); +} + +static inline int fsconfig(int fsfd, unsigned int cmd, + const char *key, const void *val, int aux) +{ + return syscall(__NR_fsconfig, fsfd, cmd, key, val, aux); +} + +static inline int move_mount(int from_dfd, const char *from_pathname, + int to_dfd, const char *to_pathname, + unsigned int flags) +{ + return syscall(__NR_move_mount, + from_dfd, from_pathname, + to_dfd, to_pathname, flags); +} + +#define E_fsconfig(fd, cmd, key, val, aux) \ + do { \ + if (fsconfig(fd, cmd, key, val, aux) == -1) \ + mount_error(fd, key ?: "create"); \ + } while (0) + +int main(int argc, char *argv[]) +{ + int fsfd, mfd; + + /* Mount a publically available AFS filesystem */ + fsfd = fsopen("afs", 0); + if (fsfd == -1) { + perror("fsopen"); + exit(1); + } + + E_fsconfig(fsfd, FSCONFIG_SET_STRING, "source", "#grand.central.org:root.cell.", 0); + E_fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); + + mfd = fsmount(fsfd, 0, MOUNT_ATTR_RDONLY); + if (mfd < 0) + mount_error(fsfd, "fsmount"); + E(close(fsfd)); + + if (move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH) < 0) { + perror("move_mount"); + exit(1); + } + + E(close(mfd)); + exit(0); +} diff --git a/samples/vfs/test-statx.c b/samples/vfs/test-statx.c new file mode 100644 index 000000000000..e91f918e84c4 --- /dev/null +++ b/samples/vfs/test-statx.c @@ -0,0 +1,267 @@ +/* Test the statx() system call. + * + * Note that the output of this program is intended to look like the output of + * /bin/stat where possible. + * + * Copyright (C) 2015 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public Licence + * as published by the Free Software Foundation; either version + * 2 of the Licence, or (at your option) any later version. + */ + +#define _GNU_SOURCE +#define _ATFILE_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#define statx foo +#define statx_timestamp foo_timestamp +#include +#undef statx +#undef statx_timestamp + +#define AT_STATX_SYNC_TYPE 0x6000 +#define AT_STATX_SYNC_AS_STAT 0x0000 +#define AT_STATX_FORCE_SYNC 0x2000 +#define AT_STATX_DONT_SYNC 0x4000 + +#ifndef __NR_statx +#define __NR_statx -1 +#endif + +static __attribute__((unused)) +ssize_t statx(int dfd, const char *filename, unsigned flags, + unsigned int mask, struct statx *buffer) +{ + return syscall(__NR_statx, dfd, filename, flags, mask, buffer); +} + +static void print_time(const char *field, struct statx_timestamp *ts) +{ + struct tm tm; + time_t tim; + char buffer[100]; + int len; + + tim = ts->tv_sec; + if (!localtime_r(&tim, &tm)) { + perror("localtime_r"); + exit(1); + } + len = strftime(buffer, 100, "%F %T", &tm); + if (len == 0) { + perror("strftime"); + exit(1); + } + printf("%s", field); + fwrite(buffer, 1, len, stdout); + printf(".%09u", ts->tv_nsec); + len = strftime(buffer, 100, "%z", &tm); + if (len == 0) { + perror("strftime2"); + exit(1); + } + fwrite(buffer, 1, len, stdout); + printf("\n"); +} + +static void dump_statx(struct statx *stx) +{ + char buffer[256], ft = '?'; + + printf("results=%x\n", stx->stx_mask); + + printf(" "); + if (stx->stx_mask & STATX_SIZE) + printf(" Size: %-15llu", (unsigned long long)stx->stx_size); + if (stx->stx_mask & STATX_BLOCKS) + printf(" Blocks: %-10llu", (unsigned long long)stx->stx_blocks); + printf(" IO Block: %-6llu", (unsigned long long)stx->stx_blksize); + if (stx->stx_mask & STATX_TYPE) { + switch (stx->stx_mode & S_IFMT) { + case S_IFIFO: printf(" FIFO\n"); ft = 'p'; break; + case S_IFCHR: printf(" character special file\n"); ft = 'c'; break; + case S_IFDIR: printf(" directory\n"); ft = 'd'; break; + case S_IFBLK: printf(" block special file\n"); ft = 'b'; break; + case S_IFREG: printf(" regular file\n"); ft = '-'; break; + case S_IFLNK: printf(" symbolic link\n"); ft = 'l'; break; + case S_IFSOCK: printf(" socket\n"); ft = 's'; break; + default: + printf(" unknown type (%o)\n", stx->stx_mode & S_IFMT); + break; + } + } else { + printf(" no type\n"); + } + + sprintf(buffer, "%02x:%02x", stx->stx_dev_major, stx->stx_dev_minor); + printf("Device: %-15s", buffer); + if (stx->stx_mask & STATX_INO) + printf(" Inode: %-11llu", (unsigned long long) stx->stx_ino); + if (stx->stx_mask & STATX_NLINK) + printf(" Links: %-5u", stx->stx_nlink); + if (stx->stx_mask & STATX_TYPE) { + switch (stx->stx_mode & S_IFMT) { + case S_IFBLK: + case S_IFCHR: + printf(" Device type: %u,%u", + stx->stx_rdev_major, stx->stx_rdev_minor); + break; + } + } + printf("\n"); + + if (stx->stx_mask & STATX_MODE) + printf("Access: (%04o/%c%c%c%c%c%c%c%c%c%c) ", + stx->stx_mode & 07777, + ft, + stx->stx_mode & S_IRUSR ? 'r' : '-', + stx->stx_mode & S_IWUSR ? 'w' : '-', + stx->stx_mode & S_IXUSR ? 'x' : '-', + stx->stx_mode & S_IRGRP ? 'r' : '-', + stx->stx_mode & S_IWGRP ? 'w' : '-', + stx->stx_mode & S_IXGRP ? 'x' : '-', + stx->stx_mode & S_IROTH ? 'r' : '-', + stx->stx_mode & S_IWOTH ? 'w' : '-', + stx->stx_mode & S_IXOTH ? 'x' : '-'); + if (stx->stx_mask & STATX_UID) + printf("Uid: %5d ", stx->stx_uid); + if (stx->stx_mask & STATX_GID) + printf("Gid: %5d\n", stx->stx_gid); + + if (stx->stx_mask & STATX_ATIME) + print_time("Access: ", &stx->stx_atime); + if (stx->stx_mask & STATX_MTIME) + print_time("Modify: ", &stx->stx_mtime); + if (stx->stx_mask & STATX_CTIME) + print_time("Change: ", &stx->stx_ctime); + if (stx->stx_mask & STATX_BTIME) + print_time(" Birth: ", &stx->stx_btime); + + if (stx->stx_attributes_mask) { + unsigned char bits, mbits; + int loop, byte; + + static char attr_representation[64 + 1] = + /* STATX_ATTR_ flags: */ + "????????" /* 63-56 */ + "????????" /* 55-48 */ + "????????" /* 47-40 */ + "????????" /* 39-32 */ + "????????" /* 31-24 0x00000000-ff000000 */ + "????????" /* 23-16 0x00000000-00ff0000 */ + "???me???" /* 15- 8 0x00000000-0000ff00 */ + "?dai?c??" /* 7- 0 0x00000000-000000ff */ + ; + + printf("Attributes: %016llx (", + (unsigned long long)stx->stx_attributes); + for (byte = 64 - 8; byte >= 0; byte -= 8) { + bits = stx->stx_attributes >> byte; + mbits = stx->stx_attributes_mask >> byte; + for (loop = 7; loop >= 0; loop--) { + int bit = byte + loop; + + if (!(mbits & 0x80)) + putchar('.'); /* Not supported */ + else if (bits & 0x80) + putchar(attr_representation[63 - bit]); + else + putchar('-'); /* Not set */ + bits <<= 1; + mbits <<= 1; + } + if (byte) + putchar(' '); + } + printf(")\n"); + } +} + +static void dump_hex(unsigned long long *data, int from, int to) +{ + unsigned offset, print_offset = 1, col = 0; + + from /= 8; + to = (to + 7) / 8; + + for (offset = from; offset < to; offset++) { + if (print_offset) { + printf("%04x: ", offset * 8); + print_offset = 0; + } + printf("%016llx", data[offset]); + col++; + if ((col & 3) == 0) { + printf("\n"); + print_offset = 1; + } else { + printf(" "); + } + } + + if (!print_offset) + printf("\n"); +} + +int main(int argc, char **argv) +{ + struct statx stx; + int ret, raw = 0, atflag = AT_SYMLINK_NOFOLLOW; + + unsigned int mask = STATX_ALL; + + for (argv++; *argv; argv++) { + if (strcmp(*argv, "-F") == 0) { + atflag &= ~AT_STATX_SYNC_TYPE; + atflag |= AT_STATX_FORCE_SYNC; + continue; + } + if (strcmp(*argv, "-D") == 0) { + atflag &= ~AT_STATX_SYNC_TYPE; + atflag |= AT_STATX_DONT_SYNC; + continue; + } + if (strcmp(*argv, "-L") == 0) { + atflag &= ~AT_SYMLINK_NOFOLLOW; + continue; + } + if (strcmp(*argv, "-O") == 0) { + mask &= ~STATX_BASIC_STATS; + continue; + } + if (strcmp(*argv, "-A") == 0) { + atflag |= AT_NO_AUTOMOUNT; + continue; + } + if (strcmp(*argv, "-R") == 0) { + raw = 1; + continue; + } + + memset(&stx, 0xbf, sizeof(stx)); + ret = statx(AT_FDCWD, *argv, atflag, mask, &stx); + printf("statx(%s) = %d\n", *argv, ret); + if (ret < 0) { + perror(*argv); + exit(1); + } + + if (raw) + dump_hex((unsigned long long *)&stx, 0, sizeof(stx)); + + dump_statx(&stx); + } + return 0; +}