From patchwork Wed Oct 12 13:33:26 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miklos Szeredi X-Patchwork-Id: 9373113 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 3655460772 for ; Wed, 12 Oct 2016 13:34:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 25EF529DD2 for ; Wed, 12 Oct 2016 13:34:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 19B2B29DCF; Wed, 12 Oct 2016 13:34:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B6FCA29DCF for ; Wed, 12 Oct 2016 13:33:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933709AbcJLNd4 (ORCPT ); Wed, 12 Oct 2016 09:33:56 -0400 Received: from mail-lf0-f67.google.com ([209.85.215.67]:35395 "EHLO mail-lf0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933754AbcJLNdc (ORCPT ); Wed, 12 Oct 2016 09:33:32 -0400 Received: by mail-lf0-f67.google.com with SMTP id x79so7172093lff.2 for ; Wed, 12 Oct 2016 06:33:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=date:from:to:cc:subject:message-id:mime-version:content-disposition :user-agent; bh=AdX3j7+sdyQpBn3InxasZeWNE9qHq0ijFzvVDqaOnsU=; b=IrMoVQ8pXG8/dbJVnqDTyfIHmlxjmncNxi4yChTISwKHPsNcCIC2GkMABH8faRQ/rn BjCX3d1ub530f6VTrb45gvs9EIskSbulxyz+UDwHtSgZe7L69J+neuxy95asU/zLKw4+ TPOwmu8XiFPrl6Tn/qX50WClER9KCTaib9z0o= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:mime-version :content-disposition:user-agent; bh=AdX3j7+sdyQpBn3InxasZeWNE9qHq0ijFzvVDqaOnsU=; b=nOiHqcROL36wP/BWNm3KkSJrITI/AKVD6TTqLp/iV0k6DVGqs/2UeIDdb9IDseHTG3 xFAdKppby7kHEURl0KQBxIIBb6zfk2LDz3QbtOW7uuu1MOeJuWSJ03YrRQYXFV58MguC OGQ09fWFekJmSm34n0a/+824iwtLMhXCUqDwcMV90vF2JfPYzRZsUXh72N8Qje3DuyvN ezjL3RvIF494GkvWCb+TFVlRkAsvgBD5L1HADiYV5pdcCfepELPJv+JIzvIDasWUA66y P5VXqBUycL5Cht9VxOHV7RFgU76KqifIv082eyxxkoiy7tCMoekByZS2sX9FA9i9F9E7 le3Q== X-Gm-Message-State: AA6/9RkUobydQ1hneBXCPkI12ji7jSh7opFWLeI75PW23uUPgy+aBPAFYfOHWwA+kX8SFg== X-Received: by 10.194.22.34 with SMTP id a2mr1491850wjf.95.1476279210147; Wed, 12 Oct 2016 06:33:30 -0700 (PDT) Received: from veci.piliscsaba.szeredi.hu (pool-dsl-2c-0018.externet.hu. [217.173.44.24]) by smtp.gmail.com with ESMTPSA id b8sm12948302wjq.40.2016.10.12.06.33.28 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 12 Oct 2016 06:33:29 -0700 (PDT) Date: Wed, 12 Oct 2016 15:33:26 +0200 From: Miklos Szeredi To: linux-unionfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Jeremy Eder , David Howells , Ratna Bolla , Gou Rao , Vinod Jayaraman , Al Viro , Vivek Goyal , Dave Chinner Subject: [POC/RFC PATCH] overlayfs: fix data inconsistency at copy up Message-ID: <20161012133326.GD31239@veci.piliscsaba.szeredi.hu> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.7.0 (2016-08-17) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This is a proof of concept patch to fix the following. /ovl is in overlay mount and /ovl/foo exists on the lower layer only. rofd = open("/ovl/foo", O_RDONLY); rwfd = open("/ovl/foo", O_WRONLY); /* this causes copy up */ write(rwfd, "bar", 3); read(rofd, buf, 3); assert(memcmp(buf, "bar", 3) == 0); Similar problem exists with an MAP_SHARED mmap created from rofd. While this has only caused few problems (yum/dnf failure is the only one I know of) and easily worked around in userspace, many see it as a proof that overlayfs can never be a proper "POSIX" filesystem. To quell those worries, here's a simple patch that should address the above. The only VFS change is that f_op is initialized from f_path.dentry->d_inode instead of file_inode(filp) in open. The effect of this is that overlayfs can intercept open and other file operations, while the file still effectively belongs to the underlying fs. The patch does not give up on the nice properties of overlayfs, like sharing the page cache with the underlying files. It does cause copy up in one case where previously there wasn't one and that's the O_RDONLY/MAP_SHARED case. I haven't done much research into this, but running some tests in chroot didn't trigger this. Comments, testing are welcome. Thanks, Miklos --- fs/internal.h | 1 - fs/open.c | 2 +- fs/overlayfs/dir.c | 2 +- fs/overlayfs/inode.c | 175 +++++++++++++++++++++++++++++++++++++++++++---- fs/overlayfs/overlayfs.h | 2 +- fs/overlayfs/super.c | 7 +- include/linux/fs.h | 1 + 7 files changed, 169 insertions(+), 21 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/fs/internal.h b/fs/internal.h index f4da3341b4a3..361ba1c12698 100644 --- a/fs/internal.h +++ b/fs/internal.h @@ -112,7 +112,6 @@ extern long do_handle_open(int mountdirfd, struct file_handle __user *ufh, int open_flag); extern int open_check_o_direct(struct file *f); extern int vfs_open(const struct path *, struct file *, const struct cred *); -extern struct file *filp_clone_open(struct file *); /* * inode.c diff --git a/fs/open.c b/fs/open.c index a7719cfb7257..e21f1a6f77b7 100644 --- a/fs/open.c +++ b/fs/open.c @@ -728,7 +728,7 @@ static int do_dentry_open(struct file *f, if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)) f->f_mode |= FMODE_ATOMIC_POS; - f->f_op = fops_get(inode->i_fop); + f->f_op = fops_get(f->f_path.dentry->d_inode->i_fop); if (unlikely(WARN_ON(!f->f_op))) { error = -ENODEV; goto cleanup_all; diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c index 5f90ddf778ba..1ea511be6e37 100644 --- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -534,7 +534,7 @@ static int ovl_create_object(struct dentry *dentry, int mode, dev_t rdev, goto out; err = -ENOMEM; - inode = ovl_new_inode(dentry->d_sb, mode); + inode = ovl_new_inode(dentry->d_sb, mode, rdev); if (!inode) goto out_drop_write; diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index c18d6a4ff456..744c8eb7e947 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -11,6 +11,9 @@ #include #include #include +#include +#include +#include #include "overlayfs.h" static int ovl_copy_up_truncate(struct dentry *dentry) @@ -381,7 +384,154 @@ static const struct inode_operations ovl_symlink_inode_operations = { .update_time = ovl_update_time, }; -static void ovl_fill_inode(struct inode *inode, umode_t mode) +static const struct file_operations ovl_file_operations; + +static const struct file_operations *ovl_real_fop(struct file *file) +{ + return file_inode(file)->i_fop; +} + +static int ovl_open(struct inode *realinode, struct file *file) +{ + int ret = 0; + const struct file_operations *fop = ovl_real_fop(file); + bool isupper = OVL_TYPE_UPPER(ovl_path_type(file->f_path.dentry)); + + /* Don't intercept upper file operations */ + if (isupper) + replace_fops(file, fop); + + if (fop->open) + ret = fop->open(realinode, file); + + if (!isupper && WARN_ON(file->f_op != &ovl_file_operations)) { + if (fop->release) + fop->release(realinode, file); + return -EINVAL; + } + + return ret; +} + +static int ovl_release(struct inode *realinode, struct file *file) +{ + const struct file_operations *fop = ovl_real_fop(file); + + if (fop->release) + return fop->release(realinode, file); + + return 0; +} + +static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct file *file = iocb->ki_filp; + bool isupper = OVL_TYPE_UPPER(ovl_path_type(file->f_path.dentry)); + ssize_t ret = -EINVAL; + + if (likely(!isupper)) { + const struct file_operations *fop = ovl_real_fop(file); + + if (likely(fop->read_iter)) + ret = fop->read_iter(iocb, to); + } else { + struct file *upperfile = filp_clone_open(file); + + if (IS_ERR(upperfile)) { + ret = PTR_ERR(upperfile); + } else { + ret = vfs_iter_read(upperfile, to, &iocb->ki_pos); + fput(upperfile); + } + } + + return ret; +} + +static loff_t ovl_llseek(struct file *file, loff_t offset, int whence) +{ + const struct file_operations *fop = ovl_real_fop(file); + + if (fop->llseek) + return fop->llseek(file, offset, whence); + + return -EINVAL; +} + +static int ovl_mmap(struct file *file, struct vm_area_struct *vma) +{ + const struct file_operations *fop = ovl_real_fop(file); + bool isupper = OVL_TYPE_UPPER(ovl_path_type(file->f_path.dentry)); + int err; + + /* + * Treat MAP_SHARED as hint about future writes to the file (through + * another file descriptor). Caller might not have had such an intent, + * but we hope MAP_PRIVATE will be used in most such cases. + * + * If we don't copy up now and the file is modified, it becomes really + * difficult to change the mapping to match that of the file's content + * later. + */ + if (unlikely(isupper || vma->vm_flags & VM_MAYSHARE)) { + if (!isupper) { + err = ovl_copy_up(file->f_path.dentry); + if (err) + goto out; + } + + file = filp_clone_open(file); + err = PTR_ERR(file); + if (IS_ERR(file)) + goto out; + + fput(vma->vm_file); + /* transfer ref: */ + vma->vm_file = file; + fop = file->f_op; + } + err = -EINVAL; + if (fop->mmap) + err = fop->mmap(file, vma); +out: + return err; +} + +static int ovl_fsync(struct file *file, loff_t start, loff_t end, + int datasync) +{ + bool isupper = OVL_TYPE_UPPER(ovl_path_type(file->f_path.dentry)); + int ret = -EINVAL; + + if (likely(!isupper)) { + const struct file_operations *fop = ovl_real_fop(file); + + if (likely(fop->fsync)) + ret = fop->fsync(file, start, end, datasync); + } else { + struct file *upperfile = filp_clone_open(file); + + if (IS_ERR(upperfile)) { + ret = PTR_ERR(upperfile); + } else { + ret = vfs_fsync_range(upperfile, start, end, datasync); + fput(upperfile); + } + } + + return ret; +} + +static const struct file_operations ovl_file_operations = { + .open = ovl_open, + .release = ovl_release, + .read_iter = ovl_read_iter, + .llseek = ovl_llseek, + .mmap = ovl_mmap, + .fsync = ovl_fsync, +}; + +static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev) { inode->i_ino = get_next_ino(); inode->i_mode = mode; @@ -390,8 +540,12 @@ static void ovl_fill_inode(struct inode *inode, umode_t mode) inode->i_acl = inode->i_default_acl = ACL_DONT_CACHE; #endif - mode &= S_IFMT; - switch (mode) { + switch (mode & S_IFMT) { + case S_IFREG: + inode->i_op = &ovl_file_inode_operations; + inode->i_fop = &ovl_file_operations; + break; + case S_IFDIR: inode->i_op = &ovl_dir_inode_operations; inode->i_fop = &ovl_dir_operations; @@ -402,26 +556,19 @@ static void ovl_fill_inode(struct inode *inode, umode_t mode) break; default: - WARN(1, "illegal file type: %i\n", mode); - /* Fall through */ - - case S_IFREG: - case S_IFSOCK: - case S_IFBLK: - case S_IFCHR: - case S_IFIFO: inode->i_op = &ovl_file_inode_operations; + init_special_inode(inode, mode, rdev); break; } } -struct inode *ovl_new_inode(struct super_block *sb, umode_t mode) +struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev) { struct inode *inode; inode = new_inode(sb); if (inode) - ovl_fill_inode(inode, mode); + ovl_fill_inode(inode, mode, rdev); return inode; } @@ -445,7 +592,7 @@ struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode) inode = iget5_locked(sb, (unsigned long) realinode, ovl_inode_test, ovl_inode_set, realinode); if (inode && inode->i_state & I_NEW) { - ovl_fill_inode(inode, realinode->i_mode); + ovl_fill_inode(inode, realinode->i_mode, realinode->i_rdev); set_nlink(inode, realinode->i_nlink); unlock_new_inode(inode); } diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h index e218e741cb99..95d0d86c2d54 100644 --- a/fs/overlayfs/overlayfs.h +++ b/fs/overlayfs/overlayfs.h @@ -195,7 +195,7 @@ int ovl_open_maybe_copy_up(struct dentry *dentry, unsigned int file_flags); int ovl_update_time(struct inode *inode, struct timespec *ts, int flags); bool ovl_is_private_xattr(const char *name); -struct inode *ovl_new_inode(struct super_block *sb, umode_t mode); +struct inode *ovl_new_inode(struct super_block *sb, umode_t mode, dev_t rdev); struct inode *ovl_get_inode(struct super_block *sb, struct inode *realinode); static inline void ovl_copyattr(struct inode *from, struct inode *to) { diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 7e3f0127fc1a..daed68d13467 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -305,7 +305,7 @@ static struct dentry *ovl_d_real(struct dentry *dentry, { struct dentry *real; - if (d_is_dir(dentry)) { + if (!d_is_reg(dentry)) { if (!inode || inode == d_inode(dentry)) return dentry; goto bug; @@ -579,7 +579,8 @@ struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry, if (upperdentry && !d_is_dir(upperdentry)) { inode = ovl_get_inode(dentry->d_sb, realinode); } else { - inode = ovl_new_inode(dentry->d_sb, realinode->i_mode); + inode = ovl_new_inode(dentry->d_sb, realinode->i_mode, + realinode->i_rdev); if (inode) ovl_inode_init(inode, realinode, !!upperdentry); } @@ -1292,7 +1293,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) if (!oe) goto out_put_cred; - root_dentry = d_make_root(ovl_new_inode(sb, S_IFDIR)); + root_dentry = d_make_root(ovl_new_inode(sb, S_IFDIR, 0)); if (!root_dentry) goto out_free_oe; diff --git a/include/linux/fs.h b/include/linux/fs.h index bc65d5918140..cc7d1203b846 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2342,6 +2342,7 @@ extern struct file *filp_open(const char *, int, umode_t); extern struct file *file_open_root(struct dentry *, struct vfsmount *, const char *, int, umode_t); extern struct file * dentry_open(const struct path *, int, const struct cred *); +extern struct file *filp_clone_open(struct file *); extern int filp_close(struct file *, fl_owner_t id); extern struct filename *getname_flags(const char __user *, int, int *);