From patchwork Tue Mar 7 16:26:43 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Miklos Szeredi X-Patchwork-Id: 9609613 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id AB1536016C for ; Tue, 7 Mar 2017 18:27:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A09DB20564 for ; Tue, 7 Mar 2017 18:27:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 927CD26220; Tue, 7 Mar 2017 18:27:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.3 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_HI, RCVD_IN_SORBS_SPAM, T_DKIM_INVALID autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F3D9A20564 for ; Tue, 7 Mar 2017 18:27:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755313AbdCGS1D (ORCPT ); Tue, 7 Mar 2017 13:27:03 -0500 Received: from mail-wm0-f67.google.com ([74.125.82.67]:35585 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750714AbdCGS1C (ORCPT ); Tue, 7 Mar 2017 13:27:02 -0500 Received: by mail-wm0-f67.google.com with SMTP id z63so2384765wmg.2 for ; Tue, 07 Mar 2017 10:27:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=ODslSwFAfoSun1pMyS3K0KqYnqkuYpeDr1QbI7ecswk=; b=aKR/wII5A2Gn9y27UiITyNWzmWUzdcwf6gB61mMVYzraQCaFoVZ+RYLQmnJZ+LAA7v d/8jCHRufI5ki1Z1uUzJE/lFqr8FtXlpofnVquOvKBpY1dDBRIE+JBFvDxJCAX0MMFjg gn2DdyDJye3GjxUHy9ZR86rC3LtvctuW7lceU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=ODslSwFAfoSun1pMyS3K0KqYnqkuYpeDr1QbI7ecswk=; b=Qcvzwp2vv6cCXK6iO4uli8J3nCLLKZS+V1t/C9I/AtD1ejnxIFzJ5zqs2R5ZWCO9uF 3vOzYCoxj6ylmLPwWdHOoE41aExm5YHcmyzPHBkp1c/waaKSELyCdBDy6CfNVnFLWLQ/ FzyxWun39Mpw8SVvyKDAp0f5rKzgjXWkBIypM76F4rcvQzijUVyimRm+0Ta8MENIb+8g 8xIvmfkWiY46+sBxCYEvkfyTzsrARb6RgGzYbjQ3m41/rENodulNpqrBzPLIcS0i2duU ahHDGHn0ooE4EehErRsVG2U4FAh0OEPYbPCoPvy56ZL/9vn+hnwSPdeIVrBnhuTafzpn rlQQ== X-Gm-Message-State: AMke39nXtcROGdQWQjsedPFvuPnuVPIQq0OXmLav+094UAJFypczlidRkji8DO9nqayc8g== X-Received: by 10.28.203.197 with SMTP id b188mr1766026wmg.110.1488904006916; Tue, 07 Mar 2017 08:26:46 -0800 (PST) Received: from veci.piliscsaba.szeredi.hu (catv-176-63-54-97.catv.broadband.hu. [176.63.54.97]) by smtp.gmail.com with ESMTPSA id f67sm19604486wmd.0.2017.03.07.08.26.45 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 07 Mar 2017 08:26:46 -0800 (PST) Date: Tue, 7 Mar 2017 17:26:43 +0100 From: Miklos Szeredi To: Al Viro Cc: Miklos Szeredi , linux-unionfs@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH 0/9] overlay: fix inconsistency of ro file after copy-up Message-ID: <20170307162643.GC30961@veci.piliscsaba.szeredi.hu> References: <1487347778-18596-1-git-send-email-mszeredi@redhat.com> <20170219091430.GL29622@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20170219091430.GL29622@ZenIV.linux.org.uk> User-Agent: Mutt/1.7.1 (2016-10-04) Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Sun, Feb 19, 2017 at 09:14:41AM +0000, Al Viro wrote: > On Fri, Feb 17, 2017 at 05:09:29PM +0100, Miklos Szeredi wrote: > > A file is opened for read-only, opened read-write (resulting in a copy up) > > and modified. The data read back from the the read-only fd will be stale > > in this case (the read-only file descriptor still refers to the lower, > > unmodified file). > > > > This patchset fixes issues related to this corner case. This is a > > requirement from various parties for accepting overlayfs as a "POSIX" > > filesystem. > > > > When an operation (read, mmap, fsync) is done on an overlay fd opened > > read-only that is referring to a lower file, check if it has been copied up > > in the mean time. If so, open the upper file and use that for the operation. > > > > To make the performance impact minimal for non-overlay case, use a flag in > > file->f_mode to indicate that this is an overlay file. > > This is one hell of a DoS vector - it's really easy to eat tons of struct > file that way. Preparatory parts of that series make sense on their own, > but your "let's allocate a struct file, call ->open() and schedule that > struct file for closing upon the exit to userland on each kernel_read()" > is not. How about this? Do you see a problem with calling __fput() synchronously here? Thanks, Miklos --- fs/Makefile | 2 - fs/file_table.c | 2 - fs/internal.h | 1 fs/overlay_util.c | 53 +++++++++++++++++++++++++++++++++++++++++++ include/linux/fs.h | 11 ++++++++ include/linux/overlay_util.h | 13 ++++++++++ 6 files changed, 80 insertions(+), 2 deletions(-) --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -31,6 +31,7 @@ #include #include #include +#include #include #include @@ -1721,9 +1722,19 @@ struct inode_operations { int (*set_acl)(struct inode *, struct posix_acl *, int); } ____cacheline_aligned; + +static inline bool overlay_file_inconsistent(struct file *file) +{ + return unlikely(file->f_path.dentry->d_flags & DCACHE_OP_REAL) && + unlikely(d_real_inode(file->f_path.dentry) != file_inode(file)); +} + static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio, struct iov_iter *iter) { + if (overlay_file_inconsistent(file)) + return overlay_read_iter(file, kio, iter); + return file->f_op->read_iter(kio, iter); } --- a/fs/Makefile +++ b/fs/Makefile @@ -11,7 +11,7 @@ obj-y := open.o read_write.o file_table. attr.o bad_inode.o file.o filesystems.o namespace.o \ seq_file.o xattr.o libfs.o fs-writeback.o \ pnode.o splice.o sync.o utimes.o \ - stack.o fs_struct.o statfs.o fs_pin.o nsfs.o + stack.o fs_struct.o statfs.o fs_pin.o nsfs.o overlay_util.o ifeq ($(CONFIG_BLOCK),y) obj-y += buffer.o block_dev.o direct-io.o mpage.o --- /dev/null +++ b/fs/overlay_util.c @@ -0,0 +1,53 @@ +/* + * Copyright (C) 2017 Red Hat, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include +#include +#include +#include "internal.h" + +static struct file *overlay_clone_file(struct file *file) +{ + file = filp_clone_open(file); + if (!IS_ERR(file)) + file->f_mode |= FMODE_NONOTIFY; + + return file; +} + +/* + * Do the release synchronously. Otherwise we'd have a DoS problem when doing + * multiple reads (e.g. through kernel_read()) and only releasing the cloned + * files when returning to userspace. + * + * There's no risk of final dput or final mntput happening, since caller holds + * ref to both through the original file. + */ +static void overlay_put_cloned_file(struct file *file) +{ + if (WARN_ON(!atomic_long_dec_and_test(&file->f_count))) + return; + + __fput(file); +} + +ssize_t overlay_read_iter(struct file *file, struct kiocb *kio, + struct iov_iter *iter) +{ + ssize_t ret; + + file = overlay_clone_file(file); + if (IS_ERR(file)) + return PTR_ERR(file); + + ret = vfs_iter_read(file, iter, &kio->ki_pos); + overlay_put_cloned_file(file); + + return ret; +} +EXPORT_SYMBOL(overlay_read_iter); --- /dev/null +++ b/include/linux/overlay_util.h @@ -0,0 +1,13 @@ +#ifndef _LINUX_OVERLAY_FS_H +#define _LINUX_OVERLAY_FS_H + +#include + +struct file; +struct kiocb; +struct iov_iter; + +extern ssize_t overlay_read_iter(struct file *file, struct kiocb *kio, + struct iov_iter *iter); + +#endif /* _LINUX_OVERLAY_FS_H */ --- a/fs/file_table.c +++ b/fs/file_table.c @@ -184,7 +184,7 @@ EXPORT_SYMBOL(alloc_file); /* the real guts of fput() - releasing the last reference to file */ -static void __fput(struct file *file) +void __fput(struct file *file) { struct dentry *dentry = file->f_path.dentry; struct vfsmount *mnt = file->f_path.mnt; --- a/fs/internal.h +++ b/fs/internal.h @@ -83,6 +83,7 @@ extern void chroot_fs_refs(const struct * file_table.c */ extern struct file *get_empty_filp(void); +extern void __fput(struct file *); /* * super.c