diff mbox

[14/39] ovl: stack file ops

Message ID 20180529144339.16538-15-mszeredi@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Miklos Szeredi May 29, 2018, 2:43 p.m. UTC
Implement file operations on a regular overlay file.  The underlying file
is opened separately and cached in ->private_data.

It might be worth making an exception for such files when accounting in
nr_file to confirm to userspace expectations.  We are only adding a small
overhead (248bytes for the struct file) since the real inode and dentry are
pinned by overlayfs anyway.

This patch doesn't have any effect, since the vfs will use d_real() to find
the real underlying file to open.  The patch at the end of the series will
actually enable this functionality.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
---
 fs/overlayfs/Makefile    |  4 +--
 fs/overlayfs/file.c      | 76 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/overlayfs/inode.c     |  1 +
 fs/overlayfs/overlayfs.h |  3 ++
 4 files changed, 82 insertions(+), 2 deletions(-)
 create mode 100644 fs/overlayfs/file.c

Comments

Al Viro June 10, 2018, 4:13 a.m. UTC | #1
On Tue, May 29, 2018 at 04:43:14PM +0200, Miklos Szeredi wrote:
> Implement file operations on a regular overlay file.  The underlying file
> is opened separately and cached in ->private_data.
> 
> It might be worth making an exception for such files when accounting in
> nr_file to confirm to userspace expectations.  We are only adding a small
> overhead (248bytes for the struct file) since the real inode and dentry are
> pinned by overlayfs anyway.
> 
> This patch doesn't have any effect, since the vfs will use d_real() to find
> the real underlying file to open.  The patch at the end of the series will
> actually enable this functionality.

> +static struct file *ovl_open_realfile(const struct file *file)
> +{
> +	struct inode *inode = file_inode(file);
> +	struct inode *upperinode = ovl_inode_upper(inode);
> +	struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
> +	struct file *realfile;
> +	const struct cred *old_cred;
> +
> +	old_cred = ovl_override_creds(inode->i_sb);
> +	realfile = path_open(&file->f_path, file->f_flags | O_NOATIME,
> +			     realinode, current_cred(), false);
> +	revert_creds(old_cred);
> +
> +	pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
> +		 file, file, upperinode ? 'u' : 'l', file->f_flags,
> +		 realfile, IS_ERR(realfile) ? 0 : realfile->f_flags);
> +
> +	return realfile;
> +}

IDGI.  OK, you open a file in the layer you want; good, but why the hell do you
*not* use the dentry/vfsmount from the same layer?

IOW, why does your path_open() get an explicit inode argument at all?  With the
rest of the work done in that series it looks like you should be able to use
vfs_open() instead...  Sure, for ovlfs file you want ->f_path on overlayfs and
not in a layer, but why do the same for those?

And why bother with override_creds at all?  What's wrong with simply passing
->creator_cred to path_open()/vfs_open()/whatnot?
Miklos Szeredi June 11, 2018, 7:09 a.m. UTC | #2
On Sun, Jun 10, 2018 at 6:13 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, May 29, 2018 at 04:43:14PM +0200, Miklos Szeredi wrote:
>> Implement file operations on a regular overlay file.  The underlying file
>> is opened separately and cached in ->private_data.
>>
>> It might be worth making an exception for such files when accounting in
>> nr_file to confirm to userspace expectations.  We are only adding a small
>> overhead (248bytes for the struct file) since the real inode and dentry are
>> pinned by overlayfs anyway.
>>
>> This patch doesn't have any effect, since the vfs will use d_real() to find
>> the real underlying file to open.  The patch at the end of the series will
>> actually enable this functionality.
>
>> +static struct file *ovl_open_realfile(const struct file *file)
>> +{
>> +     struct inode *inode = file_inode(file);
>> +     struct inode *upperinode = ovl_inode_upper(inode);
>> +     struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
>> +     struct file *realfile;
>> +     const struct cred *old_cred;
>> +
>> +     old_cred = ovl_override_creds(inode->i_sb);
>> +     realfile = path_open(&file->f_path, file->f_flags | O_NOATIME,
>> +                          realinode, current_cred(), false);
>> +     revert_creds(old_cred);
>> +
>> +     pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
>> +              file, file, upperinode ? 'u' : 'l', file->f_flags,
>> +              realfile, IS_ERR(realfile) ? 0 : realfile->f_flags);
>> +
>> +     return realfile;
>> +}
>
> IDGI.  OK, you open a file in the layer you want; good, but why the hell do you
> *not* use the dentry/vfsmount from the same layer?
>
> IOW, why does your path_open() get an explicit inode argument at all?  With the
> rest of the work done in that series it looks like you should be able to use
> vfs_open() instead...  Sure, for ovlfs file you want ->f_path on overlayfs and
> not in a layer, but why do the same for those?

I'd really like to get there some time but...

List of basic requirements:

 - Private mmap of overlay file shares page cache with lower file (and
hence with all other overlays using the same lower file).

 - /proc/PID/maps shows correct path.

Thought about setting f_mapping/i_mapping of overlay file to that of
underlying file.  But that breaks when doing a copy-up.  We can't just
go and change those mapping pointers, assumption is that those remain
constant (we'd need READ_ONCE() for all cases where we use the mapping
more than once).  It's probably doable, but it's a large and fragile
change.

Thanks,
Miklos
Al Viro June 12, 2018, 2:29 a.m. UTC | #3
On Mon, Jun 11, 2018 at 09:09:04AM +0200, Miklos Szeredi wrote:

[context: opening files in layers with unholy mix of overlayfs
->f_path and layer's ->f_inode/->f_op]

> I'd really like to get there some time but...
> 
> List of basic requirements:
> 
>  - Private mmap of overlay file shares page cache with lower file (and
> hence with all other overlays using the same lower file).
> 
>  - /proc/PID/maps shows correct path.
> 
> Thought about setting f_mapping/i_mapping of overlay file to that of
> underlying file.  But that breaks when doing a copy-up.  We can't just
> go and change those mapping pointers, assumption is that those remain
> constant (we'd need READ_ONCE() for all cases where we use the mapping
> more than once).  It's probably doable, but it's a large and fragile
> change.

We are really asking for trouble here - anything with e.g. ->read_iter()
using dentry will get in trouble with that kind of games.  Consider something
like

foo_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
	struct file *file = iocb->ki_filp;
	struct foo_data *p = file->f_path.dentry->d_fsdata;
	...
}

which will work just fine for files on foofs, where we have ->d_fsdata set
on lookup.  Now, try to use foofs as a layer; suddenly, you get foofs
files with ->f_path.dentry being *overlayfs* dentry, with ->d_fsdata
being nothing like struct foo_data *.

Better yet, consider

foo_open(struct inode *inode, struct file *file)
{
	struct dentry *dentry = file->f_path.dentry;
	...
	foo_add_splat(dentry, splat);
	...
}
where foo_add_splat() inserts struct foo_splat into an hlist starting
in dentry->d_fsdata.  That's not a pure theory - we *do* have ->open()
instances doing things of that sort.  That'll bugger overlayfs quite
badly, not to mention that foofs methods won't be happy with overlayfs
dentries.

It might (or might not) work for the filesystems you'd been testing
on, but it's a lot of trouble waiting to happen.  Hell, try and use
ecryptfs as lower layer, see how fast it'll blow up.  Sure, it's
a dumb testcase, but I don't see how to check if something more
realistic is trouble-free.

I'd been trying to come up with some way to salvage that kludge of yours,
but I don't see any solutions.  We don't have good proxies for "this
filesystem might be unsafe as lower layer" ;-/

Frankly, it might be saner and safer to teach procfs (and similar
places) to do more than just use ->vm_file->f_path.  _That_ at least
is much more local in impact.
Al Viro June 12, 2018, 2:40 a.m. UTC | #4
On Tue, Jun 12, 2018 at 03:29:26AM +0100, Al Viro wrote:

> It might (or might not) work for the filesystems you'd been testing
> on, but it's a lot of trouble waiting to happen.  Hell, try and use
> ecryptfs as lower layer, see how fast it'll blow up.  Sure, it's
> a dumb testcase, but I don't see how to check if something more
> realistic is trouble-free.
> 
> I'd been trying to come up with some way to salvage that kludge of yours,
> but I don't see any solutions.  We don't have good proxies for "this
> filesystem might be unsafe as lower layer" ;-/

Note that anything that uses file_dentry() anywhere near ->open(),
->read_iter() or ->write_iter() is an instant trouble with your scheme.
Such as
int nfs_open(struct inode *inode, struct file *filp)
{
        struct nfs_open_context *ctx;

        ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
        if (IS_ERR(ctx)) 
                return PTR_ERR(ctx);
        nfs_file_set_open_context(filp, ctx);
        put_nfs_open_context(ctx);
        nfs_fscache_open_file(inode, filp);
        return 0;
}

You do want to support NFS for lower layers, right?
Miklos Szeredi June 12, 2018, 9:24 a.m. UTC | #5
On Tue, Jun 12, 2018 at 4:40 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Jun 12, 2018 at 03:29:26AM +0100, Al Viro wrote:
>
>> It might (or might not) work for the filesystems you'd been testing
>> on, but it's a lot of trouble waiting to happen.  Hell, try and use
>> ecryptfs as lower layer, see how fast it'll blow up.  Sure, it's
>> a dumb testcase, but I don't see how to check if something more
>> realistic is trouble-free.

That's funny, because when dhowells added the patch to make f_path
point to the overlay, I was fighting tooth and claw against that
change on the grounds of being unsafe, but it went through regardless
(and was in fact one of the biggest headaches in overlay/vfs
interaction).

So you might be right that there are bugs in the handling of ecryptfs,
etc, however the patchset is guaranteed not to cause regressions in
this area.

And yes, it would be best to get rid of that kludge once and for all.

>>
>> I'd been trying to come up with some way to salvage that kludge of yours,
>> but I don't see any solutions.  We don't have good proxies for "this
>> filesystem might be unsafe as lower layer" ;-/
>
> Note that anything that uses file_dentry() anywhere near ->open(),
> ->read_iter() or ->write_iter() is an instant trouble with your scheme.
> Such as
> int nfs_open(struct inode *inode, struct file *filp)
> {
>         struct nfs_open_context *ctx;
>
>         ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
>         if (IS_ERR(ctx))
>                 return PTR_ERR(ctx);
>         nfs_file_set_open_context(filp, ctx);
>         put_nfs_open_context(ctx);
>         nfs_fscache_open_file(inode, filp);
>         return 0;
> }
>
> You do want to support NFS for lower layers, right?

There's no change regarding how file_dentry() works.  We've just
pushed these weird files (f_path points to overlay, f_inode points to
underlay) down into the guts of overlayfs and are not directly
referenced from the file table anymore.  That shouldn't make *any*
difference from the lower fs's pov.

The only difference is that now the real file has creds inherited from
mounter task.  If lower filesystem's a_ops did some permission
checking based on that, then that might make a difference in behavior.
But I guess that difference would be in the positive direction, making
behavior more consistent.

Thanks,
Miklos
Al Viro June 12, 2018, 6:24 p.m. UTC | #6
On Tue, Jun 12, 2018 at 11:24:39AM +0200, Miklos Szeredi wrote:

> > Note that anything that uses file_dentry() anywhere near ->open(),
> > ->read_iter() or ->write_iter() is an instant trouble with your scheme.
> > Such as
> > int nfs_open(struct inode *inode, struct file *filp)
> > {
> >         struct nfs_open_context *ctx;
> >
> >         ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
> >         if (IS_ERR(ctx))
> >                 return PTR_ERR(ctx);
> >         nfs_file_set_open_context(filp, ctx);
> >         put_nfs_open_context(ctx);
> >         nfs_fscache_open_file(inode, filp);
> >         return 0;
> > }
> >
> > You do want to support NFS for lower layers, right?
> 
> There's no change regarding how file_dentry() works.  We've just
> pushed these weird files (f_path points to overlay, f_inode points to
> underlay) down into the guts of overlayfs and are not directly
> referenced from the file table anymore.  That shouldn't make *any*
> difference from the lower fs's pov.

*owwww*
I'd managed to push that particular nest of horrors out of mind ;-/
Having dug out my notes from back then and grepped around...  The real
mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the
reasons for that kludge are still valid ;-/

Fuck.  OK, so we want to get rid of ->f_path.dentry accesses and see
that they don't come back.  Leaving them around due to "it won't come
anywhere near overlayfs" was a mistake of the same kind as leaving
d_add() in ->lookup() instances where we'd been certain that filesystem
would never get exported over NFS.  Just as we'd got open-by-handle for
e.g. NFS, we'd got nothing to prevent ecryptfs as lower layer in
overlayfs...

I hate it, but... consider path_open() objections withdrawn for now.
Uses of ->vm_file (and rules for those) are too convoluted to untangle
at the moment.  I still would love to get that straightened out, but
it's not this cycle fodder, more's the pity...
Al Viro June 12, 2018, 6:31 p.m. UTC | #7
On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote:
> On Tue, Jun 12, 2018 at 11:24:39AM +0200, Miklos Szeredi wrote:
> 
> > > Note that anything that uses file_dentry() anywhere near ->open(),
> > > ->read_iter() or ->write_iter() is an instant trouble with your scheme.
> > > Such as
> > > int nfs_open(struct inode *inode, struct file *filp)
> > > {
> > >         struct nfs_open_context *ctx;
> > >
> > >         ctx = alloc_nfs_open_context(file_dentry(filp), filp->f_mode, filp);
> > >         if (IS_ERR(ctx))
> > >                 return PTR_ERR(ctx);
> > >         nfs_file_set_open_context(filp, ctx);
> > >         put_nfs_open_context(ctx);
> > >         nfs_fscache_open_file(inode, filp);
> > >         return 0;
> > > }
> > >
> > > You do want to support NFS for lower layers, right?
> > 
> > There's no change regarding how file_dentry() works.  We've just
> > pushed these weird files (f_path points to overlay, f_inode points to
> > underlay) down into the guts of overlayfs and are not directly
> > referenced from the file table anymore.  That shouldn't make *any*
> > difference from the lower fs's pov.
> 
> *owwww*
> I'd managed to push that particular nest of horrors out of mind ;-/
> Having dug out my notes from back then and grepped around...  The real
> mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the
> reasons for that kludge are still valid ;-/
> 
> Fuck.  OK, so we want to get rid of ->f_path.dentry accesses and see
> that they don't come back.  Leaving them around due to "it won't come
> anywhere near overlayfs" was a mistake of the same kind as leaving
> d_add() in ->lookup() instances where we'd been certain that filesystem
> would never get exported over NFS.  Just as we'd got open-by-handle for
> e.g. NFS, we'd got nothing to prevent ecryptfs as lower layer in
> overlayfs...
> 
> I hate it, but... consider path_open() objections withdrawn for now.
> Uses of ->vm_file (and rules for those) are too convoluted to untangle
> at the moment.  I still would love to get that straightened out, but
> it's not this cycle fodder, more's the pity...

PS: conversion of ->f_path.dentry is easy and that can probably go this
cycle - it's a fairly trivial change, with no functional changes unless
overlayfs is used with <filesystem>, fixing really bad shit if it ever
gets used thus.  I'm not asking to put that into overlayfs pull *and*
it's independent from the "want to kill that fucking kludge" stuff.
The latter is too hard for this cycle, unfortunately.
Miklos Szeredi June 13, 2018, 9:21 a.m. UTC | #8
On Tue, Jun 12, 2018 at 8:31 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote:

>> I hate it, but... consider path_open() objections withdrawn for now.

Is that an ACK for the pull if I follow up with fixes for mmap botch, etc?

>> Uses of ->vm_file (and rules for those) are too convoluted to untangle
>> at the moment.  I still would love to get that straightened out, but
>> it's not this cycle fodder, more's the pity...

Looked at some other options...  What coda mmap does looks very
dubious.  It only sets f_mapping, not vm_file.  That's going to get
into all sorts of trouble when underlying fs tries to look at
file_inode() or worse, ->private_data.  Looks like that should be
converted to what overlayfs does, to have a remote chance of actually
not crashing on most filesystems.  Does anybody actually use coda
still?

> PS: conversion of ->f_path.dentry is easy and that can probably go this
> cycle - it's a fairly trivial change, with no functional changes unless
> overlayfs is used with <filesystem>, fixing really bad shit if it ever
> gets used thus.  I'm not asking to put that into overlayfs pull *and*
> it's independent from the "want to kill that fucking kludge" stuff.
> The latter is too hard for this cycle, unfortunately.

So this is about adding a file_dentry_check() (or whatever we want to
call it) helper to be used by all filesystems when dereferecing
f_path.dentry, right?

Thanks,
Miklos
J. R. Okajima June 13, 2018, 11:56 a.m. UTC | #9
Al Viro:
> I'd managed to push that particular nest of horrors out of mind ;-/
> Having dug out my notes from back then and grepped around...  The real
> mess is not even /proc/*/maps - it's /proc/*/map_files/* and yes, the
> reasons for that kludge are still valid ;-/
	:::
> Uses of ->vm_file (and rules for those) are too convoluted to untangle
> at the moment.  I still would love to get that straightened out, but
> it's not this cycle fodder, more's the pity...

I don't fully read this thread, but the discussion is related to the
file path printed in /proc/$$/maps?  If so, as just for your
information, here is an approach that aufs took.

In linux-v2.6 era, aufs tried implementing mmap by customzing
address_space ops, but it was not good and failed completing the
implementation.
As wel as overlayfs, aufs has two struct file objects for a single
a regular file.  One is for a virtual aufs' entry, and the other is for
a real layer's entry.  When a user issues mmap(2) for the virtual file,
aufs redirects the request to the real file on the layer internally.  So
the vm_file points to the real file.  It means /proc/$$/maps prints the
unexpected file path.

Aufs added another struct file* vm_prfile in struct vma.  It points to
the virtual aufs file, and /proc/$$/maps prints vm_prfile instead of
vm_file. Of cource, maintaining vm_prfile is important since vma may be
merged or splitted.
Still I don't like this approach, but I don't have another better idea,
also it works for many years.  You can get the patch in
aufs4-standalone.git on sourceforge if you want.


J. R. Okajima
Al Viro June 15, 2018, 5:47 a.m. UTC | #10
On Wed, Jun 13, 2018 at 11:21:30AM +0200, Miklos Szeredi wrote:
> On Tue, Jun 12, 2018 at 8:31 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Tue, Jun 12, 2018 at 07:24:23PM +0100, Al Viro wrote:
> 
> >> I hate it, but... consider path_open() objections withdrawn for now.
> 
> Is that an ACK for the pull if I follow up with fixes for mmap botch, etc?

Yes.

> >> Uses of ->vm_file (and rules for those) are too convoluted to untangle
> >> at the moment.  I still would love to get that straightened out, but
> >> it's not this cycle fodder, more's the pity...
> 
> Looked at some other options...  What coda mmap does looks very
> dubious.  It only sets f_mapping, not vm_file.  That's going to get
> into all sorts of trouble when underlying fs tries to look at
> file_inode() or worse, ->private_data.  Looks like that should be
> converted to what overlayfs does, to have a remote chance of actually
> not crashing on most filesystems.  Does anybody actually use coda
> still?

Keep in mind that coda is using the local fs only as cache; IOW, its needs
are much more limited than those of overlayfs - local r/w filesystem,
disk-backed or tmpfs, used pretty much as a scratch space.

> > PS: conversion of ->f_path.dentry is easy and that can probably go this
> > cycle - it's a fairly trivial change, with no functional changes unless
> > overlayfs is used with <filesystem>, fixing really bad shit if it ever
> > gets used thus.  I'm not asking to put that into overlayfs pull *and*
> > it's independent from the "want to kill that fucking kludge" stuff.
> > The latter is too hard for this cycle, unfortunately.
> 
> So this is about adding a file_dentry_check() (or whatever we want to
> call it) helper to be used by all filesystems when dereferecing
> f_path.dentry, right?

file_dentry(), and some of the users should be converted to file_inode().
There's also a missing helper for debugfs uses - more or less a combination
of file_dentry() and debugfs_file_get() (if not a conversion of
debugfs_file_get() to taking struct file - almost all users are of that
form, if not entirely all of them).  I've some of that done in local
branch...
Miklos Szeredi June 18, 2018, 11:50 a.m. UTC | #11
On Fri, Jun 15, 2018 at 06:47:17AM +0100, Al Viro wrote:
> On Wed, Jun 13, 2018 at 11:21:30AM +0200, Miklos Szeredi wrote:

> > Looked at some other options...  What coda mmap does looks very
> > dubious.  It only sets f_mapping, not vm_file.  That's going to get
> > into all sorts of trouble when underlying fs tries to look at
> > file_inode() or worse, ->private_data.  Looks like that should be
> > converted to what overlayfs does, to have a remote chance of actually
> > not crashing on most filesystems.  Does anybody actually use coda
> > still?
> 
> Keep in mind that coda is using the local fs only as cache; IOW, its needs
> are much more limited than those of overlayfs - local r/w filesystem,
> disk-backed or tmpfs, used pretty much as a scratch space.

Look:

coda_file_mmap(struct file *coda_file, struct vm_area_struct *vma)
{
[...]
	coda_file->f_mapping = host_file->f_mapping;
[...]
	return call_mmap(host_file, vma);
}

So that'll end up with vma->vm_file pointing to coda file, coda_file->f_mapping
pointing to host mapping.  Hence vm_ops and a_ops are going to come from host
file, but they'll be getting a "foreign" file with ->private_data and ->f_inode
pointing to coda structures.

For example:

int ext4_filemap_fault(struct vm_fault *vmf)
{
	struct inode *inode = file_inode(vmf->vma->vm_file)
	int err;

	down_read(&EXT4_I(inode)->i_mmap_sem);
[...]

There you have it: coda inode being interpreted as ext4 inode.  How is that
supposed to work?  How is it not blowing up?  What am I missing?

Thanks,
Miklos
diff mbox

Patch

diff --git a/fs/overlayfs/Makefile b/fs/overlayfs/Makefile
index 30802347a020..46e1ff8ac056 100644
--- a/fs/overlayfs/Makefile
+++ b/fs/overlayfs/Makefile
@@ -4,5 +4,5 @@ 
 
 obj-$(CONFIG_OVERLAY_FS) += overlay.o
 
-overlay-objs := super.o namei.o util.o inode.o dir.o readdir.o copy_up.o \
-		export.o
+overlay-objs := super.o namei.o util.o inode.o file.o dir.o readdir.o \
+		copy_up.o export.o
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
new file mode 100644
index 000000000000..a0b606885c41
--- /dev/null
+++ b/fs/overlayfs/file.c
@@ -0,0 +1,76 @@ 
+/*
+ * Copyright (C) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/cred.h>
+#include <linux/file.h>
+#include <linux/xattr.h>
+#include "overlayfs.h"
+
+static struct file *ovl_open_realfile(const struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	struct inode *upperinode = ovl_inode_upper(inode);
+	struct inode *realinode = upperinode ?: ovl_inode_lower(inode);
+	struct file *realfile;
+	const struct cred *old_cred;
+
+	old_cred = ovl_override_creds(inode->i_sb);
+	realfile = path_open(&file->f_path, file->f_flags | O_NOATIME,
+			     realinode, current_cred(), false);
+	revert_creds(old_cred);
+
+	pr_debug("open(%p[%pD2/%c], 0%o) -> (%p, 0%o)\n",
+		 file, file, upperinode ? 'u' : 'l', file->f_flags,
+		 realfile, IS_ERR(realfile) ? 0 : realfile->f_flags);
+
+	return realfile;
+}
+
+static int ovl_open(struct inode *inode, struct file *file)
+{
+	struct dentry *dentry = file_dentry(file);
+	struct file *realfile;
+	int err;
+
+	err = ovl_open_maybe_copy_up(dentry, file->f_flags);
+	if (err)
+		return err;
+
+	/* No longer need these flags, so don't pass them on to underlying fs */
+	file->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
+
+	realfile = ovl_open_realfile(file);
+	if (IS_ERR(realfile))
+		return PTR_ERR(realfile);
+
+	file->private_data = realfile;
+
+	return 0;
+}
+
+static int ovl_release(struct inode *inode, struct file *file)
+{
+	fput(file->private_data);
+
+	return 0;
+}
+
+static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
+{
+	struct inode *realinode = ovl_inode_real(file_inode(file));
+
+	return generic_file_llseek_size(file, offset, whence,
+					realinode->i_sb->s_maxbytes,
+					i_size_read(realinode));
+}
+
+const struct file_operations ovl_file_operations = {
+	.open		= ovl_open,
+	.release	= ovl_release,
+	.llseek		= ovl_llseek,
+};
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 6682ea63c4fd..d6a13da0740f 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -535,6 +535,7 @@  static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev,
 	switch (mode & S_IFMT) {
 	case S_IFREG:
 		inode->i_op = &ovl_file_inode_operations;
+		inode->i_fop = &ovl_file_operations;
 		break;
 
 	case S_IFDIR:
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 419c80a0024e..3f6e39a2f51e 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -377,6 +377,9 @@  struct dentry *ovl_create_real(struct inode *dir, struct dentry *newdentry,
 int ovl_cleanup(struct inode *dir, struct dentry *dentry);
 struct dentry *ovl_create_temp(struct dentry *workdir, struct ovl_cattr *attr);
 
+/* file.c */
+extern const struct file_operations ovl_file_operations;
+
 /* copy_up.c */
 int ovl_copy_up(struct dentry *dentry);
 int ovl_copy_up_flags(struct dentry *dentry, int flags);