diff mbox series

[v2,3/6] ovl: fix GPF in swapfile_activate of file from overlayfs over xfs

Message ID 1535300717-26686-4-git-send-email-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series Overlayfs stacked f_op fixes | expand

Commit Message

Amir Goldstein Aug. 26, 2018, 4:25 p.m. UTC
Since overlayfs implements stacked file operations, the underlying
filesystems are not supposed to be exposed to the overlayfs file,
whose f_inode is an overlayfs inode.

Assigning an overlayfs file to swap_file results in an attempt of xfs
code to dereference an xfs_inode struct from an ovl_inode pointer:

 CPU: 0 PID: 2462 Comm: swapon Not tainted
 4.18.0-xfstests-12721-g33e17876ea4e #3402
 RIP: 0010:xfs_find_bdev_for_inode+0x23/0x2f
 Call Trace:
  xfs_iomap_swapfile_activate+0x1f/0x43
  __se_sys_swapon+0xb1a/0xee9

Fix this by not assigning the real inode mapping to f_mapping, which
will cause swapon() to return an error (-EINVAL). Although it makes
sense not to allow setting swpafile on an overlayfs file, some users
may depend on it, so we may need to fix this up in the future.

Keeping f_mapping pointing to overlay inode mapping will cause O_DIRECT
open to fail. Fix this by installing ovl_aops with noop_direct_IO in
overlay inode mapping.

Keeping f_mapping pointing to overlay inode mapping will cause other
a_ops related operations to fail (e.g. readahead()). Those will be
fixed by follow up patches.

Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: f7c72396d0de ("ovl: add O_DIRECT support")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/overlayfs/file.c  | 3 ---
 fs/overlayfs/inode.c | 6 ++++++
 2 files changed, 6 insertions(+), 3 deletions(-)

Comments

Dave Chinner Aug. 27, 2018, 3:43 a.m. UTC | #1
On Sun, Aug 26, 2018 at 07:25:14PM +0300, Amir Goldstein wrote:
> Since overlayfs implements stacked file operations, the underlying
> filesystems are not supposed to be exposed to the overlayfs file,
> whose f_inode is an overlayfs inode.
> 
> Assigning an overlayfs file to swap_file results in an attempt of xfs
> code to dereference an xfs_inode struct from an ovl_inode pointer:
> 
>  CPU: 0 PID: 2462 Comm: swapon Not tainted
>  4.18.0-xfstests-12721-g33e17876ea4e #3402
>  RIP: 0010:xfs_find_bdev_for_inode+0x23/0x2f
>  Call Trace:
>   xfs_iomap_swapfile_activate+0x1f/0x43
>   __se_sys_swapon+0xb1a/0xee9
> 
> Fix this by not assigning the real inode mapping to f_mapping, which
> will cause swapon() to return an error (-EINVAL). Although it makes
> sense not to allow setting swpafile on an overlayfs file, some users
> may depend on it, so we may need to fix this up in the future.
> 
> Keeping f_mapping pointing to overlay inode mapping will cause O_DIRECT
> open to fail. Fix this by installing ovl_aops with noop_direct_IO in
> overlay inode mapping.

Ummm - shouldn't ovl be checking the real inode's .direct_IO method
and returning status based on that? i.e. if the underlying fs
doesn't support O_DIRECT, neither should ovl...

> +const struct address_space_operations ovl_aops = {
> +	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
> +	.direct_IO		= noop_direct_IO,
> +};
> +
>  /*
>   * It is possible to stack overlayfs instance on top of another
>   * overlayfs instance as lower layer. We need to annonate the
> @@ -575,6 +580,7 @@ static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev,
>  	case S_IFREG:
>  		inode->i_op = &ovl_file_inode_operations;
>  		inode->i_fop = &ovl_file_operations;
> +		inode->i_mapping->a_ops = &ovl_aops;
>  		break;

So you put an ovl interposer in the way here - it needs to pass
through *everything* to the the real inode's aops, right?

Cheers,

Dave.
Amir Goldstein Aug. 27, 2018, 6:34 a.m. UTC | #2
On Mon, Aug 27, 2018 at 6:43 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sun, Aug 26, 2018 at 07:25:14PM +0300, Amir Goldstein wrote:
> > Since overlayfs implements stacked file operations, the underlying
> > filesystems are not supposed to be exposed to the overlayfs file,
> > whose f_inode is an overlayfs inode.
> >
> > Assigning an overlayfs file to swap_file results in an attempt of xfs
> > code to dereference an xfs_inode struct from an ovl_inode pointer:
> >
> >  CPU: 0 PID: 2462 Comm: swapon Not tainted
> >  4.18.0-xfstests-12721-g33e17876ea4e #3402
> >  RIP: 0010:xfs_find_bdev_for_inode+0x23/0x2f
> >  Call Trace:
> >   xfs_iomap_swapfile_activate+0x1f/0x43
> >   __se_sys_swapon+0xb1a/0xee9
> >
> > Fix this by not assigning the real inode mapping to f_mapping, which
> > will cause swapon() to return an error (-EINVAL). Although it makes
> > sense not to allow setting swpafile on an overlayfs file, some users
> > may depend on it, so we may need to fix this up in the future.
> >
> > Keeping f_mapping pointing to overlay inode mapping will cause O_DIRECT
> > open to fail. Fix this by installing ovl_aops with noop_direct_IO in
> > overlay inode mapping.
>
> Ummm - shouldn't ovl be checking the real inode's .direct_IO method
> and returning status based on that? i.e. if the underlying fs
> doesn't support O_DIRECT, neither should ovl...
>

ovl_open_realfile() will take care of that later when overlay actually
tried to open the underlying file.

> > +const struct address_space_operations ovl_aops = {
> > +     /* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
> > +     .direct_IO              = noop_direct_IO,
> > +};
> > +
> >  /*
> >   * It is possible to stack overlayfs instance on top of another
> >   * overlayfs instance as lower layer. We need to annonate the
> > @@ -575,6 +580,7 @@ static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev,
> >       case S_IFREG:
> >               inode->i_op = &ovl_file_inode_operations;
> >               inode->i_fop = &ovl_file_operations;
> > +             inode->i_mapping->a_ops = &ovl_aops;
> >               break;
>
> So you put an ovl interposer in the way here - it needs to pass
> through *everything* to the the real inode's aops, right?
>

No. it's just a decoy a_ops, so if we miss another spot like swapon()
user will get an error (i.e. for no a_ops->readpages) rather then calling
into underlying filesystem aops with an overlay file and oopsing.

So the trick of this patch is to change the game from whack-an-oops
to whack-an-einval, which is better for everyone.

Cheers,
Amir.
Miklos Szeredi Aug. 27, 2018, 9:49 a.m. UTC | #3
On Mon, Aug 27, 2018 at 8:34 AM, Amir Goldstein <amir73il@gmail.com> wrote:

> So the trick of this patch is to change the game from whack-an-oops
> to whack-an-einval, which is better for everyone.

In other words: this overlayfs update might cause regressions in dark
and musty corners of overlayfs usage, but those will not be Oops
inducing regressions.  And whenever someone reports a regression, that
will be solved by making that corner of the kernel less musty and dark
(i.e. properly stackable), so in the end this is a win-win game.

There's one corner case remaining, which is shared mmap consistency
across copy-up.  I think the solution to this will be:

 - private mappings of lower files will directly go to the lower
file's mapping (just like now)
 - anything else (shared mmap of lower files and all mmap of upper
files) will go to the overlay mapping (just like a normal fs)

The flip side is cache duplication between shared mmap and private
mmap of the same lower file.  I hope this would be pretty rare.

I don't see big complexities with the above, it should be solvable
purely in overlayfs.

Then there's getting rid of d_real() for good, for which Al has some
ideas, and we are done.  So yes, I think the end of the tunnel is
pretty well in sight.

Thanks,
Miklos
diff mbox series

Patch

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 24c9e0d70c3b..fd64daf6daa5 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -131,9 +131,6 @@  static int ovl_open(struct inode *inode, struct file *file)
 	if (IS_ERR(realfile))
 		return PTR_ERR(realfile);
 
-	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
-	file->f_mapping = realfile->f_mapping;
-
 	file->private_data = realfile;
 	file->f_mode |= FMODE_STACKED;
 
diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index 5014749fd4b4..b6ac545b5a32 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -504,6 +504,11 @@  static const struct inode_operations ovl_special_inode_operations = {
 	.update_time	= ovl_update_time,
 };
 
+const struct address_space_operations ovl_aops = {
+	/* For O_DIRECT dentry_open() checks f_mapping->a_ops->direct_IO */
+	.direct_IO		= noop_direct_IO,
+};
+
 /*
  * It is possible to stack overlayfs instance on top of another
  * overlayfs instance as lower layer. We need to annonate the
@@ -575,6 +580,7 @@  static void ovl_fill_inode(struct inode *inode, umode_t mode, dev_t rdev,
 	case S_IFREG:
 		inode->i_op = &ovl_file_inode_operations;
 		inode->i_fop = &ovl_file_operations;
+		inode->i_mapping->a_ops = &ovl_aops;
 		break;
 
 	case S_IFDIR: