[08/11] vfs: push EXDEV check down into ->remap_file_range
diff mbox series

Message ID 20181203083416.28978-9-david@fromorbit.com
State New
Headers show
Series
  • fs: fixes for major copy_file_range() issues
Related show

Commit Message

Dave Chinner Dec. 3, 2018, 8:34 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

before we can enable cross-device copies into copy_file_range(),
we have to ensure that ->remap_file_range() implemenations will
correctly reject attempts to do cross filesystem clones. Currently
these checks are done above calls to ->remap_file_range(), but
we need to drive them inwards so that we get EXDEV protection for all
callers of ->remap_file_range().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/read_write.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

Comments

Amir Goldstein Dec. 3, 2018, 11:04 a.m. UTC | #1
On Mon, Dec 3, 2018 at 10:34 AM Dave Chinner <david@fromorbit.com> wrote:
>
> From: Dave Chinner <dchinner@redhat.com>
>
> before we can enable cross-device copies into copy_file_range(),
> we have to ensure that ->remap_file_range() implemenations will
> correctly reject attempts to do cross filesystem clones. Currently

But you only fixed remap_file_range() implemenations of xfs and ocfs2...

> these checks are done above calls to ->remap_file_range(), but
> we need to drive them inwards so that we get EXDEV protection for all
> callers of ->remap_file_range().
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/read_write.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 3288db1d5f21..174cf92eea1d 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1909,6 +1909,19 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
>         bool same_inode = (inode_in == inode_out);
>         int ret;
>
> +       /*
> +        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> +        * the same mount. Practically, they only need to be on the same file
> +        * system. We check this here rather than at the ioctl layers because
> +        * this is effectively a limitation of the fielsystem implementations,
> +        * not so much the API itself. Further, ->remap_file_range() can be
> +        * called from syscalls that don't have cross device copy restrictions
> +        * (such as copy_file_range()) and so we need to catch them before we
> +        * do any damage.
> +        */
> +       if (inode_in->i_sb != inode_out->i_sb)
> +               return -EXDEV;
> +
>         /* Don't touch certain kinds of inodes */
>         if (IS_IMMUTABLE(inode_out))
>                 return -EPERM;
> @@ -2013,14 +2026,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
>         if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
>                 return -EINVAL;
>
> -       /*
> -        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> -        * the same mount. Practically, they only need to be on the same file
> -        * system.
> -        */
> -       if (inode_in->i_sb != inode_out->i_sb)
> -               return -EXDEV;
> -

That leaves {nfs42,cifs,btrfs}_remap_file_range() exposed to passing
files not of their own fs type let alone same sb when do_clone_file_range()
is called from ovl_copy_up_data().

Thanks,
Amir.
Darrick J. Wong Dec. 3, 2018, 6:24 p.m. UTC | #2
On Mon, Dec 03, 2018 at 07:34:13PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> before we can enable cross-device copies into copy_file_range(),
> we have to ensure that ->remap_file_range() implemenations will
> correctly reject attempts to do cross filesystem clones. Currently
> these checks are done above calls to ->remap_file_range(), but
> we need to drive them inwards so that we get EXDEV protection for all
> callers of ->remap_file_range().
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/read_write.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 3288db1d5f21..174cf92eea1d 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1909,6 +1909,19 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
>  	bool same_inode = (inode_in == inode_out);
>  	int ret;
>  
> +	/*
> +	 * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> +	 * the same mount. Practically, they only need to be on the same file
> +	 * system. We check this here rather than at the ioctl layers because
> +	 * this is effectively a limitation of the fielsystem implementations,

"filesystem"...

--D

> +	 * not so much the API itself. Further, ->remap_file_range() can be
> +	 * called from syscalls that don't have cross device copy restrictions
> +	 * (such as copy_file_range()) and so we need to catch them before we
> +	 * do any damage.
> +	 */
> +	if (inode_in->i_sb != inode_out->i_sb)
> +		return -EXDEV;
> +
>  	/* Don't touch certain kinds of inodes */
>  	if (IS_IMMUTABLE(inode_out))
>  		return -EPERM;
> @@ -2013,14 +2026,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
>  	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
>  		return -EINVAL;
>  
> -	/*
> -	 * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> -	 * the same mount. Practically, they only need to be on the same file
> -	 * system.
> -	 */
> -	if (inode_in->i_sb != inode_out->i_sb)
> -		return -EXDEV;
> -
>  	if (!(file_in->f_mode & FMODE_READ) ||
>  	    !(file_out->f_mode & FMODE_WRITE) ||
>  	    (file_out->f_flags & O_APPEND))
> -- 
> 2.19.1
>
Darrick J. Wong Dec. 3, 2018, 7:11 p.m. UTC | #3
On Mon, Dec 03, 2018 at 01:04:07PM +0200, Amir Goldstein wrote:
> On Mon, Dec 3, 2018 at 10:34 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > before we can enable cross-device copies into copy_file_range(),
> > we have to ensure that ->remap_file_range() implemenations will
> > correctly reject attempts to do cross filesystem clones. Currently
> 
> But you only fixed remap_file_range() implemenations of xfs and ocfs2...
> 
> > these checks are done above calls to ->remap_file_range(), but
> > we need to drive them inwards so that we get EXDEV protection for all
> > callers of ->remap_file_range().
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/read_write.c | 21 +++++++++++++--------
> >  1 file changed, 13 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 3288db1d5f21..174cf92eea1d 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1909,6 +1909,19 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
> >         bool same_inode = (inode_in == inode_out);
> >         int ret;
> >
> > +       /*
> > +        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > +        * the same mount. Practically, they only need to be on the same file
> > +        * system. We check this here rather than at the ioctl layers because
> > +        * this is effectively a limitation of the fielsystem implementations,
> > +        * not so much the API itself. Further, ->remap_file_range() can be
> > +        * called from syscalls that don't have cross device copy restrictions
> > +        * (such as copy_file_range()) and so we need to catch them before we
> > +        * do any damage.
> > +        */
> > +       if (inode_in->i_sb != inode_out->i_sb)
> > +               return -EXDEV;
> > +
> >         /* Don't touch certain kinds of inodes */
> >         if (IS_IMMUTABLE(inode_out))
> >                 return -EPERM;
> > @@ -2013,14 +2026,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
> >         if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> >                 return -EINVAL;
> >
> > -       /*
> > -        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > -        * the same mount. Practically, they only need to be on the same file
> > -        * system.
> > -        */
> > -       if (inode_in->i_sb != inode_out->i_sb)
> > -               return -EXDEV;
> > -
> 

I think this is sort of backwards -- the checks should stay in
do_clone_file_range, and vfs_copy_file_range should be calling that
instead of directly calling ->remap_range():

vfs_copy_file_range()
{
	file_start_write(...);
	ret = do_clone_file_range(...);
	if (ret > 0)
		return ret;
	ret = do_copy_file_range(...);
	file_end_write(...);
	return ret;
}

> That leaves {nfs42,cifs,btrfs}_remap_file_range() exposed to passing
> files not of their own fs type let alone same sb when do_clone_file_range()
> is called from ovl_copy_up_data().

...and then I think this problem goes away.

--D

> Thanks,
> Amir.
Dave Chinner Dec. 3, 2018, 11:34 p.m. UTC | #4
On Mon, Dec 03, 2018 at 01:04:07PM +0200, Amir Goldstein wrote:
> On Mon, Dec 3, 2018 at 10:34 AM Dave Chinner <david@fromorbit.com> wrote:
> >
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > before we can enable cross-device copies into copy_file_range(),
> > we have to ensure that ->remap_file_range() implemenations will
> > correctly reject attempts to do cross filesystem clones. Currently
> 
> But you only fixed remap_file_range() implemenations of xfs and ocfs2...
> 
> > these checks are done above calls to ->remap_file_range(), but
> > we need to drive them inwards so that we get EXDEV protection for all
> > callers of ->remap_file_range().
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/read_write.c | 21 +++++++++++++--------
> >  1 file changed, 13 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 3288db1d5f21..174cf92eea1d 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1909,6 +1909,19 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
> >         bool same_inode = (inode_in == inode_out);
> >         int ret;
> >
> > +       /*
> > +        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > +        * the same mount. Practically, they only need to be on the same file
> > +        * system. We check this here rather than at the ioctl layers because
> > +        * this is effectively a limitation of the fielsystem implementations,
> > +        * not so much the API itself. Further, ->remap_file_range() can be
> > +        * called from syscalls that don't have cross device copy restrictions
> > +        * (such as copy_file_range()) and so we need to catch them before we
> > +        * do any damage.
> > +        */
> > +       if (inode_in->i_sb != inode_out->i_sb)
> > +               return -EXDEV;
> > +
> >         /* Don't touch certain kinds of inodes */
> >         if (IS_IMMUTABLE(inode_out))
> >                 return -EPERM;
> > @@ -2013,14 +2026,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
> >         if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> >                 return -EINVAL;
> >
> > -       /*
> > -        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > -        * the same mount. Practically, they only need to be on the same file
> > -        * system.
> > -        */
> > -       if (inode_in->i_sb != inode_out->i_sb)
> > -               return -EXDEV;
> > -
> 
> That leaves {nfs42,cifs,btrfs}_remap_file_range() exposed to passing
> files not of their own fs type let alone same sb when do_clone_file_range()
> is called from ovl_copy_up_data().

For some reason I thought everyone called
generic_remap_file_range_prep() so they behaved the same way. My
mistake.

Really, though, I'm of the opinion that those filesystems should be
changed to call the generic checks rather than open code their own
incomplete/incompatible set of checks. This is exactly what I'm
trying to avoid with copy_file_range() - checks are done in one
place, all filesystems have the same checks done - so that future
modification and maintenance is so much easier.

We need to do the same thing to the remap_file_range()
implementations.

-Dave.
Dave Chinner Dec. 3, 2018, 11:37 p.m. UTC | #5
On Mon, Dec 03, 2018 at 11:11:30AM -0800, Darrick J. Wong wrote:
> On Mon, Dec 03, 2018 at 01:04:07PM +0200, Amir Goldstein wrote:
> > On Mon, Dec 3, 2018 at 10:34 AM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > From: Dave Chinner <dchinner@redhat.com>
> > >
> > > before we can enable cross-device copies into copy_file_range(),
> > > we have to ensure that ->remap_file_range() implemenations will
> > > correctly reject attempts to do cross filesystem clones. Currently
> > 
> > But you only fixed remap_file_range() implemenations of xfs and ocfs2...
> > 
> > > these checks are done above calls to ->remap_file_range(), but
> > > we need to drive them inwards so that we get EXDEV protection for all
> > > callers of ->remap_file_range().
> > >
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > ---
> > >  fs/read_write.c | 21 +++++++++++++--------
> > >  1 file changed, 13 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > index 3288db1d5f21..174cf92eea1d 100644
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -1909,6 +1909,19 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
> > >         bool same_inode = (inode_in == inode_out);
> > >         int ret;
> > >
> > > +       /*
> > > +        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > > +        * the same mount. Practically, they only need to be on the same file
> > > +        * system. We check this here rather than at the ioctl layers because
> > > +        * this is effectively a limitation of the fielsystem implementations,
> > > +        * not so much the API itself. Further, ->remap_file_range() can be
> > > +        * called from syscalls that don't have cross device copy restrictions
> > > +        * (such as copy_file_range()) and so we need to catch them before we
> > > +        * do any damage.
> > > +        */
> > > +       if (inode_in->i_sb != inode_out->i_sb)
> > > +               return -EXDEV;
> > > +
> > >         /* Don't touch certain kinds of inodes */
> > >         if (IS_IMMUTABLE(inode_out))
> > >                 return -EPERM;
> > > @@ -2013,14 +2026,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
> > >         if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> > >                 return -EINVAL;
> > >
> > > -       /*
> > > -        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > > -        * the same mount. Practically, they only need to be on the same file
> > > -        * system.
> > > -        */
> > > -       if (inode_in->i_sb != inode_out->i_sb)
> > > -               return -EXDEV;
> > > -
> > 
> 
> I think this is sort of backwards -- the checks should stay in
> do_clone_file_range, and vfs_copy_file_range should be calling that
> instead of directly calling ->remap_range():
> 
> vfs_copy_file_range()
> {
> 	file_start_write(...);
> 	ret = do_clone_file_range(...);
> 	if (ret > 0)
> 		return ret;
> 	ret = do_copy_file_range(...);
> 	file_end_write(...);
> 	return ret;
> }

I'm already confused by the way we weave in and out of "vfs_/do_*"
functions, and this just makes it worse.

Just what the hell is supposed to be in a "vfs_" prefixed function,
and why the hell is it considered a "vfs" level function if we then
export it's internal functions for individual filesystems to use?

Cheers,

Dave.
Darrick J. Wong Dec. 3, 2018, 11:58 p.m. UTC | #6
On Tue, Dec 04, 2018 at 10:37:14AM +1100, Dave Chinner wrote:
> On Mon, Dec 03, 2018 at 11:11:30AM -0800, Darrick J. Wong wrote:
> > On Mon, Dec 03, 2018 at 01:04:07PM +0200, Amir Goldstein wrote:
> > > On Mon, Dec 3, 2018 at 10:34 AM Dave Chinner <david@fromorbit.com> wrote:
> > > >
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > >
> > > > before we can enable cross-device copies into copy_file_range(),
> > > > we have to ensure that ->remap_file_range() implemenations will
> > > > correctly reject attempts to do cross filesystem clones. Currently
> > > 
> > > But you only fixed remap_file_range() implemenations of xfs and ocfs2...
> > > 
> > > > these checks are done above calls to ->remap_file_range(), but
> > > > we need to drive them inwards so that we get EXDEV protection for all
> > > > callers of ->remap_file_range().
> > > >
> > > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > > ---
> > > >  fs/read_write.c | 21 +++++++++++++--------
> > > >  1 file changed, 13 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > index 3288db1d5f21..174cf92eea1d 100644
> > > > --- a/fs/read_write.c
> > > > +++ b/fs/read_write.c
> > > > @@ -1909,6 +1909,19 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
> > > >         bool same_inode = (inode_in == inode_out);
> > > >         int ret;
> > > >
> > > > +       /*
> > > > +        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > > > +        * the same mount. Practically, they only need to be on the same file
> > > > +        * system. We check this here rather than at the ioctl layers because
> > > > +        * this is effectively a limitation of the fielsystem implementations,
> > > > +        * not so much the API itself. Further, ->remap_file_range() can be
> > > > +        * called from syscalls that don't have cross device copy restrictions
> > > > +        * (such as copy_file_range()) and so we need to catch them before we
> > > > +        * do any damage.
> > > > +        */
> > > > +       if (inode_in->i_sb != inode_out->i_sb)
> > > > +               return -EXDEV;
> > > > +
> > > >         /* Don't touch certain kinds of inodes */
> > > >         if (IS_IMMUTABLE(inode_out))
> > > >                 return -EPERM;
> > > > @@ -2013,14 +2026,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
> > > >         if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
> > > >                 return -EINVAL;
> > > >
> > > > -       /*
> > > > -        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > > > -        * the same mount. Practically, they only need to be on the same file
> > > > -        * system.
> > > > -        */
> > > > -       if (inode_in->i_sb != inode_out->i_sb)
> > > > -               return -EXDEV;
> > > > -
> > > 
> > 
> > I think this is sort of backwards -- the checks should stay in
> > do_clone_file_range, and vfs_copy_file_range should be calling that
> > instead of directly calling ->remap_range():
> > 
> > vfs_copy_file_range()
> > {
> > 	file_start_write(...);
> > 	ret = do_clone_file_range(...);
> > 	if (ret > 0)
> > 		return ret;
> > 	ret = do_copy_file_range(...);
> > 	file_end_write(...);
> > 	return ret;
> > }
> 
> I'm already confused by the way we weave in and out of "vfs_/do_*"
> functions, and this just makes it worse.
> 
> Just what the hell is supposed to be in a "vfs_" prefixed function,
> and why the hell is it considered a "vfs" level function if we then
> export it's internal functions for individual filesystems to use?

I /think/ vfs_ functions are file_start_write()/file_end_write()
wrappers around a similarly named function that lacks the freeze
protection??

(AFAICT Amir made that split so that overlayfs could use these
functions, though I do not know if everything vfs_ was made that way
/specifically/ for overlayfs or if that's the way things have been and
ovlfs simply takes advantage of it...)

Guhhh, none of this is documented......

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Olga Kornievskaia Dec. 4, 2018, 8:18 a.m. UTC | #7
On Mon, Dec 3, 2018 at 3:34 AM Dave Chinner <david@fromorbit.com> wrote:
>
> From: Dave Chinner <dchinner@redhat.com>
>
> before we can enable cross-device copies into copy_file_range(),
> we have to ensure that ->remap_file_range() implemenations will
> correctly reject attempts to do cross filesystem clones. Currently
> these checks are done above calls to ->remap_file_range(), but
> we need to drive them inwards so that we get EXDEV protection for all
> callers of ->remap_file_range().

If there is no check before calling ->remap_file_range() then NFS
barfs. Perhaps it needs a check internally that checks that both file
handles are from the NFS but this was not needed before. Or there
needs to the a check in VFS.

>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/read_write.c | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 3288db1d5f21..174cf92eea1d 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1909,6 +1909,19 @@ int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
>         bool same_inode = (inode_in == inode_out);
>         int ret;
>
> +       /*
> +        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> +        * the same mount. Practically, they only need to be on the same file
> +        * system. We check this here rather than at the ioctl layers because
> +        * this is effectively a limitation of the fielsystem implementations,
> +        * not so much the API itself. Further, ->remap_file_range() can be
> +        * called from syscalls that don't have cross device copy restrictions
> +        * (such as copy_file_range()) and so we need to catch them before we
> +        * do any damage.
> +        */
> +       if (inode_in->i_sb != inode_out->i_sb)
> +               return -EXDEV;
> +
>         /* Don't touch certain kinds of inodes */
>         if (IS_IMMUTABLE(inode_out))
>                 return -EPERM;
> @@ -2013,14 +2026,6 @@ loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
>         if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
>                 return -EINVAL;
>
> -       /*
> -        * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> -        * the same mount. Practically, they only need to be on the same file
> -        * system.
> -        */
> -       if (inode_in->i_sb != inode_out->i_sb)
> -               return -EXDEV;
> -
>         if (!(file_in->f_mode & FMODE_READ) ||
>             !(file_out->f_mode & FMODE_WRITE) ||
>             (file_out->f_flags & O_APPEND))
> --
> 2.19.1
>
Amir Goldstein Dec. 4, 2018, 9:17 a.m. UTC | #8
> > > I think this is sort of backwards -- the checks should stay in
> > > do_clone_file_range, and vfs_copy_file_range should be calling that
> > > instead of directly calling ->remap_range():
> > >
> > > vfs_copy_file_range()
> > > {
> > >     file_start_write(...);
> > >     ret = do_clone_file_range(...);
> > >     if (ret > 0)
> > >             return ret;
> > >     ret = do_copy_file_range(...);
> > >     file_end_write(...);
> > >     return ret;
> > > }
> >
> > I'm already confused by the way we weave in and out of "vfs_/do_*"
> > functions, and this just makes it worse.
> >
> > Just what the hell is supposed to be in a "vfs_" prefixed function,
> > and why the hell is it considered a "vfs" level function if we then
> > export it's internal functions for individual filesystems to use?
>
> I /think/ vfs_ functions are file_start_write()/file_end_write()
> wrappers around a similarly named function that lacks the freeze
> protection??

That is definitely not an official definition of vfs_ vs. do_, but I found
this rule to be a common practice, which is why I swapped
{do,vfs}_clone_file_range(). But around vfs you can find many examples
where do_ helpers wrap vfs_ helpers.

>
> (AFAICT Amir made that split so that overlayfs could use these
> functions, though I do not know if everything vfs_ was made that way
> /specifically/ for overlayfs or if that's the way things have been and
> ovlfs simply takes advantage of it...)
>
> Guhhh, none of this is documented......
>

It looks like in git epoc, things were pretty straight forward.
vfs_XXX was the interface called after sys_XXX converted
userspace arguments (e.g. char *name, int fd) to vfs objects
(e.g. struct path,dentry,inode,file). Sometimes vfs_ helpers called
do_ helpers for several reasons. See for example epoc version
of fs/namei.c fs/read_write.c.
Even then there were exception. For example do_sendfile()
doesn't even have a vfs_ interface, although it is clear what
that prospect interface would look like.
To that end, do_splice_direct() acts as the standard do_ helper
to that non-existing vfs_ interface.

From there on, I guess things kinda grew organically.
fs/namei.c syscalls grew do_XXXat() helpers between syscalls
and vfs_XXX interface.

Overlayfs uses vfs_ interface 99% of the time, so from that perspective
it is regarded as an interface with vfs objects as arguments that does
NOT skip security_ checks and does NOT bypass freeze protection.

Overlayfs calling do_clone_file_range() and do_splice_direct() are
the only exception to this rule.
If we would want to replace those calls in ovl_copy_up_data() with
a single call to do_copy_file_range(), than said helper should NOT
be taking freeze protection and should do the fallback between
filesystem copy_file_range and generic_copy_file_range.

Cheers,
Amir.

Patch
diff mbox series

diff --git a/fs/read_write.c b/fs/read_write.c
index 3288db1d5f21..174cf92eea1d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1909,6 +1909,19 @@  int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 	bool same_inode = (inode_in == inode_out);
 	int ret;
 
+	/*
+	 * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
+	 * the same mount. Practically, they only need to be on the same file
+	 * system. We check this here rather than at the ioctl layers because
+	 * this is effectively a limitation of the fielsystem implementations,
+	 * not so much the API itself. Further, ->remap_file_range() can be
+	 * called from syscalls that don't have cross device copy restrictions
+	 * (such as copy_file_range()) and so we need to catch them before we
+	 * do any damage.
+	 */
+	if (inode_in->i_sb != inode_out->i_sb)
+		return -EXDEV;
+
 	/* Don't touch certain kinds of inodes */
 	if (IS_IMMUTABLE(inode_out))
 		return -EPERM;
@@ -2013,14 +2026,6 @@  loff_t do_clone_file_range(struct file *file_in, loff_t pos_in,
 	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
 		return -EINVAL;
 
-	/*
-	 * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
-	 * the same mount. Practically, they only need to be on the same file
-	 * system.
-	 */
-	if (inode_in->i_sb != inode_out->i_sb)
-		return -EXDEV;
-
 	if (!(file_in->f_mode & FMODE_READ) ||
 	    !(file_out->f_mode & FMODE_WRITE) ||
 	    (file_out->f_flags & O_APPEND))