diff mbox series

[v1,02/11] VFS permit cross device vfs_copy_file_range

Message ID 20181019153018.32507-2-olga.kornievskaia@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v1,01/11] fs: Don't copy beyond the end of the file | expand

Commit Message

Olga Kornievskaia Oct. 19, 2018, 3:30 p.m. UTC
From: Olga Kornievskaia <kolga@netapp.com>

Allow copy_file_range to copy between different superblocks but only
of the same file system types. This feature was of interest to CIFS
as well as NFS.

This feature is needed by NFSv4.2 to perform file copy operation on
the same server or file copy between different NFSv4.2 servers.

If a file system's fileoperations copy_file_range operation prohibits
cross-device copies, fall back to do_splice_direct. This would be
needed for the NFS (destination) server side implementation of the
file copy and currently for CIFS.

Besides NFS, there is only 1 implementor of the copy_file_range FS
operation -- CIFS. CIFS assumes incoming file descriptors are both
CIFS but it will check if they are coming from different servers and
return error code to fall back to do_splice_direct.

NFS will allow for copies between different NFS servers.

Adding to the vfs.txt documentation to explicitly warn about allowing
for different superblocks of the same file type to be passed into the
copy_file_range for the future users of copy_file_range method.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 Documentation/filesystems/vfs.txt |  4 +++-
 fs/read_write.c                   | 13 ++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)

Comments

Amir Goldstein Oct. 19, 2018, 3:54 p.m. UTC | #1
On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> From: Olga Kornievskaia <kolga@netapp.com>
>
> Allow copy_file_range to copy between different superblocks but only
> of the same file system types. This feature was of interest to CIFS
> as well as NFS.
>
> This feature is needed by NFSv4.2 to perform file copy operation on
> the same server or file copy between different NFSv4.2 servers.
>
> If a file system's fileoperations copy_file_range operation prohibits
> cross-device copies, fall back to do_splice_direct. This would be
> needed for the NFS (destination) server side implementation of the
> file copy and currently for CIFS.
>
> Besides NFS, there is only 1 implementor of the copy_file_range FS
> operation -- CIFS. CIFS assumes incoming file descriptors are both
> CIFS but it will check if they are coming from different servers and
> return error code to fall back to do_splice_direct.
>
> NFS will allow for copies between different NFS servers.
>
> Adding to the vfs.txt documentation to explicitly warn about allowing
> for different superblocks of the same file type to be passed into the
> copy_file_range for the future users of copy_file_range method.
>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---
>  Documentation/filesystems/vfs.txt |  4 +++-
>  fs/read_write.c                   | 13 ++++++-------
>  2 files changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index a6c6a8a..5e520de 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -958,7 +958,9 @@ otherwise noted.
>
>    fallocate: called by the VFS to preallocate blocks or punch a hole.
>
> -  copy_file_range: called by the copy_file_range(2) system call.
> +  copy_file_range: called by copy_file_range(2) system call. This method
> +                  works on two file descriptors that might reside on
> +                  different superblocks of the same type of file system.
>
>    clone_file_range: called by the ioctl(2) system call for FICLONERANGE and
>         FICLONE commands.
> diff --git a/fs/read_write.c b/fs/read_write.c
> index c60790f..474e740 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1578,10 +1578,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>             (file_out->f_flags & O_APPEND))
>                 return -EBADF;
>
> -       /* this could be relaxed once a method supports cross-fs copies */
> -       if (inode_in->i_sb != inode_out->i_sb)
> -               return -EXDEV;
> -

You need to hoist this limitation to  clone_file_range() syscall, because
you are not allowed to change user facing behavior.
Maybe you can later add a uapi flag for copy_file_range() to explicitly allow
for cross-sb copy?
Maybe you can add the flag now for internal use  - only nfsv4 will pass that
flag to the vfs helper and system call will verify that flags == 0.

Thanks,
Amir.
Amir Goldstein Oct. 19, 2018, 4:14 p.m. UTC | #2
On Fri, Oct 19, 2018 at 6:54 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > Allow copy_file_range to copy between different superblocks but only
> > of the same file system types. This feature was of interest to CIFS
> > as well as NFS.
> >
> > This feature is needed by NFSv4.2 to perform file copy operation on
> > the same server or file copy between different NFSv4.2 servers.
> >
> > If a file system's fileoperations copy_file_range operation prohibits
> > cross-device copies, fall back to do_splice_direct. This would be
> > needed for the NFS (destination) server side implementation of the
> > file copy and currently for CIFS.
> >
> > Besides NFS, there is only 1 implementor of the copy_file_range FS
> > operation -- CIFS. CIFS assumes incoming file descriptors are both
> > CIFS but it will check if they are coming from different servers and
> > return error code to fall back to do_splice_direct.
> >
> > NFS will allow for copies between different NFS servers.
> >
> > Adding to the vfs.txt documentation to explicitly warn about allowing
> > for different superblocks of the same file type to be passed into the
> > copy_file_range for the future users of copy_file_range method.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  Documentation/filesystems/vfs.txt |  4 +++-
> >  fs/read_write.c                   | 13 ++++++-------
> >  2 files changed, 9 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> > index a6c6a8a..5e520de 100644
> > --- a/Documentation/filesystems/vfs.txt
> > +++ b/Documentation/filesystems/vfs.txt
> > @@ -958,7 +958,9 @@ otherwise noted.
> >
> >    fallocate: called by the VFS to preallocate blocks or punch a hole.
> >
> > -  copy_file_range: called by the copy_file_range(2) system call.
> > +  copy_file_range: called by copy_file_range(2) system call. This method
> > +                  works on two file descriptors that might reside on
> > +                  different superblocks of the same type of file system.
> >
> >    clone_file_range: called by the ioctl(2) system call for FICLONERANGE and
> >         FICLONE commands.
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index c60790f..474e740 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1578,10 +1578,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >             (file_out->f_flags & O_APPEND))
> >                 return -EBADF;
> >
> > -       /* this could be relaxed once a method supports cross-fs copies */
> > -       if (inode_in->i_sb != inode_out->i_sb)
> > -               return -EXDEV;
> > -
>
> You need to hoist this limitation to  clone_file_range() syscall, because
> you are not allowed to change user facing behavior.
> Maybe you can later add a uapi flag for copy_file_range() to explicitly allow
> for cross-sb copy?
> Maybe you can add the flag now for internal use  - only nfsv4 will pass that
> flag to the vfs helper and system call will verify that flags == 0.
>

Sorry, you meant to allow cross sb copy from client, so you do want
to change user facing behavior.
For that you need to declare a flag to opt in for the new behavior
and document it.
Please keep linux-api CC-ed when posting the flag.
Referencing a draft for man page is recommended.

Thanks,
Amir.
Olga Kornievskaia Oct. 19, 2018, 4:24 p.m. UTC | #3
On Fri, Oct 19, 2018 at 11:55 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > From: Olga Kornievskaia <kolga@netapp.com>
> >
> > Allow copy_file_range to copy between different superblocks but only
> > of the same file system types. This feature was of interest to CIFS
> > as well as NFS.
> >
> > This feature is needed by NFSv4.2 to perform file copy operation on
> > the same server or file copy between different NFSv4.2 servers.
> >
> > If a file system's fileoperations copy_file_range operation prohibits
> > cross-device copies, fall back to do_splice_direct. This would be
> > needed for the NFS (destination) server side implementation of the
> > file copy and currently for CIFS.
> >
> > Besides NFS, there is only 1 implementor of the copy_file_range FS
> > operation -- CIFS. CIFS assumes incoming file descriptors are both
> > CIFS but it will check if they are coming from different servers and
> > return error code to fall back to do_splice_direct.
> >
> > NFS will allow for copies between different NFS servers.
> >
> > Adding to the vfs.txt documentation to explicitly warn about allowing
> > for different superblocks of the same file type to be passed into the
> > copy_file_range for the future users of copy_file_range method.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
> >  Documentation/filesystems/vfs.txt |  4 +++-
> >  fs/read_write.c                   | 13 ++++++-------
> >  2 files changed, 9 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> > index a6c6a8a..5e520de 100644
> > --- a/Documentation/filesystems/vfs.txt
> > +++ b/Documentation/filesystems/vfs.txt
> > @@ -958,7 +958,9 @@ otherwise noted.
> >
> >    fallocate: called by the VFS to preallocate blocks or punch a hole.
> >
> > -  copy_file_range: called by the copy_file_range(2) system call.
> > +  copy_file_range: called by copy_file_range(2) system call. This method
> > +                  works on two file descriptors that might reside on
> > +                  different superblocks of the same type of file system.
> >
> >    clone_file_range: called by the ioctl(2) system call for FICLONERANGE and
> >         FICLONE commands.
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index c60790f..474e740 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1578,10 +1578,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >             (file_out->f_flags & O_APPEND))
> >                 return -EBADF;
> >
> > -       /* this could be relaxed once a method supports cross-fs copies */
> > -       if (inode_in->i_sb != inode_out->i_sb)
> > -               return -EXDEV;
> > -
>
> You need to hoist this limitation to  clone_file_range() syscall, because
> you are not allowed to change user facing behavior.
> Maybe you can later add a uapi flag for copy_file_range() to explicitly allow
> for cross-sb copy?

Sure I can do that.

> Maybe you can add the flag now for internal use  - only nfsv4 will pass that
> flag to the vfs helper and system call will verify that flags == 0.

Not only NFS wants it CIFS want it too.

>
> Thanks,
> Amir.
Olga Kornievskaia Oct. 19, 2018, 5:04 p.m. UTC | #4
On Fri, Oct 19, 2018 at 12:24 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Fri, Oct 19, 2018 at 11:55 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > >
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > >
> > > Allow copy_file_range to copy between different superblocks but only
> > > of the same file system types. This feature was of interest to CIFS
> > > as well as NFS.
> > >
> > > This feature is needed by NFSv4.2 to perform file copy operation on
> > > the same server or file copy between different NFSv4.2 servers.
> > >
> > > If a file system's fileoperations copy_file_range operation prohibits
> > > cross-device copies, fall back to do_splice_direct. This would be
> > > needed for the NFS (destination) server side implementation of the
> > > file copy and currently for CIFS.
> > >
> > > Besides NFS, there is only 1 implementor of the copy_file_range FS
> > > operation -- CIFS. CIFS assumes incoming file descriptors are both
> > > CIFS but it will check if they are coming from different servers and
> > > return error code to fall back to do_splice_direct.
> > >
> > > NFS will allow for copies between different NFS servers.
> > >
> > > Adding to the vfs.txt documentation to explicitly warn about allowing
> > > for different superblocks of the same file type to be passed into the
> > > copy_file_range for the future users of copy_file_range method.
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> > >  Documentation/filesystems/vfs.txt |  4 +++-
> > >  fs/read_write.c                   | 13 ++++++-------
> > >  2 files changed, 9 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> > > index a6c6a8a..5e520de 100644
> > > --- a/Documentation/filesystems/vfs.txt
> > > +++ b/Documentation/filesystems/vfs.txt
> > > @@ -958,7 +958,9 @@ otherwise noted.
> > >
> > >    fallocate: called by the VFS to preallocate blocks or punch a hole.
> > >
> > > -  copy_file_range: called by the copy_file_range(2) system call.
> > > +  copy_file_range: called by copy_file_range(2) system call. This method
> > > +                  works on two file descriptors that might reside on
> > > +                  different superblocks of the same type of file system.
> > >
> > >    clone_file_range: called by the ioctl(2) system call for FICLONERANGE and
> > >         FICLONE commands.
> > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > index c60790f..474e740 100644
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -1578,10 +1578,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > >             (file_out->f_flags & O_APPEND))
> > >                 return -EBADF;
> > >
> > > -       /* this could be relaxed once a method supports cross-fs copies */
> > > -       if (inode_in->i_sb != inode_out->i_sb)
> > > -               return -EXDEV;
> > > -
> >
> > You need to hoist this limitation to  clone_file_range() syscall, because
> > you are not allowed to change user facing behavior.
> > Maybe you can later add a uapi flag for copy_file_range() to explicitly allow
> > for cross-sb copy?
>
> Sure I can do that.
>
> > Maybe you can add the flag now for internal use  - only nfsv4 will pass that
> > flag to the vfs helper and system call will verify that flags == 0.
>
> Not only NFS wants it CIFS want it too.

Can you please elaborate on what do you mean "internal use only"?
nfsv4 doesn't call copy_file_range(), a user land application (such as
glibc cp, or whatever does a system call for the copy_file_range()) is
the one calling and doing a copy between two different server (ie
superblocks). Thus the system call can't keep the check "flags == 0".

I can (maybe?) add user level flags that copy_file_range() system call
will pass, I will have to remove the "flags == 0" check and then I can
add a check if such a flag is passed, then skip the check for the
cross device check in copy_file_range().

Is this what's desired?
>
> >
> > Thanks,
> > Amir.
Matthew Wilcox Oct. 19, 2018, 5:44 p.m. UTC | #5
On Fri, Oct 19, 2018 at 07:14:23PM +0300, Amir Goldstein wrote:
> > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > > Allow copy_file_range to copy between different superblocks but only
> > > of the same file system types. This feature was of interest to CIFS
> > > as well as NFS.
> 
> Sorry, you meant to allow cross sb copy from client, so you do want
> to change user facing behavior.
> For that you need to declare a flag to opt in for the new behavior
> and document it.
> Please keep linux-api CC-ed when posting the flag.
> Referencing a draft for man page is recommended.

Huh?  We're allowed to make things work that used to fail.  There's no
need to add a flag that the user must set.
Amir Goldstein Oct. 19, 2018, 5:58 p.m. UTC | #6
On Fri, Oct 19, 2018 at 8:44 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Oct 19, 2018 at 07:14:23PM +0300, Amir Goldstein wrote:
> > > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia
> > > <olga.kornievskaia@gmail.com> wrote:
> > > > Allow copy_file_range to copy between different superblocks but only
> > > > of the same file system types. This feature was of interest to CIFS
> > > > as well as NFS.
> >
> > Sorry, you meant to allow cross sb copy from client, so you do want
> > to change user facing behavior.
> > For that you need to declare a flag to opt in for the new behavior
> > and document it.
> > Please keep linux-api CC-ed when posting the flag.
> > Referencing a draft for man page is recommended.
>
> Huh?  We're allowed to make things work that used to fail.  There's no
> need to add a flag that the user must set.

OK, you are right. Maybe this case doesn't call for an opt-in flag.
I'll let other people weight in.
I should note that the behavior of copy_file_range() has already
changed silently to try to clone_file_range() first and that went
quietly...

Thanks,
Amir.
Matthew Wilcox Oct. 19, 2018, 5:58 p.m. UTC | #7
On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote:
> +++ b/Documentation/filesystems/vfs.txt
> @@ -958,7 +958,9 @@ otherwise noted.
>  
>    fallocate: called by the VFS to preallocate blocks or punch a hole.
>  
> -  copy_file_range: called by the copy_file_range(2) system call.
> +  copy_file_range: called by copy_file_range(2) system call. This method
> +		   works on two file descriptors that might reside on
> +		   different superblocks of the same type of file system.

I don't think this text is explicit enough about what has changed, and I
think this is the wrong place for it.  I think there should be a paragraph
in Documentation/filesystems/porting and it should follow the current style
in there.

> @@ -1591,7 +1587,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	 * Try cloning first, this is supported by more file systems, and
>  	 * more efficient if both clone and copy are supported (e.g. NFS).
>  	 */
> -	if (file_in->f_op->clone_file_range) {
> +	if (inode_in->i_sb == inode_out->i_sb &&
> +			file_in->f_op->clone_file_range) {

This reads weirdly to me.  I know it's the same order the tests were done
in before, but it would feel more natural to me to test:

	if (file_in->f_op->clone_file_range &&
			inode_in->i_sb == inode_out->i_sb)

Am I just suffering from "I would have done this differently"itis, or
is it unnatural?

> @@ -1600,10 +1597,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  		}
>  	}
>  
> -	if (file_out->f_op->copy_file_range) {
> +	if (file_out->f_op->copy_file_range &&
> +			(file_in->f_op->copy_file_range ==
> +				file_out->f_op->copy_file_range)) {

Can we avoid this extra test here?  I know the various stamdards groups
including T10 and the IETF have been trying to define a universal
identifier for the same blob of storage, no matter how it's accessed;
potentially allowing access to the same storage across iSCSI, CIFS
and NFS.  If we ever get to a point where we support that (and I am
dubious), we'd want to remove this test again, and have to revalidate
all the filesystems.
Olga Kornievskaia Oct. 19, 2018, 6:47 p.m. UTC | #8
On Fri, Oct 19, 2018 at 1:58 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote:
> > +++ b/Documentation/filesystems/vfs.txt
> > @@ -958,7 +958,9 @@ otherwise noted.
> >
> >    fallocate: called by the VFS to preallocate blocks or punch a hole.
> >
> > -  copy_file_range: called by the copy_file_range(2) system call.
> > +  copy_file_range: called by copy_file_range(2) system call. This method
> > +                works on two file descriptors that might reside on
> > +                different superblocks of the same type of file system.
>
> I don't think this text is explicit enough about what has changed,

Can you suggest a different wording that would be stronger? I can't
say any more clear that copy is allowed between different superblock
which wasn't the case before.

> and I
> think this is the wrong place for it.  I think there should be a paragraph
> in Documentation/filesystems/porting and it should follow the current style
> in there.

I have looked at the "porting" file and it's cryptic. I don't
understand what [mandatory] [recommended] stanzas are. There is
currently no copy_file_range. Is this just a "changelog" and you are
looking for something like
[mandatory]
copy_file_range() now allows copying between different superblocks.

I can add this wording to the "porting" file.

> > @@ -1591,7 +1587,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >        * Try cloning first, this is supported by more file systems, and
> >        * more efficient if both clone and copy are supported (e.g. NFS).
> >        */
> > -     if (file_in->f_op->clone_file_range) {
> > +     if (inode_in->i_sb == inode_out->i_sb &&
> > +                     file_in->f_op->clone_file_range) {
>
> This reads weirdly to me.  I know it's the same order the tests were done
> in before, but it would feel more natural to me to test:
>
>         if (file_in->f_op->clone_file_range &&
>                         inode_in->i_sb == inode_out->i_sb)
>
> Am I just suffering from "I would have done this differently"itis, or
> is it unnatural?

I really don't care one way or another as long as the functionality
goes in. I can change it.

> > @@ -1600,10 +1597,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >               }
> >       }
> >
> > -     if (file_out->f_op->copy_file_range) {
> > +     if (file_out->f_op->copy_file_range &&
> > +                     (file_in->f_op->copy_file_range ==
> > +                             file_out->f_op->copy_file_range)) {
>
> Can we avoid this extra test here?  I know the various stamdards groups
> including T10 and the IETF have been trying to define a universal
> identifier for the same blob of storage, no matter how it's accessed;
> potentially allowing access to the same storage across iSCSI, CIFS
> and NFS.  If we ever get to a point where we support that (and I am
> dubious), we'd want to remove this test again, and have to revalidate
> all the filesystems.

Well from previous submissions I was explicitly asked to add this check.
Matthew Wilcox Oct. 19, 2018, 7:06 p.m. UTC | #9
On Fri, Oct 19, 2018 at 02:47:42PM -0400, Olga Kornievskaia wrote:
> On Fri, Oct 19, 2018 at 1:58 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote:
> > > +++ b/Documentation/filesystems/vfs.txt
> > > @@ -958,7 +958,9 @@ otherwise noted.
> > >
> > >    fallocate: called by the VFS to preallocate blocks or punch a hole.
> > >
> > > -  copy_file_range: called by the copy_file_range(2) system call.
> > > +  copy_file_range: called by copy_file_range(2) system call. This method
> > > +                works on two file descriptors that might reside on
> > > +                different superblocks of the same type of file system.
> >
> > I don't think this text is explicit enough about what has changed,
> 
> Can you suggest a different wording that would be stronger? I can't
> say any more clear that copy is allowed between different superblock
> which wasn't the case before.

I would leave this file alone.

> > and I
> > think this is the wrong place for it.  I think there should be a paragraph
> > in Documentation/filesystems/porting and it should follow the current style
> > in there.
> 
> I have looked at the "porting" file and it's cryptic. I don't
> understand what [mandatory] [recommended] stanzas are. There is
> currently no copy_file_range. Is this just a "changelog" and you are
> looking for something like
> [mandatory]
> copy_file_range() now allows copying between different superblocks.
> 
> I can add this wording to the "porting" file.

The porting file is written from the point of view of someone who's trying
to port an old filesystem to current Linux.

Maybe something like

[mandatory]
	->copy_file_range() may now be passed files which belong to two
	different filesystems.  The destination's copy_file_range() is the
	function which is called.  If it cannot copy ranges from the source,
	it should return -EXDEV.

> > > -     if (file_out->f_op->copy_file_range) {
> > > +     if (file_out->f_op->copy_file_range &&
> > > +                     (file_in->f_op->copy_file_range ==
> > > +                             file_out->f_op->copy_file_range)) {
> >
> > Can we avoid this extra test here?  I know the various stamdards groups
> > including T10 and the IETF have been trying to define a universal
> > identifier for the same blob of storage, no matter how it's accessed;
> > potentially allowing access to the same storage across iSCSI, CIFS
> > and NFS.  If we ever get to a point where we support that (and I am
> > dubious), we'd want to remove this test again, and have to revalidate
> > all the filesystems.
> 
> Well from previous submissions I was explicitly asked to add this check.

I'm not sure that's exactly what Jeff was asking for.  Or maybe it was
and my argument above will change his mind.  Jeff, what do you think?

If we do do this, cifs will need a modification as part of this patch to
reject non-CIFS files as it currently assumes that src_file->private_data
is a pointer to a struct cifsFileInfo.
Steve French Oct. 20, 2018, 1:37 a.m. UTC | #10
On Fri, Oct 19, 2018 at 11:24 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Fri, Oct 19, 2018 at 11:55 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > >
> > > From: Olga Kornievskaia <kolga@netapp.com>
> > >
> > > Allow copy_file_range to copy between different superblocks but only
> > > of the same file system types. This feature was of interest to CIFS
> > > as well as NFS.
> > >
> > > This feature is needed by NFSv4.2 to perform file copy operation on
> > > the same server or file copy between different NFSv4.2 servers.
> > >
> > > If a file system's fileoperations copy_file_range operation prohibits
> > > cross-device copies, fall back to do_splice_direct. This would be
> > > needed for the NFS (destination) server side implementation of the
> > > file copy and currently for CIFS.
> > >
> > > Besides NFS, there is only 1 implementor of the copy_file_range FS
> > > operation -- CIFS. CIFS assumes incoming file descriptors are both
> > > CIFS but it will check if they are coming from different servers and
> > > return error code to fall back to do_splice_direct.
> > >
> > > NFS will allow for copies between different NFS servers.
> > >
<snip>
> > Maybe you can add the flag now for internal use  - only nfsv4 will pass that
> > flag to the vfs helper and system call will verify that flags == 0.
>
> Not only NFS wants it CIFS want it too.

Allowing copy offload across distinct smb3 mounts would be very
helpful for cifs.ko

Yes - note that many servers that cifs.ko (SMB3) has to deal with have
already broadly implemented copy offload (multiple flavors, not just the
'copy chunk' style) and 100s of millions of these systems
(not just Windows and Samba) support them as long as the
source and target of the copy are to the same server.  It is the client side
and lack of tools and to some extent cross-mount copies that is holding it
back from being more widely used.  One reason I have only implemented
two of the copy offload mechanisms for SMB3 ('duplicate extents'
and 'copy chunk') and not the third, the popular 'odx' (T10 style)
copy offload is in
part that the value of the 'odx' style is more when the source and target
are different servers.  Note that this is broadly implemented on servers
even on other Unix (see e.g. Nexenta's description of server impmlementation
of 'odx' style https://www.slideshare.net/gordonross/smb3-offload-data-transfer-odx)
so the client API enhancements would be quickly useful for cifs.ko (SMB3 mounts)
use case.


--
Thanks,

Steve
Al Viro Oct. 20, 2018, 4:05 a.m. UTC | #11
On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote:
> Allow copy_file_range to copy between different superblocks but only
> of the same file system types. This feature was of interest to CIFS
> as well as NFS.

> +	if (file_out->f_op->copy_file_range &&
> +			(file_in->f_op->copy_file_range ==
> +				file_out->f_op->copy_file_range)) {

That is seriously asking for trouble.  If at some point we add
a library function usable as ->copy_file_range() instance, this
will turn into a hard-to-spot problem.

Comparing methods like that is best avoided.  If you want to compare
fs types, do just that - it's not hard.

Another potential issue here is the interplay with local filesystems
using vfs_clone_file_prep_inodes() (or Darrick's series lifting that
into generic code).  There we assume the same block size on both
sides; that is automatically true if they live on the same superblock,
but with your changes it's no longer true.
Amir Goldstein Oct. 20, 2018, 8:54 a.m. UTC | #12
On Sat, Oct 20, 2018 at 7:06 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote:
> > Allow copy_file_range to copy between different superblocks but only
> > of the same file system types. This feature was of interest to CIFS
> > as well as NFS.
>
> > +     if (file_out->f_op->copy_file_range &&
> > +                     (file_in->f_op->copy_file_range ==
> > +                             file_out->f_op->copy_file_range)) {
>
> That is seriously asking for trouble.  If at some point we add
> a library function usable as ->copy_file_range() instance, this
> will turn into a hard-to-spot problem.
>
> Comparing methods like that is best avoided.  If you want to compare
> fs types, do just that - it's not hard.

Another thing is the commit message claims to:
"Allow copy_file_range to copy between different superblocks but only
of the same file system types"

But what the patch actually does is:
"Allow copy_file_range() syscall to copy between different filesystems
AND allow calling the filesystems' copy_file_range() method
between different superblocks but only of the same file system types"

It's probably OK and quite useful to do the former, but maybe man page
should be fixed to explicitly mention that the copy is expected to work
across filesystems since kernel version XXX (?)

If you don't wish to change cross filesystem type behavior and only
relax cross super block limitation, then you should replace the
same inode->i_sb check above with same inode->i_sb->s_type
check instead of doing the check only for calling the filesystem
copy_file_range() method.

>
> Another potential issue here is the interplay with local filesystems
> using vfs_clone_file_prep_inodes() (or Darrick's series lifting that
> into generic code).  There we assume the same block size on both
> sides; that is automatically true if they live on the same superblock,
> but with your changes it's no longer true.

Heh? I don't see that Darrick's work has anything to do with
copy_file_range(). It only touched vfs_copy_file_range() for
the ->clone_file_range() call, which Olga's patch protects with same sb
check.

Thanks,
Amir.
Jeff Layton Oct. 21, 2018, 1:01 p.m. UTC | #13
On Fri, 2018-10-19 at 12:06 -0700, Matthew Wilcox wrote:
> On Fri, Oct 19, 2018 at 02:47:42PM -0400, Olga Kornievskaia wrote:
> > On Fri, Oct 19, 2018 at 1:58 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > 
> > > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote:
> > > > +++ b/Documentation/filesystems/vfs.txt
> > > > @@ -958,7 +958,9 @@ otherwise noted.
> > > > 
> > > >    fallocate: called by the VFS to preallocate blocks or punch a hole.
> > > > 
> > > > -  copy_file_range: called by the copy_file_range(2) system call.
> > > > +  copy_file_range: called by copy_file_range(2) system call. This method
> > > > +                works on two file descriptors that might reside on
> > > > +                different superblocks of the same type of file system.
> > > 
> > > I don't think this text is explicit enough about what has changed,
> > 
> > Can you suggest a different wording that would be stronger? I can't
> > say any more clear that copy is allowed between different superblock
> > which wasn't the case before.
> 
> I would leave this file alone.
> 
> > > and I
> > > think this is the wrong place for it.  I think there should be a paragraph
> > > in Documentation/filesystems/porting and it should follow the current style
> > > in there.
> > 
> > I have looked at the "porting" file and it's cryptic. I don't
> > understand what [mandatory] [recommended] stanzas are. There is
> > currently no copy_file_range. Is this just a "changelog" and you are
> > looking for something like
> > [mandatory]
> > copy_file_range() now allows copying between different superblocks.
> > 
> > I can add this wording to the "porting" file.
> 
> The porting file is written from the point of view of someone who's trying
> to port an old filesystem to current Linux.
> 
> Maybe something like
> 
> [mandatory]
> 	->copy_file_range() may now be passed files which belong to two
> 	different filesystems.  The destination's copy_file_range() is the
> 	function which is called.  If it cannot copy ranges from the source,
> 	it should return -EXDEV.
> 
> > > > -     if (file_out->f_op->copy_file_range) {
> > > > +     if (file_out->f_op->copy_file_range &&
> > > > +                     (file_in->f_op->copy_file_range ==
> > > > +                             file_out->f_op->copy_file_range)) {
> > > 
> > > Can we avoid this extra test here?  I know the various stamdards groups
> > > including T10 and the IETF have been trying to define a universal
> > > identifier for the same blob of storage, no matter how it's accessed;
> > > potentially allowing access to the same storage across iSCSI, CIFS
> > > and NFS.  If we ever get to a point where we support that (and I am
> > > dubious), we'd want to remove this test again, and have to revalidate
> > > all the filesystems.
> > 
> > Well from previous submissions I was explicitly asked to add this check.
> 
> I'm not sure that's exactly what Jeff was asking for.  Or maybe it was
> and my argument above will change his mind.  Jeff, what do you think?
> 
> If we do do this, cifs will need a modification as part of this patch to
> reject non-CIFS files as it currently assumes that src_file->private_data
> is a pointer to a struct cifsFileInfo.

My suggestion to Olga was more of a "if you feel you have to have a
check like this at the vfs layer...", but I think you and Al have
convinced me that comparing operations like this is not a good idea.

I still think that this check belongs down inside the fs copy_file_range
operation itself. That allows the the freedom to implement xdev copies
on a per-fs basis later without needing to alter vfs-level code.
Jeff Layton Oct. 21, 2018, 2:10 p.m. UTC | #14
On Fri, 2018-10-19 at 10:58 -0700, Matthew Wilcox wrote:
> On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote:
> > +++ b/Documentation/filesystems/vfs.txt
> > @@ -958,7 +958,9 @@ otherwise noted.
> >  
> >    fallocate: called by the VFS to preallocate blocks or punch a hole.
> >  
> > -  copy_file_range: called by the copy_file_range(2) system call.
> > +  copy_file_range: called by copy_file_range(2) system call. This method
> > +		   works on two file descriptors that might reside on
> > +		   different superblocks of the same type of file system.
> 
> I don't think this text is explicit enough about what has changed, and I
> think this is the wrong place for it.  I think there should be a paragraph
> in Documentation/filesystems/porting and it should follow the current style
> in there.
> 
> > @@ -1591,7 +1587,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >  	 * Try cloning first, this is supported by more file systems, and
> >  	 * more efficient if both clone and copy are supported (e.g. NFS).
> >  	 */
> > -	if (file_in->f_op->clone_file_range) {
> > +	if (inode_in->i_sb == inode_out->i_sb &&
> > +			file_in->f_op->clone_file_range) {
> 
> This reads weirdly to me.  I know it's the same order the tests were done
> in before, but it would feel more natural to me to test:
> 
> 	if (file_in->f_op->clone_file_range &&
> 			inode_in->i_sb == inode_out->i_sb)
> 
> Am I just suffering from "I would have done this differently"itis, or
> is it unnatural?
> 
> > @@ -1600,10 +1597,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >  		}
> >  	}
> >  
> > -	if (file_out->f_op->copy_file_range) {
> > +	if (file_out->f_op->copy_file_range &&
> > +			(file_in->f_op->copy_file_range ==
> > +				file_out->f_op->copy_file_range)) {
> 
> Can we avoid this extra test here?  I know the various stamdards groups
> including T10 and the IETF have been trying to define a universal
> identifier for the same blob of storage, no matter how it's accessed;
> potentially allowing access to the same storage across iSCSI, CIFS
> and NFS.  If we ever get to a point where we support that (and I am
> dubious), we'd want to remove this test again, and have to revalidate
> all the filesystems.
> 

Agreed. I think this really ought to be left up to the lower fs. Pass
the op both struct file pointers and let it sort it out.

We just need to document the expectation that file copy_file_range ops
vet file_in carefully as it could be from anything, and have them return
an appropriate error code if it's not something they can deal with.
Olga Kornievskaia Oct. 22, 2018, 6:39 p.m. UTC | #15
On Fri, Oct 19, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Oct 19, 2018 at 02:47:42PM -0400, Olga Kornievskaia wrote:
> > On Fri, Oct 19, 2018 at 1:58 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote:
> > > > +++ b/Documentation/filesystems/vfs.txt
> > > > @@ -958,7 +958,9 @@ otherwise noted.
> > > >
> > > >    fallocate: called by the VFS to preallocate blocks or punch a hole.
> > > >
> > > > -  copy_file_range: called by the copy_file_range(2) system call.
> > > > +  copy_file_range: called by copy_file_range(2) system call. This method
> > > > +                works on two file descriptors that might reside on
> > > > +                different superblocks of the same type of file system.
> > >
> > > I don't think this text is explicit enough about what has changed,
> >
> > Can you suggest a different wording that would be stronger? I can't
> > say any more clear that copy is allowed between different superblock
> > which wasn't the case before.
>
> I would leave this file alone.
>
> > > and I
> > > think this is the wrong place for it.  I think there should be a paragraph
> > > in Documentation/filesystems/porting and it should follow the current style
> > > in there.
> >
> > I have looked at the "porting" file and it's cryptic. I don't
> > understand what [mandatory] [recommended] stanzas are. There is
> > currently no copy_file_range. Is this just a "changelog" and you are
> > looking for something like
> > [mandatory]
> > copy_file_range() now allows copying between different superblocks.
> >
> > I can add this wording to the "porting" file.
>
> The porting file is written from the point of view of someone who's trying
> to port an old filesystem to current Linux.
>
> Maybe something like
>
> [mandatory]
>         ->copy_file_range() may now be passed files which belong to two
>         different filesystems.  The destination's copy_file_range() is the
>         function which is called.  If it cannot copy ranges from the source,
>         it should return -EXDEV.

Thank you I can add this to the porting file.

>
> > > > -     if (file_out->f_op->copy_file_range) {
> > > > +     if (file_out->f_op->copy_file_range &&
> > > > +                     (file_in->f_op->copy_file_range ==
> > > > +                             file_out->f_op->copy_file_range)) {
> > >
> > > Can we avoid this extra test here?  I know the various stamdards groups
> > > including T10 and the IETF have been trying to define a universal
> > > identifier for the same blob of storage, no matter how it's accessed;
> > > potentially allowing access to the same storage across iSCSI, CIFS
> > > and NFS.  If we ever get to a point where we support that (and I am
> > > dubious), we'd want to remove this test again, and have to revalidate
> > > all the filesystems.
> >
> > Well from previous submissions I was explicitly asked to add this check.
>
> I'm not sure that's exactly what Jeff was asking for.  Or maybe it was
> and my argument above will change his mind.  Jeff, what do you think?
>
> If we do do this, cifs will need a modification as part of this patch to
> reject non-CIFS files as it currently assumes that src_file->private_data
> is a pointer to a struct cifsFileInfo.
Olga Kornievskaia Oct. 22, 2018, 6:45 p.m. UTC | #16
On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sat, Oct 20, 2018 at 7:06 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote:
> > > Allow copy_file_range to copy between different superblocks but only
> > > of the same file system types. This feature was of interest to CIFS
> > > as well as NFS.
> >
> > > +     if (file_out->f_op->copy_file_range &&
> > > +                     (file_in->f_op->copy_file_range ==
> > > +                             file_out->f_op->copy_file_range)) {
> >
> > That is seriously asking for trouble.  If at some point we add
> > a library function usable as ->copy_file_range() instance, this
> > will turn into a hard-to-spot problem.
> >
> > Comparing methods like that is best avoided.  If you want to compare
> > fs types, do just that - it's not hard.
>
> Another thing is the commit message claims to:
> "Allow copy_file_range to copy between different superblocks but only
> of the same file system types"
>
> But what the patch actually does is:
> "Allow copy_file_range() syscall to copy between different filesystems
> AND allow calling the filesystems' copy_file_range() method
> between different superblocks but only of the same file system types"
>
> It's probably OK and quite useful to do the former, but maybe man page
> should be fixed to explicitly mention that the copy is expected to work
> across filesystems since kernel version XXX (?)
>
> If you don't wish to change cross filesystem type behavior and only
> relax cross super block limitation, then you should replace the
> same inode->i_sb check above with same inode->i_sb->s_type
> check instead of doing the check only for calling the filesystem
> copy_file_range() method.

Thank you for the feedback. In the next version, I will remove the
check for the functions and instead check for the same file system
types.

>
> >
> > Another potential issue here is the interplay with local filesystems
> > using vfs_clone_file_prep_inodes() (or Darrick's series lifting that
> > into generic code).  There we assume the same block size on both
> > sides; that is automatically true if they live on the same superblock,
> > but with your changes it's no longer true.
>
> Heh? I don't see that Darrick's work has anything to do with
> copy_file_range(). It only touched vfs_copy_file_range() for
> the ->clone_file_range() call, which Olga's patch protects with same sb
> check.
>
> Thanks,
> Amir.
Matthew Wilcox Oct. 22, 2018, 7:06 p.m. UTC | #17
On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > Another thing is the commit message claims to:
> > "Allow copy_file_range to copy between different superblocks but only
> > of the same file system types"
> >
> > But what the patch actually does is:
> > "Allow copy_file_range() syscall to copy between different filesystems
> > AND allow calling the filesystems' copy_file_range() method
> > between different superblocks but only of the same file system types"
> >
> > It's probably OK and quite useful to do the former, but maybe man page
> > should be fixed to explicitly mention that the copy is expected to work
> > across filesystems since kernel version XXX (?)
> >
> > If you don't wish to change cross filesystem type behavior and only
> > relax cross super block limitation, then you should replace the
> > same inode->i_sb check above with same inode->i_sb->s_type
> > check instead of doing the check only for calling the filesystem
> > copy_file_range() method.
> 
> Thank you for the feedback. In the next version, I will remove the
> check for the functions and instead check for the same file system
> types.

Jeff and I agree that this is the wrong way to go.  Instead, the
cross-device check should be in the individual instances, not the top
level code.
Olga Kornievskaia Oct. 22, 2018, 7:34 p.m. UTC | #18
On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > Another thing is the commit message claims to:
> > > "Allow copy_file_range to copy between different superblocks but only
> > > of the same file system types"
> > >
> > > But what the patch actually does is:
> > > "Allow copy_file_range() syscall to copy between different filesystems
> > > AND allow calling the filesystems' copy_file_range() method
> > > between different superblocks but only of the same file system types"
> > >
> > > It's probably OK and quite useful to do the former, but maybe man page
> > > should be fixed to explicitly mention that the copy is expected to work
> > > across filesystems since kernel version XXX (?)
> > >
> > > If you don't wish to change cross filesystem type behavior and only
> > > relax cross super block limitation, then you should replace the
> > > same inode->i_sb check above with same inode->i_sb->s_type
> > > check instead of doing the check only for calling the filesystem
> > > copy_file_range() method.
> >
> > Thank you for the feedback. In the next version, I will remove the
> > check for the functions and instead check for the same file system
> > types.
>
> Jeff and I agree that this is the wrong way to go.  Instead, the
> cross-device check should be in the individual instances, not the top
> level code.

So remove the check all together for the VFS (that was my original
patch to begin with (like #1 not this one). So am I missing the point
again, I keep getting different corrections every time.
Amir Goldstein Oct. 22, 2018, 7:48 p.m. UTC | #19
On Mon, Oct 22, 2018 at 10:35 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > Another thing is the commit message claims to:
> > > > "Allow copy_file_range to copy between different superblocks but only
> > > > of the same file system types"
> > > >
> > > > But what the patch actually does is:
> > > > "Allow copy_file_range() syscall to copy between different filesystems
> > > > AND allow calling the filesystems' copy_file_range() method
> > > > between different superblocks but only of the same file system types"
> > > >
> > > > It's probably OK and quite useful to do the former, but maybe man page
> > > > should be fixed to explicitly mention that the copy is expected to work
> > > > across filesystems since kernel version XXX (?)
> > > >
> > > > If you don't wish to change cross filesystem type behavior and only
> > > > relax cross super block limitation, then you should replace the
> > > > same inode->i_sb check above with same inode->i_sb->s_type
> > > > check instead of doing the check only for calling the filesystem
> > > > copy_file_range() method.
> > >
> > > Thank you for the feedback. In the next version, I will remove the
> > > check for the functions and instead check for the same file system
> > > types.
> >
> > Jeff and I agree that this is the wrong way to go.  Instead, the
> > cross-device check should be in the individual instances, not the top
> > level code.
>
> So remove the check all together for the VFS (that was my original
> patch to begin with (like #1 not this one). So am I missing the point
> again, I keep getting different corrections every time.

Because there are different opinions... although you did get the opinion
of the VFS maintainer, which was: compare i_sb->s_type.

Jeff, Matthew, really, what's the use of "allowing" cross fs type copy inside
filesystem code? and which method is going to be called?
file_out->f_op->copy_file_range()?
file_in->f_op->copy_file_range()?
Do we need to check if both are implemented? either?
This is just confusing Olga and gives no real value to anyone.
If we ever have a filesystem copy_file_range() method that can deal
with cross fs type copy, we can change it then when we know the
required semantics of that future call.

That is not to say that we cannot relax same fs type from copy_file_range()
syscall. That has already been done with the current patch, just not officially
declared in commit message.

Thanks,
Amir.
Matthew Wilcox Oct. 22, 2018, 8:29 p.m. UTC | #20
On Mon, Oct 22, 2018 at 10:48:10PM +0300, Amir Goldstein wrote:
> On Mon, Oct 22, 2018 at 10:35 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> > So remove the check all together for the VFS (that was my original
> > patch to begin with (like #1 not this one). So am I missing the point
> > again, I keep getting different corrections every time.
> 
> Because there are different opinions... although you did get the opinion
> of the VFS maintainer, which was: compare i_sb->s_type.
> 
> Jeff, Matthew, really, what's the use of "allowing" cross fs type copy inside
> filesystem code? and which method is going to be called?
> file_out->f_op->copy_file_range()?
> file_in->f_op->copy_file_range()?

The destination's method, as Olga originally had.

> Do we need to check if both are implemented? either?
> This is just confusing Olga and gives no real value to anyone.
> If we ever have a filesystem copy_file_range() method that can deal
> with cross fs type copy, we can change it then when we know the
> required semantics of that future call.

Wrong.  Go back and read my reasoning earlier this thread.

> That is not to say that we cannot relax same fs type from copy_file_range()
> syscall. That has already been done with the current patch, just not officially
> declared in commit message.
> 
> Thanks,
> Amir.
Jeff Layton Oct. 22, 2018, 11:39 p.m. UTC | #21
On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote:
> On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
> > 
> > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > Another thing is the commit message claims to:
> > > > "Allow copy_file_range to copy between different superblocks but only
> > > > of the same file system types"
> > > > 
> > > > But what the patch actually does is:
> > > > "Allow copy_file_range() syscall to copy between different filesystems
> > > > AND allow calling the filesystems' copy_file_range() method
> > > > between different superblocks but only of the same file system types"
> > > > 
> > > > It's probably OK and quite useful to do the former, but maybe man page
> > > > should be fixed to explicitly mention that the copy is expected to work
> > > > across filesystems since kernel version XXX (?)
> > > > 
> > > > If you don't wish to change cross filesystem type behavior and only
> > > > relax cross super block limitation, then you should replace the
> > > > same inode->i_sb check above with same inode->i_sb->s_type
> > > > check instead of doing the check only for calling the filesystem
> > > > copy_file_range() method.
> > > 
> > > Thank you for the feedback. In the next version, I will remove the
> > > check for the functions and instead check for the same file system
> > > types.
> > 
> > Jeff and I agree that this is the wrong way to go.  Instead, the
> > cross-device check should be in the individual instances, not the top
> > level code.
> 
> So remove the check all together for the VFS (that was my original
> patch to begin with (like #1 not this one). So am I missing the point
> again, I keep getting different corrections every time.

Sorry if I wasn't clear before:

Basically, I think Willy and I are both envisioning that some
copy_file_range implementations may eventually not be subject to the
limitations of the checks you're adding.

All of the current and currently proposed ones are however, so we need
these checks in place. We have two options. We can either do them
globally at the vfs layer or in the individual filesystem "drivers".

I argue that it's better to do it at the driver level, because if we
ever want to implement one that is not subject to these limitations,
you'll effectively have to push these checks down into the drivers
later.

That's where things become ugly for backports and such. You'll basically
be changing a subtle expectation in the driver interface, which could be
a source of bugs later. If we set the expectation now that the drivers
need to do this checking then we can (hopefully) ensure that they all do
before they're ever merged.
Amir Goldstein Oct. 23, 2018, 6:05 a.m. UTC | #22
On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote:
> > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > Another thing is the commit message claims to:
> > > > > "Allow copy_file_range to copy between different superblocks but only
> > > > > of the same file system types"
> > > > >
> > > > > But what the patch actually does is:
> > > > > "Allow copy_file_range() syscall to copy between different filesystems
> > > > > AND allow calling the filesystems' copy_file_range() method
> > > > > between different superblocks but only of the same file system types"
> > > > >
> > > > > It's probably OK and quite useful to do the former, but maybe man page
> > > > > should be fixed to explicitly mention that the copy is expected to work
> > > > > across filesystems since kernel version XXX (?)
> > > > >
> > > > > If you don't wish to change cross filesystem type behavior and only
> > > > > relax cross super block limitation, then you should replace the
> > > > > same inode->i_sb check above with same inode->i_sb->s_type
> > > > > check instead of doing the check only for calling the filesystem
> > > > > copy_file_range() method.
> > > >
> > > > Thank you for the feedback. In the next version, I will remove the
> > > > check for the functions and instead check for the same file system
> > > > types.
> > >
> > > Jeff and I agree that this is the wrong way to go.  Instead, the
> > > cross-device check should be in the individual instances, not the top
> > > level code.
> >
> > So remove the check all together for the VFS (that was my original
> > patch to begin with (like #1 not this one). So am I missing the point
> > again, I keep getting different corrections every time.
>
> Sorry if I wasn't clear before:
>
> Basically, I think Willy and I are both envisioning that some
> copy_file_range implementations may eventually not be subject to the
> limitations of the checks you're adding.
>

Yes. Eventually. And even Matthew is (quote) "dubious" about that ever
happening. Changing the interface as Matthew proposed has a price
and we need to compare this price to the alleged backporting price
that nobody may ever need to pay.

As far as I can tell, passing a struct file * on a file_operations method
that does not belong to that filesystem in unprecedented (*) and is a far
more lethal landmine than the alleged backporting landmine.

(*) prior to v4.19-rc1, filesystems could get an overlayfs file, but
     file_inode(file) has always belonged to the filesystem.

Olga,

I do not strongly object to Matthew's proposal, so don't feel
obligated to choose my side of the argument. I am just trying
to offer a different perspective.

In any case, my outstanding concerns with the patch are:

1. If you change syscall to support cross fs type copy (which is
    good IMO) need to document that in commit message
    and possibly follow up later with a note in man page

2. Commit message says:
    "This feature was of interest of ... NFS"
    "This feature is needed by NFSv4.2..."
    "NFS will allow for copies between different NFS servers."
    It is not clear to me if we are talking about present of future
    NFSv4.2 code. If NFSv4.2 currently does not support cross
    sb copy (??) than your patch need to enforce same sb
    in nfs4_copy_file_range(). If it does support cross sb copy
    than please edit the commit message to make that clear.

Thanks,
Amir.
Olga Kornievskaia Oct. 23, 2018, 3:03 p.m. UTC | #23
On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote:
> >
> > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote:
> > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > > Another thing is the commit message claims to:
> > > > > > "Allow copy_file_range to copy between different superblocks but only
> > > > > > of the same file system types"
> > > > > >
> > > > > > But what the patch actually does is:
> > > > > > "Allow copy_file_range() syscall to copy between different filesystems
> > > > > > AND allow calling the filesystems' copy_file_range() method
> > > > > > between different superblocks but only of the same file system types"
> > > > > >
> > > > > > It's probably OK and quite useful to do the former, but maybe man page
> > > > > > should be fixed to explicitly mention that the copy is expected to work
> > > > > > across filesystems since kernel version XXX (?)
> > > > > >
> > > > > > If you don't wish to change cross filesystem type behavior and only
> > > > > > relax cross super block limitation, then you should replace the
> > > > > > same inode->i_sb check above with same inode->i_sb->s_type
> > > > > > check instead of doing the check only for calling the filesystem
> > > > > > copy_file_range() method.
> > > > >
> > > > > Thank you for the feedback. In the next version, I will remove the
> > > > > check for the functions and instead check for the same file system
> > > > > types.
> > > >
> > > > Jeff and I agree that this is the wrong way to go.  Instead, the
> > > > cross-device check should be in the individual instances, not the top
> > > > level code.
> > >
> > > So remove the check all together for the VFS (that was my original
> > > patch to begin with (like #1 not this one). So am I missing the point
> > > again, I keep getting different corrections every time.
> >
> > Sorry if I wasn't clear before:
> >
> > Basically, I think Willy and I are both envisioning that some
> > copy_file_range implementations may eventually not be subject to the
> > limitations of the checks you're adding.
> >
>
> Yes. Eventually. And even Matthew is (quote) "dubious" about that ever
> happening. Changing the interface as Matthew proposed has a price
> and we need to compare this price to the alleged backporting price
> that nobody may ever need to pay.
>
> As far as I can tell, passing a struct file * on a file_operations method
> that does not belong to that filesystem in unprecedented (*) and is a far
> more lethal landmine than the alleged backporting landmine.
>
> (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but
>      file_inode(file) has always belonged to the filesystem.
>
> Olga,
>
> I do not strongly object to Matthew's proposal, so don't feel
> obligated to choose my side of the argument. I am just trying
> to offer a different perspective.
>
> In any case, my outstanding concerns with the patch are:
>
> 1. If you change syscall to support cross fs type copy (which is
>     good IMO) need to document that in commit message
>     and possibly follow up later with a note in man page
>
> 2. Commit message says:
>     "This feature was of interest of ... NFS"
>     "This feature is needed by NFSv4.2..."
>     "NFS will allow for copies between different NFS servers."
>     It is not clear to me if we are talking about present of future
>     NFSv4.2 code. If NFSv4.2 currently does not support cross
>     sb copy (??) than your patch need to enforce same sb
>     in nfs4_copy_file_range(). If it does support cross sb copy
>     than please edit the commit message to make that clear.

I personally agree with Amir. I think it's far fetched that a file
system would know how to handle something that's not of its type. When
the copy_file_range() was checked in, there was a comment above the
superblock check saying "we might want to relax this in the future".
It deemed appropriate then to enforce the check since none of the file
systems used it. Now, the future is here, and we are removing the
check but proposing a different once because again the future isn't
here and having a single check simplifies the code.

But I don't feel strongly about the check (or rather the location of
it VFS vs each FS) and what I ultimately need is to removed same sb
check. It sounds like if Amir isn't objecting, then the check for same
file system type should be removed from VFS. And, for each of the file
systems that currently support copy_file_range() -- CIFS, NFS, and
overlayfs -- I need to modify them and add a check for the same
fs_type.

Amir to answer your question, only NFSv4.2 has copy_offload
functionality (not earlier NFS versions). Furthermore, existing
upstream only supports same sb copy offload. What this patch series
adds is support for copy offload across different superblocks but NFS
will not support (and would need a check) for copy offload across
different file system types. Also I kinda stand behind the ideas
stated: 1) this is of interest to NFS (where NFS here is to represent
a community, and CIFS is used to represent the other community). 2)
NFSv4.2 copy offload a specific feature that needs this functionality.
3rd statement is bad. Only NFSv4.2 will allow copies between different
NFS servers (ie., after this patch +series), the emphasis was on "will
allow" meaning what this patch will allow to be done (ie, patch's
purpose). Or also, if the NFS server exports different exports, then a
copy between them (assuming exports of the same file system type).

In the next version of the patch, I'll do my best to specified what
changed as the consequence of removing the cross sb check (ie, file
system types of the passed in file can be from different file
systems). I will add wording to the man page and add the suggested
wording to the "porting" file.
Olga Kornievskaia Oct. 23, 2018, 3:30 p.m. UTC | #24
On Tue, Oct 23, 2018 at 11:03 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote:
> > >
> > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote:
> > > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> > > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > > > Another thing is the commit message claims to:
> > > > > > > "Allow copy_file_range to copy between different superblocks but only
> > > > > > > of the same file system types"
> > > > > > >
> > > > > > > But what the patch actually does is:
> > > > > > > "Allow copy_file_range() syscall to copy between different filesystems
> > > > > > > AND allow calling the filesystems' copy_file_range() method
> > > > > > > between different superblocks but only of the same file system types"
> > > > > > >
> > > > > > > It's probably OK and quite useful to do the former, but maybe man page
> > > > > > > should be fixed to explicitly mention that the copy is expected to work
> > > > > > > across filesystems since kernel version XXX (?)
> > > > > > >
> > > > > > > If you don't wish to change cross filesystem type behavior and only
> > > > > > > relax cross super block limitation, then you should replace the
> > > > > > > same inode->i_sb check above with same inode->i_sb->s_type
> > > > > > > check instead of doing the check only for calling the filesystem
> > > > > > > copy_file_range() method.
> > > > > >
> > > > > > Thank you for the feedback. In the next version, I will remove the
> > > > > > check for the functions and instead check for the same file system
> > > > > > types.
> > > > >
> > > > > Jeff and I agree that this is the wrong way to go.  Instead, the
> > > > > cross-device check should be in the individual instances, not the top
> > > > > level code.
> > > >
> > > > So remove the check all together for the VFS (that was my original
> > > > patch to begin with (like #1 not this one). So am I missing the point
> > > > again, I keep getting different corrections every time.
> > >
> > > Sorry if I wasn't clear before:
> > >
> > > Basically, I think Willy and I are both envisioning that some
> > > copy_file_range implementations may eventually not be subject to the
> > > limitations of the checks you're adding.
> > >
> >
> > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever
> > happening. Changing the interface as Matthew proposed has a price
> > and we need to compare this price to the alleged backporting price
> > that nobody may ever need to pay.
> >
> > As far as I can tell, passing a struct file * on a file_operations method
> > that does not belong to that filesystem in unprecedented (*) and is a far
> > more lethal landmine than the alleged backporting landmine.
> >
> > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but
> >      file_inode(file) has always belonged to the filesystem.
> >
> > Olga,
> >
> > I do not strongly object to Matthew's proposal, so don't feel
> > obligated to choose my side of the argument. I am just trying
> > to offer a different perspective.
> >
> > In any case, my outstanding concerns with the patch are:
> >
> > 1. If you change syscall to support cross fs type copy (which is
> >     good IMO) need to document that in commit message
> >     and possibly follow up later with a note in man page
> >
> > 2. Commit message says:
> >     "This feature was of interest of ... NFS"
> >     "This feature is needed by NFSv4.2..."
> >     "NFS will allow for copies between different NFS servers."
> >     It is not clear to me if we are talking about present of future
> >     NFSv4.2 code. If NFSv4.2 currently does not support cross
> >     sb copy (??) than your patch need to enforce same sb
> >     in nfs4_copy_file_range(). If it does support cross sb copy
> >     than please edit the commit message to make that clear.
>
> I personally agree with Amir. I think it's far fetched that a file
> system would know how to handle something that's not of its type. When
> the copy_file_range() was checked in, there was a comment above the
> superblock check saying "we might want to relax this in the future".
> It deemed appropriate then to enforce the check since none of the file
> systems used it. Now, the future is here, and we are removing the
> check but proposing a different once because again the future isn't
> here and having a single check simplifies the code.

Sorry Ok I wrote this too fast. I think I'm changing my mind and
siding with the check by the file system.

> But I don't feel strongly about the check (or rather the location of
> it VFS vs each FS) and what I ultimately need is to removed same sb
> check. It sounds like if Amir isn't objecting, then the check for same
> file system type should be removed from VFS. And, for each of the file
> systems that currently support copy_file_range() -- CIFS, NFS, and
> overlayfs -- I need to modify them and add a check for the same
> fs_type.
>
> Amir to answer your question, only NFSv4.2 has copy_offload
> functionality (not earlier NFS versions). Furthermore, existing
> upstream only supports same sb copy offload. What this patch series
> adds is support for copy offload across different superblocks but NFS
> will not support (and would need a check) for copy offload across
> different file system types. Also I kinda stand behind the ideas
> stated: 1) this is of interest to NFS (where NFS here is to represent
> a community, and CIFS is used to represent the other community). 2)
> NFSv4.2 copy offload a specific feature that needs this functionality.
> 3rd statement is bad. Only NFSv4.2 will allow copies between different
> NFS servers (ie., after this patch +series), the emphasis was on "will
> allow" meaning what this patch will allow to be done (ie, patch's
> purpose). Or also, if the NFS server exports different exports, then a
> copy between them (assuming exports of the same file system type).
>
> In the next version of the patch, I'll do my best to specified what
> changed as the consequence of removing the cross sb check (ie, file
> system types of the passed in file can be from different file
> systems). I will add wording to the man page and add the suggested
> wording to the "porting" file.
Matthew Wilcox Oct. 23, 2018, 3:39 p.m. UTC | #25
On Tue, Oct 23, 2018 at 11:03:02AM -0400, Olga Kornievskaia wrote:
> On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > Sorry if I wasn't clear before:
> > >
> > > Basically, I think Willy and I are both envisioning that some
> > > copy_file_range implementations may eventually not be subject to the
> > > limitations of the checks you're adding.
> > >
> >
> > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever
> > happening. Changing the interface as Matthew proposed has a price
> > and we need to compare this price to the alleged backporting price
> > that nobody may ever need to pay.
> >
> > As far as I can tell, passing a struct file * on a file_operations method
> > that does not belong to that filesystem in unprecedented (*) and is a far
> > more lethal landmine than the alleged backporting landmine.
> >
> > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but
> >      file_inode(file) has always belonged to the filesystem.
> >
> > Olga,
> >
> > I do not strongly object to Matthew's proposal, so don't feel
> > obligated to choose my side of the argument. I am just trying
> > to offer a different perspective.
> >
> > In any case, my outstanding concerns with the patch are:
> >
> > 1. If you change syscall to support cross fs type copy (which is
> >     good IMO) need to document that in commit message
> >     and possibly follow up later with a note in man page
> >
> > 2. Commit message says:
> >     "This feature was of interest of ... NFS"
> >     "This feature is needed by NFSv4.2..."
> >     "NFS will allow for copies between different NFS servers."
> >     It is not clear to me if we are talking about present of future
> >     NFSv4.2 code. If NFSv4.2 currently does not support cross
> >     sb copy (??) than your patch need to enforce same sb
> >     in nfs4_copy_file_range(). If it does support cross sb copy
> >     than please edit the commit message to make that clear.
> 
> I personally agree with Amir. I think it's far fetched that a file
> system would know how to handle something that's not of its type. When
> the copy_file_range() was checked in, there was a comment above the
> superblock check saying "we might want to relax this in the future".
> It deemed appropriate then to enforce the check since none of the file
> systems used it. Now, the future is here, and we are removing the
> check but proposing a different once because again the future isn't
> here and having a single check simplifies the code.

I've done some more research and found that while NFSv4.2 has its own
representation of a copyable range of a file, iSCSI and SMB3 share the
same ROD.  So it's not at all implausible that some other filesystem
might also decide to use the same ROD, perhaps even NFS v4.3.

It's clearly crazy to expect filesystem A to know about all the
interpretations of filesystem B's file->private.  I'd expect us to add
a function like:

int vfs_get_rod(struct file *src, struct file_rod *rod);

and then a filesystem's copy_file_range() would check to see if both
src and dest referred to the same server, and if not call vfs_get_rod()
to be able to send the ROD to the destination.
Olga Kornievskaia Oct. 23, 2018, 5:16 p.m. UTC | #26
On Tue, Oct 23, 2018 at 11:30 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Tue, Oct 23, 2018 at 11:03 AM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > >
> > > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote:
> > > > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > >
> > > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> > > > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > > > > Another thing is the commit message claims to:
> > > > > > > > "Allow copy_file_range to copy between different superblocks but only
> > > > > > > > of the same file system types"
> > > > > > > >
> > > > > > > > But what the patch actually does is:
> > > > > > > > "Allow copy_file_range() syscall to copy between different filesystems
> > > > > > > > AND allow calling the filesystems' copy_file_range() method
> > > > > > > > between different superblocks but only of the same file system types"
> > > > > > > >
> > > > > > > > It's probably OK and quite useful to do the former, but maybe man page
> > > > > > > > should be fixed to explicitly mention that the copy is expected to work
> > > > > > > > across filesystems since kernel version XXX (?)
> > > > > > > >
> > > > > > > > If you don't wish to change cross filesystem type behavior and only
> > > > > > > > relax cross super block limitation, then you should replace the
> > > > > > > > same inode->i_sb check above with same inode->i_sb->s_type
> > > > > > > > check instead of doing the check only for calling the filesystem
> > > > > > > > copy_file_range() method.
> > > > > > >
> > > > > > > Thank you for the feedback. In the next version, I will remove the
> > > > > > > check for the functions and instead check for the same file system
> > > > > > > types.
> > > > > >
> > > > > > Jeff and I agree that this is the wrong way to go.  Instead, the
> > > > > > cross-device check should be in the individual instances, not the top
> > > > > > level code.
> > > > >
> > > > > So remove the check all together for the VFS (that was my original
> > > > > patch to begin with (like #1 not this one). So am I missing the point
> > > > > again, I keep getting different corrections every time.
> > > >
> > > > Sorry if I wasn't clear before:
> > > >
> > > > Basically, I think Willy and I are both envisioning that some
> > > > copy_file_range implementations may eventually not be subject to the
> > > > limitations of the checks you're adding.
> > > >
> > >
> > > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever
> > > happening. Changing the interface as Matthew proposed has a price
> > > and we need to compare this price to the alleged backporting price
> > > that nobody may ever need to pay.
> > >
> > > As far as I can tell, passing a struct file * on a file_operations method
> > > that does not belong to that filesystem in unprecedented (*) and is a far
> > > more lethal landmine than the alleged backporting landmine.
> > >
> > > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but
> > >      file_inode(file) has always belonged to the filesystem.
> > >
> > > Olga,
> > >
> > > I do not strongly object to Matthew's proposal, so don't feel
> > > obligated to choose my side of the argument. I am just trying
> > > to offer a different perspective.
> > >
> > > In any case, my outstanding concerns with the patch are:
> > >
> > > 1. If you change syscall to support cross fs type copy (which is
> > >     good IMO) need to document that in commit message
> > >     and possibly follow up later with a note in man page
> > >
> > > 2. Commit message says:
> > >     "This feature was of interest of ... NFS"
> > >     "This feature is needed by NFSv4.2..."
> > >     "NFS will allow for copies between different NFS servers."
> > >     It is not clear to me if we are talking about present of future
> > >     NFSv4.2 code. If NFSv4.2 currently does not support cross
> > >     sb copy (??) than your patch need to enforce same sb
> > >     in nfs4_copy_file_range(). If it does support cross sb copy
> > >     than please edit the commit message to make that clear.
> >
> > I personally agree with Amir. I think it's far fetched that a file
> > system would know how to handle something that's not of its type. When
> > the copy_file_range() was checked in, there was a comment above the
> > superblock check saying "we might want to relax this in the future".
> > It deemed appropriate then to enforce the check since none of the file
> > systems used it. Now, the future is here, and we are removing the
> > check but proposing a different once because again the future isn't
> > here and having a single check simplifies the code.
>
> Sorry Ok I wrote this too fast. I think I'm changing my mind and
> siding with the check by the file system.

The reason I think we should remove the check all together is we'd
allow the callers of copy_file_range() to utilize the
do_splice_direct() between file system types even when
copy_file_range() doesn't support cross fs copy. Isn't that
beneficial?

> > But I don't feel strongly about the check (or rather the location of
> > it VFS vs each FS) and what I ultimately need is to removed same sb
> > check. It sounds like if Amir isn't objecting, then the check for same
> > file system type should be removed from VFS. And, for each of the file
> > systems that currently support copy_file_range() -- CIFS, NFS, and
> > overlayfs -- I need to modify them and add a check for the same
> > fs_type.
> >
> > Amir to answer your question, only NFSv4.2 has copy_offload
> > functionality (not earlier NFS versions). Furthermore, existing
> > upstream only supports same sb copy offload. What this patch series
> > adds is support for copy offload across different superblocks but NFS
> > will not support (and would need a check) for copy offload across
> > different file system types. Also I kinda stand behind the ideas
> > stated: 1) this is of interest to NFS (where NFS here is to represent
> > a community, and CIFS is used to represent the other community). 2)
> > NFSv4.2 copy offload a specific feature that needs this functionality.
> > 3rd statement is bad. Only NFSv4.2 will allow copies between different
> > NFS servers (ie., after this patch +series), the emphasis was on "will
> > allow" meaning what this patch will allow to be done (ie, patch's
> > purpose). Or also, if the NFS server exports different exports, then a
> > copy between them (assuming exports of the same file system type).
> >
> > In the next version of the patch, I'll do my best to specified what
> > changed as the consequence of removing the cross sb check (ie, file
> > system types of the passed in file can be from different file
> > systems). I will add wording to the man page and add the suggested
> > wording to the "porting" file.
Jeff Layton Oct. 24, 2018, 11:17 a.m. UTC | #27
On Tue, 2018-10-23 at 13:16 -0400, Olga Kornievskaia wrote:
> On Tue, Oct 23, 2018 at 11:30 AM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> > 
> > On Tue, Oct 23, 2018 at 11:03 AM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > > 
> > > On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > 
> > > > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > 
> > > > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote:
> > > > > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > 
> > > > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> > > > > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > > > > > Another thing is the commit message claims to:
> > > > > > > > > "Allow copy_file_range to copy between different superblocks but only
> > > > > > > > > of the same file system types"
> > > > > > > > > 
> > > > > > > > > But what the patch actually does is:
> > > > > > > > > "Allow copy_file_range() syscall to copy between different filesystems
> > > > > > > > > AND allow calling the filesystems' copy_file_range() method
> > > > > > > > > between different superblocks but only of the same file system types"
> > > > > > > > > 
> > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page
> > > > > > > > > should be fixed to explicitly mention that the copy is expected to work
> > > > > > > > > across filesystems since kernel version XXX (?)
> > > > > > > > > 
> > > > > > > > > If you don't wish to change cross filesystem type behavior and only
> > > > > > > > > relax cross super block limitation, then you should replace the
> > > > > > > > > same inode->i_sb check above with same inode->i_sb->s_type
> > > > > > > > > check instead of doing the check only for calling the filesystem
> > > > > > > > > copy_file_range() method.
> > > > > > > > 
> > > > > > > > Thank you for the feedback. In the next version, I will remove the
> > > > > > > > check for the functions and instead check for the same file system
> > > > > > > > types.
> > > > > > > 
> > > > > > > Jeff and I agree that this is the wrong way to go.  Instead, the
> > > > > > > cross-device check should be in the individual instances, not the top
> > > > > > > level code.
> > > > > > 
> > > > > > So remove the check all together for the VFS (that was my original
> > > > > > patch to begin with (like #1 not this one). So am I missing the point
> > > > > > again, I keep getting different corrections every time.
> > > > > 
> > > > > Sorry if I wasn't clear before:
> > > > > 
> > > > > Basically, I think Willy and I are both envisioning that some
> > > > > copy_file_range implementations may eventually not be subject to the
> > > > > limitations of the checks you're adding.
> > > > > 
> > > > 
> > > > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever
> > > > happening. Changing the interface as Matthew proposed has a price
> > > > and we need to compare this price to the alleged backporting price
> > > > that nobody may ever need to pay.
> > > > 
> > > > As far as I can tell, passing a struct file * on a file_operations method
> > > > that does not belong to that filesystem in unprecedented (*) and is a far
> > > > more lethal landmine than the alleged backporting landmine.
> > > > 
> > > > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but
> > > >      file_inode(file) has always belonged to the filesystem.
> > > > 
> > > > Olga,
> > > > 
> > > > I do not strongly object to Matthew's proposal, so don't feel
> > > > obligated to choose my side of the argument. I am just trying
> > > > to offer a different perspective.
> > > > 
> > > > In any case, my outstanding concerns with the patch are:
> > > > 
> > > > 1. If you change syscall to support cross fs type copy (which is
> > > >     good IMO) need to document that in commit message
> > > >     and possibly follow up later with a note in man page
> > > > 
> > > > 2. Commit message says:
> > > >     "This feature was of interest of ... NFS"
> > > >     "This feature is needed by NFSv4.2..."
> > > >     "NFS will allow for copies between different NFS servers."
> > > >     It is not clear to me if we are talking about present of future
> > > >     NFSv4.2 code. If NFSv4.2 currently does not support cross
> > > >     sb copy (??) than your patch need to enforce same sb
> > > >     in nfs4_copy_file_range(). If it does support cross sb copy
> > > >     than please edit the commit message to make that clear.
> > > 
> > > I personally agree with Amir. I think it's far fetched that a file
> > > system would know how to handle something that's not of its type. When
> > > the copy_file_range() was checked in, there was a comment above the
> > > superblock check saying "we might want to relax this in the future".
> > > It deemed appropriate then to enforce the check since none of the file
> > > systems used it. Now, the future is here, and we are removing the
> > > check but proposing a different once because again the future isn't
> > > here and having a single check simplifies the code.
> > 
> > Sorry Ok I wrote this too fast. I think I'm changing my mind and
> > siding with the check by the file system.
> 
> The reason I think we should remove the check all together is we'd
> allow the callers of copy_file_range() to utilize the
> do_splice_direct() between file system types even when
> copy_file_range() doesn't support cross fs copy. Isn't that
> beneficial?

Others might argue no, but I think it's good to have that option open.

If the consensus is that we do need a check at the vfs layer to validate
that the struct files come from the same fstype (as Amir suggests), then
I can live with that. We've dealt with internal API changes before so we
can do it again.

Al is quite correct though that we really do want to compare fstypes
rather than op pointers in that case however.

> > > But I don't feel strongly about the check (or rather the location of
> > > it VFS vs each FS) and what I ultimately need is to removed same sb
> > > check. It sounds like if Amir isn't objecting, then the check for same
> > > file system type should be removed from VFS. And, for each of the file
> > > systems that currently support copy_file_range() -- CIFS, NFS, and
> > > overlayfs -- I need to modify them and add a check for the same
> > > fs_type.
> > > 
> > > Amir to answer your question, only NFSv4.2 has copy_offload
> > > functionality (not earlier NFS versions). Furthermore, existing
> > > upstream only supports same sb copy offload. What this patch series
> > > adds is support for copy offload across different superblocks but NFS
> > > will not support (and would need a check) for copy offload across
> > > different file system types. Also I kinda stand behind the ideas
> > > stated: 1) this is of interest to NFS (where NFS here is to represent
> > > a community, and CIFS is used to represent the other community). 2)
> > > NFSv4.2 copy offload a specific feature that needs this functionality.
> > > 3rd statement is bad. Only NFSv4.2 will allow copies between different
> > > NFS servers (ie., after this patch +series), the emphasis was on "will
> > > allow" meaning what this patch will allow to be done (ie, patch's
> > > purpose). Or also, if the NFS server exports different exports, then a
> > > copy between them (assuming exports of the same file system type).
> > > 
> > > In the next version of the patch, I'll do my best to specified what
> > > changed as the consequence of removing the cross sb check (ie, file
> > > system types of the passed in file can be from different file
> > > systems). I will add wording to the man page and add the suggested
> > > wording to the "porting" file.
Amir Goldstein Oct. 24, 2018, 11:32 a.m. UTC | #28
On Tue, Oct 23, 2018 at 6:39 PM Matthew Wilcox <willy@infradead.org> wrote:
>
[...]
> I've done some more research and found that while NFSv4.2 has its own
> representation of a copyable range of a file, iSCSI and SMB3 share the
> same ROD.  So it's not at all implausible that some other filesystem
> might also decide to use the same ROD, perhaps even NFS v4.3.
>
> It's clearly crazy to expect filesystem A to know about all the
> interpretations of filesystem B's file->private.  I'd expect us to add
> a function like:
>
> int vfs_get_rod(struct file *src, struct file_rod *rod);
>
> and then a filesystem's copy_file_range() would check to see if both
> src and dest referred to the same server, and if not call vfs_get_rod()
> to be able to send the ROD to the destination.
>

I am not saying cross filesystem type copy won't be possible.
I'm just saying we are going to get the API wrong anyway.
Surely, it is going to be either cifs or nfs42 that supports cross
fs copy first, not both at the same development cycle, so now
it seems flaky to use the out_file's copy_file_range() method.
You'd want to figure out which of the in/out fs supports cross fs
copy and call that method (not enough information).
I suspect we will need a new cross_copy_file_range() method.

And to answer Olga's question, yes, I do find cross fs copy with
do_splice_direct() beneficial.

Thanks,
Amir.
Olga Kornievskaia Oct. 24, 2018, 7:59 p.m. UTC | #29
On Wed, Oct 24, 2018 at 7:17 AM Jeff Layton <jlayton@redhat.com> wrote:
>
> On Tue, 2018-10-23 at 13:16 -0400, Olga Kornievskaia wrote:
> > On Tue, Oct 23, 2018 at 11:30 AM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > >
> > > On Tue, Oct 23, 2018 at 11:03 AM Olga Kornievskaia
> > > <olga.kornievskaia@gmail.com> wrote:
> > > >
> > > > On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > >
> > > > > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote:
> > > > > > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote:
> > > > > > > >
> > > > > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote:
> > > > > > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > > > > > > > > > Another thing is the commit message claims to:
> > > > > > > > > > "Allow copy_file_range to copy between different superblocks but only
> > > > > > > > > > of the same file system types"
> > > > > > > > > >
> > > > > > > > > > But what the patch actually does is:
> > > > > > > > > > "Allow copy_file_range() syscall to copy between different filesystems
> > > > > > > > > > AND allow calling the filesystems' copy_file_range() method
> > > > > > > > > > between different superblocks but only of the same file system types"
> > > > > > > > > >
> > > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page
> > > > > > > > > > should be fixed to explicitly mention that the copy is expected to work
> > > > > > > > > > across filesystems since kernel version XXX (?)
> > > > > > > > > >
> > > > > > > > > > If you don't wish to change cross filesystem type behavior and only
> > > > > > > > > > relax cross super block limitation, then you should replace the
> > > > > > > > > > same inode->i_sb check above with same inode->i_sb->s_type
> > > > > > > > > > check instead of doing the check only for calling the filesystem
> > > > > > > > > > copy_file_range() method.
> > > > > > > > >
> > > > > > > > > Thank you for the feedback. In the next version, I will remove the
> > > > > > > > > check for the functions and instead check for the same file system
> > > > > > > > > types.
> > > > > > > >
> > > > > > > > Jeff and I agree that this is the wrong way to go.  Instead, the
> > > > > > > > cross-device check should be in the individual instances, not the top
> > > > > > > > level code.
> > > > > > >
> > > > > > > So remove the check all together for the VFS (that was my original
> > > > > > > patch to begin with (like #1 not this one). So am I missing the point
> > > > > > > again, I keep getting different corrections every time.
> > > > > >
> > > > > > Sorry if I wasn't clear before:
> > > > > >
> > > > > > Basically, I think Willy and I are both envisioning that some
> > > > > > copy_file_range implementations may eventually not be subject to the
> > > > > > limitations of the checks you're adding.
> > > > > >
> > > > >
> > > > > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever
> > > > > happening. Changing the interface as Matthew proposed has a price
> > > > > and we need to compare this price to the alleged backporting price
> > > > > that nobody may ever need to pay.
> > > > >
> > > > > As far as I can tell, passing a struct file * on a file_operations method
> > > > > that does not belong to that filesystem in unprecedented (*) and is a far
> > > > > more lethal landmine than the alleged backporting landmine.
> > > > >
> > > > > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but
> > > > >      file_inode(file) has always belonged to the filesystem.
> > > > >
> > > > > Olga,
> > > > >
> > > > > I do not strongly object to Matthew's proposal, so don't feel
> > > > > obligated to choose my side of the argument. I am just trying
> > > > > to offer a different perspective.
> > > > >
> > > > > In any case, my outstanding concerns with the patch are:
> > > > >
> > > > > 1. If you change syscall to support cross fs type copy (which is
> > > > >     good IMO) need to document that in commit message
> > > > >     and possibly follow up later with a note in man page
> > > > >
> > > > > 2. Commit message says:
> > > > >     "This feature was of interest of ... NFS"
> > > > >     "This feature is needed by NFSv4.2..."
> > > > >     "NFS will allow for copies between different NFS servers."
> > > > >     It is not clear to me if we are talking about present of future
> > > > >     NFSv4.2 code. If NFSv4.2 currently does not support cross
> > > > >     sb copy (??) than your patch need to enforce same sb
> > > > >     in nfs4_copy_file_range(). If it does support cross sb copy
> > > > >     than please edit the commit message to make that clear.
> > > >
> > > > I personally agree with Amir. I think it's far fetched that a file
> > > > system would know how to handle something that's not of its type. When
> > > > the copy_file_range() was checked in, there was a comment above the
> > > > superblock check saying "we might want to relax this in the future".
> > > > It deemed appropriate then to enforce the check since none of the file
> > > > systems used it. Now, the future is here, and we are removing the
> > > > check but proposing a different once because again the future isn't
> > > > here and having a single check simplifies the code.
> > >
> > > Sorry Ok I wrote this too fast. I think I'm changing my mind and
> > > siding with the check by the file system.
> >
> > The reason I think we should remove the check all together is we'd
> > allow the callers of copy_file_range() to utilize the
> > do_splice_direct() between file system types even when
> > copy_file_range() doesn't support cross fs copy. Isn't that
> > beneficial?
>
> Others might argue no, but I think it's good to have that option open.
>
> If the consensus is that we do need a check at the vfs layer to validate
> that the struct files come from the same fstype (as Amir suggests), then
> I can live with that. We've dealt with internal API changes before so we
> can do it again.
>
> Al is quite correct though that we really do want to compare fstypes
> rather than op pointers in that case however.

It feels like folks are now ok with either the check being in the
drivers or doing the check in the VFS layer.

I'm picking the choice of not doing the check in the VFS layer because
it allows for do_splice_direct() by any caller. I'm about to submit
the new version of the patches (this time I will include the NFS patch
series). We can continue with the discussion on the new version.

I have added checks for the CIFS and OverlayFS to be consistent with
the previous behavior -- no cross-device copy_offload, I assume if and
when those file systems are ready to make use of it they'll remove the
check.

> > > > But I don't feel strongly about the check (or rather the location of
> > > > it VFS vs each FS) and what I ultimately need is to removed same sb
> > > > check. It sounds like if Amir isn't objecting, then the check for same
> > > > file system type should be removed from VFS. And, for each of the file
> > > > systems that currently support copy_file_range() -- CIFS, NFS, and
> > > > overlayfs -- I need to modify them and add a check for the same
> > > > fs_type.
> > > >
> > > > Amir to answer your question, only NFSv4.2 has copy_offload
> > > > functionality (not earlier NFS versions). Furthermore, existing
> > > > upstream only supports same sb copy offload. What this patch series
> > > > adds is support for copy offload across different superblocks but NFS
> > > > will not support (and would need a check) for copy offload across
> > > > different file system types. Also I kinda stand behind the ideas
> > > > stated: 1) this is of interest to NFS (where NFS here is to represent
> > > > a community, and CIFS is used to represent the other community). 2)
> > > > NFSv4.2 copy offload a specific feature that needs this functionality.
> > > > 3rd statement is bad. Only NFSv4.2 will allow copies between different
> > > > NFS servers (ie., after this patch +series), the emphasis was on "will
> > > > allow" meaning what this patch will allow to be done (ie, patch's
> > > > purpose). Or also, if the NFS server exports different exports, then a
> > > > copy between them (assuming exports of the same file system type).
> > > >
> > > > In the next version of the patch, I'll do my best to specified what
> > > > changed as the consequence of removing the cross sb check (ie, file
> > > > system types of the passed in file can be from different file
> > > > systems). I will add wording to the man page and add the suggested
> > > > wording to the "porting" file.
>
> --
> Jeff Layton <jlayton@redhat.com>
>
Amir Goldstein Oct. 25, 2018, 4:58 a.m. UTC | #30
On Wed, Oct 24, 2018 at 10:59 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
[...]
> It feels like folks are now ok with either the check being in the
> drivers or doing the check in the VFS layer.
>
> I'm picking the choice of not doing the check in the VFS layer because
> it allows for do_splice_direct() by any caller.

I'm sorry, but this reasoning in flawed and this is not the reason that
Matthew promoted not doing same fs type check in vfs.
You did not understand the option that I was promoting to begin with.
What I suggested was:

1. Remove current same sb check in beginning of vfs_copy_file_range()
2. Check sb && ->clone_file_range
3. Check same sb type && ->copy_file_range
4. Cross fs do_splice_direct()

It's fine that you chose not to check for same fs type in VFS before calling
copy_file_range() method, but still requires an ACK from Al that he agrees
with passing in file * of another filesystem on the interface.

> I'm about to submit
> the new version of the patches (this time I will include the NFS patch
> series). We can continue with the discussion on the new version.
>
> I have added checks for the CIFS and OverlayFS to be consistent with
> the previous behavior -- no cross-device copy_offload, I assume if and
> when those file systems are ready to make use of it they'll remove the
> check.
>

Actually overlayfs code is "ready" for cross sb copy, but neither nfs nor
cifs are supported as upper file system, so it doesn't matter much.

Thanks,
Amir.
Olga Kornievskaia Oct. 25, 2018, 3:58 p.m. UTC | #31
On Thu, Oct 25, 2018 at 12:59 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Wed, Oct 24, 2018 at 10:59 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> [...]
> > It feels like folks are now ok with either the check being in the
> > drivers or doing the check in the VFS layer.
> >
> > I'm picking the choice of not doing the check in the VFS layer because
> > it allows for do_splice_direct() by any caller.
>
> I'm sorry, but this reasoning in flawed and this is not the reason that
> Matthew promoted not doing same fs type check in vfs.

I stated the reason why I picked to do the check at the driver layer.
Looking at your version of the sb type check to be only applied to the
copy_file_range indeed makes my argument invalid. I was under the
impression that sb type check was being proposed as a standalone check
(just like the sb check was standalone). Thus, yes I didn't completely
understand what you proposed.

> You did not understand the option that I was promoting to begin with.
> What I suggested was:
>
> 1. Remove current same sb check in beginning of vfs_copy_file_range()
> 2. Check sb && ->clone_file_range
> 3. Check same sb type && ->copy_file_range
> 4. Cross fs do_splice_direct()
>
> It's fine that you chose not to check for same fs type in VFS before calling
> copy_file_range() method, but still requires an ACK from Al that he agrees
> with passing in file * of another filesystem on the interface.

Al, can you please provide a final decision as to which way you would
prefer for this to be done.

> > I'm about to submit
> > the new version of the patches (this time I will include the NFS patch
> > series). We can continue with the discussion on the new version.
> >
> > I have added checks for the CIFS and OverlayFS to be consistent with
> > the previous behavior -- no cross-device copy_offload, I assume if and
> > when those file systems are ready to make use of it they'll remove the
> > check.
> >
>
> Actually overlayfs code is "ready" for cross sb copy, but neither nfs nor
> cifs are supported as upper file system, so it doesn't matter much.

So then the commit statement is still true. When overlayfs will have
upper file systems that do support it, this check can be removed.

>
> Thanks,
> Amir.
Olga Kornievskaia Oct. 25, 2018, 4 p.m. UTC | #32
On Thu, Oct 25, 2018 at 11:58 AM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Thu, Oct 25, 2018 at 12:59 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Wed, Oct 24, 2018 at 10:59 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > >
> > [...]
> > > It feels like folks are now ok with either the check being in the
> > > drivers or doing the check in the VFS layer.
> > >
> > > I'm picking the choice of not doing the check in the VFS layer because
> > > it allows for do_splice_direct() by any caller.
> >
> > I'm sorry, but this reasoning in flawed and this is not the reason that
> > Matthew promoted not doing same fs type check in vfs.
>
> I stated the reason why I picked to do the check at the driver layer.
> Looking at your version of the sb type check to be only applied to the
> copy_file_range indeed makes my argument invalid. I was under the
> impression that sb type check was being proposed as a standalone check
> (just like the sb check was standalone). Thus, yes I didn't completely
> understand what you proposed.
>
> > You did not understand the option that I was promoting to begin with.
> > What I suggested was:
> >
> > 1. Remove current same sb check in beginning of vfs_copy_file_range()
> > 2. Check sb && ->clone_file_range
> > 3. Check same sb type && ->copy_file_range
> > 4. Cross fs do_splice_direct()
> >
> > It's fine that you chose not to check for same fs type in VFS before calling
> > copy_file_range() method, but still requires an ACK from Al that he agrees
> > with passing in file * of another filesystem on the interface.
>
> Al, can you please provide a final decision as to which way you would
> prefer for this to be done.
>
> > > I'm about to submit
> > > the new version of the patches (this time I will include the NFS patch
> > > series). We can continue with the discussion on the new version.
> > >
> > > I have added checks for the CIFS and OverlayFS to be consistent with
> > > the previous behavior -- no cross-device copy_offload, I assume if and
> > > when those file systems are ready to make use of it they'll remove the
> > > check.
> > >
> >
> > Actually overlayfs code is "ready" for cross sb copy, but neither nfs nor
> > cifs are supported as upper file system, so it doesn't matter much.
>
> So then the commit statement is still true. When overlayfs will have
> upper file systems that do support it, this check can be removed.

Ops sorry I meant them as questions. Do you feel that commit message
needs to be changed then?
>
> >
> > Thanks,
> > Amir.
Amir Goldstein Oct. 25, 2018, 4:57 p.m. UTC | #33
On Thu, Oct 25, 2018 at 7:00 PM Olga Kornievskaia
<olga.kornievskaia@gmail.com> wrote:
>
> On Thu, Oct 25, 2018 at 11:58 AM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > On Thu, Oct 25, 2018 at 12:59 AM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Wed, Oct 24, 2018 at 10:59 PM Olga Kornievskaia
> > > <olga.kornievskaia@gmail.com> wrote:
> > > >
> > > [...]
> > > > It feels like folks are now ok with either the check being in the
> > > > drivers or doing the check in the VFS layer.
> > > >
> > > > I'm picking the choice of not doing the check in the VFS layer because
> > > > it allows for do_splice_direct() by any caller.
> > >
> > > I'm sorry, but this reasoning in flawed and this is not the reason that
> > > Matthew promoted not doing same fs type check in vfs.
> >
> > I stated the reason why I picked to do the check at the driver layer.
> > Looking at your version of the sb type check to be only applied to the
> > copy_file_range indeed makes my argument invalid. I was under the
> > impression that sb type check was being proposed as a standalone check
> > (just like the sb check was standalone). Thus, yes I didn't completely
> > understand what you proposed.
> >
> > > You did not understand the option that I was promoting to begin with.
> > > What I suggested was:
> > >
> > > 1. Remove current same sb check in beginning of vfs_copy_file_range()
> > > 2. Check sb && ->clone_file_range
> > > 3. Check same sb type && ->copy_file_range
> > > 4. Cross fs do_splice_direct()
> > >
> > > It's fine that you chose not to check for same fs type in VFS before calling
> > > copy_file_range() method, but still requires an ACK from Al that he agrees
> > > with passing in file * of another filesystem on the interface.
> >
> > Al, can you please provide a final decision as to which way you would
> > prefer for this to be done.
> >
> > > > I'm about to submit
> > > > the new version of the patches (this time I will include the NFS patch
> > > > series). We can continue with the discussion on the new version.
> > > >
> > > > I have added checks for the CIFS and OverlayFS to be consistent with
> > > > the previous behavior -- no cross-device copy_offload, I assume if and
> > > > when those file systems are ready to make use of it they'll remove the
> > > > check.
> > > >
> > >
> > > Actually overlayfs code is "ready" for cross sb copy, but neither nfs nor
> > > cifs are supported as upper file system, so it doesn't matter much.
> >
> > So then the commit statement is still true. When overlayfs will have
> > upper file systems that do support it, this check can be removed.
>
> Ops sorry I meant them as questions. Do you feel that commit message
> needs to be changed then?

Commit message is fine by me.

Thanks,
Amir.
diff mbox series

Patch

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index a6c6a8a..5e520de 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -958,7 +958,9 @@  otherwise noted.
 
   fallocate: called by the VFS to preallocate blocks or punch a hole.
 
-  copy_file_range: called by the copy_file_range(2) system call.
+  copy_file_range: called by copy_file_range(2) system call. This method
+		   works on two file descriptors that might reside on
+		   different superblocks of the same type of file system.
 
   clone_file_range: called by the ioctl(2) system call for FICLONERANGE and
 	FICLONE commands.
diff --git a/fs/read_write.c b/fs/read_write.c
index c60790f..474e740 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1578,10 +1578,6 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	    (file_out->f_flags & O_APPEND))
 		return -EBADF;
 
-	/* this could be relaxed once a method supports cross-fs copies */
-	if (inode_in->i_sb != inode_out->i_sb)
-		return -EXDEV;
-
 	if (len == 0)
 		return 0;
 
@@ -1591,7 +1587,8 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	 * Try cloning first, this is supported by more file systems, and
 	 * more efficient if both clone and copy are supported (e.g. NFS).
 	 */
-	if (file_in->f_op->clone_file_range) {
+	if (inode_in->i_sb == inode_out->i_sb &&
+			file_in->f_op->clone_file_range) {
 		ret = file_in->f_op->clone_file_range(file_in, pos_in,
 				file_out, pos_out, len);
 		if (ret == 0) {
@@ -1600,10 +1597,12 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		}
 	}
 
-	if (file_out->f_op->copy_file_range) {
+	if (file_out->f_op->copy_file_range &&
+			(file_in->f_op->copy_file_range ==
+				file_out->f_op->copy_file_range)) {
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
-		if (ret != -EOPNOTSUPP)
+		if (ret != -EOPNOTSUPP && ret != -EXDEV)
 			goto done;
 	}