Message ID | 20181019153018.32507-2-olga.kornievskaia@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v1,01/11] fs: Don't copy beyond the end of the file | expand |
On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > From: Olga Kornievskaia <kolga@netapp.com> > > Allow copy_file_range to copy between different superblocks but only > of the same file system types. This feature was of interest to CIFS > as well as NFS. > > This feature is needed by NFSv4.2 to perform file copy operation on > the same server or file copy between different NFSv4.2 servers. > > If a file system's fileoperations copy_file_range operation prohibits > cross-device copies, fall back to do_splice_direct. This would be > needed for the NFS (destination) server side implementation of the > file copy and currently for CIFS. > > Besides NFS, there is only 1 implementor of the copy_file_range FS > operation -- CIFS. CIFS assumes incoming file descriptors are both > CIFS but it will check if they are coming from different servers and > return error code to fall back to do_splice_direct. > > NFS will allow for copies between different NFS servers. > > Adding to the vfs.txt documentation to explicitly warn about allowing > for different superblocks of the same file type to be passed into the > copy_file_range for the future users of copy_file_range method. > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > --- > Documentation/filesystems/vfs.txt | 4 +++- > fs/read_write.c | 13 ++++++------- > 2 files changed, 9 insertions(+), 8 deletions(-) > > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt > index a6c6a8a..5e520de 100644 > --- a/Documentation/filesystems/vfs.txt > +++ b/Documentation/filesystems/vfs.txt > @@ -958,7 +958,9 @@ otherwise noted. > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > - copy_file_range: called by the copy_file_range(2) system call. > + copy_file_range: called by copy_file_range(2) system call. This method > + works on two file descriptors that might reside on > + different superblocks of the same type of file system. > > clone_file_range: called by the ioctl(2) system call for FICLONERANGE and > FICLONE commands. > diff --git a/fs/read_write.c b/fs/read_write.c > index c60790f..474e740 100644 > --- a/fs/read_write.c > +++ b/fs/read_write.c > @@ -1578,10 +1578,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > (file_out->f_flags & O_APPEND)) > return -EBADF; > > - /* this could be relaxed once a method supports cross-fs copies */ > - if (inode_in->i_sb != inode_out->i_sb) > - return -EXDEV; > - You need to hoist this limitation to clone_file_range() syscall, because you are not allowed to change user facing behavior. Maybe you can later add a uapi flag for copy_file_range() to explicitly allow for cross-sb copy? Maybe you can add the flag now for internal use - only nfsv4 will pass that flag to the vfs helper and system call will verify that flags == 0. Thanks, Amir.
On Fri, Oct 19, 2018 at 6:54 PM Amir Goldstein <amir73il@gmail.com> wrote: > > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia > <olga.kornievskaia@gmail.com> wrote: > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > Allow copy_file_range to copy between different superblocks but only > > of the same file system types. This feature was of interest to CIFS > > as well as NFS. > > > > This feature is needed by NFSv4.2 to perform file copy operation on > > the same server or file copy between different NFSv4.2 servers. > > > > If a file system's fileoperations copy_file_range operation prohibits > > cross-device copies, fall back to do_splice_direct. This would be > > needed for the NFS (destination) server side implementation of the > > file copy and currently for CIFS. > > > > Besides NFS, there is only 1 implementor of the copy_file_range FS > > operation -- CIFS. CIFS assumes incoming file descriptors are both > > CIFS but it will check if they are coming from different servers and > > return error code to fall back to do_splice_direct. > > > > NFS will allow for copies between different NFS servers. > > > > Adding to the vfs.txt documentation to explicitly warn about allowing > > for different superblocks of the same file type to be passed into the > > copy_file_range for the future users of copy_file_range method. > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > --- > > Documentation/filesystems/vfs.txt | 4 +++- > > fs/read_write.c | 13 ++++++------- > > 2 files changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt > > index a6c6a8a..5e520de 100644 > > --- a/Documentation/filesystems/vfs.txt > > +++ b/Documentation/filesystems/vfs.txt > > @@ -958,7 +958,9 @@ otherwise noted. > > > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > > > - copy_file_range: called by the copy_file_range(2) system call. > > + copy_file_range: called by copy_file_range(2) system call. This method > > + works on two file descriptors that might reside on > > + different superblocks of the same type of file system. > > > > clone_file_range: called by the ioctl(2) system call for FICLONERANGE and > > FICLONE commands. > > diff --git a/fs/read_write.c b/fs/read_write.c > > index c60790f..474e740 100644 > > --- a/fs/read_write.c > > +++ b/fs/read_write.c > > @@ -1578,10 +1578,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > (file_out->f_flags & O_APPEND)) > > return -EBADF; > > > > - /* this could be relaxed once a method supports cross-fs copies */ > > - if (inode_in->i_sb != inode_out->i_sb) > > - return -EXDEV; > > - > > You need to hoist this limitation to clone_file_range() syscall, because > you are not allowed to change user facing behavior. > Maybe you can later add a uapi flag for copy_file_range() to explicitly allow > for cross-sb copy? > Maybe you can add the flag now for internal use - only nfsv4 will pass that > flag to the vfs helper and system call will verify that flags == 0. > Sorry, you meant to allow cross sb copy from client, so you do want to change user facing behavior. For that you need to declare a flag to opt in for the new behavior and document it. Please keep linux-api CC-ed when posting the flag. Referencing a draft for man page is recommended. Thanks, Amir.
On Fri, Oct 19, 2018 at 11:55 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia > <olga.kornievskaia@gmail.com> wrote: > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > Allow copy_file_range to copy between different superblocks but only > > of the same file system types. This feature was of interest to CIFS > > as well as NFS. > > > > This feature is needed by NFSv4.2 to perform file copy operation on > > the same server or file copy between different NFSv4.2 servers. > > > > If a file system's fileoperations copy_file_range operation prohibits > > cross-device copies, fall back to do_splice_direct. This would be > > needed for the NFS (destination) server side implementation of the > > file copy and currently for CIFS. > > > > Besides NFS, there is only 1 implementor of the copy_file_range FS > > operation -- CIFS. CIFS assumes incoming file descriptors are both > > CIFS but it will check if they are coming from different servers and > > return error code to fall back to do_splice_direct. > > > > NFS will allow for copies between different NFS servers. > > > > Adding to the vfs.txt documentation to explicitly warn about allowing > > for different superblocks of the same file type to be passed into the > > copy_file_range for the future users of copy_file_range method. > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > --- > > Documentation/filesystems/vfs.txt | 4 +++- > > fs/read_write.c | 13 ++++++------- > > 2 files changed, 9 insertions(+), 8 deletions(-) > > > > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt > > index a6c6a8a..5e520de 100644 > > --- a/Documentation/filesystems/vfs.txt > > +++ b/Documentation/filesystems/vfs.txt > > @@ -958,7 +958,9 @@ otherwise noted. > > > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > > > - copy_file_range: called by the copy_file_range(2) system call. > > + copy_file_range: called by copy_file_range(2) system call. This method > > + works on two file descriptors that might reside on > > + different superblocks of the same type of file system. > > > > clone_file_range: called by the ioctl(2) system call for FICLONERANGE and > > FICLONE commands. > > diff --git a/fs/read_write.c b/fs/read_write.c > > index c60790f..474e740 100644 > > --- a/fs/read_write.c > > +++ b/fs/read_write.c > > @@ -1578,10 +1578,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > (file_out->f_flags & O_APPEND)) > > return -EBADF; > > > > - /* this could be relaxed once a method supports cross-fs copies */ > > - if (inode_in->i_sb != inode_out->i_sb) > > - return -EXDEV; > > - > > You need to hoist this limitation to clone_file_range() syscall, because > you are not allowed to change user facing behavior. > Maybe you can later add a uapi flag for copy_file_range() to explicitly allow > for cross-sb copy? Sure I can do that. > Maybe you can add the flag now for internal use - only nfsv4 will pass that > flag to the vfs helper and system call will verify that flags == 0. Not only NFS wants it CIFS want it too. > > Thanks, > Amir.
On Fri, Oct 19, 2018 at 12:24 PM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Fri, Oct 19, 2018 at 11:55 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia > > <olga.kornievskaia@gmail.com> wrote: > > > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > Allow copy_file_range to copy between different superblocks but only > > > of the same file system types. This feature was of interest to CIFS > > > as well as NFS. > > > > > > This feature is needed by NFSv4.2 to perform file copy operation on > > > the same server or file copy between different NFSv4.2 servers. > > > > > > If a file system's fileoperations copy_file_range operation prohibits > > > cross-device copies, fall back to do_splice_direct. This would be > > > needed for the NFS (destination) server side implementation of the > > > file copy and currently for CIFS. > > > > > > Besides NFS, there is only 1 implementor of the copy_file_range FS > > > operation -- CIFS. CIFS assumes incoming file descriptors are both > > > CIFS but it will check if they are coming from different servers and > > > return error code to fall back to do_splice_direct. > > > > > > NFS will allow for copies between different NFS servers. > > > > > > Adding to the vfs.txt documentation to explicitly warn about allowing > > > for different superblocks of the same file type to be passed into the > > > copy_file_range for the future users of copy_file_range method. > > > > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com> > > > --- > > > Documentation/filesystems/vfs.txt | 4 +++- > > > fs/read_write.c | 13 ++++++------- > > > 2 files changed, 9 insertions(+), 8 deletions(-) > > > > > > diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt > > > index a6c6a8a..5e520de 100644 > > > --- a/Documentation/filesystems/vfs.txt > > > +++ b/Documentation/filesystems/vfs.txt > > > @@ -958,7 +958,9 @@ otherwise noted. > > > > > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > > > > > - copy_file_range: called by the copy_file_range(2) system call. > > > + copy_file_range: called by copy_file_range(2) system call. This method > > > + works on two file descriptors that might reside on > > > + different superblocks of the same type of file system. > > > > > > clone_file_range: called by the ioctl(2) system call for FICLONERANGE and > > > FICLONE commands. > > > diff --git a/fs/read_write.c b/fs/read_write.c > > > index c60790f..474e740 100644 > > > --- a/fs/read_write.c > > > +++ b/fs/read_write.c > > > @@ -1578,10 +1578,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > > (file_out->f_flags & O_APPEND)) > > > return -EBADF; > > > > > > - /* this could be relaxed once a method supports cross-fs copies */ > > > - if (inode_in->i_sb != inode_out->i_sb) > > > - return -EXDEV; > > > - > > > > You need to hoist this limitation to clone_file_range() syscall, because > > you are not allowed to change user facing behavior. > > Maybe you can later add a uapi flag for copy_file_range() to explicitly allow > > for cross-sb copy? > > Sure I can do that. > > > Maybe you can add the flag now for internal use - only nfsv4 will pass that > > flag to the vfs helper and system call will verify that flags == 0. > > Not only NFS wants it CIFS want it too. Can you please elaborate on what do you mean "internal use only"? nfsv4 doesn't call copy_file_range(), a user land application (such as glibc cp, or whatever does a system call for the copy_file_range()) is the one calling and doing a copy between two different server (ie superblocks). Thus the system call can't keep the check "flags == 0". I can (maybe?) add user level flags that copy_file_range() system call will pass, I will have to remove the "flags == 0" check and then I can add a check if such a flag is passed, then skip the check for the cross device check in copy_file_range(). Is this what's desired? > > > > > Thanks, > > Amir.
On Fri, Oct 19, 2018 at 07:14:23PM +0300, Amir Goldstein wrote: > > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia > > <olga.kornievskaia@gmail.com> wrote: > > > Allow copy_file_range to copy between different superblocks but only > > > of the same file system types. This feature was of interest to CIFS > > > as well as NFS. > > Sorry, you meant to allow cross sb copy from client, so you do want > to change user facing behavior. > For that you need to declare a flag to opt in for the new behavior > and document it. > Please keep linux-api CC-ed when posting the flag. > Referencing a draft for man page is recommended. Huh? We're allowed to make things work that used to fail. There's no need to add a flag that the user must set.
On Fri, Oct 19, 2018 at 8:44 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Fri, Oct 19, 2018 at 07:14:23PM +0300, Amir Goldstein wrote: > > > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia > > > <olga.kornievskaia@gmail.com> wrote: > > > > Allow copy_file_range to copy between different superblocks but only > > > > of the same file system types. This feature was of interest to CIFS > > > > as well as NFS. > > > > Sorry, you meant to allow cross sb copy from client, so you do want > > to change user facing behavior. > > For that you need to declare a flag to opt in for the new behavior > > and document it. > > Please keep linux-api CC-ed when posting the flag. > > Referencing a draft for man page is recommended. > > Huh? We're allowed to make things work that used to fail. There's no > need to add a flag that the user must set. OK, you are right. Maybe this case doesn't call for an opt-in flag. I'll let other people weight in. I should note that the behavior of copy_file_range() has already changed silently to try to clone_file_range() first and that went quietly... Thanks, Amir.
On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote: > +++ b/Documentation/filesystems/vfs.txt > @@ -958,7 +958,9 @@ otherwise noted. > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > - copy_file_range: called by the copy_file_range(2) system call. > + copy_file_range: called by copy_file_range(2) system call. This method > + works on two file descriptors that might reside on > + different superblocks of the same type of file system. I don't think this text is explicit enough about what has changed, and I think this is the wrong place for it. I think there should be a paragraph in Documentation/filesystems/porting and it should follow the current style in there. > @@ -1591,7 +1587,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > * Try cloning first, this is supported by more file systems, and > * more efficient if both clone and copy are supported (e.g. NFS). > */ > - if (file_in->f_op->clone_file_range) { > + if (inode_in->i_sb == inode_out->i_sb && > + file_in->f_op->clone_file_range) { This reads weirdly to me. I know it's the same order the tests were done in before, but it would feel more natural to me to test: if (file_in->f_op->clone_file_range && inode_in->i_sb == inode_out->i_sb) Am I just suffering from "I would have done this differently"itis, or is it unnatural? > @@ -1600,10 +1597,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > } > } > > - if (file_out->f_op->copy_file_range) { > + if (file_out->f_op->copy_file_range && > + (file_in->f_op->copy_file_range == > + file_out->f_op->copy_file_range)) { Can we avoid this extra test here? I know the various stamdards groups including T10 and the IETF have been trying to define a universal identifier for the same blob of storage, no matter how it's accessed; potentially allowing access to the same storage across iSCSI, CIFS and NFS. If we ever get to a point where we support that (and I am dubious), we'd want to remove this test again, and have to revalidate all the filesystems.
On Fri, Oct 19, 2018 at 1:58 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote: > > +++ b/Documentation/filesystems/vfs.txt > > @@ -958,7 +958,9 @@ otherwise noted. > > > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > > > - copy_file_range: called by the copy_file_range(2) system call. > > + copy_file_range: called by copy_file_range(2) system call. This method > > + works on two file descriptors that might reside on > > + different superblocks of the same type of file system. > > I don't think this text is explicit enough about what has changed, Can you suggest a different wording that would be stronger? I can't say any more clear that copy is allowed between different superblock which wasn't the case before. > and I > think this is the wrong place for it. I think there should be a paragraph > in Documentation/filesystems/porting and it should follow the current style > in there. I have looked at the "porting" file and it's cryptic. I don't understand what [mandatory] [recommended] stanzas are. There is currently no copy_file_range. Is this just a "changelog" and you are looking for something like [mandatory] copy_file_range() now allows copying between different superblocks. I can add this wording to the "porting" file. > > @@ -1591,7 +1587,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > * Try cloning first, this is supported by more file systems, and > > * more efficient if both clone and copy are supported (e.g. NFS). > > */ > > - if (file_in->f_op->clone_file_range) { > > + if (inode_in->i_sb == inode_out->i_sb && > > + file_in->f_op->clone_file_range) { > > This reads weirdly to me. I know it's the same order the tests were done > in before, but it would feel more natural to me to test: > > if (file_in->f_op->clone_file_range && > inode_in->i_sb == inode_out->i_sb) > > Am I just suffering from "I would have done this differently"itis, or > is it unnatural? I really don't care one way or another as long as the functionality goes in. I can change it. > > @@ -1600,10 +1597,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > } > > } > > > > - if (file_out->f_op->copy_file_range) { > > + if (file_out->f_op->copy_file_range && > > + (file_in->f_op->copy_file_range == > > + file_out->f_op->copy_file_range)) { > > Can we avoid this extra test here? I know the various stamdards groups > including T10 and the IETF have been trying to define a universal > identifier for the same blob of storage, no matter how it's accessed; > potentially allowing access to the same storage across iSCSI, CIFS > and NFS. If we ever get to a point where we support that (and I am > dubious), we'd want to remove this test again, and have to revalidate > all the filesystems. Well from previous submissions I was explicitly asked to add this check.
On Fri, Oct 19, 2018 at 02:47:42PM -0400, Olga Kornievskaia wrote: > On Fri, Oct 19, 2018 at 1:58 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote: > > > +++ b/Documentation/filesystems/vfs.txt > > > @@ -958,7 +958,9 @@ otherwise noted. > > > > > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > > > > > - copy_file_range: called by the copy_file_range(2) system call. > > > + copy_file_range: called by copy_file_range(2) system call. This method > > > + works on two file descriptors that might reside on > > > + different superblocks of the same type of file system. > > > > I don't think this text is explicit enough about what has changed, > > Can you suggest a different wording that would be stronger? I can't > say any more clear that copy is allowed between different superblock > which wasn't the case before. I would leave this file alone. > > and I > > think this is the wrong place for it. I think there should be a paragraph > > in Documentation/filesystems/porting and it should follow the current style > > in there. > > I have looked at the "porting" file and it's cryptic. I don't > understand what [mandatory] [recommended] stanzas are. There is > currently no copy_file_range. Is this just a "changelog" and you are > looking for something like > [mandatory] > copy_file_range() now allows copying between different superblocks. > > I can add this wording to the "porting" file. The porting file is written from the point of view of someone who's trying to port an old filesystem to current Linux. Maybe something like [mandatory] ->copy_file_range() may now be passed files which belong to two different filesystems. The destination's copy_file_range() is the function which is called. If it cannot copy ranges from the source, it should return -EXDEV. > > > - if (file_out->f_op->copy_file_range) { > > > + if (file_out->f_op->copy_file_range && > > > + (file_in->f_op->copy_file_range == > > > + file_out->f_op->copy_file_range)) { > > > > Can we avoid this extra test here? I know the various stamdards groups > > including T10 and the IETF have been trying to define a universal > > identifier for the same blob of storage, no matter how it's accessed; > > potentially allowing access to the same storage across iSCSI, CIFS > > and NFS. If we ever get to a point where we support that (and I am > > dubious), we'd want to remove this test again, and have to revalidate > > all the filesystems. > > Well from previous submissions I was explicitly asked to add this check. I'm not sure that's exactly what Jeff was asking for. Or maybe it was and my argument above will change his mind. Jeff, what do you think? If we do do this, cifs will need a modification as part of this patch to reject non-CIFS files as it currently assumes that src_file->private_data is a pointer to a struct cifsFileInfo.
On Fri, Oct 19, 2018 at 11:24 AM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Fri, Oct 19, 2018 at 11:55 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > On Fri, Oct 19, 2018 at 6:30 PM Olga Kornievskaia > > <olga.kornievskaia@gmail.com> wrote: > > > > > > From: Olga Kornievskaia <kolga@netapp.com> > > > > > > Allow copy_file_range to copy between different superblocks but only > > > of the same file system types. This feature was of interest to CIFS > > > as well as NFS. > > > > > > This feature is needed by NFSv4.2 to perform file copy operation on > > > the same server or file copy between different NFSv4.2 servers. > > > > > > If a file system's fileoperations copy_file_range operation prohibits > > > cross-device copies, fall back to do_splice_direct. This would be > > > needed for the NFS (destination) server side implementation of the > > > file copy and currently for CIFS. > > > > > > Besides NFS, there is only 1 implementor of the copy_file_range FS > > > operation -- CIFS. CIFS assumes incoming file descriptors are both > > > CIFS but it will check if they are coming from different servers and > > > return error code to fall back to do_splice_direct. > > > > > > NFS will allow for copies between different NFS servers. > > > <snip> > > Maybe you can add the flag now for internal use - only nfsv4 will pass that > > flag to the vfs helper and system call will verify that flags == 0. > > Not only NFS wants it CIFS want it too. Allowing copy offload across distinct smb3 mounts would be very helpful for cifs.ko Yes - note that many servers that cifs.ko (SMB3) has to deal with have already broadly implemented copy offload (multiple flavors, not just the 'copy chunk' style) and 100s of millions of these systems (not just Windows and Samba) support them as long as the source and target of the copy are to the same server. It is the client side and lack of tools and to some extent cross-mount copies that is holding it back from being more widely used. One reason I have only implemented two of the copy offload mechanisms for SMB3 ('duplicate extents' and 'copy chunk') and not the third, the popular 'odx' (T10 style) copy offload is in part that the value of the 'odx' style is more when the source and target are different servers. Note that this is broadly implemented on servers even on other Unix (see e.g. Nexenta's description of server impmlementation of 'odx' style https://www.slideshare.net/gordonross/smb3-offload-data-transfer-odx) so the client API enhancements would be quickly useful for cifs.ko (SMB3 mounts) use case. -- Thanks, Steve
On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote: > Allow copy_file_range to copy between different superblocks but only > of the same file system types. This feature was of interest to CIFS > as well as NFS. > + if (file_out->f_op->copy_file_range && > + (file_in->f_op->copy_file_range == > + file_out->f_op->copy_file_range)) { That is seriously asking for trouble. If at some point we add a library function usable as ->copy_file_range() instance, this will turn into a hard-to-spot problem. Comparing methods like that is best avoided. If you want to compare fs types, do just that - it's not hard. Another potential issue here is the interplay with local filesystems using vfs_clone_file_prep_inodes() (or Darrick's series lifting that into generic code). There we assume the same block size on both sides; that is automatically true if they live on the same superblock, but with your changes it's no longer true.
On Sat, Oct 20, 2018 at 7:06 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote: > > Allow copy_file_range to copy between different superblocks but only > > of the same file system types. This feature was of interest to CIFS > > as well as NFS. > > > + if (file_out->f_op->copy_file_range && > > + (file_in->f_op->copy_file_range == > > + file_out->f_op->copy_file_range)) { > > That is seriously asking for trouble. If at some point we add > a library function usable as ->copy_file_range() instance, this > will turn into a hard-to-spot problem. > > Comparing methods like that is best avoided. If you want to compare > fs types, do just that - it's not hard. Another thing is the commit message claims to: "Allow copy_file_range to copy between different superblocks but only of the same file system types" But what the patch actually does is: "Allow copy_file_range() syscall to copy between different filesystems AND allow calling the filesystems' copy_file_range() method between different superblocks but only of the same file system types" It's probably OK and quite useful to do the former, but maybe man page should be fixed to explicitly mention that the copy is expected to work across filesystems since kernel version XXX (?) If you don't wish to change cross filesystem type behavior and only relax cross super block limitation, then you should replace the same inode->i_sb check above with same inode->i_sb->s_type check instead of doing the check only for calling the filesystem copy_file_range() method. > > Another potential issue here is the interplay with local filesystems > using vfs_clone_file_prep_inodes() (or Darrick's series lifting that > into generic code). There we assume the same block size on both > sides; that is automatically true if they live on the same superblock, > but with your changes it's no longer true. Heh? I don't see that Darrick's work has anything to do with copy_file_range(). It only touched vfs_copy_file_range() for the ->clone_file_range() call, which Olga's patch protects with same sb check. Thanks, Amir.
On Fri, 2018-10-19 at 12:06 -0700, Matthew Wilcox wrote: > On Fri, Oct 19, 2018 at 02:47:42PM -0400, Olga Kornievskaia wrote: > > On Fri, Oct 19, 2018 at 1:58 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote: > > > > +++ b/Documentation/filesystems/vfs.txt > > > > @@ -958,7 +958,9 @@ otherwise noted. > > > > > > > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > > > > > > > - copy_file_range: called by the copy_file_range(2) system call. > > > > + copy_file_range: called by copy_file_range(2) system call. This method > > > > + works on two file descriptors that might reside on > > > > + different superblocks of the same type of file system. > > > > > > I don't think this text is explicit enough about what has changed, > > > > Can you suggest a different wording that would be stronger? I can't > > say any more clear that copy is allowed between different superblock > > which wasn't the case before. > > I would leave this file alone. > > > > and I > > > think this is the wrong place for it. I think there should be a paragraph > > > in Documentation/filesystems/porting and it should follow the current style > > > in there. > > > > I have looked at the "porting" file and it's cryptic. I don't > > understand what [mandatory] [recommended] stanzas are. There is > > currently no copy_file_range. Is this just a "changelog" and you are > > looking for something like > > [mandatory] > > copy_file_range() now allows copying between different superblocks. > > > > I can add this wording to the "porting" file. > > The porting file is written from the point of view of someone who's trying > to port an old filesystem to current Linux. > > Maybe something like > > [mandatory] > ->copy_file_range() may now be passed files which belong to two > different filesystems. The destination's copy_file_range() is the > function which is called. If it cannot copy ranges from the source, > it should return -EXDEV. > > > > > - if (file_out->f_op->copy_file_range) { > > > > + if (file_out->f_op->copy_file_range && > > > > + (file_in->f_op->copy_file_range == > > > > + file_out->f_op->copy_file_range)) { > > > > > > Can we avoid this extra test here? I know the various stamdards groups > > > including T10 and the IETF have been trying to define a universal > > > identifier for the same blob of storage, no matter how it's accessed; > > > potentially allowing access to the same storage across iSCSI, CIFS > > > and NFS. If we ever get to a point where we support that (and I am > > > dubious), we'd want to remove this test again, and have to revalidate > > > all the filesystems. > > > > Well from previous submissions I was explicitly asked to add this check. > > I'm not sure that's exactly what Jeff was asking for. Or maybe it was > and my argument above will change his mind. Jeff, what do you think? > > If we do do this, cifs will need a modification as part of this patch to > reject non-CIFS files as it currently assumes that src_file->private_data > is a pointer to a struct cifsFileInfo. My suggestion to Olga was more of a "if you feel you have to have a check like this at the vfs layer...", but I think you and Al have convinced me that comparing operations like this is not a good idea. I still think that this check belongs down inside the fs copy_file_range operation itself. That allows the the freedom to implement xdev copies on a per-fs basis later without needing to alter vfs-level code.
On Fri, 2018-10-19 at 10:58 -0700, Matthew Wilcox wrote: > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote: > > +++ b/Documentation/filesystems/vfs.txt > > @@ -958,7 +958,9 @@ otherwise noted. > > > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > > > - copy_file_range: called by the copy_file_range(2) system call. > > + copy_file_range: called by copy_file_range(2) system call. This method > > + works on two file descriptors that might reside on > > + different superblocks of the same type of file system. > > I don't think this text is explicit enough about what has changed, and I > think this is the wrong place for it. I think there should be a paragraph > in Documentation/filesystems/porting and it should follow the current style > in there. > > > @@ -1591,7 +1587,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > * Try cloning first, this is supported by more file systems, and > > * more efficient if both clone and copy are supported (e.g. NFS). > > */ > > - if (file_in->f_op->clone_file_range) { > > + if (inode_in->i_sb == inode_out->i_sb && > > + file_in->f_op->clone_file_range) { > > This reads weirdly to me. I know it's the same order the tests were done > in before, but it would feel more natural to me to test: > > if (file_in->f_op->clone_file_range && > inode_in->i_sb == inode_out->i_sb) > > Am I just suffering from "I would have done this differently"itis, or > is it unnatural? > > > @@ -1600,10 +1597,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, > > } > > } > > > > - if (file_out->f_op->copy_file_range) { > > + if (file_out->f_op->copy_file_range && > > + (file_in->f_op->copy_file_range == > > + file_out->f_op->copy_file_range)) { > > Can we avoid this extra test here? I know the various stamdards groups > including T10 and the IETF have been trying to define a universal > identifier for the same blob of storage, no matter how it's accessed; > potentially allowing access to the same storage across iSCSI, CIFS > and NFS. If we ever get to a point where we support that (and I am > dubious), we'd want to remove this test again, and have to revalidate > all the filesystems. > Agreed. I think this really ought to be left up to the lower fs. Pass the op both struct file pointers and let it sort it out. We just need to document the expectation that file copy_file_range ops vet file_in carefully as it could be from anything, and have them return an appropriate error code if it's not something they can deal with.
On Fri, Oct 19, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Fri, Oct 19, 2018 at 02:47:42PM -0400, Olga Kornievskaia wrote: > > On Fri, Oct 19, 2018 at 1:58 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote: > > > > +++ b/Documentation/filesystems/vfs.txt > > > > @@ -958,7 +958,9 @@ otherwise noted. > > > > > > > > fallocate: called by the VFS to preallocate blocks or punch a hole. > > > > > > > > - copy_file_range: called by the copy_file_range(2) system call. > > > > + copy_file_range: called by copy_file_range(2) system call. This method > > > > + works on two file descriptors that might reside on > > > > + different superblocks of the same type of file system. > > > > > > I don't think this text is explicit enough about what has changed, > > > > Can you suggest a different wording that would be stronger? I can't > > say any more clear that copy is allowed between different superblock > > which wasn't the case before. > > I would leave this file alone. > > > > and I > > > think this is the wrong place for it. I think there should be a paragraph > > > in Documentation/filesystems/porting and it should follow the current style > > > in there. > > > > I have looked at the "porting" file and it's cryptic. I don't > > understand what [mandatory] [recommended] stanzas are. There is > > currently no copy_file_range. Is this just a "changelog" and you are > > looking for something like > > [mandatory] > > copy_file_range() now allows copying between different superblocks. > > > > I can add this wording to the "porting" file. > > The porting file is written from the point of view of someone who's trying > to port an old filesystem to current Linux. > > Maybe something like > > [mandatory] > ->copy_file_range() may now be passed files which belong to two > different filesystems. The destination's copy_file_range() is the > function which is called. If it cannot copy ranges from the source, > it should return -EXDEV. Thank you I can add this to the porting file. > > > > > - if (file_out->f_op->copy_file_range) { > > > > + if (file_out->f_op->copy_file_range && > > > > + (file_in->f_op->copy_file_range == > > > > + file_out->f_op->copy_file_range)) { > > > > > > Can we avoid this extra test here? I know the various stamdards groups > > > including T10 and the IETF have been trying to define a universal > > > identifier for the same blob of storage, no matter how it's accessed; > > > potentially allowing access to the same storage across iSCSI, CIFS > > > and NFS. If we ever get to a point where we support that (and I am > > > dubious), we'd want to remove this test again, and have to revalidate > > > all the filesystems. > > > > Well from previous submissions I was explicitly asked to add this check. > > I'm not sure that's exactly what Jeff was asking for. Or maybe it was > and my argument above will change his mind. Jeff, what do you think? > > If we do do this, cifs will need a modification as part of this patch to > reject non-CIFS files as it currently assumes that src_file->private_data > is a pointer to a struct cifsFileInfo.
On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Sat, Oct 20, 2018 at 7:06 AM Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > On Fri, Oct 19, 2018 at 11:30:18AM -0400, Olga Kornievskaia wrote: > > > Allow copy_file_range to copy between different superblocks but only > > > of the same file system types. This feature was of interest to CIFS > > > as well as NFS. > > > > > + if (file_out->f_op->copy_file_range && > > > + (file_in->f_op->copy_file_range == > > > + file_out->f_op->copy_file_range)) { > > > > That is seriously asking for trouble. If at some point we add > > a library function usable as ->copy_file_range() instance, this > > will turn into a hard-to-spot problem. > > > > Comparing methods like that is best avoided. If you want to compare > > fs types, do just that - it's not hard. > > Another thing is the commit message claims to: > "Allow copy_file_range to copy between different superblocks but only > of the same file system types" > > But what the patch actually does is: > "Allow copy_file_range() syscall to copy between different filesystems > AND allow calling the filesystems' copy_file_range() method > between different superblocks but only of the same file system types" > > It's probably OK and quite useful to do the former, but maybe man page > should be fixed to explicitly mention that the copy is expected to work > across filesystems since kernel version XXX (?) > > If you don't wish to change cross filesystem type behavior and only > relax cross super block limitation, then you should replace the > same inode->i_sb check above with same inode->i_sb->s_type > check instead of doing the check only for calling the filesystem > copy_file_range() method. Thank you for the feedback. In the next version, I will remove the check for the functions and instead check for the same file system types. > > > > > Another potential issue here is the interplay with local filesystems > > using vfs_clone_file_prep_inodes() (or Darrick's series lifting that > > into generic code). There we assume the same block size on both > > sides; that is automatically true if they live on the same superblock, > > but with your changes it's no longer true. > > Heh? I don't see that Darrick's work has anything to do with > copy_file_range(). It only touched vfs_copy_file_range() for > the ->clone_file_range() call, which Olga's patch protects with same sb > check. > > Thanks, > Amir.
On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > Another thing is the commit message claims to: > > "Allow copy_file_range to copy between different superblocks but only > > of the same file system types" > > > > But what the patch actually does is: > > "Allow copy_file_range() syscall to copy between different filesystems > > AND allow calling the filesystems' copy_file_range() method > > between different superblocks but only of the same file system types" > > > > It's probably OK and quite useful to do the former, but maybe man page > > should be fixed to explicitly mention that the copy is expected to work > > across filesystems since kernel version XXX (?) > > > > If you don't wish to change cross filesystem type behavior and only > > relax cross super block limitation, then you should replace the > > same inode->i_sb check above with same inode->i_sb->s_type > > check instead of doing the check only for calling the filesystem > > copy_file_range() method. > > Thank you for the feedback. In the next version, I will remove the > check for the functions and instead check for the same file system > types. Jeff and I agree that this is the wrong way to go. Instead, the cross-device check should be in the individual instances, not the top level code.
On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > Another thing is the commit message claims to: > > > "Allow copy_file_range to copy between different superblocks but only > > > of the same file system types" > > > > > > But what the patch actually does is: > > > "Allow copy_file_range() syscall to copy between different filesystems > > > AND allow calling the filesystems' copy_file_range() method > > > between different superblocks but only of the same file system types" > > > > > > It's probably OK and quite useful to do the former, but maybe man page > > > should be fixed to explicitly mention that the copy is expected to work > > > across filesystems since kernel version XXX (?) > > > > > > If you don't wish to change cross filesystem type behavior and only > > > relax cross super block limitation, then you should replace the > > > same inode->i_sb check above with same inode->i_sb->s_type > > > check instead of doing the check only for calling the filesystem > > > copy_file_range() method. > > > > Thank you for the feedback. In the next version, I will remove the > > check for the functions and instead check for the same file system > > types. > > Jeff and I agree that this is the wrong way to go. Instead, the > cross-device check should be in the individual instances, not the top > level code. So remove the check all together for the VFS (that was my original patch to begin with (like #1 not this one). So am I missing the point again, I keep getting different corrections every time.
On Mon, Oct 22, 2018 at 10:35 PM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > Another thing is the commit message claims to: > > > > "Allow copy_file_range to copy between different superblocks but only > > > > of the same file system types" > > > > > > > > But what the patch actually does is: > > > > "Allow copy_file_range() syscall to copy between different filesystems > > > > AND allow calling the filesystems' copy_file_range() method > > > > between different superblocks but only of the same file system types" > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page > > > > should be fixed to explicitly mention that the copy is expected to work > > > > across filesystems since kernel version XXX (?) > > > > > > > > If you don't wish to change cross filesystem type behavior and only > > > > relax cross super block limitation, then you should replace the > > > > same inode->i_sb check above with same inode->i_sb->s_type > > > > check instead of doing the check only for calling the filesystem > > > > copy_file_range() method. > > > > > > Thank you for the feedback. In the next version, I will remove the > > > check for the functions and instead check for the same file system > > > types. > > > > Jeff and I agree that this is the wrong way to go. Instead, the > > cross-device check should be in the individual instances, not the top > > level code. > > So remove the check all together for the VFS (that was my original > patch to begin with (like #1 not this one). So am I missing the point > again, I keep getting different corrections every time. Because there are different opinions... although you did get the opinion of the VFS maintainer, which was: compare i_sb->s_type. Jeff, Matthew, really, what's the use of "allowing" cross fs type copy inside filesystem code? and which method is going to be called? file_out->f_op->copy_file_range()? file_in->f_op->copy_file_range()? Do we need to check if both are implemented? either? This is just confusing Olga and gives no real value to anyone. If we ever have a filesystem copy_file_range() method that can deal with cross fs type copy, we can change it then when we know the required semantics of that future call. That is not to say that we cannot relax same fs type from copy_file_range() syscall. That has already been done with the current patch, just not officially declared in commit message. Thanks, Amir.
On Mon, Oct 22, 2018 at 10:48:10PM +0300, Amir Goldstein wrote: > On Mon, Oct 22, 2018 at 10:35 PM Olga Kornievskaia > <olga.kornievskaia@gmail.com> wrote: > > So remove the check all together for the VFS (that was my original > > patch to begin with (like #1 not this one). So am I missing the point > > again, I keep getting different corrections every time. > > Because there are different opinions... although you did get the opinion > of the VFS maintainer, which was: compare i_sb->s_type. > > Jeff, Matthew, really, what's the use of "allowing" cross fs type copy inside > filesystem code? and which method is going to be called? > file_out->f_op->copy_file_range()? > file_in->f_op->copy_file_range()? The destination's method, as Olga originally had. > Do we need to check if both are implemented? either? > This is just confusing Olga and gives no real value to anyone. > If we ever have a filesystem copy_file_range() method that can deal > with cross fs type copy, we can change it then when we know the > required semantics of that future call. Wrong. Go back and read my reasoning earlier this thread. > That is not to say that we cannot relax same fs type from copy_file_range() > syscall. That has already been done with the current patch, just not officially > declared in commit message. > > Thanks, > Amir.
On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote: > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > Another thing is the commit message claims to: > > > > "Allow copy_file_range to copy between different superblocks but only > > > > of the same file system types" > > > > > > > > But what the patch actually does is: > > > > "Allow copy_file_range() syscall to copy between different filesystems > > > > AND allow calling the filesystems' copy_file_range() method > > > > between different superblocks but only of the same file system types" > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page > > > > should be fixed to explicitly mention that the copy is expected to work > > > > across filesystems since kernel version XXX (?) > > > > > > > > If you don't wish to change cross filesystem type behavior and only > > > > relax cross super block limitation, then you should replace the > > > > same inode->i_sb check above with same inode->i_sb->s_type > > > > check instead of doing the check only for calling the filesystem > > > > copy_file_range() method. > > > > > > Thank you for the feedback. In the next version, I will remove the > > > check for the functions and instead check for the same file system > > > types. > > > > Jeff and I agree that this is the wrong way to go. Instead, the > > cross-device check should be in the individual instances, not the top > > level code. > > So remove the check all together for the VFS (that was my original > patch to begin with (like #1 not this one). So am I missing the point > again, I keep getting different corrections every time. Sorry if I wasn't clear before: Basically, I think Willy and I are both envisioning that some copy_file_range implementations may eventually not be subject to the limitations of the checks you're adding. All of the current and currently proposed ones are however, so we need these checks in place. We have two options. We can either do them globally at the vfs layer or in the individual filesystem "drivers". I argue that it's better to do it at the driver level, because if we ever want to implement one that is not subject to these limitations, you'll effectively have to push these checks down into the drivers later. That's where things become ugly for backports and such. You'll basically be changing a subtle expectation in the driver interface, which could be a source of bugs later. If we set the expectation now that the drivers need to do this checking then we can (hopefully) ensure that they all do before they're ever merged.
On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote: > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote: > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > Another thing is the commit message claims to: > > > > > "Allow copy_file_range to copy between different superblocks but only > > > > > of the same file system types" > > > > > > > > > > But what the patch actually does is: > > > > > "Allow copy_file_range() syscall to copy between different filesystems > > > > > AND allow calling the filesystems' copy_file_range() method > > > > > between different superblocks but only of the same file system types" > > > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page > > > > > should be fixed to explicitly mention that the copy is expected to work > > > > > across filesystems since kernel version XXX (?) > > > > > > > > > > If you don't wish to change cross filesystem type behavior and only > > > > > relax cross super block limitation, then you should replace the > > > > > same inode->i_sb check above with same inode->i_sb->s_type > > > > > check instead of doing the check only for calling the filesystem > > > > > copy_file_range() method. > > > > > > > > Thank you for the feedback. In the next version, I will remove the > > > > check for the functions and instead check for the same file system > > > > types. > > > > > > Jeff and I agree that this is the wrong way to go. Instead, the > > > cross-device check should be in the individual instances, not the top > > > level code. > > > > So remove the check all together for the VFS (that was my original > > patch to begin with (like #1 not this one). So am I missing the point > > again, I keep getting different corrections every time. > > Sorry if I wasn't clear before: > > Basically, I think Willy and I are both envisioning that some > copy_file_range implementations may eventually not be subject to the > limitations of the checks you're adding. > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever happening. Changing the interface as Matthew proposed has a price and we need to compare this price to the alleged backporting price that nobody may ever need to pay. As far as I can tell, passing a struct file * on a file_operations method that does not belong to that filesystem in unprecedented (*) and is a far more lethal landmine than the alleged backporting landmine. (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but file_inode(file) has always belonged to the filesystem. Olga, I do not strongly object to Matthew's proposal, so don't feel obligated to choose my side of the argument. I am just trying to offer a different perspective. In any case, my outstanding concerns with the patch are: 1. If you change syscall to support cross fs type copy (which is good IMO) need to document that in commit message and possibly follow up later with a note in man page 2. Commit message says: "This feature was of interest of ... NFS" "This feature is needed by NFSv4.2..." "NFS will allow for copies between different NFS servers." It is not clear to me if we are talking about present of future NFSv4.2 code. If NFSv4.2 currently does not support cross sb copy (??) than your patch need to enforce same sb in nfs4_copy_file_range(). If it does support cross sb copy than please edit the commit message to make that clear. Thanks, Amir.
On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote: > > > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote: > > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > Another thing is the commit message claims to: > > > > > > "Allow copy_file_range to copy between different superblocks but only > > > > > > of the same file system types" > > > > > > > > > > > > But what the patch actually does is: > > > > > > "Allow copy_file_range() syscall to copy between different filesystems > > > > > > AND allow calling the filesystems' copy_file_range() method > > > > > > between different superblocks but only of the same file system types" > > > > > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page > > > > > > should be fixed to explicitly mention that the copy is expected to work > > > > > > across filesystems since kernel version XXX (?) > > > > > > > > > > > > If you don't wish to change cross filesystem type behavior and only > > > > > > relax cross super block limitation, then you should replace the > > > > > > same inode->i_sb check above with same inode->i_sb->s_type > > > > > > check instead of doing the check only for calling the filesystem > > > > > > copy_file_range() method. > > > > > > > > > > Thank you for the feedback. In the next version, I will remove the > > > > > check for the functions and instead check for the same file system > > > > > types. > > > > > > > > Jeff and I agree that this is the wrong way to go. Instead, the > > > > cross-device check should be in the individual instances, not the top > > > > level code. > > > > > > So remove the check all together for the VFS (that was my original > > > patch to begin with (like #1 not this one). So am I missing the point > > > again, I keep getting different corrections every time. > > > > Sorry if I wasn't clear before: > > > > Basically, I think Willy and I are both envisioning that some > > copy_file_range implementations may eventually not be subject to the > > limitations of the checks you're adding. > > > > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever > happening. Changing the interface as Matthew proposed has a price > and we need to compare this price to the alleged backporting price > that nobody may ever need to pay. > > As far as I can tell, passing a struct file * on a file_operations method > that does not belong to that filesystem in unprecedented (*) and is a far > more lethal landmine than the alleged backporting landmine. > > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but > file_inode(file) has always belonged to the filesystem. > > Olga, > > I do not strongly object to Matthew's proposal, so don't feel > obligated to choose my side of the argument. I am just trying > to offer a different perspective. > > In any case, my outstanding concerns with the patch are: > > 1. If you change syscall to support cross fs type copy (which is > good IMO) need to document that in commit message > and possibly follow up later with a note in man page > > 2. Commit message says: > "This feature was of interest of ... NFS" > "This feature is needed by NFSv4.2..." > "NFS will allow for copies between different NFS servers." > It is not clear to me if we are talking about present of future > NFSv4.2 code. If NFSv4.2 currently does not support cross > sb copy (??) than your patch need to enforce same sb > in nfs4_copy_file_range(). If it does support cross sb copy > than please edit the commit message to make that clear. I personally agree with Amir. I think it's far fetched that a file system would know how to handle something that's not of its type. When the copy_file_range() was checked in, there was a comment above the superblock check saying "we might want to relax this in the future". It deemed appropriate then to enforce the check since none of the file systems used it. Now, the future is here, and we are removing the check but proposing a different once because again the future isn't here and having a single check simplifies the code. But I don't feel strongly about the check (or rather the location of it VFS vs each FS) and what I ultimately need is to removed same sb check. It sounds like if Amir isn't objecting, then the check for same file system type should be removed from VFS. And, for each of the file systems that currently support copy_file_range() -- CIFS, NFS, and overlayfs -- I need to modify them and add a check for the same fs_type. Amir to answer your question, only NFSv4.2 has copy_offload functionality (not earlier NFS versions). Furthermore, existing upstream only supports same sb copy offload. What this patch series adds is support for copy offload across different superblocks but NFS will not support (and would need a check) for copy offload across different file system types. Also I kinda stand behind the ideas stated: 1) this is of interest to NFS (where NFS here is to represent a community, and CIFS is used to represent the other community). 2) NFSv4.2 copy offload a specific feature that needs this functionality. 3rd statement is bad. Only NFSv4.2 will allow copies between different NFS servers (ie., after this patch +series), the emphasis was on "will allow" meaning what this patch will allow to be done (ie, patch's purpose). Or also, if the NFS server exports different exports, then a copy between them (assuming exports of the same file system type). In the next version of the patch, I'll do my best to specified what changed as the consequence of removing the cross sb check (ie, file system types of the passed in file can be from different file systems). I will add wording to the man page and add the suggested wording to the "porting" file.
On Tue, Oct 23, 2018 at 11:03 AM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote: > > > > > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote: > > > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > > > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > Another thing is the commit message claims to: > > > > > > > "Allow copy_file_range to copy between different superblocks but only > > > > > > > of the same file system types" > > > > > > > > > > > > > > But what the patch actually does is: > > > > > > > "Allow copy_file_range() syscall to copy between different filesystems > > > > > > > AND allow calling the filesystems' copy_file_range() method > > > > > > > between different superblocks but only of the same file system types" > > > > > > > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page > > > > > > > should be fixed to explicitly mention that the copy is expected to work > > > > > > > across filesystems since kernel version XXX (?) > > > > > > > > > > > > > > If you don't wish to change cross filesystem type behavior and only > > > > > > > relax cross super block limitation, then you should replace the > > > > > > > same inode->i_sb check above with same inode->i_sb->s_type > > > > > > > check instead of doing the check only for calling the filesystem > > > > > > > copy_file_range() method. > > > > > > > > > > > > Thank you for the feedback. In the next version, I will remove the > > > > > > check for the functions and instead check for the same file system > > > > > > types. > > > > > > > > > > Jeff and I agree that this is the wrong way to go. Instead, the > > > > > cross-device check should be in the individual instances, not the top > > > > > level code. > > > > > > > > So remove the check all together for the VFS (that was my original > > > > patch to begin with (like #1 not this one). So am I missing the point > > > > again, I keep getting different corrections every time. > > > > > > Sorry if I wasn't clear before: > > > > > > Basically, I think Willy and I are both envisioning that some > > > copy_file_range implementations may eventually not be subject to the > > > limitations of the checks you're adding. > > > > > > > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever > > happening. Changing the interface as Matthew proposed has a price > > and we need to compare this price to the alleged backporting price > > that nobody may ever need to pay. > > > > As far as I can tell, passing a struct file * on a file_operations method > > that does not belong to that filesystem in unprecedented (*) and is a far > > more lethal landmine than the alleged backporting landmine. > > > > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but > > file_inode(file) has always belonged to the filesystem. > > > > Olga, > > > > I do not strongly object to Matthew's proposal, so don't feel > > obligated to choose my side of the argument. I am just trying > > to offer a different perspective. > > > > In any case, my outstanding concerns with the patch are: > > > > 1. If you change syscall to support cross fs type copy (which is > > good IMO) need to document that in commit message > > and possibly follow up later with a note in man page > > > > 2. Commit message says: > > "This feature was of interest of ... NFS" > > "This feature is needed by NFSv4.2..." > > "NFS will allow for copies between different NFS servers." > > It is not clear to me if we are talking about present of future > > NFSv4.2 code. If NFSv4.2 currently does not support cross > > sb copy (??) than your patch need to enforce same sb > > in nfs4_copy_file_range(). If it does support cross sb copy > > than please edit the commit message to make that clear. > > I personally agree with Amir. I think it's far fetched that a file > system would know how to handle something that's not of its type. When > the copy_file_range() was checked in, there was a comment above the > superblock check saying "we might want to relax this in the future". > It deemed appropriate then to enforce the check since none of the file > systems used it. Now, the future is here, and we are removing the > check but proposing a different once because again the future isn't > here and having a single check simplifies the code. Sorry Ok I wrote this too fast. I think I'm changing my mind and siding with the check by the file system. > But I don't feel strongly about the check (or rather the location of > it VFS vs each FS) and what I ultimately need is to removed same sb > check. It sounds like if Amir isn't objecting, then the check for same > file system type should be removed from VFS. And, for each of the file > systems that currently support copy_file_range() -- CIFS, NFS, and > overlayfs -- I need to modify them and add a check for the same > fs_type. > > Amir to answer your question, only NFSv4.2 has copy_offload > functionality (not earlier NFS versions). Furthermore, existing > upstream only supports same sb copy offload. What this patch series > adds is support for copy offload across different superblocks but NFS > will not support (and would need a check) for copy offload across > different file system types. Also I kinda stand behind the ideas > stated: 1) this is of interest to NFS (where NFS here is to represent > a community, and CIFS is used to represent the other community). 2) > NFSv4.2 copy offload a specific feature that needs this functionality. > 3rd statement is bad. Only NFSv4.2 will allow copies between different > NFS servers (ie., after this patch +series), the emphasis was on "will > allow" meaning what this patch will allow to be done (ie, patch's > purpose). Or also, if the NFS server exports different exports, then a > copy between them (assuming exports of the same file system type). > > In the next version of the patch, I'll do my best to specified what > changed as the consequence of removing the cross sb check (ie, file > system types of the passed in file can be from different file > systems). I will add wording to the man page and add the suggested > wording to the "porting" file.
On Tue, Oct 23, 2018 at 11:03:02AM -0400, Olga Kornievskaia wrote: > On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote: > > > Sorry if I wasn't clear before: > > > > > > Basically, I think Willy and I are both envisioning that some > > > copy_file_range implementations may eventually not be subject to the > > > limitations of the checks you're adding. > > > > > > > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever > > happening. Changing the interface as Matthew proposed has a price > > and we need to compare this price to the alleged backporting price > > that nobody may ever need to pay. > > > > As far as I can tell, passing a struct file * on a file_operations method > > that does not belong to that filesystem in unprecedented (*) and is a far > > more lethal landmine than the alleged backporting landmine. > > > > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but > > file_inode(file) has always belonged to the filesystem. > > > > Olga, > > > > I do not strongly object to Matthew's proposal, so don't feel > > obligated to choose my side of the argument. I am just trying > > to offer a different perspective. > > > > In any case, my outstanding concerns with the patch are: > > > > 1. If you change syscall to support cross fs type copy (which is > > good IMO) need to document that in commit message > > and possibly follow up later with a note in man page > > > > 2. Commit message says: > > "This feature was of interest of ... NFS" > > "This feature is needed by NFSv4.2..." > > "NFS will allow for copies between different NFS servers." > > It is not clear to me if we are talking about present of future > > NFSv4.2 code. If NFSv4.2 currently does not support cross > > sb copy (??) than your patch need to enforce same sb > > in nfs4_copy_file_range(). If it does support cross sb copy > > than please edit the commit message to make that clear. > > I personally agree with Amir. I think it's far fetched that a file > system would know how to handle something that's not of its type. When > the copy_file_range() was checked in, there was a comment above the > superblock check saying "we might want to relax this in the future". > It deemed appropriate then to enforce the check since none of the file > systems used it. Now, the future is here, and we are removing the > check but proposing a different once because again the future isn't > here and having a single check simplifies the code. I've done some more research and found that while NFSv4.2 has its own representation of a copyable range of a file, iSCSI and SMB3 share the same ROD. So it's not at all implausible that some other filesystem might also decide to use the same ROD, perhaps even NFS v4.3. It's clearly crazy to expect filesystem A to know about all the interpretations of filesystem B's file->private. I'd expect us to add a function like: int vfs_get_rod(struct file *src, struct file_rod *rod); and then a filesystem's copy_file_range() would check to see if both src and dest referred to the same server, and if not call vfs_get_rod() to be able to send the ROD to the destination.
On Tue, Oct 23, 2018 at 11:30 AM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Tue, Oct 23, 2018 at 11:03 AM Olga Kornievskaia > <olga.kornievskaia@gmail.com> wrote: > > > > On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote: > > > > > > > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote: > > > > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > > > > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > > > > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > Another thing is the commit message claims to: > > > > > > > > "Allow copy_file_range to copy between different superblocks but only > > > > > > > > of the same file system types" > > > > > > > > > > > > > > > > But what the patch actually does is: > > > > > > > > "Allow copy_file_range() syscall to copy between different filesystems > > > > > > > > AND allow calling the filesystems' copy_file_range() method > > > > > > > > between different superblocks but only of the same file system types" > > > > > > > > > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page > > > > > > > > should be fixed to explicitly mention that the copy is expected to work > > > > > > > > across filesystems since kernel version XXX (?) > > > > > > > > > > > > > > > > If you don't wish to change cross filesystem type behavior and only > > > > > > > > relax cross super block limitation, then you should replace the > > > > > > > > same inode->i_sb check above with same inode->i_sb->s_type > > > > > > > > check instead of doing the check only for calling the filesystem > > > > > > > > copy_file_range() method. > > > > > > > > > > > > > > Thank you for the feedback. In the next version, I will remove the > > > > > > > check for the functions and instead check for the same file system > > > > > > > types. > > > > > > > > > > > > Jeff and I agree that this is the wrong way to go. Instead, the > > > > > > cross-device check should be in the individual instances, not the top > > > > > > level code. > > > > > > > > > > So remove the check all together for the VFS (that was my original > > > > > patch to begin with (like #1 not this one). So am I missing the point > > > > > again, I keep getting different corrections every time. > > > > > > > > Sorry if I wasn't clear before: > > > > > > > > Basically, I think Willy and I are both envisioning that some > > > > copy_file_range implementations may eventually not be subject to the > > > > limitations of the checks you're adding. > > > > > > > > > > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever > > > happening. Changing the interface as Matthew proposed has a price > > > and we need to compare this price to the alleged backporting price > > > that nobody may ever need to pay. > > > > > > As far as I can tell, passing a struct file * on a file_operations method > > > that does not belong to that filesystem in unprecedented (*) and is a far > > > more lethal landmine than the alleged backporting landmine. > > > > > > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but > > > file_inode(file) has always belonged to the filesystem. > > > > > > Olga, > > > > > > I do not strongly object to Matthew's proposal, so don't feel > > > obligated to choose my side of the argument. I am just trying > > > to offer a different perspective. > > > > > > In any case, my outstanding concerns with the patch are: > > > > > > 1. If you change syscall to support cross fs type copy (which is > > > good IMO) need to document that in commit message > > > and possibly follow up later with a note in man page > > > > > > 2. Commit message says: > > > "This feature was of interest of ... NFS" > > > "This feature is needed by NFSv4.2..." > > > "NFS will allow for copies between different NFS servers." > > > It is not clear to me if we are talking about present of future > > > NFSv4.2 code. If NFSv4.2 currently does not support cross > > > sb copy (??) than your patch need to enforce same sb > > > in nfs4_copy_file_range(). If it does support cross sb copy > > > than please edit the commit message to make that clear. > > > > I personally agree with Amir. I think it's far fetched that a file > > system would know how to handle something that's not of its type. When > > the copy_file_range() was checked in, there was a comment above the > > superblock check saying "we might want to relax this in the future". > > It deemed appropriate then to enforce the check since none of the file > > systems used it. Now, the future is here, and we are removing the > > check but proposing a different once because again the future isn't > > here and having a single check simplifies the code. > > Sorry Ok I wrote this too fast. I think I'm changing my mind and > siding with the check by the file system. The reason I think we should remove the check all together is we'd allow the callers of copy_file_range() to utilize the do_splice_direct() between file system types even when copy_file_range() doesn't support cross fs copy. Isn't that beneficial? > > But I don't feel strongly about the check (or rather the location of > > it VFS vs each FS) and what I ultimately need is to removed same sb > > check. It sounds like if Amir isn't objecting, then the check for same > > file system type should be removed from VFS. And, for each of the file > > systems that currently support copy_file_range() -- CIFS, NFS, and > > overlayfs -- I need to modify them and add a check for the same > > fs_type. > > > > Amir to answer your question, only NFSv4.2 has copy_offload > > functionality (not earlier NFS versions). Furthermore, existing > > upstream only supports same sb copy offload. What this patch series > > adds is support for copy offload across different superblocks but NFS > > will not support (and would need a check) for copy offload across > > different file system types. Also I kinda stand behind the ideas > > stated: 1) this is of interest to NFS (where NFS here is to represent > > a community, and CIFS is used to represent the other community). 2) > > NFSv4.2 copy offload a specific feature that needs this functionality. > > 3rd statement is bad. Only NFSv4.2 will allow copies between different > > NFS servers (ie., after this patch +series), the emphasis was on "will > > allow" meaning what this patch will allow to be done (ie, patch's > > purpose). Or also, if the NFS server exports different exports, then a > > copy between them (assuming exports of the same file system type). > > > > In the next version of the patch, I'll do my best to specified what > > changed as the consequence of removing the cross sb check (ie, file > > system types of the passed in file can be from different file > > systems). I will add wording to the man page and add the suggested > > wording to the "porting" file.
On Tue, 2018-10-23 at 13:16 -0400, Olga Kornievskaia wrote: > On Tue, Oct 23, 2018 at 11:30 AM Olga Kornievskaia > <olga.kornievskaia@gmail.com> wrote: > > > > On Tue, Oct 23, 2018 at 11:03 AM Olga Kornievskaia > > <olga.kornievskaia@gmail.com> wrote: > > > > > > On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote: > > > > > > > > > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote: > > > > > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > > > > > > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > > > > > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > Another thing is the commit message claims to: > > > > > > > > > "Allow copy_file_range to copy between different superblocks but only > > > > > > > > > of the same file system types" > > > > > > > > > > > > > > > > > > But what the patch actually does is: > > > > > > > > > "Allow copy_file_range() syscall to copy between different filesystems > > > > > > > > > AND allow calling the filesystems' copy_file_range() method > > > > > > > > > between different superblocks but only of the same file system types" > > > > > > > > > > > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page > > > > > > > > > should be fixed to explicitly mention that the copy is expected to work > > > > > > > > > across filesystems since kernel version XXX (?) > > > > > > > > > > > > > > > > > > If you don't wish to change cross filesystem type behavior and only > > > > > > > > > relax cross super block limitation, then you should replace the > > > > > > > > > same inode->i_sb check above with same inode->i_sb->s_type > > > > > > > > > check instead of doing the check only for calling the filesystem > > > > > > > > > copy_file_range() method. > > > > > > > > > > > > > > > > Thank you for the feedback. In the next version, I will remove the > > > > > > > > check for the functions and instead check for the same file system > > > > > > > > types. > > > > > > > > > > > > > > Jeff and I agree that this is the wrong way to go. Instead, the > > > > > > > cross-device check should be in the individual instances, not the top > > > > > > > level code. > > > > > > > > > > > > So remove the check all together for the VFS (that was my original > > > > > > patch to begin with (like #1 not this one). So am I missing the point > > > > > > again, I keep getting different corrections every time. > > > > > > > > > > Sorry if I wasn't clear before: > > > > > > > > > > Basically, I think Willy and I are both envisioning that some > > > > > copy_file_range implementations may eventually not be subject to the > > > > > limitations of the checks you're adding. > > > > > > > > > > > > > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever > > > > happening. Changing the interface as Matthew proposed has a price > > > > and we need to compare this price to the alleged backporting price > > > > that nobody may ever need to pay. > > > > > > > > As far as I can tell, passing a struct file * on a file_operations method > > > > that does not belong to that filesystem in unprecedented (*) and is a far > > > > more lethal landmine than the alleged backporting landmine. > > > > > > > > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but > > > > file_inode(file) has always belonged to the filesystem. > > > > > > > > Olga, > > > > > > > > I do not strongly object to Matthew's proposal, so don't feel > > > > obligated to choose my side of the argument. I am just trying > > > > to offer a different perspective. > > > > > > > > In any case, my outstanding concerns with the patch are: > > > > > > > > 1. If you change syscall to support cross fs type copy (which is > > > > good IMO) need to document that in commit message > > > > and possibly follow up later with a note in man page > > > > > > > > 2. Commit message says: > > > > "This feature was of interest of ... NFS" > > > > "This feature is needed by NFSv4.2..." > > > > "NFS will allow for copies between different NFS servers." > > > > It is not clear to me if we are talking about present of future > > > > NFSv4.2 code. If NFSv4.2 currently does not support cross > > > > sb copy (??) than your patch need to enforce same sb > > > > in nfs4_copy_file_range(). If it does support cross sb copy > > > > than please edit the commit message to make that clear. > > > > > > I personally agree with Amir. I think it's far fetched that a file > > > system would know how to handle something that's not of its type. When > > > the copy_file_range() was checked in, there was a comment above the > > > superblock check saying "we might want to relax this in the future". > > > It deemed appropriate then to enforce the check since none of the file > > > systems used it. Now, the future is here, and we are removing the > > > check but proposing a different once because again the future isn't > > > here and having a single check simplifies the code. > > > > Sorry Ok I wrote this too fast. I think I'm changing my mind and > > siding with the check by the file system. > > The reason I think we should remove the check all together is we'd > allow the callers of copy_file_range() to utilize the > do_splice_direct() between file system types even when > copy_file_range() doesn't support cross fs copy. Isn't that > beneficial? Others might argue no, but I think it's good to have that option open. If the consensus is that we do need a check at the vfs layer to validate that the struct files come from the same fstype (as Amir suggests), then I can live with that. We've dealt with internal API changes before so we can do it again. Al is quite correct though that we really do want to compare fstypes rather than op pointers in that case however. > > > But I don't feel strongly about the check (or rather the location of > > > it VFS vs each FS) and what I ultimately need is to removed same sb > > > check. It sounds like if Amir isn't objecting, then the check for same > > > file system type should be removed from VFS. And, for each of the file > > > systems that currently support copy_file_range() -- CIFS, NFS, and > > > overlayfs -- I need to modify them and add a check for the same > > > fs_type. > > > > > > Amir to answer your question, only NFSv4.2 has copy_offload > > > functionality (not earlier NFS versions). Furthermore, existing > > > upstream only supports same sb copy offload. What this patch series > > > adds is support for copy offload across different superblocks but NFS > > > will not support (and would need a check) for copy offload across > > > different file system types. Also I kinda stand behind the ideas > > > stated: 1) this is of interest to NFS (where NFS here is to represent > > > a community, and CIFS is used to represent the other community). 2) > > > NFSv4.2 copy offload a specific feature that needs this functionality. > > > 3rd statement is bad. Only NFSv4.2 will allow copies between different > > > NFS servers (ie., after this patch +series), the emphasis was on "will > > > allow" meaning what this patch will allow to be done (ie, patch's > > > purpose). Or also, if the NFS server exports different exports, then a > > > copy between them (assuming exports of the same file system type). > > > > > > In the next version of the patch, I'll do my best to specified what > > > changed as the consequence of removing the cross sb check (ie, file > > > system types of the passed in file can be from different file > > > systems). I will add wording to the man page and add the suggested > > > wording to the "porting" file.
On Tue, Oct 23, 2018 at 6:39 PM Matthew Wilcox <willy@infradead.org> wrote: > [...] > I've done some more research and found that while NFSv4.2 has its own > representation of a copyable range of a file, iSCSI and SMB3 share the > same ROD. So it's not at all implausible that some other filesystem > might also decide to use the same ROD, perhaps even NFS v4.3. > > It's clearly crazy to expect filesystem A to know about all the > interpretations of filesystem B's file->private. I'd expect us to add > a function like: > > int vfs_get_rod(struct file *src, struct file_rod *rod); > > and then a filesystem's copy_file_range() would check to see if both > src and dest referred to the same server, and if not call vfs_get_rod() > to be able to send the ROD to the destination. > I am not saying cross filesystem type copy won't be possible. I'm just saying we are going to get the API wrong anyway. Surely, it is going to be either cifs or nfs42 that supports cross fs copy first, not both at the same development cycle, so now it seems flaky to use the out_file's copy_file_range() method. You'd want to figure out which of the in/out fs supports cross fs copy and call that method (not enough information). I suspect we will need a new cross_copy_file_range() method. And to answer Olga's question, yes, I do find cross fs copy with do_splice_direct() beneficial. Thanks, Amir.
On Wed, Oct 24, 2018 at 7:17 AM Jeff Layton <jlayton@redhat.com> wrote: > > On Tue, 2018-10-23 at 13:16 -0400, Olga Kornievskaia wrote: > > On Tue, Oct 23, 2018 at 11:30 AM Olga Kornievskaia > > <olga.kornievskaia@gmail.com> wrote: > > > > > > On Tue, Oct 23, 2018 at 11:03 AM Olga Kornievskaia > > > <olga.kornievskaia@gmail.com> wrote: > > > > > > > > On Tue, Oct 23, 2018 at 2:05 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > On Tue, Oct 23, 2018 at 2:39 AM Jeff Layton <jlayton@redhat.com> wrote: > > > > > > > > > > > > On Mon, 2018-10-22 at 15:34 -0400, Olga Kornievskaia wrote: > > > > > > > On Mon, Oct 22, 2018 at 3:06 PM Matthew Wilcox <willy@infradead.org> wrote: > > > > > > > > > > > > > > > > On Mon, Oct 22, 2018 at 02:45:04PM -0400, Olga Kornievskaia wrote: > > > > > > > > > On Sat, Oct 20, 2018 at 4:54 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > > > > > Another thing is the commit message claims to: > > > > > > > > > > "Allow copy_file_range to copy between different superblocks but only > > > > > > > > > > of the same file system types" > > > > > > > > > > > > > > > > > > > > But what the patch actually does is: > > > > > > > > > > "Allow copy_file_range() syscall to copy between different filesystems > > > > > > > > > > AND allow calling the filesystems' copy_file_range() method > > > > > > > > > > between different superblocks but only of the same file system types" > > > > > > > > > > > > > > > > > > > > It's probably OK and quite useful to do the former, but maybe man page > > > > > > > > > > should be fixed to explicitly mention that the copy is expected to work > > > > > > > > > > across filesystems since kernel version XXX (?) > > > > > > > > > > > > > > > > > > > > If you don't wish to change cross filesystem type behavior and only > > > > > > > > > > relax cross super block limitation, then you should replace the > > > > > > > > > > same inode->i_sb check above with same inode->i_sb->s_type > > > > > > > > > > check instead of doing the check only for calling the filesystem > > > > > > > > > > copy_file_range() method. > > > > > > > > > > > > > > > > > > Thank you for the feedback. In the next version, I will remove the > > > > > > > > > check for the functions and instead check for the same file system > > > > > > > > > types. > > > > > > > > > > > > > > > > Jeff and I agree that this is the wrong way to go. Instead, the > > > > > > > > cross-device check should be in the individual instances, not the top > > > > > > > > level code. > > > > > > > > > > > > > > So remove the check all together for the VFS (that was my original > > > > > > > patch to begin with (like #1 not this one). So am I missing the point > > > > > > > again, I keep getting different corrections every time. > > > > > > > > > > > > Sorry if I wasn't clear before: > > > > > > > > > > > > Basically, I think Willy and I are both envisioning that some > > > > > > copy_file_range implementations may eventually not be subject to the > > > > > > limitations of the checks you're adding. > > > > > > > > > > > > > > > > Yes. Eventually. And even Matthew is (quote) "dubious" about that ever > > > > > happening. Changing the interface as Matthew proposed has a price > > > > > and we need to compare this price to the alleged backporting price > > > > > that nobody may ever need to pay. > > > > > > > > > > As far as I can tell, passing a struct file * on a file_operations method > > > > > that does not belong to that filesystem in unprecedented (*) and is a far > > > > > more lethal landmine than the alleged backporting landmine. > > > > > > > > > > (*) prior to v4.19-rc1, filesystems could get an overlayfs file, but > > > > > file_inode(file) has always belonged to the filesystem. > > > > > > > > > > Olga, > > > > > > > > > > I do not strongly object to Matthew's proposal, so don't feel > > > > > obligated to choose my side of the argument. I am just trying > > > > > to offer a different perspective. > > > > > > > > > > In any case, my outstanding concerns with the patch are: > > > > > > > > > > 1. If you change syscall to support cross fs type copy (which is > > > > > good IMO) need to document that in commit message > > > > > and possibly follow up later with a note in man page > > > > > > > > > > 2. Commit message says: > > > > > "This feature was of interest of ... NFS" > > > > > "This feature is needed by NFSv4.2..." > > > > > "NFS will allow for copies between different NFS servers." > > > > > It is not clear to me if we are talking about present of future > > > > > NFSv4.2 code. If NFSv4.2 currently does not support cross > > > > > sb copy (??) than your patch need to enforce same sb > > > > > in nfs4_copy_file_range(). If it does support cross sb copy > > > > > than please edit the commit message to make that clear. > > > > > > > > I personally agree with Amir. I think it's far fetched that a file > > > > system would know how to handle something that's not of its type. When > > > > the copy_file_range() was checked in, there was a comment above the > > > > superblock check saying "we might want to relax this in the future". > > > > It deemed appropriate then to enforce the check since none of the file > > > > systems used it. Now, the future is here, and we are removing the > > > > check but proposing a different once because again the future isn't > > > > here and having a single check simplifies the code. > > > > > > Sorry Ok I wrote this too fast. I think I'm changing my mind and > > > siding with the check by the file system. > > > > The reason I think we should remove the check all together is we'd > > allow the callers of copy_file_range() to utilize the > > do_splice_direct() between file system types even when > > copy_file_range() doesn't support cross fs copy. Isn't that > > beneficial? > > Others might argue no, but I think it's good to have that option open. > > If the consensus is that we do need a check at the vfs layer to validate > that the struct files come from the same fstype (as Amir suggests), then > I can live with that. We've dealt with internal API changes before so we > can do it again. > > Al is quite correct though that we really do want to compare fstypes > rather than op pointers in that case however. It feels like folks are now ok with either the check being in the drivers or doing the check in the VFS layer. I'm picking the choice of not doing the check in the VFS layer because it allows for do_splice_direct() by any caller. I'm about to submit the new version of the patches (this time I will include the NFS patch series). We can continue with the discussion on the new version. I have added checks for the CIFS and OverlayFS to be consistent with the previous behavior -- no cross-device copy_offload, I assume if and when those file systems are ready to make use of it they'll remove the check. > > > > But I don't feel strongly about the check (or rather the location of > > > > it VFS vs each FS) and what I ultimately need is to removed same sb > > > > check. It sounds like if Amir isn't objecting, then the check for same > > > > file system type should be removed from VFS. And, for each of the file > > > > systems that currently support copy_file_range() -- CIFS, NFS, and > > > > overlayfs -- I need to modify them and add a check for the same > > > > fs_type. > > > > > > > > Amir to answer your question, only NFSv4.2 has copy_offload > > > > functionality (not earlier NFS versions). Furthermore, existing > > > > upstream only supports same sb copy offload. What this patch series > > > > adds is support for copy offload across different superblocks but NFS > > > > will not support (and would need a check) for copy offload across > > > > different file system types. Also I kinda stand behind the ideas > > > > stated: 1) this is of interest to NFS (where NFS here is to represent > > > > a community, and CIFS is used to represent the other community). 2) > > > > NFSv4.2 copy offload a specific feature that needs this functionality. > > > > 3rd statement is bad. Only NFSv4.2 will allow copies between different > > > > NFS servers (ie., after this patch +series), the emphasis was on "will > > > > allow" meaning what this patch will allow to be done (ie, patch's > > > > purpose). Or also, if the NFS server exports different exports, then a > > > > copy between them (assuming exports of the same file system type). > > > > > > > > In the next version of the patch, I'll do my best to specified what > > > > changed as the consequence of removing the cross sb check (ie, file > > > > system types of the passed in file can be from different file > > > > systems). I will add wording to the man page and add the suggested > > > > wording to the "porting" file. > > -- > Jeff Layton <jlayton@redhat.com> >
On Wed, Oct 24, 2018 at 10:59 PM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > [...] > It feels like folks are now ok with either the check being in the > drivers or doing the check in the VFS layer. > > I'm picking the choice of not doing the check in the VFS layer because > it allows for do_splice_direct() by any caller. I'm sorry, but this reasoning in flawed and this is not the reason that Matthew promoted not doing same fs type check in vfs. You did not understand the option that I was promoting to begin with. What I suggested was: 1. Remove current same sb check in beginning of vfs_copy_file_range() 2. Check sb && ->clone_file_range 3. Check same sb type && ->copy_file_range 4. Cross fs do_splice_direct() It's fine that you chose not to check for same fs type in VFS before calling copy_file_range() method, but still requires an ACK from Al that he agrees with passing in file * of another filesystem on the interface. > I'm about to submit > the new version of the patches (this time I will include the NFS patch > series). We can continue with the discussion on the new version. > > I have added checks for the CIFS and OverlayFS to be consistent with > the previous behavior -- no cross-device copy_offload, I assume if and > when those file systems are ready to make use of it they'll remove the > check. > Actually overlayfs code is "ready" for cross sb copy, but neither nfs nor cifs are supported as upper file system, so it doesn't matter much. Thanks, Amir.
On Thu, Oct 25, 2018 at 12:59 AM Amir Goldstein <amir73il@gmail.com> wrote: > > On Wed, Oct 24, 2018 at 10:59 PM Olga Kornievskaia > <olga.kornievskaia@gmail.com> wrote: > > > [...] > > It feels like folks are now ok with either the check being in the > > drivers or doing the check in the VFS layer. > > > > I'm picking the choice of not doing the check in the VFS layer because > > it allows for do_splice_direct() by any caller. > > I'm sorry, but this reasoning in flawed and this is not the reason that > Matthew promoted not doing same fs type check in vfs. I stated the reason why I picked to do the check at the driver layer. Looking at your version of the sb type check to be only applied to the copy_file_range indeed makes my argument invalid. I was under the impression that sb type check was being proposed as a standalone check (just like the sb check was standalone). Thus, yes I didn't completely understand what you proposed. > You did not understand the option that I was promoting to begin with. > What I suggested was: > > 1. Remove current same sb check in beginning of vfs_copy_file_range() > 2. Check sb && ->clone_file_range > 3. Check same sb type && ->copy_file_range > 4. Cross fs do_splice_direct() > > It's fine that you chose not to check for same fs type in VFS before calling > copy_file_range() method, but still requires an ACK from Al that he agrees > with passing in file * of another filesystem on the interface. Al, can you please provide a final decision as to which way you would prefer for this to be done. > > I'm about to submit > > the new version of the patches (this time I will include the NFS patch > > series). We can continue with the discussion on the new version. > > > > I have added checks for the CIFS and OverlayFS to be consistent with > > the previous behavior -- no cross-device copy_offload, I assume if and > > when those file systems are ready to make use of it they'll remove the > > check. > > > > Actually overlayfs code is "ready" for cross sb copy, but neither nfs nor > cifs are supported as upper file system, so it doesn't matter much. So then the commit statement is still true. When overlayfs will have upper file systems that do support it, this check can be removed. > > Thanks, > Amir.
On Thu, Oct 25, 2018 at 11:58 AM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Thu, Oct 25, 2018 at 12:59 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > On Wed, Oct 24, 2018 at 10:59 PM Olga Kornievskaia > > <olga.kornievskaia@gmail.com> wrote: > > > > > [...] > > > It feels like folks are now ok with either the check being in the > > > drivers or doing the check in the VFS layer. > > > > > > I'm picking the choice of not doing the check in the VFS layer because > > > it allows for do_splice_direct() by any caller. > > > > I'm sorry, but this reasoning in flawed and this is not the reason that > > Matthew promoted not doing same fs type check in vfs. > > I stated the reason why I picked to do the check at the driver layer. > Looking at your version of the sb type check to be only applied to the > copy_file_range indeed makes my argument invalid. I was under the > impression that sb type check was being proposed as a standalone check > (just like the sb check was standalone). Thus, yes I didn't completely > understand what you proposed. > > > You did not understand the option that I was promoting to begin with. > > What I suggested was: > > > > 1. Remove current same sb check in beginning of vfs_copy_file_range() > > 2. Check sb && ->clone_file_range > > 3. Check same sb type && ->copy_file_range > > 4. Cross fs do_splice_direct() > > > > It's fine that you chose not to check for same fs type in VFS before calling > > copy_file_range() method, but still requires an ACK from Al that he agrees > > with passing in file * of another filesystem on the interface. > > Al, can you please provide a final decision as to which way you would > prefer for this to be done. > > > > I'm about to submit > > > the new version of the patches (this time I will include the NFS patch > > > series). We can continue with the discussion on the new version. > > > > > > I have added checks for the CIFS and OverlayFS to be consistent with > > > the previous behavior -- no cross-device copy_offload, I assume if and > > > when those file systems are ready to make use of it they'll remove the > > > check. > > > > > > > Actually overlayfs code is "ready" for cross sb copy, but neither nfs nor > > cifs are supported as upper file system, so it doesn't matter much. > > So then the commit statement is still true. When overlayfs will have > upper file systems that do support it, this check can be removed. Ops sorry I meant them as questions. Do you feel that commit message needs to be changed then? > > > > > Thanks, > > Amir.
On Thu, Oct 25, 2018 at 7:00 PM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote: > > On Thu, Oct 25, 2018 at 11:58 AM Olga Kornievskaia > <olga.kornievskaia@gmail.com> wrote: > > > > On Thu, Oct 25, 2018 at 12:59 AM Amir Goldstein <amir73il@gmail.com> wrote: > > > > > > On Wed, Oct 24, 2018 at 10:59 PM Olga Kornievskaia > > > <olga.kornievskaia@gmail.com> wrote: > > > > > > > [...] > > > > It feels like folks are now ok with either the check being in the > > > > drivers or doing the check in the VFS layer. > > > > > > > > I'm picking the choice of not doing the check in the VFS layer because > > > > it allows for do_splice_direct() by any caller. > > > > > > I'm sorry, but this reasoning in flawed and this is not the reason that > > > Matthew promoted not doing same fs type check in vfs. > > > > I stated the reason why I picked to do the check at the driver layer. > > Looking at your version of the sb type check to be only applied to the > > copy_file_range indeed makes my argument invalid. I was under the > > impression that sb type check was being proposed as a standalone check > > (just like the sb check was standalone). Thus, yes I didn't completely > > understand what you proposed. > > > > > You did not understand the option that I was promoting to begin with. > > > What I suggested was: > > > > > > 1. Remove current same sb check in beginning of vfs_copy_file_range() > > > 2. Check sb && ->clone_file_range > > > 3. Check same sb type && ->copy_file_range > > > 4. Cross fs do_splice_direct() > > > > > > It's fine that you chose not to check for same fs type in VFS before calling > > > copy_file_range() method, but still requires an ACK from Al that he agrees > > > with passing in file * of another filesystem on the interface. > > > > Al, can you please provide a final decision as to which way you would > > prefer for this to be done. > > > > > > I'm about to submit > > > > the new version of the patches (this time I will include the NFS patch > > > > series). We can continue with the discussion on the new version. > > > > > > > > I have added checks for the CIFS and OverlayFS to be consistent with > > > > the previous behavior -- no cross-device copy_offload, I assume if and > > > > when those file systems are ready to make use of it they'll remove the > > > > check. > > > > > > > > > > Actually overlayfs code is "ready" for cross sb copy, but neither nfs nor > > > cifs are supported as upper file system, so it doesn't matter much. > > > > So then the commit statement is still true. When overlayfs will have > > upper file systems that do support it, this check can be removed. > > Ops sorry I meant them as questions. Do you feel that commit message > needs to be changed then? Commit message is fine by me. Thanks, Amir.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index a6c6a8a..5e520de 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -958,7 +958,9 @@ otherwise noted. fallocate: called by the VFS to preallocate blocks or punch a hole. - copy_file_range: called by the copy_file_range(2) system call. + copy_file_range: called by copy_file_range(2) system call. This method + works on two file descriptors that might reside on + different superblocks of the same type of file system. clone_file_range: called by the ioctl(2) system call for FICLONERANGE and FICLONE commands. diff --git a/fs/read_write.c b/fs/read_write.c index c60790f..474e740 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1578,10 +1578,6 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, (file_out->f_flags & O_APPEND)) return -EBADF; - /* this could be relaxed once a method supports cross-fs copies */ - if (inode_in->i_sb != inode_out->i_sb) - return -EXDEV; - if (len == 0) return 0; @@ -1591,7 +1587,8 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, * Try cloning first, this is supported by more file systems, and * more efficient if both clone and copy are supported (e.g. NFS). */ - if (file_in->f_op->clone_file_range) { + if (inode_in->i_sb == inode_out->i_sb && + file_in->f_op->clone_file_range) { ret = file_in->f_op->clone_file_range(file_in, pos_in, file_out, pos_out, len); if (ret == 0) { @@ -1600,10 +1597,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in, } } - if (file_out->f_op->copy_file_range) { + if (file_out->f_op->copy_file_range && + (file_in->f_op->copy_file_range == + file_out->f_op->copy_file_range)) { ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out, pos_out, len, flags); - if (ret != -EOPNOTSUPP) + if (ret != -EOPNOTSUPP && ret != -EXDEV) goto done; }