Message ID | 20181130200348.59524-2-olga.kornievskaia@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | server-side support for "inter" SSC copy |
On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
>
> Relax the condition that input files must be from the same
> file systems.
>
> Add checks that input parameters adhere semantics.
>
> If no copy_file_range() support is found, then do generic
> checks for the unsupported page cache ranges, LFS, limits,
> and clear setuid/setgid if not running as root before calling
> do_splice_direct(). Update atime,ctime,mtime afterwards.
>
> Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> ---

This patch is either going to bring you down or make you stronger ;-)

This is not how it's done. Behavior change and refactoring mixed into
one patch is wrong for several reasons. And when you relax the same sb
check you need to restrict it inside filesystems, like your previous patch
did.

You already had the v7 patch reviewed by 4 developers.
What made you go and change it (and post it as v2)?

Your intentions were good trying to fix the broken syscall, but
I hope you understood that Dave didn't mean that you *have* to
add the missing generic checks as part of your work. He just
pointed out how broken the current interface is in the context of
reviewing your patch.

In any case, I hear that Dave is neck deep in fixing copy_file_range()
so changes to this function should be coordinated with him. Or better
yet, wait until he posts his fixes and carry on from there.

If I were you, I would just go back to the reviewed v7 vfs patch.

Thanks,
Amir.
On Sat, Dec 1, 2018 at 3:11 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > Relax the condition that input files must be from the same
> > file systems.
> >
> > Add checks that input parameters adhere semantics.
> >
> > If no copy_file_range() support is found, then do generic
> > checks for the unsupported page cache ranges, LFS, limits,
> > and clear setuid/setgid if not running as root before calling
> > do_splice_direct(). Update atime,ctime,mtime afterwards.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
>
> This patch is either going to bring you down or make you stronger ;-)
>
> This is not how its done. Behavior change and refactoring mixed into
> one patch is wrong for several reasons. And when you relax same sb
> check you need to restrict it inside filesystems, like your previous patch
> did.
>
> You already had v7 patch reviewed-by 4 developers.
> What made you go and change it (and posted as v2)?
>
> Your intentions were good trying to fix the broken syscall, but
> I hope you understood that Dave didn't mean that you *have* to
> add the missing generic checks as part of your work. He just
> pointed out how broken the current interface is in the context of
> reviewing your patch.
>
> In any case, I hear that Dave is neck deep in fixing copy_file_range()
> so changes to this function should be collaborated with him. Or better
> yet, wait until he posts his fixes and carry on from there.
>
> If I were you, I would just go back to the reviewed v7 vfs patch.

This is NOT a replacement for the v7 vfs patch??? This is a new patch
on top of that one.

I assume that the v7 patch has been OK-ed by everybody and is ready to go in???

As you recall, what was left is to provide the functionality to relax
the check for the superblocks to be the same before calling
do_splice_direct(). This patch attempts to do this. I was under the
impression that to do so extra checks needed to be added, which I
added.

>
> Thanks,
> Amir.
On Sat, Dec 1, 2018 at 8:23 AM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
>
> On Sat, Dec 1, 2018 at 3:11 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > >
> > > Relax the condition that input files must be from the same
> > > file systems.
> > >
> > > Add checks that input parameters adhere semantics.
> > >
> > > If no copy_file_range() support is found, then do generic
> > > checks for the unsupported page cache ranges, LFS, limits,
> > > and clear setuid/setgid if not running as root before calling
> > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> >
> > This patch is either going to bring you down or make you stronger ;-)
> >
> > This is not how its done. Behavior change and refactoring mixed into
> > one patch is wrong for several reasons. And when you relax same sb
> > check you need to restrict it inside filesystems, like your previous patch
> > did.
> >
> > You already had v7 patch reviewed-by 4 developers.
> > What made you go and change it (and posted as v2)?
> >
> > Your intentions were good trying to fix the broken syscall, but
> > I hope you understood that Dave didn't mean that you *have* to
> > add the missing generic checks as part of your work. He just
> > pointed out how broken the current interface is in the context of
> > reviewing your patch.
> >
> > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > so changes to this function should be collaborated with him. Or better
> > yet, wait until he posts his fixes and carry on from there.
> >
> > If I were you, I would just go back to the reviewed v7 vfs patch.
>
> This is NOT a replacement to the v7 vfs patch??? This is a new patch
> on top of that one.
>
> I assume that v7 patch has been OK-ed by everybody and is ready to go in???
>
> As you recall, what was left is to provide the functionality to relax
> the check for the superblocks to be the same before calling the
> do_splice_direct(). This patch attempt do this. I was under the
> impression that to do so extra checks were needed to be added which I
> added.
>

To clarify, previously I had a VFS patch with the client-side series
to support "server to server" copy offload. It needed the
functionality to be able to call copy_file_range() with different
super blocks.

This patch series is for the server-side support for the "server to
server" copy offload. It requires the ability to call copy_file_range()
and do a copy between NFS and a local file system. Thus it needs
generic_copy_file_range().

> >
> > Thanks,
> > Amir.
On Sat, Dec 1, 2018 at 5:57 PM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
>
> On Sat, Dec 1, 2018 at 9:03 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> >
> >
> > On Sat, Dec 1, 2018, 3:44 PM Olga Kornievskaia <olga.kornievskaia@gmail.com wrote:
> >>
> >> On Sat, Dec 1, 2018 at 8:23 AM Olga Kornievskaia
> >> <olga.kornievskaia@gmail.com> wrote:
> >> >
> >> > On Sat, Dec 1, 2018 at 3:11 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >> > >
> >> > > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> >> > > <olga.kornievskaia@gmail.com> wrote:
> >> > > >
> >> > > > Relax the condition that input files must be from the same
> >> > > > file systems.
> >> > > >
> >> > > > Add checks that input parameters adhere semantics.
> >> > > >
> >> > > > If no copy_file_range() support is found, then do generic
> >> > > > checks for the unsupported page cache ranges, LFS, limits,
> >> > > > and clear setuid/setgid if not running as root before calling
> >> > > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> >> > > >
> >> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> >> > > > ---
> >> > >
> >> > > This patch is either going to bring you down or make you stronger ;-)
> >> > >
> >> > > This is not how its done. Behavior change and refactoring mixed into
> >> > > one patch is wrong for several reasons. And when you relax same sb
> >> > > check you need to restrict it inside filesystems, like your previous patch
> >> > > did.
> >> > >
> >> > > You already had v7 patch reviewed-by 4 developers.
> >> > > What made you go and change it (and posted as v2)?
> >> > >
> >> > > Your intentions were good trying to fix the broken syscall, but
> >> > > I hope you understood that Dave didn't mean that you *have* to
> >> > > add the missing generic checks as part of your work. He just
> >> > > pointed out how broken the current interface is in the context of
> >> > > reviewing your patch.
> >> > >
> >> > > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> >> > > so changes to this function should be collaborated with him. Or better
> >> > > yet, wait until he posts his fixes and carry on from there.
> >> > >
> >> > > If I were you, I would just go back to the reviewed v7 vfs patch.
> >> >
> >> > This is NOT a replacement to the v7 vfs patch??? This is a new patch
> >> > on top of that one.
> >> >
> >> > I assume that v7 patch has been OK-ed by everybody and is ready to go in???
> >> >
> >> > As you recall, what was left is to provide the functionality to relax
> >> > the check for the superblocks to be the same before calling the
> >> > do_splice_direct(). This patch attempt do this. I was under the
> >> > impression that to do so extra checks were needed to be added which I
> >> > added.
> >> >
> >>
> >> To clarify, previously I had a VFS patch with the client-side series
> >> to support "server to server" copy offload. It needed the
> >> functionality to be able to call copy_file_range with different super
> >> blocks.
> >>
> >> This patch series is for the server side support for the "server to
> >> server" copy offload. It requires ability to call copy_file_range()
> >> and do a copy between NFS and a local file system. Thus it needs
> >> generic_copy_file_range.
> >
> >
> > Ah. Sorry for the confusion.
> > My comment on change of behavior and refactoring in same patch still hold.
> > My comment about coordinate your work with Dave Chinner still hold.
>
> Understood. I will email Dave directly and coordinate.
>
> > Raise that with a comment about adding test coverage to the new
> > generic cross fs copy API to xfstest.
>
> What kind of extra coverage are you envisioning? Something that
> requires two different file systems mounted and then does a fs copy?
>

Yes, if you add this functionality you should add test coverage
for the added functionality.
It's not going to be trivial to add cross fs type tests to xfstests,
but adding cross fs (same type) tests should be relatively easy
(copy_file_range from test fs to scratch fs).

> > Am I mistaken that this change affects any cross fs copy file range
> > by userspace and not only by kernel nfsd?
>
> That's correct, any cross fs copy is what I'm going for here.
>

Forgive me for being thick.
After briefly going over the patches, I still don't understand whether you
*need* to add generic cross fs copy to implement server side copy support
in nfsd, or whether you are adding it as an added bonus to the community
along with your SSC patch set.

The first two patches of the series seem unrelated to the rest, but
maybe I'm just not getting the connection?

Thanks,
Amir.
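A cross-filesystem test of the kind Amir describes boils down to calling copy_file_range(2) with a source on one mount and a destination on another. The userspace sketch below is only an illustration under stated assumptions: `copy_between()` is a hypothetical helper name, it relies on the glibc `copy_file_range()` wrapper (glibc 2.27 or later), and it is not taken from xfstests. On kernels that still enforce the same-superblock check, a cross-filesystem call would be expected to come back as `-EXDEV`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy all of src_path into dst_path using
 * copy_file_range(2). Returns the number of bytes copied, or -errno
 * on failure (for example -EXDEV if the kernel refuses a
 * cross-filesystem copy).
 */
static long long copy_between(const char *src_path, const char *dst_path)
{
	struct stat st;
	long long total = 0;
	int err = 0;
	int in, out;

	assert(src_path && dst_path);

	in = open(src_path, O_RDONLY);
	if (in < 0)
		return -errno;

	out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0) {
		err = errno;
		close(in);
		return -err;
	}

	if (fstat(in, &st) < 0) {
		err = errno;
		goto out_close;
	}

	while (total < st.st_size) {
		/* NULL offsets: the kernel uses and advances each file position. */
		ssize_t n = copy_file_range(in, NULL, out, NULL,
					    (size_t)(st.st_size - total), 0);
		if (n < 0) {
			err = errno;
			break;
		}
		if (n == 0)	/* source shrank underneath us */
			break;
		total += n;
	}

out_close:
	close(in);
	close(out);
	return err ? -err : total;
}
```

Pointing `src_path` at a file on the test filesystem and `dst_path` at a file on the scratch filesystem would exercise the cross-fs path; with both paths on one filesystem the helper exercises the ordinary same-fs copy.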
On Fri, Nov 30, 2018 at 03:03:39PM -0500, Olga Kornievskaia wrote:
> Relax the condition that input files must be from the same
> file systems.

> +	ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> +			count > MAX_RW_COUNT ? MAX_RW_COUNT : count, 0);

Wasn't there a concern about splicing between filesystems with different
block sizes mentioned the last time this came up? I can't find a citation
for that now.

> -	/* this could be relaxed once generic cross fs support is added */
> -	if (inode_in->i_sb != inode_out->i_sb) {
> -		ret = -EXDEV;
> -		goto done;
> -	}
On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> <olga.kornievskaia@gmail.com> wrote:
> >
> > Relax the condition that input files must be from the same
> > file systems.
> >
> > Add checks that input parameters adhere semantics.
> >
> > If no copy_file_range() support is found, then do generic
> > checks for the unsupported page cache ranges, LFS, limits,
> > and clear setuid/setgid if not running as root before calling
> > do_splice_direct(). Update atime,ctime,mtime afterwards.
> >
> > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > ---
>
> This patch is either going to bring you down or make you stronger ;-)
>
> This is not how its done. Behavior change and refactoring mixed into
> one patch is wrong for several reasons. And when you relax same sb
> check you need to restrict it inside filesystems, like your previous patch
> did.
.....
> In any case, I hear that Dave is neck deep in fixing copy_file_range()
> so changes to this function should be collaborated with him. Or better
> yet, wait until he posts his fixes and carry on from there.

Yeah, because I've heard nothing for a month and this is kinda
important, I have a series of 8-9 patches that make all the fixes we
need, push the cross-filesystem checks down into the filesystems,
and let filesystems handle the fallback to a splice based copy
themselves (because there are way more fallback cases than just
EOPNOTSUPP and EXDEV).

I also have a patch for the man page that documents all the missing
failure cases, and documents where things are filesystem specific or
not.

And I also have an fstests patch that exercises all the failure cases
so that all filesystems will end up behaving the same way for all
the same cases they should.

I'm still sorting out the fstests patch (it requires changes
to xfs_io's copy-range command) so I've got some confidence that the
code actually does what it says in the man page, but I should have
that sorted in a couple of days.

Cheers,

Dave.
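The division of responsibility Dave outlines (the VFS dispatches, and a filesystem that provides its own copy method decides for itself when to fall back to a generic copy) can be modelled with a toy userspace ops table. Everything in this sketch is hypothetical and invented for illustration: the `toy_*` names, the in-memory "files", and the rule that the toy filesystem can only offload within itself. It is not the kernel code and not Dave's series.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define TOY_CAP 64

struct toy_file;

/* Hypothetical ops table, loosely modelled on file_operations. */
struct toy_ops {
	ssize_t (*copy_range)(struct toy_file *in, size_t pos_in,
			      struct toy_file *out, size_t pos_out, size_t len);
};

struct toy_file {
	const struct toy_ops *ops;
	int fs_id;		/* which toy filesystem owns the file */
	size_t size;		/* bytes of valid data, at most TOY_CAP */
	char data[TOY_CAP];
};

static int toy_fallback_calls;	/* counts trips through the generic path */

/* Shared byte mover: clamps to source EOF and destination capacity. */
static ssize_t toy_do_copy(struct toy_file *in, size_t pos_in,
			   struct toy_file *out, size_t pos_out, size_t len)
{
	if (pos_in >= in->size || pos_out >= TOY_CAP)
		return -EINVAL;
	if (len > in->size - pos_in)
		len = in->size - pos_in;
	if (len > TOY_CAP - pos_out)
		len = TOY_CAP - pos_out;
	memmove(out->data + pos_out, in->data + pos_in, len);
	if (pos_out + len > out->size)
		out->size = pos_out + len;
	return (ssize_t)len;
}

/* Generic copy, standing in for a splice-based fallback. */
static ssize_t toy_generic_copy(struct toy_file *in, size_t pos_in,
				struct toy_file *out, size_t pos_out, size_t len)
{
	toy_fallback_calls++;
	return toy_do_copy(in, pos_in, out, pos_out, len);
}

/* Filesystem method: it alone decides when to use the fallback. */
static ssize_t toy_fs_copy(struct toy_file *in, size_t pos_in,
			   struct toy_file *out, size_t pos_out, size_t len)
{
	/* This toy filesystem can only "offload" within itself. */
	if (in->fs_id != out->fs_id)
		return toy_generic_copy(in, pos_in, out, pos_out, len);
	return toy_do_copy(in, pos_in, out, pos_out, len);
}

/* Dispatcher: call the method if present, else the generic copy. */
static ssize_t toy_vfs_copy(struct toy_file *in, size_t pos_in,
			    struct toy_file *out, size_t pos_out, size_t len)
{
	if (out->ops && out->ops->copy_range)
		return out->ops->copy_range(in, pos_in, out, pos_out, len);
	return toy_generic_copy(in, pos_in, out, pos_out, len);
}
```

The point of the shape is that the dispatcher never inspects error codes to guess whether a fallback is appropriate; that decision sits with the code that knows why the offload was not possible.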
On Sat, Dec 01, 2018 at 01:18:06PM -0800, Matthew Wilcox wrote:
> On Fri, Nov 30, 2018 at 03:03:39PM -0500, Olga Kornievskaia wrote:
> > Relax the condition that input files must be from the same
> > file systems.
>
> > +	ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
> > +			count > MAX_RW_COUNT ? MAX_RW_COUNT : count, 0);
>
> Wasn't there a concern about splicing between filesystems with different
> block sizes mentioned the last time this came up? I can't find a citation
> for that now.

The filesystems should be able to handle that themselves - they are
just passed an iter that has a range of data regions in pages that
they copy the required data into/out of. The data transfer
mechanism itself is completely independent of filesystem block
sizes....

There's lots of other problems with do_splice_direct, but I don't
think this is one of them. I could be wrong - this code has pretty
much zero documentation on how it is supposed to work and what it is
supposed to do - so don't take my word for it...

Cheers,

Dave.
On Sat, Dec 1, 2018 at 5:00 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > <olga.kornievskaia@gmail.com> wrote:
> > >
> > > Relax the condition that input files must be from the same
> > > file systems.
> > >
> > > Add checks that input parameters adhere semantics.
> > >
> > > If no copy_file_range() support is found, then do generic
> > > checks for the unsupported page cache ranges, LFS, limits,
> > > and clear setuid/setgid if not running as root before calling
> > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > >
> > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > ---
> >
> > This patch is either going to bring you down or make you stronger ;-)
> >
> > This is not how its done. Behavior change and refactoring mixed into
> > one patch is wrong for several reasons. And when you relax same sb
> > check you need to restrict it inside filesystems, like your previous patch
> > did.
> .....
> > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > so changes to this function should be collaborated with him. Or better
> > yet, wait until he posts his fixes and carry on from there.
>
> Yeah, because I've heard nothing for a month and this is kinda
> important

Dave, I think that's unfair. It is important. NFS is actually the file
system that needed VFS support for cross fs copy_file_range and I was
working on it. If you were in doubt, you could have emailed and asked
me.

I'm unsure now what this means. I have a patch series with a VFS
patch that went through extensive review (people spent time on it)
and an NFS patch series that depends on it that is ready for the
upstream push. Are you saying that the VFS patch is no longer welcome
and thus the NFS series is no longer viable either?

> , I have a series of 8-9 patches that make all the fixes we
> need, push the cross-filesystem checks down into the filesystems,
> and let filesystems handle the fallback to a splice based copy
> themselves (because there are way more fallback cases than just
> EOPNOPSUPP and EXDEV).

Are you saying it is each individual filesystem's responsibility to
fall back on splice? Isn't that a step backwards? Each individual
filesystem is going to implement the same code of calling
do_splice_direct() to do the functionality that could and should be in
VFS?

>
> I also have a patch for the man page that document all the missing
> failure cases, and document where things are filesystem specific or
> not.
>
> And I also have a fstests patch that exercises all the failure cases
> so that all filesystems will end up behaving the same way for all
> the same cases they should.
>
> I'm still sorting out the fstests patch (it requires changes
> to xfs_io's copy-range command) so I've got some confidence that the
> code actually does what it says in the man page, but I should have
> that sorted in a couple of days.
>
> Cheers,
>
> Dave.
>
> --
> Dave Chinner
> david@fromorbit.com
On Sat, Dec 1, 2018 at 10:12 PM Olga Kornievskaia <olga.kornievskaia@gmail.com> wrote:
>
> On Sat, Dec 1, 2018 at 5:00 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> > > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > > <olga.kornievskaia@gmail.com> wrote:
> > > >
> > > > Relax the condition that input files must be from the same
> > > > file systems.
> > > >
> > > > Add checks that input parameters adhere semantics.
> > > >
> > > > If no copy_file_range() support is found, then do generic
> > > > checks for the unsupported page cache ranges, LFS, limits,
> > > > and clear setuid/setgid if not running as root before calling
> > > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > >
> > > This patch is either going to bring you down or make you stronger ;-)
> > >
> > > This is not how its done. Behavior change and refactoring mixed into
> > > one patch is wrong for several reasons. And when you relax same sb
> > > check you need to restrict it inside filesystems, like your previous patch
> > > did.
> > .....
> > > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > > so changes to this function should be collaborated with him. Or better
> > > yet, wait until he posts his fixes and carry on from there.
> >
> > Yeah, because I've heard nothing for a month and this is kinda
> > important
>
> Dave I think that's unfair. It is important. NFS is actually the file
> system that needed VFS support for cross fs copy_file_range and I was
> working on it. If you were in doubt, you could have emailed and asked
> me.

Just to be clear: what I think was unfair in that comment was the
wording "this is kinda important". I think a lot stems from a lack of
clarity in the mailing list communications. I object to the fact
that it wasn't clear who was going to implement the functionality.
Since the work was needed by NFS, I didn't want to assume that somebody
in VFS would just do it for us. At the time nobody in VFS stood up and
said they would do the work, and thus I tried to do my best. I'm
grateful, and would have been in the first place, that somebody did
support generic cross-filesystem functionality. Thus I'm by no means
speaking against Dave's work.

> I'm unsure now what does this mean. I have a patch series with a VFS
> patch that went thru the extensive review (people spend time on it)
> and an NFS patch series that depends on it that is ready for the
> upstream push. Are you saying that the VFS patch is no longer welcomed
> and thus NFS series is no longer viable either?

I'm unclear about the fate of the patch set that has the (v7) VFS patch
that was reviewed and approved and is thought to be pushed for 4.21.
It is unclear whether the new work is on top of that or not.

> , I have a series of 8-9 patches that make all the fixes we
> > need, push the cross-filesystem checks down into the filesystems,
> > and let filesystems handle the fallback to a splice based copy
> > themselves (because there are way more fallback cases than just
> > EOPNOPSUPP and EXDEV).
>
> Are you saying it is each individual filesystem responsibility to
> fallback on splice? Isn't that a step backwards? Each individual
> filesystem is going to implement the same code of calling
> do_splice_direct() to do the functionally that could and should be in
> VFS?
>
> >
> > I also have a patch for the man page that document all the missing
> > failure cases, and document where things are filesystem specific or
> > not.
> >
> > And I also have a fstests patch that exercises all the failure cases
> > so that all filesystems will end up behaving the same way for all
> > the same cases they should.
> >
> > I'm still sorting out the fstests patch (it requires changes
> > to xfs_io's copy-range command) so I've got some confidence that the
> > code actually does what it says in the man page, but I should have
> > that sorted in a couple of days.
> >
> > Cheers,
> >
> > Dave.
> >
> > --
> > Dave Chinner
> > david@fromorbit.com
On Sat, Dec 01, 2018 at 10:12:05PM -0500, Olga Kornievskaia wrote:
> On Sat, Dec 1, 2018 at 5:00 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Sat, Dec 01, 2018 at 10:11:48AM +0200, Amir Goldstein wrote:
> > > On Fri, Nov 30, 2018 at 10:04 PM Olga Kornievskaia
> > > <olga.kornievskaia@gmail.com> wrote:
> > > >
> > > > Relax the condition that input files must be from the same
> > > > file systems.
> > > >
> > > > Add checks that input parameters adhere semantics.
> > > >
> > > > If no copy_file_range() support is found, then do generic
> > > > checks for the unsupported page cache ranges, LFS, limits,
> > > > and clear setuid/setgid if not running as root before calling
> > > > do_splice_direct(). Update atime,ctime,mtime afterwards.
> > > >
> > > > Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
> > > > ---
> > >
> > > This patch is either going to bring you down or make you stronger ;-)
> > >
> > > This is not how its done. Behavior change and refactoring mixed into
> > > one patch is wrong for several reasons. And when you relax same sb
> > > check you need to restrict it inside filesystems, like your previous patch
> > > did.
> > .....
> > > In any case, I hear that Dave is neck deep in fixing copy_file_range()
> > > so changes to this function should be collaborated with him. Or better
> > > yet, wait until he posts his fixes and carry on from there.
> >
> > Yeah, because I've heard nothing for a month and this is kinda
> > important
>
> Dave I think that's unfair. It is important. NFS is actually the file
> system that needed VFS support for cross fs copy_file_range and I was
> working on it. If you were in doubt, you could have emailed and asked
> me.

Last I heard from you was "this isn't my problem and I don't have
time to deal with it". You were fairly unambiguous in saying you
weren't going to spend any time on it.

> I'm unsure now what does this mean. I have a patch series with a VFS
> patch that went thru the extensive review (people spend time on it)
> and an NFS patch series that depends on it that is ready for the
> upstream push. Are you saying that the VFS patch is no longer welcomed
> and thus NFS series is no longer viable either?

No, I'm saying that this is urgent work and needs to be separated
from the NFS patch series, of which there are now two and you've
split copy_file_range() changes across both patch sets.

copy_file_range() is broken for *everyone*, not just NFS. i.e.
fixing these problems should not be tied to some other filesystem
feature patchset.

> , I have a series of 8-9 patches that make all the fixes we
> > need, push the cross-filesystem checks down into the filesystems,
> > and let filesystems handle the fallback to a splice based copy
> > themselves (because there are way more fallback cases than just
> > EOPNOPSUPP and EXDEV).
>
> Are you saying it is each individual filesystem responsibility to
> fallback on splice? Isn't that a step backwards? Each individual
> filesystem is going to implement the same code of calling
> do_splice_direct() to do the functionally that could and should be in
> VFS?

I've done this because one of the problems I've found is that
different filesystems *do not fall back consistently*. e.g. the NFS
client will return -EINVAL if src/dst are the same file, but -EINVAL
is not one of the errors that the vfs code falls back to a data copy
on. This is despite the fact that the fallback path can copy to/from
the same file, we support same file copy through the
->remap_file_range offload, etc.

IOWs, the behaviour of the syscall when it comes to single file
ranges is completely inconsistent because fallbacks are implemented
on a filesystem-by-filesystem basis.

I called the fallback generic_copy_file_range(), and filesystems
that implement ->copy_file_range() are responsible for calling it
themselves if they want a fallback. That's because there may be
different error/constraint conditions at the filesystem level that
prevent offloading the copy, and we can't distinguish at the VFS
between "-EINVAL means fallback because it was a single file copy"
and "-EINVAL means fail, parameter out of range".

IOWs, if you implement ->copy_file_range() you take full
responsibility for implementing the copying function. This is
exactly what we do for all the other file methods, so this is just
making the implementation behaviour consistent with the rest of the
code.

FWIW, this also points out a problem with the copy_file_range()
definition - it does not say WTF should happen if the copy ranges
/overlap/ in the same file. clone is clear on that - support is
determined by the filesystem (i.e. "EINVAL [...] XFS and Btrfs do
not support overlapping reflink ranges in the same file."). For
copying, the fallback code can't copy the file data correctly if the
ranges overlap, so I've added checks to make this illegal and added
that overlapping ranges are not supported to the man page.....

These are the sort of API definition problems that I'm fixing
right now, and I'm writing tests to make sure that all filesystems
will behave the same way for given copy scenarios. i.e. I'm not
doing this so I can get an NFS feature patchset merged, I'm doing
this to make the copy_file_range API well defined and robust and
allow implementations to be verified against the specification the
man page lays out.

Cheers,

Dave.
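The overlap rule Dave mentions (a copy within one file is rejected when the source and destination ranges intersect) comes down to a half-open interval test. The sketch below is a userspace illustration only: the `toy_*` names are hypothetical, offsets are plain `int64_t`, and returning `-EINVAL` is an assumption here, since the thread does not say which error the real checks use.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [pos_in, pos_in + len) and
 * [pos_out, pos_out + len) share at least one byte.
 * Assumes non-negative offsets and that pos + len does not overflow.
 */
static bool toy_ranges_overlap(int64_t pos_in, int64_t pos_out, int64_t len)
{
	return len > 0 && pos_in < pos_out + len && pos_out < pos_in + len;
}

/*
 * Reject an overlapping copy within a single file; copies between
 * different files, or disjoint ranges of one file, pass the check.
 * The choice of -EINVAL is an assumption made for this sketch.
 */
static int toy_check_overlap(bool same_file, int64_t pos_in,
			     int64_t pos_out, int64_t len)
{
	if (same_file && toy_ranges_overlap(pos_in, pos_out, len))
		return -EINVAL;
	return 0;
}
```

Ranges that merely touch, such as source bytes 0-99 copied to offset 100, do not overlap under this test, so an append-style copy within one file would still be allowed.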
diff --git a/fs/read_write.c b/fs/read_write.c
index 7b9e59d..2d309b0 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1540,6 +1540,44 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 }
 #endif
 
+ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
+			struct file *file_out, loff_t pos_out,
+			loff_t len, unsigned int flags)
+{
+	ssize_t ret;
+	loff_t size_in = i_size_read(file_inode(file_in)), count;
+
+	/* preform generic checks for unsupported page cache ranges, LFS
+	 * limits. If pos exceeds the limit, returns EFBIG
+	 */
+	count = min(len, size_in - pos_in);
+	ret = generic_access_check_limits(file_in, pos_in, &count);
+	if (ret)
+		goto done;
+	ret = generic_write_check_limits(file_out, pos_out, &count);
+	if (ret)
+		goto done;
+	/* If not running as root, clear setuid/setgid bits. This keeps
+	 * people from modifying setuid and setgid binaries.
+	 */
+	if (!IS_NOSEC(file_inode(file_out))) {
+		ret = file_remove_privs(file_out);
+		if (ret)
+			goto done;
+	}
+
+	ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
+			count > MAX_RW_COUNT ? MAX_RW_COUNT : count, 0);
+
+	file_accessed(file_in);
+	if (!(file_out->f_mode & FMODE_NOCMTIME))
+		file_update_time(file_out);
+
+done:
+	return ret;
+}
+EXPORT_SYMBOL(generic_copy_file_range);
+
 /*
  * copy_file_range() differs from regular file read and write in that it
  * specifically allows return partial success. When it does so is up to
@@ -1552,6 +1590,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	struct inode *inode_in = file_inode(file_in);
 	struct inode *inode_out = file_inode(file_out);
 	ssize_t ret;
+	loff_t size_in;
 
 	if (flags != 0)
 		return -EINVAL;
@@ -1577,6 +1616,15 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (len == 0)
 		return 0;
 
+	/* Ensure offsets don't wrap. */
+	if (pos_in + len < pos_in || pos_out + len < pos_out)
+		return -EINVAL;
+
+	size_in = i_size_read(inode_in);
+	/* Ensure that source range is within EOF. */
+	if (pos_in >= size_in || pos_in + len > size_in)
+		return -EINVAL;
+
 	file_start_write(file_out);
 
 	/*
@@ -1597,22 +1645,12 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		}
 	}
 
-	if (file_out->f_op->copy_file_range) {
+	if (file_out->f_op->copy_file_range)
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
-		if (ret != -EOPNOTSUPP)
-			goto done;
-	}
-
-	/* this could be relaxed once generic cross fs support is added */
-	if (inode_in->i_sb != inode_out->i_sb) {
-		ret = -EXDEV;
-		goto done;
-	}
-
-	ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
-			len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
-
+	else
+		ret = generic_copy_file_range(file_in, pos_in, file_out,
+					      pos_out, len, flags);
 done:
 	if (ret > 0) {
 		fsnotify_access(file_in);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c95c080..c88ad09 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1874,6 +1874,9 @@ extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
 		unsigned long, loff_t *, rwf_t);
 extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 				   loff_t, size_t, unsigned int);
+extern ssize_t generic_copy_file_range(struct file *file_int, loff_t pos_in,
+				struct file *file_out, loff_t pos_out,
+				loff_t len, unsigned int flags);
 extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in,
 					 struct file *file_out, loff_t pos_out,
 					 loff_t *count,
@@ -3016,6 +3019,10 @@ static inline void remove_inode_hash(struct inode *inode)
 extern int generic_file_mmap(struct file *, struct vm_area_struct *);
 extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
 extern ssize_t generic_write_checks(struct kiocb *, struct iov_iter *);
+extern int generic_access_check_limits(struct file *file, loff_t pos,
+				loff_t *count);
+extern int generic_write_check_limits(struct file *file, loff_t pos,
+				loff_t *count);
 extern int generic_remap_checks(struct file *file_in, loff_t pos_in,
 				struct file *file_out, loff_t pos_out,
 				loff_t *count, unsigned int remap_flags);
diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8..894f3ae 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2829,8 +2829,7 @@ struct page *read_cache_page_gfp(struct address_space *mapping,
  * LFS limits. If pos is under the limit it becomes a short access. If it
  * exceeds the limit we return -EFBIG.
  */
-static int generic_access_check_limits(struct file *file, loff_t pos,
-				       loff_t *count)
+int generic_access_check_limits(struct file *file, loff_t pos, loff_t *count)
 {
 	struct inode *inode = file->f_mapping->host;
 	loff_t max_size = inode->i_sb->s_maxbytes;
@@ -2844,8 +2843,7 @@ static int generic_access_check_limits(struct file *file, loff_t pos,
 	return 0;
 }
 
-static int generic_write_check_limits(struct file *file, loff_t pos,
-				      loff_t *count)
+int generic_write_check_limits(struct file *file, loff_t pos, loff_t *count)
 {
 	loff_t limit = rlimit(RLIMIT_FSIZE);
Relax the condition that input files must be from the same
file systems.

Add checks that input parameters adhere semantics.

If no copy_file_range() support is found, then do generic
checks for the unsupported page cache ranges, LFS, limits,
and clear setuid/setgid if not running as root before calling
do_splice_direct(). Update atime,ctime,mtime afterwards.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
---
 fs/read_write.c    | 66 ++++++++++++++++++++++++++++++++++++++++++------------
 include/linux/fs.h |  7 ++++++
 mm/filemap.c       |  6 ++---
 3 files changed, 61 insertions(+), 18 deletions(-)
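The parameter checks this patch adds to vfs_copy_file_range() (offsets must not wrap, and the source range must lie within EOF) can be restated as a small userspace function. This is a sketch under stated assumptions, not the kernel code: `check_copy_range()` is a hypothetical name, offsets are plain `int64_t`, and it uses the GCC/Clang builtin `__builtin_add_overflow` instead of the patch's `pos + len < pos` comparison so that the signed arithmetic stays well defined.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the range checks:
 *  - neither pos_in + len nor pos_out + len may overflow;
 *  - the source range [pos_in, pos_in + len) must end at or before
 *    size_in, the current source file size.
 * Returns 0 if the request is acceptable, -EINVAL otherwise.
 */
static int check_copy_range(int64_t pos_in, int64_t pos_out,
			    int64_t len, int64_t size_in)
{
	int64_t end_in, end_out;

	if (pos_in < 0 || pos_out < 0 || len < 0)
		return -EINVAL;

	/* Ensure offsets don't wrap. */
	if (__builtin_add_overflow(pos_in, len, &end_in) ||
	    __builtin_add_overflow(pos_out, len, &end_out))
		return -EINVAL;

	/* Ensure that the source range is within EOF. */
	if (pos_in >= size_in || end_in > size_in)
		return -EINVAL;

	return 0;
}
```

Under these checks a request that starts inside the file but runs past EOF is refused outright rather than shortened, which is the behaviour the patch's `pos_in + len > size_in` test implies.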