Message ID | 20190413205439.9623-1-shawn@git.icu |
---|---|
State | New, archived |
Series | new flag COPY_FILE_RANGE_FILESIZE for copy_file_range() |
On Sat, Apr 13, 2019 at 03:54:39PM -0500, Shawn Landden wrote:

/me pulls out his close-reading glasses and the copy_file_range manpage...

> If flags includes COPY_FILE_RANGE_FILESIZE then the length
> copied is the length of the file. off_in and off_out are
> ignored. len must be 0 or the file size.

They're ignored? As in, the copy operation reads the number of bytes in
the file referenced by fd_in from fd_in at its current position and
writes that out to fd_out at its current position? I don't see why I
would want such an operation...

...but I can see how people could make use of a CFR_ENTIRE_FILE that
would check that both file descriptors are actually regular files, and
if so copy the entire contents of the fd_in file into the same position
in the fd_out file, and then set the fd_out file's length to match. If
@off_in or @off_out are non-NULL then they'll be updated to the new EOFs
if the copy completes successfully and @len can be anything.

Also: please update the manual page and the xfstests regression test for
this syscall.

> This implementation saves a call to stat() in the common case
> of copying files. It does not fix any race conditions, but that
> is possible in the future with this interface.
>
> EAGAIN: If COPY_FILE_RANGE_FILESIZE was passed and len is not 0
> or the file size.

The values are invalid, so why would we tell userspace to try again
instead of the EINVAL that we usually use?
> Signed-off-by: Shawn Landden <shawn@git.icu>
> CC: <linux-api@vger.kernel.org>
> ---
>  fs/read_write.c           | 14 +++++++++++++-
>  include/uapi/linux/stat.h |  4 ++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 61b43ad7608e..6d06361f0856 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1557,7 +1557,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	struct inode *inode_out = file_inode(file_out);
>  	ssize_t ret;
>
> -	if (flags != 0)
> +	if ((flags & ~COPY_FILE_RANGE_FILESIZE) != 0)

FWIW you might as well shorten the prefix to "CFR_" since nobody else is
using it.

--D

>  		return -EINVAL;
>
>  	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> @@ -1565,6 +1565,18 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
>  		return -EINVAL;
>
> +	if (flags & COPY_FILE_RANGE_FILESIZE) {
> +		struct kstat stat;
> +		int error;
> +		error = vfs_getattr(&file_in->f_path, &stat,
> +				    STATX_SIZE, 0);
> +		if (error < 0)
> +			return error;
> +		if (!(len == 0 || len == stat.size))
> +			return -EAGAIN;
> +		len = stat.size;
> +	}
> +
>  	ret = rw_verify_area(READ, file_in, &pos_in, len);
>  	if (unlikely(ret))
>  		return ret;
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 7b35e98d3c58..1075aa4666ef 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -170,5 +170,9 @@ struct statx {
>
>  #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
>
> +/*
> + * Flags for copy_file_range()
> + */
> +#define COPY_FILE_RANGE_FILESIZE	0x00000001 /* Copy the full length of the input file */
>
>  #endif /* _UAPI_LINUX_STAT_H */
> --
> 2.20.1
>
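The CFR_ENTIRE_FILE behaviour Darrick describes (both descriptors must be regular files, the whole of fd_in lands at the same position in fd_out, fd_out's length is then set to match, and non-NULL offsets receive the new EOFs) can be approximated in userspace with today's copy_file_range(). The following is a minimal sketch, assuming Linux with glibc 2.27 or later, where copy_file_range() is declared in <unistd.h> under _GNU_SOURCE. copy_entire_file() and copy_chunk() are hypothetical helper names, not kernel or libc APIs, and the pread()/pwrite() fallback for ENOSYS, EXDEV and EOPNOTSUPP is an added assumption that was not discussed in the thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy up to len bytes from *pos_in to *pos_out and
 * advance both offsets. Uses copy_file_range(); if the kernel or filesystem
 * cannot copy in-kernel, bounce one chunk through a buffer instead. */
static ssize_t copy_chunk(int fd_in, off64_t *pos_in,
                          int fd_out, off64_t *pos_out, size_t len)
{
    ssize_t n = copy_file_range(fd_in, pos_in, fd_out, pos_out, len, 0);

    if (n >= 0 ||
        (errno != ENOSYS && errno != EXDEV && errno != EOPNOTSUPP))
        return n;

    char buf[65536];
    n = pread(fd_in, buf, len < sizeof buf ? len : sizeof buf, *pos_in);
    if (n <= 0)
        return n;
    ssize_t w = pwrite(fd_out, buf, (size_t)n, *pos_out);
    if (w < 0)
        return -1;
    *pos_in += w;   /* on a short write the unwritten tail is retried */
    *pos_out += w;
    return w;
}

/* Hypothetical userspace emulation of the suggested CFR_ENTIRE_FILE
 * semantics; no such kernel flag exists. Copies all of fd_in to offset 0 of
 * fd_out, trims fd_out to the copied length, and stores the new EOF offsets
 * in off_in/off_out when they are non-NULL. Returns the byte count, or -1
 * with errno set (EINVAL if either fd is not a regular file). */
static ssize_t copy_entire_file(int fd_in, off64_t *off_in,
                                int fd_out, off64_t *off_out)
{
    struct stat st_in, st_out;
    off64_t pos_in = 0, pos_out = 0;   /* explicit offsets: fd positions untouched */

    if (fstat(fd_in, &st_in) < 0 || fstat(fd_out, &st_out) < 0)
        return -1;
    if (!S_ISREG(st_in.st_mode) || !S_ISREG(st_out.st_mode)) {
        errno = EINVAL;
        return -1;
    }

    while (pos_in < st_in.st_size) {
        ssize_t n = copy_chunk(fd_in, &pos_in, fd_out, &pos_out,
                               (size_t)(st_in.st_size - pos_in));
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)                    /* source shrank while we were copying */
            break;
    }

    /* Make fd_out exactly as long as what was copied. */
    if (ftruncate(fd_out, pos_out) < 0)
        return -1;
    if (off_in)
        *off_in = pos_in;
    if (off_out)
        *off_out = pos_out;
    return (ssize_t)pos_out;
}
```

Because the sketch passes explicit offsets, neither descriptor's file position moves, and the trailing ftruncate() is what discards any stale tail in fd_out. Like the patch, it samples the source size once, so it does nothing about concurrent writers to fd_in.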
On Sun, Apr 14, 2019 at 4:04 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Sat, Apr 13, 2019 at 03:54:39PM -0500, Shawn Landden wrote:
>
> /me pulls out his close-reading glasses and the copy_file_range manpage...
>
> > If flags includes COPY_FILE_RANGE_FILESIZE then the length
> > copied is the length of the file. off_in and off_out are
> > ignored. len must be 0 or the file size.
>
> They're ignored? As in, the copy operation reads the number of bytes in
> the file referenced by fd_in from fd_in at its current position and
> writes that out to fd_out at its current position? I don't see why I
> would want such an operation...
>
> ...but I can see how people could make use of a CFR_ENTIRE_FILE that
> would check that both file descriptors are actually regular files, and
> if so copy the entire contents of the fd_in file into the same position
> in the fd_out file, and then set the fd_out file's length to match. If
> @off_in or @off_out are non-NULL then they'll be updated to the new EOFs
> if the copy completes successfully and @len can be anything.
>

IDGI. In what way would that be helpful?
Would the syscall fail if it cannot copy the entire file (like clone_file_range)
or return the number of bytes copied?
If the latter, then the user will have to call the syscall again until it returns 0.
The user can already call copy_file_range with len=SSIZE_MAX and get almost the same thing.
Unless the idea is to optimize for fewer syscalls when copying very large files??
In that case, the MAX_RW_COUNT limit for this syscall would need to be relaxed.

While on the subject, something that has been discussed in the past is that
copy_file_range() and sendfile() of a large file are not killable, so that
should be fixed, especially if the interface is going to be used to copy
more data in-kernel.
IOW, the motivation of the patch is not clear to me:

> This implementation saves a call to stat() in the common case

What is the real-life workload where this micro-optimization would have
any effect?

> It does not fix any race conditions, but that is possible in the future
> with this interface.

Then please present a plan or an implementation of how that interface can
solve race conditions. If that is the only motivation for the interface,
then I do not see why we should merge the interface before the implementation.

Please let me know if I am missing something.

Thanks,
Amir.
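Amir's alternative, requesting len=SSIZE_MAX and calling again until the syscall returns 0, needs no stat() at all. A minimal sketch, assuming Linux with glibc 2.27 or later and assuming, as Amir implies, that the kernel accepts an oversized len and simply returns a short count. copy_until_eof() is a hypothetical helper name, and the read()/write() fallback for ENOSYS, EXDEV and EOPNOTSUPP is an added assumption, not something proposed in the thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy from fd_in's current position to its EOF into
 * fd_out at fd_out's current position, without stat()ing the source.
 * Each call asks for SSIZE_MAX bytes; a return of 0 means EOF on fd_in.
 * Returns the total bytes copied, or -1 with errno set. */
static ssize_t copy_until_eof(int fd_in, int fd_out)
{
    ssize_t total = 0;

    for (;;) {
        /* NULL offsets: use, and advance, each descriptor's file position. */
        ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL, SSIZE_MAX, 0);

        if (n < 0 && (errno == ENOSYS || errno == EXDEV || errno == EOPNOTSUPP)) {
            /* No in-kernel copy available: move one chunk via read()/write(). */
            char buf[65536];
            n = read(fd_in, buf, sizeof buf);
            if (n > 0) {
                ssize_t w = write(fd_out, buf, (size_t)n);
                if (w < 0)
                    return -1;
                if (w != n) {          /* keep the sketch simple: short write is an error */
                    errno = EIO;
                    return -1;
                }
            }
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)                    /* EOF on fd_in */
            return total;
        total += n;
    }
}
```

The loop always costs at least one extra call (the one that returns 0), which is the syscall count Amir suggests a whole-file flag might be trying to save.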
diff --git a/fs/read_write.c b/fs/read_write.c
index 61b43ad7608e..6d06361f0856 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1557,7 +1557,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	struct inode *inode_out = file_inode(file_out);
 	ssize_t ret;
 
-	if (flags != 0)
+	if ((flags & ~COPY_FILE_RANGE_FILESIZE) != 0)
 		return -EINVAL;
 
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
@@ -1565,6 +1565,18 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
 		return -EINVAL;
 
+	if (flags & COPY_FILE_RANGE_FILESIZE) {
+		struct kstat stat;
+		int error;
+		error = vfs_getattr(&file_in->f_path, &stat,
+				    STATX_SIZE, 0);
+		if (error < 0)
+			return error;
+		if (!(len == 0 || len == stat.size))
+			return -EAGAIN;
+		len = stat.size;
+	}
+
 	ret = rw_verify_area(READ, file_in, &pos_in, len);
 	if (unlikely(ret))
 		return ret;
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 7b35e98d3c58..1075aa4666ef 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -170,5 +170,9 @@ struct statx {
 
 #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
 
+/*
+ * Flags for copy_file_range()
+ */
+#define COPY_FILE_RANGE_FILESIZE	0x00000001 /* Copy the full length of the input file */
 
 #endif /* _UAPI_LINUX_STAT_H */
If flags includes COPY_FILE_RANGE_FILESIZE then the length
copied is the length of the file. off_in and off_out are
ignored. len must be 0 or the file size.

This implementation saves a call to stat() in the common case
of copying files. It does not fix any race conditions, but that
is possible in the future with this interface.

EAGAIN: If COPY_FILE_RANGE_FILESIZE was passed and len is not 0
or the file size.

Signed-off-by: Shawn Landden <shawn@git.icu>
CC: <linux-api@vger.kernel.org>
---
 fs/read_write.c           | 14 +++++++++++++-
 include/uapi/linux/stat.h |  4 ++++
 2 files changed, 17 insertions(+), 1 deletion(-)
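The length handling this patch adds to vfs_copy_file_range() can be isolated as a pure function, which makes its edge cases (len of 0, len equal to the file size, anything else) explicit. This is a userspace model only, assuming 64-bit sizes; resolve_copy_len() is a hypothetical name, and COPY_FILE_RANGE_FILESIZE is defined locally because the flag exists only in this patch, not in any system header. The model keeps the patch's -EAGAIN, which Darrick argues should be -EINVAL.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-in for the flag proposed by the patch (not in any real header). */
#define COPY_FILE_RANGE_FILESIZE 0x00000001u

/* Hypothetical model of the patch's checks. Returns 0 and stores the
 * effective copy length in *out_len, or a negative errno value. */
static int resolve_copy_len(unsigned int flags, uint64_t len,
                            uint64_t file_size, uint64_t *out_len)
{
    /* Any bit other than the one known flag is rejected. */
    if ((flags & ~COPY_FILE_RANGE_FILESIZE) != 0)
        return -EINVAL;

    if (flags & COPY_FILE_RANGE_FILESIZE) {
        /* len must be 0 or exactly the input file's size ... */
        if (!(len == 0 || len == file_size))
            return -EAGAIN;     /* as in the patch; review suggests -EINVAL */
        /* ... and is then replaced by that size. */
        len = file_size;
    }
    *out_len = len;
    return 0;
}
```

The size is sampled once, so nothing stops the input file from changing between this check and the copy itself, which matches the commit message's statement that the patch does not fix any race conditions.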