diff mbox series

new flag COPY_FILE_RANGE_FILESIZE for copy_file_range()

Message ID 20190413205439.9623-1-shawn@git.icu (mailing list archive)
State New, archived

Commit Message

Shawn Landden April 13, 2019, 8:54 p.m. UTC
If flags includes COPY_FILE_RANGE_FILESIZE then the length
copied is the length of the file. off_in and off_out are
ignored. len must be 0 or the file size.

This implementation saves a call to stat() in the common case
of copying files. It does not fix any race conditions, but that
is possible in the future with this interface.
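
For reference, the userspace pattern this paragraph alludes to might look roughly like the sketch below: one fstat() to learn the size, then copy_file_range() with flags == 0. The helper name copy_whole_file_with_stat is hypothetical and not part of the patch, and error handling is kept minimal.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes copy_file_range() in glibc */
#endif
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: copy all of fd_in to fd_out
 * with the existing flags == 0 interface.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int copy_whole_file_with_stat(int fd_in, int fd_out)
{
	struct stat st;
	off_t off_in = 0, off_out = 0;

	/* This is the extra call the proposed flag is meant to save. */
	if (fstat(fd_in, &st) < 0)
		return -1;

	/* copy_file_range() may copy less than requested, so loop. */
	while (off_in < st.st_size) {
		ssize_t n = copy_file_range(fd_in, &off_in, fd_out, &off_out,
					    (size_t)(st.st_size - off_in), 0);
		if (n < 0)
			return -1;
		if (n == 0)	/* source shrank after fstat() */
			break;
	}
	return 0;
}
```

Note that the size read by fstat() can be stale by the time the copy runs, which is the race the commit message mentions but does not address.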

EAGAIN: If COPY_FILE_RANGE_FILESIZE was passed and len is not 0
or the file size.

Signed-off-by: Shawn Landden <shawn@git.icu>
CC: <linux-api@vger.kernel.org>
---
 fs/read_write.c           | 14 +++++++++++++-
 include/uapi/linux/stat.h |  4 ++++
 2 files changed, 17 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong April 14, 2019, 1:02 a.m. UTC | #1
On Sat, Apr 13, 2019 at 03:54:39PM -0500, Shawn Landden wrote:

/me pulls out his close-reading glasses and the copy_file_range manpage...

> If flags includes COPY_FILE_RANGE_FILESIZE then the length
> copied is the length of the file. off_in and off_out are
> ignored.  len must be 0 or the file size.

They're ignored?  As in the copy operation reads the number of bytes in
the file referenced by fd_in from fd_in at its current position and
writes that out to fd_out at its current position?  I don't see why I
would want such an operation...

...but I can see how people could make use of a CFR_ENTIRE_FILE that
would check that both file descriptors are actually regular files, and
if so copy the entire contents of the fd_in file into the same position
in the fd_out file, and then set the fd_out file's length to match.  If
@off_in or @off_out are non-NULL then they'll be updated to the new EOFs
if the copy completes successfully, and @len can be anything.

Also: please update the manual page and the xfstests regression test for
this syscall.

> This implementation saves a call to stat() in the common case
> of copying files. It does not fix any race conditions, but that
> is possible in the future with this interface.
> 
> EAGAIN: If COPY_FILE_RANGE_FILESIZE was passed and len is not 0
> or the file size.

The values are invalid, so why would we tell userspace to try again
instead of the EINVAL that we usually use?
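
To make the point concrete, here is a small userspace model of the length check with EINVAL in place of EAGAIN. It is only a sketch, not kernel code: cfr_resolve_len and the CFR_FILESIZE name are hypothetical, and the flag value is simply copied from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical short name; same value as the patch's COPY_FILE_RANGE_FILESIZE. */
#define CFR_FILESIZE 0x1u

/*
 * Userspace model of the patch's length check.
 * Returns 0 and stores the effective length in *out_len,
 * or a negative errno value.
 */
static int cfr_resolve_len(unsigned int flags, uint64_t len,
			   uint64_t file_size, uint64_t *out_len)
{
	if (flags & ~CFR_FILESIZE)
		return -EINVAL;		/* unknown flag bits */

	if (flags & CFR_FILESIZE) {
		if (len != 0 && len != file_size)
			return -EINVAL;	/* bad argument; retrying cannot help */
		len = file_size;
	}

	*out_len = len;
	return 0;
}
```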

> Signed-off-by: Shawn Landden <shawn@git.icu>
> CC: <linux-api@vger.kernel.org>
> ---
>  fs/read_write.c           | 14 +++++++++++++-
>  include/uapi/linux/stat.h |  4 ++++
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 61b43ad7608e..6d06361f0856 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1557,7 +1557,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	struct inode *inode_out = file_inode(file_out);
>  	ssize_t ret;
>  
> -	if (flags != 0)
> +	if ((flags & ~COPY_FILE_RANGE_FILESIZE) != 0)

FWIW you might as well shorten the prefix to "CFR_" since nobody else is
using it.

--D

>  		return -EINVAL;
>  
>  	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
> @@ -1565,6 +1565,18 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
>  		return -EINVAL;
>  
> +	if (flags & COPY_FILE_RANGE_FILESIZE) {
> +		struct kstat stat;
> +		int error;
> +		error = vfs_getattr(&file_in->f_path, &stat,
> +				    STATX_SIZE, 0);
> +		if (error < 0)
> +			return error;
> +		if (!(len == 0 || len == stat.size))
> +			return -EAGAIN;
> +		len = stat.size;
> +	}
> +
>  	ret = rw_verify_area(READ, file_in, &pos_in, len);
>  	if (unlikely(ret))
>  		return ret;
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 7b35e98d3c58..1075aa4666ef 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -170,5 +170,9 @@ struct statx {
>  
>  #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
>  
> +/*
> + * Flags for copy_file_range()
> + */
> +#define COPY_FILE_RANGE_FILESIZE	0x00000001 /* Copy the full length of the input file */
>  
>  #endif /* _UAPI_LINUX_STAT_H */
> -- 
> 2.20.1
>
Amir Goldstein April 14, 2019, 7:10 a.m. UTC | #2
On Sun, Apr 14, 2019 at 4:04 AM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Sat, Apr 13, 2019 at 03:54:39PM -0500, Shawn Landden wrote:
>
> /me pulls out his close-reading glasses and the copy_file_range manpage...
>
> > If flags includes COPY_FILE_RANGE_FILESIZE then the length
> > copied is the length of the file. off_in and off_out are
> > ignored.  len must be 0 or the file size.
>
> They're ignored?  As in the copy operation reads the number of bytes in
> the file referenced by fd_in from fd_in at its current position and
> writes that out to fd_out at its current position?  I don't see why I
> would want such an operation...
>
> ...but I can see how people could make use of a CFR_ENTIRE_FILE that
> would check that both file descriptors are actually regular files, and
> if so copy the entire contents of the fd_in file into the same position
> in the fd_out file, and then set the fd_out file's length to match.  If
> @off_in or @off_out are non-NULL then they'll be updated to the new EOFs
> if the copy completes successfully, and @len can be anything.
>

IDGI. In what way would that be helpful?
Would the syscall fail if it cannot copy the entire file (like clone_file_range)
or return the bytes copied?
If the latter, then the user will have to call the syscall again until it
returns 0.
The user can already call copy_file_range with len=SSIZE_MAX and get almost
the same thing.
Unless the idea is to optimize for fewer syscalls when copying very large files?
In that case, the MAX_RW_COUNT limit for this syscall would need to be relaxed.
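
A minimal sketch of that existing approach, assuming a glibc that exposes copy_file_range() and a kernel that clamps an oversized len to the data actually available: ask for SSIZE_MAX bytes and loop until the call returns 0. The helper name copy_until_eof is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes copy_file_range() in glibc */
#endif
#include <limits.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy fd_in to fd_out from offset 0 without calling
 * stat() first. Returns 0 on success, -1 with errno set on failure.
 */
static int copy_until_eof(int fd_in, int fd_out)
{
	off_t off_in = 0, off_out = 0;

	for (;;) {
		/* Each call may copy less than requested, hence the loop. */
		ssize_t n = copy_file_range(fd_in, &off_in, fd_out, &off_out,
					    SSIZE_MAX, 0);
		if (n < 0)
			return -1;
		if (n == 0)	/* nothing left to copy: EOF */
			return 0;
	}
}
```

Compared with a stat()-based copy, this trades the stat() call for one extra copy_file_range() call that returns 0, so the syscall count is about the same.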

While on the subject, something that has been discussed in the past is that
copy_file_range() and sendfile() of a large file are not killable, so that
should be fixed, especially if the interface is going to be used to copy
more data in-kernel.

IOW, the motivation of the patch is not clear to me:

> This implementation saves a call to stat() in the common case

What is the real life workload where this micro optimization would
have any effect?

> It does not fix any race conditions, but that is possible in the future
> with this interface.

Then please present a plan or an implementation of how that interface
can solve race conditions and if that is the only motivation for the
interface then I do not see why we should merge the interface before
the implementation.

Please let me know if I am missing something.

Thanks,
Amir.

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index 61b43ad7608e..6d06361f0856 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1557,7 +1557,7 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	struct inode *inode_out = file_inode(file_out);
 	ssize_t ret;
 
-	if (flags != 0)
+	if ((flags & ~COPY_FILE_RANGE_FILESIZE) != 0)
 		return -EINVAL;
 
 	if (S_ISDIR(inode_in->i_mode) || S_ISDIR(inode_out->i_mode))
@@ -1565,6 +1565,18 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (!S_ISREG(inode_in->i_mode) || !S_ISREG(inode_out->i_mode))
 		return -EINVAL;
 
+	if (flags & COPY_FILE_RANGE_FILESIZE) {
+		struct kstat stat;
+		int error;
+		error = vfs_getattr(&file_in->f_path, &stat,
+				    STATX_SIZE, 0);
+		if (error < 0)
+			return error;
+		if (!(len == 0 || len == stat.size))
+			return -EAGAIN;
+		len = stat.size;
+	}
+
 	ret = rw_verify_area(READ, file_in, &pos_in, len);
 	if (unlikely(ret))
 		return ret;
diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
index 7b35e98d3c58..1075aa4666ef 100644
--- a/include/uapi/linux/stat.h
+++ b/include/uapi/linux/stat.h
@@ -170,5 +170,9 @@  struct statx {
 
 #define STATX_ATTR_AUTOMOUNT		0x00001000 /* Dir: Automount trigger */
 
+/*
+ * Flags for copy_file_range()
+ */
+#define COPY_FILE_RANGE_FILESIZE	0x00000001 /* Copy the full length of the input file */
 
 #endif /* _UAPI_LINUX_STAT_H */