diff mbox

[v8,5/4] copy_file_range.2: New page documenting copy_file_range()

Message ID 1446844701-848-6-git-send-email-Anna.Schumaker@Netapp.com (mailing list archive)
State New, archived
Headers show

Commit Message

Schumaker, Anna Nov. 6, 2015, 9:18 p.m. UTC
copy_file_range() is a new system call for copying ranges of data
completely in the kernel.  This gives filesystems an opportunity to
implement some kind of "copy acceleration", such as reflinks or
server-side-copy (in the case of NFS).

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
v8:
- Document that files can not be open with O_APPEND.
---
 man2/copy_file_range.2 | 201 +++++++++++++++++++++++++++++++++++++++++++++++++
 man2/splice.2          |   1 +
 2 files changed, 202 insertions(+)
 create mode 100644 man2/copy_file_range.2

Comments

Michael Kerrisk (man-pages) Jan. 25, 2016, 1:45 p.m. UTC | #1
Hi Anna,

On 6 November 2015 at 22:18, Anna Schumaker <Anna.Schumaker@netapp.com> wrote:
> copy_file_range() is a new system call for copying ranges of data
> completely in the kernel.  This gives filesystems an opportunity to
> implement some kind of "copy acceleration", such as reflinks or
> server-side-copy (in the case of NFS).

I see that there was a V9 patch series, but this page that came
with the V8 series is the most recent that I can find. Is it up to
date, technically, with what has been merged into 4.5rc?

Thanks,

Michael

> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> v8:
> - Document that files can not be open with O_APPEND.
> ---
>  man2/copy_file_range.2 | 201 +++++++++++++++++++++++++++++++++++++++++++++++++
>  man2/splice.2          |   1 +
>  2 files changed, 202 insertions(+)
>  create mode 100644 man2/copy_file_range.2
>
> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> new file mode 100644
> index 0000000..d9f76d1
> --- /dev/null
> +++ b/man2/copy_file_range.2
> @@ -0,0 +1,201 @@
> +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@Netapp.com>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of
> +.\" this manual under the conditions for verbatim copying, provided that
> +.\" the entire resulting derived work is distributed under the terms of
> +.\" a permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume
> +.\" no responsibility for errors or omissions, or for damages resulting
> +.\" from the use of the information contained herein.  The author(s) may
> +.\" not have taken the same level of care in the production of this
> +.\" manual, which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH COPY 2 2015-11-06 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +copy_file_range \- Copy a range of data from one file to another
> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/syscall.h>
> +.B #include <unistd.h>
> +
> +.BI "ssize_t copy_file_range(int " fd_in ", loff_t *" off_in ", int " fd_out ",
> +.BI "                        loff_t *" off_out ", size_t " len \
> +", unsigned int " flags );
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR copy_file_range ()
> +system call performs an in-kernel copy between two file descriptors
> +without the additional cost of transferring data from the kernel to userspace
> +and then back into the kernel.
> +It copies up to
> +.I len
> +bytes of data from file descriptor
> +.I fd_in
> +to file descriptor
> +.IR fd_out ,
> +overwriting any data that exists within the requested range of the target file.
> +
> +The following semantics apply for
> +.IR off_in ,
> +and similar statements apply to
> +.IR off_out :
> +.IP * 3
> +If
> +.I off_in
> +is NULL, then bytes are read from
> +.I fd_in
> +starting from the current file offset, and the offset is
> +adjusted by the number of bytes copied.
> +.IP *
> +If
> +.I off_in
> +is not NULL, then
> +.I off_in
> +must point to a buffer that specifies the starting
> +offset where bytes from
> +.I fd_in
> +will be read.  The current file offset of
> +.I fd_in
> +is not changed, but
> +.I off_in
> +is adjusted appropriately.
> +.PP
> +
> +The
> +.I flags
> +argument must be set to 0.
> +.SH RETURN VALUE
> +Upon successful completion,
> +.BR copy_file_range ()
> +will return the number of bytes copied between files.
> +This could be less than the length originally requested.
> +
> +On error,
> +.BR copy_file_range ()
> +returns \-1 and
> +.I errno
> +is set to indicate the error.
> +.SH ERRORS
> +.TP
> +.B EBADF
> +One or more file descriptors are not valid; or
> +.I fd_in
> +is not open for reading; or
> +.I fd_out
> +is not open for writing; or
> +.I fd_out
> +is open for appending.
> +.TP
> +.B EINVAL
> +Requested range extends beyond the end of the source file; or the
> +.I flags
> +argument is not 0.
> +.TP
> +.B EIO
> +A low level I/O error occurred while copying.
> +.TP
> +.B ENOMEM
> +Out of memory.
> +.TP
> +.B ENOSPC
> +There is not enough space on the target filesystem to complete the copy.
> +.TP
> +.B EXDEV
> +.IR file_in " and " file_out
> +are not on the same mounted filesystem.
> +.SH VERSIONS
> +The
> +.BR copy_file_range ()
> +system call first appeared in Linux 4.4.
> +.SH CONFORMING TO
> +The
> +.BR copy_file_range ()
> +system call is a nonstandard Linux extension.
> +.SH NOTES
> +If
> +.I file_in
> +is a sparse file, then
> +.BR copy_file_range ()
> +may expand any holes existing in the requested range.
> +Users may benefit from calling
> +.BR copy_file_range ()
> +in a loop, and using
> +.BR lseek (2)
> +to find the locations of data segments.
> +.SH EXAMPLE
> +.nf
> +#define _GNU_SOURCE
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <unistd.h>
> +
> +loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
> +                       loff_t *off_out, size_t len, unsigned int flags)
> +{
> +    return syscall(__NR_copy_file_range, fd_in, off_in, fd_out,
> +                   off_out, len, flags);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +    int fd_in, fd_out;
> +    struct stat stat;
> +    loff_t len, ret;
> +    char buf[2];
> +
> +    if (argc != 3) {
> +        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    fd_in = open(argv[1], O_RDONLY);
> +    if (fd_in == \-1) {
> +        perror("open (argv[1])");
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    if (fstat(fd_in, &stat) == \-1) {
> +        perror("fstat");
> +        exit(EXIT_FAILURE);
> +    }
> +    len = stat.st_size;
> +
> +    fd_out = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0644);
> +    if (fd_out == \-1) {
> +        perror("open (argv[2])");
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    do {
> +        ret = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);
> +        if (ret == \-1) {
> +            perror("copy_file_range");
> +            exit(EXIT_FAILURE);
> +        }
> +
> +        len \-= ret;
> +    } while (len > 0);
> +
> +    close(fd_in);
> +    close(fd_out);
> +    exit(EXIT_SUCCESS);
> +}
> +.fi
> +.SH SEE ALSO
> +.BR splice (2)
> diff --git a/man2/splice.2 b/man2/splice.2
> index b9b4f42..5c162e0 100644
> --- a/man2/splice.2
> +++ b/man2/splice.2
> @@ -238,6 +238,7 @@ only pointers are copied, not the pages of the buffer.
>  See
>  .BR tee (2).
>  .SH SEE ALSO
> +.BR copy_file_range (2),
>  .BR sendfile (2),
>  .BR tee (2),
>  .BR vmsplice (2)
> --
> 2.6.2
>
Schumaker, Anna Jan. 25, 2016, 9:48 p.m. UTC | #2
Hi Michael,

On 01/25/2016 08:45 AM, Michael Kerrisk (man-pages) wrote:
> Hi Anna,
> 
> On 6 November 2015 at 22:18, Anna Schumaker <Anna.Schumaker@netapp.com> wrote:
>> copy_file_range() is a new system call for copying ranges of data
>> completely in the kernel.  This gives filesystems an opportunity to
>> implement some kind of "copy acceleration", such as reflinks or
>> server-side-copy (in the case of NFS).
> 
> I see that there was a V9 patch series, but this page that came
> with the V8 series is the most recent that I can find. Is it up to
> date, technically, with what has been merged into 4.5rc?

I double checked, and it looks like v9 only had a few small fixes so the man page from v8 should be up to date.  Sorry for the confusion!

Anna

> 
> Thanks,
> 
> Michael
> 
>> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>> v8:
>> - Document that files can not be open with O_APPEND.
>> ---
>>  man2/copy_file_range.2 | 201 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  man2/splice.2          |   1 +
>>  2 files changed, 202 insertions(+)
>>  create mode 100644 man2/copy_file_range.2
>>
>> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
>> new file mode 100644
>> index 0000000..d9f76d1
>> --- /dev/null
>> +++ b/man2/copy_file_range.2
>> @@ -0,0 +1,201 @@
>> +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@Netapp.com>
>> +.\"
>> +.\" %%%LICENSE_START(VERBATIM)
>> +.\" Permission is granted to make and distribute verbatim copies of this
>> +.\" manual provided the copyright notice and this permission notice are
>> +.\" preserved on all copies.
>> +.\"
>> +.\" Permission is granted to copy and distribute modified versions of
>> +.\" this manual under the conditions for verbatim copying, provided that
>> +.\" the entire resulting derived work is distributed under the terms of
>> +.\" a permission notice identical to this one.
>> +.\"
>> +.\" Since the Linux kernel and libraries are constantly changing, this
>> +.\" manual page may be incorrect or out-of-date.  The author(s) assume
>> +.\" no responsibility for errors or omissions, or for damages resulting
>> +.\" from the use of the information contained herein.  The author(s) may
>> +.\" not have taken the same level of care in the production of this
>> +.\" manual, which is licensed free of charge, as they might when working
>> +.\" professionally.
>> +.\"
>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>> +.\" the source, must acknowledge the copyright and authors of this work.
>> +.\" %%%LICENSE_END
>> +.\"
>> +.TH COPY 2 2015-11-06 "Linux" "Linux Programmer's Manual"
>> +.SH NAME
>> +copy_file_range \- Copy a range of data from one file to another
>> +.SH SYNOPSIS
>> +.nf
>> +.B #include <sys/syscall.h>
>> +.B #include <unistd.h>
>> +
>> +.BI "ssize_t copy_file_range(int " fd_in ", loff_t *" off_in ", int " fd_out ",
>> +.BI "                        loff_t *" off_out ", size_t " len \
>> +", unsigned int " flags );
>> +.fi
>> +.SH DESCRIPTION
>> +The
>> +.BR copy_file_range ()
>> +system call performs an in-kernel copy between two file descriptors
>> +without the additional cost of transferring data from the kernel to userspace
>> +and then back into the kernel.
>> +It copies up to
>> +.I len
>> +bytes of data from file descriptor
>> +.I fd_in
>> +to file descriptor
>> +.IR fd_out ,
>> +overwriting any data that exists within the requested range of the target file.
>> +
>> +The following semantics apply for
>> +.IR off_in ,
>> +and similar statements apply to
>> +.IR off_out :
>> +.IP * 3
>> +If
>> +.I off_in
>> +is NULL, then bytes are read from
>> +.I fd_in
>> +starting from the current file offset, and the offset is
>> +adjusted by the number of bytes copied.
>> +.IP *
>> +If
>> +.I off_in
>> +is not NULL, then
>> +.I off_in
>> +must point to a buffer that specifies the starting
>> +offset where bytes from
>> +.I fd_in
>> +will be read.  The current file offset of
>> +.I fd_in
>> +is not changed, but
>> +.I off_in
>> +is adjusted appropriately.
>> +.PP
>> +
>> +The
>> +.I flags
>> +argument must be set to 0.
>> +.SH RETURN VALUE
>> +Upon successful completion,
>> +.BR copy_file_range ()
>> +will return the number of bytes copied between files.
>> +This could be less than the length originally requested.
>> +
>> +On error,
>> +.BR copy_file_range ()
>> +returns \-1 and
>> +.I errno
>> +is set to indicate the error.
>> +.SH ERRORS
>> +.TP
>> +.B EBADF
>> +One or more file descriptors are not valid; or
>> +.I fd_in
>> +is not open for reading; or
>> +.I fd_out
>> +is not open for writing; or
>> +.I fd_out
>> +is open for appending.
>> +.TP
>> +.B EINVAL
>> +Requested range extends beyond the end of the source file; or the
>> +.I flags
>> +argument is not 0.
>> +.TP
>> +.B EIO
>> +A low level I/O error occurred while copying.
>> +.TP
>> +.B ENOMEM
>> +Out of memory.
>> +.TP
>> +.B ENOSPC
>> +There is not enough space on the target filesystem to complete the copy.
>> +.TP
>> +.B EXDEV
>> +.IR file_in " and " file_out
>> +are not on the same mounted filesystem.
>> +.SH VERSIONS
>> +The
>> +.BR copy_file_range ()
>> +system call first appeared in Linux 4.4.
>> +.SH CONFORMING TO
>> +The
>> +.BR copy_file_range ()
>> +system call is a nonstandard Linux extension.
>> +.SH NOTES
>> +If
>> +.I file_in
>> +is a sparse file, then
>> +.BR copy_file_range ()
>> +may expand any holes existing in the requested range.
>> +Users may benefit from calling
>> +.BR copy_file_range ()
>> +in a loop, and using
>> +.BR lseek (2)
>> +to find the locations of data segments.
>> +.SH EXAMPLE
>> +.nf
>> +#define _GNU_SOURCE
>> +#include <fcntl.h>
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <sys/stat.h>
>> +#include <sys/syscall.h>
>> +#include <unistd.h>
>> +
>> +loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
>> +                       loff_t *off_out, size_t len, unsigned int flags)
>> +{
>> +    return syscall(__NR_copy_file_range, fd_in, off_in, fd_out,
>> +                   off_out, len, flags);
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> +    int fd_in, fd_out;
>> +    struct stat stat;
>> +    loff_t len, ret;
>> +    char buf[2];
>> +
>> +    if (argc != 3) {
>> +        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    fd_in = open(argv[1], O_RDONLY);
>> +    if (fd_in == \-1) {
>> +        perror("open (argv[1])");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (fstat(fd_in, &stat) == \-1) {
>> +        perror("fstat");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +    len = stat.st_size;
>> +
>> +    fd_out = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0644);
>> +    if (fd_out == \-1) {
>> +        perror("open (argv[2])");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    do {
>> +        ret = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);
>> +        if (ret == \-1) {
>> +            perror("copy_file_range");
>> +            exit(EXIT_FAILURE);
>> +        }
>> +
>> +        len \-= ret;
>> +    } while (len > 0);
>> +
>> +    close(fd_in);
>> +    close(fd_out);
>> +    exit(EXIT_SUCCESS);
>> +}
>> +.fi
>> +.SH SEE ALSO
>> +.BR splice (2)
>> diff --git a/man2/splice.2 b/man2/splice.2
>> index b9b4f42..5c162e0 100644
>> --- a/man2/splice.2
>> +++ b/man2/splice.2
>> @@ -238,6 +238,7 @@ only pointers are copied, not the pages of the buffer.
>>  See
>>  .BR tee (2).
>>  .SH SEE ALSO
>> +.BR copy_file_range (2),
>>  .BR sendfile (2),
>>  .BR tee (2),
>>  .BR vmsplice (2)
>> --
>> 2.6.2
>>
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Kerrisk (man-pages) Jan. 26, 2016, 9:49 a.m. UTC | #3
Hi Anna,

Thanks for writing this page! I've merged it, and pushed to Git.
I made a few tweaks, which I think are all straightforward, 
but you might like to review my comments below.

On 11/06/2015 10:18 PM, Anna Schumaker wrote:
> copy_file_range() is a new system call for copying ranges of data
> completely in the kernel.  This gives filesystems an opportunity to
> implement some kind of "copy acceleration", such as reflinks or
> server-side-copy (in the case of NFS).
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
> v8:
> - Document that files can not be open with O_APPEND.
> ---
>  man2/copy_file_range.2 | 201 +++++++++++++++++++++++++++++++++++++++++++++++++
>  man2/splice.2          |   1 +
>  2 files changed, 202 insertions(+)
>  create mode 100644 man2/copy_file_range.2
> 
> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> new file mode 100644
> index 0000000..d9f76d1
> --- /dev/null
> +++ b/man2/copy_file_range.2
> @@ -0,0 +1,201 @@
> +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@Netapp.com>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of
> +.\" this manual under the conditions for verbatim copying, provided that
> +.\" the entire resulting derived work is distributed under the terms of
> +.\" a permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date.  The author(s) assume
> +.\" no responsibility for errors or omissions, or for damages resulting
> +.\" from the use of the information contained herein.  The author(s) may
> +.\" not have taken the same level of care in the production of this
> +.\" manual, which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.TH COPY 2 2015-11-06 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +copy_file_range \- Copy a range of data from one file to another
> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/syscall.h>
> +.B #include <unistd.h>
> +
> +.BI "ssize_t copy_file_range(int " fd_in ", loff_t *" off_in ", int " fd_out ",
> +.BI "                        loff_t *" off_out ", size_t " len \
> +", unsigned int " flags );
> +.fi
> +.SH DESCRIPTION
> +The
> +.BR copy_file_range ()
> +system call performs an in-kernel copy between two file descriptors
> +without the additional cost of transferring data from the kernel to userspace
> +and then back into the kernel.
> +It copies up to
> +.I len
> +bytes of data from file descriptor
> +.I fd_in
> +to file descriptor
> +.IR fd_out ,
> +overwriting any data that exists within the requested range of the target file.
> +
> +The following semantics apply for
> +.IR off_in ,
> +and similar statements apply to
> +.IR off_out :
> +.IP * 3
> +If
> +.I off_in
> +is NULL, then bytes are read from
> +.I fd_in
> +starting from the current file offset, and the offset is
> +adjusted by the number of bytes copied.
> +.IP *
> +If
> +.I off_in
> +is not NULL, then
> +.I off_in
> +must point to a buffer that specifies the starting
> +offset where bytes from
> +.I fd_in
> +will be read.  The current file offset of
> +.I fd_in
> +is not changed, but
> +.I off_in
> +is adjusted appropriately.
> +.PP
> +
> +The
> +.I flags
> +argument must be set to 0.
> +.SH RETURN VALUE
> +Upon successful completion,
> +.BR copy_file_range ()
> +will return the number of bytes copied between files.
> +This could be less than the length originally requested.
> +
> +On error,
> +.BR copy_file_range ()
> +returns \-1 and
> +.I errno
> +is set to indicate the error.
> +.SH ERRORS
> +.TP
> +.B EBADF
> +One or more file descriptors are not valid; or
> +.I fd_in
> +is not open for reading; or
> +.I fd_out
> +is not open for writing; or
> +.I fd_out
> +is open for appending.
> +.TP
> +.B EINVAL
> +Requested range extends beyond the end of the source file; or the
> +.I flags
> +argument is not 0.
> +.TP
> +.B EIO
> +A low level I/O error occurred while copying.
> +.TP
> +.B ENOMEM
> +Out of memory.
> +.TP
> +.B ENOSPC
> +There is not enough space on the target filesystem to complete the copy.
> +.TP
> +.B EXDEV
> +.IR file_in " and " file_out
> +are not on the same mounted filesystem.
> +.SH VERSIONS
> +The
> +.BR copy_file_range ()
> +system call first appeared in Linux 4.4.
> +.SH CONFORMING TO
> +The
> +.BR copy_file_range ()
> +system call is a nonstandard Linux extension.
> +.SH NOTES
> +If
> +.I file_in
> +is a sparse file, then
> +.BR copy_file_range ()
> +may expand any holes existing in the requested range.
> +Users may benefit from calling
> +.BR copy_file_range ()
> +in a loop, and using
> +.BR lseek (2)
> +to find the locations of data segments.

Here, I explicitly added mention of SEEK_HOLE and SEEK_DATA. okay?

> +.SH EXAMPLE
> +.nf
> +#define _GNU_SOURCE
> +#include <fcntl.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <sys/stat.h>
> +#include <sys/syscall.h>
> +#include <unistd.h>
> +
> +loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
> +                       loff_t *off_out, size_t len, unsigned int flags)
> +{
> +    return syscall(__NR_copy_file_range, fd_in, off_in, fd_out,
> +                   off_out, len, flags);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +    int fd_in, fd_out;
> +    struct stat stat;
> +    loff_t len, ret;
> +    char buf[2];

'buf' is unused, so I removed it. I assume it was accidental cruft; this
is just a heads-up, in case some you meant have some code in the program
that would use that variable.

> +
> +    if (argc != 3) {
> +        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    fd_in = open(argv[1], O_RDONLY);
> +    if (fd_in == \-1) {
> +        perror("open (argv[1])");
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    if (fstat(fd_in, &stat) == \-1) {
> +        perror("fstat");
> +        exit(EXIT_FAILURE);
> +    }
> +    len = stat.st_size;
> +
> +    fd_out = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0644);
> +    if (fd_out == \-1) {
> +        perror("open (argv[2])");
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    do {
> +        ret = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);
> +        if (ret == \-1) {
> +            perror("copy_file_range");
> +            exit(EXIT_FAILURE);
> +        }
> +
> +        len \-= ret;
> +    } while (len > 0);
> +
> +    close(fd_in);
> +    close(fd_out);
> +    exit(EXIT_SUCCESS);
> +}
> +.fi
> +.SH SEE ALSO
> +.BR splice (2)
> diff --git a/man2/splice.2 b/man2/splice.2
> index b9b4f42..5c162e0 100644
> --- a/man2/splice.2
> +++ b/man2/splice.2
> @@ -238,6 +238,7 @@ only pointers are copied, not the pages of the buffer.
>  See
>  .BR tee (2).
>  .SH SEE ALSO
> +.BR copy_file_range (2),
>  .BR sendfile (2),
>  .BR tee (2),
>  .BR vmsplice (2)

Thanks, 

Michael
Schumaker, Anna Jan. 26, 2016, 3:35 p.m. UTC | #4
On 01/26/2016 04:49 AM, Michael Kerrisk (man-pages) wrote:
> Hi Anna,
> 
> Thanks for writing this page! I've merged it, and pushed to Git.
> I made a few tweaks, which I think are all straightforward, 
> but you might like to review my comments below.
> 
> On 11/06/2015 10:18 PM, Anna Schumaker wrote:
>> copy_file_range() is a new system call for copying ranges of data
>> completely in the kernel.  This gives filesystems an opportunity to
>> implement some kind of "copy acceleration", such as reflinks or
>> server-side-copy (in the case of NFS).
>>
>> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>> v8:
>> - Document that files can not be open with O_APPEND.
>> ---
>>  man2/copy_file_range.2 | 201 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  man2/splice.2          |   1 +
>>  2 files changed, 202 insertions(+)
>>  create mode 100644 man2/copy_file_range.2
>>
>> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
>> new file mode 100644
>> index 0000000..d9f76d1
>> --- /dev/null
>> +++ b/man2/copy_file_range.2
>> @@ -0,0 +1,201 @@
>> +.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@Netapp.com>
>> +.\"
>> +.\" %%%LICENSE_START(VERBATIM)
>> +.\" Permission is granted to make and distribute verbatim copies of this
>> +.\" manual provided the copyright notice and this permission notice are
>> +.\" preserved on all copies.
>> +.\"
>> +.\" Permission is granted to copy and distribute modified versions of
>> +.\" this manual under the conditions for verbatim copying, provided that
>> +.\" the entire resulting derived work is distributed under the terms of
>> +.\" a permission notice identical to this one.
>> +.\"
>> +.\" Since the Linux kernel and libraries are constantly changing, this
>> +.\" manual page may be incorrect or out-of-date.  The author(s) assume
>> +.\" no responsibility for errors or omissions, or for damages resulting
>> +.\" from the use of the information contained herein.  The author(s) may
>> +.\" not have taken the same level of care in the production of this
>> +.\" manual, which is licensed free of charge, as they might when working
>> +.\" professionally.
>> +.\"
>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>> +.\" the source, must acknowledge the copyright and authors of this work.
>> +.\" %%%LICENSE_END
>> +.\"
>> +.TH COPY 2 2015-11-06 "Linux" "Linux Programmer's Manual"
>> +.SH NAME
>> +copy_file_range \- Copy a range of data from one file to another
>> +.SH SYNOPSIS
>> +.nf
>> +.B #include <sys/syscall.h>
>> +.B #include <unistd.h>
>> +
>> +.BI "ssize_t copy_file_range(int " fd_in ", loff_t *" off_in ", int " fd_out ",
>> +.BI "                        loff_t *" off_out ", size_t " len \
>> +", unsigned int " flags );
>> +.fi
>> +.SH DESCRIPTION
>> +The
>> +.BR copy_file_range ()
>> +system call performs an in-kernel copy between two file descriptors
>> +without the additional cost of transferring data from the kernel to userspace
>> +and then back into the kernel.
>> +It copies up to
>> +.I len
>> +bytes of data from file descriptor
>> +.I fd_in
>> +to file descriptor
>> +.IR fd_out ,
>> +overwriting any data that exists within the requested range of the target file.
>> +
>> +The following semantics apply for
>> +.IR off_in ,
>> +and similar statements apply to
>> +.IR off_out :
>> +.IP * 3
>> +If
>> +.I off_in
>> +is NULL, then bytes are read from
>> +.I fd_in
>> +starting from the current file offset, and the offset is
>> +adjusted by the number of bytes copied.
>> +.IP *
>> +If
>> +.I off_in
>> +is not NULL, then
>> +.I off_in
>> +must point to a buffer that specifies the starting
>> +offset where bytes from
>> +.I fd_in
>> +will be read.  The current file offset of
>> +.I fd_in
>> +is not changed, but
>> +.I off_in
>> +is adjusted appropriately.
>> +.PP
>> +
>> +The
>> +.I flags
>> +argument must be set to 0.
>> +.SH RETURN VALUE
>> +Upon successful completion,
>> +.BR copy_file_range ()
>> +will return the number of bytes copied between files.
>> +This could be less than the length originally requested.
>> +
>> +On error,
>> +.BR copy_file_range ()
>> +returns \-1 and
>> +.I errno
>> +is set to indicate the error.
>> +.SH ERRORS
>> +.TP
>> +.B EBADF
>> +One or more file descriptors are not valid; or
>> +.I fd_in
>> +is not open for reading; or
>> +.I fd_out
>> +is not open for writing; or
>> +.I fd_out
>> +is open for appending.
>> +.TP
>> +.B EINVAL
>> +Requested range extends beyond the end of the source file; or the
>> +.I flags
>> +argument is not 0.
>> +.TP
>> +.B EIO
>> +A low level I/O error occurred while copying.
>> +.TP
>> +.B ENOMEM
>> +Out of memory.
>> +.TP
>> +.B ENOSPC
>> +There is not enough space on the target filesystem to complete the copy.
>> +.TP
>> +.B EXDEV
>> +.IR file_in " and " file_out
>> +are not on the same mounted filesystem.
>> +.SH VERSIONS
>> +The
>> +.BR copy_file_range ()
>> +system call first appeared in Linux 4.4.
>> +.SH CONFORMING TO
>> +The
>> +.BR copy_file_range ()
>> +system call is a nonstandard Linux extension.
>> +.SH NOTES
>> +If
>> +.I file_in
>> +is a sparse file, then
>> +.BR copy_file_range ()
>> +may expand any holes existing in the requested range.
>> +Users may benefit from calling
>> +.BR copy_file_range ()
>> +in a loop, and using
>> +.BR lseek (2)
>> +to find the locations of data segments.
> 
> Here, I explicitly added mention of SEEK_HOLE and SEEK_DATA. okay?

Yeah, that makes sense.

> 
>> +.SH EXAMPLE
>> +.nf
>> +#define _GNU_SOURCE
>> +#include <fcntl.h>
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <sys/stat.h>
>> +#include <sys/syscall.h>
>> +#include <unistd.h>
>> +
>> +loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
>> +                       loff_t *off_out, size_t len, unsigned int flags)
>> +{
>> +    return syscall(__NR_copy_file_range, fd_in, off_in, fd_out,
>> +                   off_out, len, flags);
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> +    int fd_in, fd_out;
>> +    struct stat stat;
>> +    loff_t len, ret;
>> +    char buf[2];
> 
> 'buf' is unused, so I removed it. I assume it was accidental cruft; this
> is just a heads-up, in case some you meant have some code in the program
> that would use that variable.

I don't even remember what I used 'buf' for, so that makes sense too.

Thanks for committing it!
Anna

> 
>> +
>> +    if (argc != 3) {
>> +        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    fd_in = open(argv[1], O_RDONLY);
>> +    if (fd_in == \-1) {
>> +        perror("open (argv[1])");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    if (fstat(fd_in, &stat) == \-1) {
>> +        perror("fstat");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +    len = stat.st_size;
>> +
>> +    fd_out = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0644);
>> +    if (fd_out == \-1) {
>> +        perror("open (argv[2])");
>> +        exit(EXIT_FAILURE);
>> +    }
>> +
>> +    do {
>> +        ret = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);
>> +        if (ret == \-1) {
>> +            perror("copy_file_range");
>> +            exit(EXIT_FAILURE);
>> +        }
>> +
>> +        len \-= ret;
>> +    } while (len > 0);
>> +
>> +    close(fd_in);
>> +    close(fd_out);
>> +    exit(EXIT_SUCCESS);
>> +}
>> +.fi
>> +.SH SEE ALSO
>> +.BR splice (2)
>> diff --git a/man2/splice.2 b/man2/splice.2
>> index b9b4f42..5c162e0 100644
>> --- a/man2/splice.2
>> +++ b/man2/splice.2
>> @@ -238,6 +238,7 @@ only pointers are copied, not the pages of the buffer.
>>  See
>>  .BR tee (2).
>>  .SH SEE ALSO
>> +.BR copy_file_range (2),
>>  .BR sendfile (2),
>>  .BR tee (2),
>>  .BR vmsplice (2)
> 
> Thanks, 
> 
> Michael
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
new file mode 100644
index 0000000..d9f76d1
--- /dev/null
+++ b/man2/copy_file_range.2
@@ -0,0 +1,201 @@ 
+.\"This manpage is Copyright (C) 2015 Anna Schumaker <Anna.Schumaker@Netapp.com>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of
+.\" this manual under the conditions for verbatim copying, provided that
+.\" the entire resulting derived work is distributed under the terms of
+.\" a permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume
+.\" no responsibility for errors or omissions, or for damages resulting
+.\" from the use of the information contained herein.  The author(s) may
+.\" not have taken the same level of care in the production of this
+.\" manual, which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH COPY 2 2015-11-06 "Linux" "Linux Programmer's Manual"
+.SH NAME
+copy_file_range \- Copy a range of data from one file to another
+.SH SYNOPSIS
+.nf
+.B #include <sys/syscall.h>
+.B #include <unistd.h>
+
+.BI "ssize_t copy_file_range(int " fd_in ", loff_t *" off_in ", int " fd_out ",
+.BI "                        loff_t *" off_out ", size_t " len \
+", unsigned int " flags );
+.fi
+.SH DESCRIPTION
+The
+.BR copy_file_range ()
+system call performs an in-kernel copy between two file descriptors
+without the additional cost of transferring data from the kernel to userspace
+and then back into the kernel.
+It copies up to
+.I len
+bytes of data from file descriptor
+.I fd_in
+to file descriptor
+.IR fd_out ,
+overwriting any data that exists within the requested range of the target file.
+
+The following semantics apply for
+.IR off_in ,
+and similar statements apply to
+.IR off_out :
+.IP * 3
+If
+.I off_in
+is NULL, then bytes are read from
+.I fd_in
+starting from the current file offset, and the offset is
+adjusted by the number of bytes copied.
+.IP *
+If
+.I off_in
+is not NULL, then
+.I off_in
+must point to a buffer that specifies the starting
+offset where bytes from
+.I fd_in
+will be read.  The current file offset of
+.I fd_in
+is not changed, but
+.I off_in
+is adjusted appropriately.
+.PP
+
+The
+.I flags
+argument must be set to 0.
+.SH RETURN VALUE
+Upon successful completion,
+.BR copy_file_range ()
+will return the number of bytes copied between files.
+This could be less than the length originally requested.
+
+On error,
+.BR copy_file_range ()
+returns \-1 and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EBADF
+One or more file descriptors are not valid; or
+.I fd_in
+is not open for reading; or
+.I fd_out
+is not open for writing; or
+.I fd_out
+is open for appending.
+.TP
+.B EINVAL
+Requested range extends beyond the end of the source file; or the
+.I flags
+argument is not 0.
+.TP
+.B EIO
+A low level I/O error occurred while copying.
+.TP
+.B ENOMEM
+Out of memory.
+.TP
+.B ENOSPC
+There is not enough space on the target filesystem to complete the copy.
+.TP
+.B EXDEV
+.IR file_in " and " file_out
+are not on the same mounted filesystem.
+.SH VERSIONS
+The
+.BR copy_file_range ()
+system call first appeared in Linux 4.4.
+.SH CONFORMING TO
+The
+.BR copy_file_range ()
+system call is a nonstandard Linux extension.
+.SH NOTES
+If
+.I file_in
+is a sparse file, then
+.BR copy_file_range ()
+may expand any holes existing in the requested range.
+Users may benefit from calling
+.BR copy_file_range ()
+in a loop, and using
+.BR lseek (2)
+to find the locations of data segments.
+.SH EXAMPLE
+.nf
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+loff_t copy_file_range(int fd_in, loff_t *off_in, int fd_out,
+                       loff_t *off_out, size_t len, unsigned int flags)
+{
+    return syscall(__NR_copy_file_range, fd_in, off_in, fd_out,
+                   off_out, len, flags);
+}
+
+int main(int argc, char **argv)
+{
+    int fd_in, fd_out;
+    struct stat stat;
+    loff_t len, ret;
+    char buf[2];
+
+    if (argc != 3) {
+        fprintf(stderr, "Usage: %s <source> <destination>\\n", argv[0]);
+        exit(EXIT_FAILURE);
+    }
+
+    fd_in = open(argv[1], O_RDONLY);
+    if (fd_in == \-1) {
+        perror("open (argv[1])");
+        exit(EXIT_FAILURE);
+    }
+
+    if (fstat(fd_in, &stat) == \-1) {
+        perror("fstat");
+        exit(EXIT_FAILURE);
+    }
+    len = stat.st_size;
+
+    fd_out = open(argv[2], O_CREAT|O_WRONLY|O_TRUNC, 0644);
+    if (fd_out == \-1) {
+        perror("open (argv[2])");
+        exit(EXIT_FAILURE);
+    }
+
+    do {
+        ret = copy_file_range(fd_in, NULL, fd_out, NULL, len, 0);
+        if (ret == \-1) {
+            perror("copy_file_range");
+            exit(EXIT_FAILURE);
+        }
+
+        len \-= ret;
+    } while (len > 0);
+
+    close(fd_in);
+    close(fd_out);
+    exit(EXIT_SUCCESS);
+}
+.fi
+.SH SEE ALSO
+.BR splice (2)
diff --git a/man2/splice.2 b/man2/splice.2
index b9b4f42..5c162e0 100644
--- a/man2/splice.2
+++ b/man2/splice.2
@@ -238,6 +238,7 @@  only pointers are copied, not the pages of the buffer.
 See
 .BR tee (2).
 .SH SEE ALSO
+.BR copy_file_range (2),
 .BR sendfile (2),
 .BR tee (2),
 .BR vmsplice (2)