
copy_file_range.2: Kernel v5.12 updates

Message ID 20210224142307.7284-1-lhenriques@suse.de (mailing list archive)
State New, archived

Commit Message

Luis Henriques Feb. 24, 2021, 2:23 p.m. UTC
Update man-page with recent changes to this syscall.

Signed-off-by: Luis Henriques <lhenriques@suse.de>
---
Hi!

Here's a suggestion for fixing the manpage for copy_file_range().  Note that
I've assumed the fix will hit 5.12.

 man2/copy_file_range.2 | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

Comments

Amir Goldstein Feb. 24, 2021, 4:10 p.m. UTC | #1
On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques <lhenriques@suse.de> wrote:
>
> Update man-page with recent changes to this syscall.
>
> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> ---
> Hi!
>
> Here's a suggestion for fixing the manpage for copy_file_range().  Note that
> I've assumed the fix will hit 5.12.
>
>  man2/copy_file_range.2 | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> index 611a39b8026b..b0fd85e2631e 100644
> --- a/man2/copy_file_range.2
> +++ b/man2/copy_file_range.2
> @@ -169,6 +169,9 @@ Out of memory.
>  .B ENOSPC
>  There is not enough space on the target filesystem to complete the copy.
>  .TP
> +.B EOPNOTSUPP
> +The filesystem does not support this operation.
> +.TP
>  .B EOVERFLOW
>  The requested source or destination range is too large to represent in the
>  specified data types.
> @@ -187,7 +190,7 @@ refers to an active swap file.
>  .B EXDEV
>  The files referred to by
>  .IR fd_in " and " fd_out
> -are not on the same mounted filesystem (pre Linux 5.3).
> +are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).

I think you need to drop the (Linux range) altogether.
What's missing here is the NFS cross server copy use case.
Maybe:

...are not on the same mounted filesystem and the source and target filesystems
do not support cross-filesystem copy.

You may refer the reader to VERSIONS section where it will say which
filesystems support cross-fs copy as of kernel version XXX (i.e. cifs and nfs).

>  .SH VERSIONS
>  The
>  .BR copy_file_range ()
> @@ -202,6 +205,11 @@ Applications should target the behaviour and requirements of 5.3 kernels.
>  .PP
>  First support for cross-filesystem copies was introduced in Linux 5.3.
>  Older kernels will return -EXDEV when cross-filesystem copies are attempted.
> +.PP
> +After Linux 5.12, support for copies between different filesystems was dropped.
> +However, individual filesystems may still provide
> +.BR copy_file_range ()
> +implementations that allow copies across different devices.

Again, this is not likely to stay up to date for very long.
The stable kernels are expected to apply your patch (because it fixes
a regression), so this should be phrased differently.
If it were me, I would provide all the details of the situation to
Michael and ask him to write the best description for this section.

Thanks,
Amir.
Luis Henriques Feb. 25, 2021, 10:21 a.m. UTC | #2
On Wed, Feb 24, 2021 at 06:10:45PM +0200, Amir Goldstein wrote:
> On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques <lhenriques@suse.de> wrote:
> >
> > Update man-page with recent changes to this syscall.
> >
> > Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > ---
> > Hi!
> >
> > Here's a suggestion for fixing the manpage for copy_file_range().  Note that
> > I've assumed the fix will hit 5.12.
> >
> >  man2/copy_file_range.2 | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> > index 611a39b8026b..b0fd85e2631e 100644
> > --- a/man2/copy_file_range.2
> > +++ b/man2/copy_file_range.2
> > @@ -169,6 +169,9 @@ Out of memory.
> >  .B ENOSPC
> >  There is not enough space on the target filesystem to complete the copy.
> >  .TP
> > +.B EOPNOTSUPP
> > +The filesystem does not support this operation.
> > +.TP
> >  .B EOVERFLOW
> >  The requested source or destination range is too large to represent in the
> >  specified data types.
> > @@ -187,7 +190,7 @@ refers to an active swap file.
> >  .B EXDEV
> >  The files referred to by
> >  .IR fd_in " and " fd_out
> > -are not on the same mounted filesystem (pre Linux 5.3).
> > +are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).
> 
> I think you need to drop the (Linux range) altogether.
> What's missing here is the NFS cross server copy use case.
> Maybe:
> 
> ...are not on the same mounted filesystem and the source and target filesystems
> do not support cross-filesystem copy.
> 
> You may refer the reader to VERSIONS section where it will say which
> filesystems support cross-fs copy as of kernel version XXX (i.e. cifs and nfs).
> 
> >  .SH VERSIONS
> >  The
> >  .BR copy_file_range ()
> > @@ -202,6 +205,11 @@ Applications should target the behaviour and requirements of 5.3 kernels.
> >  .PP
> >  First support for cross-filesystem copies was introduced in Linux 5.3.
> >  Older kernels will return -EXDEV when cross-filesystem copies are attempted.
> > +.PP
> > +After Linux 5.12, support for copies between different filesystems was dropped.
> > +However, individual filesystems may still provide
> > +.BR copy_file_range ()
> > +implementations that allow copies across different devices.
> 
> Again, this is not likely to stay up to date for very long.
> The stable kernels are expected to apply your patch (because it fixes
> a regression)
> so this should be phrased differently.
> If it were me, I would provide all the details of the situation to
> Michael and ask him
> to write the best description for this section.

Thanks Amir.

Yeah, it's tricky.  Support was added and then dropped.  Since stable
kernels will be picking up this patch, maybe the best thing to do is to
not mention the generic cross-filesystem support at all...?  Or simply say
that 5.3 temporarily supported it but that support was later dropped.

Michael (or Alejandro), would you be OK handling this yourself as Amir
suggested?

Cheers,
--
Luís
Alejandro Colomar Feb. 26, 2021, 10:13 a.m. UTC | #3
Hello Luis,

On 2/25/21 11:21 AM, Luis Henriques wrote:
> On Wed, Feb 24, 2021 at 06:10:45PM +0200, Amir Goldstein wrote:
>> If it were me, I would provide all the details of the situation to
>> Michael and ask him
>> to write the best description for this section.
> 
> Thanks Amir.
> 
> Yeah, it's tricky.  Support was added and then dropped.   Since stable
> kernels will be picking this patch,  maybe the best thing to do is to not
> mention the generic cross-filesystem support at all...?  Or simply say
> that 5.3 temporarily supported it but that support was later dropped.
> 
> Michael (or Alejandro), would you be OK handling this yourself as Amir
> suggested?

Could you please provide a more detailed history of what is to be 
documented?

Thanks,

Alex
Amir Goldstein Feb. 26, 2021, 10:34 a.m. UTC | #4
On Fri, Feb 26, 2021 at 12:13 PM Alejandro Colomar (man-pages)
<alx.manpages@gmail.com> wrote:
>
> Hello Luis,
>
> On 2/25/21 11:21 AM, Luis Henriques wrote:
> > On Wed, Feb 24, 2021 at 06:10:45PM +0200, Amir Goldstein wrote:
> >> If it were me, I would provide all the details of the situation to
> >> Michael and ask him
> >> to write the best description for this section.
> >
> > Thanks Amir.
> >
> > Yeah, it's tricky.  Support was added and then dropped.   Since stable
> > kernels will be picking this patch,  maybe the best thing to do is to not
> > mention the generic cross-filesystem support at all...?  Or simply say
> > that 5.3 temporarily supported it but that support was later dropped.
> >
> > Michael (or Alejandro), would you be OK handling this yourself as Amir
> > suggested?
>
> Could you please provide a more detailed history of what is to be
> documented?
>

Is this detailed enough? ;-)

https://lwn.net/Articles/846403/

Thanks,
Amir.
Alejandro Colomar Feb. 26, 2021, 11:15 a.m. UTC | #5
Hello Amir,

On 2/26/21 11:34 AM, Amir Goldstein wrote:
> Is this detailed enough? ;-)
> 
> https://lwn.net/Articles/846403/

I'm sorry I can't read it yet:

[
Subscription required
The page you have tried to view (How useful should copy_file_range() 
be?) is currently available to LWN subscribers only. Reader 
subscriptions are a necessary way to fund the continued existence of LWN 
and the quality of its content.
[...]
(Alternatively, this item will become freely available on March 4, 2021)
]

However, the 4th of March is close enough, I guess.

Thanks,

Alex
Jeff Layton Feb. 26, 2021, 1:59 p.m. UTC | #6
On Fri, 2021-02-26 at 12:15 +0100, Alejandro Colomar (man-pages) wrote:
> Hello Amir,
> 
> On 2/26/21 11:34 AM, Amir Goldstein wrote:
> > Is this detailed enough? ;-)
> > 
> > https://lwn.net/Articles/846403/
> 
> I'm sorry I can't read it yet:
> 
> [
> Subscription required
> The page you have tried to view (How useful should copy_file_range() 
> be?) is currently available to LWN subscribers only. Reader 
> subscriptions are a necessary way to fund the continued existence of LWN 
> and the quality of its content.
> [...]
> (Alternatively, this item will become freely available on March 4, 2021)
> ]
> 


Here's a link that should work. I'm probably breaking the rules a bit as
a subscriber, but hopefully Jon won't mind too much. FWIW, I've found it
to be worthwhile to subscribe to LWN if you're doing a lot of kernel
development:

    https://lwn.net/SubscriberLink/846403/0fd639403e629cab/

Cheers,
Alejandro Colomar Feb. 26, 2021, 9:26 p.m. UTC | #7
Hello Jeff,

On 2/26/21 2:59 PM, Jeff Layton wrote:
> Here's a link that should work. I'm probably breaking the rules a bit as
> a subscriber, but hopefully Jon won't mind too much. FWIW, I've found it
> to be worthwhile to subscribe to LWN if you're doing a lot of kernel
> development:
> 
>      https://lwn.net/SubscriberLink/846403/0fd639403e629cab/

Thanks!  (I already received the link privately some minutes before from 
various people.)

It seems that he considers it fair use :)

[[
Where is it appropriate to post a subscriber link?

Almost anywhere. Private mail, messages to project mailing lists, and 
blog entries are all appropriate. As long as people do not use 
subscriber links as a way to defeat our attempts to gain subscribers, we 
are happy to see them shared.
]]
<https://lwn.net/op/FAQ.lwn#site>

Cheers,

Alex
Alejandro Colomar Feb. 26, 2021, 10:18 p.m. UTC | #8
Hello Amir, Luis,

On 2/24/21 5:10 PM, Amir Goldstein wrote:
> On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques <lhenriques@suse.de> wrote:
>>
>> Update man-page with recent changes to this syscall.
>>
>> Signed-off-by: Luis Henriques <lhenriques@suse.de>
>> ---
>> Hi!
>>
>> Here's a suggestion for fixing the manpage for copy_file_range().  Note that
>> I've assumed the fix will hit 5.12.
>>
>>   man2/copy_file_range.2 | 10 +++++++++-
>>   1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
>> index 611a39b8026b..b0fd85e2631e 100644
>> --- a/man2/copy_file_range.2
>> +++ b/man2/copy_file_range.2
>> @@ -169,6 +169,9 @@ Out of memory.
>>   .B ENOSPC
>>   There is not enough space on the target filesystem to complete the copy.
>>   .TP
>> +.B EOPNOTSUPP

I'll add the kernel version here:

.BR EOPNOTSUPP " (since Linux 5.12)"

>> +The filesystem does not support this operation.
>> +.TP
>>   .B EOVERFLOW
>>   The requested source or destination range is too large to represent in the
>>   specified data types.
>> @@ -187,7 +190,7 @@ refers to an active swap file.
>>   .B EXDEV
>>   The files referred to by
>>   .IR fd_in " and " fd_out
>> -are not on the same mounted filesystem (pre Linux 5.3).
>> +are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).

I'm not sure that 'mounted' adds any value here.  Would you remove the 
word here?

It reads as if two separate devices with the same filesystem type would 
still give this error.

Per the LWN.net article Amir shared, this is permitted ("When called 
from user space, copy_file_range() will only try to copy a file across 
filesystems if the two are of the same type").

This behavior was slightly different before 5.3 AFAICR (was it?) ("until 
then, copy_file_range() refused to copy between files that were not 
located on the same filesystem.").  If that's the case, I'd specify the 
difference, or more probably split the error into two, one before 5.3, 
and one since 5.12.

> 
> I think you need to drop the (Linux range) altogether.

I'll keep the range.  Users of 5.3..5.11 might be surprised if the 
filesystems are different and they don't get an error, I think.

I reworded it to follow other pages' conventions:

.BR EXDEV " (before Linux 5.3; or since Linux 5.12)"

which renders as:

        EXDEV (before Linux 5.3; or since Linux 5.12)
               The files referred to by fd_in and fd_out are not on
               the same mounted filesystem.


> What's missing here is the NFS cross server copy use case.
> Maybe:
> 
> ...are not on the same mounted filesystem and the source and target filesystems
> do not support cross-filesystem copy.

Yes.

Again, this wasn't true before 5.3, right?

> 
> You may refer the reader to VERSIONS section where it will say which
> filesystems support cross-fs copy as of kernel version XXX (i.e. cifs and nfs).
> 
>>   .SH VERSIONS
>>   The
>>   .BR copy_file_range ()
>> @@ -202,6 +205,11 @@ Applications should target the behaviour and requirements of 5.3 kernels.
>>   .PP
>>   First support for cross-filesystem copies was introduced in Linux 5.3.
>>   Older kernels will return -EXDEV when cross-filesystem copies are attempted.
>> +.PP
>> +After Linux 5.12, support for copies between different filesystems was dropped.
>> +However, individual filesystems may still provide
>> +.BR copy_file_range ()
>> +implementations that allow copies across different devices.
> 
> Again, this is not likely to stay up to date for very long.
> The stable kernels are expected to apply your patch (because it fixes
> a regression)
> so this should be phrased differently.
> If it were me, I would provide all the details of the situation to
> Michael and ask him
> to write the best description for this section.

I'll look at this part in more detail in a later review.


On 2/26/21 11:34 AM, Amir Goldstein wrote:
 > Is this detailed enough? ;-)
 >
 > https://lwn.net/Articles/846403/

Yes, it is!



Thanks,

Alex
Amir Goldstein Feb. 27, 2021, 5:41 a.m. UTC | #9
On Sat, Feb 27, 2021 at 12:19 AM Alejandro Colomar (man-pages)
<alx.manpages@gmail.com> wrote:
>
> Hello Amir, Luis,
>
> On 2/24/21 5:10 PM, Amir Goldstein wrote:
> > On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques <lhenriques@suse.de> wrote:
> >>
> >> Update man-page with recent changes to this syscall.
> >>
> >> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> >> ---
> >> Hi!
> >>
> >> Here's a suggestion for fixing the manpage for copy_file_range().  Note that
> >> I've assumed the fix will hit 5.12.
> >>
> >>   man2/copy_file_range.2 | 10 +++++++++-
> >>   1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> >> index 611a39b8026b..b0fd85e2631e 100644
> >> --- a/man2/copy_file_range.2
> >> +++ b/man2/copy_file_range.2
> >> @@ -169,6 +169,9 @@ Out of memory.
> >>   .B ENOSPC
> >>   There is not enough space on the target filesystem to complete the copy.
> >>   .TP
> >> +.B EOPNOTSUPP
>
> I'll add the kernel version here:
>
> .BR EOPNOTSUPP " (since Linux 5.12)"

The error could be returned prior to 5.3, and would probably also be
returned by future stable kernels 5.3..5.12.
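
Since the set of kernels that return these errors is hard to pin down,
a caller is probably better off reacting to the error than checking the
kernel version.  A minimal sketch of that idea follows.  The helper
names (should_fall_back, copy_by_read_write, copy_fd, copy_path) are
made up for illustration, and treating ENOSYS as a fallback case is my
own assumption for kernels that lack the syscall.  It invokes the raw
syscall so that it also builds against glibc older than 2.27, which has
no copy_file_range() wrapper.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: errors after which a plain read/write copy is
 * still worth trying.  EXDEV and EOPNOTSUPP are the errors discussed
 * in this thread; ENOSYS is an assumption for kernels without the
 * syscall. */
static bool should_fall_back(int err)
{
    return err == EXDEV || err == EOPNOTSUPP || err == ENOSYS;
}

/* Copy from the current file offsets until EOF using read(2)/write(2). */
static int copy_by_read_write(int fd_in, int fd_out)
{
    char buf[65536];

    for (;;) {
        ssize_t n = read(fd_in, buf, sizeof(buf));
        if (n == 0)
            return 0;               /* EOF: done */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(fd_out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}

/* Copy until EOF: try the in-kernel copy first and switch to the
 * read/write loop if the kernel or filesystem refuses.  Passing NULL
 * offsets makes the kernel advance the file offsets, so the fallback
 * resumes where the in-kernel copy stopped. */
static int copy_fd(int fd_in, int fd_out)
{
    for (;;) {
        long n = syscall(SYS_copy_file_range, fd_in, NULL, fd_out, NULL,
                         (size_t)1 << 30, 0U);
        if (n == 0)
            return 0;               /* EOF: done */
        if (n < 0)
            return should_fall_back(errno)
                   ? copy_by_read_write(fd_in, fd_out) : -1;
    }
}

/* Hypothetical convenience wrapper around copy_fd() for two paths. */
static int copy_path(const char *src, const char *dst)
{
    int fd_in = open(src, O_RDONLY);
    if (fd_in < 0)
        return -1;
    int fd_out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd_out < 0) {
        close(fd_in);
        return -1;
    }
    int rc = copy_fd(fd_in, fd_out);
    close(fd_in);
    if (close(fd_out) < 0)
        rc = -1;
    return rc;
}
```

A call such as copy_path("a", "b") returns 0 on success and -1 with
errno set otherwise.  Any other error (ENOSPC, EIO, ...) is reported
rather than retried, since a second attempt would likely fail the same
way.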

>
> >> +The filesystem does not support this operation.
> >> +.TP
> >>   .B EOVERFLOW
> >>   The requested source or destination range is too large to represent in the
> >>   specified data types.
> >> @@ -187,7 +190,7 @@ refers to an active swap file.
> >>   .B EXDEV
> >>   The files referred to by
> >>   .IR fd_in " and " fd_out
> >> -are not on the same mounted filesystem (pre Linux 5.3).
> >> +are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).
>
> I'm not sure that 'mounted' adds any value here.  Would you remove the
> word here?

See rename(2). 'mounted' in this context is explained there.
HOWEVER, it does not fit here.
copy_file_range() IS allowed between two mounts of the same filesystem instance.

To make things more complicated, it appears that cross mount clone is not
allowed via FICLONE/FICLONERANGE ioctl, so ioctl_ficlonerange(2) man page
also uses the 'mounted filesystem' terminology for EXDEV

As things stand now, because of the fallback to clone logic,
copy_file_range() provides a way for users to clone across different mounts
of the same filesystem instance, which they cannot do with the FICLONE ioctl.

Fun :)

BTW, I don't know if preventing cross mount clone was done intentionally,
but as I wrote in a comment in the code once:

        /*
         * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
         * the same mount. Practically, they only need to be on the same file
         * system.
         */
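
To make the "same filesystem instance" versus "same mount" distinction
concrete, here is a rough user-space sketch.  It compares st_dev from
stat(2), which is normally identical for two mounts (bind mounts, for
instance) of one filesystem instance.  The helper name same_filesystem
is hypothetical, and the check is only a heuristic: some filesystems,
btrfs subvolumes for example, report distinct st_dev values within a
single instance.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if both paths report the same st_dev,
 * 0 if they differ, and -1 if either stat(2) call fails.  This is a
 * heuristic for "same filesystem instance", not a guarantee about what
 * copy_file_range() will accept. */
static int same_filesystem(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (stat(path_a, &a) < 0 || stat(path_b, &b) < 0)
        return -1;
    return a.st_dev == b.st_dev;
}
```

With descriptors already open, fstat(2) fills in the same field.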

>
> It reads as if two separate devices with the same filesystem type would
> still give this error.
>
> Per the LWN.net article Amir shared, this is permitted ("When called
> from user space, copy_file_range() will only try to copy a file across
> filesystems if the two are of the same type").
>
> This behavior was slightly different before 5.3 AFAICR (was it?) ("until
> then, copy_file_range() refused to copy between files that were not
> located on the same filesystem.").  If that's the case, I'd specify the
> difference, or more probably split the error into two, one before 5.3,
> and one since 5.12.
>

True.

> >
> > I think you need to drop the (Linux range) altogether.
>
> I'll keep the range.  Users of 5.3..5.11 might be surprised if the
> filesystems are different and they don't get an error, I think.
>
> I reworded it to follow other pages conventions:
>
> .BR EXDEV " (before Linux 5.3; or since Linux 5.12)"
>
> which renders as:
>
>         EXDEV (before Linux 5.3; or since Linux 5.12)
>                The files referred to by fd_in and fd_out are not on
>                the same mounted filesystem.
>

drop 'mounted'

>
> > What's missing here is the NFS cross server copy use case.
> > Maybe:
> >
> > ...are not on the same mounted filesystem and the source and target filesystems
> > do not support cross-filesystem copy.
>
> Yes.
>
> Again, this wasn't true before 5.3, right?
>

Right.
Actually, v5.3 provides the vfs capabilities for filesystems to support
cross fs copy. I am not sure if NFS already implements cross fs copy in
v5.3 and not sure about cifs. Need to get input from nfs/cifs developers
or dig in the release notes for server-side copy.

> >
> > You may refer the reader to VERSIONS section where it will say which
> > filesystems support cross-fs copy as of kernel version XXX (i.e. cifs and nfs).
> >
> >>   .SH VERSIONS
> >>   The
> >>   .BR copy_file_range ()
> >> @@ -202,6 +205,11 @@ Applications should target the behaviour and requirements of 5.3 kernels.
> >>   .PP
> >>   First support for cross-filesystem copies was introduced in Linux 5.3.
> >>   Older kernels will return -EXDEV when cross-filesystem copies are attempted.
> >> +.PP
> >> +After Linux 5.12, support for copies between different filesystems was dropped.
> >> +However, individual filesystems may still provide
> >> +.BR copy_file_range ()
> >> +implementations that allow copies across different devices.
> >
> > > Again, this is not likely to stay up to date for very long.
> > The stable kernels are expected to apply your patch (because it fixes
> > a regression)
> > so this should be phrased differently.
> > If it were me, I would provide all the details of the situation to
> > Michael and ask him
> > to write the best description for this section.
>
> I'll look into more detail at this part in a later review.
>
>
> On 2/26/21 11:34 AM, Amir Goldstein wrote:
>  > Is this detailed enough? ;-)
>  >
>  > https://lwn.net/Articles/846403/
>
> Yes, it is!
>

Thanks to LWN :)

Thanks,
Amir.
Alejandro Colomar Feb. 27, 2021, 12:20 p.m. UTC | #10
Hi Amir,

On 2/27/21 6:41 AM, Amir Goldstein wrote:
> On Sat, Feb 27, 2021 at 12:19 AM Alejandro Colomar (man-pages)
>> On 2/24/21 5:10 PM, Amir Goldstein wrote:
>>> On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques <lhenriques@suse.de> wrote:
>>>>    .TP
>>>> +.B EOPNOTSUPP
>>
>> I'll add the kernel version here:
>>
>> .BR EOPNOTSUPP " (since Linux 5.12)"
> 
> Error could be returned prior to 5.3 and would be probably returned
> by future stable kernels 5.3..5.12 too

OK, I think I'll state <5.3 and >=5.12 for the moment, and if Greg adds 
that to stable 5.3..5.11 kernels, please update me.

>>>>    .B EXDEV
>>>>    The files referred to by
>>>>    .IR fd_in " and " fd_out
>>>> -are not on the same mounted filesystem (pre Linux 5.3).
>>>> +are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).
>>
>> I'm not sure that 'mounted' adds any value here.  Would you remove the
>> word here?
> 
> See rename(2). 'mounted' in this context is explained there.
> HOWEVER, it does not fit here.
> copy_file_range() IS allowed between two mounts of the same filesystem instance.

Also allowed for <5.3 ?

> 
> To make things more complicated, it appears that cross mount clone is not
> allowed via FICLONE/FICLONERANGE ioctl, so ioctl_ficlonerange(2) man page
> also uses the 'mounted filesystem' terminology for EXDEV
> 
> As things stand now, because of the fallback to clone logic,
> copy_file_range() provides a way for users to clone across different mounts
> of the same filesystem instance, which they cannot do with the FICLONE ioctl.
> 
> Fun :)
> 
> BTW, I don't know if preventing cross mount clone was done intentionally,
> but as I wrote in a comment in the code once:
> 
>          /*
>           * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
>           * the same mount. Practically, they only need to be on the same file
>           * system.
>           */

:)

> 
>>
>> It reads as if two separate devices with the same filesystem type would
>> still give this error.
>>
>> Per the LWN.net article Amir shared, this is permitted ("When called
>> from user space, copy_file_range() will only try to copy a file across
>> filesystems if the two are of the same type").
>>
>> This behavior was slightly different before 5.3 AFAICR (was it?) ("until
>> then, copy_file_range() refused to copy between files that were not
>> located on the same filesystem.").  If that's the case, I'd specify the
>> difference, or more probably split the error into two, one before 5.3,
>> and one since 5.12.
>>
> 
> True.
> 
>>>
>>> I think you need to drop the (Linux range) altogether.
>>
>> I'll keep the range.  Users of 5.3..5.11 might be surprised if the
>> filesystems are different and they don't get an error, I think.
>>
>> I reworded it to follow other pages conventions:
>>
>> .BR EXDEV " (before Linux 5.3; or since Linux 5.12)"
>>
>> which renders as:
>>
>>          EXDEV (before Linux 5.3; or since Linux 5.12)
>>                 The files referred to by fd_in and fd_out are not on
>>                 the same mounted filesystem.
>>
> 
> drop 'mounted'

Yes

> 
>>
>>> What's missing here is the NFS cross server copy use case.
>>> Maybe:
>>>
>>> ...are not on the same mounted filesystem and the source and target filesystems
>>> do not support cross-filesystem copy.
>>
>> Yes.
>>
>> Again, this wasn't true before 5.3, right?
>>
> 
> Right.
> Actually, v5.3 provides the vfs capabilities for filesystems to support
> cross fs copy. I am not sure if NFS already implements cross fs copy in
> v5.3 and not sure about cifs. Need to get input from nfs/cifs developers
> or dig in the release notes for server-side copy.

Okay
> Thanks to LWN :)

:)

Thanks,

Alex
Steve French Feb. 27, 2021, 11:08 p.m. UTC | #11
On Fri, Feb 26, 2021 at 11:43 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sat, Feb 27, 2021 at 12:19 AM Alejandro Colomar (man-pages)
> <alx.manpages@gmail.com> wrote:
> >
> > Hello Amir, Luis,
> >
> > On 2/24/21 5:10 PM, Amir Goldstein wrote:
> > > On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques <lhenriques@suse.de> wrote:
> > >>
> > >> Update man-page with recent changes to this syscall.
> > >>
> > >> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > >> ---
> > >> Hi!
> > >>
> > >> Here's a suggestion for fixing the manpage for copy_file_range().  Note that
> > >> I've assumed the fix will hit 5.12.
> > >>
> > >>   man2/copy_file_range.2 | 10 +++++++++-
> > >>   1 file changed, 9 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> > >> index 611a39b8026b..b0fd85e2631e 100644
> > >> --- a/man2/copy_file_range.2
> > >> +++ b/man2/copy_file_range.2
> > >> @@ -169,6 +169,9 @@ Out of memory.
> > >>   .B ENOSPC
> > >>   There is not enough space on the target filesystem to complete the copy.
> > >>   .TP
> > >> +.B EOPNOTSUPP
> >
> > I'll add the kernel version here:
> >
> > .BR EOPNOTSUPP " (since Linux 5.12)"
>
> Error could be returned prior to 5.3 and would be probably returned
> by future stable kernels 5.3..5.12 too
>
> >
> > >> +The filesystem does not support this operation.
> > >> +.TP
> > >>   .B EOVERFLOW
> > >>   The requested source or destination range is too large to represent in the
> > >>   specified data types.
> > >> @@ -187,7 +190,7 @@ refers to an active swap file.
> > >>   .B EXDEV
> > >>   The files referred to by
> > >>   .IR fd_in " and " fd_out
> > >> -are not on the same mounted filesystem (pre Linux 5.3).
> > >> +are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).
> >
> > I'm not sure that 'mounted' adds any value here.  Would you remove the
> > word here?
>
> See rename(2). 'mounted' in this context is explained there.
> HOWEVER, it does not fit here.
> copy_file_range() IS allowed between two mounts of the same filesystem instance.
>
> To make things more complicated, it appears that cross mount clone is not
> allowed via FICLONE/FICLONERANGE ioctl, so ioctl_ficlonerange(2) man page
> also uses the 'mounted filesystem' terminology for EXDEV
>
> As things stand now, because of the fallback to clone logic,
> copy_file_range() provides a way for users to clone across different mounts
> of the same filesystem instance, which they cannot do with the FICLONE ioctl.
>
> Fun :)
>
> BTW, I don't know if preventing cross mount clone was done intentionally,
> but as I wrote in a comment in the code once:
>
>         /*
>          * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
>          * the same mount. Practically, they only need to be on the same file
>          * system.
>          */
>
> >
> > It reads as if two separate devices with the same filesystem type would
> > still give this error.
> >
> > Per the LWN.net article Amir shared, this is permitted ("When called
> > from user space, copy_file_range() will only try to copy a file across
> > filesystems if the two are of the same type").
> >
> > This behavior was slightly different before 5.3 AFAICR (was it?) ("until
> > then, copy_file_range() refused to copy between files that were not
> > located on the same filesystem.").  If that's the case, I'd specify the
> > difference, or more probably split the error into two, one before 5.3,
> > and one since 5.12.
> >
>
> True.
>
> > >
> > > I think you need to drop the (Linux range) altogether.
> >
> > I'll keep the range.  Users of 5.3..5.11 might be surprised if the
> > filesystems are different and they don't get an error, I think.
> >
> > I reworded it to follow other pages conventions:
> >
> > .BR EXDEV " (before Linux 5.3; or since Linux 5.12)"
> >
> > which renders as:
> >
> >         EXDEV (before Linux 5.3; or since Linux 5.12)
> >                The files referred to by fd_in and fd_out are not on
> >                the same mounted filesystem.
> >
>
> drop 'mounted'
>
> >
> > > What's missing here is the NFS cross server copy use case.
> > > Maybe:

At least for the SMB3 kernel server (ksmbd, "cifsd"), it looks like they
use splice.  And the user space CIFS/SMB3 servers (like Samba) have a
configurable plug-in library interface ("Samba VFS modules") that would
allow you to implement cross-filesystem copy optimally for your version
of Linux and plug this into Samba with little work on your part.

> >
> > Again, this wasn't true before 5.3, right?
> >
>
> Right.
> Actually, v5.3 provides the vfs capabilities for filesystems to support
> cross fs copy. I am not sure if NFS already implements cross fs copy in
> v5.3 and not sure about cifs. Need to get input from nfs/cifs developers
> or dig in the release notes for server-side copy.

The SMB3 protocol has multiple ways to do "server side copy" (copy
offload to the server), some of which would apply to your example.
"Reflink" would in many cases be the most efficient, and is supported
by the Linux client (see MS-SMB2 protocol specification section
3.3.5.15.18), but it is supported by fewer server file systems, so it is
probably more important to focus on the other mechanisms, which are
server side copy rather than clone.  The most popular way, supported by
most servers, is "CopyChunk": 100s of millions of systems support this
(if not more); see MS-SMB2 protocol specification sections 2.2.31.1 and
3.3.5.15.16.  There are various cases where two different SMB3 mounts
on the same client could handle cross mount server side copy.

There are other mechanisms supported by fewer servers, such as SMB3
ODX/T10 style copy offload (Windows and some others; see e.g. Gordon at
Nexenta's presentation
https://www.slideshare.net/gordonross/smb3-offload-data-transfer-odx),
which is still popular for virtualization workloads.  For this it could
be even more common for those to be different mounts on the client.  The
Linux client does not support the SMB3 ODX/T10 offload yet, but it would
be good to add support for it.  There is a nice description of its
additional benefits at
https://docs.microsoft.com/en-us/windows-hardware/drivers/storage/offloaded-data-transfer

But yes, SMB3 on Linux can have cross mount file copy today, which is
far more efficient (having the server do the copy for us) than sending
large reads/writes back and forth over the network from the client.  In
the future I am hoping that use case becomes even more common over SMB3
as cloud servers improve.


> > > You may refer the reader to VERSIONS section where it will say which
> > > filesystems support cross-fs copy as of kernel version XXX (i.e. cifs and nfs).
> > >
> > >>   .SH VERSIONS
> > >>   The
> > >>   .BR copy_file_range ()
> > >> @@ -202,6 +205,11 @@ Applications should target the behaviour and requirements of 5.3 kernels.
> > >>   .PP
> > >>   First support for cross-filesystem copies was introduced in Linux 5.3.
> > >>   Older kernels will return -EXDEV when cross-filesystem copies are attempted.
> > >> +.PP
> > >> +After Linux 5.12, support for copies between different filesystems was dropped.
> > >> +However, individual filesystems may still provide
> > >> +.BR copy_file_range ()
> > >> +implementations that allow copies across different devices.

Yes - this could be very important, especially for cifs (smb3) going forward.
Amir Goldstein Feb. 28, 2021, 7:35 a.m. UTC | #12
On Sun, Feb 28, 2021 at 1:08 AM Steve French <smfrench@gmail.com> wrote:
>
> On Fri, Feb 26, 2021 at 11:43 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Sat, Feb 27, 2021 at 12:19 AM Alejandro Colomar (man-pages)
> > <alx.manpages@gmail.com> wrote:
> > >
> > > Hello Amir, Luis,
> > >
> > > On 2/24/21 5:10 PM, Amir Goldstein wrote:
> > > > On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques <lhenriques@suse.de> wrote:
> > > >>
> > > >> Update man-page with recent changes to this syscall.
> > > >>
> > > >> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > > >> ---
> > > >> Hi!
> > > >>
> > > >> Here's a suggestion for fixing the manpage for copy_file_range().  Note that
> > > >> I've assumed the fix will hit 5.12.
> > > >>
> > > >>   man2/copy_file_range.2 | 10 +++++++++-
> > > >>   1 file changed, 9 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> > > >> index 611a39b8026b..b0fd85e2631e 100644
> > > >> --- a/man2/copy_file_range.2
> > > >> +++ b/man2/copy_file_range.2
> > > >> @@ -169,6 +169,9 @@ Out of memory.
> > > >>   .B ENOSPC
> > > >>   There is not enough space on the target filesystem to complete the copy.
> > > >>   .TP
> > > >> +.B EOPNOTSUPP
> > >
> > > I'll add the kernel version here:
> > >
> > > .BR EOPNOTSUPP " (since Linux 5.12)"
> >
> > Error could be returned prior to 5.3 and would be probably returned
> > by future stable kernels 5.3..5.12 too
> >
> > >
> > > > >> +The filesystem does not support this operation.
> > > > >> +.TP
> > > >>   .B EOVERFLOW
> > > >>   The requested source or destination range is too large to represent in the
> > > >>   specified data types.
> > > >> @@ -187,7 +190,7 @@ refers to an active swap file.
> > > >>   .B EXDEV
> > > >>   The files referred to by
> > > >>   .IR fd_in " and " fd_out
> > > >> -are not on the same mounted filesystem (pre Linux 5.3).
> > > >> +are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).
> > >
> > > I'm not sure that 'mounted' adds any value here.  Would you remove the
> > > word here?
> >
> > See rename(2). 'mounted' in this context is explained there.
> > HOWEVER, it does not fit here.
> > copy_file_range() IS allowed between two mounts of the same filesystem instance.
> >
> > To make things more complicated, it appears that cross mount clone is not
> > allowed via FICLONE/FICLONERANGE ioctl, so ioctl_ficlonerange(2) man page
> > also uses the 'mounted filesystem' terminology for EXDEV
> >
> > As things stand now, because of the fallback to clone logic,
> > copy_file_range() provides a way for users to clone across different mounts
> > of the same filesystem instance, which they cannot do with the FICLONE ioctl.
> >
> > Fun :)
> >
> > BTW, I don't know if preventing cross mount clone was done intentionally,
> > but as I wrote in a comment in the code once:
> >
> >         /*
> >          * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> >          * the same mount. Practically, they only need to be on the same file
> >          * system.
> >          */
> >
> > >
> > > It reads as if two separate devices with the same filesystem type would
> > > still give this error.
> > >
> > > Per the LWN.net article Amir shared, this is permitted ("When called
> > > from user space, copy_file_range() will only try to copy a file across
> > > filesystems if the two are of the same type").
> > >
> > > This behavior was slightly different before 5.3 AFAICR (was it?) ("until
> > > then, copy_file_range() refused to copy between files that were not
> > > located on the same filesystem.").  If that's the case, I'd specify the
> > > difference, or more probably split the error into two, one before 5.3,
> > > and one since 5.12.
> > >
> >
> > True.
> >
> > > >
> > > > I think you need to drop the (Linux range) altogether.
> > >
> > > I'll keep the range.  Users of 5.3..5.11 might be surprised if the
> > > filesystems are different and they don't get an error, I think.
> > >
> > > I reworded it to follow other pages conventions:
> > >
> > > .BR EXDEV " (before Linux 5.3; or since Linux 5.12)"
> > >
> > > which renders as:
> > >
> > >         EXDEV (before Linux 5.3; or since Linux 5.12)
> > >                The files referred to by fd_in and fd_out are not on
> > >                the same mounted filesystem.
> > >
> >
> > drop 'mounted'
> >
> > >
> > > > What's missing here is the NFS cross server copy use case.
> > > > Maybe:
>
> At least for the SMB3 kernel server (ksmbd "cifsd") looks like they use splice.
> And for the user space CIFS/SMB3 server (like Samba) they have a configurable
> plug in library interface ("Samba VFS modules") that would allow you
> to implement
> cross filesystem copy optimally for your version of Linux and plug
> this into Samba
> with little work on your part.
>
> > >
> > > Again, this wasn't true before 5.3, right?
> > >
> >
> > Right.
> > Actually, v5.3 provides the vfs capabilities for filesystems to support
> > cross fs copy. I am not sure if NFS already implements cross fs copy in
> > > v5.3 and not sure about cifs. Need to get input from nfs/cifs developers
> > or dig in the release notes for server-side copy.
>
> The SMB3 protocol has multiple ways to do "server side copy" (copy
> offload to the server), some of which would apply to your example.
> The case of "reflink" in many cases would be most efficient, and is supported
> by the Linux client (see MS-SMB2 protocol specification section 3.3.5.15.18) but
> is supported by fewer server file systems, so probably more important
> to focus on
> the other mechanisms which are server side copy rather than clone.  The most
> popular way, supported by most servers, is  "CopyChunk" - 100s of
> millions of systems
> support this (if not more) - see MS-SMB2 protocol specification
> section 2.2.31.1 and
> 3.3.5.15.16 - there are various cases where two different SMB3 mounts
> on the same
> client could handle cross mount server side copy.
>
> There are other mechanisms supported by fewer servers SMB3 ODX/T10 style copy
> offload (Windows and some others see e.g. Gordon at Nexenta's presentation
> https://www.slideshare.net/gordonross/smb3-offload-data-transfer-odx)
> but still popular for virtualization workloads.  For this it could be
> even more common
> for those to be different mounts on the client.  The Linux client does
> not support
> the SMB3 ODX/T10 offload yet but it would be good to add support for it.
> There is a nice description of its additional benefits at
> https://docs.microsoft.com/en-us/windows-hardware/drivers/storage/offloaded-data-transfer
>
> But - yes SMB3 on Linux can have cross mount file copy today, which is
> far more efficient

Can have? Or does have?
IIUC, server-side copy has existed for "same cifs fs" for a long time, and
since v5.3 it is available for "same cifs connection", which is not exactly
the same as "same cifs fs" but also not really different for most people.
Can you elaborate on that?
Just assume the server can do anything. What can the Linux client do
in v5.3 and later?

> (having the server do the copy for us) rather than sending large
> reads/writes back and
> forth over the network from the client.  In the future I am hoping that use case
> becomes even more common over SMB3 as cloud servers improve.
>
>
> > > > You may refer the reader to VERSIONS section where it will say which
> > > > filesystems support cross-fs copy as of kernel version XXX (i.e. cifs and nfs).
> > > >
> > > >>   .SH VERSIONS
> > > >>   The
> > > >>   .BR copy_file_range ()
> > > >> @@ -202,6 +205,11 @@ Applications should target the behaviour and requirements of 5.3 kernels.
> > > >>   .PP
> > > >>   First support for cross-filesystem copies was introduced in Linux 5.3.
> > > >>   Older kernels will return -EXDEV when cross-filesystem copies are attempted.
> > > >> +.PP
> > > >> +After Linux 5.12, support for copies between different filesystems was dropped.
> > > >> +However, individual filesystems may still provide
> > > >> +.BR copy_file_range ()
> > > >> +implementations that allow copies across different devices.
>
> Yes - this could be very important, especially for cifs (smb3) going forward.
>
>
>
> --
> Thanks,
>
> Steve
Steve French Feb. 28, 2021, 10:25 p.m. UTC | #13
On Sun, Feb 28, 2021 at 1:36 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sun, Feb 28, 2021 at 1:08 AM Steve French <smfrench@gmail.com> wrote:
> >
> > On Fri, Feb 26, 2021 at 11:43 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Sat, Feb 27, 2021 at 12:19 AM Alejandro Colomar (man-pages)
> > > <alx.manpages@gmail.com> wrote:
> > > >
> > > > Hello Amir, Luis,
> > > >
> > > > On 2/24/21 5:10 PM, Amir Goldstein wrote:
> > > > > On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques <lhenriques@suse.de> wrote:
> > > > >>
> > > > >> Update man-page with recent changes to this syscall.
> > > > >>
> > > > >> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > > > >> ---
> > > > >> Hi!
> > > > >>
> > > > >> Here's a suggestion for fixing the manpage for copy_file_range().  Note that
> > > > >> I've assumed the fix will hit 5.12.
> > > > >>
> > > > >>   man2/copy_file_range.2 | 10 +++++++++-
> > > > >>   1 file changed, 9 insertions(+), 1 deletion(-)
> > > > >>
> > > > >> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> > > > >> index 611a39b8026b..b0fd85e2631e 100644
> > > > >> --- a/man2/copy_file_range.2
> > > > >> +++ b/man2/copy_file_range.2
> > > > >> @@ -169,6 +169,9 @@ Out of memory.
> > > > >>   .B ENOSPC
> > > > >>   There is not enough space on the target filesystem to complete the copy.
> > > > >>   .TP
> > > > >> +.B EOPNOTSUPP
> > > >
> > > > I'll add the kernel version here:
> > > >
> > > > .BR EOPNOTSUPP " (since Linux 5.12)"
> > >
> > > Error could be returned prior to 5.3 and would be probably returned
> > > by future stable kernels 5.3..5.12 too
> > >
> > > >
> > > > > >> +The filesystem does not support this operation.
> > > > > >> +.TP
> > > > >>   .B EOVERFLOW
> > > > >>   The requested source or destination range is too large to represent in the
> > > > >>   specified data types.
> > > > >> @@ -187,7 +190,7 @@ refers to an active swap file.
> > > > >>   .B EXDEV
> > > > >>   The files referred to by
> > > > >>   .IR fd_in " and " fd_out
> > > > >> -are not on the same mounted filesystem (pre Linux 5.3).
> > > > >> +are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).
> > > >
> > > > I'm not sure that 'mounted' adds any value here.  Would you remove the
> > > > word here?
> > >
> > > See rename(2). 'mounted' in this context is explained there.
> > > HOWEVER, it does not fit here.
> > > copy_file_range() IS allowed between two mounts of the same filesystem instance.
> > >
> > > To make things more complicated, it appears that cross mount clone is not
> > > allowed via FICLONE/FICLONERANGE ioctl, so ioctl_ficlonerange(2) man page
> > > also uses the 'mounted filesystem' terminology for EXDEV
> > >
> > > As things stand now, because of the fallback to clone logic,
> > > copy_file_range() provides a way for users to clone across different mounts
> > > of the same filesystem instance, which they cannot do with the FICLONE ioctl.
> > >
> > > Fun :)
> > >
> > > BTW, I don't know if preventing cross mount clone was done intentionally,
> > > but as I wrote in a comment in the code once:
> > >
> > >         /*
> > >          * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > >          * the same mount. Practically, they only need to be on the same file
> > >          * system.
> > >          */
> > >
> > > >
> > > > It reads as if two separate devices with the same filesystem type would
> > > > still give this error.
> > > >
> > > > Per the LWN.net article Amir shared, this is permitted ("When called
> > > > from user space, copy_file_range() will only try to copy a file across
> > > > filesystems if the two are of the same type").
> > > >
> > > > This behavior was slightly different before 5.3 AFAICR (was it?) ("until
> > > > then, copy_file_range() refused to copy between files that were not
> > > > located on the same filesystem.").  If that's the case, I'd specify the
> > > > difference, or more probably split the error into two, one before 5.3,
> > > > and one since 5.12.
> > > >
> > >
> > > True.
> > >
> > > > >
> > > > > I think you need to drop the (Linux range) altogether.
> > > >
> > > > I'll keep the range.  Users of 5.3..5.11 might be surprised if the
> > > > filesystems are different and they don't get an error, I think.
> > > >
> > > > I reworded it to follow other pages conventions:
> > > >
> > > > .BR EXDEV " (before Linux 5.3; or since Linux 5.12)"
> > > >
> > > > which renders as:
> > > >
> > > >         EXDEV (before Linux 5.3; or since Linux 5.12)
> > > >                The files referred to by fd_in and fd_out are not on
> > > >                the same mounted filesystem.
> > > >
> > >
> > > drop 'mounted'
> > >
> > > >
> > > > > What's missing here is the NFS cross server copy use case.
> > > > > Maybe:
> >
> > At least for the SMB3 kernel server (ksmbd "cifsd") looks like they use splice.
> > And for the user space CIFS/SMB3 server (like Samba) they have a configurable
> > plug in library interface ("Samba VFS modules") that would allow you
> > to implement
> > cross filesystem copy optimally for your version of Linux and plug
> > this into Samba
> > with little work on your part.
> >
> > > >
> > > > Again, this wasn't true before 5.3, right?
> > > >
> > >
> > > Right.
> > > Actually, v5.3 provides the vfs capabilities for filesystems to support
> > > cross fs copy. I am not sure if NFS already implements cross fs copy in
> > > > v5.3 and not sure about cifs. Need to get input from nfs/cifs developers
> > > or dig in the release notes for server-side copy.
> >
> > The SMB3 protocol has multiple ways to do "server side copy" (copy
> > offload to the server), some of which would apply to your example.
> > The case of "reflink" in many cases would be most efficient, and is supported
> > by the Linux client (see MS-SMB2 protocol specification section 3.3.5.15.18) but
> > is supported by fewer server file systems, so probably more important
> > to focus on
> > the other mechanisms which are server side copy rather than clone.  The most
> > popular way, supported by most servers, is  "CopyChunk" - 100s of
> > millions of systems
> > support this (if not more) - see MS-SMB2 protocol specification
> > section 2.2.31.1 and
> > 3.3.5.15.16 - there are various cases where two different SMB3 mounts
> > on the same
> > client could handle cross mount server side copy.
> >
> > There are other mechanisms supported by fewer servers SMB3 ODX/T10 style copy
> > offload (Windows and some others see e.g. Gordon at Nexenta's presentation
> > https://www.slideshare.net/gordonross/smb3-offload-data-transfer-odx)
> > but still popular for virtualization workloads.  For this it could be
> > even more common
> > for those to be different mounts on the client.  The Linux client does
> > not support
> > the SMB3 ODX/T10 offload yet but it would be good to add support for it.
> > There is a nice description of its additional benefits at
> > https://docs.microsoft.com/en-us/windows-hardware/drivers/storage/offloaded-data-transfer
> >
> > But - yes SMB3 on Linux can have cross mount file copy today, which is
> > far more efficient
>
> Can have? or does have?
> IIUC, server-side copy ability exists for "same cifs fs" for a long time and
> since v5.3, it is available for "same cifs connection", which is not exactly
> the same as "same cifs fs" but also not really different for most people.
> Can you elaborate about  that?
> Just assume the server can do anything. What can the Linux client do
> since v5.3 or later?

Inside the SMB3 client (cifs.ko) we check that the file handles provided
are for the same authenticated user on the same server, so e.g. you could
mount //server/share on /mnt1 and //server/anothershare on /mnt2 and do a
copy_file_range() from /mnt1/file1 to /mnt2/file2 even though these are
different mounts.  The cifs client should allow additional cases of cross-mount
copy, but at least this helps for various common scenarios and is very widely
supported on most servers as well.
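
To make the /mnt1 and /mnt2 scenario concrete, here is a minimal sketch of a
user-space probe that attempts such a copy with copy_file_range() only and
reports the errno it gets back.  It is not taken from cifs.ko or from any
message in this thread: the function name try_offloaded_copy() is hypothetical,
and the paths are the illustrative ones from the example above.  Whether the
call succeeds across two mounts depends on the kernel version and on the
filesystems involved.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe: copy src to dst using copy_file_range() only.
 * Returns 0 on success, otherwise the errno value of the failing call
 * (for example EXDEV if the kernel refuses the cross-mount copy).
 */
int try_offloaded_copy(const char *src, const char *dst)
{
    struct stat st;
    int err = 0;

    assert(src != NULL && dst != NULL);

    int fd_in = open(src, O_RDONLY);
    if (fd_in < 0)
        return errno;

    int fd_out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd_out < 0) {
        err = errno;
        close(fd_in);
        return err;
    }

    if (fstat(fd_in, &st) < 0) {
        err = errno;
    } else {
        off_t left = st.st_size;

        /* The call may copy fewer bytes than requested, so loop. */
        while (left > 0) {
            ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
                                        (size_t)left, 0);
            if (n < 0) {
                err = errno;
                break;
            }
            if (n == 0)
                break;          /* source shrank while copying */
            left -= n;
        }
    }

    close(fd_in);
    if (close(fd_out) < 0 && err == 0)
        err = errno;
    return err;
}
```

Calling try_offloaded_copy("/mnt1/file1", "/mnt2/file2") and checking the
result against EXDEV or EOPNOTSUPP shows whether a given kernel and filesystem
pair accepts the cross-mount copy; the sketch deliberately has no fallback so
that the refusal stays visible.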
Amir Goldstein March 1, 2021, 6:18 a.m. UTC | #14
On Mon, Mar 1, 2021 at 12:25 AM Steve French <smfrench@gmail.com> wrote:
>
> On Sun, Feb 28, 2021 at 1:36 AM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Sun, Feb 28, 2021 at 1:08 AM Steve French <smfrench@gmail.com> wrote:
> > >
> > > On Fri, Feb 26, 2021 at 11:43 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Sat, Feb 27, 2021 at 12:19 AM Alejandro Colomar (man-pages)
> > > > <alx.manpages@gmail.com> wrote:
> > > > >
> > > > > Hello Amir, Luis,
> > > > >
> > > > > On 2/24/21 5:10 PM, Amir Goldstein wrote:
> > > > > > On Wed, Feb 24, 2021 at 4:22 PM Luis Henriques <lhenriques@suse.de> wrote:
> > > > > >>
> > > > > >> Update man-page with recent changes to this syscall.
> > > > > >>
> > > > > >> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > > > > >> ---
> > > > > >> Hi!
> > > > > >>
> > > > > >> Here's a suggestion for fixing the manpage for copy_file_range().  Note that
> > > > > >> I've assumed the fix will hit 5.12.
> > > > > >>
> > > > > >>   man2/copy_file_range.2 | 10 +++++++++-
> > > > > >>   1 file changed, 9 insertions(+), 1 deletion(-)
> > > > > >>
> > > > > >> diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
> > > > > >> index 611a39b8026b..b0fd85e2631e 100644
> > > > > >> --- a/man2/copy_file_range.2
> > > > > >> +++ b/man2/copy_file_range.2
> > > > > >> @@ -169,6 +169,9 @@ Out of memory.
> > > > > >>   .B ENOSPC
> > > > > >>   There is not enough space on the target filesystem to complete the copy.
> > > > > >>   .TP
> > > > > >> +.B EOPNOTSUPP
> > > > >
> > > > > I'll add the kernel version here:
> > > > >
> > > > > .BR EOPNOTSUPP " (since Linux 5.12)"
> > > >
> > > > Error could be returned prior to 5.3 and would be probably returned
> > > > by future stable kernels 5.3..5.12 too
> > > >
> > > > >
> > > > > > >> +The filesystem does not support this operation.
> > > > > > >> +.TP
> > > > > >>   .B EOVERFLOW
> > > > > >>   The requested source or destination range is too large to represent in the
> > > > > >>   specified data types.
> > > > > >> @@ -187,7 +190,7 @@ refers to an active swap file.
> > > > > >>   .B EXDEV
> > > > > >>   The files referred to by
> > > > > >>   .IR fd_in " and " fd_out
> > > > > >> -are not on the same mounted filesystem (pre Linux 5.3).
> > > > > >> +are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).
> > > > >
> > > > > I'm not sure that 'mounted' adds any value here.  Would you remove the
> > > > > word here?
> > > >
> > > > See rename(2). 'mounted' in this context is explained there.
> > > > HOWEVER, it does not fit here.
> > > > copy_file_range() IS allowed between two mounts of the same filesystem instance.
> > > >
> > > > To make things more complicated, it appears that cross mount clone is not
> > > > allowed via FICLONE/FICLONERANGE ioctl, so ioctl_ficlonerange(2) man page
> > > > also uses the 'mounted filesystem' terminology for EXDEV
> > > >
> > > > As things stand now, because of the fallback to clone logic,
> > > > copy_file_range() provides a way for users to clone across different mounts
> > > > of the same filesystem instance, which they cannot do with the FICLONE ioctl.
> > > >
> > > > Fun :)
> > > >
> > > > BTW, I don't know if preventing cross mount clone was done intentionally,
> > > > but as I wrote in a comment in the code once:
> > > >
> > > >         /*
> > > >          * FICLONE/FICLONERANGE ioctls enforce that src and dest files are on
> > > >          * the same mount. Practically, they only need to be on the same file
> > > >          * system.
> > > >          */
> > > >
> > > > >
> > > > > It reads as if two separate devices with the same filesystem type would
> > > > > still give this error.
> > > > >
> > > > > Per the LWN.net article Amir shared, this is permitted ("When called
> > > > > from user space, copy_file_range() will only try to copy a file across
> > > > > filesystems if the two are of the same type").
> > > > >
> > > > > This behavior was slightly different before 5.3 AFAICR (was it?) ("until
> > > > > then, copy_file_range() refused to copy between files that were not
> > > > > located on the same filesystem.").  If that's the case, I'd specify the
> > > > > difference, or more probably split the error into two, one before 5.3,
> > > > > and one since 5.12.
> > > > >
> > > >
> > > > True.
> > > >
> > > > > >
> > > > > > I think you need to drop the (Linux range) altogether.
> > > > >
> > > > > I'll keep the range.  Users of 5.3..5.11 might be surprised if the
> > > > > filesystems are different and they don't get an error, I think.
> > > > >
> > > > > I reworded it to follow other pages conventions:
> > > > >
> > > > > .BR EXDEV " (before Linux 5.3; or since Linux 5.12)"
> > > > >
> > > > > which renders as:
> > > > >
> > > > >         EXDEV (before Linux 5.3; or since Linux 5.12)
> > > > >                The files referred to by fd_in and fd_out are not on
> > > > >                the same mounted filesystem.
> > > > >
> > > >
> > > > drop 'mounted'
> > > >
> > > > >
> > > > > > What's missing here is the NFS cross server copy use case.
> > > > > > Maybe:
> > >
> > > At least for the SMB3 kernel server (ksmbd "cifsd") looks like they use splice.
> > > And for the user space CIFS/SMB3 server (like Samba) they have a configurable
> > > plug in library interface ("Samba VFS modules") that would allow you
> > > to implement
> > > cross filesystem copy optimally for your version of Linux and plug
> > > this into Samba
> > > with little work on your part.
> > >
> > > > >
> > > > > Again, this wasn't true before 5.3, right?
> > > > >
> > > >
> > > > Right.
> > > > Actually, v5.3 provides the vfs capabilities for filesystems to support
> > > > cross fs copy. I am not sure if NFS already implements cross fs copy in
> > > > > v5.3 and not sure about cifs. Need to get input from nfs/cifs developers
> > > > or dig in the release notes for server-side copy.
> > >
> > > The SMB3 protocol has multiple ways to do "server side copy" (copy
> > > offload to the server), some of which would apply to your example.
> > > The case of "reflink" in many cases would be most efficient, and is supported
> > > by the Linux client (see MS-SMB2 protocol specification section 3.3.5.15.18) but
> > > is supported by fewer server file systems, so probably more important
> > > to focus on
> > > the other mechanisms which are server side copy rather than clone.  The most
> > > popular way, supported by most servers, is  "CopyChunk" - 100s of
> > > millions of systems
> > > support this (if not more) - see MS-SMB2 protocol specification
> > > section 2.2.31.1 and
> > > 3.3.5.15.16 - there are various cases where two different SMB3 mounts
> > > on the same
> > > client could handle cross mount server side copy.
> > >
> > > There are other mechanisms supported by fewer servers SMB3 ODX/T10 style copy
> > > offload (Windows and some others see e.g. Gordon at Nexenta's presentation
> > > https://www.slideshare.net/gordonross/smb3-offload-data-transfer-odx)
> > > but still popular for virtualization workloads.  For this it could be
> > > even more common
> > > for those to be different mounts on the client.  The Linux client does
> > > not support
> > > the SMB3 ODX/T10 offload yet but it would be good to add support for it.
> > > There is a nice description of its additional benefits at
> > > https://docs.microsoft.com/en-us/windows-hardware/drivers/storage/offloaded-data-transfer
> > >
> > > But - yes SMB3 on Linux can have cross mount file copy today, which is
> > > far more efficient
> >
> > Can have? or does have?
> > IIUC, server-side copy ability exists for "same cifs fs" for a long time and
> > since v5.3, it is available for "same cifs connection", which is not exactly
> > the same as "same cifs fs" but also not really different for most people.
> > Can you elaborate about  that?
> > Just assume the server can do anything. What can the Linux client do
> > since v5.3 or later?
>
> Inside the SMB3 client (cifs.ko) we check that the file handles provided
> are for the same authenticated user to the same server, so
> e.g. you could mount //server/share on /mnt1 and //server/anothershare on /mnt2
> and do a copy_file_range from /mnt1/file1 to /mnt2/file2 even though these are
> different mounts.   The cifs client should allow additional cases of cross mount
> copy, but at least this helps for various common scenarios and is very widely
> supported on most servers as well.
>

Got it. Thanks for clarifying.

So it appears that both cifs and nfs have supported cross-fs copy since v5.3,
and many other filesystems that support clone have supported cross-mount
(same fs) copy (implemented as clone) since v5.3 and still do to this day.

Alejandro, just to be clear, none of these changes are in v5.12 yet,
so please hold on to your patch for now.

Thanks,
Amir.

Patch

diff --git a/man2/copy_file_range.2 b/man2/copy_file_range.2
index 611a39b8026b..b0fd85e2631e 100644
--- a/man2/copy_file_range.2
+++ b/man2/copy_file_range.2
@@ -169,6 +169,9 @@  Out of memory.
 .B ENOSPC
 There is not enough space on the target filesystem to complete the copy.
 .TP
+.B EOPNOTSUPP
+The filesystem does not support this operation.
+.TP
 .B EOVERFLOW
 The requested source or destination range is too large to represent in the
 specified data types.
@@ -187,7 +190,7 @@  refers to an active swap file.
 .B EXDEV
 The files referred to by
 .IR fd_in " and " fd_out
-are not on the same mounted filesystem (pre Linux 5.3).
+are not on the same mounted filesystem (pre Linux 5.3 and post Linux 5.12).
 .SH VERSIONS
 The
 .BR copy_file_range ()
@@ -202,6 +205,11 @@  Applications should target the behaviour and requirements of 5.3 kernels.
 .PP
 First support for cross-filesystem copies was introduced in Linux 5.3.
 Older kernels will return -EXDEV when cross-filesystem copies are attempted.
+.PP
+After Linux 5.12, support for copies between different filesystems was dropped.
+However, individual filesystems may still provide
+.BR copy_file_range ()
+implementations that allow copies across different devices.
 .SH CONFORMING TO
 The
 .BR copy_file_range ()
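
For applications, the EXDEV and EOPNOTSUPP entries in the patch above mean
that a failed copy_file_range() call is often a cue to copy the data by hand.
The following is a minimal sketch of that pattern, not part of the patch or of
the man page: the function names are hypothetical, and treating ENOSYS and
EINVAL as fallback cases as well is an assumption made here for robustness,
not something the thread prescribes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK ((size_t)1 << 30)   /* upper bound per copy_file_range() call */

/* Plain read()/write() loop, used when the offloaded copy is refused. */
static int copy_by_read_write(int fd_in, int fd_out)
{
    char buf[65536];
    ssize_t n;

    while ((n = read(fd_in, buf, sizeof(buf))) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(fd_out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
    return 0;
}

/* Copy from the current offset of fd_in to fd_out until end of file. */
int copy_fd_with_fallback(int fd_in, int fd_out)
{
    assert(fd_in >= 0 && fd_out >= 0);

    for (;;) {
        ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL, CHUNK, 0);
        if (n > 0)
            continue;           /* more data may remain */
        if (n == 0)
            return 0;           /* end of file reached */

        switch (errno) {
        case EXDEV:             /* files on different filesystems */
        case EOPNOTSUPP:        /* filesystem does not support the call */
        case ENOSYS:            /* assumption: syscall not available */
        case EINVAL:            /* assumption: e.g. not a regular file */
            /* File offsets already reflect any partial progress. */
            return copy_by_read_write(fd_in, fd_out);
        default:
            return -1;
        }
    }
}

/* Path-based wrapper; returns 0 on success, -1 with errno set on error. */
int copy_file_with_fallback(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    int fd_in = open(src, O_RDONLY);
    if (fd_in < 0)
        return -1;

    int fd_out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd_out < 0) {
        int saved = errno;
        close(fd_in);
        errno = saved;
        return -1;
    }

    int rc = copy_fd_with_fallback(fd_in, fd_out);
    int saved = errno;

    close(fd_in);
    if (close(fd_out) < 0 && rc == 0) {
        rc = -1;
        saved = errno;
    }
    errno = saved;
    return rc;
}
```

Because the offsets are passed as NULL, the kernel advances the file offsets
itself, so the read()/write() loop simply continues from wherever the last
successful copy_file_range() call stopped.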