[v1,2/3] VFS permit cross device vfs_copy_file_range
diff mbox

Message ID 20170302160211.30451-3-kolga@netapp.com
State New
Headers show

Commit Message

Olga Kornievskaia March 2, 2017, 4:02 p.m. UTC
Allow nfs_copy_file_range to copy across devices.
NFSv4.2 inter server to server copy always copies across devices, and
NFSv4.2 intra server to server copy can copy across devices on the same
server.

If a file system's fileoperations copy_file_range operation prohibits
cross-device copies, fall back to do_splice_direct. This is needed for
nfsd_copy_file_range() which is called by the inter server to server
destination server acting as an NFS client, and reading the file from
the source server.

Signed-off-by: Andy Adamson <andros@netapp.com>
---
 fs/read_write.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

Comments

Christoph Hellwig March 2, 2017, 4:07 p.m. UTC | #1
On Thu, Mar 02, 2017 at 11:02:10AM -0500, Olga Kornievskaia wrote:
> Allow nfs_copy_file_range to copy across devices.
> NFSv4.2 inter server to server copy always copies across devices, and
> NFSv4.2 intra server to server copy can copy across devices on the same
> server.
> 
> If a file system's fileoperations copy_file_range operation prohibits
> cross-device copies, fall back to do_splice_direct. This is needed for
> nfsd_copy_file_range() which is called by the inter server to server
> destination server acting as an NFS client, and reading the file from
> the source server.

NAK, we really should not do operations between different superblocks.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olga Kornievskaia March 2, 2017, 4:38 p.m. UTC | #2
> On Mar 2, 2017, at 11:07 AM, Christoph Hellwig <hch@lst.de> wrote:
> 
> On Thu, Mar 02, 2017 at 11:02:10AM -0500, Olga Kornievskaia wrote:
>> Allow nfs_copy_file_range to copy across devices.
>> NFSv4.2 inter server to server copy always copies across devices, and
>> NFSv4.2 intra server to server copy can copy across devices on the same
>> server.
>> 
>> If a file system's fileoperations copy_file_range operation prohibits
>> cross-device copies, fall back to do_splice_direct. This is needed for
>> nfsd_copy_file_range() which is called by the inter server to server
>> destination server acting as an NFS client, and reading the file from
>> the source server.
> 
> NAK, we really should not do operations between different superblocks.

Can you provide some reasoning as to why? What would it break? The reasoning for including one is to allow for a file system to achieve better performance which seems like a feature that would be of great benefit. --
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olga Kornievskaia March 7, 2017, 8:35 p.m. UTC | #3
> On Mar 2, 2017, at 11:38 AM, Olga Kornievskaia <kolga@netapp.com> wrote:
> 
> 
>> On Mar 2, 2017, at 11:07 AM, Christoph Hellwig <hch@lst.de> wrote:
>> 
>> On Thu, Mar 02, 2017 at 11:02:10AM -0500, Olga Kornievskaia wrote:
>>> Allow nfs_copy_file_range to copy across devices.
>>> NFSv4.2 inter server to server copy always copies across devices, and
>>> NFSv4.2 intra server to server copy can copy across devices on the same
>>> server.
>>> 
>>> If a file system's fileoperations copy_file_range operation prohibits
>>> cross-device copies, fall back to do_splice_direct. This is needed for
>>> nfsd_copy_file_range() which is called by the inter server to server
>>> destination server acting as an NFS client, and reading the file from
>>> the source server.
>> 
>> NAK, we really should not do operations between different superblocks.
> 
> Can you provide some reasoning as to why? What would it break? The reasoning for including one is to allow for a file system to achieve better performance which seems like a feature that would be of great benefit.

Christoph, could you please elaborate on your objection.

Al, could you weigh in with regards to if and what would it take to allow for copy_file_range() to allow for copies between different superblocks.

We would appreciate the feedback of how can we enable this performance feature to be useful to users.

Thank you.



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields March 15, 2017, 6:09 p.m. UTC | #4
On Thu, Mar 02, 2017 at 11:38:03AM -0500, Olga Kornievskaia wrote:
> > On Mar 2, 2017, at 11:07 AM, Christoph Hellwig <hch@lst.de> wrote:
> > NAK, we really should not do operations between different
> > superblocks.
> 
> Can you provide some reasoning as to why? What would it break? The
> reasoning for including one is to allow for a file system to achieve
> better performance which seems like a feature that would be of great
> benefit. --

Yes, this has come up a few times.  What's going on?:

	- There was an explanation, and I missed it.
	- The explanation is complicated and Christoph hasn't had time
	  to write it up.
	- Christoph has a strong suspicion there are issues without
	  being sure exactly where they are.
	- Something else?

I mean, the whole enterprise is dead in the water without this, as far
as I can tell, so I'd really like to understand the answer one way or
the other.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields March 21, 2017, 3:50 p.m. UTC | #5
On Wed, Mar 15, 2017 at 02:09:13PM -0400, J. Bruce Fields wrote:
> On Thu, Mar 02, 2017 at 11:38:03AM -0500, Olga Kornievskaia wrote:
> > > On Mar 2, 2017, at 11:07 AM, Christoph Hellwig <hch@lst.de> wrote:
> > > NAK, we really should not do operations between different
> > > superblocks.
> > 
> > Can you provide some reasoning as to why? What would it break? The
> > reasoning for including one is to allow for a file system to achieve
> > better performance which seems like a feature that would be of great
> > benefit. --
> 
> Yes, this has come up a few times.  What's going on?:
> 
> 	- There was an explanation, and I missed it.
> 	- The explanation is complicated and Christoph hasn't had time
> 	  to write it up.
> 	- Christoph has a strong suspicion there are issues without
> 	  being sure exactly where they are.
> 	- Something else?

As far as I can tell from talking at LSF:

	- file system operations crossing superblocks is unusual.  No
	  specific known issue.  Also not sure I understand why splice
	  isn't precedent.
	- implementation (with server acting as a client, long-running
	  process, etc.) will be complicated and ugly.  Sure, I guess
	  we'll see how complicated and weigh that against any
	  advantages.  (Though I'm curious about case e.g. of btrfs
	  clone--can't it easily clone across filesystem boundaries in
	  some cases when they share storage?)

Anyway.  I'll read the patches.  Probably not this week.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bruce Fields March 21, 2017, 7:03 p.m. UTC | #6
On Tue, Mar 21, 2017 at 12:03:08PM -0400, Olga Kornievskaia wrote:
> Thank you for the update. I guess I don’t see how the proposed NFS
> implementation is complicated and ugly (but I’m biased). I’ll try to
> give you some performance number. My 1 data point (1gb) inter copy
> showed 30% improvement (how can that be ignored).

That would be useful, thanks--if it comes with some details about the
setup.

I'm not so curious about percent improvement, as how to predict the
performance on a given network.

If server-to-server copy looks like it's normally able to use close to
the available bandwidth between the two servers, and if a traditional
read-write-copy loop is similarly able to use the available bandwidth,
then I can figure out whether server-to-server copy will help on my
setup.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olga Kornievskaia March 22, 2017, 5:16 p.m. UTC | #7
On Tue, Mar 21, 2017 at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Tue, Mar 21, 2017 at 12:03:08PM -0400, Olga Kornievskaia wrote:
>> Thank you for the update. I guess I don’t see how the proposed NFS
>> implementation is complicated and ugly (but I’m biased). I’ll try to
>> give you some performance number. My 1 data point (1gb) inter copy
>> showed 30% improvement (how can that be ignored).
>
> That would be useful, thanks--if it comes with some details about the
> setup.

What I have available to me are two laptops that I run my VMs on. It
is not a setup that is representative of a real setup. I think this
setup can only provide percent improvement numbers.

Andy was suggesting that perhaps the performance lab at Redhat would
be able to do some testing of the patches for some real world
performance?

> I'm not so curious about percent improvement, as how to predict the
> performance on a given network.
>
> If server-to-server copy looks like it's normally able to use close to
> the available bandwidth between the two servers, and if a traditional
> read-write-copy loop is similarly able to use the available bandwidth,
> then I can figure out whether server-to-server copy will help on my
> setup.
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Weimer Aug. 31, 2018, 4:13 p.m. UTC | #8
On 03/02/2017 05:02 PM, Olga Kornievskaia wrote:
> Allow nfs_copy_file_range to copy across devices.
> NFSv4.2 inter server to server copy always copies across devices, and
> NFSv4.2 intra server to server copy can copy across devices on the same
> server.
> 
> If a file system's fileoperations copy_file_range operation prohibits
> cross-device copies, fall back to do_splice_direct. This is needed for
> nfsd_copy_file_range() which is called by the inter server to server
> destination server acting as an NFS client, and reading the file from
> the source server.
> 
> Signed-off-by: Andy Adamson<andros@netapp.com>

What has happened to the patch?  We unwittingly used copy_file_range in 
the glibc test suite, without realizing that it does not support 
cross-device copies.

Thanks,
Florian
Olga Kornievskaia Aug. 31, 2018, 4:25 p.m. UTC | #9
On Fri, Aug 31, 2018 at 12:14 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> On 03/02/2017 05:02 PM, Olga Kornievskaia wrote:
> > Allow nfs_copy_file_range to copy across devices.
> > NFSv4.2 inter server to server copy always copies across devices, and
> > NFSv4.2 intra server to server copy can copy across devices on the same
> > server.
> >
> > If a file system's fileoperations copy_file_range operation prohibits
> > cross-device copies, fall back to do_splice_direct. This is needed for
> > nfsd_copy_file_range() which is called by the inter server to server
> > destination server acting as an NFS client, and reading the file from
> > the source server.
> >
> > Signed-off-by: Andy Adamson<andros@netapp.com>
>
> What has happened to the patch?  We unwittingly used copy_file_range in
> the glibc test suite, without realizing that it does not support
> cross-device copies.

I'm still planning to fight for the patch to be included. As per
maintainers request, I separated out the async copy patches out and
hopefully that will be going into 4.20. I'm working on the "inter"
copy offload functionality which does require this patch. I will start
submitting and pushing this patch along with the rest of the patches.

Are you interested in the copy_file_range() functionality that does
support cross-device copies? If so can you tell me how are you using
it? It would be also helpful to watch for the submission of the patch
and argue in its favor.

Thank you.
>
> Thanks,
> Florian
Steve French Aug. 31, 2018, 10:56 p.m. UTC | #10
On Fri, Aug 31, 2018 at 11:41 AM Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Fri, Aug 31, 2018 at 12:14 PM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > On 03/02/2017 05:02 PM, Olga Kornievskaia wrote:
> > > Allow nfs_copy_file_range to copy across devices.
> > > NFSv4.2 inter server to server copy always copies across devices, and
> > > NFSv4.2 intra server to server copy can copy across devices on the same
> > > server.
> > >
> > > If a file system's fileoperations copy_file_range operation prohibits
> > > cross-device copies, fall back to do_splice_direct. This is needed for
> > > nfsd_copy_file_range() which is called by the inter server to server
> > > destination server acting as an NFS client, and reading the file from
> > > the source server.
> > >
> > > Signed-off-by: Andy Adamson<andros@netapp.com>
> >
> > What has happened to the patch?  We unwittingly used copy_file_range in
> > the glibc test suite, without realizing that it does not support
> > cross-device copies.
>
> I'm still planning to fight for the patch to be included. As per
> maintainers request, I separated out the async copy patches out and
> hopefully that will be going into 4.20. I'm working on the "inter"
> copy offload functionality which does require this patch. I will start
> submitting and pushing this patch along with the rest of the patches.
>
> Are you interested in the copy_file_range() functionality that does
> support cross-device copies? If so can you tell me how are you using
> it? It would be also helpful to watch for the submission of the patch
> and argue in its favor.

SMB3 clients and server (Windows, Macs, Samba and probably most
every modern NAS out there) support the SMB3 "CopyChunk"
operation used by the Linux client.

I would expect that one of the most useful cases is cross device
For example you have two mounts to your server or server cluster
\\server\share1 is mounted to /mnt1
and
\\server\backup is mounted to /mnt2

and you want to do copy_file or copy_file_range of various
files from /mnt1 to /mnt2

As long as they are both the same file system type, why not
let the file system tell you whether it can do the copy.

Given that this is 10x to 100x faster than the alternative
and this is a common case (and there are many 100s of millions
of SMB3 server capable systems out there which already support
copychunk (and thus would support copy file) - why would we
want to restrict it.

It is much saner to let the file system tell the VFS if it can't
support the cross device copy.

---- and to make it even more obvious why this patch matters ---
Virtualization clients (and various Windows and NetApp server)
support another copy offload mechanism (not just CopyChunk)
ie via ODX which would support cross server not just cross share
copy.    Enabling this would be a BIG incentive for finishing up
ODX copy support in Linux (cifs.ko using SMB3).   This is
fairly widely supported by servers.

Patch
diff mbox

diff --git a/fs/read_write.c b/fs/read_write.c
index 1d9e305..75084cd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1534,10 +1534,6 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	    (file_out->f_flags & O_APPEND))
 		return -EBADF;
 
-	/* this could be relaxed once a method supports cross-fs copies */
-	if (inode_in->i_sb != inode_out->i_sb)
-		return -EXDEV;
-
 	if (len == 0)
 		return 0;
 
@@ -1559,7 +1555,7 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (file_out->f_op->copy_file_range) {
 		ret = file_out->f_op->copy_file_range(file_in, pos_in, file_out,
 						      pos_out, len, flags);
-		if (ret != -EOPNOTSUPP)
+		if (ret != -EOPNOTSUPP && ret != -EXDEV)
 			goto done;
 	}