diff mbox series

[v8] vfs: fix copy_file_range regression in cross-fs copies

Message ID 20210222102456.6692-1-lhenriques@suse.de (mailing list archive)
State New
Headers show
Series [v8] vfs: fix copy_file_range regression in cross-fs copies | expand

Commit Message

Luis Henriques Feb. 22, 2021, 10:24 a.m. UTC
A regression has been reported by Nicolas Boichat, found while using the
copy_file_range syscall to copy a tracefs file.  Before commit
5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
kernel would return -EXDEV to userspace when trying to copy a file across
different filesystems.  After this commit, the syscall doesn't fail anymore
and instead returns zero (zero bytes copied), as this file's content is
generated on-the-fly and thus reports a size of zero.

This patch restores some cross-filesystem copy restrictions that existed
prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
devices").  Filesystems are still allowed to fall-back to the VFS
generic_copy_file_range() implementation, but that has now to be done
explicitly.

nfsd is also modified to fall-back into generic_copy_file_range() in case
vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.

Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
Reported-by: Nicolas Boichat <drinkcat@chromium.org>
Signed-off-by: Luis Henriques <lhenriques@suse.de>
---
Changes since v7
- set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so that the
  error returned is always related to the 'copy' operation
Changes since v6
- restored i_sb checks for the clone operation
Changes since v5
- check if ->copy_file_range is NULL before calling it
Changes since v4
- nfsd falls-back to generic_copy_file_range() only *if* it gets -EOPNOTSUPP
  or -EXDEV.
Changes since v3
- dropped the COPY_FILE_SPLICE flag
- kept the f_op's checks early in generic_copy_file_checks, implementing
  Amir's suggestions
- modified nfsd to use generic_copy_file_range()
Changes since v2
- do all the required checks earlier, in generic_copy_file_checks(),
  adding new checks for ->remap_file_range
- new COPY_FILE_SPLICE flag
- don't remove filesystem's fallback to generic_copy_file_range()
- updated commit changelog (and subject)
Changes since v1 (after Amir review)
- restored do_copy_file_range() helper
- return -EOPNOTSUPP if fs doesn't implement CFR
- updated commit description

 fs/nfsd/vfs.c   |  8 +++++++-
 fs/read_write.c | 49 ++++++++++++++++++++++++-------------------------
 2 files changed, 31 insertions(+), 26 deletions(-)

Comments

Amir Goldstein Feb. 22, 2021, 10:46 a.m. UTC | #1
On Mon, Feb 22, 2021 at 12:23 PM Luis Henriques <lhenriques@suse.de> wrote:
>
> A regression has been reported by Nicolas Boichat, found while using the
> copy_file_range syscall to copy a tracefs file.  Before commit
> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> kernel would return -EXDEV to userspace when trying to copy a file across
> different filesystems.  After this commit, the syscall doesn't fail anymore
> and instead returns zero (zero bytes copied), as this file's content is
> generated on-the-fly and thus reports a size of zero.
>
> This patch restores some cross-filesystem copy restrictions that existed
> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> devices").  Filesystems are still allowed to fall-back to the VFS
> generic_copy_file_range() implementation, but that has now to be done
> explicitly.
>
> nfsd is also modified to fall-back into generic_copy_file_range() in case
> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>
> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
> Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
> Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> ---

Reviewed-by: Amir Goldstein <amir73il@gmail.com>

Thanks,
Amir.
Dai Ngo Feb. 22, 2021, 4:25 p.m. UTC | #2
On 2/22/21 2:24 AM, Luis Henriques wrote:
> A regression has been reported by Nicolas Boichat, found while using the
> copy_file_range syscall to copy a tracefs file.  Before commit
> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> kernel would return -EXDEV to userspace when trying to copy a file across
> different filesystems.  After this commit, the syscall doesn't fail anymore
> and instead returns zero (zero bytes copied), as this file's content is
> generated on-the-fly and thus reports a size of zero.
>
> This patch restores some cross-filesystem copy restrictions that existed
> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> devices").  Filesystems are still allowed to fall-back to the VFS
> generic_copy_file_range() implementation, but that has now to be done
> explicitly.
>
> nfsd is also modified to fall-back into generic_copy_file_range() in case
> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>
> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
> Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
> Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> ---
> Changes since v7
> - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so that the
>    error returned is always related to the 'copy' operation
> Changes since v6
> - restored i_sb checks for the clone operation
> Changes since v5
> - check if ->copy_file_range is NULL before calling it
> Changes since v4
> - nfsd falls-back to generic_copy_file_range() only *if* it gets -EOPNOTSUPP
>    or -EXDEV.
> Changes since v3
> - dropped the COPY_FILE_SPLICE flag
> - kept the f_op's checks early in generic_copy_file_checks, implementing
>    Amir's suggestions
> - modified nfsd to use generic_copy_file_range()
> Changes since v2
> - do all the required checks earlier, in generic_copy_file_checks(),
>    adding new checks for ->remap_file_range
> - new COPY_FILE_SPLICE flag
> - don't remove filesystem's fallback to generic_copy_file_range()
> - updated commit changelog (and subject)
> Changes since v1 (after Amir review)
> - restored do_copy_file_range() helper
> - return -EOPNOTSUPP if fs doesn't implement CFR
> - updated commit description
>
>   fs/nfsd/vfs.c   |  8 +++++++-
>   fs/read_write.c | 49 ++++++++++++++++++++++++-------------------------
>   2 files changed, 31 insertions(+), 26 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 04937e51de56..23dab0fa9087 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
>   ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
>   			     u64 dst_pos, u64 count)
>   {
> +	ssize_t ret;
>   
>   	/*
>   	 * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
>   	 * limit like this and pipeline multiple COPY requests.
>   	 */
>   	count = min_t(u64, count, 1 << 22);
> -	return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> +	ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> +
> +	if (ret == -EOPNOTSUPP || ret == -EXDEV)
> +		ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> +					      count, 0);
> +	return ret;
>   }
>   
>   __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 75f764b43418..5a26297fd410 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
>   }
>   EXPORT_SYMBOL(generic_copy_file_range);
>   
> -static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in,
> -				  struct file *file_out, loff_t pos_out,
> -				  size_t len, unsigned int flags)
> -{
> -	/*
> -	 * Although we now allow filesystems to handle cross sb copy, passing
> -	 * a file of the wrong filesystem type to filesystem driver can result
> -	 * in an attempt to dereference the wrong type of ->private_data, so
> -	 * avoid doing that until we really have a good reason.  NFS defines
> -	 * several different file_system_type structures, but they all end up
> -	 * using the same ->copy_file_range() function pointer.
> -	 */
> -	if (file_out->f_op->copy_file_range &&
> -	    file_out->f_op->copy_file_range == file_in->f_op->copy_file_range)
> -		return file_out->f_op->copy_file_range(file_in, pos_in,
> -						       file_out, pos_out,
> -						       len, flags);
> -
> -	return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> -				       flags);
> -}
> -
>   /*
>    * Performs necessary checks before doing a file copy
>    *
> @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
>   	loff_t size_in;
>   	int ret;
>   
> +	/*
> +	 * Although we now allow filesystems to handle cross sb copy, passing
> +	 * a file of the wrong filesystem type to filesystem driver can result
> +	 * in an attempt to dereference the wrong type of ->private_data, so
> +	 * avoid doing that until we really have a good reason.  NFS defines
> +	 * several different file_system_type structures, but they all end up
> +	 * using the same ->copy_file_range() function pointer.
> +	 */
> +	if (file_out->f_op->copy_file_range) {
> +		if (file_in->f_op->copy_file_range !=
> +		    file_out->f_op->copy_file_range)
> +			return -EXDEV;
> +	} else if (file_in->f_op->remap_file_range) {
> +		if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> +			return -EXDEV;

I think this check is redundant, it's done in vfs_copy_file_range.
If this check is removed then the else clause below should be removed
also. Once this check and the else clause are removed then might as
well move the the check of copy_file_range from here to vfs_copy_file_range.

-Dai

> +	} else {
> +                return -EOPNOTSUPP;
> +	}
> +
>   	ret = generic_file_rw_checks(file_in, file_out);
>   	if (ret)
>   		return ret;
> @@ -1495,6 +1492,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>   
>   	file_start_write(file_out);
>   
> +	ret = -EOPNOTSUPP;
>   	/*
>   	 * Try cloning first, this is supported by more file systems, and
>   	 * more efficient if both clone and copy are supported (e.g. NFS).
> @@ -1513,9 +1511,10 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>   		}
>   	}
>   
> -	ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> -				flags);
> -	WARN_ON_ONCE(ret == -EOPNOTSUPP);
> +	if (file_out->f_op->copy_file_range)
> +		ret = file_out->f_op->copy_file_range(file_in, pos_in,
> +						      file_out, pos_out,
> +						      len, flags);
>   done:
>   	if (ret > 0) {
>   		fsnotify_access(file_in);
Luis Henriques Feb. 23, 2021, 10:32 a.m. UTC | #3
On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
> 
> On 2/22/21 2:24 AM, Luis Henriques wrote:
> > A regression has been reported by Nicolas Boichat, found while using the
> > copy_file_range syscall to copy a tracefs file.  Before commit
> > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> > kernel would return -EXDEV to userspace when trying to copy a file across
> > different filesystems.  After this commit, the syscall doesn't fail anymore
> > and instead returns zero (zero bytes copied), as this file's content is
> > generated on-the-fly and thus reports a size of zero.
> > 
> > This patch restores some cross-filesystem copy restrictions that existed
> > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> > devices").  Filesystems are still allowed to fall-back to the VFS
> > generic_copy_file_range() implementation, but that has now to be done
> > explicitly.
> > 
> > nfsd is also modified to fall-back into generic_copy_file_range() in case
> > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> > 
> > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> > Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
> > Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
> > Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
> > Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> > Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > ---
> > Changes since v7
> > - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so that the
> >    error returned is always related to the 'copy' operation
> > Changes since v6
> > - restored i_sb checks for the clone operation
> > Changes since v5
> > - check if ->copy_file_range is NULL before calling it
> > Changes since v4
> > - nfsd falls-back to generic_copy_file_range() only *if* it gets -EOPNOTSUPP
> >    or -EXDEV.
> > Changes since v3
> > - dropped the COPY_FILE_SPLICE flag
> > - kept the f_op's checks early in generic_copy_file_checks, implementing
> >    Amir's suggestions
> > - modified nfsd to use generic_copy_file_range()
> > Changes since v2
> > - do all the required checks earlier, in generic_copy_file_checks(),
> >    adding new checks for ->remap_file_range
> > - new COPY_FILE_SPLICE flag
> > - don't remove filesystem's fallback to generic_copy_file_range()
> > - updated commit changelog (and subject)
> > Changes since v1 (after Amir review)
> > - restored do_copy_file_range() helper
> > - return -EOPNOTSUPP if fs doesn't implement CFR
> > - updated commit description
> > 
> >   fs/nfsd/vfs.c   |  8 +++++++-
> >   fs/read_write.c | 49 ++++++++++++++++++++++++-------------------------
> >   2 files changed, 31 insertions(+), 26 deletions(-)
> > 
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 04937e51de56..23dab0fa9087 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
> >   ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
> >   			     u64 dst_pos, u64 count)
> >   {
> > +	ssize_t ret;
> >   	/*
> >   	 * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> > @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
> >   	 * limit like this and pipeline multiple COPY requests.
> >   	 */
> >   	count = min_t(u64, count, 1 << 22);
> > -	return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> > +	ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> > +
> > +	if (ret == -EOPNOTSUPP || ret == -EXDEV)
> > +		ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> > +					      count, 0);
> > +	return ret;
> >   }
> >   __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 75f764b43418..5a26297fd410 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
> >   }
> >   EXPORT_SYMBOL(generic_copy_file_range);
> > -static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in,
> > -				  struct file *file_out, loff_t pos_out,
> > -				  size_t len, unsigned int flags)
> > -{
> > -	/*
> > -	 * Although we now allow filesystems to handle cross sb copy, passing
> > -	 * a file of the wrong filesystem type to filesystem driver can result
> > -	 * in an attempt to dereference the wrong type of ->private_data, so
> > -	 * avoid doing that until we really have a good reason.  NFS defines
> > -	 * several different file_system_type structures, but they all end up
> > -	 * using the same ->copy_file_range() function pointer.
> > -	 */
> > -	if (file_out->f_op->copy_file_range &&
> > -	    file_out->f_op->copy_file_range == file_in->f_op->copy_file_range)
> > -		return file_out->f_op->copy_file_range(file_in, pos_in,
> > -						       file_out, pos_out,
> > -						       len, flags);
> > -
> > -	return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> > -				       flags);
> > -}
> > -
> >   /*
> >    * Performs necessary checks before doing a file copy
> >    *
> > @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
> >   	loff_t size_in;
> >   	int ret;
> > +	/*
> > +	 * Although we now allow filesystems to handle cross sb copy, passing
> > +	 * a file of the wrong filesystem type to filesystem driver can result
> > +	 * in an attempt to dereference the wrong type of ->private_data, so
> > +	 * avoid doing that until we really have a good reason.  NFS defines
> > +	 * several different file_system_type structures, but they all end up
> > +	 * using the same ->copy_file_range() function pointer.
> > +	 */
> > +	if (file_out->f_op->copy_file_range) {
> > +		if (file_in->f_op->copy_file_range !=
> > +		    file_out->f_op->copy_file_range)
> > +			return -EXDEV;
> > +	} else if (file_in->f_op->remap_file_range) {
> > +		if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> > +			return -EXDEV;
> 
> I think this check is redundant, it's done in vfs_copy_file_range.
> If this check is removed then the else clause below should be removed
> also. Once this check and the else clause are removed then might as
> well move the the check of copy_file_range from here to vfs_copy_file_range.
> 

I don't think it's really redundant, although I agree is messy due to the
fact we try to clone first instead of copying them.

So, in the clone path, this is the only place where we return -EXDEV if:

1) we don't have ->copy_file_range *and*
2) we have ->remap_file_range but the i_sb are different.

The check in vfs_copy_file_range() is only executed if:

1) we have *valid* ->copy_file_range ops and/or
2) we have *valid* ->remap_file_range

So... if we remove the check in generic_copy_file_checks() as you suggest
and:
- we don't have ->copy_file_range,
- we have ->remap_file_range but
- the i_sb are different

we'll return the -EOPNOTSUPP (the one set in line "ret = -EOPNOTSUPP;" in
function vfs_copy_file_range() ) instead of -EXDEV.

But I may have got it all wrong.  I've looked so many times at this code
that I'm probably useless at finding problems in it :-)

Cheers,
--
Luís

> -Dai
> 
> > +	} else {
> > +                return -EOPNOTSUPP;
> > +	}
> > +
> >   	ret = generic_file_rw_checks(file_in, file_out);
> >   	if (ret)
> >   		return ret;
> > @@ -1495,6 +1492,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >   	file_start_write(file_out);
> > +	ret = -EOPNOTSUPP;
> >   	/*
> >   	 * Try cloning first, this is supported by more file systems, and
> >   	 * more efficient if both clone and copy are supported (e.g. NFS).
> > @@ -1513,9 +1511,10 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >   		}
> >   	}
> > -	ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> > -				flags);
> > -	WARN_ON_ONCE(ret == -EOPNOTSUPP);
> > +	if (file_out->f_op->copy_file_range)
> > +		ret = file_out->f_op->copy_file_range(file_in, pos_in,
> > +						      file_out, pos_out,
> > +						      len, flags);
> >   done:
> >   	if (ret > 0) {
> >   		fsnotify_access(file_in);
Amir Goldstein Feb. 23, 2021, 3:28 p.m. UTC | #4
On Tue, Feb 23, 2021 at 12:31 PM Luis Henriques <lhenriques@suse.de> wrote:
>
> On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
> >
> > On 2/22/21 2:24 AM, Luis Henriques wrote:
> > > A regression has been reported by Nicolas Boichat, found while using the
> > > copy_file_range syscall to copy a tracefs file.  Before commit
> > > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> > > kernel would return -EXDEV to userspace when trying to copy a file across
> > > different filesystems.  After this commit, the syscall doesn't fail anymore
> > > and instead returns zero (zero bytes copied), as this file's content is
> > > generated on-the-fly and thus reports a size of zero.
> > >
> > > This patch restores some cross-filesystem copy restrictions that existed
> > > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> > > devices").  Filesystems are still allowed to fall-back to the VFS
> > > generic_copy_file_range() implementation, but that has now to be done
> > > explicitly.
> > >
> > > nfsd is also modified to fall-back into generic_copy_file_range() in case
> > > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> > >
> > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> > > Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
> > > Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
> > > Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
> > > Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> > > Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > > ---
> > > Changes since v7
> > > - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so that the
> > >    error returned is always related to the 'copy' operation
> > > Changes since v6
> > > - restored i_sb checks for the clone operation
> > > Changes since v5
> > > - check if ->copy_file_range is NULL before calling it
> > > Changes since v4
> > > - nfsd falls-back to generic_copy_file_range() only *if* it gets -EOPNOTSUPP
> > >    or -EXDEV.
> > > Changes since v3
> > > - dropped the COPY_FILE_SPLICE flag
> > > - kept the f_op's checks early in generic_copy_file_checks, implementing
> > >    Amir's suggestions
> > > - modified nfsd to use generic_copy_file_range()
> > > Changes since v2
> > > - do all the required checks earlier, in generic_copy_file_checks(),
> > >    adding new checks for ->remap_file_range
> > > - new COPY_FILE_SPLICE flag
> > > - don't remove filesystem's fallback to generic_copy_file_range()
> > > - updated commit changelog (and subject)
> > > Changes since v1 (after Amir review)
> > > - restored do_copy_file_range() helper
> > > - return -EOPNOTSUPP if fs doesn't implement CFR
> > > - updated commit description
> > >
> > >   fs/nfsd/vfs.c   |  8 +++++++-
> > >   fs/read_write.c | 49 ++++++++++++++++++++++++-------------------------
> > >   2 files changed, 31 insertions(+), 26 deletions(-)
> > >
> > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > index 04937e51de56..23dab0fa9087 100644
> > > --- a/fs/nfsd/vfs.c
> > > +++ b/fs/nfsd/vfs.c
> > > @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
> > >   ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
> > >                          u64 dst_pos, u64 count)
> > >   {
> > > +   ssize_t ret;
> > >     /*
> > >      * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> > > @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
> > >      * limit like this and pipeline multiple COPY requests.
> > >      */
> > >     count = min_t(u64, count, 1 << 22);
> > > -   return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> > > +   ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> > > +
> > > +   if (ret == -EOPNOTSUPP || ret == -EXDEV)
> > > +           ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> > > +                                         count, 0);
> > > +   return ret;
> > >   }
> > >   __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > index 75f764b43418..5a26297fd410 100644
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
> > >   }
> > >   EXPORT_SYMBOL(generic_copy_file_range);
> > > -static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in,
> > > -                             struct file *file_out, loff_t pos_out,
> > > -                             size_t len, unsigned int flags)
> > > -{
> > > -   /*
> > > -    * Although we now allow filesystems to handle cross sb copy, passing
> > > -    * a file of the wrong filesystem type to filesystem driver can result
> > > -    * in an attempt to dereference the wrong type of ->private_data, so
> > > -    * avoid doing that until we really have a good reason.  NFS defines
> > > -    * several different file_system_type structures, but they all end up
> > > -    * using the same ->copy_file_range() function pointer.
> > > -    */
> > > -   if (file_out->f_op->copy_file_range &&
> > > -       file_out->f_op->copy_file_range == file_in->f_op->copy_file_range)
> > > -           return file_out->f_op->copy_file_range(file_in, pos_in,
> > > -                                                  file_out, pos_out,
> > > -                                                  len, flags);
> > > -
> > > -   return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> > > -                                  flags);
> > > -}
> > > -
> > >   /*
> > >    * Performs necessary checks before doing a file copy
> > >    *
> > > @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
> > >     loff_t size_in;
> > >     int ret;
> > > +   /*
> > > +    * Although we now allow filesystems to handle cross sb copy, passing
> > > +    * a file of the wrong filesystem type to filesystem driver can result
> > > +    * in an attempt to dereference the wrong type of ->private_data, so
> > > +    * avoid doing that until we really have a good reason.  NFS defines
> > > +    * several different file_system_type structures, but they all end up
> > > +    * using the same ->copy_file_range() function pointer.
> > > +    */
> > > +   if (file_out->f_op->copy_file_range) {
> > > +           if (file_in->f_op->copy_file_range !=
> > > +               file_out->f_op->copy_file_range)
> > > +                   return -EXDEV;
> > > +   } else if (file_in->f_op->remap_file_range) {
> > > +           if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> > > +                   return -EXDEV;
> >
> > I think this check is redundant, it's done in vfs_copy_file_range.
> > If this check is removed then the else clause below should be removed
> > also. Once this check and the else clause are removed then might as
> > well move the the check of copy_file_range from here to vfs_copy_file_range.
> >
>
> I don't think it's really redundant, although I agree is messy due to the
> fact we try to clone first instead of copying them.
>

It was put here in early checks on purpose before the check for
zero size file.
I'm pretty sure this wasn't the case in earlier versions of the path
and then it did not solve the reported problem.

Thanks,
Amir.
Dai Ngo Feb. 23, 2021, 3:29 p.m. UTC | #5
On 2/23/21 2:32 AM, Luis Henriques wrote:
> On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
>> On 2/22/21 2:24 AM, Luis Henriques wrote:
>>> A regression has been reported by Nicolas Boichat, found while using the
>>> copy_file_range syscall to copy a tracefs file.  Before commit
>>> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
>>> kernel would return -EXDEV to userspace when trying to copy a file across
>>> different filesystems.  After this commit, the syscall doesn't fail anymore
>>> and instead returns zero (zero bytes copied), as this file's content is
>>> generated on-the-fly and thus reports a size of zero.
>>>
>>> This patch restores some cross-filesystem copy restrictions that existed
>>> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
>>> devices").  Filesystems are still allowed to fall-back to the VFS
>>> generic_copy_file_range() implementation, but that has now to be done
>>> explicitly.
>>>
>>> nfsd is also modified to fall-back into generic_copy_file_range() in case
>>> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>>>
>>> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
>>> Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
>>> Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
>>> Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
>>> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
>>> Signed-off-by: Luis Henriques <lhenriques@suse.de>
>>> ---
>>> Changes since v7
>>> - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so that the
>>>     error returned is always related to the 'copy' operation
>>> Changes since v6
>>> - restored i_sb checks for the clone operation
>>> Changes since v5
>>> - check if ->copy_file_range is NULL before calling it
>>> Changes since v4
>>> - nfsd falls-back to generic_copy_file_range() only *if* it gets -EOPNOTSUPP
>>>     or -EXDEV.
>>> Changes since v3
>>> - dropped the COPY_FILE_SPLICE flag
>>> - kept the f_op's checks early in generic_copy_file_checks, implementing
>>>     Amir's suggestions
>>> - modified nfsd to use generic_copy_file_range()
>>> Changes since v2
>>> - do all the required checks earlier, in generic_copy_file_checks(),
>>>     adding new checks for ->remap_file_range
>>> - new COPY_FILE_SPLICE flag
>>> - don't remove filesystem's fallback to generic_copy_file_range()
>>> - updated commit changelog (and subject)
>>> Changes since v1 (after Amir review)
>>> - restored do_copy_file_range() helper
>>> - return -EOPNOTSUPP if fs doesn't implement CFR
>>> - updated commit description
>>>
>>>    fs/nfsd/vfs.c   |  8 +++++++-
>>>    fs/read_write.c | 49 ++++++++++++++++++++++++-------------------------
>>>    2 files changed, 31 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 04937e51de56..23dab0fa9087 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
>>>    ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
>>>    			     u64 dst_pos, u64 count)
>>>    {
>>> +	ssize_t ret;
>>>    	/*
>>>    	 * Limit copy to 4MB to prevent indefinitely blocking an nfsd
>>> @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
>>>    	 * limit like this and pipeline multiple COPY requests.
>>>    	 */
>>>    	count = min_t(u64, count, 1 << 22);
>>> -	return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
>>> +	ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
>>> +
>>> +	if (ret == -EOPNOTSUPP || ret == -EXDEV)
>>> +		ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
>>> +					      count, 0);
>>> +	return ret;
>>>    }
>>>    __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>> diff --git a/fs/read_write.c b/fs/read_write.c
>>> index 75f764b43418..5a26297fd410 100644
>>> --- a/fs/read_write.c
>>> +++ b/fs/read_write.c
>>> @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
>>>    }
>>>    EXPORT_SYMBOL(generic_copy_file_range);
>>> -static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in,
>>> -				  struct file *file_out, loff_t pos_out,
>>> -				  size_t len, unsigned int flags)
>>> -{
>>> -	/*
>>> -	 * Although we now allow filesystems to handle cross sb copy, passing
>>> -	 * a file of the wrong filesystem type to filesystem driver can result
>>> -	 * in an attempt to dereference the wrong type of ->private_data, so
>>> -	 * avoid doing that until we really have a good reason.  NFS defines
>>> -	 * several different file_system_type structures, but they all end up
>>> -	 * using the same ->copy_file_range() function pointer.
>>> -	 */
>>> -	if (file_out->f_op->copy_file_range &&
>>> -	    file_out->f_op->copy_file_range == file_in->f_op->copy_file_range)
>>> -		return file_out->f_op->copy_file_range(file_in, pos_in,
>>> -						       file_out, pos_out,
>>> -						       len, flags);
>>> -
>>> -	return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
>>> -				       flags);
>>> -}
>>> -
>>>    /*
>>>     * Performs necessary checks before doing a file copy
>>>     *
>>> @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
>>>    	loff_t size_in;
>>>    	int ret;
>>> +	/*
>>> +	 * Although we now allow filesystems to handle cross sb copy, passing
>>> +	 * a file of the wrong filesystem type to filesystem driver can result
>>> +	 * in an attempt to dereference the wrong type of ->private_data, so
>>> +	 * avoid doing that until we really have a good reason.  NFS defines
>>> +	 * several different file_system_type structures, but they all end up
>>> +	 * using the same ->copy_file_range() function pointer.
>>> +	 */
>>> +	if (file_out->f_op->copy_file_range) {
>>> +		if (file_in->f_op->copy_file_range !=
>>> +		    file_out->f_op->copy_file_range)
>>> +			return -EXDEV;
>>> +	} else if (file_in->f_op->remap_file_range) {
>>> +		if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
>>> +			return -EXDEV;
>> I think this check is redundant, it's done in vfs_copy_file_range.
>> If this check is removed then the else clause below should be removed
>> also. Once this check and the else clause are removed then might as
>> well move the the check of copy_file_range from here to vfs_copy_file_range.
>>
> I don't think it's really redundant, although I agree is messy due to the
> fact we try to clone first instead of copying them.
>
> So, in the clone path, this is the only place where we return -EXDEV if:
>
> 1) we don't have ->copy_file_range *and*
> 2) we have ->remap_file_range but the i_sb are different.
>
> The check in vfs_copy_file_range() is only executed if:
>
> 1) we have *valid* ->copy_file_range ops and/or
> 2) we have *valid* ->remap_file_range
>
> So... if we remove the check in generic_copy_file_checks() as you suggest
> and:
> - we don't have ->copy_file_range,
> - we have ->remap_file_range but
> - the i_sb are different
>
> we'll return the -EOPNOTSUPP (the one set in line "ret = -EOPNOTSUPP;" in
> function vfs_copy_file_range() ) instead of -EXDEV.

Yes, this is the different.The NFS code handles both -EOPNOTSUPP and
-EXDEVV by doing generic_copy_file_range.  Do any other consumers of
vfs_copy_file_range rely on -EXDEV and not -EOPNOTSUPP and which is
the correct error code for this case? It seems to me that -EOPNOTSUPP
is more appropriate than EXDEV when (sb1 != sb2).

>
> But I may have got it all wrong.  I've looked so many times at this code
> that I'm probably useless at finding problems in it :-)

You're not alone, we all try to do the right thing :-)

-Dai

>
> Cheers,
> --
> Luís
>
>> -Dai
>>
>>> +	} else {
>>> +                return -EOPNOTSUPP;
>>> +	}
>>> +
>>>    	ret = generic_file_rw_checks(file_in, file_out);
>>>    	if (ret)
>>>    		return ret;
>>> @@ -1495,6 +1492,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>>>    	file_start_write(file_out);
>>> +	ret = -EOPNOTSUPP;
>>>    	/*
>>>    	 * Try cloning first, this is supported by more file systems, and
>>>    	 * more efficient if both clone and copy are supported (e.g. NFS).
>>> @@ -1513,9 +1511,10 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>>>    		}
>>>    	}
>>> -	ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
>>> -				flags);
>>> -	WARN_ON_ONCE(ret == -EOPNOTSUPP);
>>> +	if (file_out->f_op->copy_file_range)
>>> +		ret = file_out->f_op->copy_file_range(file_in, pos_in,
>>> +						      file_out, pos_out,
>>> +						      len, flags);
>>>    done:
>>>    	if (ret > 0) {
>>>    		fsnotify_access(file_in);
Dai Ngo Feb. 23, 2021, 4:02 p.m. UTC | #6
On 2/23/21 7:29 AM, dai.ngo@oracle.com wrote:
>
> On 2/23/21 2:32 AM, Luis Henriques wrote:
>> On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
>>> On 2/22/21 2:24 AM, Luis Henriques wrote:
>>>> A regression has been reported by Nicolas Boichat, found while 
>>>> using the
>>>> copy_file_range syscall to copy a tracefs file.  Before commit
>>>> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
>>>> kernel would return -EXDEV to userspace when trying to copy a file 
>>>> across
>>>> different filesystems.  After this commit, the syscall doesn't fail 
>>>> anymore
>>>> and instead returns zero (zero bytes copied), as this file's 
>>>> content is
>>>> generated on-the-fly and thus reports a size of zero.
>>>>
>>>> This patch restores some cross-filesystem copy restrictions that 
>>>> existed
>>>> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy 
>>>> across
>>>> devices").  Filesystems are still allowed to fall-back to the VFS
>>>> generic_copy_file_range() implementation, but that has now to be done
>>>> explicitly.
>>>>
>>>> nfsd is also modified to fall-back into generic_copy_file_range() 
>>>> in case
>>>> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>>>>
>>>> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across 
>>>> devices")
>>>> Link: 
>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
>>>> Link: 
>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
>>>> Link: 
>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
>>>> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
>>>> Signed-off-by: Luis Henriques <lhenriques@suse.de>
>>>> ---
>>>> Changes since v7
>>>> - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so 
>>>> that the
>>>>     error returned is always related to the 'copy' operation
>>>> Changes since v6
>>>> - restored i_sb checks for the clone operation
>>>> Changes since v5
>>>> - check if ->copy_file_range is NULL before calling it
>>>> Changes since v4
>>>> - nfsd falls-back to generic_copy_file_range() only *if* it gets 
>>>> -EOPNOTSUPP
>>>>     or -EXDEV.
>>>> Changes since v3
>>>> - dropped the COPY_FILE_SPLICE flag
>>>> - kept the f_op's checks early in generic_copy_file_checks, 
>>>> implementing
>>>>     Amir's suggestions
>>>> - modified nfsd to use generic_copy_file_range()
>>>> Changes since v2
>>>> - do all the required checks earlier, in generic_copy_file_checks(),
>>>>     adding new checks for ->remap_file_range
>>>> - new COPY_FILE_SPLICE flag
>>>> - don't remove filesystem's fallback to generic_copy_file_range()
>>>> - updated commit changelog (and subject)
>>>> Changes since v1 (after Amir review)
>>>> - restored do_copy_file_range() helper
>>>> - return -EOPNOTSUPP if fs doesn't implement CFR
>>>> - updated commit description
>>>>
>>>>    fs/nfsd/vfs.c   |  8 +++++++-
>>>>    fs/read_write.c | 49 
>>>> ++++++++++++++++++++++++-------------------------
>>>>    2 files changed, 31 insertions(+), 26 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>> index 04937e51de56..23dab0fa9087 100644
>>>> --- a/fs/nfsd/vfs.c
>>>> +++ b/fs/nfsd/vfs.c
>>>> @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file 
>>>> *nf_src, u64 src_pos,
>>>>    ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, 
>>>> struct file *dst,
>>>>                     u64 dst_pos, u64 count)
>>>>    {
>>>> +    ssize_t ret;
>>>>        /*
>>>>         * Limit copy to 4MB to prevent indefinitely blocking an nfsd
>>>> @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src, 
>>>> u64 src_pos, struct file *dst,
>>>>         * limit like this and pipeline multiple COPY requests.
>>>>         */
>>>>        count = min_t(u64, count, 1 << 22);
>>>> -    return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
>>>> +    ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
>>>> +
>>>> +    if (ret == -EOPNOTSUPP || ret == -EXDEV)
>>>> +        ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
>>>> +                          count, 0);
>>>> +    return ret;
>>>>    }
>>>>    __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh 
>>>> *fhp,
>>>> diff --git a/fs/read_write.c b/fs/read_write.c
>>>> index 75f764b43418..5a26297fd410 100644
>>>> --- a/fs/read_write.c
>>>> +++ b/fs/read_write.c
>>>> @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file 
>>>> *file_in, loff_t pos_in,
>>>>    }
>>>>    EXPORT_SYMBOL(generic_copy_file_range);
>>>> -static ssize_t do_copy_file_range(struct file *file_in, loff_t 
>>>> pos_in,
>>>> -                  struct file *file_out, loff_t pos_out,
>>>> -                  size_t len, unsigned int flags)
>>>> -{
>>>> -    /*
>>>> -     * Although we now allow filesystems to handle cross sb copy, 
>>>> passing
>>>> -     * a file of the wrong filesystem type to filesystem driver 
>>>> can result
>>>> -     * in an attempt to dereference the wrong type of 
>>>> ->private_data, so
>>>> -     * avoid doing that until we really have a good reason.  NFS 
>>>> defines
>>>> -     * several different file_system_type structures, but they all 
>>>> end up
>>>> -     * using the same ->copy_file_range() function pointer.
>>>> -     */
>>>> -    if (file_out->f_op->copy_file_range &&
>>>> -        file_out->f_op->copy_file_range == 
>>>> file_in->f_op->copy_file_range)
>>>> -        return file_out->f_op->copy_file_range(file_in, pos_in,
>>>> -                               file_out, pos_out,
>>>> -                               len, flags);
>>>> -
>>>> -    return generic_copy_file_range(file_in, pos_in, file_out, 
>>>> pos_out, len,
>>>> -                       flags);
>>>> -}
>>>> -
>>>>    /*
>>>>     * Performs necessary checks before doing a file copy
>>>>     *
>>>> @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct 
>>>> file *file_in, loff_t pos_in,
>>>>        loff_t size_in;
>>>>        int ret;
>>>> +    /*
>>>> +     * Although we now allow filesystems to handle cross sb copy, 
>>>> passing
>>>> +     * a file of the wrong filesystem type to filesystem driver 
>>>> can result
>>>> +     * in an attempt to dereference the wrong type of 
>>>> ->private_data, so
>>>> +     * avoid doing that until we really have a good reason.  NFS 
>>>> defines
>>>> +     * several different file_system_type structures, but they all 
>>>> end up
>>>> +     * using the same ->copy_file_range() function pointer.
>>>> +     */
>>>> +    if (file_out->f_op->copy_file_range) {
>>>> +        if (file_in->f_op->copy_file_range !=
>>>> +            file_out->f_op->copy_file_range)
>>>> +            return -EXDEV;
>>>> +    } else if (file_in->f_op->remap_file_range) {
>>>> +        if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
>>>> +            return -EXDEV;
>>> I think this check is redundant, it's done in vfs_copy_file_range.
>>> If this check is removed then the else clause below should be removed
>>> also. Once this check and the else clause are removed then might as
>>> well move the the check of copy_file_range from here to 
>>> vfs_copy_file_range.
>>>
>> I don't think it's really redundant, although I agree is messy due to 
>> the
>> fact we try to clone first instead of copying them.
>>
>> So, in the clone path, this is the only place where we return -EXDEV if:
>>
>> 1) we don't have ->copy_file_range *and*
>> 2) we have ->remap_file_range but the i_sb are different.
>>
>> The check in vfs_copy_file_range() is only executed if:
>>
>> 1) we have *valid* ->copy_file_range ops and/or
>> 2) we have *valid* ->remap_file_range
>>
>> So... if we remove the check in generic_copy_file_checks() as you 
>> suggest
>> and:
>> - we don't have ->copy_file_range,
>> - we have ->remap_file_range but
>> - the i_sb are different
>>
>> we'll return the -EOPNOTSUPP (the one set in line "ret = 
>> -EOPNOTSUPP;" in
>> function vfs_copy_file_range() ) instead of -EXDEV.
>
> Yes, this is the different.The NFS code handles both -EOPNOTSUPP and
> -EXDEVV by doing generic_copy_file_range.  Do any other consumers of
> vfs_copy_file_range rely on -EXDEV and not -EOPNOTSUPP and which is
> the correct error code for this case? It seems to me that -EOPNOTSUPP
> is more appropriate than EXDEV when (sb1 != sb2).

So with the current patch, for a clone operation across 2 filesystems:

   . if src and dst filesystem support both copy_file_range and
     map_file_range then the code returns -ENOTSUPPORT.

   . if the filesystems only support map_file_range then the
     code returns -EXDEV

This seems confusing, shouldn't only 1 error code returned for this case?

-Dai

>
>>
>> But I may have got it all wrong.  I've looked so many times at this code
>> that I'm probably useless at finding problems in it :-)
>
> You're not alone, we all try to do the right thing :-)
>
> -Dai
>
>>
>> Cheers,
>> -- 
>> Luís
>>
>>> -Dai
>>>
>>>> +    } else {
>>>> +                return -EOPNOTSUPP;
>>>> +    }
>>>> +
>>>>        ret = generic_file_rw_checks(file_in, file_out);
>>>>        if (ret)
>>>>            return ret;
>>>> @@ -1495,6 +1492,7 @@ ssize_t vfs_copy_file_range(struct file 
>>>> *file_in, loff_t pos_in,
>>>>        file_start_write(file_out);
>>>> +    ret = -EOPNOTSUPP;
>>>>        /*
>>>>         * Try cloning first, this is supported by more file 
>>>> systems, and
>>>>         * more efficient if both clone and copy are supported (e.g. 
>>>> NFS).
>>>> @@ -1513,9 +1511,10 @@ ssize_t vfs_copy_file_range(struct file 
>>>> *file_in, loff_t pos_in,
>>>>            }
>>>>        }
>>>> -    ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
>>>> -                flags);
>>>> -    WARN_ON_ONCE(ret == -EOPNOTSUPP);
>>>> +    if (file_out->f_op->copy_file_range)
>>>> +        ret = file_out->f_op->copy_file_range(file_in, pos_in,
>>>> +                              file_out, pos_out,
>>>> +                              len, flags);
>>>>    done:
>>>>        if (ret > 0) {
>>>>            fsnotify_access(file_in);
Amir Goldstein Feb. 23, 2021, 4:47 p.m. UTC | #7
On Tue, Feb 23, 2021 at 6:02 PM <dai.ngo@oracle.com> wrote:
>
>
> On 2/23/21 7:29 AM, dai.ngo@oracle.com wrote:
> >
> > On 2/23/21 2:32 AM, Luis Henriques wrote:
> >> On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
> >>> On 2/22/21 2:24 AM, Luis Henriques wrote:
> >>>> A regression has been reported by Nicolas Boichat, found while
> >>>> using the
> >>>> copy_file_range syscall to copy a tracefs file.  Before commit
> >>>> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> >>>> kernel would return -EXDEV to userspace when trying to copy a file
> >>>> across
> >>>> different filesystems.  After this commit, the syscall doesn't fail
> >>>> anymore
> >>>> and instead returns zero (zero bytes copied), as this file's
> >>>> content is
> >>>> generated on-the-fly and thus reports a size of zero.
> >>>>
> >>>> This patch restores some cross-filesystem copy restrictions that
> >>>> existed
> >>>> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy
> >>>> across
> >>>> devices").  Filesystems are still allowed to fall-back to the VFS
> >>>> generic_copy_file_range() implementation, but that has now to be done
> >>>> explicitly.
> >>>>
> >>>> nfsd is also modified to fall-back into generic_copy_file_range()
> >>>> in case
> >>>> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> >>>>
> >>>> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> >>>> devices")
> >>>> Link:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
> >>>> Link:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
> >>>> Link:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
> >>>> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> >>>> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> >>>> ---
> >>>> Changes since v7
> >>>> - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so
> >>>> that the
> >>>>     error returned is always related to the 'copy' operation
> >>>> Changes since v6
> >>>> - restored i_sb checks for the clone operation
> >>>> Changes since v5
> >>>> - check if ->copy_file_range is NULL before calling it
> >>>> Changes since v4
> >>>> - nfsd falls-back to generic_copy_file_range() only *if* it gets
> >>>> -EOPNOTSUPP
> >>>>     or -EXDEV.
> >>>> Changes since v3
> >>>> - dropped the COPY_FILE_SPLICE flag
> >>>> - kept the f_op's checks early in generic_copy_file_checks,
> >>>> implementing
> >>>>     Amir's suggestions
> >>>> - modified nfsd to use generic_copy_file_range()
> >>>> Changes since v2
> >>>> - do all the required checks earlier, in generic_copy_file_checks(),
> >>>>     adding new checks for ->remap_file_range
> >>>> - new COPY_FILE_SPLICE flag
> >>>> - don't remove filesystem's fallback to generic_copy_file_range()
> >>>> - updated commit changelog (and subject)
> >>>> Changes since v1 (after Amir review)
> >>>> - restored do_copy_file_range() helper
> >>>> - return -EOPNOTSUPP if fs doesn't implement CFR
> >>>> - updated commit description
> >>>>
> >>>>    fs/nfsd/vfs.c   |  8 +++++++-
> >>>>    fs/read_write.c | 49
> >>>> ++++++++++++++++++++++++-------------------------
> >>>>    2 files changed, 31 insertions(+), 26 deletions(-)
> >>>>
> >>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>>> index 04937e51de56..23dab0fa9087 100644
> >>>> --- a/fs/nfsd/vfs.c
> >>>> +++ b/fs/nfsd/vfs.c
> >>>> @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file
> >>>> *nf_src, u64 src_pos,
> >>>>    ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos,
> >>>> struct file *dst,
> >>>>                     u64 dst_pos, u64 count)
> >>>>    {
> >>>> +    ssize_t ret;
> >>>>        /*
> >>>>         * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> >>>> @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src,
> >>>> u64 src_pos, struct file *dst,
> >>>>         * limit like this and pipeline multiple COPY requests.
> >>>>         */
> >>>>        count = min_t(u64, count, 1 << 22);
> >>>> -    return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> >>>> +    ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> >>>> +
> >>>> +    if (ret == -EOPNOTSUPP || ret == -EXDEV)
> >>>> +        ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> >>>> +                          count, 0);
> >>>> +    return ret;
> >>>>    }
> >>>>    __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh
> >>>> *fhp,
> >>>> diff --git a/fs/read_write.c b/fs/read_write.c
> >>>> index 75f764b43418..5a26297fd410 100644
> >>>> --- a/fs/read_write.c
> >>>> +++ b/fs/read_write.c
> >>>> @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file
> >>>> *file_in, loff_t pos_in,
> >>>>    }
> >>>>    EXPORT_SYMBOL(generic_copy_file_range);
> >>>> -static ssize_t do_copy_file_range(struct file *file_in, loff_t
> >>>> pos_in,
> >>>> -                  struct file *file_out, loff_t pos_out,
> >>>> -                  size_t len, unsigned int flags)
> >>>> -{
> >>>> -    /*
> >>>> -     * Although we now allow filesystems to handle cross sb copy,
> >>>> passing
> >>>> -     * a file of the wrong filesystem type to filesystem driver
> >>>> can result
> >>>> -     * in an attempt to dereference the wrong type of
> >>>> ->private_data, so
> >>>> -     * avoid doing that until we really have a good reason.  NFS
> >>>> defines
> >>>> -     * several different file_system_type structures, but they all
> >>>> end up
> >>>> -     * using the same ->copy_file_range() function pointer.
> >>>> -     */
> >>>> -    if (file_out->f_op->copy_file_range &&
> >>>> -        file_out->f_op->copy_file_range ==
> >>>> file_in->f_op->copy_file_range)
> >>>> -        return file_out->f_op->copy_file_range(file_in, pos_in,
> >>>> -                               file_out, pos_out,
> >>>> -                               len, flags);
> >>>> -
> >>>> -    return generic_copy_file_range(file_in, pos_in, file_out,
> >>>> pos_out, len,
> >>>> -                       flags);
> >>>> -}
> >>>> -
> >>>>    /*
> >>>>     * Performs necessary checks before doing a file copy
> >>>>     *
> >>>> @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct
> >>>> file *file_in, loff_t pos_in,
> >>>>        loff_t size_in;
> >>>>        int ret;
> >>>> +    /*
> >>>> +     * Although we now allow filesystems to handle cross sb copy,
> >>>> passing
> >>>> +     * a file of the wrong filesystem type to filesystem driver
> >>>> can result
> >>>> +     * in an attempt to dereference the wrong type of
> >>>> ->private_data, so
> >>>> +     * avoid doing that until we really have a good reason.  NFS
> >>>> defines
> >>>> +     * several different file_system_type structures, but they all
> >>>> end up
> >>>> +     * using the same ->copy_file_range() function pointer.
> >>>> +     */
> >>>> +    if (file_out->f_op->copy_file_range) {
> >>>> +        if (file_in->f_op->copy_file_range !=
> >>>> +            file_out->f_op->copy_file_range)
> >>>> +            return -EXDEV;
> >>>> +    } else if (file_in->f_op->remap_file_range) {
> >>>> +        if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> >>>> +            return -EXDEV;
> >>> I think this check is redundant, it's done in vfs_copy_file_range.
> >>> If this check is removed then the else clause below should be removed
> >>> also. Once this check and the else clause are removed then might as
> >>> well move the the check of copy_file_range from here to
> >>> vfs_copy_file_range.
> >>>
> >> I don't think it's really redundant, although I agree is messy due to
> >> the
> >> fact we try to clone first instead of copying them.
> >>
> >> So, in the clone path, this is the only place where we return -EXDEV if:
> >>
> >> 1) we don't have ->copy_file_range *and*
> >> 2) we have ->remap_file_range but the i_sb are different.
> >>
> >> The check in vfs_copy_file_range() is only executed if:
> >>
> >> 1) we have *valid* ->copy_file_range ops and/or
> >> 2) we have *valid* ->remap_file_range
> >>
> >> So... if we remove the check in generic_copy_file_checks() as you
> >> suggest
> >> and:
> >> - we don't have ->copy_file_range,
> >> - we have ->remap_file_range but
> >> - the i_sb are different
> >>
> >> we'll return the -EOPNOTSUPP (the one set in line "ret =
> >> -EOPNOTSUPP;" in
> >> function vfs_copy_file_range() ) instead of -EXDEV.
> >
> > Yes, this is the different.The NFS code handles both -EOPNOTSUPP and
> > -EXDEVV by doing generic_copy_file_range.  Do any other consumers of
> > vfs_copy_file_range rely on -EXDEV and not -EOPNOTSUPP and which is
> > the correct error code for this case? It seems to me that -EOPNOTSUPP
> > is more appropriate than EXDEV when (sb1 != sb2).
>

EXDEV is the right code for:
filesystem supports the operation but not for sb1 != sb1.

> So with the current patch, for a clone operation across 2 filesystems:
>
>    . if src and dst filesystem support both copy_file_range and
>      map_file_range then the code returns -ENOTSUPPORT.
>

Why do you say that?
Which code are you referring to exactly?
Did you see this behavior in a test?

>    . if the filesystems only support map_file_range then the
>      code returns -EXDEV
>
> This seems confusing, shouldn't only 1 error code returned for this case?
>

From my read of the code, user will get -EXDEV in both the cases you
listed.

Thanks,
Amir.
Dai Ngo Feb. 23, 2021, 4:57 p.m. UTC | #8
On 2/23/21 8:47 AM, Amir Goldstein wrote:
> On Tue, Feb 23, 2021 at 6:02 PM <dai.ngo@oracle.com> wrote:
>>
>> On 2/23/21 7:29 AM, dai.ngo@oracle.com wrote:
>>> On 2/23/21 2:32 AM, Luis Henriques wrote:
>>>> On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
>>>>> On 2/22/21 2:24 AM, Luis Henriques wrote:
>>>>>> A regression has been reported by Nicolas Boichat, found while
>>>>>> using the
>>>>>> copy_file_range syscall to copy a tracefs file.  Before commit
>>>>>> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
>>>>>> kernel would return -EXDEV to userspace when trying to copy a file
>>>>>> across
>>>>>> different filesystems.  After this commit, the syscall doesn't fail
>>>>>> anymore
>>>>>> and instead returns zero (zero bytes copied), as this file's
>>>>>> content is
>>>>>> generated on-the-fly and thus reports a size of zero.
>>>>>>
>>>>>> This patch restores some cross-filesystem copy restrictions that
>>>>>> existed
>>>>>> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy
>>>>>> across
>>>>>> devices").  Filesystems are still allowed to fall-back to the VFS
>>>>>> generic_copy_file_range() implementation, but that has now to be done
>>>>>> explicitly.
>>>>>>
>>>>>> nfsd is also modified to fall-back into generic_copy_file_range()
>>>>>> in case
>>>>>> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>>>>>>
>>>>>> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
>>>>>> devices")
>>>>>> Link:
>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
>>>>>> Link:
>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
>>>>>> Link:
>>>>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
>>>>>> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
>>>>>> Signed-off-by: Luis Henriques <lhenriques@suse.de>
>>>>>> ---
>>>>>> Changes since v7
>>>>>> - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so
>>>>>> that the
>>>>>>      error returned is always related to the 'copy' operation
>>>>>> Changes since v6
>>>>>> - restored i_sb checks for the clone operation
>>>>>> Changes since v5
>>>>>> - check if ->copy_file_range is NULL before calling it
>>>>>> Changes since v4
>>>>>> - nfsd falls-back to generic_copy_file_range() only *if* it gets
>>>>>> -EOPNOTSUPP
>>>>>>      or -EXDEV.
>>>>>> Changes since v3
>>>>>> - dropped the COPY_FILE_SPLICE flag
>>>>>> - kept the f_op's checks early in generic_copy_file_checks,
>>>>>> implementing
>>>>>>      Amir's suggestions
>>>>>> - modified nfsd to use generic_copy_file_range()
>>>>>> Changes since v2
>>>>>> - do all the required checks earlier, in generic_copy_file_checks(),
>>>>>>      adding new checks for ->remap_file_range
>>>>>> - new COPY_FILE_SPLICE flag
>>>>>> - don't remove filesystem's fallback to generic_copy_file_range()
>>>>>> - updated commit changelog (and subject)
>>>>>> Changes since v1 (after Amir review)
>>>>>> - restored do_copy_file_range() helper
>>>>>> - return -EOPNOTSUPP if fs doesn't implement CFR
>>>>>> - updated commit description
>>>>>>
>>>>>>     fs/nfsd/vfs.c   |  8 +++++++-
>>>>>>     fs/read_write.c | 49
>>>>>> ++++++++++++++++++++++++-------------------------
>>>>>>     2 files changed, 31 insertions(+), 26 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>>> index 04937e51de56..23dab0fa9087 100644
>>>>>> --- a/fs/nfsd/vfs.c
>>>>>> +++ b/fs/nfsd/vfs.c
>>>>>> @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file
>>>>>> *nf_src, u64 src_pos,
>>>>>>     ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos,
>>>>>> struct file *dst,
>>>>>>                      u64 dst_pos, u64 count)
>>>>>>     {
>>>>>> +    ssize_t ret;
>>>>>>         /*
>>>>>>          * Limit copy to 4MB to prevent indefinitely blocking an nfsd
>>>>>> @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src,
>>>>>> u64 src_pos, struct file *dst,
>>>>>>          * limit like this and pipeline multiple COPY requests.
>>>>>>          */
>>>>>>         count = min_t(u64, count, 1 << 22);
>>>>>> -    return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
>>>>>> +    ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
>>>>>> +
>>>>>> +    if (ret == -EOPNOTSUPP || ret == -EXDEV)
>>>>>> +        ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
>>>>>> +                          count, 0);
>>>>>> +    return ret;
>>>>>>     }
>>>>>>     __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh
>>>>>> *fhp,
>>>>>> diff --git a/fs/read_write.c b/fs/read_write.c
>>>>>> index 75f764b43418..5a26297fd410 100644
>>>>>> --- a/fs/read_write.c
>>>>>> +++ b/fs/read_write.c
>>>>>> @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file
>>>>>> *file_in, loff_t pos_in,
>>>>>>     }
>>>>>>     EXPORT_SYMBOL(generic_copy_file_range);
>>>>>> -static ssize_t do_copy_file_range(struct file *file_in, loff_t
>>>>>> pos_in,
>>>>>> -                  struct file *file_out, loff_t pos_out,
>>>>>> -                  size_t len, unsigned int flags)
>>>>>> -{
>>>>>> -    /*
>>>>>> -     * Although we now allow filesystems to handle cross sb copy,
>>>>>> passing
>>>>>> -     * a file of the wrong filesystem type to filesystem driver
>>>>>> can result
>>>>>> -     * in an attempt to dereference the wrong type of
>>>>>> ->private_data, so
>>>>>> -     * avoid doing that until we really have a good reason.  NFS
>>>>>> defines
>>>>>> -     * several different file_system_type structures, but they all
>>>>>> end up
>>>>>> -     * using the same ->copy_file_range() function pointer.
>>>>>> -     */
>>>>>> -    if (file_out->f_op->copy_file_range &&
>>>>>> -        file_out->f_op->copy_file_range ==
>>>>>> file_in->f_op->copy_file_range)
>>>>>> -        return file_out->f_op->copy_file_range(file_in, pos_in,
>>>>>> -                               file_out, pos_out,
>>>>>> -                               len, flags);
>>>>>> -
>>>>>> -    return generic_copy_file_range(file_in, pos_in, file_out,
>>>>>> pos_out, len,
>>>>>> -                       flags);
>>>>>> -}
>>>>>> -
>>>>>>     /*
>>>>>>      * Performs necessary checks before doing a file copy
>>>>>>      *
>>>>>> @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct
>>>>>> file *file_in, loff_t pos_in,
>>>>>>         loff_t size_in;
>>>>>>         int ret;
>>>>>> +    /*
>>>>>> +     * Although we now allow filesystems to handle cross sb copy,
>>>>>> passing
>>>>>> +     * a file of the wrong filesystem type to filesystem driver
>>>>>> can result
>>>>>> +     * in an attempt to dereference the wrong type of
>>>>>> ->private_data, so
>>>>>> +     * avoid doing that until we really have a good reason.  NFS
>>>>>> defines
>>>>>> +     * several different file_system_type structures, but they all
>>>>>> end up
>>>>>> +     * using the same ->copy_file_range() function pointer.
>>>>>> +     */
>>>>>> +    if (file_out->f_op->copy_file_range) {
>>>>>> +        if (file_in->f_op->copy_file_range !=
>>>>>> +            file_out->f_op->copy_file_range)
>>>>>> +            return -EXDEV;
>>>>>> +    } else if (file_in->f_op->remap_file_range) {
>>>>>> +        if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
>>>>>> +            return -EXDEV;
>>>>> I think this check is redundant, it's done in vfs_copy_file_range.
>>>>> If this check is removed then the else clause below should be removed
>>>>> also. Once this check and the else clause are removed then might as
>>>>> well move the the check of copy_file_range from here to
>>>>> vfs_copy_file_range.
>>>>>
>>>> I don't think it's really redundant, although I agree is messy due to
>>>> the
>>>> fact we try to clone first instead of copying them.
>>>>
>>>> So, in the clone path, this is the only place where we return -EXDEV if:
>>>>
>>>> 1) we don't have ->copy_file_range *and*
>>>> 2) we have ->remap_file_range but the i_sb are different.
>>>>
>>>> The check in vfs_copy_file_range() is only executed if:
>>>>
>>>> 1) we have *valid* ->copy_file_range ops and/or
>>>> 2) we have *valid* ->remap_file_range
>>>>
>>>> So... if we remove the check in generic_copy_file_checks() as you
>>>> suggest
>>>> and:
>>>> - we don't have ->copy_file_range,
>>>> - we have ->remap_file_range but
>>>> - the i_sb are different
>>>>
>>>> we'll return the -EOPNOTSUPP (the one set in line "ret =
>>>> -EOPNOTSUPP;" in
>>>> function vfs_copy_file_range() ) instead of -EXDEV.
>>> Yes, this is the different.The NFS code handles both -EOPNOTSUPP and
>>> -EXDEVV by doing generic_copy_file_range.  Do any other consumers of
>>> vfs_copy_file_range rely on -EXDEV and not -EOPNOTSUPP and which is
>>> the correct error code for this case? It seems to me that -EOPNOTSUPP
>>> is more appropriate than EXDEV when (sb1 != sb2).
> EXDEV is the right code for:
> filesystem supports the operation but not for sb1 != sb1.
>
>> So with the current patch, for a clone operation across 2 filesystems:
>>
>>     . if src and dst filesystem support both copy_file_range and
>>       map_file_range then the code returns -ENOTSUPPORT.
>>
> Why do you say that?
> Which code are you referring to exactly?

If the filesystems support both copy_file_range and map_file_range,
it passes the check in generic_file_check but it fails with the
check in vfs_copy_file_range and returns -ENOTSUPPORT (added by
the v8 patch)

-Dai

> Did you see this behavior in a test?
>
>>     . if the filesystems only support map_file_range then the
>>       code returns -EXDEV
>>
>> This seems confusing, shouldn't only 1 error code returned for this case?
>>
>  From my read of the code, user will get -EXDEV in both the cases you
> listed.
>
> Thanks,
> Amir.
Olga Kornievskaia Feb. 23, 2021, 5:13 p.m. UTC | #9
On Tue, Feb 23, 2021 at 11:03 AM <dai.ngo@oracle.com> wrote:
>
>
> On 2/23/21 7:29 AM, dai.ngo@oracle.com wrote:
> >
> > On 2/23/21 2:32 AM, Luis Henriques wrote:
> >> On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
> >>> On 2/22/21 2:24 AM, Luis Henriques wrote:
> >>>> A regression has been reported by Nicolas Boichat, found while
> >>>> using the
> >>>> copy_file_range syscall to copy a tracefs file.  Before commit
> >>>> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> >>>> kernel would return -EXDEV to userspace when trying to copy a file
> >>>> across
> >>>> different filesystems.  After this commit, the syscall doesn't fail
> >>>> anymore
> >>>> and instead returns zero (zero bytes copied), as this file's
> >>>> content is
> >>>> generated on-the-fly and thus reports a size of zero.
> >>>>
> >>>> This patch restores some cross-filesystem copy restrictions that
> >>>> existed
> >>>> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy
> >>>> across
> >>>> devices").  Filesystems are still allowed to fall-back to the VFS
> >>>> generic_copy_file_range() implementation, but that has now to be done
> >>>> explicitly.
> >>>>
> >>>> nfsd is also modified to fall-back into generic_copy_file_range()
> >>>> in case
> >>>> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> >>>>
> >>>> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> >>>> devices")
> >>>> Link:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
> >>>> Link:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
> >>>> Link:
> >>>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
> >>>> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> >>>> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> >>>> ---
> >>>> Changes since v7
> >>>> - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so
> >>>> that the
> >>>>     error returned is always related to the 'copy' operation
> >>>> Changes since v6
> >>>> - restored i_sb checks for the clone operation
> >>>> Changes since v5
> >>>> - check if ->copy_file_range is NULL before calling it
> >>>> Changes since v4
> >>>> - nfsd falls-back to generic_copy_file_range() only *if* it gets
> >>>> -EOPNOTSUPP
> >>>>     or -EXDEV.
> >>>> Changes since v3
> >>>> - dropped the COPY_FILE_SPLICE flag
> >>>> - kept the f_op's checks early in generic_copy_file_checks,
> >>>> implementing
> >>>>     Amir's suggestions
> >>>> - modified nfsd to use generic_copy_file_range()
> >>>> Changes since v2
> >>>> - do all the required checks earlier, in generic_copy_file_checks(),
> >>>>     adding new checks for ->remap_file_range
> >>>> - new COPY_FILE_SPLICE flag
> >>>> - don't remove filesystem's fallback to generic_copy_file_range()
> >>>> - updated commit changelog (and subject)
> >>>> Changes since v1 (after Amir review)
> >>>> - restored do_copy_file_range() helper
> >>>> - return -EOPNOTSUPP if fs doesn't implement CFR
> >>>> - updated commit description
> >>>>
> >>>>    fs/nfsd/vfs.c   |  8 +++++++-
> >>>>    fs/read_write.c | 49
> >>>> ++++++++++++++++++++++++-------------------------
> >>>>    2 files changed, 31 insertions(+), 26 deletions(-)
> >>>>
> >>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>>> index 04937e51de56..23dab0fa9087 100644
> >>>> --- a/fs/nfsd/vfs.c
> >>>> +++ b/fs/nfsd/vfs.c
> >>>> @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file
> >>>> *nf_src, u64 src_pos,
> >>>>    ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos,
> >>>> struct file *dst,
> >>>>                     u64 dst_pos, u64 count)
> >>>>    {
> >>>> +    ssize_t ret;
> >>>>        /*
> >>>>         * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> >>>> @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src,
> >>>> u64 src_pos, struct file *dst,
> >>>>         * limit like this and pipeline multiple COPY requests.
> >>>>         */
> >>>>        count = min_t(u64, count, 1 << 22);
> >>>> -    return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> >>>> +    ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> >>>> +
> >>>> +    if (ret == -EOPNOTSUPP || ret == -EXDEV)
> >>>> +        ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> >>>> +                          count, 0);
> >>>> +    return ret;
> >>>>    }
> >>>>    __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh
> >>>> *fhp,
> >>>> diff --git a/fs/read_write.c b/fs/read_write.c
> >>>> index 75f764b43418..5a26297fd410 100644
> >>>> --- a/fs/read_write.c
> >>>> +++ b/fs/read_write.c
> >>>> @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file
> >>>> *file_in, loff_t pos_in,
> >>>>    }
> >>>>    EXPORT_SYMBOL(generic_copy_file_range);
> >>>> -static ssize_t do_copy_file_range(struct file *file_in, loff_t
> >>>> pos_in,
> >>>> -                  struct file *file_out, loff_t pos_out,
> >>>> -                  size_t len, unsigned int flags)
> >>>> -{
> >>>> -    /*
> >>>> -     * Although we now allow filesystems to handle cross sb copy,
> >>>> passing
> >>>> -     * a file of the wrong filesystem type to filesystem driver
> >>>> can result
> >>>> -     * in an attempt to dereference the wrong type of
> >>>> ->private_data, so
> >>>> -     * avoid doing that until we really have a good reason.  NFS
> >>>> defines
> >>>> -     * several different file_system_type structures, but they all
> >>>> end up
> >>>> -     * using the same ->copy_file_range() function pointer.
> >>>> -     */
> >>>> -    if (file_out->f_op->copy_file_range &&
> >>>> -        file_out->f_op->copy_file_range ==
> >>>> file_in->f_op->copy_file_range)
> >>>> -        return file_out->f_op->copy_file_range(file_in, pos_in,
> >>>> -                               file_out, pos_out,
> >>>> -                               len, flags);
> >>>> -
> >>>> -    return generic_copy_file_range(file_in, pos_in, file_out,
> >>>> pos_out, len,
> >>>> -                       flags);
> >>>> -}
> >>>> -
> >>>>    /*
> >>>>     * Performs necessary checks before doing a file copy
> >>>>     *
> >>>> @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct
> >>>> file *file_in, loff_t pos_in,
> >>>>        loff_t size_in;
> >>>>        int ret;
> >>>> +    /*
> >>>> +     * Although we now allow filesystems to handle cross sb copy,
> >>>> passing
> >>>> +     * a file of the wrong filesystem type to filesystem driver
> >>>> can result
> >>>> +     * in an attempt to dereference the wrong type of
> >>>> ->private_data, so
> >>>> +     * avoid doing that until we really have a good reason.  NFS
> >>>> defines
> >>>> +     * several different file_system_type structures, but they all
> >>>> end up
> >>>> +     * using the same ->copy_file_range() function pointer.
> >>>> +     */
> >>>> +    if (file_out->f_op->copy_file_range) {
> >>>> +        if (file_in->f_op->copy_file_range !=
> >>>> +            file_out->f_op->copy_file_range)
> >>>> +            return -EXDEV;
> >>>> +    } else if (file_in->f_op->remap_file_range) {
> >>>> +        if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> >>>> +            return -EXDEV;
> >>> I think this check is redundant, it's done in vfs_copy_file_range.
> >>> If this check is removed then the else clause below should be removed
> >>> also. Once this check and the else clause are removed then might as
> >>> well move the the check of copy_file_range from here to
> >>> vfs_copy_file_range.
> >>>
> >> I don't think it's really redundant, although I agree is messy due to
> >> the
> >> fact we try to clone first instead of copying them.
> >>
> >> So, in the clone path, this is the only place where we return -EXDEV if:
> >>
> >> 1) we don't have ->copy_file_range *and*
> >> 2) we have ->remap_file_range but the i_sb are different.
> >>
> >> The check in vfs_copy_file_range() is only executed if:
> >>
> >> 1) we have *valid* ->copy_file_range ops and/or
> >> 2) we have *valid* ->remap_file_range
> >>
> >> So... if we remove the check in generic_copy_file_checks() as you
> >> suggest
> >> and:
> >> - we don't have ->copy_file_range,
> >> - we have ->remap_file_range but
> >> - the i_sb are different
> >>
> >> we'll return the -EOPNOTSUPP (the one set in line "ret =
> >> -EOPNOTSUPP;" in
> >> function vfs_copy_file_range() ) instead of -EXDEV.
> >
> > Yes, this is the different.The NFS code handles both -EOPNOTSUPP and
> > -EXDEVV by doing generic_copy_file_range.  Do any other consumers of
> > vfs_copy_file_range rely on -EXDEV and not -EOPNOTSUPP and which is
> > the correct error code for this case? It seems to me that -EOPNOTSUPP
> > is more appropriate than EXDEV when (sb1 != sb2).
>
> So with the current patch, for a clone operation across 2 filesystems:

Wait, I can't get passed "a clone operation across 2 filesystems", I
thought there are not any options. It's not allowed? Then we go do try
the copy. Those are two different steps so errors code might be
different.

>    . if src and dst filesystem support both copy_file_range and
>      map_file_range then the code returns -ENOTSUPPORT.
>
>    . if the filesystems only support map_file_range then the
>      code returns -EXDEV
>
> This seems confusing, shouldn't only 1 error code returned for this case?
>
> -Dai
>
> >
> >>
> >> But I may have got it all wrong.  I've looked so many times at this code
> >> that I'm probably useless at finding problems in it :-)
> >
> > You're not alone, we all try to do the right thing :-)
> >
> > -Dai
> >
> >>
> >> Cheers,
> >> --
> >> Luís
> >>
> >>> -Dai
> >>>
> >>>> +    } else {
> >>>> +                return -EOPNOTSUPP;
> >>>> +    }
> >>>> +
> >>>>        ret = generic_file_rw_checks(file_in, file_out);
> >>>>        if (ret)
> >>>>            return ret;
> >>>> @@ -1495,6 +1492,7 @@ ssize_t vfs_copy_file_range(struct file
> >>>> *file_in, loff_t pos_in,
> >>>>        file_start_write(file_out);
> >>>> +    ret = -EOPNOTSUPP;
> >>>>        /*
> >>>>         * Try cloning first, this is supported by more file
> >>>> systems, and
> >>>>         * more efficient if both clone and copy are supported (e.g.
> >>>> NFS).
> >>>> @@ -1513,9 +1511,10 @@ ssize_t vfs_copy_file_range(struct file
> >>>> *file_in, loff_t pos_in,
> >>>>            }
> >>>>        }
> >>>> -    ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> >>>> -                flags);
> >>>> -    WARN_ON_ONCE(ret == -EOPNOTSUPP);
> >>>> +    if (file_out->f_op->copy_file_range)
> >>>> +        ret = file_out->f_op->copy_file_range(file_in, pos_in,
> >>>> +                              file_out, pos_out,
> >>>> +                              len, flags);
> >>>>    done:
> >>>>        if (ret > 0) {
> >>>>            fsnotify_access(file_in);
Amir Goldstein Feb. 23, 2021, 5:33 p.m. UTC | #10
On Tue, Feb 23, 2021 at 7:31 PM <dai.ngo@oracle.com> wrote:
>
> On 2/23/21 8:57 AM, dai.ngo@oracle.com wrote:
>
>
> On 2/23/21 8:47 AM, Amir Goldstein wrote:
>
> On Tue, Feb 23, 2021 at 6:02 PM <dai.ngo@oracle.com> wrote:
>
>
> On 2/23/21 7:29 AM, dai.ngo@oracle.com wrote:
>
> On 2/23/21 2:32 AM, Luis Henriques wrote:
>
> On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
>
> On 2/22/21 2:24 AM, Luis Henriques wrote:
>
> A regression has been reported by Nicolas Boichat, found while
> using the
> copy_file_range syscall to copy a tracefs file.  Before commit
> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> kernel would return -EXDEV to userspace when trying to copy a file
> across
> different filesystems.  After this commit, the syscall doesn't fail
> anymore
> and instead returns zero (zero bytes copied), as this file's
> content is
> generated on-the-fly and thus reports a size of zero.
>
> This patch restores some cross-filesystem copy restrictions that
> existed
> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy
> across
> devices").  Filesystems are still allowed to fall-back to the VFS
> generic_copy_file_range() implementation, but that has now to be done
> explicitly.
>
> nfsd is also modified to fall-back into generic_copy_file_range()
> in case
> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>
> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> devices")
> Link:
> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
> Link:
> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
> Link:
> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> ---
> Changes since v7
> - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so
> that the
>      error returned is always related to the 'copy' operation
> Changes since v6
> - restored i_sb checks for the clone operation
> Changes since v5
> - check if ->copy_file_range is NULL before calling it
> Changes since v4
> - nfsd falls-back to generic_copy_file_range() only *if* it gets
> -EOPNOTSUPP
>      or -EXDEV.
> Changes since v3
> - dropped the COPY_FILE_SPLICE flag
> - kept the f_op's checks early in generic_copy_file_checks,
> implementing
>      Amir's suggestions
> - modified nfsd to use generic_copy_file_range()
> Changes since v2
> - do all the required checks earlier, in generic_copy_file_checks(),
>      adding new checks for ->remap_file_range
> - new COPY_FILE_SPLICE flag
> - don't remove filesystem's fallback to generic_copy_file_range()
> - updated commit changelog (and subject)
> Changes since v1 (after Amir review)
> - restored do_copy_file_range() helper
> - return -EOPNOTSUPP if fs doesn't implement CFR
> - updated commit description
>
>     fs/nfsd/vfs.c   |  8 +++++++-
>     fs/read_write.c | 49
> ++++++++++++++++++++++++-------------------------
>     2 files changed, 31 insertions(+), 26 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 04937e51de56..23dab0fa9087 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file
> *nf_src, u64 src_pos,
>     ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos,
> struct file *dst,
>                      u64 dst_pos, u64 count)
>     {
> +    ssize_t ret;
>         /*
>          * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src,
> u64 src_pos, struct file *dst,
>          * limit like this and pipeline multiple COPY requests.
>          */
>         count = min_t(u64, count, 1 << 22);
> -    return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> +    ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> +
> +    if (ret == -EOPNOTSUPP || ret == -EXDEV)
> +        ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> +                          count, 0);
> +    return ret;
>     }
>     __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh
> *fhp,
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 75f764b43418..5a26297fd410 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file
> *file_in, loff_t pos_in,
>     }
>     EXPORT_SYMBOL(generic_copy_file_range);
> -static ssize_t do_copy_file_range(struct file *file_in, loff_t
> pos_in,
> -                  struct file *file_out, loff_t pos_out,
> -                  size_t len, unsigned int flags)
> -{
> -    /*
> -     * Although we now allow filesystems to handle cross sb copy,
> passing
> -     * a file of the wrong filesystem type to filesystem driver
> can result
> -     * in an attempt to dereference the wrong type of
> ->private_data, so
> -     * avoid doing that until we really have a good reason.  NFS
> defines
> -     * several different file_system_type structures, but they all
> end up
> -     * using the same ->copy_file_range() function pointer.
> -     */
> -    if (file_out->f_op->copy_file_range &&
> -        file_out->f_op->copy_file_range ==
> file_in->f_op->copy_file_range)
> -        return file_out->f_op->copy_file_range(file_in, pos_in,
> -                               file_out, pos_out,
> -                               len, flags);
> -
> -    return generic_copy_file_range(file_in, pos_in, file_out,
> pos_out, len,
> -                       flags);
> -}
> -
>     /*
>      * Performs necessary checks before doing a file copy
>      *
> @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct
> file *file_in, loff_t pos_in,
>         loff_t size_in;
>         int ret;
> +    /*
> +     * Although we now allow filesystems to handle cross sb copy,
> passing
> +     * a file of the wrong filesystem type to filesystem driver
> can result
> +     * in an attempt to dereference the wrong type of
> ->private_data, so
> +     * avoid doing that until we really have a good reason.  NFS
> defines
> +     * several different file_system_type structures, but they all
> end up
> +     * using the same ->copy_file_range() function pointer.
> +     */
> +    if (file_out->f_op->copy_file_range) {
> +        if (file_in->f_op->copy_file_range !=
> +            file_out->f_op->copy_file_range)
> +            return -EXDEV;
> +    } else if (file_in->f_op->remap_file_range) {
> +        if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> +            return -EXDEV;
>
> I think this check is redundant, it's done in vfs_copy_file_range.
> If this check is removed then the else clause below should be removed
> also. Once this check and the else clause are removed then might as
> well move the the check of copy_file_range from here to
> vfs_copy_file_range.
>
> I don't think it's really redundant, although I agree is messy due to
> the
> fact we try to clone first instead of copying them.
>
> So, in the clone path, this is the only place where we return -EXDEV if:
>
> 1) we don't have ->copy_file_range *and*
> 2) we have ->remap_file_range but the i_sb are different.
>
> The check in vfs_copy_file_range() is only executed if:
>
> 1) we have *valid* ->copy_file_range ops and/or
> 2) we have *valid* ->remap_file_range
>
> So... if we remove the check in generic_copy_file_checks() as you
> suggest
> and:
> - we don't have ->copy_file_range,
> - we have ->remap_file_range but
> - the i_sb are different
>
> we'll return the -EOPNOTSUPP (the one set in line "ret =
> -EOPNOTSUPP;" in
> function vfs_copy_file_range() ) instead of -EXDEV.
>
> Yes, this is the different.The NFS code handles both -EOPNOTSUPP and
> -EXDEVV by doing generic_copy_file_range.  Do any other consumers of
> vfs_copy_file_range rely on -EXDEV and not -EOPNOTSUPP and which is
> the correct error code for this case? It seems to me that -EOPNOTSUPP
> is more appropriate than EXDEV when (sb1 != sb2).
>
> EXDEV is the right code for:
> filesystem supports the operation but not for sb1 != sb1.
>
> So with the current patch, for a clone operation across 2 filesystems:
>
>     . if src and dst filesystem support both copy_file_range and
>       map_file_range then the code returns -ENOTSUPPORT.
>
> Why do you say that?
> Which code are you referring to exactly?
>
>
> If the filesystems support both copy_file_range and map_file_range,
> it passes the check in generic_file_check but it fails with the
> check in vfs_copy_file_range and returns -ENOTSUPPORT (added by
> the v8 patch)
>
> Ok, I misread the code here. If it passes the check in generic_copy_file_checks
> and it fails the sb check in vfs_copy_file_range then it tries copy_file_range
> so it's ok.
>
> I think having the check in both generic_copy_file_checks and vfs_copy_file_range
> making the code hard to read. What's the reason not to do the check only in
> vfs_copy_file_range?
>

You are going in circles.
I already answered that.
Please re-read the entire thread on all patch versions before commenting.

Thanks,
Amir.
Luis Henriques Feb. 23, 2021, 5:56 p.m. UTC | #11
On Tue, Feb 23, 2021 at 08:57:38AM -0800, dai.ngo@oracle.com wrote:
> 
> On 2/23/21 8:47 AM, Amir Goldstein wrote:
> > On Tue, Feb 23, 2021 at 6:02 PM <dai.ngo@oracle.com> wrote:
> > > 
> > > On 2/23/21 7:29 AM, dai.ngo@oracle.com wrote:
> > > > On 2/23/21 2:32 AM, Luis Henriques wrote:
> > > > > On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
> > > > > > On 2/22/21 2:24 AM, Luis Henriques wrote:
> > > > > > > A regression has been reported by Nicolas Boichat, found while
> > > > > > > using the
> > > > > > > copy_file_range syscall to copy a tracefs file.  Before commit
> > > > > > > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> > > > > > > kernel would return -EXDEV to userspace when trying to copy a file
> > > > > > > across
> > > > > > > different filesystems.  After this commit, the syscall doesn't fail
> > > > > > > anymore
> > > > > > > and instead returns zero (zero bytes copied), as this file's
> > > > > > > content is
> > > > > > > generated on-the-fly and thus reports a size of zero.
> > > > > > > 
> > > > > > > This patch restores some cross-filesystem copy restrictions that
> > > > > > > existed
> > > > > > > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy
> > > > > > > across
> > > > > > > devices").  Filesystems are still allowed to fall-back to the VFS
> > > > > > > generic_copy_file_range() implementation, but that has now to be done
> > > > > > > explicitly.
> > > > > > > 
> > > > > > > nfsd is also modified to fall-back into generic_copy_file_range()
> > > > > > > in case
> > > > > > > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> > > > > > > 
> > > > > > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> > > > > > > devices")
> > > > > > > Link:
> > > > > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
> > > > > > > Link:
> > > > > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
> > > > > > > Link:
> > > > > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
> > > > > > > Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> > > > > > > Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > > > > > > ---
> > > > > > > Changes since v7
> > > > > > > - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so
> > > > > > > that the
> > > > > > >      error returned is always related to the 'copy' operation
> > > > > > > Changes since v6
> > > > > > > - restored i_sb checks for the clone operation
> > > > > > > Changes since v5
> > > > > > > - check if ->copy_file_range is NULL before calling it
> > > > > > > Changes since v4
> > > > > > > - nfsd falls-back to generic_copy_file_range() only *if* it gets
> > > > > > > -EOPNOTSUPP
> > > > > > >      or -EXDEV.
> > > > > > > Changes since v3
> > > > > > > - dropped the COPY_FILE_SPLICE flag
> > > > > > > - kept the f_op's checks early in generic_copy_file_checks,
> > > > > > > implementing
> > > > > > >      Amir's suggestions
> > > > > > > - modified nfsd to use generic_copy_file_range()
> > > > > > > Changes since v2
> > > > > > > - do all the required checks earlier, in generic_copy_file_checks(),
> > > > > > >      adding new checks for ->remap_file_range
> > > > > > > - new COPY_FILE_SPLICE flag
> > > > > > > - don't remove filesystem's fallback to generic_copy_file_range()
> > > > > > > - updated commit changelog (and subject)
> > > > > > > Changes since v1 (after Amir review)
> > > > > > > - restored do_copy_file_range() helper
> > > > > > > - return -EOPNOTSUPP if fs doesn't implement CFR
> > > > > > > - updated commit description
> > > > > > > 
> > > > > > >     fs/nfsd/vfs.c   |  8 +++++++-
> > > > > > >     fs/read_write.c | 49
> > > > > > > ++++++++++++++++++++++++-------------------------
> > > > > > >     2 files changed, 31 insertions(+), 26 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > > > > > > index 04937e51de56..23dab0fa9087 100644
> > > > > > > --- a/fs/nfsd/vfs.c
> > > > > > > +++ b/fs/nfsd/vfs.c
> > > > > > > @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file
> > > > > > > *nf_src, u64 src_pos,
> > > > > > >     ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos,
> > > > > > > struct file *dst,
> > > > > > >                      u64 dst_pos, u64 count)
> > > > > > >     {
> > > > > > > +    ssize_t ret;
> > > > > > >         /*
> > > > > > >          * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> > > > > > > @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src,
> > > > > > > u64 src_pos, struct file *dst,
> > > > > > >          * limit like this and pipeline multiple COPY requests.
> > > > > > >          */
> > > > > > >         count = min_t(u64, count, 1 << 22);
> > > > > > > -    return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> > > > > > > +    ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> > > > > > > +
> > > > > > > +    if (ret == -EOPNOTSUPP || ret == -EXDEV)
> > > > > > > +        ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> > > > > > > +                          count, 0);
> > > > > > > +    return ret;
> > > > > > >     }
> > > > > > >     __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh
> > > > > > > *fhp,
> > > > > > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > > > > > index 75f764b43418..5a26297fd410 100644
> > > > > > > --- a/fs/read_write.c
> > > > > > > +++ b/fs/read_write.c
> > > > > > > @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file
> > > > > > > *file_in, loff_t pos_in,
> > > > > > >     }
> > > > > > >     EXPORT_SYMBOL(generic_copy_file_range);
> > > > > > > -static ssize_t do_copy_file_range(struct file *file_in, loff_t
> > > > > > > pos_in,
> > > > > > > -                  struct file *file_out, loff_t pos_out,
> > > > > > > -                  size_t len, unsigned int flags)
> > > > > > > -{
> > > > > > > -    /*
> > > > > > > -     * Although we now allow filesystems to handle cross sb copy,
> > > > > > > passing
> > > > > > > -     * a file of the wrong filesystem type to filesystem driver
> > > > > > > can result
> > > > > > > -     * in an attempt to dereference the wrong type of
> > > > > > > ->private_data, so
> > > > > > > -     * avoid doing that until we really have a good reason.  NFS
> > > > > > > defines
> > > > > > > -     * several different file_system_type structures, but they all
> > > > > > > end up
> > > > > > > -     * using the same ->copy_file_range() function pointer.
> > > > > > > -     */
> > > > > > > -    if (file_out->f_op->copy_file_range &&
> > > > > > > -        file_out->f_op->copy_file_range ==
> > > > > > > file_in->f_op->copy_file_range)
> > > > > > > -        return file_out->f_op->copy_file_range(file_in, pos_in,
> > > > > > > -                               file_out, pos_out,
> > > > > > > -                               len, flags);
> > > > > > > -
> > > > > > > -    return generic_copy_file_range(file_in, pos_in, file_out,
> > > > > > > pos_out, len,
> > > > > > > -                       flags);
> > > > > > > -}
> > > > > > > -
> > > > > > >     /*
> > > > > > >      * Performs necessary checks before doing a file copy
> > > > > > >      *
> > > > > > > @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct
> > > > > > > file *file_in, loff_t pos_in,
> > > > > > >         loff_t size_in;
> > > > > > >         int ret;
> > > > > > > +    /*
> > > > > > > +     * Although we now allow filesystems to handle cross sb copy,
> > > > > > > passing
> > > > > > > +     * a file of the wrong filesystem type to filesystem driver
> > > > > > > can result
> > > > > > > +     * in an attempt to dereference the wrong type of
> > > > > > > ->private_data, so
> > > > > > > +     * avoid doing that until we really have a good reason.  NFS
> > > > > > > defines
> > > > > > > +     * several different file_system_type structures, but they all
> > > > > > > end up
> > > > > > > +     * using the same ->copy_file_range() function pointer.
> > > > > > > +     */
> > > > > > > +    if (file_out->f_op->copy_file_range) {
> > > > > > > +        if (file_in->f_op->copy_file_range !=
> > > > > > > +            file_out->f_op->copy_file_range)
> > > > > > > +            return -EXDEV;
> > > > > > > +    } else if (file_in->f_op->remap_file_range) {
> > > > > > > +        if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> > > > > > > +            return -EXDEV;
> > > > > > I think this check is redundant, it's done in vfs_copy_file_range.
> > > > > > If this check is removed then the else clause below should be removed
> > > > > > also. Once this check and the else clause are removed then might as
> > > > > > well move the the check of copy_file_range from here to
> > > > > > vfs_copy_file_range.
> > > > > > 
> > > > > I don't think it's really redundant, although I agree is messy due to
> > > > > the
> > > > > fact we try to clone first instead of copying them.
> > > > > 
> > > > > So, in the clone path, this is the only place where we return -EXDEV if:
> > > > > 
> > > > > 1) we don't have ->copy_file_range *and*
> > > > > 2) we have ->remap_file_range but the i_sb are different.
> > > > > 
> > > > > The check in vfs_copy_file_range() is only executed if:
> > > > > 
> > > > > 1) we have *valid* ->copy_file_range ops and/or
> > > > > 2) we have *valid* ->remap_file_range
> > > > > 
> > > > > So... if we remove the check in generic_copy_file_checks() as you
> > > > > suggest
> > > > > and:
> > > > > - we don't have ->copy_file_range,
> > > > > - we have ->remap_file_range but
> > > > > - the i_sb are different
> > > > > 
> > > > > we'll return the -EOPNOTSUPP (the one set in line "ret =
> > > > > -EOPNOTSUPP;" in
> > > > > function vfs_copy_file_range() ) instead of -EXDEV.
> > > > Yes, this is the different.The NFS code handles both -EOPNOTSUPP and
> > > > -EXDEVV by doing generic_copy_file_range.  Do any other consumers of
> > > > vfs_copy_file_range rely on -EXDEV and not -EOPNOTSUPP and which is
> > > > the correct error code for this case? It seems to me that -EOPNOTSUPP
> > > > is more appropriate than EXDEV when (sb1 != sb2).
> > EXDEV is the right code for:
> > filesystem supports the operation but not for sb1 != sb1.
> > 
> > > So with the current patch, for a clone operation across 2 filesystems:
> > > 
> > >     . if src and dst filesystem support both copy_file_range and
> > >       map_file_range then the code returns -ENOTSUPPORT.
> > > 
> > Why do you say that?
> > Which code are you referring to exactly?
> 
> If the filesystems support both copy_file_range and map_file_range,
> it passes the check in generic_file_check but it fails with the
> check in vfs_copy_file_range and returns -ENOTSUPPORT (added by
> the v8 patch)

I'm sorry but I can't simply see where this can happen.  If both syscalls
are present (and all other checks pass), the code will first try the
->map_file_range.  If that succeeds, it bails out; if that fails, it tries
the ->copy_file_range.  The -ENOTSUPPORT is just there for the case the
->map_file_range fails and ->copy_file_range isn't implemented.

[ <sigh> It would be so much easier if we didn't attempt to clone. ]

But as I said previously, I'm way beyond embarrassment now as I failed to
see too many obvious mistakes in previous versions :-)

Cheers,
--
Luís
Dai Ngo Feb. 24, 2021, 12:13 a.m. UTC | #12
On 2/23/21 9:33 AM, Amir Goldstein wrote:
> On Tue, Feb 23, 2021 at 7:31 PM <dai.ngo@oracle.com> wrote:
>> On 2/23/21 8:57 AM, dai.ngo@oracle.com wrote:
>>
>>
>> On 2/23/21 8:47 AM, Amir Goldstein wrote:
>>
>> On Tue, Feb 23, 2021 at 6:02 PM <dai.ngo@oracle.com> wrote:
>>
>>
>> On 2/23/21 7:29 AM, dai.ngo@oracle.com wrote:
>>
>> On 2/23/21 2:32 AM, Luis Henriques wrote:
>>
>> On Mon, Feb 22, 2021 at 08:25:27AM -0800, dai.ngo@oracle.com wrote:
>>
>> On 2/22/21 2:24 AM, Luis Henriques wrote:
>>
>> A regression has been reported by Nicolas Boichat, found while
>> using the
>> copy_file_range syscall to copy a tracefs file.  Before commit
>> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
>> kernel would return -EXDEV to userspace when trying to copy a file
>> across
>> different filesystems.  After this commit, the syscall doesn't fail
>> anymore
>> and instead returns zero (zero bytes copied), as this file's
>> content is
>> generated on-the-fly and thus reports a size of zero.
>>
>> This patch restores some cross-filesystem copy restrictions that
>> existed
>> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy
>> across
>> devices").  Filesystems are still allowed to fall-back to the VFS
>> generic_copy_file_range() implementation, but that has now to be done
>> explicitly.
>>
>> nfsd is also modified to fall-back into generic_copy_file_range()
>> in case
>> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>>
>> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
>> devices")
>> Link:
>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmi49dC6w$
>> Link:
>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx*BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/__;Kw!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmgCmMHzA$
>> Link:
>> https://urldefense.com/v3/__https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/__;!!GqivPVa7Brio!P1UWThiSkxbjfjFQWNYJmCxGEkiLFyvHjH6cS-G1ZTt1z-TeqwGQgQmzqItkrQ$
>> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
>> Signed-off-by: Luis Henriques <lhenriques@suse.de>
>> ---
>> Changes since v7
>> - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so
>> that the
>>       error returned is always related to the 'copy' operation
>> Changes since v6
>> - restored i_sb checks for the clone operation
>> Changes since v5
>> - check if ->copy_file_range is NULL before calling it
>> Changes since v4
>> - nfsd falls-back to generic_copy_file_range() only *if* it gets
>> -EOPNOTSUPP
>>       or -EXDEV.
>> Changes since v3
>> - dropped the COPY_FILE_SPLICE flag
>> - kept the f_op's checks early in generic_copy_file_checks,
>> implementing
>>       Amir's suggestions
>> - modified nfsd to use generic_copy_file_range()
>> Changes since v2
>> - do all the required checks earlier, in generic_copy_file_checks(),
>>       adding new checks for ->remap_file_range
>> - new COPY_FILE_SPLICE flag
>> - don't remove filesystem's fallback to generic_copy_file_range()
>> - updated commit changelog (and subject)
>> Changes since v1 (after Amir review)
>> - restored do_copy_file_range() helper
>> - return -EOPNOTSUPP if fs doesn't implement CFR
>> - updated commit description
>>
>>      fs/nfsd/vfs.c   |  8 +++++++-
>>      fs/read_write.c | 49
>> ++++++++++++++++++++++++-------------------------
>>      2 files changed, 31 insertions(+), 26 deletions(-)
>>
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 04937e51de56..23dab0fa9087 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file
>> *nf_src, u64 src_pos,
>>      ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos,
>> struct file *dst,
>>                       u64 dst_pos, u64 count)
>>      {
>> +    ssize_t ret;
>>          /*
>>           * Limit copy to 4MB to prevent indefinitely blocking an nfsd
>> @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src,
>> u64 src_pos, struct file *dst,
>>           * limit like this and pipeline multiple COPY requests.
>>           */
>>          count = min_t(u64, count, 1 << 22);
>> -    return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
>> +    ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
>> +
>> +    if (ret == -EOPNOTSUPP || ret == -EXDEV)
>> +        ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
>> +                          count, 0);
>> +    return ret;
>>      }
>>      __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh
>> *fhp,
>> diff --git a/fs/read_write.c b/fs/read_write.c
>> index 75f764b43418..5a26297fd410 100644
>> --- a/fs/read_write.c
>> +++ b/fs/read_write.c
>> @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file
>> *file_in, loff_t pos_in,
>>      }
>>      EXPORT_SYMBOL(generic_copy_file_range);
>> -static ssize_t do_copy_file_range(struct file *file_in, loff_t
>> pos_in,
>> -                  struct file *file_out, loff_t pos_out,
>> -                  size_t len, unsigned int flags)
>> -{
>> -    /*
>> -     * Although we now allow filesystems to handle cross sb copy,
>> passing
>> -     * a file of the wrong filesystem type to filesystem driver
>> can result
>> -     * in an attempt to dereference the wrong type of
>> ->private_data, so
>> -     * avoid doing that until we really have a good reason.  NFS
>> defines
>> -     * several different file_system_type structures, but they all
>> end up
>> -     * using the same ->copy_file_range() function pointer.
>> -     */
>> -    if (file_out->f_op->copy_file_range &&
>> -        file_out->f_op->copy_file_range ==
>> file_in->f_op->copy_file_range)
>> -        return file_out->f_op->copy_file_range(file_in, pos_in,
>> -                               file_out, pos_out,
>> -                               len, flags);
>> -
>> -    return generic_copy_file_range(file_in, pos_in, file_out,
>> pos_out, len,
>> -                       flags);
>> -}
>> -
>>      /*
>>       * Performs necessary checks before doing a file copy
>>       *
>> @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct
>> file *file_in, loff_t pos_in,
>>          loff_t size_in;
>>          int ret;
>> +    /*
>> +     * Although we now allow filesystems to handle cross sb copy,
>> passing
>> +     * a file of the wrong filesystem type to filesystem driver
>> can result
>> +     * in an attempt to dereference the wrong type of
>> ->private_data, so
>> +     * avoid doing that until we really have a good reason.  NFS
>> defines
>> +     * several different file_system_type structures, but they all
>> end up
>> +     * using the same ->copy_file_range() function pointer.
>> +     */
>> +    if (file_out->f_op->copy_file_range) {
>> +        if (file_in->f_op->copy_file_range !=
>> +            file_out->f_op->copy_file_range)
>> +            return -EXDEV;
>> +    } else if (file_in->f_op->remap_file_range) {
>> +        if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
>> +            return -EXDEV;
>>
>> I think this check is redundant, it's done in vfs_copy_file_range.
>> If this check is removed then the else clause below should be removed
>> also. Once this check and the else clause are removed then might as
>> well move the the check of copy_file_range from here to
>> vfs_copy_file_range.
>>
>> I don't think it's really redundant, although I agree is messy due to
>> the
>> fact we try to clone first instead of copying them.
>>
>> So, in the clone path, this is the only place where we return -EXDEV if:
>>
>> 1) we don't have ->copy_file_range *and*
>> 2) we have ->remap_file_range but the i_sb are different.
>>
>> The check in vfs_copy_file_range() is only executed if:
>>
>> 1) we have *valid* ->copy_file_range ops and/or
>> 2) we have *valid* ->remap_file_range
>>
>> So... if we remove the check in generic_copy_file_checks() as you
>> suggest
>> and:
>> - we don't have ->copy_file_range,
>> - we have ->remap_file_range but
>> - the i_sb are different
>>
>> we'll return the -EOPNOTSUPP (the one set in line "ret =
>> -EOPNOTSUPP;" in
>> function vfs_copy_file_range() ) instead of -EXDEV.
>>
>> Yes, this is the different.The NFS code handles both -EOPNOTSUPP and
>> -EXDEVV by doing generic_copy_file_range.  Do any other consumers of
>> vfs_copy_file_range rely on -EXDEV and not -EOPNOTSUPP and which is
>> the correct error code for this case? It seems to me that -EOPNOTSUPP
>> is more appropriate than EXDEV when (sb1 != sb2).
>>
>> EXDEV is the right code for:
>> filesystem supports the operation but not for sb1 != sb1.
>>
>> So with the current patch, for a clone operation across 2 filesystems:
>>
>>      . if src and dst filesystem support both copy_file_range and
>>        map_file_range then the code returns -ENOTSUPPORT.
>>
>> Why do you say that?
>> Which code are you referring to exactly?
>>
>>
>> If the filesystems support both copy_file_range and map_file_range,
>> it passes the check in generic_file_check but it fails with the
>> check in vfs_copy_file_range and returns -ENOTSUPPORT (added by
>> the v8 patch)
>>
>> Ok, I misread the code here. If it passes the check in generic_copy_file_checks
>> and it fails the sb check in vfs_copy_file_range then it tries copy_file_range
>> so it's ok.
>>
>> I think having the check in both generic_copy_file_checks and vfs_copy_file_range
>> making the code hard to read. What's the reason not to do the check only in
>> vfs_copy_file_range?
>>
> You are going in circles.
> I already answered that.
> Please re-read the entire thread on all patch versions before commenting.

I'm fine with the patch as it is, as long as it does not break NFS.

I just think it's easier to read if the checks are done in
vfs_copy_file_range such as:

@@ -1495,6 +1473,11 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
  
         file_start_write(file_out);
  
+       if (file_out->f_op->copy_file_range == NULL &&
+           file_in->f_op->remap_file_range == NULL)
+               return -EOPNOTSUPP;     /* not sure this error is needed */
+
+       ret = -EXDEV;
         /*
          * Try cloning first, this is supported by more file systems, and
          * more efficient if both clone and copy are supported (e.g. NFS).
@@ -1513,9 +1496,10 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
                 }
         }
  
-       ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
-                               flags);
-       WARN_ON_ONCE(ret == -EOPNOTSUPP);
+       if (file_out->f_op->copy_file_range &&
+           file_out->f_op->copy_file_range == file_in->f_op->copy_file_range) {
+               ret = file_out->f_op->copy_file_range(file_in, pos_in,
+                               file_out, pos_out, len, flags);
  done:
         if (ret > 0) {
                 fsnotify_access(file_in);

Thanks,
-Dai

>
> Thanks,
> Amir.
Olga Kornievskaia Feb. 24, 2021, 1 a.m. UTC | #13
On Mon, Feb 22, 2021 at 5:25 AM Luis Henriques <lhenriques@suse.de> wrote:
>
> A regression has been reported by Nicolas Boichat, found while using the
> copy_file_range syscall to copy a tracefs file.  Before commit
> 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> kernel would return -EXDEV to userspace when trying to copy a file across
> different filesystems.  After this commit, the syscall doesn't fail anymore
> and instead returns zero (zero bytes copied), as this file's content is
> generated on-the-fly and thus reports a size of zero.
>
> This patch restores some cross-filesystem copy restrictions that existed
> prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> devices").  Filesystems are still allowed to fall-back to the VFS
> generic_copy_file_range() implementation, but that has now to be done
> explicitly.
>
> nfsd is also modified to fall-back into generic_copy_file_range() in case
> vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>
> Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
> Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
> Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
> Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> Signed-off-by: Luis Henriques <lhenriques@suse.de>

I tested v8 and I believe it works for NFS.

> ---
> Changes since v7
> - set 'ret' to '-EOPNOTSUPP' before the clone 'if' statement so that the
>   error returned is always related to the 'copy' operation
> Changes since v6
> - restored i_sb checks for the clone operation
> Changes since v5
> - check if ->copy_file_range is NULL before calling it
> Changes since v4
> - nfsd falls-back to generic_copy_file_range() only *if* it gets -EOPNOTSUPP
>   or -EXDEV.
> Changes since v3
> - dropped the COPY_FILE_SPLICE flag
> - kept the f_op's checks early in generic_copy_file_checks, implementing
>   Amir's suggestions
> - modified nfsd to use generic_copy_file_range()
> Changes since v2
> - do all the required checks earlier, in generic_copy_file_checks(),
>   adding new checks for ->remap_file_range
> - new COPY_FILE_SPLICE flag
> - don't remove filesystem's fallback to generic_copy_file_range()
> - updated commit changelog (and subject)
> Changes since v1 (after Amir review)
> - restored do_copy_file_range() helper
> - return -EOPNOTSUPP if fs doesn't implement CFR
> - updated commit description
>
>  fs/nfsd/vfs.c   |  8 +++++++-
>  fs/read_write.c | 49 ++++++++++++++++++++++++-------------------------
>  2 files changed, 31 insertions(+), 26 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 04937e51de56..23dab0fa9087 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -568,6 +568,7 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
>  ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
>                              u64 dst_pos, u64 count)
>  {
> +       ssize_t ret;
>
>         /*
>          * Limit copy to 4MB to prevent indefinitely blocking an nfsd
> @@ -578,7 +579,12 @@ ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
>          * limit like this and pipeline multiple COPY requests.
>          */
>         count = min_t(u64, count, 1 << 22);
> -       return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> +       ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
> +
> +       if (ret == -EOPNOTSUPP || ret == -EXDEV)
> +               ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
> +                                             count, 0);
> +       return ret;
>  }
>
>  __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 75f764b43418..5a26297fd410 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1388,28 +1388,6 @@ ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
>  }
>  EXPORT_SYMBOL(generic_copy_file_range);
>
> -static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in,
> -                                 struct file *file_out, loff_t pos_out,
> -                                 size_t len, unsigned int flags)
> -{
> -       /*
> -        * Although we now allow filesystems to handle cross sb copy, passing
> -        * a file of the wrong filesystem type to filesystem driver can result
> -        * in an attempt to dereference the wrong type of ->private_data, so
> -        * avoid doing that until we really have a good reason.  NFS defines
> -        * several different file_system_type structures, but they all end up
> -        * using the same ->copy_file_range() function pointer.
> -        */
> -       if (file_out->f_op->copy_file_range &&
> -           file_out->f_op->copy_file_range == file_in->f_op->copy_file_range)
> -               return file_out->f_op->copy_file_range(file_in, pos_in,
> -                                                      file_out, pos_out,
> -                                                      len, flags);
> -
> -       return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> -                                      flags);
> -}
> -
>  /*
>   * Performs necessary checks before doing a file copy
>   *
> @@ -1427,6 +1405,25 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
>         loff_t size_in;
>         int ret;
>
> +       /*
> +        * Although we now allow filesystems to handle cross sb copy, passing
> +        * a file of the wrong filesystem type to filesystem driver can result
> +        * in an attempt to dereference the wrong type of ->private_data, so
> +        * avoid doing that until we really have a good reason.  NFS defines
> +        * several different file_system_type structures, but they all end up
> +        * using the same ->copy_file_range() function pointer.
> +        */
> +       if (file_out->f_op->copy_file_range) {
> +               if (file_in->f_op->copy_file_range !=
> +                   file_out->f_op->copy_file_range)
> +                       return -EXDEV;
> +       } else if (file_in->f_op->remap_file_range) {
> +               if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
> +                       return -EXDEV;
> +       } else {
> +                return -EOPNOTSUPP;
> +       }
> +
>         ret = generic_file_rw_checks(file_in, file_out);
>         if (ret)
>                 return ret;
> @@ -1495,6 +1492,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>
>         file_start_write(file_out);
>
> +       ret = -EOPNOTSUPP;
>         /*
>          * Try cloning first, this is supported by more file systems, and
>          * more efficient if both clone and copy are supported (e.g. NFS).
> @@ -1513,9 +1511,10 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>                 }
>         }
>
> -       ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
> -                               flags);
> -       WARN_ON_ONCE(ret == -EOPNOTSUPP);
> +       if (file_out->f_op->copy_file_range)
> +               ret = file_out->f_op->copy_file_range(file_in, pos_in,
> +                                                     file_out, pos_out,
> +                                                     len, flags);
>  done:
>         if (ret > 0) {
>                 fsnotify_access(file_in);
Luis Henriques Feb. 24, 2021, 10:23 a.m. UTC | #14
On Tue, Feb 23, 2021 at 08:00:54PM -0500, Olga Kornievskaia wrote:
> On Mon, Feb 22, 2021 at 5:25 AM Luis Henriques <lhenriques@suse.de> wrote:
> >
> > A regression has been reported by Nicolas Boichat, found while using the
> > copy_file_range syscall to copy a tracefs file.  Before commit
> > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> > kernel would return -EXDEV to userspace when trying to copy a file across
> > different filesystems.  After this commit, the syscall doesn't fail anymore
> > and instead returns zero (zero bytes copied), as this file's content is
> > generated on-the-fly and thus reports a size of zero.
> >
> > This patch restores some cross-filesystem copy restrictions that existed
> > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> > devices").  Filesystems are still allowed to fall-back to the VFS
> > generic_copy_file_range() implementation, but that has now to be done
> > explicitly.
> >
> > nfsd is also modified to fall-back into generic_copy_file_range() in case
> > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> >
> > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> > Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
> > Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
> > Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
> > Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> > Signed-off-by: Luis Henriques <lhenriques@suse.de>
> 
> I tested v8 and I believe it works for NFS.

Thanks a lot for the testing.  And to everyone else for reviews,
feedback,... and patience.

I'll now go look into the manpage and see what needs to be changed.

Cheers,
--
Luís
Nicolas Boichat Feb. 24, 2021, 10:44 a.m. UTC | #15
On Wed, Feb 24, 2021 at 6:22 PM Luis Henriques <lhenriques@suse.de> wrote:
>
> On Tue, Feb 23, 2021 at 08:00:54PM -0500, Olga Kornievskaia wrote:
> > On Mon, Feb 22, 2021 at 5:25 AM Luis Henriques <lhenriques@suse.de> wrote:
> > >
> > > A regression has been reported by Nicolas Boichat, found while using the
> > > copy_file_range syscall to copy a tracefs file.  Before commit
> > > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> > > kernel would return -EXDEV to userspace when trying to copy a file across
> > > different filesystems.  After this commit, the syscall doesn't fail anymore
> > > and instead returns zero (zero bytes copied), as this file's content is
> > > generated on-the-fly and thus reports a size of zero.
> > >
> > > This patch restores some cross-filesystem copy restrictions that existed
> > > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> > > devices").  Filesystems are still allowed to fall-back to the VFS
> > > generic_copy_file_range() implementation, but that has now to be done
> > > explicitly.
> > >
> > > nfsd is also modified to fall-back into generic_copy_file_range() in case
> > > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> > >
> > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> > > Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
> > > Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
> > > Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
> > > Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> > > Signed-off-by: Luis Henriques <lhenriques@suse.de>
> >
> > I tested v8 and I believe it works for NFS.
>
> Thanks a lot for the testing.  And to everyone else for reviews,
> feedback,... and patience.

Thanks so much to you!!!

Works here, you can add my
Tested-by: Nicolas Boichat <drinkcat@chromium.org>

>
> I'll now go look into the manpage and see what needs to be changed.
>
> Cheers,
> --
> Luís
Nicolas Boichat April 9, 2021, 5:23 a.m. UTC | #16
On Wed, Feb 24, 2021 at 6:44 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
>
> On Wed, Feb 24, 2021 at 6:22 PM Luis Henriques <lhenriques@suse.de> wrote:
> >
> > On Tue, Feb 23, 2021 at 08:00:54PM -0500, Olga Kornievskaia wrote:
> > > On Mon, Feb 22, 2021 at 5:25 AM Luis Henriques <lhenriques@suse.de> wrote:
> > > >
> > > > A regression has been reported by Nicolas Boichat, found while using the
> > > > copy_file_range syscall to copy a tracefs file.  Before commit
> > > > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> > > > kernel would return -EXDEV to userspace when trying to copy a file across
> > > > different filesystems.  After this commit, the syscall doesn't fail anymore
> > > > and instead returns zero (zero bytes copied), as this file's content is
> > > > generated on-the-fly and thus reports a size of zero.
> > > >
> > > > This patch restores some cross-filesystem copy restrictions that existed
> > > > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> > > > devices").  Filesystems are still allowed to fall-back to the VFS
> > > > generic_copy_file_range() implementation, but that has now to be done
> > > > explicitly.
> > > >
> > > > nfsd is also modified to fall-back into generic_copy_file_range() in case
> > > > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> > > >
> > > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> > > > Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
> > > > Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
> > > > Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
> > > > Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> > > > Signed-off-by: Luis Henriques <lhenriques@suse.de>
> > >
> > > I tested v8 and I believe it works for NFS.
> >
> > Thanks a lot for the testing.  And to everyone else for reviews,
> > feedback,... and patience.
>
> Thanks so much to you!!!
>
> Works here, you can add my
> Tested-by: Nicolas Boichat <drinkcat@chromium.org>

What happened to this patch? It does not seem to have been picked up
yet? Any reason why?

> >
> > I'll now go look into the manpage and see what needs to be changed.
> >
> > Cheers,
> > --
> > Luís
Luis Henriques April 9, 2021, 1:39 p.m. UTC | #17
Nicolas Boichat <drinkcat@chromium.org> writes:

> On Wed, Feb 24, 2021 at 6:44 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
>>
>> On Wed, Feb 24, 2021 at 6:22 PM Luis Henriques <lhenriques@suse.de> wrote:
>> >
>> > On Tue, Feb 23, 2021 at 08:00:54PM -0500, Olga Kornievskaia wrote:
>> > > On Mon, Feb 22, 2021 at 5:25 AM Luis Henriques <lhenriques@suse.de> wrote:
>> > > >
>> > > > A regression has been reported by Nicolas Boichat, found while using the
>> > > > copy_file_range syscall to copy a tracefs file.  Before commit
>> > > > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
>> > > > kernel would return -EXDEV to userspace when trying to copy a file across
>> > > > different filesystems.  After this commit, the syscall doesn't fail anymore
>> > > > and instead returns zero (zero bytes copied), as this file's content is
>> > > > generated on-the-fly and thus reports a size of zero.
>> > > >
>> > > > This patch restores some cross-filesystem copy restrictions that existed
>> > > > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
>> > > > devices").  Filesystems are still allowed to fall-back to the VFS
>> > > > generic_copy_file_range() implementation, but that has now to be done
>> > > > explicitly.
>> > > >
>> > > > nfsd is also modified to fall-back into generic_copy_file_range() in case
>> > > > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
>> > > >
>> > > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
>> > > > Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
>> > > > Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
>> > > > Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
>> > > > Reported-by: Nicolas Boichat <drinkcat@chromium.org>
>> > > > Signed-off-by: Luis Henriques <lhenriques@suse.de>
>> > >
>> > > I tested v8 and I believe it works for NFS.
>> >
>> > Thanks a lot for the testing.  And to everyone else for reviews,
>> > feedback,... and patience.
>>
>> Thanks so much to you!!!
>>
>> Works here, you can add my
>> Tested-by: Nicolas Boichat <drinkcat@chromium.org>
>
> What happened to this patch? It does not seem to have been picked up
> yet? Any reason why?

Hmm... good question.  I'm not actually sure who would be picking it.  Al,
maybe...?

Cheers,
Amir Goldstein April 9, 2021, 1:50 p.m. UTC | #18
On Fri, Apr 9, 2021 at 4:39 PM Luis Henriques <lhenriques@suse.de> wrote:
>
> Nicolas Boichat <drinkcat@chromium.org> writes:
>
> > On Wed, Feb 24, 2021 at 6:44 PM Nicolas Boichat <drinkcat@chromium.org> wrote:
> >>
> >> On Wed, Feb 24, 2021 at 6:22 PM Luis Henriques <lhenriques@suse.de> wrote:
> >> >
> >> > On Tue, Feb 23, 2021 at 08:00:54PM -0500, Olga Kornievskaia wrote:
> >> > > On Mon, Feb 22, 2021 at 5:25 AM Luis Henriques <lhenriques@suse.de> wrote:
> >> > > >
> >> > > > A regression has been reported by Nicolas Boichat, found while using the
> >> > > > copy_file_range syscall to copy a tracefs file.  Before commit
> >> > > > 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices") the
> >> > > > kernel would return -EXDEV to userspace when trying to copy a file across
> >> > > > different filesystems.  After this commit, the syscall doesn't fail anymore
> >> > > > and instead returns zero (zero bytes copied), as this file's content is
> >> > > > generated on-the-fly and thus reports a size of zero.
> >> > > >
> >> > > > This patch restores some cross-filesystem copy restrictions that existed
> >> > > > prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
> >> > > > devices").  Filesystems are still allowed to fall-back to the VFS
> >> > > > generic_copy_file_range() implementation, but that has now to be done
> >> > > > explicitly.
> >> > > >
> >> > > > nfsd is also modified to fall-back into generic_copy_file_range() in case
> >> > > > vfs_copy_file_range() fails with -EOPNOTSUPP or -EXDEV.
> >> > > >
> >> > > > Fixes: 5dae222a5ff0 ("vfs: allow copy_file_range to copy across devices")
> >> > > > Link: https://lore.kernel.org/linux-fsdevel/20210212044405.4120619-1-drinkcat@chromium.org/
> >> > > > Link: https://lore.kernel.org/linux-fsdevel/CANMq1KDZuxir2LM5jOTm0xx+BnvW=ZmpsG47CyHFJwnw7zSX6Q@mail.gmail.com/
> >> > > > Link: https://lore.kernel.org/linux-fsdevel/20210126135012.1.If45b7cdc3ff707bc1efa17f5366057d60603c45f@changeid/
> >> > > > Reported-by: Nicolas Boichat <drinkcat@chromium.org>
> >> > > > Signed-off-by: Luis Henriques <lhenriques@suse.de>
> >> > >
> >> > > I tested v8 and I believe it works for NFS.
> >> >
> >> > Thanks a lot for the testing.  And to everyone else for reviews,
> >> > feedback,... and patience.
> >>
> >> Thanks so much to you!!!
> >>
> >> Works here, you can add my
> >> Tested-by: Nicolas Boichat <drinkcat@chromium.org>
> >
> > What happened to this patch? It does not seem to have been picked up
> > yet? Any reason why?
>
> Hmm... good question.  I'm not actually sure who would be picking it.  Al,
> maybe...?
>

Darrick,

Would you mind taking this through your tree in case Al doesn't pick it up?

Thanks,
Amir.
diff mbox series

Patch

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 04937e51de56..23dab0fa9087 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -568,6 +568,7 @@  __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
 ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
 			     u64 dst_pos, u64 count)
 {
+	ssize_t ret;
 
 	/*
 	 * Limit copy to 4MB to prevent indefinitely blocking an nfsd
@@ -578,7 +579,12 @@  ssize_t nfsd_copy_file_range(struct file *src, u64 src_pos, struct file *dst,
 	 * limit like this and pipeline multiple COPY requests.
 	 */
 	count = min_t(u64, count, 1 << 22);
-	return vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
+	ret = vfs_copy_file_range(src, src_pos, dst, dst_pos, count, 0);
+
+	if (ret == -EOPNOTSUPP || ret == -EXDEV)
+		ret = generic_copy_file_range(src, src_pos, dst, dst_pos,
+					      count, 0);
+	return ret;
 }
 
 __be32 nfsd4_vfs_fallocate(struct svc_rqst *rqstp, struct svc_fh *fhp,
diff --git a/fs/read_write.c b/fs/read_write.c
index 75f764b43418..5a26297fd410 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1388,28 +1388,6 @@  ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
 }
 EXPORT_SYMBOL(generic_copy_file_range);
 
-static ssize_t do_copy_file_range(struct file *file_in, loff_t pos_in,
-				  struct file *file_out, loff_t pos_out,
-				  size_t len, unsigned int flags)
-{
-	/*
-	 * Although we now allow filesystems to handle cross sb copy, passing
-	 * a file of the wrong filesystem type to filesystem driver can result
-	 * in an attempt to dereference the wrong type of ->private_data, so
-	 * avoid doing that until we really have a good reason.  NFS defines
-	 * several different file_system_type structures, but they all end up
-	 * using the same ->copy_file_range() function pointer.
-	 */
-	if (file_out->f_op->copy_file_range &&
-	    file_out->f_op->copy_file_range == file_in->f_op->copy_file_range)
-		return file_out->f_op->copy_file_range(file_in, pos_in,
-						       file_out, pos_out,
-						       len, flags);
-
-	return generic_copy_file_range(file_in, pos_in, file_out, pos_out, len,
-				       flags);
-}
-
 /*
  * Performs necessary checks before doing a file copy
  *
@@ -1427,6 +1405,25 @@  static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
 	loff_t size_in;
 	int ret;
 
+	/*
+	 * Although we now allow filesystems to handle cross sb copy, passing
+	 * a file of the wrong filesystem type to filesystem driver can result
+	 * in an attempt to dereference the wrong type of ->private_data, so
+	 * avoid doing that until we really have a good reason.  NFS defines
+	 * several different file_system_type structures, but they all end up
+	 * using the same ->copy_file_range() function pointer.
+	 */
+	if (file_out->f_op->copy_file_range) {
+		if (file_in->f_op->copy_file_range !=
+		    file_out->f_op->copy_file_range)
+			return -EXDEV;
+	} else if (file_in->f_op->remap_file_range) {
+		if (file_inode(file_in)->i_sb != file_inode(file_out)->i_sb)
+			return -EXDEV;
+	} else {
+                return -EOPNOTSUPP;
+	}
+
 	ret = generic_file_rw_checks(file_in, file_out);
 	if (ret)
 		return ret;
@@ -1495,6 +1492,7 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 
 	file_start_write(file_out);
 
+	ret = -EOPNOTSUPP;
 	/*
 	 * Try cloning first, this is supported by more file systems, and
 	 * more efficient if both clone and copy are supported (e.g. NFS).
@@ -1513,9 +1511,10 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		}
 	}
 
-	ret = do_copy_file_range(file_in, pos_in, file_out, pos_out, len,
-				flags);
-	WARN_ON_ONCE(ret == -EOPNOTSUPP);
+	if (file_out->f_op->copy_file_range)
+		ret = file_out->f_op->copy_file_range(file_in, pos_in,
+						      file_out, pos_out,
+						      len, flags);
 done:
 	if (ret > 0) {
 		fsnotify_access(file_in);