[6/6] vfs: Disallow copy_file_range on generated file systems

Message ID 20210212124354.6.Idc9c3110d708aa0df9d8fe5a6246524dc8469dae@changeid (mailing list archive)
State New, archived
Series Add generated flag to filesystem struct to block copy_file_range

Commit Message

Nicolas Boichat Feb. 12, 2021, 4:44 a.m. UTC
copy_file_range (which calls generic_copy_file_checks) uses the
inode file size to adjust the copy count parameter. This breaks
with special filesystems like procfs/sysfs/debugfs/tracefs, where
the file size appears to be zero, but content is actually returned
when a read operation is performed. Other issues would also
happen on partial writes, as the function would attempt to seek
in the input file.

Use the newly introduced FS_GENERATED_CONTENT filesystem flag
to return -EOPNOTSUPP: applications can then retry with a more
usual read/write based file copy (the fallback code is usually
already present to handle older kernels).

Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
---

 fs/read_write.c | 3 +++
 1 file changed, 3 insertions(+)
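
The read/write fallback mentioned in the commit message could look roughly like the userspace sketch below. It is not taken from the patch or from any particular application: the helper names copy_read_write() and copy_fd(), the buffer and chunk sizes, and the set of errno values that trigger the fallback are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Plain read/write copy loop. It never consults st_size, so it also works
 * for files that report a size of zero but still return data on read().
 * Returns the number of bytes copied, or -1 with errno set.
 */
static ssize_t copy_read_write(int fd_in, int fd_out)
{
	char buf[65536];
	ssize_t total = 0;

	for (;;) {
		ssize_t n = read(fd_in, buf, sizeof(buf));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return total;	/* end of file */

		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(fd_out, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}

/*
 * Try copy_file_range() first and fall back to read/write when the kernel
 * refuses the request. The errno values treated as "please fall back" are
 * an assumption of this sketch.
 */
static ssize_t copy_fd(int fd_in, int fd_out)
{
	ssize_t total = 0;

	assert(fd_in >= 0 && fd_out >= 0);

	for (;;) {
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL, 1 << 20, 0);

		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0 && total > 0)
			return total;	/* end of file after real progress */
		if (n < 0 && errno != EOPNOTSUPP && errno != ENOSYS &&
		    errno != EXDEV && errno != EINVAL)
			return -1;	/* a genuine error */

		/*
		 * Either the call was refused or it copied nothing at all.
		 * The latter is what a kernel without this series does for a
		 * zero-size generated file, so read/write gets a turn.
		 */
		ssize_t r = copy_read_write(fd_in, fd_out);

		return r < 0 ? -1 : total + r;
	}
}
```

Because both offset arguments are NULL, copy_file_range() advances the file offsets itself, so the fallback loop simply continues from wherever the last successful call stopped.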

Comments

Darrick J. Wong Feb. 12, 2021, 4:53 a.m. UTC | #1
On Fri, Feb 12, 2021 at 12:44:05PM +0800, Nicolas Boichat wrote:
> copy_file_range (which calls generic_copy_file_checks) uses the
> inode file size to adjust the copy count parameter. This breaks
> with special filesystems like procfs/sysfs/debugfs/tracefs, where
> the file size appears to be zero, but content is actually returned
> when a read operation is performed. Other issues would also
> happen on partial writes, as the function would attempt to seek
> in the input file.
> 
> Use the newly introduced FS_GENERATED_CONTENT filesystem flag
> to return -EOPNOTSUPP: applications can then retry with a more
> usual read/write based file copy (the fallback code is usually
> already present to handle older kernels).
> 
> Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> ---
> 
>  fs/read_write.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 0029ff2b0ca8..80322e89fb0a 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1485,6 +1485,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>  	if (flags != 0)
>  		return -EINVAL;
>  
> +	if (file_inode(file_in)->i_sb->s_type->fs_flags & FS_GENERATED_CONTENT)
> +		return -EOPNOTSUPP;

Why not declare a dummy copy_file_range_nop function that returns
EOPNOTSUPP and point all of these filesystems at it?

(Or, I guess in these days where function pointers are the enemy,
create a #define that is a cast of 0x1, and fix do_copy_file_range to
return EOPNOTSUPP if it sees that?)

--D

> +
>  	ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
>  				       flags);
>  	if (unlikely(ret))
> -- 
> 2.30.0.478.g8a0d178c01-goog
>

Darrick J. Wong Feb. 12, 2021, 4:59 a.m. UTC | #2
On Thu, Feb 11, 2021 at 08:53:47PM -0800, Darrick J. Wong wrote:
> On Fri, Feb 12, 2021 at 12:44:05PM +0800, Nicolas Boichat wrote:
> > copy_file_range (which calls generic_copy_file_checks) uses the
> > inode file size to adjust the copy count parameter. This breaks
> > with special filesystems like procfs/sysfs/debugfs/tracefs, where
> > the file size appears to be zero, but content is actually returned
> > when a read operation is performed. Other issues would also
> > happen on partial writes, as the function would attempt to seek
> > in the input file.
> > 
> > Use the newly introduced FS_GENERATED_CONTENT filesystem flag
> > to return -EOPNOTSUPP: applications can then retry with a more
> > usual read/write based file copy (the fallback code is usually
> > already present to handle older kernels).
> > 
> > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > ---
> > 
> >  fs/read_write.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index 0029ff2b0ca8..80322e89fb0a 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1485,6 +1485,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> >  	if (flags != 0)
> >  		return -EINVAL;
> >  
> > +	if (file_inode(file_in)->i_sb->s_type->fs_flags & FS_GENERATED_CONTENT)
> > +		return -EOPNOTSUPP;
> 
> Why not declare a dummy copy_file_range_nop function that returns
> EOPNOTSUPP and point all of these filesystems at it?
> 
> (Or, I guess in these days where function pointers are the enemy,
> create a #define that is a cast of 0x1, and fix do_copy_file_range to
> return EOPNOTSUPP if it sees that?)

Oh, I see, because that doesn't help if the source file is procfs and
the dest file is (say) xfs, because the generic version will try to do
splice magic and *poof*.

I guess the other nit that I can think of at this late hour is ... what
about the other virtual filesystems like configfs and whatnot?  Should
we have a way to flag them as "this can't be the source of a CFR
request" as well?

Or is it just trace/debug/proc/sysfs that have these "zero size but
readable" speshul behaviors?

--D
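
One way to check whether a given file shows the "zero size but readable" behavior is to compare what fstat() advertises with what read() returns. The sketch below is a userspace illustration only; the helper name size_vs_read() and the 4096-byte buffer are assumptions, and it looks at a single read rather than the whole file.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Report both the size that fstat() advertises and the number of bytes a
 * first read() actually returns. Returns 0 on success, -1 on failure.
 */
static int size_vs_read(const char *path, off_t *st_size, ssize_t *nread)
{
	char buf[4096];
	struct stat st;
	int fd;

	assert(path && st_size && nread);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	*st_size = st.st_size;
	*nread = read(fd, buf, sizeof(buf));
	close(fd);
	return *nread < 0 ? -1 : 0;
}
```

Calling it on a procfs file such as /proc/version would be expected to report a size of zero alongside a non-zero read count, while a file on a disk-backed filesystem should report a size that matches its content.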

> 
> --D
> 
> > +
> >  	ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
> >  				       flags);
> >  	if (unlikely(ret))
> > -- 
> > 2.30.0.478.g8a0d178c01-goog
> >
Nicolas Boichat Feb. 12, 2021, 5:24 a.m. UTC | #3
On Fri, Feb 12, 2021 at 12:59 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Thu, Feb 11, 2021 at 08:53:47PM -0800, Darrick J. Wong wrote:
> > On Fri, Feb 12, 2021 at 12:44:05PM +0800, Nicolas Boichat wrote:
> > > copy_file_range (which calls generic_copy_file_checks) uses the
> > > inode file size to adjust the copy count parameter. This breaks
> > > with special filesystems like procfs/sysfs/debugfs/tracefs, where
> > > the file size appears to be zero, but content is actually returned
> > > when a read operation is performed. Other issues would also
> > > happen on partial writes, as the function would attempt to seek
> > > in the input file.
> > >
> > > Use the newly introduced FS_GENERATED_CONTENT filesystem flag
> > > to return -EOPNOTSUPP: applications can then retry with a more
> > > usual read/write based file copy (the fallback code is usually
> > > already present to handle older kernels).
> > >
> > > Signed-off-by: Nicolas Boichat <drinkcat@chromium.org>
> > > ---
> > >
> > >  fs/read_write.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/fs/read_write.c b/fs/read_write.c
> > > index 0029ff2b0ca8..80322e89fb0a 100644
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -1485,6 +1485,9 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
> > >     if (flags != 0)
> > >             return -EINVAL;
> > >
> > > +   if (file_inode(file_in)->i_sb->s_type->fs_flags & FS_GENERATED_CONTENT)
> > > +           return -EOPNOTSUPP;
> >
> > Why not declare a dummy copy_file_range_nop function that returns
> > EOPNOTSUPP and point all of these filesystems at it?
> >
> > (Or, I guess in these days where function pointers are the enemy,
> > create a #define that is a cast of 0x1, and fix do_copy_file_range to
> > return EOPNOTSUPP if it sees that?)

I was pondering abusing ERR_PTR(-EOPNOTSUPP) for this purpose ;-P

>
> Oh, I see, because that doesn't help if the source file is procfs and
> the dest file is (say) xfs, because the generic version will try to do
> splice magic and *poof*.

Yep. I mean, we could still add a check if the
file_in->f_op->copy_file_range == copy_file_range_nop in
do_copy_file_range...
But then we'd need to sprinkle .copy_file_range = copy_file_range_nop
in many many places (~700 as a lower bound[1]), since the file
operation structure is defined at the file level, not at the FS level,
and people are likely to forget...

[1]
$ git grep "struct file_operations.*=" | grep debug | wc -l
631
$ git grep "struct file_operations.*=" | grep trace | wc -l
84
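
The trade-off between the two approaches can be modeled in plain userspace C. This is not kernel code: every model_* name is hypothetical, the flag's bit value is made up, and the structs are reduced to the fields that matter for the argument. It only shows why a check on the filesystem type covers files whose file_operations nobody remembered to update.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_FS_GENERATED_CONTENT 0x1	/* hypothetical bit value */

struct model_fs_type {
	const char *name;
	int fs_flags;
};

struct model_file_ops {
	long (*copy_file_range)(void);
};

struct model_file {
	const struct model_fs_type *fs;		/* stands in for i_sb->s_type */
	const struct model_file_ops *f_op;
};

/* Stand-in for the suggested copy_file_range_nop sentinel. */
static long model_copy_nop(void)
{
	return -EOPNOTSUPP;
}

/* Per-file approach: only catches files whose ops point at the sentinel. */
static int check_per_file(const struct model_file *in)
{
	assert(in != NULL);
	if (in->f_op && in->f_op->copy_file_range == model_copy_nop)
		return -EOPNOTSUPP;
	return 0;
}

/* Per-filesystem approach: one flag covers every file on that fs. */
static int check_per_fs(const struct model_file *in)
{
	assert(in != NULL && in->fs != NULL);
	if (in->fs->fs_flags & MODEL_FS_GENERATED_CONTENT)
		return -EOPNOTSUPP;
	return 0;
}
```

In this model, a file whose model_file_ops leaves copy_file_range unset slips past check_per_file() but is still rejected by check_per_fs(), which is the "people are likely to forget" concern in miniature.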

>
> I guess the other nit that I can think of at this late hour is ... what
> about the other virtual filesystems like configfs and whatnot?  Should
> we have a way to flag them as "this can't be the source of a CFR
> request" as well?
>
> Or is it just trace/debug/proc/sysfs that have these "zero size but
> readable" speshul behaviors?

I did try to audit the other filesystems. The ones I spotted:
 - devpts should be fine (only device nodes in there)
 - I think pstore doesn't need the flag as it's RAM-backed and persistent.

But yes, I missed configfs, thanks for catching that. I think we need
to add the flag for that one (looks like the sizes are all 4K).

>
> --D
>
> >
> > --D
> >
> > > +
> > >     ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
> > >                                    flags);
> > >     if (unlikely(ret))
> > > --
> > > 2.30.0.478.g8a0d178c01-goog
> > >

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index 0029ff2b0ca8..80322e89fb0a 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1485,6 +1485,9 @@  ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	if (flags != 0)
 		return -EINVAL;
 
+	if (file_inode(file_in)->i_sb->s_type->fs_flags & FS_GENERATED_CONTENT)
+		return -EOPNOTSUPP;
+
 	ret = generic_copy_file_checks(file_in, pos_in, file_out, pos_out, &len,
 				       flags);
 	if (unlikely(ret))