diff mbox series

[v2,6/8] vfs: copy_file_range should update file timestamps

Message ID 20190526061100.21761-7-amir73il@gmail.com (mailing list archive)
State New, archived
Headers show
Series Fixes for major copy_file_range() issues | expand

Commit Message

Amir Goldstein May 26, 2019, 6:10 a.m. UTC
From: Dave Chinner <dchinner@redhat.com>

Timestamps are not updated right now, so programs looking for
timestamp updates for file modifications (like rsync) will not
detect that files have changed. We are also accessing the source
data when doing a copy (but not when cloning) so we need to update
atime on the source file as well.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/read_write.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Luis Henriques May 27, 2019, 2:35 p.m. UTC | #1
On Sun, May 26, 2019 at 09:10:57AM +0300, Amir Goldstein wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Timestamps are not updated right now, so programs looking for
> timestamp updates for file modifications (like rsync) will not
> detect that files have changed. We are also accessing the source
> data when doing a copy (but not when cloning) so we need to update
> atime on the source file as well.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/read_write.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index e16bcafc0da2..4b23a86aacd9 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1576,6 +1576,16 @@ int generic_copy_file_range_prep(struct file *file_in, struct file *file_out)
>  
>  	WARN_ON_ONCE(!inode_is_locked(file_inode(file_out)));
>  
> +	/* Update source timestamps, because we are accessing file data */
> +	file_accessed(file_in);
> +
> +	/* Update destination timestamps, since we can alter file contents. */
> +	if (!(file_out->f_mode & FMODE_NOCMTIME)) {
> +		ret = file_update_time(file_out);
> +		if (ret)
> +			return ret;
> +	}
> +

Is this the right place for updating the timestamps?  I see that in same
cases we may be updating the timestamp even if there was an error and no
copy was performed.  For example, if file_remove_privs fails.

(btw, I've re-tested everything on ceph and everything seems to be
working fine.)

Cheers,
--
Luís


>  	/*
>  	 * Clear the security bits if the process is not being run by root.
>  	 * This keeps people from modifying setuid and setgid binaries.
> -- 
> 2.17.1
> 
>
Dave Chinner May 27, 2019, 10:05 p.m. UTC | #2
On Mon, May 27, 2019 at 03:35:39PM +0100, Luis Henriques wrote:
> On Sun, May 26, 2019 at 09:10:57AM +0300, Amir Goldstein wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Timestamps are not updated right now, so programs looking for
> > timestamp updates for file modifications (like rsync) will not
> > detect that files have changed. We are also accessing the source
> > data when doing a copy (but not when cloning) so we need to update
> > atime on the source file as well.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/read_write.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index e16bcafc0da2..4b23a86aacd9 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1576,6 +1576,16 @@ int generic_copy_file_range_prep(struct file *file_in, struct file *file_out)
> >  
> >  	WARN_ON_ONCE(!inode_is_locked(file_inode(file_out)));
> >  
> > +	/* Update source timestamps, because we are accessing file data */
> > +	file_accessed(file_in);
> > +
> > +	/* Update destination timestamps, since we can alter file contents. */
> > +	if (!(file_out->f_mode & FMODE_NOCMTIME)) {
> > +		ret = file_update_time(file_out);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> 
> Is this the right place for updating the timestamps?  I see that in same
> cases we may be updating the timestamp even if there was an error and no
> copy was performed.  For example, if file_remove_privs fails.

It's the same place we do it for read - file_accessed() is called
before we do the IO - and the same place for write -
file_update_time() is called before we copy data into the pagecache
or do direct IO. As such, it really doesn't matter if it is before
or after file_remove_privs() - the IO can still fail for many
reasons after we've updated the timestamps and in some of the
failure cases (e.g. we failed the sync at the end of an O_DSYNC
buffered write) we still want the timestamps to be modified because
the data and/or user visible metadata /may/ have been changed.

cfr operates under the same constraints as read() and write(), so we
need to update the timestamps up front regardless of whether the
copy ends up succeeding or not....

Cheers,

Dave.
Luis Henriques May 28, 2019, 8:53 a.m. UTC | #3
Dave Chinner <david@fromorbit.com> writes:

> On Mon, May 27, 2019 at 03:35:39PM +0100, Luis Henriques wrote:
>> On Sun, May 26, 2019 at 09:10:57AM +0300, Amir Goldstein wrote:
>> > From: Dave Chinner <dchinner@redhat.com>
>> > 
>> > Timestamps are not updated right now, so programs looking for
>> > timestamp updates for file modifications (like rsync) will not
>> > detect that files have changed. We are also accessing the source
>> > data when doing a copy (but not when cloning) so we need to update
>> > atime on the source file as well.
>> > 
>> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>> > ---
>> >  fs/read_write.c | 10 ++++++++++
>> >  1 file changed, 10 insertions(+)
>> > 
>> > diff --git a/fs/read_write.c b/fs/read_write.c
>> > index e16bcafc0da2..4b23a86aacd9 100644
>> > --- a/fs/read_write.c
>> > +++ b/fs/read_write.c
>> > @@ -1576,6 +1576,16 @@ int generic_copy_file_range_prep(struct file *file_in, struct file *file_out)
>> >  
>> >  	WARN_ON_ONCE(!inode_is_locked(file_inode(file_out)));
>> >  
>> > +	/* Update source timestamps, because we are accessing file data */
>> > +	file_accessed(file_in);
>> > +
>> > +	/* Update destination timestamps, since we can alter file contents. */
>> > +	if (!(file_out->f_mode & FMODE_NOCMTIME)) {
>> > +		ret = file_update_time(file_out);
>> > +		if (ret)
>> > +			return ret;
>> > +	}
>> > +
>> 
>> Is this the right place for updating the timestamps?  I see that in same
>> cases we may be updating the timestamp even if there was an error and no
>> copy was performed.  For example, if file_remove_privs fails.
>
> It's the same place we do it for read - file_accessed() is called
> before we do the IO - and the same place for write -
> file_update_time() is called before we copy data into the pagecache
> or do direct IO. As such, it really doesn't matter if it is before
> or after file_remove_privs() - the IO can still fail for many
> reasons after we've updated the timestamps and in some of the
> failure cases (e.g. we failed the sync at the end of an O_DSYNC
> buffered write) we still want the timestamps to be modified because
> the data and/or user visible metadata /may/ have been changed.
>
> cfr operates under the same constraints as read() and write(), so we
> need to update the timestamps up front regardless of whether the
> copy ends up succeeding or not....

Great, thanks for explaining it.  It now makes sense, even for
consistency, to have this operation here.

Cheers,
Darrick J. Wong May 28, 2019, 4:30 p.m. UTC | #4
On Sun, May 26, 2019 at 09:10:57AM +0300, Amir Goldstein wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Timestamps are not updated right now, so programs looking for
> timestamp updates for file modifications (like rsync) will not
> detect that files have changed. We are also accessing the source
> data when doing a copy (but not when cloning) so we need to update
> atime on the source file as well.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>  fs/read_write.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index e16bcafc0da2..4b23a86aacd9 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1576,6 +1576,16 @@ int generic_copy_file_range_prep(struct file *file_in, struct file *file_out)
>  
>  	WARN_ON_ONCE(!inode_is_locked(file_inode(file_out)));
>  
> +	/* Update source timestamps, because we are accessing file data */
> +	file_accessed(file_in);
> +
> +	/* Update destination timestamps, since we can alter file contents. */
> +	if (!(file_out->f_mode & FMODE_NOCMTIME)) {
> +		ret = file_update_time(file_out);
> +		if (ret)
> +			return ret;
> +	}
> +
>  	/*
>  	 * Clear the security bits if the process is not being run by root.
>  	 * This keeps people from modifying setuid and setgid binaries.

Should the file_update_time and file_remove_privs calls be factored into
a separate small function to be called by generic_{copy,remap}_range_prep?

Looks ok otherwise,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> -- 
> 2.17.1
>
Amir Goldstein May 28, 2019, 4:36 p.m. UTC | #5
On Tue, May 28, 2019 at 7:30 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Sun, May 26, 2019 at 09:10:57AM +0300, Amir Goldstein wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Timestamps are not updated right now, so programs looking for
> > timestamp updates for file modifications (like rsync) will not
> > detect that files have changed. We are also accessing the source
> > data when doing a copy (but not when cloning) so we need to update
> > atime on the source file as well.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >  fs/read_write.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/fs/read_write.c b/fs/read_write.c
> > index e16bcafc0da2..4b23a86aacd9 100644
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -1576,6 +1576,16 @@ int generic_copy_file_range_prep(struct file *file_in, struct file *file_out)
> >
> >       WARN_ON_ONCE(!inode_is_locked(file_inode(file_out)));
> >
> > +     /* Update source timestamps, because we are accessing file data */
> > +     file_accessed(file_in);
> > +
> > +     /* Update destination timestamps, since we can alter file contents. */
> > +     if (!(file_out->f_mode & FMODE_NOCMTIME)) {
> > +             ret = file_update_time(file_out);
> > +             if (ret)
> > +                     return ret;
> > +     }
> > +
> >       /*
> >        * Clear the security bits if the process is not being run by root.
> >        * This keeps people from modifying setuid and setgid binaries.
>
> Should the file_update_time and file_remove_privs calls be factored into
> a separate small function to be called by generic_{copy,remap}_range_prep?
>

Ok. same helper could be called also after copy when inode is not locked
throughout the copy.

Thanks,
Amir.
diff mbox series

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index e16bcafc0da2..4b23a86aacd9 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1576,6 +1576,16 @@  int generic_copy_file_range_prep(struct file *file_in, struct file *file_out)
 
 	WARN_ON_ONCE(!inode_is_locked(file_inode(file_out)));
 
+	/* Update source timestamps, because we are accessing file data */
+	file_accessed(file_in);
+
+	/* Update destination timestamps, since we can alter file contents. */
+	if (!(file_out->f_mode & FMODE_NOCMTIME)) {
+		ret = file_update_time(file_out);
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * Clear the security bits if the process is not being run by root.
 	 * This keeps people from modifying setuid and setgid binaries.