Message ID | 20181003020342.GD19324@magnolia (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xfs: zero posteof blocks when cloning above eof | expand |
On 10/2/18 9:03 PM, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > When we're reflinking between two files and the destination file range > is well beyond the destination file's EOF marker, zero any posteof > speculative preallocations in the destination file so that we don't > expose stale disk contents. The previous strategy of trying to clear > the preallocations does not work if the destination file has the > PREALLOC flag set but no delalloc blocks. > > Uncovered by shared/010. > > Reported-by: Zorro Lang <zlang@redhat.com> > Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> The action makes sense, and this does resolve my simple testcase, and makes shared/010 pass for me as well. However, this makes my correctness spidey-sense tingle; why is there a new helper unique to extending reflinks, when extending writes already must do the same thing? I didn't follow all the discussion on IRC, but might be worth explaining on the list for others as well. Are there any other extending write tests that aren't happening for extending reflink? -Eric > --- > fs/xfs/xfs_reflink.c | 34 ++++++++++++++++++++++++++-------- > 1 file changed, 26 insertions(+), 8 deletions(-) > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c > index 38f405415b88..c8e996a99a74 100644 > --- a/fs/xfs/xfs_reflink.c > +++ b/fs/xfs/xfs_reflink.c > @@ -1195,6 +1195,27 @@ xfs_iolock_two_inodes_and_break_layout( > return 0; > } > > +/* > + * If we're reflinking to a point past the destination file's EOF, we must > + * zero any speculative post-EOF preallocations that sit between the old EOF > + * and the destination file offset. > + */ > +static int > +xfs_reflink_zero_posteof( > + struct xfs_inode *ip, > + loff_t pos) > +{ > + loff_t isize = i_size_read(VFS_I(ip)); > + bool did_zeroing = false; > + > + if (pos <= isize) > + return 0; > + > + trace_xfs_zero_eof(ip, isize, pos - isize); > + return iomap_zero_range(VFS_I(ip), isize, pos - isize, &did_zeroing, > + &xfs_iomap_ops); > +} > + > /* > * Link a range of blocks from one file to another. > */ > @@ -1257,15 +1278,12 @@ xfs_reflink_remap_range( > trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out); > > /* > - * Clear out post-eof preallocations because we don't have page cache > - * backing the delayed allocations and they'll never get freed on > - * their own. > + * Zero existing post-eof speculative preallocations in the destination > + * file. > */ > - if (xfs_can_free_eofblocks(dest, true)) { > - ret = xfs_free_eofblocks(dest); > - if (ret) > - goto out_unlock; > - } > + ret = xfs_reflink_zero_posteof(dest, pos_out); > + if (ret) > + goto out_unlock; > > /* Set flags and remap blocks. */ > ret = xfs_reflink_set_inode_flag(src, dest); >
On Tue, Oct 02, 2018 at 07:03:42PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > When we're reflinking between two files and the destination file range > is well beyond the destination file's EOF marker, zero any posteof > speculative preallocations in the destination file so that we don't > expose stale disk contents. The previous strategy of trying to clear > the preallocations does not work if the destination file has the > PREALLOC flag set but no delalloc blocks. > > Uncovered by shared/010. > > Reported-by: Zorro Lang <zlang@redhat.com> > Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > --- > fs/xfs/xfs_reflink.c | 34 ++++++++++++++++++++++++++-------- > 1 file changed, 26 insertions(+), 8 deletions(-) > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c > index 38f405415b88..c8e996a99a74 100644 > --- a/fs/xfs/xfs_reflink.c > +++ b/fs/xfs/xfs_reflink.c > @@ -1195,6 +1195,27 @@ xfs_iolock_two_inodes_and_break_layout( > return 0; > } > > +/* > + * If we're reflinking to a point past the destination file's EOF, we must > + * zero any speculative post-EOF preallocations that sit between the old EOF > + * and the destination file offset. > + */ > +static int > +xfs_reflink_zero_posteof( > + struct xfs_inode *ip, > + loff_t pos) > +{ > + loff_t isize = i_size_read(VFS_I(ip)); > + bool did_zeroing = false; > + > + if (pos <= isize) > + return 0; > + > + trace_xfs_zero_eof(ip, isize, pos - isize); > + return iomap_zero_range(VFS_I(ip), isize, pos - isize, &did_zeroing, > + &xfs_iomap_ops); iomap_zero_range() accepts NULL for the *did_zero param. Otherwise seems fine, barring Eric's question on whether we need additional checks.. Brian > +} > + > /* > * Link a range of blocks from one file to another. > */ > @@ -1257,15 +1278,12 @@ xfs_reflink_remap_range( > trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out); > > /* > - * Clear out post-eof preallocations because we don't have page cache > - * backing the delayed allocations and they'll never get freed on > - * their own. > + * Zero existing post-eof speculative preallocations in the destination > + * file. > */ > - if (xfs_can_free_eofblocks(dest, true)) { > - ret = xfs_free_eofblocks(dest); > - if (ret) > - goto out_unlock; > - } > + ret = xfs_reflink_zero_posteof(dest, pos_out); > + if (ret) > + goto out_unlock; > > /* Set flags and remap blocks. */ > ret = xfs_reflink_set_inode_flag(src, dest);
On Wed, Oct 03, 2018 at 07:11:14AM -0500, Eric Sandeen wrote: > On 10/2/18 9:03 PM, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > When we're reflinking between two files and the destination file range > > is well beyond the destination file's EOF marker, zero any posteof > > speculative preallocations in the destination file so that we don't > > expose stale disk contents. The previous strategy of trying to clear > > the preallocations does not work if the destination file has the > > PREALLOC flag set but no delalloc blocks. > > > > Uncovered by shared/010. > > > > Reported-by: Zorro Lang <zlang@redhat.com> > > Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > The action makes sense, and this does resolve my simple testcase, > and makes shared/010 pass for me as well. > > However, this makes my correctness spidey-sense tingle; why is there a > new helper unique to extending reflinks, when extending writes already > must do the same thing? I think you're referring to Dave's earlier question of "Why don't you just use xfs_file_aio_write_checks?" It's tempting to adapt xfs_file_aio_write_checks for reflink, but I think I have to create a new function because (a) we don't have a kiocb to pass in, and (b) we have to lock two inodes for reflink while abiding the [VX]FS inode locking rules and making sure we break the destination flie's layout correctly. > I didn't follow all the discussion on IRC, but might be worth > explaining on the list for others as well. Are there any other > extending write tests that aren't happening for extending reflink? Yes, there are a number of behavioral inconsistencies between regular write and clonerange that have been discovered in the past few days, and it's going to take me a few days to clean all of this up: - Lack of file_update_times(), though the ctime update is open-coded in the reflink routines. - Lack of file_remove_privs() to drop suid and capabilities on write. Totally missing from the btrfs implementation and xfs/ocfs2 followed that behavior warts and all. - Lack of RLIMIT_FSIZE checking: D'oh. Same lame excuse as above. - Lack of MAX_NON_LFS size checking: Same. - Lack of s_maxbytes checking: Same. Alarming since this means we can reflink to offsets the pagecache doesn't support. - Should our clonerange return bytes reflinked to copy_file_range? That last one requires more careful consideration & will take longer; the first two are nearly ready. --D > -Eric > > > --- > > fs/xfs/xfs_reflink.c | 34 ++++++++++++++++++++++++++-------- > > 1 file changed, 26 insertions(+), 8 deletions(-) > > > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c > > index 38f405415b88..c8e996a99a74 100644 > > --- a/fs/xfs/xfs_reflink.c > > +++ b/fs/xfs/xfs_reflink.c > > @@ -1195,6 +1195,27 @@ xfs_iolock_two_inodes_and_break_layout( > > return 0; > > } > > > > +/* > > + * If we're reflinking to a point past the destination file's EOF, we must > > + * zero any speculative post-EOF preallocations that sit between the old EOF > > + * and the destination file offset. > > + */ > > +static int > > +xfs_reflink_zero_posteof( > > + struct xfs_inode *ip, > > + loff_t pos) > > +{ > > + loff_t isize = i_size_read(VFS_I(ip)); > > + bool did_zeroing = false; > > + > > + if (pos <= isize) > > + return 0; > > + > > + trace_xfs_zero_eof(ip, isize, pos - isize); > > + return iomap_zero_range(VFS_I(ip), isize, pos - isize, &did_zeroing, > > + &xfs_iomap_ops); > > +} > > + > > /* > > * Link a range of blocks from one file to another. > > */ > > @@ -1257,15 +1278,12 @@ xfs_reflink_remap_range( > > trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out); > > > > /* > > - * Clear out post-eof preallocations because we don't have page cache > > - * backing the delayed allocations and they'll never get freed on > > - * their own. > > + * Zero existing post-eof speculative preallocations in the destination > > + * file. > > */ > > - if (xfs_can_free_eofblocks(dest, true)) { > > - ret = xfs_free_eofblocks(dest); > > - if (ret) > > - goto out_unlock; > > - } > > + ret = xfs_reflink_zero_posteof(dest, pos_out); > > + if (ret) > > + goto out_unlock; > > > > /* Set flags and remap blocks. */ > > ret = xfs_reflink_set_inode_flag(src, dest); > >
On Wed, Oct 03, 2018 at 08:20:27AM -0400, Brian Foster wrote: > On Tue, Oct 02, 2018 at 07:03:42PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > When we're reflinking between two files and the destination file range > > is well beyond the destination file's EOF marker, zero any posteof > > speculative preallocations in the destination file so that we don't > > expose stale disk contents. The previous strategy of trying to clear > > the preallocations does not work if the destination file has the > > PREALLOC flag set but no delalloc blocks. > > > > Uncovered by shared/010. > > > > Reported-by: Zorro Lang <zlang@redhat.com> > > Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > --- > > fs/xfs/xfs_reflink.c | 34 ++++++++++++++++++++++++++-------- > > 1 file changed, 26 insertions(+), 8 deletions(-) > > > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c > > index 38f405415b88..c8e996a99a74 100644 > > --- a/fs/xfs/xfs_reflink.c > > +++ b/fs/xfs/xfs_reflink.c > > @@ -1195,6 +1195,27 @@ xfs_iolock_two_inodes_and_break_layout( > > return 0; > > } > > > > +/* > > + * If we're reflinking to a point past the destination file's EOF, we must > > + * zero any speculative post-EOF preallocations that sit between the old EOF > > + * and the destination file offset. > > + */ > > +static int > > +xfs_reflink_zero_posteof( > > + struct xfs_inode *ip, > > + loff_t pos) > > +{ > > + loff_t isize = i_size_read(VFS_I(ip)); > > + bool did_zeroing = false; > > + > > + if (pos <= isize) > > + return 0; > > + > > + trace_xfs_zero_eof(ip, isize, pos - isize); > > + return iomap_zero_range(VFS_I(ip), isize, pos - isize, &did_zeroing, > > + &xfs_iomap_ops); > > iomap_zero_range() accepts NULL for the *did_zero param. Otherwise seems > fine, barring Eric's question on whether we need additional checks.. Oh, right, it does. Will fix, thank you! --D > Brian > > > +} > > + > > /* > > * Link a range of blocks from one file to another. > > */ > > @@ -1257,15 +1278,12 @@ xfs_reflink_remap_range( > > trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out); > > > > /* > > - * Clear out post-eof preallocations because we don't have page cache > > - * backing the delayed allocations and they'll never get freed on > > - * their own. > > + * Zero existing post-eof speculative preallocations in the destination > > + * file. > > */ > > - if (xfs_can_free_eofblocks(dest, true)) { > > - ret = xfs_free_eofblocks(dest); > > - if (ret) > > - goto out_unlock; > > - } > > + ret = xfs_reflink_zero_posteof(dest, pos_out); > > + if (ret) > > + goto out_unlock; > > > > /* Set flags and remap blocks. */ > > ret = xfs_reflink_set_inode_flag(src, dest);
On 10/3/18 10:12 AM, Darrick J. Wong wrote: > On Wed, Oct 03, 2018 at 07:11:14AM -0500, Eric Sandeen wrote: >> On 10/2/18 9:03 PM, Darrick J. Wong wrote: >>> From: Darrick J. Wong <darrick.wong@oracle.com> >>> >>> When we're reflinking between two files and the destination file range >>> is well beyond the destination file's EOF marker, zero any posteof >>> speculative preallocations in the destination file so that we don't >>> expose stale disk contents. The previous strategy of trying to clear >>> the preallocations does not work if the destination file has the >>> PREALLOC flag set but no delalloc blocks. >>> >>> Uncovered by shared/010. >>> >>> Reported-by: Zorro Lang <zlang@redhat.com> >>> Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 >>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> >> >> The action makes sense, and this does resolve my simple testcase, >> and makes shared/010 pass for me as well. >> >> However, this makes my correctness spidey-sense tingle; why is there a >> new helper unique to extending reflinks, when extending writes already >> must do the same thing? > > I think you're referring to Dave's earlier question of "Why don't you > just use xfs_file_aio_write_checks?" > > It's tempting to adapt xfs_file_aio_write_checks for reflink, but I > think I have to create a new function because (a) we don't have a kiocb > to pass in, and (b) we have to lock two inodes for reflink while abiding > the [VX]FS inode locking rules and making sure we break the destination > flie's layout correctly. > >> I didn't follow all the discussion on IRC, but might be worth >> explaining on the list for others as well. Are there any other >> extending write tests that aren't happening for extending reflink? > > Yes, there are a number of behavioral inconsistencies between regular > write and clonerange that have been discovered in the past few days, and > it's going to take me a few days to clean all of this up: > > - Lack of file_update_times(), though the ctime update is open-coded in > the reflink routines. > > - Lack of file_remove_privs() to drop suid and capabilities on write. > Totally missing from the btrfs implementation and xfs/ocfs2 followed > that behavior warts and all. > > - Lack of RLIMIT_FSIZE checking: D'oh. Same lame excuse as above. > > - Lack of MAX_NON_LFS size checking: Same. > > - Lack of s_maxbytes checking: Same. Alarming since this means we can > reflink to offsets the pagecache doesn't support. > > - Should our clonerange return bytes reflinked to copy_file_range? > > That last one requires more careful consideration & will take longer; > the first two are nearly ready. Ok, so let's say something like "this patch looks good as far as it goes, but as you work out these other issues, please consider code structure so that requirements which are common to extending write & extending reflink are done in common code rather than cut & pasted?" :) -Eric
On Wed, Oct 03, 2018 at 10:35:50AM -0500, Eric Sandeen wrote: > On 10/3/18 10:12 AM, Darrick J. Wong wrote: > > On Wed, Oct 03, 2018 at 07:11:14AM -0500, Eric Sandeen wrote: > >> On 10/2/18 9:03 PM, Darrick J. Wong wrote: > >>> From: Darrick J. Wong <darrick.wong@oracle.com> > >>> > >>> When we're reflinking between two files and the destination file range > >>> is well beyond the destination file's EOF marker, zero any posteof > >>> speculative preallocations in the destination file so that we don't > >>> expose stale disk contents. The previous strategy of trying to clear > >>> the preallocations does not work if the destination file has the > >>> PREALLOC flag set but no delalloc blocks. > >>> > >>> Uncovered by shared/010. > >>> > >>> Reported-by: Zorro Lang <zlang@redhat.com> > >>> Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259 > >>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > >> > >> The action makes sense, and this does resolve my simple testcase, > >> and makes shared/010 pass for me as well. > >> > >> However, this makes my correctness spidey-sense tingle; why is there a > >> new helper unique to extending reflinks, when extending writes already > >> must do the same thing? > > > > I think you're referring to Dave's earlier question of "Why don't you > > just use xfs_file_aio_write_checks?" > > > > It's tempting to adapt xfs_file_aio_write_checks for reflink, but I > > think I have to create a new function because (a) we don't have a kiocb > > to pass in, and (b) we have to lock two inodes for reflink while abiding > > the [VX]FS inode locking rules and making sure we break the destination > > flie's layout correctly. > > > >> I didn't follow all the discussion on IRC, but might be worth > >> explaining on the list for others as well. Are there any other > >> extending write tests that aren't happening for extending reflink? > > > > Yes, there are a number of behavioral inconsistencies between regular > > write and clonerange that have been discovered in the past few days, and > > it's going to take me a few days to clean all of this up: > > > > - Lack of file_update_times(), though the ctime update is open-coded in > > the reflink routines. > > > > - Lack of file_remove_privs() to drop suid and capabilities on write. > > Totally missing from the btrfs implementation and xfs/ocfs2 followed > > that behavior warts and all. > > > > - Lack of RLIMIT_FSIZE checking: D'oh. Same lame excuse as above. > > > > - Lack of MAX_NON_LFS size checking: Same. > > > > - Lack of s_maxbytes checking: Same. Alarming since this means we can > > reflink to offsets the pagecache doesn't support. > > > > - Should our clonerange return bytes reflinked to copy_file_range? > > > > That last one requires more careful consideration & will take longer; > > the first two are nearly ready. > > Ok, so let's say something like "this patch looks good as far as it goes, > but as you work out these other issues, please consider code structure > so that requirements which are common to extending write & extending reflink > are done in common code rather than cut & pasted?" :) The scattershot approach sucks, yes. I'm concentrating for now on fixing the glaring holes and anticipate adding a final patch to pull everything into a common xfs_reflink_clone_file_prep function that takes both inodes and does whatever checking and prep work are needed (like xfs_file_aio_write_checks) so that when it returns, the two files are ready for xfs_reflink_remap_blocks. --D > -Eric
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 38f405415b88..c8e996a99a74 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1195,6 +1195,27 @@ xfs_iolock_two_inodes_and_break_layout( return 0; } +/* + * If we're reflinking to a point past the destination file's EOF, we must + * zero any speculative post-EOF preallocations that sit between the old EOF + * and the destination file offset. + */ +static int +xfs_reflink_zero_posteof( + struct xfs_inode *ip, + loff_t pos) +{ + loff_t isize = i_size_read(VFS_I(ip)); + bool did_zeroing = false; + + if (pos <= isize) + return 0; + + trace_xfs_zero_eof(ip, isize, pos - isize); + return iomap_zero_range(VFS_I(ip), isize, pos - isize, &did_zeroing, + &xfs_iomap_ops); +} + /* * Link a range of blocks from one file to another. */ @@ -1257,15 +1278,12 @@ xfs_reflink_remap_range( trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out); /* - * Clear out post-eof preallocations because we don't have page cache - * backing the delayed allocations and they'll never get freed on - * their own. + * Zero existing post-eof speculative preallocations in the destination + * file. */ - if (xfs_can_free_eofblocks(dest, true)) { - ret = xfs_free_eofblocks(dest); - if (ret) - goto out_unlock; - } + ret = xfs_reflink_zero_posteof(dest, pos_out); + if (ret) + goto out_unlock; /* Set flags and remap blocks. */ ret = xfs_reflink_set_inode_flag(src, dest);