Message ID | 20200402041705.GD80283@magnolia (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xfs: reflink should force the log out if mounted with wsync | expand |
On Wed, Apr 01, 2020 at 09:17:05PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > Reflink should force the log out to disk if the filesystem was mounted > with wsync, the same as most other operations in xfs. Looks reasonable. That being said I really hate the way we handle this - I've been wanting to rework the wsync/dirsync code to just mark as transaction as dirsync or wsync and then let xfs_trans_commit handle checking if the file system is mounted with the option to clean this mess up. Let me see if I could resurrect that quickly.
On Thu, Apr 02, 2020 at 12:51:08AM -0700, Christoph Hellwig wrote: > On Wed, Apr 01, 2020 at 09:17:05PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > Reflink should force the log out to disk if the filesystem was mounted > > with wsync, the same as most other operations in xfs. > > Looks reasonable. That being said I really hate the way we handle > this - I've been wanting to rework the wsync/dirsync code to just mark > as transaction as dirsync or wsync and then let xfs_trans_commit handle > checking if the file system is mounted with the option to clean this > mess up. Let me see if I could resurrect that quickly. Resurrected and under testing now. While forward porting your patch I noticed it could be much simpler even without the refactor by just using xfs_trans_set_sync. The downside of that is that the log force is under the inode locks, but so are the log forces for all other wysnc induced log forces. So I think you should just submit this in the simplified version matching the rest of the wsync users as a fix. If we want to optimize it later on that should be done as a separate patch and for all wsync/dirsync users.
On Thu, Apr 02, 2020 at 01:49:30AM -0700, Christoph Hellwig wrote: > On Thu, Apr 02, 2020 at 12:51:08AM -0700, Christoph Hellwig wrote: > > On Wed, Apr 01, 2020 at 09:17:05PM -0700, Darrick J. Wong wrote: > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > Reflink should force the log out to disk if the filesystem was mounted > > > with wsync, the same as most other operations in xfs. > > > > Looks reasonable. That being said I really hate the way we handle > > this - I've been wanting to rework the wsync/dirsync code to just mark > > as transaction as dirsync or wsync and then let xfs_trans_commit handle > > checking if the file system is mounted with the option to clean this > > mess up. Let me see if I could resurrect that quickly. > > Resurrected and under testing now. While forward porting your patch > I noticed it could be much simpler even without the refactor by just > using xfs_trans_set_sync. The downside of that is that the log force > is under the inode locks, but so are the log forces for all other wysnc > induced log forces. So I think you should just submit this in the > simplified version matching the rest of the wsync users as a fix. If > we want to optimize it later on that should be done as a separate patch > and for all wsync/dirsync users. Can you please send in a Reviewed-by so I can get this moving? :) --D
On Thu, Apr 02, 2020 at 07:53:44AM -0700, Darrick J. Wong wrote: > > > Looks reasonable. That being said I really hate the way we handle > > > this - I've been wanting to rework the wsync/dirsync code to just mark > > > as transaction as dirsync or wsync and then let xfs_trans_commit handle > > > checking if the file system is mounted with the option to clean this > > > mess up. Let me see if I could resurrect that quickly. > > > > Resurrected and under testing now. While forward porting your patch > > I noticed it could be much simpler even without the refactor by just > > using xfs_trans_set_sync. The downside of that is that the log force > > is under the inode locks, but so are the log forces for all other wysnc > > induced log forces. So I think you should just submit this in the > > simplified version matching the rest of the wsync users as a fix. If > > we want to optimize it later on that should be done as a separate patch > > and for all wsync/dirsync users. > > Can you please send in a Reviewed-by so I can get this moving? :) In case the above wasn't clear: I don't think this is the right way to go. Just to fix the reflink vs wsync bug I think we want a one-liner like this: diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index b0ce04ffd3cd..e2cc7b84ca6c 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -948,6 +948,7 @@ xfs_reflink_update_dest( xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE); + xfs_trans_set_sync(tp); error = xfs_trans_commit(tp); if (error) goto out_error;
On Thu, Apr 02, 2020 at 07:56:48AM -0700, Christoph Hellwig wrote: > On Thu, Apr 02, 2020 at 07:53:44AM -0700, Darrick J. Wong wrote: > > > > Looks reasonable. That being said I really hate the way we handle > > > > this - I've been wanting to rework the wsync/dirsync code to just mark > > > > as transaction as dirsync or wsync and then let xfs_trans_commit handle > > > > checking if the file system is mounted with the option to clean this > > > > mess up. Let me see if I could resurrect that quickly. > > > > > > Resurrected and under testing now. While forward porting your patch > > > I noticed it could be much simpler even without the refactor by just > > > using xfs_trans_set_sync. The downside of that is that the log force > > > is under the inode locks, but so are the log forces for all other wysnc > > > induced log forces. So I think you should just submit this in the > > > simplified version matching the rest of the wsync users as a fix. If > > > we want to optimize it later on that should be done as a separate patch > > > and for all wsync/dirsync users. > > > > Can you please send in a Reviewed-by so I can get this moving? :) > > In case the above wasn't clear: I don't think this is the right way > to go. Just to fix the reflink vs wsync bug I think we want a > one-liner like this: Sorry, I thought "you should just submit this in the simplified version" was referring to the first patch I sent, as opposed to the cleanups you're testing. > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c > index b0ce04ffd3cd..e2cc7b84ca6c 100644 > --- a/fs/xfs/xfs_reflink.c > +++ b/fs/xfs/xfs_reflink.c > @@ -948,6 +948,7 @@ xfs_reflink_update_dest( > > xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE); > > + xfs_trans_set_sync(tp); This isn't enough because this is only the last transaction in the reflink sequence if we have to set the destination inode's size. If (say) we're reflinking a range inside EOF of two files that were already sharing blocks, we still won't force the log out. The other thing I thought of was simply invoking fsync after dropping the iolock, but that seemed like more work than was strictly necessary to land the reflink transactions on disk. --D > error = xfs_trans_commit(tp); > if (error) > goto out_error;
On Thu, Apr 02, 2020 at 08:14:31AM -0700, Darrick J. Wong wrote: > This isn't enough because this is only the last transaction in the > reflink sequence if we have to set the destination inode's size. If > (say) we're reflinking a range inside EOF of two files that were already > sharing blocks, we still won't force the log out. > > The other thing I thought of was simply invoking fsync after dropping > the iolock, but that seemed like more work than was strictly necessary > to land the reflink transactions on disk. Well, we have a lightweight version of fsync doing just that. In fact we have the same lightweight version twice: xfs_dir_fsync and xfs_fs_nfs_commit_metadata.
On Wed, Apr 01, 2020 at 09:17:05PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > Reflink should force the log out to disk if the filesystem was mounted > with wsync, the same as most other operations in xfs. > > Fixes: 3fc9f5e409319 ("xfs: remove xfs_reflink_remap_range") > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > --- > fs/xfs/xfs_file.c | 21 +++++++++++++++++++-- > fs/xfs/xfs_reflink.c | 15 ++++++++++++++- > fs/xfs/xfs_reflink.h | 3 ++- > 3 files changed, 35 insertions(+), 4 deletions(-) > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index b8a4a3f29b36..17bdc5bbc2ae 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -1029,8 +1029,10 @@ xfs_file_remap_range( > struct inode *inode_out = file_inode(file_out); > struct xfs_inode *dest = XFS_I(inode_out); > struct xfs_mount *mp = src->i_mount; > + xfs_lsn_t sync_lsn = 0; > loff_t remapped = 0; > xfs_extlen_t cowextsize; > + bool need_sync; > int ret; > > if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY)) > @@ -1068,13 +1070,28 @@ xfs_file_remap_range( > cowextsize = src->i_d.di_cowextsize; > > ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize, > - remap_flags); > + remap_flags, &need_sync); > + if (ret) > + goto out_unlock; > + > + /* > + * If this is a synchronous mount and xfs_reflink_update_dest didn't > + * already take care of this, make sure that the transaction goes to > + * disk before returning to the user. > + */ > + if (need_sync && xfs_ipincount(dest)) > + sync_lsn = dest->i_itemp->ili_last_lsn; > > out_unlock: > xfs_reflink_remap_unlock(file_in, file_out); > if (ret) > trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_); > - return remapped > 0 ? remapped : ret; > + if (remapped > 0) { > + if (sync_lsn) > + xfs_log_force_lsn(mp, sync_lsn, XFS_LOG_SYNC, NULL); > + return remapped; > + } > + return ret; This seems pretty fragile compared to all the other WSYNC cases which just set the last transaction as a sync transaction. Why can't we just set the transaction sync at the appropriate time, and if we have to do two sync commits for a reflink then we just suck it up for now? As it is, wsync is really only used for active/passive HA configs these days, which means we've already given up on performance because data integrity is an essential requirement in those configs... Cheers, Dave.
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index b8a4a3f29b36..17bdc5bbc2ae 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1029,8 +1029,10 @@ xfs_file_remap_range( struct inode *inode_out = file_inode(file_out); struct xfs_inode *dest = XFS_I(inode_out); struct xfs_mount *mp = src->i_mount; + xfs_lsn_t sync_lsn = 0; loff_t remapped = 0; xfs_extlen_t cowextsize; + bool need_sync; int ret; if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY)) @@ -1068,13 +1070,28 @@ xfs_file_remap_range( cowextsize = src->i_d.di_cowextsize; ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize, - remap_flags); + remap_flags, &need_sync); + if (ret) + goto out_unlock; + + /* + * If this is a synchronous mount and xfs_reflink_update_dest didn't + * already take care of this, make sure that the transaction goes to + * disk before returning to the user. + */ + if (need_sync && xfs_ipincount(dest)) + sync_lsn = dest->i_itemp->ili_last_lsn; out_unlock: xfs_reflink_remap_unlock(file_in, file_out); if (ret) trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_); - return remapped > 0 ? remapped : ret; + if (remapped > 0) { + if (sync_lsn) + xfs_log_force_lsn(mp, sync_lsn, XFS_LOG_SYNC, NULL); + return remapped; + } + return ret; } STATIC int diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index b0ce04ffd3cd..4f148d58ff98 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -919,12 +919,19 @@ xfs_reflink_update_dest( struct xfs_inode *dest, xfs_off_t newlen, xfs_extlen_t cowextsize, - unsigned int remap_flags) + unsigned int remap_flags, + bool *need_sync) { struct xfs_mount *mp = dest->i_mount; struct xfs_trans *tp; int error; + /* + * If this is a synchronous mount, make sure that the + * transaction goes to disk before returning to the user. + */ + *need_sync = (mp->m_flags & XFS_MOUNT_WSYNC); + if (newlen <= i_size_read(VFS_I(dest)) && cowextsize == 0) return 0; @@ -948,6 +955,12 @@ xfs_reflink_update_dest( xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE); + /* We can force the transaction to disk here. */ + if (*need_sync) { + xfs_trans_set_sync(tp); + *need_sync = false; + } + error = xfs_trans_commit(tp); if (error) goto out_error; diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h index 3e4fd46373ab..e9b505e0ad7a 100644 --- a/fs/xfs/xfs_reflink.h +++ b/fs/xfs/xfs_reflink.h @@ -55,7 +55,8 @@ extern int xfs_reflink_remap_blocks(struct xfs_inode *src, loff_t pos_in, struct xfs_inode *dest, loff_t pos_out, loff_t remap_len, loff_t *remapped); extern int xfs_reflink_update_dest(struct xfs_inode *dest, xfs_off_t newlen, - xfs_extlen_t cowextsize, unsigned int remap_flags); + xfs_extlen_t cowextsize, unsigned int remap_flags, + bool *need_sync); extern void xfs_reflink_remap_unlock(struct file *file_in, struct file *file_out);