diff mbox series

xfs: reflink should force the log out if mounted with wsync

Message ID 20200402041705.GD80283@magnolia (mailing list archive)
State Superseded
Headers show
Series xfs: reflink should force the log out if mounted with wsync | expand

Commit Message

Darrick J. Wong April 2, 2020, 4:17 a.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

Reflink should force the log out to disk if the filesystem was mounted
with wsync, the same as most other operations in xfs.

Fixes: 3fc9f5e409319 ("xfs: remove xfs_reflink_remap_range")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_file.c    |   21 +++++++++++++++++++--
 fs/xfs/xfs_reflink.c |   15 ++++++++++++++-
 fs/xfs/xfs_reflink.h |    3 ++-
 3 files changed, 35 insertions(+), 4 deletions(-)

Comments

Christoph Hellwig April 2, 2020, 7:51 a.m. UTC | #1
On Wed, Apr 01, 2020 at 09:17:05PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Reflink should force the log out to disk if the filesystem was mounted
> with wsync, the same as most other operations in xfs.

Looks reasonable.  That being said I really hate the way we handle
this - I've been wanting to rework the wsync/dirsync code to just mark
as transaction as dirsync or wsync and then let xfs_trans_commit handle
checking if the file system is mounted with the option to clean this
mess up.  Let me see if I could resurrect that quickly.
Christoph Hellwig April 2, 2020, 8:49 a.m. UTC | #2
On Thu, Apr 02, 2020 at 12:51:08AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 01, 2020 at 09:17:05PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Reflink should force the log out to disk if the filesystem was mounted
> > with wsync, the same as most other operations in xfs.
> 
> Looks reasonable.  That being said I really hate the way we handle
> this - I've been wanting to rework the wsync/dirsync code to just mark
> as transaction as dirsync or wsync and then let xfs_trans_commit handle
> checking if the file system is mounted with the option to clean this
> mess up.  Let me see if I could resurrect that quickly.

Resurrected and under testing now.  While forward porting your patch
I noticed it could be much simpler even without the refactor by just
using xfs_trans_set_sync.  The downside of that is that the log force
is under the inode locks, but so are the log forces for all other wysnc
induced log forces.  So I think you should just submit this in the
simplified version matching the rest of the wsync users as a fix. If
we want to optimize it later on that should be done as a separate patch
and for all wsync/dirsync users.
Darrick J. Wong April 2, 2020, 2:53 p.m. UTC | #3
On Thu, Apr 02, 2020 at 01:49:30AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 02, 2020 at 12:51:08AM -0700, Christoph Hellwig wrote:
> > On Wed, Apr 01, 2020 at 09:17:05PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Reflink should force the log out to disk if the filesystem was mounted
> > > with wsync, the same as most other operations in xfs.
> > 
> > Looks reasonable.  That being said I really hate the way we handle
> > this - I've been wanting to rework the wsync/dirsync code to just mark
> > as transaction as dirsync or wsync and then let xfs_trans_commit handle
> > checking if the file system is mounted with the option to clean this
> > mess up.  Let me see if I could resurrect that quickly.
> 
> Resurrected and under testing now.  While forward porting your patch
> I noticed it could be much simpler even without the refactor by just
> using xfs_trans_set_sync.  The downside of that is that the log force
> is under the inode locks, but so are the log forces for all other wysnc
> induced log forces.  So I think you should just submit this in the
> simplified version matching the rest of the wsync users as a fix. If
> we want to optimize it later on that should be done as a separate patch
> and for all wsync/dirsync users.

Can you please send in a Reviewed-by so I can get this moving? :)

--D
Christoph Hellwig April 2, 2020, 2:56 p.m. UTC | #4
On Thu, Apr 02, 2020 at 07:53:44AM -0700, Darrick J. Wong wrote:
> > > Looks reasonable.  That being said I really hate the way we handle
> > > this - I've been wanting to rework the wsync/dirsync code to just mark
> > > as transaction as dirsync or wsync and then let xfs_trans_commit handle
> > > checking if the file system is mounted with the option to clean this
> > > mess up.  Let me see if I could resurrect that quickly.
> > 
> > Resurrected and under testing now.  While forward porting your patch
> > I noticed it could be much simpler even without the refactor by just
> > using xfs_trans_set_sync.  The downside of that is that the log force
> > is under the inode locks, but so are the log forces for all other wysnc
> > induced log forces.  So I think you should just submit this in the
> > simplified version matching the rest of the wsync users as a fix. If
> > we want to optimize it later on that should be done as a separate patch
> > and for all wsync/dirsync users.
> 
> Can you please send in a Reviewed-by so I can get this moving? :)

In case the above wasn't clear: I don't think this is the right way
to go.  Just to fix the reflink vs wsync bug I think we want a
one-liner like this:

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index b0ce04ffd3cd..e2cc7b84ca6c 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -948,6 +948,7 @@ xfs_reflink_update_dest(
 
 	xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE);
 
+	xfs_trans_set_sync(tp);
 	error = xfs_trans_commit(tp);
 	if (error)
 		goto out_error;
Darrick J. Wong April 2, 2020, 3:14 p.m. UTC | #5
On Thu, Apr 02, 2020 at 07:56:48AM -0700, Christoph Hellwig wrote:
> On Thu, Apr 02, 2020 at 07:53:44AM -0700, Darrick J. Wong wrote:
> > > > Looks reasonable.  That being said I really hate the way we handle
> > > > this - I've been wanting to rework the wsync/dirsync code to just mark
> > > > as transaction as dirsync or wsync and then let xfs_trans_commit handle
> > > > checking if the file system is mounted with the option to clean this
> > > > mess up.  Let me see if I could resurrect that quickly.
> > > 
> > > Resurrected and under testing now.  While forward porting your patch
> > > I noticed it could be much simpler even without the refactor by just
> > > using xfs_trans_set_sync.  The downside of that is that the log force
> > > is under the inode locks, but so are the log forces for all other wysnc
> > > induced log forces.  So I think you should just submit this in the
> > > simplified version matching the rest of the wsync users as a fix. If
> > > we want to optimize it later on that should be done as a separate patch
> > > and for all wsync/dirsync users.
> > 
> > Can you please send in a Reviewed-by so I can get this moving? :)
> 
> In case the above wasn't clear: I don't think this is the right way
> to go.  Just to fix the reflink vs wsync bug I think we want a
> one-liner like this:

Sorry, I thought "you should just submit this in the simplified version"
was referring to the first patch I sent, as opposed to the cleanups
you're testing.

> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index b0ce04ffd3cd..e2cc7b84ca6c 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -948,6 +948,7 @@ xfs_reflink_update_dest(
>  
>  	xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE);
>  
> +	xfs_trans_set_sync(tp);

This isn't enough because this is only the last transaction in the
reflink sequence if we have to set the destination inode's size.  If
(say) we're reflinking a range inside EOF of two files that were already
sharing blocks, we still won't force the log out.

The other thing I thought of was simply invoking fsync after dropping
the iolock, but that seemed like more work than was strictly necessary
to land the reflink transactions on disk.

--D

>  	error = xfs_trans_commit(tp);
>  	if (error)
>  		goto out_error;
Christoph Hellwig April 2, 2020, 3:20 p.m. UTC | #6
On Thu, Apr 02, 2020 at 08:14:31AM -0700, Darrick J. Wong wrote:
> This isn't enough because this is only the last transaction in the
> reflink sequence if we have to set the destination inode's size.  If
> (say) we're reflinking a range inside EOF of two files that were already
> sharing blocks, we still won't force the log out.
> 
> The other thing I thought of was simply invoking fsync after dropping
> the iolock, but that seemed like more work than was strictly necessary
> to land the reflink transactions on disk.

Well, we have a lightweight version of fsync doing just that.  In
fact we have the same lightweight version twice: xfs_dir_fsync and
xfs_fs_nfs_commit_metadata.
Dave Chinner April 2, 2020, 9:55 p.m. UTC | #7
On Wed, Apr 01, 2020 at 09:17:05PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Reflink should force the log out to disk if the filesystem was mounted
> with wsync, the same as most other operations in xfs.
> 
> Fixes: 3fc9f5e409319 ("xfs: remove xfs_reflink_remap_range")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_file.c    |   21 +++++++++++++++++++--
>  fs/xfs/xfs_reflink.c |   15 ++++++++++++++-
>  fs/xfs/xfs_reflink.h |    3 ++-
>  3 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index b8a4a3f29b36..17bdc5bbc2ae 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1029,8 +1029,10 @@ xfs_file_remap_range(
>  	struct inode		*inode_out = file_inode(file_out);
>  	struct xfs_inode	*dest = XFS_I(inode_out);
>  	struct xfs_mount	*mp = src->i_mount;
> +	xfs_lsn_t		sync_lsn = 0;
>  	loff_t			remapped = 0;
>  	xfs_extlen_t		cowextsize;
> +	bool			need_sync;
>  	int			ret;
>  
>  	if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
> @@ -1068,13 +1070,28 @@ xfs_file_remap_range(
>  		cowextsize = src->i_d.di_cowextsize;
>  
>  	ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize,
> -			remap_flags);
> +			remap_flags, &need_sync);
> +	if (ret)
> +		goto out_unlock;
> +
> +	/*
> +	 * If this is a synchronous mount and xfs_reflink_update_dest didn't
> +	 * already take care of this, make sure that the transaction goes to
> +	 * disk before returning to the user.
> +	 */
> +	if (need_sync && xfs_ipincount(dest))
> +		sync_lsn = dest->i_itemp->ili_last_lsn;
>  
>  out_unlock:
>  	xfs_reflink_remap_unlock(file_in, file_out);
>  	if (ret)
>  		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
> -	return remapped > 0 ? remapped : ret;
> +	if (remapped > 0) {
> +		if (sync_lsn)
> +			xfs_log_force_lsn(mp, sync_lsn, XFS_LOG_SYNC, NULL);
> +		return remapped;
> +	}
> +	return ret;

This seems pretty fragile compared to all the other WSYNC cases
which just set the last transaction as a sync transaction.

Why can't we just set the transaction sync at the appropriate time,
and if we have to do two sync commits for a reflink then we just
suck it up for now?

As it is, wsync is really only used for active/passive HA configs
these days, which means we've already given up on performance
because data integrity is an essential requirement in those
configs...

Cheers,

Dave.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index b8a4a3f29b36..17bdc5bbc2ae 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1029,8 +1029,10 @@  xfs_file_remap_range(
 	struct inode		*inode_out = file_inode(file_out);
 	struct xfs_inode	*dest = XFS_I(inode_out);
 	struct xfs_mount	*mp = src->i_mount;
+	xfs_lsn_t		sync_lsn = 0;
 	loff_t			remapped = 0;
 	xfs_extlen_t		cowextsize;
+	bool			need_sync;
 	int			ret;
 
 	if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY))
@@ -1068,13 +1070,28 @@  xfs_file_remap_range(
 		cowextsize = src->i_d.di_cowextsize;
 
 	ret = xfs_reflink_update_dest(dest, pos_out + len, cowextsize,
-			remap_flags);
+			remap_flags, &need_sync);
+	if (ret)
+		goto out_unlock;
+
+	/*
+	 * If this is a synchronous mount and xfs_reflink_update_dest didn't
+	 * already take care of this, make sure that the transaction goes to
+	 * disk before returning to the user.
+	 */
+	if (need_sync && xfs_ipincount(dest))
+		sync_lsn = dest->i_itemp->ili_last_lsn;
 
 out_unlock:
 	xfs_reflink_remap_unlock(file_in, file_out);
 	if (ret)
 		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
-	return remapped > 0 ? remapped : ret;
+	if (remapped > 0) {
+		if (sync_lsn)
+			xfs_log_force_lsn(mp, sync_lsn, XFS_LOG_SYNC, NULL);
+		return remapped;
+	}
+	return ret;
 }
 
 STATIC int
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index b0ce04ffd3cd..4f148d58ff98 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -919,12 +919,19 @@  xfs_reflink_update_dest(
 	struct xfs_inode	*dest,
 	xfs_off_t		newlen,
 	xfs_extlen_t		cowextsize,
-	unsigned int		remap_flags)
+	unsigned int		remap_flags,
+	bool			*need_sync)
 {
 	struct xfs_mount	*mp = dest->i_mount;
 	struct xfs_trans	*tp;
 	int			error;
 
+	/*
+	 * If this is a synchronous mount, make sure that the
+	 * transaction goes to disk before returning to the user.
+	 */
+	*need_sync = (mp->m_flags & XFS_MOUNT_WSYNC);
+
 	if (newlen <= i_size_read(VFS_I(dest)) && cowextsize == 0)
 		return 0;
 
@@ -948,6 +955,12 @@  xfs_reflink_update_dest(
 
 	xfs_trans_log_inode(tp, dest, XFS_ILOG_CORE);
 
+	/* We can force the transaction to disk here. */
+	if (*need_sync) {
+		xfs_trans_set_sync(tp);
+		*need_sync = false;
+	}
+
 	error = xfs_trans_commit(tp);
 	if (error)
 		goto out_error;
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 3e4fd46373ab..e9b505e0ad7a 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -55,7 +55,8 @@  extern int xfs_reflink_remap_blocks(struct xfs_inode *src, loff_t pos_in,
 		struct xfs_inode *dest, loff_t pos_out, loff_t remap_len,
 		loff_t *remapped);
 extern int xfs_reflink_update_dest(struct xfs_inode *dest, xfs_off_t newlen,
-		xfs_extlen_t cowextsize, unsigned int remap_flags);
+		xfs_extlen_t cowextsize, unsigned int remap_flags,
+		bool *need_sync);
 extern void xfs_reflink_remap_unlock(struct file *file_in,
 		struct file *file_out);