diff mbox series

[v2,09/10] fs/xfs: Handle CoW for fsdax write() path

Message ID 20210226002030.653855-10-ruansy.fnst@fujitsu.com (mailing list archive)
State New, archived
Headers show
Series fsdax, xfs: Add reflink&dedupe support for fsdax | expand

Commit Message

Shiyang Ruan Feb. 26, 2021, 12:20 a.m. UTC
In fsdax mode, WRITE and ZERO on a shared extent need CoW performed. After
CoW, new allocated extents needs to be remapped to the file.  So, add an
iomap_end for dax write ops to do the remapping work.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 fs/xfs/xfs_bmap_util.c |  3 ++-
 fs/xfs/xfs_file.c      |  9 +++------
 fs/xfs/xfs_iomap.c     | 30 +++++++++++++++++++++++++++++-
 fs/xfs/xfs_iomap.h     |  1 +
 fs/xfs/xfs_iops.c      | 11 ++++++++---
 fs/xfs/xfs_reflink.c   |  2 ++
 6 files changed, 45 insertions(+), 11 deletions(-)

Comments

Christoph Hellwig March 3, 2021, 9:43 a.m. UTC | #1
On Fri, Feb 26, 2021 at 08:20:29AM +0800, Shiyang Ruan wrote:
>  	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
> -			&xfs_buffered_write_iomap_ops);
> +		  IS_DAX(VFS_I(ip)) ?
> +		  &xfs_dax_write_iomap_ops : &xfs_buffered_write_iomap_ops);

Please add a xfs_zero_range helper that picks the right iomap_ops
instead of open coding this in a few places.

> +static int
> +xfs_dax_write_iomap_end(
> +	struct inode		*inode,
> +	loff_t			pos,
> +	loff_t			length,
> +	ssize_t			written,
> +	unsigned int		flags,
> +	struct iomap		*iomap)
> +{
> +	int			error = 0;
> +	xfs_inode_t		*ip = XFS_I(inode);
> +
> +	if (pos + written > i_size_read(inode)) {
> +		i_size_write(inode, pos + written);
> +		error = xfs_setfilesize(ip, pos, written);
> +	}
> +	if (xfs_is_cow_inode(ip))
> +		error = xfs_reflink_end_cow(ip, pos, written);
> +
> +	return error;

What is the advantage of the ioemap_end handler here?  It adds another
indirect funtion call to the fast path, so if we can avoid it, I'd
rather do that.

Also, shouldn't we cancel the COW rather than finishing it when setting
the file size fails?
Shiyang Ruan March 3, 2021, 9:57 a.m. UTC | #2
> 
> On Fri, Feb 26, 2021 at 08:20:29AM +0800, Shiyang Ruan wrote:
> >       error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
> > -                     &xfs_buffered_write_iomap_ops);
> > +               IS_DAX(VFS_I(ip)) ?
> > +               &xfs_dax_write_iomap_ops : &xfs_buffered_write_iomap_ops);
> 
> Please add a xfs_zero_range helper that picks the right iomap_ops
> instead of open coding this in a few places.

OK.  I'll add it.
> 
> > +static int
> > +xfs_dax_write_iomap_end(
> > +     struct inode            *inode,
> > +     loff_t                  pos,
> > +     loff_t                  length,
> > +     ssize_t                 written,
> > +     unsigned int            flags,
> > +     struct iomap            *iomap)
> > +{
> > +     int                     error = 0;
> > +     xfs_inode_t             *ip = XFS_I(inode);
> > +
> > +     if (pos + written > i_size_read(inode)) {
> > +             i_size_write(inode, pos + written);
> > +             error = xfs_setfilesize(ip, pos, written);
> > +     }
> > +     if (xfs_is_cow_inode(ip))
> > +             error = xfs_reflink_end_cow(ip, pos, written);
> > +
> > +     return error;
> 
> What is the advantage of the ioemap_end handler here?  It adds another
> indirect funtion call to the fast path, so if we can avoid it, I'd
> rather do that.

These code were in xfs_file_dax_write().  I moved them into the iomap_end
because the mmaped CoW need this.

I know this is not so good, but I could not find another better way. Do you
have any ideas? 

>
> Also, shouldn't we cancel the COW rather than finishing it when setting
> the file size fails?
> 

I did forget about this part.  Thanks for pointing out.


--
Thanks,
Ruan Shiyang.
Christoph Hellwig March 3, 2021, 10:43 a.m. UTC | #3
On Wed, Mar 03, 2021 at 09:57:48AM +0000, ruansy.fnst@fujitsu.com wrote:
> > What is the advantage of the ioemap_end handler here?  It adds another
> > indirect funtion call to the fast path, so if we can avoid it, I'd
> > rather do that.
> 
> These code were in xfs_file_dax_write().  I moved them into the iomap_end
> because the mmaped CoW need this.
> 
> I know this is not so good, but I could not find another better way. Do you
> have any ideas? 

mmaped copy is the copy_edge case?  Maybe just use different iomap_ops for
that case vs plain write?
Shiyang Ruan March 4, 2021, 1:35 a.m. UTC | #4
> On Wed, Mar 03, 2021 at 09:57:48AM +0000, ruansy.fnst@fujitsu.com wrote:
> > > What is the advantage of the ioemap_end handler here?  It adds another
> > > indirect funtion call to the fast path, so if we can avoid it, I'd
> > > rather do that.
> >
> > These code were in xfs_file_dax_write().  I moved them into the iomap_end
> > because the mmaped CoW need this.
> >
> > I know this is not so good, but I could not find another better way. Do you
> > have any ideas?
> mmaped copy is the copy_edge case?  Maybe just use different iomap_ops for
> that case vs plain write?

No, I mean mmaped CoW need a xfs_reflink_end_cow() to make sure the new extent
will be correctly remaped to the file.  Otherwise, the file will still refer to
the extent that srcmap point to.

We are able to call this in xfs_file_dax_write(), but cannot call it anywhere
except iomap_end in mmap path.


--
Thanks,
Ruan Shiyang.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 7371a7f7c652..65a8782b6378 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -977,7 +977,8 @@  xfs_free_file_space(
 	if (offset + len > XFS_ISIZE(ip))
 		len = XFS_ISIZE(ip) - offset;
 	error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
-			&xfs_buffered_write_iomap_ops);
+		  IS_DAX(VFS_I(ip)) ?
+		  &xfs_dax_write_iomap_ops : &xfs_buffered_write_iomap_ops);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5b0f93f73837..1987d15eab61 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -623,11 +623,8 @@  xfs_file_dax_write(
 	count = iov_iter_count(from);
 
 	trace_xfs_file_dax_write(ip, count, pos);
-	ret = dax_iomap_rw(iocb, from, &xfs_direct_write_iomap_ops);
-	if (ret > 0 && iocb->ki_pos > i_size_read(inode)) {
-		i_size_write(inode, iocb->ki_pos);
-		error = xfs_setfilesize(ip, pos, ret);
-	}
+	ret = dax_iomap_rw(iocb, from, &xfs_dax_write_iomap_ops);
+
 out:
 	xfs_iunlock(ip, iolock);
 	if (error)
@@ -1250,7 +1247,7 @@  __xfs_filemap_fault(
 
 		ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL,
 				(write_fault && !vmf->cow_page) ?
-				 &xfs_direct_write_iomap_ops :
+				 &xfs_dax_write_iomap_ops :
 				 &xfs_read_iomap_ops);
 		if (ret & VM_FAULT_NEEDDSYNC)
 			ret = dax_finish_sync_fault(vmf, pe_size, pfn);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 7b9ff824e82d..23c6f8c97047 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -771,7 +771,8 @@  xfs_direct_write_iomap_begin(
 
 		/* may drop and re-acquire the ilock */
 		error = xfs_reflink_allocate_cow(ip, &imap, &cmap, &shared,
-				&lockmode, flags & IOMAP_DIRECT);
+				&lockmode,
+				flags & IOMAP_DIRECT || IS_DAX(inode));
 		if (error)
 			goto out_unlock;
 		if (shared)
@@ -850,6 +851,33 @@  const struct iomap_ops xfs_direct_write_iomap_ops = {
 	.iomap_begin		= xfs_direct_write_iomap_begin,
 };
 
+static int
+xfs_dax_write_iomap_end(
+	struct inode		*inode,
+	loff_t			pos,
+	loff_t			length,
+	ssize_t			written,
+	unsigned int		flags,
+	struct iomap		*iomap)
+{
+	int			error = 0;
+	xfs_inode_t		*ip = XFS_I(inode);
+
+	if (pos + written > i_size_read(inode)) {
+		i_size_write(inode, pos + written);
+		error = xfs_setfilesize(ip, pos, written);
+	}
+	if (xfs_is_cow_inode(ip))
+		error = xfs_reflink_end_cow(ip, pos, written);
+
+	return error;
+}
+
+const struct iomap_ops xfs_dax_write_iomap_ops = {
+	.iomap_begin		= xfs_direct_write_iomap_begin,
+	.iomap_end		= xfs_dax_write_iomap_end,
+};
+
 static int
 xfs_buffered_write_iomap_begin(
 	struct inode		*inode,
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 7d3703556d0e..a361c2f27cf3 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -42,6 +42,7 @@  xfs_aligned_fsb_count(
 
 extern const struct iomap_ops xfs_buffered_write_iomap_ops;
 extern const struct iomap_ops xfs_direct_write_iomap_ops;
+extern const struct iomap_ops xfs_dax_write_iomap_ops;
 extern const struct iomap_ops xfs_read_iomap_ops;
 extern const struct iomap_ops xfs_seek_iomap_ops;
 extern const struct iomap_ops xfs_xattr_iomap_ops;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 67c8dc9de8aa..adf4467ab862 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -841,6 +841,7 @@  xfs_setattr_size(
 	int			error;
 	uint			lock_flags = 0;
 	bool			did_zeroing = false;
+	const struct iomap_ops	*ops;
 
 	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
 	ASSERT(xfs_isilocked(ip, XFS_MMAPLOCK_EXCL));
@@ -887,10 +888,15 @@  xfs_setattr_size(
 	 * extension, or zeroing out the rest of the block on a downward
 	 * truncate.
 	 */
+	if (IS_DAX(inode))
+		ops = &xfs_direct_write_iomap_ops;
+	else
+		ops = &xfs_buffered_write_iomap_ops;
+
 	if (newsize > oldsize) {
 		trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
 		error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-				&did_zeroing, &xfs_buffered_write_iomap_ops);
+				&did_zeroing, ops);
 	} else {
 		/*
 		 * iomap won't detect a dirty page over an unwritten block (or a
@@ -902,8 +908,7 @@  xfs_setattr_size(
 						     newsize);
 		if (error)
 			return error;
-		error = iomap_truncate_page(inode, newsize, &did_zeroing,
-				&xfs_buffered_write_iomap_ops);
+		error = iomap_truncate_page(inode, newsize, &did_zeroing, ops);
 	}
 
 	if (error)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index f5b3a3da36b7..dfe4e1912ff9 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1246,6 +1246,8 @@  xfs_reflink_zero_posteof(
 
 	trace_xfs_zero_eof(ip, isize, pos - isize);
 	return iomap_zero_range(VFS_I(ip), isize, pos - isize, NULL,
+			IS_DAX(VFS_I(ip)) ?
+			&xfs_dax_write_iomap_ops :
 			&xfs_buffered_write_iomap_ops);
 }