[v4,1/7] fsdax: Introduce dax_iomap_cow_copy()

Message ID 20210408120432.1063608-2-ruansy.fnst@fujitsu.com (mailing list archive)
State New, archived
Series fsdax,xfs: Add reflink&dedupe support for fsdax

Commit Message

Shiyang Ruan April 8, 2021, 12:04 p.m. UTC
In the case where the iomap is a write operation and iomap is not equal
to srcmap after iomap_begin, we consider it a CoW operation.

The destination extent which iomap indicates is a newly allocated extent,
so the data needs to be copied from srcmap to the newly allocated extent.
In theory, it is better to copy only the head and tail ranges which lie
outside of the non-aligned area instead of copying the whole aligned
range. But in a dax page fault, it will always be an aligned range, so
we have to copy the whole range in this case.

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/dax.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 77 insertions(+), 5 deletions(-)

Comments

Darrick J. Wong April 8, 2021, 9:53 p.m. UTC | #1
On Thu, Apr 08, 2021 at 08:04:26PM +0800, Shiyang Ruan wrote:
> In the case where the iomap is a write operation and iomap is not equal
> to srcmap after iomap_begin, we consider it is a CoW operation.
> 
> The destination extent which iomap indicated is new allocated extent.
> So, it is needed to copy the data from srcmap to new allocated extent.
> In theory, it is better to copy the head and tail ranges which is
> outside of the non-aligned area instead of copying the whole aligned
> range. But in dax page fault, it will always be an aligned range.  So,
> we have to copy the whole range in this case.
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/dax.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 77 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index 8d7e4e2cc0fb..b4fd3813457a 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -1038,6 +1038,61 @@ static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
>  	return rc;
>  }
>  
> +/**
> + * dax_iomap_cow_copy(): Copy the data from source to destination before write.
> + * @pos:	address to do copy from.
> + * @length:	size of copy operation.
> + * @align_size:	aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
> + * @srcmap:	iomap srcmap
> + * @daddr:	destination address to copy to.
> + *
> + * This can be called from two places. Either during DAX write fault, to copy
> + * the length size data to daddr. Or, while doing normal DAX write operation,
> + * dax_iomap_actor() might call this to do the copy of either start or end
> + * unaligned address. In this case the rest of the copy of aligned ranges is
> + * taken care by dax_iomap_actor() itself.

Er... what?  This description is very confusing to me.  /me reads the
code, and ...

OH.

Given a range (pos, length) and a mapping for a source file, this
function copies all the bytes between pos and (pos + length) to daddr if
the range is aligned to @align_size.  But if pos and length are not both
aligned to @align_size then it'll copy *around* the range, leaving the
area in the middle uncopied waiting for write_iter to fill it in with
whatever's in the iovec.

Yikes, this function is doing double duty and ought to be split into
two functions.

The first function does the COW work for a write fault to an mmap
region and does a straight copy.  Page faults are always aligned, so
this functionality is needed by dax_fault_actor.  Maybe this could be
named dax_fault_cow?

The second function does the prep COW work *around* a write so that we
always copy entire page/blocks.  This cow-around code is needed by
dax_iomap_actor.  This should perhaps be named dax_iomap_cow_around()?
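
To make the split concrete, here is a minimal user-space sketch of the two
behaviours, using plain memcpy() on ordinary buffers.  It is not kernel
code: fault_cow() and cow_around() are hypothetical stand-ins for the
names suggested above, and both buffers are assumed to start at the
aligned block that contains pos.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for dax_fault_cow(): aligned range, straight copy. */
static void fault_cow(void *daddr, const void *saddr, size_t length)
{
	memcpy(daddr, saddr, length);
}

/*
 * Hypothetical stand-in for dax_iomap_cow_around(): copy only the bytes of
 * the aligned block(s) that lie outside [pos, pos + length).  daddr and
 * saddr point at the start of the aligned block containing pos.
 * Returns the number of bytes copied.
 */
static size_t cow_around(size_t pos, size_t length, size_t align_size,
			 void *daddr, const void *saddr)
{
	size_t head_off, end, pg_end, tail_off, tail_len;

	/* align_size must be a power of two (PAGE_SIZE or PMD_SIZE in the patch) */
	assert(align_size && (align_size & (align_size - 1)) == 0);

	head_off = pos & (align_size - 1);		/* bytes before pos */
	end = pos + length;
	pg_end = (end + align_size - 1) & ~(align_size - 1);
	tail_off = head_off + length;			/* offset of first tail byte */
	tail_len = pg_end - end;			/* bytes after the write */

	if (head_off)
		memcpy(daddr, saddr, head_off);
	if (tail_len)
		memcpy((char *)daddr + tail_off,
		       (const char *)saddr + tail_off, tail_len);

	return head_off + tail_len;
}
```

With pos = 4196, length = 200 and align_size = 4096, cow_around() copies
the 100 bytes before the write and the 3796 bytes after it, and leaves the
200 bytes in the middle for the write itself.  For a fully aligned range
it copies nothing, which is why the fault path needs the straight copy.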

> + * Also, note DAX fault will always result in aligned pos and pos + length.
> + */
> +static int dax_iomap_cow_copy(loff_t pos, loff_t length, size_t align_size,
> +		struct iomap *srcmap, void *daddr)
> +{
> +	loff_t head_off = pos & (align_size - 1);
> +	size_t size = ALIGN(head_off + length, align_size);
> +	loff_t end = pos + length;
> +	loff_t pg_end = round_up(end, align_size);
> +	bool copy_all = head_off == 0 && end == pg_end;
> +	void *saddr = 0;
> +	int ret = 0;
> +
> +	ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
> +	if (ret)
> +		return ret;
> +
> +	if (copy_all) {
> +		ret = copy_mc_to_kernel(daddr, saddr, length);
> +		return ret ? -EIO : 0;

I find it /very/ interesting that copy_mc_to_kernel takes an unsigned
int argument but returns an unsigned long (counting the bytes that
didn't get copied, oddly...but that's an existing API so I guess I'll
let it go.)
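
For anyone following along, the return convention can be sketched in user
space like this.  copy_sim() is a hypothetical stand-in that fails after
fail_at bytes and reports how many bytes were NOT copied; copy_or_eio()
shows the "any remainder means -EIO" mapping that the patch uses.  This
only illustrates the convention, not the real machine-check-safe copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in: copy at most fail_at bytes, then "fault".
 * Like the API discussed above, it returns the number of bytes that
 * were NOT copied, so 0 means complete success.
 */
static unsigned long copy_sim(void *dst, const void *src, unsigned cnt,
			      unsigned fail_at)
{
	unsigned done = cnt < fail_at ? cnt : fail_at;

	assert(dst && src);
	memcpy(dst, src, done);
	return cnt - done;
}

/* Collapse "bytes remaining" into an errno, as the patch does. */
static int copy_or_eio(void *dst, const void *src, unsigned cnt,
		       unsigned fail_at)
{
	return copy_sim(dst, src, cnt, fail_at) ? -EIO : 0;
}
```

Mapping any nonzero remainder to -EIO discards the partial progress, which
seems fine here since a partially copied CoW block is not usable anyway.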

> +	}
> +
> +	/* Copy the head part of the range.  Note: we pass offset as length. */
> +	if (head_off) {
> +		ret = copy_mc_to_kernel(daddr, saddr, head_off);
> +		if (ret)
> +			return -EIO;
> +	}
> +
> +	/* Copy the tail part of the range */
> +	if (end < pg_end) {
> +		loff_t tail_off = head_off + length;
> +		loff_t tail_len = pg_end - end;
> +
> +		ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off,
> +					tail_len);
> +		if (ret)
> +			return -EIO;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * The user has performed a load from a hole in the file.  Allocating a new
>   * page in the file would cause excessive storage usage for workloads with
> @@ -1167,11 +1222,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  	struct dax_device *dax_dev = iomap->dax_dev;
>  	struct iov_iter *iter = data;
>  	loff_t end = pos + length, done = 0;
> +	bool write = iov_iter_rw(iter) == WRITE;
>  	ssize_t ret = 0;
>  	size_t xfer;
>  	int id;
>  
> -	if (iov_iter_rw(iter) == READ) {
> +	if (!write) {
>  		end = min(end, i_size_read(inode));
>  		if (pos >= end)
>  			return 0;
> @@ -1180,7 +1236,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  			return iov_iter_zero(min(length, end - pos), iter);
>  	}
>  
> -	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
> +	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED &&
> +			!(iomap->flags & IOMAP_F_SHARED)))

This is a bit subtle.  Could we add a comment:

	/*
	 * In DAX mode, we allow either pure overwrites of written extents,
	 * or writes to unwritten extents as part of a copy-on-write
	 * operation.
	 */
	if (WARN_ON_ONCE(...))

>  		return -EIO;
>  
>  	/*
> @@ -1219,6 +1276,13 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  			break;
>  		}
>  
> +		if (write && srcmap->addr != iomap->addr) {

Do you have to check if srcmap is not a hole?

--D

> +			ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
> +						 kaddr);
> +			if (ret)
> +				break;
> +		}
> +
>  		map_len = PFN_PHYS(map_len);
>  		kaddr += offset;
>  		map_len -= offset;
> @@ -1230,7 +1294,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
>  		 * validated via access_ok() in either vfs_read() or
>  		 * vfs_write(), depending on which operation we are doing.
>  		 */
> -		if (iov_iter_rw(iter) == WRITE)
> +		if (write)
>  			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
>  					map_len, iter);
>  		else
> @@ -1382,6 +1446,7 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
>  	unsigned long entry_flags = pmd ? DAX_PMD : 0;
>  	int err = 0;
>  	pfn_t pfn;
> +	void *kaddr;
>  
>  	/* if we are reading UNWRITTEN and HOLE, return a hole. */
>  	if (!write &&
> @@ -1392,18 +1457,25 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
>  			return dax_pmd_load_hole(xas, vmf, iomap, entry);
>  	}
>  
> -	if (iomap->type != IOMAP_MAPPED) {
> +	if (iomap->type != IOMAP_MAPPED && !(iomap->flags & IOMAP_F_SHARED)) {
>  		WARN_ON_ONCE(1);
>  		return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
>  	}
>  
> -	err = dax_iomap_direct_access(iomap, pos, size, NULL, &pfn);
> +	err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn);
>  	if (err)
>  		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
>  
>  	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags,
>  				  write && !sync);
>  
> +	if (write &&
> +	    srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {
> +		err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr);
> +		if (err)
> +			return dax_fault_return(err);
> +	}
> +
>  	if (sync)
>  		return dax_fault_synchronous_pfnp(pfnp, pfn);
>  
> -- 
> 2.31.0
> 
> 
>
Shiyang Ruan April 9, 2021, 2:30 a.m. UTC | #2
> -----Original Message-----
> From: Darrick J. Wong <djwong@kernel.org>
> Sent: Friday, April 9, 2021 5:53 AM
> Subject: Re: [PATCH v4 1/7] fsdax: Introduce dax_iomap_cow_copy()
> 
> On Thu, Apr 08, 2021 at 08:04:26PM +0800, Shiyang Ruan wrote:
> > In the case where the iomap is a write operation and iomap is not
> > equal to srcmap after iomap_begin, we consider it is a CoW operation.
> >
> > The destination extent which iomap indicated is new allocated extent.
> > So, it is needed to copy the data from srcmap to new allocated extent.
> > In theory, it is better to copy the head and tail ranges which is
> > outside of the non-aligned area instead of copying the whole aligned
> > range. But in dax page fault, it will always be an aligned range.  So,
> > we have to copy the whole range in this case.
> >
> > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/dax.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 77 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 8d7e4e2cc0fb..b4fd3813457a 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -1038,6 +1038,61 @@ static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
> >  	return rc;
> >  }
> >
> > +/**
> > + * dax_iomap_cow_copy(): Copy the data from source to destination before write.
> > + * @pos:	address to do copy from.
> > + * @length:	size of copy operation.
> > + * @align_size:	aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
> > + * @srcmap:	iomap srcmap
> > + * @daddr:	destination address to copy to.
> > + *
> > + * This can be called from two places. Either during DAX write fault, to copy
> > + * the length size data to daddr. Or, while doing normal DAX write operation,
> > + * dax_iomap_actor() might call this to do the copy of either start or end
> > + * unaligned address. In this case the rest of the copy of aligned ranges is
> > + * taken care by dax_iomap_actor() itself.
> 
> Er... what?  This description is very confusing to me.  /me reads the code,
> and ...
> 
> OH.
> 
> Given a range (pos, length) and a mapping for a source file, this function copies
> all the bytes between pos and (pos + length) to daddr if the range is aligned to
> @align_size.  But if pos and length are not both aligned to @align_size then it'll
> copy *around* the range, leaving the area in the middle uncopied waiting for
> write_iter to fill it in with whatever's in the iovec.
> 
> Yikes, this function is doing double duty and ought to be split into two functions.
> 
> The first function does the COW work for a write fault to an mmap region and
> does a straight copy.  Page faults are always aligned, so this functionality is
> needed by dax_fault_actor.  Maybe this could be named dax_fault_cow?
> 
> The second function does the prep COW work *around* a write so that we
> always copy entire page/blocks.  This cow-around code is needed by
> dax_iomap_actor.  This should perhaps be named dax_iomap_cow_around()?

Two functions do seem easier to understand.  But I think the code from the
start of the function down to the dax_iomap_direct_access() call would be
duplicated in both functions.  How about improving the description instead?

> 
> > + * Also, note DAX fault will always result in aligned pos and pos + length.
> > + */
> > +static int dax_iomap_cow_copy(loff_t pos, loff_t length, size_t align_size,
> > +		struct iomap *srcmap, void *daddr)
> > +{
> > +	loff_t head_off = pos & (align_size - 1);
> > +	size_t size = ALIGN(head_off + length, align_size);
> > +	loff_t end = pos + length;
> > +	loff_t pg_end = round_up(end, align_size);
> > +	bool copy_all = head_off == 0 && end == pg_end;
> > +	void *saddr = 0;
> > +	int ret = 0;
> > +
> > +	ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (copy_all) {
> > +		ret = copy_mc_to_kernel(daddr, saddr, length);
> > +		return ret ? -EIO : 0;
> 
> I find it /very/ interesting that copy_mc_to_kernel takes an unsigned int
> argument but returns an unsigned long (counting the bytes that didn't get
> copied, oddly...but that's an existing API so I guess I'll let it go.)
> 
> > +	}
> > +
> > +	/* Copy the head part of the range.  Note: we pass offset as length. */
> > +	if (head_off) {
> > +		ret = copy_mc_to_kernel(daddr, saddr, head_off);
> > +		if (ret)
> > +			return -EIO;
> > +	}
> > +
> > +	/* Copy the tail part of the range */
> > +	if (end < pg_end) {
> > +		loff_t tail_off = head_off + length;
> > +		loff_t tail_len = pg_end - end;
> > +
> > +		ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off,
> > +					tail_len);
> > +		if (ret)
> > +			return -EIO;
> > +	}
> > +	return 0;
> > +}
> > +
> >  /*
> >   * The user has performed a load from a hole in the file.  Allocating a new
> >   * page in the file would cause excessive storage usage for workloads with
> > @@ -1167,11 +1222,12 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  	struct dax_device *dax_dev = iomap->dax_dev;
> >  	struct iov_iter *iter = data;
> >  	loff_t end = pos + length, done = 0;
> > +	bool write = iov_iter_rw(iter) == WRITE;
> >  	ssize_t ret = 0;
> >  	size_t xfer;
> >  	int id;
> >
> > -	if (iov_iter_rw(iter) == READ) {
> > +	if (!write) {
> >  		end = min(end, i_size_read(inode));
> >  		if (pos >= end)
> >  			return 0;
> > @@ -1180,7 +1236,8 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  			return iov_iter_zero(min(length, end - pos), iter);
> >  	}
> >
> > -	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
> > +	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED &&
> > +			!(iomap->flags & IOMAP_F_SHARED)))
> 
> This is a bit subtle.  Could we add a comment:
> 
> 	/*
> 	 * In DAX mode, we allow either pure overwrites of written extents,
> 	 * or writes to unwritten extents as part of a copy-on-write
> 	 * operation.
> 	 */
> 	if (WARN_ON_ONCE(...))

OK.

> 
> >  		return -EIO;
> >
> >  	/*
> > @@ -1219,6 +1276,13 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  			break;
> >  		}
> >
> > +		if (write && srcmap->addr != iomap->addr) {
> 
> Do you have to check if srcmap is not a hole?

dax_iomap_actor() is called by iomap_apply(), which has already checked
srcmap: if srcmap is a HOLE, then iomap_apply() passes iomap to the actor
as srcmap, so iomap == srcmap.  That is why I didn't check it here.  But
dax_fault_actor() does not go through iomap_apply(), so the check is
needed there.
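
As a rough illustration of that argument, here is a small user-space
sketch.  The struct and enum are simplified, hypothetical stand-ins for
the iomap types, and pick_srcmap() only models the selection described
above; it is not the actual iomap_apply() code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the iomap type and struct. */
enum map_type { MAP_HOLE = 0, MAP_MAPPED };

struct map {
	enum map_type type;
	unsigned long long addr;
};

/*
 * Models the selection described above: when srcmap is a hole, the
 * actor is handed iomap itself as its srcmap.
 */
static const struct map *pick_srcmap(const struct map *iomap,
				     const struct map *srcmap)
{
	return srcmap->type != MAP_HOLE ? srcmap : iomap;
}

/* The write-path test from the patch, applied to what the actor sees. */
static bool needs_cow_copy(bool write, const struct map *iomap,
			   const struct map *srcmap)
{
	const struct map *src = pick_srcmap(iomap, srcmap);

	return write && src->addr != iomap->addr;
}
```

With a hole as srcmap the actor sees src == iomap, the addresses compare
equal, and no copy is attempted.  A caller that skips this selection step
has to test for the hole explicitly.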


--
Thanks,
Ruan Shiyang.
> 
> --D
> 
> > +			ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
> > +						 kaddr);
> > +			if (ret)
> > +				break;
> > +		}
> > +
> >  		map_len = PFN_PHYS(map_len);
> >  		kaddr += offset;
> >  		map_len -= offset;
> > @@ -1230,7 +1294,7 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
> >  		 * validated via access_ok() in either vfs_read() or
> >  		 * vfs_write(), depending on which operation we are doing.
> >  		 */
> > -		if (iov_iter_rw(iter) == WRITE)
> > +		if (write)
> >  			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
> >  					map_len, iter);
> >  		else
> > @@ -1382,6 +1446,7 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
> >  	unsigned long entry_flags = pmd ? DAX_PMD : 0;
> >  	int err = 0;
> >  	pfn_t pfn;
> > +	void *kaddr;
> >
> >  	/* if we are reading UNWRITTEN and HOLE, return a hole. */
> >  	if (!write &&
> > @@ -1392,18 +1457,25 @@ static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
> >  			return dax_pmd_load_hole(xas, vmf, iomap, entry);
> >  	}
> >
> > -	if (iomap->type != IOMAP_MAPPED) {
> > +	if (iomap->type != IOMAP_MAPPED && !(iomap->flags & IOMAP_F_SHARED)) {
> >  		WARN_ON_ONCE(1);
> >  		return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
> >  	}
> >
> > -	err = dax_iomap_direct_access(iomap, pos, size, NULL, &pfn);
> > +	err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn);
> >  	if (err)
> >  		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
> >
> >  	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags,
> >  				  write && !sync);
> >
> > +	if (write &&
> > +	    srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {
> > +		err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr);
> > +		if (err)
> > +			return dax_fault_return(err);
> > +	}
> > +
> >  	if (sync)
> >  		return dax_fault_synchronous_pfnp(pfnp, pfn);
> >
> > --
> > 2.31.0
> >
> >
> >
Patch

diff --git a/fs/dax.c b/fs/dax.c
index 8d7e4e2cc0fb..b4fd3813457a 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1038,6 +1038,61 @@  static int dax_iomap_direct_access(struct iomap *iomap, loff_t pos, size_t size,
 	return rc;
 }
 
+/**
+ * dax_iomap_cow_copy(): Copy the data from source to destination before write.
+ * @pos:	address to do copy from.
+ * @length:	size of copy operation.
+ * @align_size:	aligned w.r.t align_size (either PMD_SIZE or PAGE_SIZE)
+ * @srcmap:	iomap srcmap
+ * @daddr:	destination address to copy to.
+ *
+ * This can be called from two places. Either during DAX write fault, to copy
+ * the length size data to daddr. Or, while doing normal DAX write operation,
+ * dax_iomap_actor() might call this to do the copy of either start or end
+ * unaligned address. In this case the rest of the copy of aligned ranges is
+ * taken care by dax_iomap_actor() itself.
+ * Also, note DAX fault will always result in aligned pos and pos + length.
+ */
+static int dax_iomap_cow_copy(loff_t pos, loff_t length, size_t align_size,
+		struct iomap *srcmap, void *daddr)
+{
+	loff_t head_off = pos & (align_size - 1);
+	size_t size = ALIGN(head_off + length, align_size);
+	loff_t end = pos + length;
+	loff_t pg_end = round_up(end, align_size);
+	bool copy_all = head_off == 0 && end == pg_end;
+	void *saddr = 0;
+	int ret = 0;
+
+	ret = dax_iomap_direct_access(srcmap, pos, size, &saddr, NULL);
+	if (ret)
+		return ret;
+
+	if (copy_all) {
+		ret = copy_mc_to_kernel(daddr, saddr, length);
+		return ret ? -EIO : 0;
+	}
+
+	/* Copy the head part of the range.  Note: we pass offset as length. */
+	if (head_off) {
+		ret = copy_mc_to_kernel(daddr, saddr, head_off);
+		if (ret)
+			return -EIO;
+	}
+
+	/* Copy the tail part of the range */
+	if (end < pg_end) {
+		loff_t tail_off = head_off + length;
+		loff_t tail_len = pg_end - end;
+
+		ret = copy_mc_to_kernel(daddr + tail_off, saddr + tail_off,
+					tail_len);
+		if (ret)
+			return -EIO;
+	}
+	return 0;
+}
+
 /*
  * The user has performed a load from a hole in the file.  Allocating a new
  * page in the file would cause excessive storage usage for workloads with
@@ -1167,11 +1222,12 @@  dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	struct dax_device *dax_dev = iomap->dax_dev;
 	struct iov_iter *iter = data;
 	loff_t end = pos + length, done = 0;
+	bool write = iov_iter_rw(iter) == WRITE;
 	ssize_t ret = 0;
 	size_t xfer;
 	int id;
 
-	if (iov_iter_rw(iter) == READ) {
+	if (!write) {
 		end = min(end, i_size_read(inode));
 		if (pos >= end)
 			return 0;
@@ -1180,7 +1236,8 @@  dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			return iov_iter_zero(min(length, end - pos), iter);
 	}
 
-	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
+	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED &&
+			!(iomap->flags & IOMAP_F_SHARED)))
 		return -EIO;
 
 	/*
@@ -1219,6 +1276,13 @@  dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 			break;
 		}
 
+		if (write && srcmap->addr != iomap->addr) {
+			ret = dax_iomap_cow_copy(pos, length, PAGE_SIZE, srcmap,
+						 kaddr);
+			if (ret)
+				break;
+		}
+
 		map_len = PFN_PHYS(map_len);
 		kaddr += offset;
 		map_len -= offset;
@@ -1230,7 +1294,7 @@  dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		 * validated via access_ok() in either vfs_read() or
 		 * vfs_write(), depending on which operation we are doing.
 		 */
-		if (iov_iter_rw(iter) == WRITE)
+		if (write)
 			xfer = dax_copy_from_iter(dax_dev, pgoff, kaddr,
 					map_len, iter);
 		else
@@ -1382,6 +1446,7 @@  static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
 	unsigned long entry_flags = pmd ? DAX_PMD : 0;
 	int err = 0;
 	pfn_t pfn;
+	void *kaddr;
 
 	/* if we are reading UNWRITTEN and HOLE, return a hole. */
 	if (!write &&
@@ -1392,18 +1457,25 @@  static vm_fault_t dax_fault_actor(struct vm_fault *vmf, pfn_t *pfnp,
 			return dax_pmd_load_hole(xas, vmf, iomap, entry);
 	}
 
-	if (iomap->type != IOMAP_MAPPED) {
+	if (iomap->type != IOMAP_MAPPED && !(iomap->flags & IOMAP_F_SHARED)) {
 		WARN_ON_ONCE(1);
 		return pmd ? VM_FAULT_FALLBACK : VM_FAULT_SIGBUS;
 	}
 
-	err = dax_iomap_direct_access(iomap, pos, size, NULL, &pfn);
+	err = dax_iomap_direct_access(iomap, pos, size, &kaddr, &pfn);
 	if (err)
 		return pmd ? VM_FAULT_FALLBACK : dax_fault_return(err);
 
 	*entry = dax_insert_entry(xas, mapping, vmf, *entry, pfn, entry_flags,
 				  write && !sync);
 
+	if (write &&
+	    srcmap->addr != IOMAP_HOLE && srcmap->addr != iomap->addr) {
+		err = dax_iomap_cow_copy(pos, size, size, srcmap, kaddr);
+		if (err)
+			return dax_fault_return(err);
+	}
+
 	if (sync)
 		return dax_fault_synchronous_pfnp(pfnp, pfn);