[12/26] xfs: reject all unaligned direct writes to reflinked files

Message ID 20170401063512.25313-13-hch@lst.de (mailing list archive)
State Superseded

Commit Message

Christoph Hellwig April 1, 2017, 6:34 a.m. UTC
commit 54a4ef8af4e0dc5c983d17fcb9cf5fd25666d94e upstream.

We currently fall back from direct to buffered writes if we detect a
remaining shared extent in the iomap_begin callback.  But by the time
iomap_begin is called for the potentially unaligned end block we might
have already written most of the data to disk, which we'd now write
again using buffered I/O.  To avoid this reject all writes to reflinked
files before starting I/O so that we are guaranteed to only write the
data once.

The alternative would be to unshare the unaligned start and/or end block
before doing the I/O. I think that's doable, and will actually be
required to support reflinks on DAX file system.  But it will take a
little more time and I'd rather get rid of the double write ASAP.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_file.c  |  9 +++++++++
 fs/xfs/xfs_iomap.c | 12 +-----------
 fs/xfs/xfs_trace.h |  2 +-
 3 files changed, 11 insertions(+), 12 deletions(-)

Comments

Greg KH April 1, 2017, 5:21 p.m. UTC | #1
On Sat, Apr 01, 2017 at 08:34:58AM +0200, Christoph Hellwig wrote:
> commit 54a4ef8af4e0dc5c983d17fcb9cf5fd25666d94e upstream.
> 
> We currently fall back from direct to buffered writes if we detect a
> remaining shared extent in the iomap_begin callback.  But by the time
> iomap_begin is called for the potentially unaligned end block we might
> have already written most of the data to disk, which we'd now write
> again using buffered I/O.  To avoid this reject all writes to reflinked
> files before starting I/O so that we are guaranteed to only write the
> data once.
> 
> The alternative would be to unshare the unaligned start and/or end block
> before doing the I/O. I think that's doable, and will actually be
> required to support reflinks on DAX file system.  But it will take a
> little more time and I'd rather get rid of the double write ASAP.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Any specific reason you don't want this one in 4.9 as well?

Just curious,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig April 1, 2017, 5:22 p.m. UTC | #2
On Sat, Apr 01, 2017 at 07:21:08PM +0200, Greg KH wrote:
> Any specific reason you don't want this one in 4.9 as well?

It won't quite apply as-is due to major changes in the direct I/O
code.  It would be good to have, but needs a slightly different approach.
I'll see if I'll have something ready for the next round of stable
updates.
Greg KH April 1, 2017, 5:26 p.m. UTC | #3
On Sat, Apr 01, 2017 at 07:21:08PM +0200, Greg KH wrote:
> On Sat, Apr 01, 2017 at 08:34:58AM +0200, Christoph Hellwig wrote:
> > commit 54a4ef8af4e0dc5c983d17fcb9cf5fd25666d94e upstream.
> > 
> > We currently fall back from direct to buffered writes if we detect a
> > remaining shared extent in the iomap_begin callback.  But by the time
> > iomap_begin is called for the potentially unaligned end block we might
> > have already written most of the data to disk, which we'd now write
> > again using buffered I/O.  To avoid this reject all writes to reflinked
> > files before starting I/O so that we are guaranteed to only write the
> > data once.
> > 
> > The alternative would be to unshare the unaligned start and/or end block
> > before doing the I/O. I think that's doable, and will actually be
> > required to support reflinks on DAX file system.  But it will take a
> > little more time and I'd rather get rid of the double write ASAP.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Brian Foster <bfoster@redhat.com>
> > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Any specific reason you don't want this one in 4.9 as well?

Nevermind, it's there, missed it, my fault...

greg k-h
Christoph Hellwig April 1, 2017, 5:42 p.m. UTC | #4
On Sat, Apr 01, 2017 at 07:26:03PM +0200, Greg KH wrote:
> Nevermind, it's there, missed it, my fault...

And I forgot that I actually updated it for the context changes
already and even commented on that in the changelog.

Patch

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 032c8a74824a..2a695a8f4fe7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -527,6 +527,15 @@  xfs_file_dio_aio_write(
 	if ((iocb->ki_pos & mp->m_blockmask) ||
 	    ((iocb->ki_pos + count) & mp->m_blockmask)) {
 		unaligned_io = 1;
+
+		/*
+		 * We can't properly handle unaligned direct I/O to reflink
+		 * files yet, as we can't unshare a partial block.
+		 */
+		if (xfs_is_reflink_inode(ip)) {
+			trace_xfs_reflink_bounce_dio_write(ip, iocb->ki_pos, count);
+			return -EREMCHG;
+		}
 		iolock = XFS_IOLOCK_EXCL;
 	} else {
 		iolock = XFS_IOLOCK_SHARED;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 32b113c4e973..e8811bd1019b 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1026,17 +1026,7 @@  xfs_file_iomap_begin(
 		if (error)
 			goto out_unlock;
 
-		/*
-		 * We're here because we're trying to do a directio write to a
-		 * region that isn't aligned to a filesystem block.  If the
-		 * extent is shared, fall back to buffered mode to handle the
-		 * RMW.
-		 */
-		if (!(flags & IOMAP_REPORT) && shared) {
-			trace_xfs_reflink_bounce_dio_write(ip, &imap);
-			error = -EREMCHG;
-			goto out_unlock;
-		}
+		ASSERT((flags & IOMAP_REPORT) || !shared);
 	}
 
 	if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d3d11905c55c..375c5e030e5b 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3250,7 +3250,7 @@  DEFINE_INODE_IREC_EVENT(xfs_reflink_convert_cow);
 DEFINE_RW_EVENT(xfs_reflink_reserve_cow);
 DEFINE_RW_EVENT(xfs_reflink_allocate_cow_range);
 
-DEFINE_INODE_IREC_EVENT(xfs_reflink_bounce_dio_write);
+DEFINE_SIMPLE_IO_EVENT(xfs_reflink_bounce_dio_write);
 DEFINE_IOMAP_EVENT(xfs_reflink_find_cow_mapping);
 DEFINE_INODE_IREC_EVENT(xfs_reflink_trim_irec);
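
Illustration (not part of the patch): the up-front check added to
xfs_file_dio_aio_write() can be sketched in plain userspace C as below.
This is only a rough model under stated assumptions: struct demo_file and
the demo_* helpers are hypothetical stand-ins for the inode/mount state and
for xfs_is_reflink_inode(), the block size is assumed to be a power of two,
and the EREMCHG fallback value is the Linux one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef EREMCHG
#define EREMCHG 78	/* assumption: Linux value, for platforms lacking it */
#endif

/* Hypothetical stand-in for the per-mount and per-inode state. */
struct demo_file {
	uint64_t blocksize;	/* filesystem block size, power of two */
	bool	 reflinked;	/* models xfs_is_reflink_inode(ip) */
};

/* True if either the start or the end of the write is not block aligned. */
static bool demo_is_unaligned(const struct demo_file *f,
			      uint64_t pos, uint64_t count)
{
	uint64_t blockmask = f->blocksize - 1;

	/* The mask trick only works for power-of-two block sizes. */
	assert(f->blocksize && (f->blocksize & blockmask) == 0);
	return (pos & blockmask) || ((pos + count) & blockmask);
}

/*
 * Decide before any I/O is issued: an unaligned direct write to a
 * reflinked file is bounced with -EREMCHG, so no data is written twice.
 */
static int demo_dio_write_check(const struct demo_file *f,
				uint64_t pos, uint64_t count)
{
	if (demo_is_unaligned(f, pos, count) && f->reflinked)
		return -EREMCHG;
	return 0;
}
```

With a 4096-byte block size, a write of 4096 bytes at offset 0 passes the
check, while the same write at offset 512 to a reflinked file is rejected
before any data reaches the disk.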