diff mbox

[67/71] xfs: fail ->bmap for reflink inodes

Message ID 147216835350.867.14512964799631563964.stgit@birch.djwong.org (mailing list archive)
State New, archived
Headers show

Commit Message

Darrick J. Wong Aug. 25, 2016, 11:39 p.m. UTC
From: Christoph Hellwig <hch@lst.de>

Have xfs_vm_bmap return zero for reflinked files.  This hack prevents
using a file with shared blocks as a swap file, because we don't want
to deal with CoW when we're (probably) low on memory.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick.wong@oracle.com: add a more descriptive changelog]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_aops.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Christoph Hellwig Sept. 6, 2016, 3:29 p.m. UTC | #1
On Thu, Aug 25, 2016 at 04:39:13PM -0700, Darrick J. Wong wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> Have xfs_vm_bmap return zero for reflinked files.  This hack prevents
> using a file with shared blocks as a swap file, because we don't want
> to deal with CoW when we're (probably) low on memory.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> [darrick.wong@oracle.com: add a more descriptive changelog]

Which happens to be incorrect :)  The swap code uses ->bmap to build
a logical to physical block map at swapon time (to avoid allocations
or even just block mappings under memory pressure I suspect).  This
means we'd get reliable data corruption when COWing a swap file.

> +	 * The swap code (ab-)uses ->bmap to get a block mapping and then
> +	 * bypasse?? the file system for actual I/O.  We really can't allow

Also it seems like I introduced some weird character instead of an
"s" here..
Darrick J. Wong Sept. 6, 2016, 4:26 p.m. UTC | #2
On Tue, Sep 06, 2016 at 08:29:48AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 25, 2016 at 04:39:13PM -0700, Darrick J. Wong wrote:
> > From: Christoph Hellwig <hch@lst.de>
> > 
> > Have xfs_vm_bmap return zero for reflinked files.  This hack prevents
> > using a file with shared blocks as a swap file, because we don't want
> > to deal with CoW when we're (probably) low on memory.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > [darrick.wong@oracle.com: add a more descriptive changelog]
> 
> Which happens to be incorrect :)  The swap code uses ->bmap to build
> a logical to physical block map at swapon time (to avoid allocations
> or even just block mappings under memory pressure I suspect).  This
> means we'd get reliable data corruption when COWing a swap file.

Yeah, at the time I thought I was going to write out a clever resolution to the
conflict between swap and CoW by using swap_activate to punch and fallocate all
the shared extents and turn off the inode reflink flag which would then enable
me to drop this patch altogether, but then got busy and forgot all about it.
Clearly I didn't fix the changelog here either. :(

"Have xfs_vm_bmap return zero for reflinked files because the bmap-based
swap code requires static block mappings, which is incompatible with
copy on write."

> 
> > +	 * The swap code (ab-)uses ->bmap to get a block mapping and then
> > +	 * bypasse?? the file system for actual I/O.  We really can't allow
> 
> Also it seems like I introduced some weird character instead of an
> "s" here..

Will fix that while I'm at it.

--D
Christoph Hellwig Sept. 6, 2016, 5:02 p.m. UTC | #3
On Tue, Sep 06, 2016 at 09:26:51AM -0700, Darrick J. Wong wrote:
> Yeah, at the time I thought I was going to write out a clever resolution to the
> conflict between swap and CoW by using swap_activate to punch and fallocate all
> the shared extents and turn off the inode reflink flag which would then enable
> me to drop this patch altogether, but then got busy and forgot all about it.
> Clearly I didn't fix the changelog here either. :(

I looked at that, but that code is such a mess, doing reads through
->readpage and writes using direct_IO and requiring all kinds of ropes
to jump through..

> "Have xfs_vm_bmap return zero for reflinked files because the bmap-based
> swap code requires static block mappings, which is incompatible with
> copy on write."

Sounds fine.  Might be worth addings something like: "And existing
userspace bmap users such as lilo will have the same problems".
diff mbox

Patch

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a85bc62..f564d02 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1553,6 +1553,17 @@  xfs_vm_bmap(
 
 	trace_xfs_vm_bmap(XFS_I(inode));
 	xfs_ilock(ip, XFS_IOLOCK_SHARED);
+
+	/*
+	 * The swap code (ab-)uses ->bmap to get a block mapping and then
+	 * bypasseѕ the file system for actual I/O.  We really can't allow
+	 * that on reflinks inodes, so we have to skip out here.  And yes,
+	 * 0 is the magic code for a bmap error..
+	 */
+	if (xfs_is_reflink_inode(ip)) {
+		xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+		return 0;
+	}
 	filemap_write_and_wait(mapping);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 	return generic_block_bmap(mapping, block, xfs_get_blocks);