diff mbox series

[2/7] xfs: always allocate blocks as unwritten for file data

Message ID 20181001123741.32005-3-hch@lst.de (mailing list archive)
State Deferred, archived
Headers show
Series [1/7] xfs: remove XFS_IO_INVALID | expand

Commit Message

Christoph Hellwig Oct. 1, 2018, 12:37 p.m. UTC
XFS historically had a small race that could lead to exposing
uninitialized data in case of a crash.  If we are filling holes using
buffered I/O we convert the delayed allocation to a real allocation
before writing out the data.  If we crash after the blocks were
allocated, but before the data was written this could lead to reading
uninitialized blocks (or leaked data from a previous allocation that was
reused).  Now that we have the CIL logging extent format changes is
cheap, so we can switch to always allocating blocks as unwritten.
Note that this is not be strictly necessary for writes that append
beyond i_size, but given that we have to log a transaction in that
case anyway we might as well give all block allocations a uniform
treatment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_aops.c  | 3 +--
 fs/xfs/xfs_aops.h  | 2 --
 fs/xfs/xfs_iomap.c | 4 ++--
 3 files changed, 3 insertions(+), 6 deletions(-)

Comments

Brian Foster Oct. 1, 2018, 2:19 p.m. UTC | #1
On Mon, Oct 01, 2018 at 05:37:36AM -0700, Christoph Hellwig wrote:
> XFS historically had a small race that could lead to exposing
> uninitialized data in case of a crash.  If we are filling holes using
> buffered I/O we convert the delayed allocation to a real allocation
> before writing out the data.  If we crash after the blocks were
> allocated, but before the data was written this could lead to reading
> uninitialized blocks (or leaked data from a previous allocation that was
> reused).  Now that we have the CIL logging extent format changes is
> cheap, so we can switch to always allocating blocks as unwritten.
> Note that this is not be strictly necessary for writes that append
> beyond i_size, but given that we have to log a transaction in that
> case anyway we might as well give all block allocations a uniform
> treatment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

It's great that we can finally fix this, particularly with such a simple
change. IIRC, the only real thing standing in the way was the buffer
head delalloc state management mess.

>  fs/xfs/xfs_aops.c  | 3 +--
>  fs/xfs/xfs_aops.h  | 2 --
>  fs/xfs/xfs_iomap.c | 4 ++--
>  3 files changed, 3 insertions(+), 6 deletions(-)
> 
...
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 6320aca39f39..10fc93cebc42 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -662,11 +662,11 @@ xfs_iomap_write_allocate(
>  	xfs_trans_t	*tp;
>  	int		nimaps;
>  	int		error = 0;
> -	int		flags = XFS_BMAPI_DELALLOC;
> +	int		flags = XFS_BMAPI_DELALLOC | XFS_BMAPI_PREALLOC;

... though I don't quite think this is sufficient. xfs_bmapi_allocate()
has this snippet of code:

        if ((!bma->wasdel || (bma->flags & XFS_BMAPI_COWFORK)) &&
            (bma->flags & XFS_BMAPI_PREALLOC) &&
            xfs_sb_version_hasextflgbit(&mp->m_sb))
                bma->got.br_state = XFS_EXT_UNWRITTEN;

... which looks like it explicitly bypasses the PREALLOC flag for
delalloc extents. I figured this would just be an inefficiency since
prealloc conversion comes later, but if you look at
xfs_bmapi_convert_unwritten():

        /* check if we need to do real->unwritten conversion */
        if (mval->br_state == XFS_EXT_NORM &&
            (flags & (XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT)) !=
                        (XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT))
                return 0;

... it sees this as an existing extent and so doesn't change the state
unless the CONVERT flag is also passed. A quick test to shut down
immediately after the xfs_iomap_write_allocate() transaction commits
seems to confirm this behavior:

# xfs_io -c "fiemap -v" /mnt/file
/mnt/file:
 EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
   0: [0..7]:          72..79               8   0x1

I think the right fix here is to remove the referenced logic from
xfs_bmapi_allocate(). I also think this demonstrates the need for an
xfstest. ;) Expected behavior should be easy to confirm with a new error
tag, for example.

Brian

>  	int		nres;
>  
>  	if (whichfork == XFS_COW_FORK)
> -		flags |= XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
> +		flags |= XFS_BMAPI_COWFORK;
>  
>  	/*
>  	 * Make sure that the dquots are there.
> -- 
> 2.19.0
>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 338b9d9984e0..775cdcfe70c2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -437,8 +437,7 @@  xfs_map_blocks(
 			imap.br_blockcount = cow_fsb - imap.br_startoff;
 
 		if (isnullstartblock(imap.br_startblock)) {
-			/* got a delalloc extent */
-			wpc->io_type = XFS_IO_DELALLOC;
+			wpc->io_type = XFS_IO_UNWRITTEN;
 			goto allocate_blocks;
 		}
 
diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h
index 494b4338446e..f0710c54cf68 100644
--- a/fs/xfs/xfs_aops.h
+++ b/fs/xfs/xfs_aops.h
@@ -13,7 +13,6 @@  extern struct bio_set xfs_ioend_bioset;
  */
 enum {
 	XFS_IO_HOLE,		/* covers region without any block allocation */
-	XFS_IO_DELALLOC,	/* covers delalloc region */
 	XFS_IO_UNWRITTEN,	/* covers allocated but uninitialized data */
 	XFS_IO_OVERWRITE,	/* covers already allocated extent */
 	XFS_IO_COW,		/* covers copy-on-write extent */
@@ -21,7 +20,6 @@  enum {
 
 #define XFS_IO_TYPES \
 	{ XFS_IO_HOLE,			"hole" },	\
-	{ XFS_IO_DELALLOC,		"delalloc" },	\
 	{ XFS_IO_UNWRITTEN,		"unwritten" },	\
 	{ XFS_IO_OVERWRITE,		"overwrite" },	\
 	{ XFS_IO_COW,			"CoW" }
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6320aca39f39..10fc93cebc42 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -662,11 +662,11 @@  xfs_iomap_write_allocate(
 	xfs_trans_t	*tp;
 	int		nimaps;
 	int		error = 0;
-	int		flags = XFS_BMAPI_DELALLOC;
+	int		flags = XFS_BMAPI_DELALLOC | XFS_BMAPI_PREALLOC;
 	int		nres;
 
 	if (whichfork == XFS_COW_FORK)
-		flags |= XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC;
+		flags |= XFS_BMAPI_COWFORK;
 
 	/*
 	 * Make sure that the dquots are there.