Message ID | 20181001123741.32005-3-hch@lst.de (mailing list archive) |
---|---|
State | Deferred, archived |
Headers | show |
Series | [1/7] xfs: remove XFS_IO_INVALID | expand |
On Mon, Oct 01, 2018 at 05:37:36AM -0700, Christoph Hellwig wrote: > XFS historically had a small race that could lead to exposing > uninitialized data in case of a crash. If we are filling holes using > buffered I/O we convert the delayed allocation to a real allocation > before writing out the data. If we crash after the blocks were > allocated, but before the data was written this could lead to reading > uninitialized blocks (or leaked data from a previous allocation that was > reused). Now that we have the CIL logging extent format changes is > cheap, so we can switch to always allocating blocks as unwritten. > Note that this is not be strictly necessary for writes that append > beyond i_size, but given that we have to log a transaction in that > case anyway we might as well give all block allocations a uniform > treatment. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- It's great that we can finally fix this, particularly with such a simple change. IIRC, the only real thing standing in the way was the buffer head delalloc state management mess. > fs/xfs/xfs_aops.c | 3 +-- > fs/xfs/xfs_aops.h | 2 -- > fs/xfs/xfs_iomap.c | 4 ++-- > 3 files changed, 3 insertions(+), 6 deletions(-) > ... > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c > index 6320aca39f39..10fc93cebc42 100644 > --- a/fs/xfs/xfs_iomap.c > +++ b/fs/xfs/xfs_iomap.c > @@ -662,11 +662,11 @@ xfs_iomap_write_allocate( > xfs_trans_t *tp; > int nimaps; > int error = 0; > - int flags = XFS_BMAPI_DELALLOC; > + int flags = XFS_BMAPI_DELALLOC | XFS_BMAPI_PREALLOC; ... though I don't quite think this is sufficient. xfs_bmapi_allocate() has this snippet of code: if ((!bma->wasdel || (bma->flags & XFS_BMAPI_COWFORK)) && (bma->flags & XFS_BMAPI_PREALLOC) && xfs_sb_version_hasextflgbit(&mp->m_sb)) bma->got.br_state = XFS_EXT_UNWRITTEN; ... which looks like it explicitly bypasses the PREALLOC flag for delalloc extents. I figured this would just be an inefficiency since prealloc conversion comes later, but if you look at xfs_bmapi_convert_unwritten(): /* check if we need to do real->unwritten conversion */ if (mval->br_state == XFS_EXT_NORM && (flags & (XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT)) != (XFS_BMAPI_PREALLOC | XFS_BMAPI_CONVERT)) return 0; ... it sees this as an existing extent and so doesn't change the state unless the CONVERT flag is also passed. A quick test to shut down immediately after the xfs_iomap_write_allocate() transaction commits seems to confirm this behavior: # xfs_io -c "fiemap -v" /mnt/file /mnt/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..7]: 72..79 8 0x1 I think the right fix here is to remove the referenced logic from xfs_bmapi_allocate(). I also think this demonstrates the need for an xfstest. ;) Expected behavior should be easy to confirm with a new error tag, for example. Brian > int nres; > > if (whichfork == XFS_COW_FORK) > - flags |= XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC; > + flags |= XFS_BMAPI_COWFORK; > > /* > * Make sure that the dquots are there. > -- > 2.19.0 >
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 338b9d9984e0..775cdcfe70c2 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -437,8 +437,7 @@ xfs_map_blocks( imap.br_blockcount = cow_fsb - imap.br_startoff; if (isnullstartblock(imap.br_startblock)) { - /* got a delalloc extent */ - wpc->io_type = XFS_IO_DELALLOC; + wpc->io_type = XFS_IO_UNWRITTEN; goto allocate_blocks; } diff --git a/fs/xfs/xfs_aops.h b/fs/xfs/xfs_aops.h index 494b4338446e..f0710c54cf68 100644 --- a/fs/xfs/xfs_aops.h +++ b/fs/xfs/xfs_aops.h @@ -13,7 +13,6 @@ extern struct bio_set xfs_ioend_bioset; */ enum { XFS_IO_HOLE, /* covers region without any block allocation */ - XFS_IO_DELALLOC, /* covers delalloc region */ XFS_IO_UNWRITTEN, /* covers allocated but uninitialized data */ XFS_IO_OVERWRITE, /* covers already allocated extent */ XFS_IO_COW, /* covers copy-on-write extent */ @@ -21,7 +20,6 @@ enum { #define XFS_IO_TYPES \ { XFS_IO_HOLE, "hole" }, \ - { XFS_IO_DELALLOC, "delalloc" }, \ { XFS_IO_UNWRITTEN, "unwritten" }, \ { XFS_IO_OVERWRITE, "overwrite" }, \ { XFS_IO_COW, "CoW" } diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 6320aca39f39..10fc93cebc42 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -662,11 +662,11 @@ xfs_iomap_write_allocate( xfs_trans_t *tp; int nimaps; int error = 0; - int flags = XFS_BMAPI_DELALLOC; + int flags = XFS_BMAPI_DELALLOC | XFS_BMAPI_PREALLOC; int nres; if (whichfork == XFS_COW_FORK) - flags |= XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC; + flags |= XFS_BMAPI_COWFORK; /* * Make sure that the dquots are there.
XFS historically had a small race that could lead to exposing uninitialized data in case of a crash. If we are filling holes using buffered I/O we convert the delayed allocation to a real allocation before writing out the data. If we crash after the blocks were allocated, but before the data was written this could lead to reading uninitialized blocks (or leaked data from a previous allocation that was reused). Now that we have the CIL logging extent format changes is cheap, so we can switch to always allocating blocks as unwritten. Note that this is not be strictly necessary for writes that append beyond i_size, but given that we have to log a transaction in that case anyway we might as well give all block allocations a uniform treatment. Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_aops.c | 3 +-- fs/xfs/xfs_aops.h | 2 -- fs/xfs/xfs_iomap.c | 4 ++-- 3 files changed, 3 insertions(+), 6 deletions(-)