Message ID | 159011600308.76931.7853207930055232164.stgit@magnolia (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | xfs: fix stale disk exposure after crash | expand |
On Thu, May 21, 2020 at 07:53:23PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > When writing to a delalloc region in the data fork, commit the new > allocations (of the da reservation) as unwritten so that the mappings > are only marked written once writeback completes successfully. This > fixes the problem of stale data exposure if the system goes down during > targeted writeback of a specific region of a file, as tested by > generic/042. > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > Reviewed-by: Christoph Hellwig <hch@lst.de> > Reviewed-by: Brian Foster <bfoster@redhat.com> Observation: yesterday I forced a 4kB file create workload to use unwritten extents by setting an extent size hint. That caused about 4,500 xfs-conv kworker threads to be spawned by the workload which had 16 userspace processes creating files... I expect that any sort of "create lots of small files" worklaod is going to cause xfs-conv kworker explosions, so be prepared for users to start reporting kworker explosions with this in place... Cheers, Dave.
On Fri, May 22, 2020 at 01:31:02PM +1000, Dave Chinner wrote: > On Thu, May 21, 2020 at 07:53:23PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > When writing to a delalloc region in the data fork, commit the new > > allocations (of the da reservation) as unwritten so that the mappings > > are only marked written once writeback completes successfully. This > > fixes the problem of stale data exposure if the system goes down during > > targeted writeback of a specific region of a file, as tested by > > generic/042. > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > Reviewed-by: Brian Foster <bfoster@redhat.com> > > Observation: yesterday I forced a 4kB file create workload > to use unwritten extents by setting an extent size hint. That caused > about 4,500 xfs-conv kworker threads to be spawned by the workload > which had 16 userspace processes creating files... > > I expect that any sort of "create lots of small files" worklaod is > going to cause xfs-conv kworker explosions, so be prepared for users > to start reporting kworker explosions with this in place... Yes, I've been running this patch internally for months and /so far/ the conv explosions haven't generated any additional support calls. (That said, we probably ought to constrain that a bit, there's really no point in allowing concurrency beyond some unholy mix of AG count and the estimated iops capacity of the storage...) --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index fda13cd7add0..825d170e1503 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4193,17 +4193,7 @@ xfs_bmapi_allocate( bma->got.br_blockcount = bma->length; bma->got.br_state = XFS_EXT_NORM; - /* - * In the data fork, a wasdelay extent has been initialized, so - * shouldn't be flagged as unwritten. - * - * For the cow fork, however, we convert delalloc reservations - * (extents allocated for speculative preallocation) to - * allocated unwritten extents, and only convert the unwritten - * extents to real extents when we're about to write the data. - */ - if ((!bma->wasdel || (bma->flags & XFS_BMAPI_COWFORK)) && - (bma->flags & XFS_BMAPI_PREALLOC)) + if (bma->flags & XFS_BMAPI_PREALLOC) bma->got.br_state = XFS_EXT_UNWRITTEN; if (bma->wasdel) @@ -4611,8 +4601,24 @@ xfs_bmapi_convert_delalloc( bma.offset = bma.got.br_startoff; bma.length = max_t(xfs_filblks_t, bma.got.br_blockcount, MAXEXTLEN); bma.minleft = xfs_bmapi_minleft(tp, ip, whichfork); + + /* + * When we're converting the delalloc reservations backing dirty pages + * in the page cache, we must be careful about how we create the new + * extents: + * + * New CoW fork extents are created unwritten, turned into real extents + * when we're about to write the data to disk, and mapped into the data + * fork after the write finishes. End of story. + * + * New data fork extents must be mapped in as unwritten and converted + * to real extents after the write succeeds to avoid exposing stale + * disk contents if we crash. + */ if (whichfork == XFS_COW_FORK) bma.flags = XFS_BMAPI_COWFORK | XFS_BMAPI_PREALLOC; + else + bma.flags = XFS_BMAPI_PREALLOC; if (!xfs_iext_peek_prev_extent(ifp, &bma.icur, &bma.prev)) bma.prev.br_startoff = NULLFILEOFF;