diff mbox series

xfs: fix shared extent data corruption due to missing cow reservation

Message ID 20181113170819.15220-1-bfoster@redhat.com (mailing list archive)
State Superseded, archived
Headers show
Series xfs: fix shared extent data corruption due to missing cow reservation | expand

Commit Message

Brian Foster Nov. 13, 2018, 5:08 p.m. UTC
Page writeback indirectly handles shared extents via the existence
of overlapping COW fork blocks. If COW fork blocks exist, writeback
always performs the associated copy-on-write regardless if the
underlying blocks are actually shared. If the blocks are shared,
then overlapping COW fork blocks must always exist.

fstests shared/010 reproduces a case where a buffered write occurs
over a shared block without performing the requisite COW fork
reservation.  This ultimately causes writeback to the shared extent
and data corruption that is detected across md5 checks of the
filesystem across a mount cycle.

The problem occurs when a buffered write lands over a shared extent
that crosses an extent size hint boundary and that also happens to
have a partial COW reservation that doesn't cover the start and end
blocks of the data fork extent.

For example, a buffered write occurs across the file offset (in FSB
units) range of [29, 57]. A shared extent exists at blocks [29, 35]
and COW reservation already exists at blocks [32, 34]. After
accommodating a COW extent size hint of 32 blocks and the existing
reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
blocks of reservation at offset 0 and returns with COW reservation
across the range of [0, 34]. The associated data fork extent is
still [29, 35], however, which isn't fully covered by the COW
reservation.

This leads to a buffered write at file offset 35 over a shared
extent without associated COW reservation. Writeback eventually
kicks in, performs an overwrite of the underlying shared block and
causes the associated data corruption.

Update xfs_reflink_reserve_cow() to accommodate the fact that a
delalloc allocation request may not fully cover the extent in the
data fork. Trim the data fork extent appropriately, just as is done
for shared extent boundaries and/or existing COW reservations that
happen to overlap the start of the data fork extent. This prevents
shared/010 failures due to data corruption on reflink enabled
filesystems.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

This is not fully tested yet beyond verification that it solves the
problem reproduced by shared/010. I'll be running more tests today, but
I'm sending sooner for review and testing due to the nature of the
problem and the fact that it's a fairly isolated change. I'll follow up
if I discover any resulting regressions..

Brian

 fs/xfs/xfs_reflink.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Darrick J. Wong Nov. 15, 2018, 5:50 a.m. UTC | #1
On Tue, Nov 13, 2018 at 12:08:19PM -0500, Brian Foster wrote:
> Page writeback indirectly handles shared extents via the existence
> of overlapping COW fork blocks. If COW fork blocks exist, writeback
> always performs the associated copy-on-write regardless if the
> underlying blocks are actually shared. If the blocks are shared,
> then overlapping COW fork blocks must always exist.
> 
> fstests shared/010 reproduces a case where a buffered write occurs
> over a shared block without performing the requisite COW fork
> reservation.  This ultimately causes writeback to the shared extent
> and data corruption that is detected across md5 checks of the
> filesystem across a mount cycle.
> 
> The problem occurs when a buffered write lands over a shared extent
> that crosses an extent size hint boundary and that also happens to
> have a partial COW reservation that doesn't cover the start and end
> blocks of the data fork extent.
> 
> For example, a buffered write occurs across the file offset (in FSB
> units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> and COW reservation already exists at blocks [32, 34]. After
> accommodating a COW extent size hint of 32 blocks and the existing
> reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> blocks of reservation at offset 0 and returns with COW reservation
> across the range of [0, 34]. The associated data fork extent is
> still [29, 35], however, which isn't fully covered by the COW
> reservation.
> 
> This leads to a buffered write at file offset 35 over a shared
> extent without associated COW reservation. Writeback eventually
> kicks in, performs an overwrite of the underlying shared block and
> causes the associated data corruption.
> 
> Update xfs_reflink_reserve_cow() to accommodate the fact that a
> delalloc allocation request may not fully cover the extent in the
> data fork. Trim the data fork extent appropriately, just as is done
> for shared extent boundaries and/or existing COW reservations that
> happen to overlap the start of the data fork extent. This prevents
> shared/010 failures due to data corruption on reflink enabled
> filesystems.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> This is not fully tested yet beyond verification that it solves the
> problem reproduced by shared/010. I'll be running more tests today, but
> I'm sending sooner for review and testing due to the nature of the
> problem and the fact that it's a fairly isolated change. I'll follow up
> if I discover any resulting regressions..

Did you find any regressions?

I ran this through my overnight tests and saw no adverse effects, though
Dave was complaining yesterday about continuing generic/091 corruptions
(which I didn't see with this patch applied...)

Anyway it looks reasonable to me...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> Brian
> 
>  fs/xfs/xfs_reflink.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index ecdb086bc23e..c56bdbfcf7ae 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -296,6 +296,7 @@ xfs_reflink_reserve_cow(
>  	if (error)
>  		return error;
>  
> +	xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
>  	trace_xfs_reflink_cow_alloc(ip, &got);
>  	return 0;
>  }
> -- 
> 2.17.2
>
Christoph Hellwig Nov. 15, 2018, 9:49 a.m. UTC | #2
On Wed, Nov 14, 2018 at 09:50:20PM -0800, Darrick J. Wong wrote:
> I ran this through my overnight tests and saw no adverse effects, though
> Dave was complaining yesterday about continuing generic/091 corruptions
> (which I didn't see with this patch applied...)

I've seen occasional 091 corruption failures in always_cow mode, but
not often enough to be easily reproducible.
Christoph Hellwig Nov. 15, 2018, 9:50 a.m. UTC | #3
On Tue, Nov 13, 2018 at 12:08:19PM -0500, Brian Foster wrote:
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index ecdb086bc23e..c56bdbfcf7ae 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -296,6 +296,7 @@ xfs_reflink_reserve_cow(
>  	if (error)
>  		return error;
>  
> +	xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
>  	trace_xfs_reflink_cow_alloc(ip, &got);
>  	return 0;

It would be kinda nice to have a comment explaining this trimming,
and a goto label to share it with the case where we find existing
delalloc reservations.

That being said my always_cow series totally reshuffles this area, so
it might not be worth to spend the effort here, so:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Brian Foster Nov. 15, 2018, 12:33 p.m. UTC | #4
On Wed, Nov 14, 2018 at 09:50:20PM -0800, Darrick J. Wong wrote:
> On Tue, Nov 13, 2018 at 12:08:19PM -0500, Brian Foster wrote:
> > Page writeback indirectly handles shared extents via the existence
> > of overlapping COW fork blocks. If COW fork blocks exist, writeback
> > always performs the associated copy-on-write regardless if the
> > underlying blocks are actually shared. If the blocks are shared,
> > then overlapping COW fork blocks must always exist.
> > 
> > fstests shared/010 reproduces a case where a buffered write occurs
> > over a shared block without performing the requisite COW fork
> > reservation.  This ultimately causes writeback to the shared extent
> > and data corruption that is detected across md5 checks of the
> > filesystem across a mount cycle.
> > 
> > The problem occurs when a buffered write lands over a shared extent
> > that crosses an extent size hint boundary and that also happens to
> > have a partial COW reservation that doesn't cover the start and end
> > blocks of the data fork extent.
> > 
> > For example, a buffered write occurs across the file offset (in FSB
> > units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> > and COW reservation already exists at blocks [32, 34]. After
> > accommodating a COW extent size hint of 32 blocks and the existing
> > reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> > blocks of reservation at offset 0 and returns with COW reservation
> > across the range of [0, 34]. The associated data fork extent is
> > still [29, 35], however, which isn't fully covered by the COW
> > reservation.
> > 
> > This leads to a buffered write at file offset 35 over a shared
> > extent without associated COW reservation. Writeback eventually
> > kicks in, performs an overwrite of the underlying shared block and
> > causes the associated data corruption.
> > 
> > Update xfs_reflink_reserve_cow() to accommodate the fact that a
> > delalloc allocation request may not fully cover the extent in the
> > data fork. Trim the data fork extent appropriately, just as is done
> > for shared extent boundaries and/or existing COW reservations that
> > happen to overlap the start of the data fork extent. This prevents
> > shared/010 failures due to data corruption on reflink enabled
> > filesystems.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > This is not fully tested yet beyond verification that it solves the
> > problem reproduced by shared/010. I'll be running more tests today, but
> > I'm sending sooner for review and testing due to the nature of the
> > problem and the fact that it's a fairly isolated change. I'll follow up
> > if I discover any resulting regressions..
> 
> Did you find any regressions?
> 

I ended up having to restart my test run because I was hitting writeback
livelocks reported in the other large blocksize series. The testing
otherwise finished last night with no regressions. I do see what look
like corruption failures on generic/127 and generic/091 with fstests
patched with your fsx enhancements, but I see those same failures on
for-next so I suspect that is an independent issue.

> I ran this through my overnight tests and saw no adverse effects, though
> Dave was complaining yesterday about continuing generic/091 corruptions
> (which I didn't see with this patch applied...)
> 
> Anyway it looks reasonable to me...
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> 

Thanks for the review and additional testing.

Brian

> --D
> 
> > Brian
> > 
> >  fs/xfs/xfs_reflink.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index ecdb086bc23e..c56bdbfcf7ae 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -296,6 +296,7 @@ xfs_reflink_reserve_cow(
> >  	if (error)
> >  		return error;
> >  
> > +	xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
> >  	trace_xfs_reflink_cow_alloc(ip, &got);
> >  	return 0;
> >  }
> > -- 
> > 2.17.2
> >
Eric Sandeen Nov. 15, 2018, 3:51 p.m. UTC | #5
On 11/13/18 11:08 AM, Brian Foster wrote:

> For example, a buffered write occurs across the file offset (in FSB
> units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> and COW reservation already exists at blocks [32, 34]. After
> accommodating a COW extent size hint of 32 blocks and the existing
> reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> blocks of reservation at offset 0 and returns with COW reservation
> across the range of [0, 34]. The associated data fork extent is
> still [29, 35], however, which isn't fully covered by the COW
> reservation.
> 
> This leads to a buffered write at file offset 35 over a shared
> extent without associated COW reservation. Writeback eventually
> kicks in, performs an overwrite of the underlying shared block and
> causes the associated data corruption.

Can you write this in the form of an xfstests reproducer please? :)

-Eric
Brian Foster Nov. 15, 2018, 3:58 p.m. UTC | #6
On Thu, Nov 15, 2018 at 09:51:58AM -0600, Eric Sandeen wrote:
> On 11/13/18 11:08 AM, Brian Foster wrote:
> 
> > For example, a buffered write occurs across the file offset (in FSB
> > units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> > and COW reservation already exists at blocks [32, 34]. After
> > accommodating a COW extent size hint of 32 blocks and the existing
> > reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> > blocks of reservation at offset 0 and returns with COW reservation
> > across the range of [0, 34]. The associated data fork extent is
> > still [29, 35], however, which isn't fully covered by the COW
> > reservation.
> > 
> > This leads to a buffered write at file offset 35 over a shared
> > extent without associated COW reservation. Writeback eventually
> > kicks in, performs an overwrite of the underlying shared block and
> > causes the associated data corruption.
> 
> Can you write this in the form of an xfstests reproducer please? :)
> 

I'll add it to the todo list.

Brian

> -Eric
Eric Sandeen Nov. 15, 2018, 3:59 p.m. UTC | #7
On 11/15/18 9:58 AM, Brian Foster wrote:
> On Thu, Nov 15, 2018 at 09:51:58AM -0600, Eric Sandeen wrote:
>> On 11/13/18 11:08 AM, Brian Foster wrote:
>>
>>> For example, a buffered write occurs across the file offset (in FSB
>>> units) range of [29, 57]. A shared extent exists at blocks [29, 35]
>>> and COW reservation already exists at blocks [32, 34]. After
>>> accommodating a COW extent size hint of 32 blocks and the existing
>>> reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
>>> blocks of reservation at offset 0 and returns with COW reservation
>>> across the range of [0, 34]. The associated data fork extent is
>>> still [29, 35], however, which isn't fully covered by the COW
>>> reservation.
>>>
>>> This leads to a buffered write at file offset 35 over a shared
>>> extent without associated COW reservation. Writeback eventually
>>> kicks in, performs an overwrite of the underlying shared block and
>>> causes the associated data corruption.
>>
>> Can you write this in the form of an xfstests reproducer please? :)
>>
> 
> I'll add it to the todo list.

thanks, it doesn't seem like the kind of thing that will be hit too often
at random, based on the struggles to reproduce it as first reported via
fsstress.

-Eric
Brian Foster Nov. 15, 2018, 4:10 p.m. UTC | #8
On Thu, Nov 15, 2018 at 09:59:33AM -0600, Eric Sandeen wrote:
> 
> 
> On 11/15/18 9:58 AM, Brian Foster wrote:
> > On Thu, Nov 15, 2018 at 09:51:58AM -0600, Eric Sandeen wrote:
> >> On 11/13/18 11:08 AM, Brian Foster wrote:
> >>
> >>> For example, a buffered write occurs across the file offset (in FSB
> >>> units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> >>> and COW reservation already exists at blocks [32, 34]. After
> >>> accommodating a COW extent size hint of 32 blocks and the existing
> >>> reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> >>> blocks of reservation at offset 0 and returns with COW reservation
> >>> across the range of [0, 34]. The associated data fork extent is
> >>> still [29, 35], however, which isn't fully covered by the COW
> >>> reservation.
> >>>
> >>> This leads to a buffered write at file offset 35 over a shared
> >>> extent without associated COW reservation. Writeback eventually
> >>> kicks in, performs an overwrite of the underlying shared block and
> >>> causes the associated data corruption.
> >>
> >> Can you write this in the form of an xfstests reproducer please? :)
> >>
> > 
> > I'll add it to the todo list.
> 
> thanks, it doesn't seem like the kind of thing that will be hit too often
> at random, based on the struggles to reproduce it as first reported via
> fsstress.
> 

This reminds me that I wanted to look into DEBUG mode writeback time
detection of overwrites of shared extents. I think part of the
difficulty of reproducing it via shared/010 is that it required a cached
page over the particular shared block in another inode to detect the
corruption. If we can assert that overwrites are always !shared, that
removes that requirement and may avoid the need for a new test.

Brian

> -Eric
Dave Chinner Nov. 16, 2018, 4:35 a.m. UTC | #9
On Wed, Nov 14, 2018 at 09:50:20PM -0800, Darrick J. Wong wrote:
> On Tue, Nov 13, 2018 at 12:08:19PM -0500, Brian Foster wrote:
> > Page writeback indirectly handles shared extents via the existence
> > of overlapping COW fork blocks. If COW fork blocks exist, writeback
> > always performs the associated copy-on-write regardless if the
> > underlying blocks are actually shared. If the blocks are shared,
> > then overlapping COW fork blocks must always exist.
> > 
> > fstests shared/010 reproduces a case where a buffered write occurs
> > over a shared block without performing the requisite COW fork
> > reservation.  This ultimately causes writeback to the shared extent
> > and data corruption that is detected across md5 checks of the
> > filesystem across a mount cycle.
> > 
> > The problem occurs when a buffered write lands over a shared extent
> > that crosses an extent size hint boundary and that also happens to
> > have a partial COW reservation that doesn't cover the start and end
> > blocks of the data fork extent.
> > 
> > For example, a buffered write occurs across the file offset (in FSB
> > units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> > and COW reservation already exists at blocks [32, 34]. After
> > accommodating a COW extent size hint of 32 blocks and the existing
> > reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> > blocks of reservation at offset 0 and returns with COW reservation
> > across the range of [0, 34]. The associated data fork extent is
> > still [29, 35], however, which isn't fully covered by the COW
> > reservation.
> > 
> > This leads to a buffered write at file offset 35 over a shared
> > extent without associated COW reservation. Writeback eventually
> > kicks in, performs an overwrite of the underlying shared block and
> > causes the associated data corruption.
> > 
> > Update xfs_reflink_reserve_cow() to accommodate the fact that a
> > delalloc allocation request may not fully cover the extent in the
> > data fork. Trim the data fork extent appropriately, just as is done
> > for shared extent boundaries and/or existing COW reservations that
> > happen to overlap the start of the data fork extent. This prevents
> > shared/010 failures due to data corruption on reflink enabled
> > filesystems.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > This is not fully tested yet beyond verification that it solves the
> > problem reproduced by shared/010. I'll be running more tests today, but
> > I'm sending sooner for review and testing due to the nature of the
> > problem and the fact that it's a fairly isolated change. I'll follow up
> > if I discover any resulting regressions..
> 
> Did you find any regressions?
> 
> I ran this through my overnight tests and saw no adverse effects, though
> Dave was complaining yesterday about continuing generic/091 corruptions
> (which I didn't see with this patch applied...)

I can say now that this patch hasn't caused any new corruptions. It
hasn't fixed any of the (many) corruptions that I'm hitting, either,
so from that perspective it's no better or worse than what we have
now :P

Cheers,

Dave.
Brian Foster Nov. 16, 2018, 1:32 p.m. UTC | #10
On Fri, Nov 16, 2018 at 03:35:08PM +1100, Dave Chinner wrote:
> On Wed, Nov 14, 2018 at 09:50:20PM -0800, Darrick J. Wong wrote:
> > On Tue, Nov 13, 2018 at 12:08:19PM -0500, Brian Foster wrote:
> > > Page writeback indirectly handles shared extents via the existence
> > > of overlapping COW fork blocks. If COW fork blocks exist, writeback
> > > always performs the associated copy-on-write regardless if the
> > > underlying blocks are actually shared. If the blocks are shared,
> > > then overlapping COW fork blocks must always exist.
> > > 
> > > fstests shared/010 reproduces a case where a buffered write occurs
> > > over a shared block without performing the requisite COW fork
> > > reservation.  This ultimately causes writeback to the shared extent
> > > and data corruption that is detected across md5 checks of the
> > > filesystem across a mount cycle.
> > > 
> > > The problem occurs when a buffered write lands over a shared extent
> > > that crosses an extent size hint boundary and that also happens to
> > > have a partial COW reservation that doesn't cover the start and end
> > > blocks of the data fork extent.
> > > 
> > > For example, a buffered write occurs across the file offset (in FSB
> > > units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> > > and COW reservation already exists at blocks [32, 34]. After
> > > accommodating a COW extent size hint of 32 blocks and the existing
> > > reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> > > blocks of reservation at offset 0 and returns with COW reservation
> > > across the range of [0, 34]. The associated data fork extent is
> > > still [29, 35], however, which isn't fully covered by the COW
> > > reservation.
> > > 
> > > This leads to a buffered write at file offset 35 over a shared
> > > extent without associated COW reservation. Writeback eventually
> > > kicks in, performs an overwrite of the underlying shared block and
> > > causes the associated data corruption.
> > > 
> > > Update xfs_reflink_reserve_cow() to accommodate the fact that a
> > > delalloc allocation request may not fully cover the extent in the
> > > data fork. Trim the data fork extent appropriately, just as is done
> > > for shared extent boundaries and/or existing COW reservations that
> > > happen to overlap the start of the data fork extent. This prevents
> > > shared/010 failures due to data corruption on reflink enabled
> > > filesystems.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > 
> > > This is not fully tested yet beyond verification that it solves the
> > > problem reproduced by shared/010. I'll be running more tests today, but
> > > I'm sending sooner for review and testing due to the nature of the
> > > problem and the fact that it's a fairly isolated change. I'll follow up
> > > if I discover any resulting regressions..
> > 
> > Did you find any regressions?
> > 
> > I ran this through my overnight tests and saw no adverse effects, though
> > Dave was complaining yesterday about continuing generic/091 corruptions
> > (which I didn't see with this patch applied...)
> 
> I can say now that this patch hasn't caused any new corruptions. It
> hasn't fixed any of the (many) corruptions that I'm hitting, either,
> so from that perspective it's no better or worse than what we have
> now :P
> 

So were you reproducing the shared/010 corruption or no? I suppose it
would be nice if this fixed some other problems, but it was only
intended to fix the problem Zorro was reproducing with shared/010.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
Dave Chinner Nov. 16, 2018, 9:19 p.m. UTC | #11
On Fri, Nov 16, 2018 at 08:32:23AM -0500, Brian Foster wrote:
> On Fri, Nov 16, 2018 at 03:35:08PM +1100, Dave Chinner wrote:
> > On Wed, Nov 14, 2018 at 09:50:20PM -0800, Darrick J. Wong wrote:
> > > On Tue, Nov 13, 2018 at 12:08:19PM -0500, Brian Foster wrote:
> > > > Page writeback indirectly handles shared extents via the existence
> > > > of overlapping COW fork blocks. If COW fork blocks exist, writeback
> > > > always performs the associated copy-on-write regardless if the
> > > > underlying blocks are actually shared. If the blocks are shared,
> > > > then overlapping COW fork blocks must always exist.
> > > > 
> > > > fstests shared/010 reproduces a case where a buffered write occurs
> > > > over a shared block without performing the requisite COW fork
> > > > reservation.  This ultimately causes writeback to the shared extent
> > > > and data corruption that is detected across md5 checks of the
> > > > filesystem across a mount cycle.
> > > > 
> > > > The problem occurs when a buffered write lands over a shared extent
> > > > that crosses an extent size hint boundary and that also happens to
> > > > have a partial COW reservation that doesn't cover the start and end
> > > > blocks of the data fork extent.
> > > > 
> > > > For example, a buffered write occurs across the file offset (in FSB
> > > > units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> > > > and COW reservation already exists at blocks [32, 34]. After
> > > > accommodating a COW extent size hint of 32 blocks and the existing
> > > > reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> > > > blocks of reservation at offset 0 and returns with COW reservation
> > > > across the range of [0, 34]. The associated data fork extent is
> > > > still [29, 35], however, which isn't fully covered by the COW
> > > > reservation.
> > > > 
> > > > This leads to a buffered write at file offset 35 over a shared
> > > > extent without associated COW reservation. Writeback eventually
> > > > kicks in, performs an overwrite of the underlying shared block and
> > > > causes the associated data corruption.
> > > > 
> > > > Update xfs_reflink_reserve_cow() to accommodate the fact that a
> > > > delalloc allocation request may not fully cover the extent in the
> > > > data fork. Trim the data fork extent appropriately, just as is done
> > > > for shared extent boundaries and/or existing COW reservations that
> > > > happen to overlap the start of the data fork extent. This prevents
> > > > shared/010 failures due to data corruption on reflink enabled
> > > > filesystems.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > > 
> > > > This is not fully tested yet beyond verification that it solves the
> > > > problem reproduced by shared/010. I'll be running more tests today, but
> > > > I'm sending sooner for review and testing due to the nature of the
> > > > problem and the fact that it's a fairly isolated change. I'll follow up
> > > > if I discover any resulting regressions..
> > > 
> > > Did you find any regressions?
> > > 
> > > I ran this through my overnight tests and saw no adverse effects, though
> > > Dave was complaining yesterday about continuing generic/091 corruptions
> > > (which I didn't see with this patch applied...)
> > 
> > I can say now that this patch hasn't caused any new corruptions. It
> > hasn't fixed any of the (many) corruptions that I'm hitting, either,
> > so from that perspective it's no better or worse than what we have
> > now :P
> 
> So were you reproducing the shared/010 corruption or no?

No, I wasn't, because I already had a patch in my tree that fixed
it, apparently (see followup to Darrick's flush-after-zero on
reflink patch). Basically, the EOF zeroing+truncation problem is
something fsx trips over quite quickly on 1k block size filesystems.
I've had a patch for it in my tree for about a week now.

So what I meant is that it didn't fix any of the fsx failures I'd
been seeing, but it also didn't introduce any new fsx failures,
either, as I was kind of hoping it would....

Cheers,

Dave.
Brian Foster Nov. 17, 2018, 1:33 p.m. UTC | #12
On Sat, Nov 17, 2018 at 08:19:36AM +1100, Dave Chinner wrote:
> On Fri, Nov 16, 2018 at 08:32:23AM -0500, Brian Foster wrote:
> > On Fri, Nov 16, 2018 at 03:35:08PM +1100, Dave Chinner wrote:
> > > On Wed, Nov 14, 2018 at 09:50:20PM -0800, Darrick J. Wong wrote:
> > > > On Tue, Nov 13, 2018 at 12:08:19PM -0500, Brian Foster wrote:
> > > > > Page writeback indirectly handles shared extents via the existence
> > > > > of overlapping COW fork blocks. If COW fork blocks exist, writeback
> > > > > always performs the associated copy-on-write regardless if the
> > > > > underlying blocks are actually shared. If the blocks are shared,
> > > > > then overlapping COW fork blocks must always exist.
> > > > > 
> > > > > fstests shared/010 reproduces a case where a buffered write occurs
> > > > > over a shared block without performing the requisite COW fork
> > > > > reservation.  This ultimately causes writeback to the shared extent
> > > > > and data corruption that is detected across md5 checks of the
> > > > > filesystem across a mount cycle.
> > > > > 
> > > > > The problem occurs when a buffered write lands over a shared extent
> > > > > that crosses an extent size hint boundary and that also happens to
> > > > > have a partial COW reservation that doesn't cover the start and end
> > > > > blocks of the data fork extent.
> > > > > 
> > > > > For example, a buffered write occurs across the file offset (in FSB
> > > > > units) range of [29, 57]. A shared extent exists at blocks [29, 35]
> > > > > and COW reservation already exists at blocks [32, 34]. After
> > > > > accommodating a COW extent size hint of 32 blocks and the existing
> > > > > reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
> > > > > blocks of reservation at offset 0 and returns with COW reservation
> > > > > across the range of [0, 34]. The associated data fork extent is
> > > > > still [29, 35], however, which isn't fully covered by the COW
> > > > > reservation.
> > > > > 
> > > > > This leads to a buffered write at file offset 35 over a shared
> > > > > extent without associated COW reservation. Writeback eventually
> > > > > kicks in, performs an overwrite of the underlying shared block and
> > > > > causes the associated data corruption.
> > > > > 
> > > > > Update xfs_reflink_reserve_cow() to accommodate the fact that a
> > > > > delalloc allocation request may not fully cover the extent in the
> > > > > data fork. Trim the data fork extent appropriately, just as is done
> > > > > for shared extent boundaries and/or existing COW reservations that
> > > > > happen to overlap the start of the data fork extent. This prevents
> > > > > shared/010 failures due to data corruption on reflink enabled
> > > > > filesystems.
> > > > > 
> > > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > > ---
> > > > > 
> > > > > This is not fully tested yet beyond verification that it solves the
> > > > > problem reproduced by shared/010. I'll be running more tests today, but
> > > > > I'm sending sooner for review and testing due to the nature of the
> > > > > problem and the fact that it's a fairly isolated change. I'll follow up
> > > > > if I discover any resulting regressions..
> > > > 
> > > > Did you find any regressions?
> > > > 
> > > > I ran this through my overnight tests and saw no adverse effects, though
> > > > Dave was complaining yesterday about continuing generic/091 corruptions
> > > > (which I didn't see with this patch applied...)
> > > 
> > > I can say now that this patch hasn't caused any new corruptions. It
> > > hasn't fixed any of the (many) corruptions that I'm hitting, either,
> > > so from that perspective it's no better or worse than what we have
> > > now :P
> > 
> > So were you reproducing the shared/010 corruption or no?
> 
> No, I wasn't, because I already had a patch in my tree that fixed
> it, apparently (see followup to Darrick's flush-after-zero on
> reflink patch). Basically, the EOF zeroing+truncation problem is
> something fsx trips over quite quickly on 1k block size filesystems.
> I've had a patch for it in my tree for about a week now.
> 

Ok, that's a different issue. I happened to reproduce both via
shared/010 with the writeback assert rfc patch I posted yesterday. The
patch for this thread addresses a corruption due to failure to properly
perform COW reservation for a buffered write. Darrick's patch addresses
a corruption due to the EOF zeroing associated with dedupe leaving
around a dirty page over a shared block. Note that I reproduced this
latter issue with a page size == block size fs and Darrick (and you) had
apparently reproduced a slightly different problem caused by the same
zeroing code on 1k FSB. Darrick and I just happened to stumble on the
common cause at just about the same time..

Anyways, I'll try to confirm that your patch also resolves the issue I
reproduced (which it looks like it should)..

> So what I meant is that it didn't fix any of the fsx failures I'd
> been seeing, but it also didn't introduce any new fsx failures,
> either, as I was kind of hoping it would....
> 

Ok, I just wanted to make sure this patch doesn't get dropped on the
floor by thinking it's fixed by something else.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
diff mbox series

Patch

diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index ecdb086bc23e..c56bdbfcf7ae 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -296,6 +296,7 @@  xfs_reflink_reserve_cow(
 	if (error)
 		return error;
 
+	xfs_trim_extent(imap, got.br_startoff, got.br_blockcount);
 	trace_xfs_reflink_cow_alloc(ip, &got);
 	return 0;
 }