xfs: remove experimental tag for reflinks

Message ID	20170830145400.3681-1-hch@lst.de (mailing list archive)
State	Accepted
Headers	show Return-Path: <linux-xfs-owner@kernel.org> From: Christoph Hellwig <hch@lst.de> To: linux-xfs@vger.kernel.org Subject: [PATCH] xfs: remove experimental tag for reflinks Date: Wed, 30 Aug 2017 16:54:00 +0200 Message-Id: <20170830145400.3681-1-hch@lst.de> Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk

Christoph Hellwig Aug. 30, 2017, 2:54 p.m. UTC

But reject reflink + DAX file systems for now until the code to
support reflinks on DAX is actually implemented.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_super.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Darrick J. Wong Aug. 31, 2017, 6:40 a.m. UTC | #1

On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote:
> But reject reflink + DAX file systems for now until the code to
> support reflinks on DAX is actually implemented.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_super.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 38aaacdbb8b3..92521032468e 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1634,7 +1634,9 @@ xfs_fs_fill_super(
>  		}
>  		if (xfs_sb_version_hasreflink(&mp->m_sb))
>  			xfs_alert(mp,
> -		"DAX and reflink have not been tested together!");
> +		"DAX and reflink can not be used together!");
> +			error = -EINVAL;
> +			goto out_filestream_unmount;

/This/ hunk seems fine, but...

>  	}
>  
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> @@ -1648,10 +1650,6 @@ xfs_fs_fill_super(
>  	"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
>  	}
>  
> -	if (xfs_sb_version_hasreflink(&mp->m_sb))
> -		xfs_alert(mp,
> -	"EXPERIMENTAL reflink feature enabled. Use at your own risk!");
> -

...I frankly would rather wait until we land and stabilize the incore
extent rework, because I'd rather not have the reflink story be: 1) we
declare reflink stable in 4.14; 2) immediately people start loading up
XFSes with sparse VM images that blow up on high order allocations and
then 3) we get a whole lot of complaints about it.  Then in 4.15 we 4)
land the incore extent map only now we're dragging those same users
through the mud while /that/ stabilizes, with the result 5) that
everyone thinks we've gone off the deep end and doesn't trust us
anymore.

I wish that didn't also mean waiting another 6 months for something that
none of us /developers/ have seen in practice, especially since the
enterprise distros will have plenty of time to backport all this stuff
before their next big releases if it /does/ become a problem.  It's fine
enough for me (and Christoph's customers, evidently) but is that enough?

<shrug> Not sure what to do about my own fear factor. :)  Anyone want to
offer further comments?  A quick git history trip says sparse inodes
took about 13 months to go from initial commit to EXPERIMENTAL removed?

--D

>  	error = xfs_mountfs(mp);
>  	if (error)
>  		goto out_filestream_unmount;
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Brian Foster Aug. 31, 2017, 12:43 p.m. UTC | #2

On Wed, Aug 30, 2017 at 11:40:33PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote:
> > But reject reflink + DAX file systems for now until the code to
> > support reflinks on DAX is actually implemented.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_super.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 38aaacdbb8b3..92521032468e 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super(
> >  		}
> >  		if (xfs_sb_version_hasreflink(&mp->m_sb))
> >  			xfs_alert(mp,
> > -		"DAX and reflink have not been tested together!");
> > +		"DAX and reflink can not be used together!");
> > +			error = -EINVAL;
> > +			goto out_filestream_unmount;
> 
> /This/ hunk seems fine, but...
> 
> >  	}
> >  
> >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super(
> >  	"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
> >  	}
> >  
> > -	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > -		xfs_alert(mp,
> > -	"EXPERIMENTAL reflink feature enabled. Use at your own risk!");
> > -
> 
> ...I frankly would rather wait until we land and stabilize the incore
> extent rework, because I'd rather not have the reflink story be: 1) we
> declare reflink stable in 4.14; 2) immediately people start loading up
> XFSes with sparse VM images that blow up on high order allocations and
> then 3) we get a whole lot of complaints about it.  Then in 4.15 we 4)
> land the incore extent map only now we're dragging those same users
> through the mud while /that/ stabilizes, with the result 5) that
> everyone thinks we've gone off the deep end and doesn't trust us
> anymore.
> 
> I wish that didn't also mean waiting another 6 months for something that
> none of us /developers/ have seen in practice, especially since the
> enterprise distros will have plenty of time to backport all this stuff
> before their next big releases if it /does/ become a problem.  It's fine
> enough for me (and Christoph's customers, evidently) but is that enough?
> 
> <shrug> Not sure what to do about my own fear factor. :)  Anyone want to
> offer further comments?  A quick git history trip says sparse inodes
> took about 13 months to go from initial commit to EXPERIMENTAL removed?
> 

FWIW, I don't really have a strong opinion. To me, removing experimental
means we feel the code has stabilized long enough in principle, there
are no significant problems (i.e., corruption/crash vectors) that we are
aware of and the feature is complete (full userspace tool support, etc).
The in-core extent list thing seems like more of a general problem to me
so I have no objection to removing the experimental tag from reflink if
that's really the only known drawback. Of course, I don't object if
there's developer preference to keep it for another cycle or two either.

That aside, shouldn't we consider the rmapbt experimental tag first, or
at least at the same time? It's been around for slightly longer.

Brian

> --D
> 
> >  	error = xfs_mountfs(mp);
> >  	if (error)
> >  		goto out_filestream_unmount;
> > -- 
> > 2.11.0
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Christoph Hellwig Aug. 31, 2017, 1:30 p.m. UTC | #3

On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote:
> FWIW, I don't really have a strong opinion. To me, removing experimental
> means we feel the code has stabilized long enough in principle, there
> are no significant problems (i.e., corruption/crash vectors) that we are
> aware of and the feature is complete (full userspace tool support, etc).
> The in-core extent list thing seems like more of a general problem to me

Agreed so far.

> That aside, shouldn't we consider the rmapbt experimental tag first, or
> at least at the same time? It's been around for slightly longer.

I've not done much testing on that or have experience with it in general,
nor do I have a customer with a big QA team beating it hard, so I can't
really comment on that one.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Aug. 31, 2017, 3:31 p.m. UTC | #4

On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote:
> On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote:
> > FWIW, I don't really have a strong opinion. To me, removing experimental
> > means we feel the code has stabilized long enough in principle, there
> > are no significant problems (i.e., corruption/crash vectors) that we are
> > aware of and the feature is complete (full userspace tool support, etc).
> > The in-core extent list thing seems like more of a general problem to me
> 
> Agreed so far.

<nod> Dave?  Eric?  Any perspective you'd like to offer? :)

> > That aside, shouldn't we consider the rmapbt experimental tag first, or
> > at least at the same time? It's been around for slightly longer.
> 
> I've not done much testing on that or have experience with it in general,
> nor do I have a customer with a big QA team beating it hard, so I can't
> really comment on that one.

rmapbt will remain EXPERIMENTAL because I still have more patches to
send to finish the feature for realtime devices.  Speaking of which,
it's now been 53 weeks since the last dump of that, so I'll go do that
now. :P

FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it
since adding the per-AG reservations.

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Sandeen Aug. 31, 2017, 3:55 p.m. UTC | #5

On 8/31/17 10:31 AM, Darrick J. Wong wrote:
> On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote:
>> On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote:
>>> FWIW, I don't really have a strong opinion. To me, removing experimental
>>> means we feel the code has stabilized long enough in principle, there
>>> are no significant problems (i.e., corruption/crash vectors) that we are
>>> aware of and the feature is complete (full userspace tool support, etc).
>>> The in-core extent list thing seems like more of a general problem to me
>>
>> Agreed so far.
> 
> <nod> Dave?  Eric?  Any perspective you'd like to offer? :)

This is a bit of a naiive question, but how many applications are out
there that can be used with reflink right now?  This would obviously
delay things a bit, but I had considered writing something up for
LWN or Fedora Planet or $WHATEVER describing these new xfs features,
how they can be used, and encourage some early-adopter testing.  Try
to get some buzz going and some real-world use.

It's always a catch 22; nobody uses it until it's marked stable,
but we never know if it's really stable until people outside the
development community use it.  ;)

As for the allocation issues w/ the in core extent list, yeah, that
worries me.

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Brian Foster Aug. 31, 2017, 3:59 p.m. UTC | #6

On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote:
> > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote:
> > > FWIW, I don't really have a strong opinion. To me, removing experimental
> > > means we feel the code has stabilized long enough in principle, there
> > > are no significant problems (i.e., corruption/crash vectors) that we are
> > > aware of and the feature is complete (full userspace tool support, etc).
> > > The in-core extent list thing seems like more of a general problem to me
> > 
> > Agreed so far.
> 
> <nod> Dave?  Eric?  Any perspective you'd like to offer? :)
> 
> > > That aside, shouldn't we consider the rmapbt experimental tag first, or
> > > at least at the same time? It's been around for slightly longer.
> > 
> > I've not done much testing on that or have experience with it in general,
> > nor do I have a customer with a big QA team beating it hard, so I can't
> > really comment on that one.
> 
> rmapbt will remain EXPERIMENTAL because I still have more patches to
> send to finish the feature for realtime devices.  Speaking of which,
> it's now been 53 weeks since the last dump of that, so I'll go do that
> now. :P
> 
> FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it
> since adding the per-AG reservations.
> 

My question then is do we want to encourage users to run reflink without
rmapbt because of the latter being experimental, and only so because of
a lack of realtime support? *shrug* Maybe it doesn't really matter.

But realistically, how likely is it that the stability of forthcoming
rmapbt+realtime code has any bearing on making rmapbt non-experimental
in general? I suspect we're not going to leave it experimental for
another however many months just to let the rt code sit around. I also
suspect not many people will be actually using/testing that code outside
of some of us, but maybe there are real users out there and I'm just not
aware of them..? (This all coming from somebody who has CONFIG_XFS_RT
disabled on his configs. ;)

If realtime is the only barrier, ISTM we could remove the experimental
rmapbt status and just disable rmapbt+rt for now. Then re-enable
EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like
the most likely end result to me anyways).

Brian

> --D
> 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Aug. 31, 2017, 8:02 p.m. UTC | #7

On Thu, Aug 31, 2017 at 10:55:31AM -0500, Eric Sandeen wrote:
> On 8/31/17 10:31 AM, Darrick J. Wong wrote:
> > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote:
> >> On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote:
> >>> FWIW, I don't really have a strong opinion. To me, removing experimental
> >>> means we feel the code has stabilized long enough in principle, there
> >>> are no significant problems (i.e., corruption/crash vectors) that we are
> >>> aware of and the feature is complete (full userspace tool support, etc).
> >>> The in-core extent list thing seems like more of a general problem to me
> >>
> >> Agreed so far.
> > 
> > <nod> Dave?  Eric?  Any perspective you'd like to offer? :)
> 
> This is a bit of a naiive question, but how many applications are out
> there that can be used with reflink right now?  This would obviously

Anything that uses cp --reflink, which could be any number of deploy
scripts to VM "snapshot" tools.  ISTR that cifs & nfs now both support
reflink too, and now that ocfs2 also supports the ficlone* ioctls (sort
of) we can expect wider use.

dupremove (offline dedupe tool for btrfs) can also talk to xfs/ocfs2.

> delay things a bit, but I had considered writing something up for
> LWN or Fedora Planet or $WHATEVER describing these new xfs features,
> how they can be used, and encourage some early-adopter testing.  Try
> to get some buzz going and some real-world use.

I know there's already some dedicated experimenters using reflink...

> It's always a catch 22; nobody uses it until it's marked stable,
> but we never know if it's really stable until people outside the
> development community use it.  ;)
> 
> As for the allocation issues w/ the in core extent list, yeah, that
> worries me.

Yeah.  On the one hand it's a pre-existing condition that's orthogonal
to reflink (and hey, we even put in cowextsize to mitigate the problems
on VMs) but OTOH it probably gets worse with cow.

--D

> 
> -Eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Aug. 31, 2017, 8:09 p.m. UTC | #8

On Thu, Aug 31, 2017 at 11:59:24AM -0400, Brian Foster wrote:
> On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote:
> > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote:
> > > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote:
> > > > FWIW, I don't really have a strong opinion. To me, removing experimental
> > > > means we feel the code has stabilized long enough in principle, there
> > > > are no significant problems (i.e., corruption/crash vectors) that we are
> > > > aware of and the feature is complete (full userspace tool support, etc).
> > > > The in-core extent list thing seems like more of a general problem to me
> > > 
> > > Agreed so far.
> > 
> > <nod> Dave?  Eric?  Any perspective you'd like to offer? :)
> > 
> > > > That aside, shouldn't we consider the rmapbt experimental tag first, or
> > > > at least at the same time? It's been around for slightly longer.
> > > 
> > > I've not done much testing on that or have experience with it in general,
> > > nor do I have a customer with a big QA team beating it hard, so I can't
> > > really comment on that one.
> > 
> > rmapbt will remain EXPERIMENTAL because I still have more patches to
> > send to finish the feature for realtime devices.  Speaking of which,
> > it's now been 53 weeks since the last dump of that, so I'll go do that
> > now. :P
> > 
> > FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it
> > since adding the per-AG reservations.
> > 
> 
> My question then is do we want to encourage users to run reflink without
> rmapbt because of the latter being experimental, and only so because of
> a lack of realtime support? *shrug* Maybe it doesn't really matter.

Probably not, I think realtime users are fairly infrequent and
especially so on v5.  The only reasons I can think of to extend our new
features to rt are (a) to avoid screwing over the existing usecases and
(b) I guess you could build a hybrid xfs between an SSD and a SMR drive
wherein we always CoW from one end of the disk to the other.  (That's
crazy, but hey.)

> But realistically, how likely is it that the stability of forthcoming
> rmapbt+realtime code has any bearing on making rmapbt non-experimental
> in general?

The biggest code churn to add realtime rmap is reworking the btree code
to support putting btree records in an inode fork.  The rmap code itself
is relatively unchanged aside from widening rm_startblock/rm_blockcount
to 64 bits.

> I suspect we're not going to leave it experimental for another however
> many months just to let the rt code sit around. I also suspect not
> many people will be actually using/testing that code outside of some
> of us, but maybe there are real users out there and I'm just not aware
> of them..? (This all coming from somebody who has CONFIG_XFS_RT
> disabled on his configs. ;)

/me has it enabled and (occasionally) runs with realtime to see what
breaks. :)

> If realtime is the only barrier, ISTM we could remove the experimental
> rmapbt status and just disable rmapbt+rt for now. Then re-enable
> EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like
> the most likely end result to me anyways).

It's already disabled.  I suppose it's not /that/ big of a deal if old
kernels reject certain feature combinations...

...by the way, we can't add a rt device to an already-mounted
filesystem, right?

--D

> 
> Brian
> 
> > --D
> > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Brian Foster Aug. 31, 2017, 8:36 p.m. UTC | #9

On Thu, Aug 31, 2017 at 01:09:36PM -0700, Darrick J. Wong wrote:
> On Thu, Aug 31, 2017 at 11:59:24AM -0400, Brian Foster wrote:
> > On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote:
> > > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote:
> > > > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote:
> > > > > FWIW, I don't really have a strong opinion. To me, removing experimental
> > > > > means we feel the code has stabilized long enough in principle, there
> > > > > are no significant problems (i.e., corruption/crash vectors) that we are
> > > > > aware of and the feature is complete (full userspace tool support, etc).
> > > > > The in-core extent list thing seems like more of a general problem to me
> > > > 
> > > > Agreed so far.
> > > 
> > > <nod> Dave?  Eric?  Any perspective you'd like to offer? :)
> > > 
> > > > > That aside, shouldn't we consider the rmapbt experimental tag first, or
> > > > > at least at the same time? It's been around for slightly longer.
> > > > 
> > > > I've not done much testing on that or have experience with it in general,
> > > > nor do I have a customer with a big QA team beating it hard, so I can't
> > > > really comment on that one.
> > > 
> > > rmapbt will remain EXPERIMENTAL because I still have more patches to
> > > send to finish the feature for realtime devices.  Speaking of which,
> > > it's now been 53 weeks since the last dump of that, so I'll go do that
> > > now. :P
> > > 
> > > FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it
> > > since adding the per-AG reservations.
> > > 
> > 
> > My question then is do we want to encourage users to run reflink without
> > rmapbt because of the latter being experimental, and only so because of
> > a lack of realtime support? *shrug* Maybe it doesn't really matter.
> 
> Probably not, I think realtime users are fairly infrequent and
> especially so on v5.  The only reasons I can think of to extend our new
> features to rt are (a) to avoid screwing over the existing usecases and
> (b) I guess you could build a hybrid xfs between an SSD and a SMR drive
> wherein we always CoW from one end of the disk to the other.  (That's
> crazy, but hey.)
> 

Yeah, the only use cases I've really heard wrt to realtime any time in
the recent past has been these kind of future facing experiments as
opposed to traditional use. Not to say there aren't users out there, but
I also can't recall seeing any bug reports or anything of that nature
either any time recently. Maybe it's just perfect code. ;)

I think Dave has thought about the SMR+RT thing in the past. Somebody
else mentioned some other thing they were trying to do using RT to me at
Vault, to which I suggested to post some rfc code for discussion (which
obviously hasn't happened), but I honestly can't remember what that
experiment even was.

> > But realistically, how likely is it that the stability of forthcoming
> > rmapbt+realtime code has any bearing on making rmapbt non-experimental
> > in general?
> 
> The biggest code churn to add realtime rmap is reworking the btree code
> to support putting btree records in an inode fork.  The rmap code itself
> is relatively unchanged aside from widening rm_startblock/rm_blockcount
> to 64 bits.
> 
> > I suspect we're not going to leave it experimental for another however
> > many months just to let the rt code sit around. I also suspect not
> > many people will be actually using/testing that code outside of some
> > of us, but maybe there are real users out there and I'm just not aware
> > of them..? (This all coming from somebody who has CONFIG_XFS_RT
> > disabled on his configs. ;)
> 
> /me has it enabled and (occasionally) runs with realtime to see what
> breaks. :)
> 

Heh, I think I did that once in the past. That about sums up my
experience with rt. :)

> > If realtime is the only barrier, ISTM we could remove the experimental
> > rmapbt status and just disable rmapbt+rt for now. Then re-enable
> > EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like
> > the most likely end result to me anyways).
> 
> It's already disabled.  I suppose it's not /that/ big of a deal if old
> kernels reject certain feature combinations...
> 

Er, right. :P That seems reasonable to me. Again, I don't feel terribly
strongly about whether to remove the tag or not for rmapbt in general.
It just seems a bit strange to me to open up reflink and not rmapbt just
because we don't support the latter on realtime yet.

> ...by the way, we can't add a rt device to an already-mounted
> filesystem, right?
> 

I thought it was a mkfs time thing, but I could be mistaken...

Brian

> --D
> 
> > 
> > Brian
> > 
> > > --D
> > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dave Chinner Aug. 31, 2017, 10:19 p.m. UTC | #10

On Wed, Aug 30, 2017 at 11:40:33PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote:
> > But reject reflink + DAX file systems for now until the code to
> > support reflinks on DAX is actually implemented.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_super.c | 8 +++-----
> >  1 file changed, 3 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 38aaacdbb8b3..92521032468e 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super(
> >  		}
> >  		if (xfs_sb_version_hasreflink(&mp->m_sb))
> >  			xfs_alert(mp,
> > -		"DAX and reflink have not been tested together!");
> > +		"DAX and reflink can not be used together!");
> > +			error = -EINVAL;
> > +			goto out_filestream_unmount;
> 
> /This/ hunk seems fine, but...
> 
> >  	}
> >  
> >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super(
> >  	"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
> >  	}
> >  
> > -	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > -		xfs_alert(mp,
> > -	"EXPERIMENTAL reflink feature enabled. Use at your own risk!");
> > -
> 
> ...I frankly would rather wait until we land and stabilize the incore
> extent rework, because I'd rather not have the reflink story be: 1) we
> declare reflink stable in 4.14; 2) immediately people start loading up
> XFSes with sparse VM images that blow up on high order allocations and
> then 3) we get a whole lot of complaints about it.  Then in 4.15 we 4)
> land the incore extent map only now we're dragging those same users
> through the mud while /that/ stabilizes, with the result 5) that
> everyone thinks we've gone off the deep end and doesn't trust us
> anymore.

Which is a lose-lose proposition. IOWs, to be avoided at all costs.

> I wish that didn't also mean waiting another 6 months for something that
> none of us /developers/ have seen in practice,

Oh, I've seen it plenty of times in practice. Restoring large
metadumps with (tens of) millions of tiny extents is the common
cause for me.

> especially since the
> enterprise distros will have plenty of time to backport all this stuff
> before their next big releases if it /does/ become a problem.  It's fine
> enough for me (and Christoph's customers, evidently) but is that enough?

FYI, this has been a problem on RHEL for a long time. It's a long
standing, well known problem and that's one of the reasons why
it's at the top of my list to fix right now.

> <shrug> Not sure what to do about my own fear factor. :)  Anyone want to
> offer further comments?  A quick git history trip says sparse inodes
> took about 13 months to go from initial commit to EXPERIMENTAL removed?

I always aimed to remove the experimental tag 4 kernel releases
after the initial merge, assuming testing and required functionality
was all there and complete early on in the stabilisation process.
Generally I had a set of criteria that needed to be fulfilled before
I'd remove the tag, and ticked them off as fixes were merged.

e.g.  CRCs were introduced in 3.10, the experimental tag was removed
in 3.15, so 5 releases in that case. We didn't get CRCs working
sanely until 3.11 and there was still a lot of userspace stuff that
lagged (and having a fully working userspace was a criteria for
removing experimental from the kernel code), so it took longer to
stabilise than something smaller like sparse inodes.

Fixing the extent list memory allocation problem was always
something on my list of "need to fix before reflink can be deployed
in anger", so I'd much prefer that we are conservative in removing
the experimental tag. It doesn't hurt anyone if we keep it on until
we get the extent tree rework merged, and it's better to err on the
side of keeping it too long than not long enough.

Cheers,

Dave.

Dave Chinner Aug. 31, 2017, 10:58 p.m. UTC | #11

On Thu, Aug 31, 2017 at 04:36:14PM -0400, Brian Foster wrote:
> On Thu, Aug 31, 2017 at 01:09:36PM -0700, Darrick J. Wong wrote:
> > On Thu, Aug 31, 2017 at 11:59:24AM -0400, Brian Foster wrote:
> > > On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote:
> > > > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote:
> > > > > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote:
> > > > > > FWIW, I don't really have a strong opinion. To me, removing experimental
> > > > > > means we feel the code has stabilized long enough in principle, there
> > > > > > are no significant problems (i.e., corruption/crash vectors) that we are
> > > > > > aware of and the feature is complete (full userspace tool support, etc).
> > > > > > The in-core extent list thing seems like more of a general problem to me
> > > > > 
> > > > > Agreed so far.
> > > > 
> > > > <nod> Dave?  Eric?  Any perspective you'd like to offer? :)
> > > > 
> > > > > > That aside, shouldn't we consider the rmapbt experimental tag first, or
> > > > > > at least at the same time? It's been around for slightly longer.
> > > > > 
> > > > > I've not done much testing on that or have experience with it in general,
> > > > > nor do I have a customer with a big QA team beating it hard, so I can't
> > > > > really comment on that one.
> > > > 
> > > > rmapbt will remain EXPERIMENTAL because I still have more patches to
> > > > send to finish the feature for realtime devices.  Speaking of which,
> > > > it's now been 53 weeks since the last dump of that, so I'll go do that
> > > > now. :P
> > > > 
> > > > FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it
> > > > since adding the per-AG reservations.
> > > > 
> > > 
> > > My question then is do we want to encourage users to run reflink without
> > > rmapbt because of the latter being experimental, and only so because of
> > > a lack of realtime support? *shrug* Maybe it doesn't really matter.
> > 
> > Probably not, I think realtime users are fairly infrequent and
> > especially so on v5.  The only reasons I can think of to extend our new
> > features to rt are (a) to avoid screwing over the existing usecases and
> > (b) I guess you could build a hybrid xfs between an SSD and a SMR drive
> > wherein we always CoW from one end of the disk to the other.  (That's
> > crazy, but hey.)

Or use RT files as VM image files with deterministic, low overhead
allocation characteristics. And for that use case, I^Hwe want rmap
and reflink on the realtime device so we can scrub and
snapshot/send/recv those image files......

> Yeah, the only use cases I've really heard wrt to realtime any time in
> the recent past has been these kind of future facing experiments as
> opposed to traditional use. Not to say there aren't users out there, but

Let's not forget that there are tens of millions of existing
XFS RT filesystems out there in smart TVs and DVRs.

> I also can't recall seeing any bug reports or anything of that nature
> either any time recently. Maybe it's just perfect code. ;)

No, just carefully constrained applications and a lot of QA.

> I think Dave has thought about the SMR+RT thing in the past.

Yes, and I wrote a document describing it for the device vendors
that were screaming for us to do something to be able to go away and
implement.....  <crickets>

Regardless, IIRC, there have been people using RT devices with SMR
drives for some years now, mainly because the RT device could
guarantee zone aligned and sized allocations for sequential write
applications like video recording....

> > > If realtime is the only barrier, ISTM we could remove the experimental
> > > rmapbt status and just disable rmapbt+rt for now. Then re-enable
> > > EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like
> > > the most likely end result to me anyways).
> > 
> > It's already disabled.  I suppose it's not /that/ big of a deal if old
> > kernels reject certain feature combinations...

Just add an incompat bit for rtrmapbt and rtreflink when they come
along, and everything will be fine :P

> > ...by the way, we can't add a rt device to an already-mounted
> > filesystem, right?
> 
> I thought it was a mkfs time thing, but I could be mistaken...

It is at the moment, but there's nothing that prevents us from
adding it dynamically if we wanted (same as journal resizing). e.g.
via a remount to specify and open the rt device and growfs to resize
it from zero to device size and allocate and initialise the bitmap
and summary inodes.

Cheers,

Dave.

Brian Foster Sept. 1, 2017, 11:16 a.m. UTC | #12

On Fri, Sep 01, 2017 at 08:58:39AM +1000, Dave Chinner wrote:
> On Thu, Aug 31, 2017 at 04:36:14PM -0400, Brian Foster wrote:
> > On Thu, Aug 31, 2017 at 01:09:36PM -0700, Darrick J. Wong wrote:
> > > On Thu, Aug 31, 2017 at 11:59:24AM -0400, Brian Foster wrote:
> > > > On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote:
> > > > > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote:
> > > > > > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote:
> > > > > > > FWIW, I don't really have a strong opinion. To me, removing experimental
> > > > > > > means we feel the code has stabilized long enough in principle, there
> > > > > > > are no significant problems (i.e., corruption/crash vectors) that we are
> > > > > > > aware of and the feature is complete (full userspace tool support, etc).
> > > > > > > The in-core extent list thing seems like more of a general problem to me
> > > > > > 
> > > > > > Agreed so far.
> > > > > 
> > > > > <nod> Dave?  Eric?  Any perspective you'd like to offer? :)
> > > > > 
> > > > > > > That aside, shouldn't we consider the rmapbt experimental tag first, or
> > > > > > > at least at the same time? It's been around for slightly longer.
> > > > > > 
> > > > > > I've not done much testing on that or have experience with it in general,
> > > > > > nor do I have a customer with a big QA team beating it hard, so I can't
> > > > > > really comment on that one.
> > > > > 
> > > > > rmapbt will remain EXPERIMENTAL because I still have more patches to
> > > > > send to finish the feature for realtime devices.  Speaking of which,
> > > > > it's now been 53 weeks since the last dump of that, so I'll go do that
> > > > > now. :P
> > > > > 
> > > > > FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it
> > > > > since adding the per-AG reservations.
> > > > > 
> > > > 
> > > > My question then is do we want to encourage users to run reflink without
> > > > rmapbt because of the latter being experimental, and only so because of
> > > > a lack of realtime support? *shrug* Maybe it doesn't really matter.
> > > 
> > > Probably not, I think realtime users are fairly infrequent and
> > > especially so on v5.  The only reasons I can think of to extend our new
> > > features to rt are (a) to avoid screwing over the existing usecases and
> > > (b) I guess you could build a hybrid xfs between an SSD and a SMR drive
> > > wherein we always CoW from one end of the disk to the other.  (That's
> > > crazy, but hey.)
> 
> Or use RT files as VM image files with deterministic, low overhead
> allocation characteristics. And for that use case, I^Hwe want rmap
> and reflink on the realtime device so we can scrub and
> snapshot/send/recv those image files......
> 
> > Yeah, the only use cases I've really heard wrt to realtime any time in
> > the recent past has been these kind of future facing experiments as
> > opposed to traditional use. Not to say there aren't users out there, but
> 
> Let's not forget that there are tens of millions of existing
> XFS RT filesystems out there in smart TVs and DVRs.
> 

Oh, I'm not suggesting there are zero users or that we should never port
over the associated features to RT. I'm just suggesting that if we were
going to untag reflink as experimental, it may not be worth holding back
doing the same for rmapbt if lack of RT support is the only concern.

I do question how many of these existing RT users may be on newer
features like v5 filesystems (if supported?), or kernels that even
support v5 filesystems for that matter. ;)

> > I also can't recall seeing any bug reports or anything of that nature
> > either any time recently. Maybe it's just perfect code. ;)
> 
> No, just carefully constrained applications and a lot of QA.
> 

I'd hope we have at least the same amount of QA for !RT releases. (I
know from experience that the margins on many of those kinds of consumer
devices don't exactly facilitate the level of QE we typically see at the
enterprise level). If anything, I'd guess that whatever historical sense
of reliability this feature offers for things like embedded environments
stems more from using older kernels and code that basically never
changes than anything else. The one time I tried using RT way back when,
it fell on its face fairly hard and fairly quickly. Granted that is
anecdotal and I don't recall the specifics anyways. Maybe its been fixed
up since then.

> > I think Dave has thought about the SMR+RT thing in the past.
> 
> Yes, and I wrote a document describing it for the device vendors
> that were screaming for us to do something to be able to go away and
> implement.....  <crickets>
> 
> Regardless, IIRC, there have been people using RT devices with SMR
> drives for some years now, mainly because the RT device could
> guarantee zone aligned and sized allocations for sequential write
> applications like video recording....
> 

Yep, I recall that discussion.

> > > > If realtime is the only barrier, ISTM we could remove the experimental
> > > > rmapbt status and just disable rmapbt+rt for now. Then re-enable
> > > > EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like
> > > > the most likely end result to me anyways).
> > > 
> > > It's already disabled.  I suppose it's not /that/ big of a deal if old
> > > kernels reject certain feature combinations...
> 
> Just add an incompat bit for rtrmapbt and rtreflink when they come
> along, and everything will be fine :P
> 

That sounds quite reasonable to me.

Brian

> > > ...by the way, we can't add a rt device to an already-mounted
> > > filesystem, right?
> > 
> > I thought it was a mkfs time thing, but I could be mistaken...
> 
> It is at the moment, but there's nothing that prevents us from
> adding it dynamically if we wanted (same as journal resizing). e.g.
> via a remount to specify and open the rt device and growfs to resize
> it from zero to device size and allocate and initialise the bitmap
> and summary inodes.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Nov. 15, 2017, 1:10 a.m. UTC | #13

On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote:
> But reject reflink + DAX file systems for now until the code to
> support reflinks on DAX is actually implemented.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_super.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 38aaacdbb8b3..92521032468e 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1634,7 +1634,9 @@ xfs_fs_fill_super(
>  		}
>  		if (xfs_sb_version_hasreflink(&mp->m_sb))
>  			xfs_alert(mp,
> -		"DAX and reflink have not been tested together!");
> +		"DAX and reflink can not be used together!");
> +			error = -EINVAL;
> +			goto out_filestream_unmount;
>  	}
>  
>  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> @@ -1648,10 +1650,6 @@ xfs_fs_fill_super(
>  	"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
>  	}
>  
> -	if (xfs_sb_version_hasreflink(&mp->m_sb))
> -		xfs_alert(mp,
> -	"EXPERIMENTAL reflink feature enabled. Use at your own risk!");

Ok, now that the incore extent map rework has landed upstream, can we
please get everyone's QA to stress the reflink/cow code vigorously
during this release cycle so that we can re-consider this patch for
4.16?

(And if you're particularly mean, force cowextsize = 1fsb to age the
filesystem prematurely?)

Right now I think the only problem I know about is that log recovery bug
where the defer ops of replayed defer ops get logged in the wrong order,
but that's it.

--D

> -
>  	error = xfs_mountfs(mp);
>  	if (error)
>  		goto out_filestream_unmount;
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Amir Goldstein Nov. 15, 2017, 6:14 a.m. UTC | #14

On Wed, Nov 15, 2017 at 3:10 AM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote:
>> But reject reflink + DAX file systems for now until the code to
>> support reflinks on DAX is actually implemented.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>  fs/xfs/xfs_super.c | 8 +++-----
>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
>> index 38aaacdbb8b3..92521032468e 100644
>> --- a/fs/xfs/xfs_super.c
>> +++ b/fs/xfs/xfs_super.c
>> @@ -1634,7 +1634,9 @@ xfs_fs_fill_super(
>>               }
>>               if (xfs_sb_version_hasreflink(&mp->m_sb))
>>                       xfs_alert(mp,
>> -             "DAX and reflink have not been tested together!");
>> +             "DAX and reflink can not be used together!");
>> +                     error = -EINVAL;
>> +                     goto out_filestream_unmount;
>>       }
>>
>>       if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
>> @@ -1648,10 +1650,6 @@ xfs_fs_fill_super(
>>       "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
>>       }
>>
>> -     if (xfs_sb_version_hasreflink(&mp->m_sb))
>> -             xfs_alert(mp,
>> -     "EXPERIMENTAL reflink feature enabled. Use at your own risk!");
>
> Ok, now that the incore extent map rework has landed upstream, can we
> please get everyone's QA to stress the reflink/cow code vigorously
> during this release cycle so that we can re-consider this patch for
> 4.16?
>
> (And if you're particularly mean, force cowextsize = 1fsb to age the
> filesystem prematurely?)
>
> Right now I think the only problem I know about is that log recovery bug
> where the defer ops of replayed defer ops get logged in the wrong order,
> but that's it.
>

Last time you wrote about this bug you had a "hard question" about transaction
reservation for the solution and said your're going to go have a think about it:
https://marc.info/?l=linux-xfs&m=150766311924170&w=2
Did you come to any conclusions?
That sounds like one of those nasty CoW corner cases, so I'd be happy to know
there is at least a well thought design for a solution - if not a fix.

Practically, I would love if that bug could be solved soon so that we can all
start running generic/503 for more than a few iterations to stress
test reflink/cow
with power failure. Success on this front could be a big upside before
turning off
EXPERIMENTAL.

Thanks,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Nov. 22, 2017, 6:31 p.m. UTC | #15

On Wed, Nov 15, 2017 at 08:14:33AM +0200, Amir Goldstein wrote:
> On Wed, Nov 15, 2017 at 3:10 AM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote:
> >> But reject reflink + DAX file systems for now until the code to
> >> support reflinks on DAX is actually implemented.
> >>
> >> Signed-off-by: Christoph Hellwig <hch@lst.de>
> >> ---
> >>  fs/xfs/xfs_super.c | 8 +++-----
> >>  1 file changed, 3 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> >> index 38aaacdbb8b3..92521032468e 100644
> >> --- a/fs/xfs/xfs_super.c
> >> +++ b/fs/xfs/xfs_super.c
> >> @@ -1634,7 +1634,9 @@ xfs_fs_fill_super(
> >>               }
> >>               if (xfs_sb_version_hasreflink(&mp->m_sb))
> >>                       xfs_alert(mp,
> >> -             "DAX and reflink have not been tested together!");
> >> +             "DAX and reflink can not be used together!");
> >> +                     error = -EINVAL;
> >> +                     goto out_filestream_unmount;
> >>       }
> >>
> >>       if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> >> @@ -1648,10 +1650,6 @@ xfs_fs_fill_super(
> >>       "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
> >>       }
> >>
> >> -     if (xfs_sb_version_hasreflink(&mp->m_sb))
> >> -             xfs_alert(mp,
> >> -     "EXPERIMENTAL reflink feature enabled. Use at your own risk!");
> >
> > Ok, now that the incore extent map rework has landed upstream, can we
> > please get everyone's QA to stress the reflink/cow code vigorously
> > during this release cycle so that we can re-consider this patch for
> > 4.16?
> >
> > (And if you're particularly mean, force cowextsize = 1fsb to age the
> > filesystem prematurely?)
> >
> > Right now I think the only problem I know about is that log recovery bug
> > where the defer ops of replayed defer ops get logged in the wrong order,
> > but that's it.
> >
> 
> Last time you wrote about this bug you had a "hard question" about transaction
> reservation for the solution and said your're going to go have a think about it:
> https://marc.info/?l=linux-xfs&m=150766311924170&w=2
> Did you come to any conclusions?
> That sounds like one of those nasty CoW corner cases, so I'd be happy to know
> there is at least a well thought design for a solution - if not a fix.

Sorry I let myself get distracted/stressed with the merge window;
hopefully the patch I sent out will address that problem?

> Practically, I would love if that bug could be solved soon so that we can all
> start running generic/503 for more than a few iterations to stress
> test reflink/cow
> with power failure. Success on this front could be a big upside before
> turning off
> EXPERIMENTAL.

Indeed!  What is the status of those tests, anyway?  Are they in xfstests?

--D

> Thanks,
> Amir.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Amir Goldstein Nov. 22, 2017, 8:40 p.m. UTC | #16

On Wed, Nov 22, 2017 at 8:31 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Wed, Nov 15, 2017 at 08:14:33AM +0200, Amir Goldstein wrote:
[...]
>>
>> Last time you wrote about this bug you had a "hard question" about transaction
>> reservation for the solution and said your're going to go have a think about it:
>> https://marc.info/?l=linux-xfs&m=150766311924170&w=2
>> Did you come to any conclusions?
>> That sounds like one of those nasty CoW corner cases, so I'd be happy to know
>> there is at least a well thought design for a solution - if not a fix.
>
> Sorry I let myself get distracted/stressed with the merge window;
> hopefully the patch I sent out will address that problem?
>

Problem reproduces better on a spinning rust I have at the office, so will
give it a spin tomorrow.

>> Practically, I would love if that bug could be solved soon so that we can all
>> start running generic/503 for more than a few iterations to stress
>> test reflink/cow
>> with power failure. Success on this front could be a big upside before
>> turning off
>> EXPERIMENTAL.
>
> Indeed!  What is the status of those tests, anyway?  Are they in xfstests?
>

Yes.
2 fsx stress tests, with and without clones
generic/455
generic/457
(replay group)

One regression test for ext4 crash bug that was already fixed
generic/456

And one regression test for xfs reflink crash bug that you already fixed
generic/458

So generic/457 is the one we should be hammering (fsx and reflink)
it creates 10 clones and runs fsx workers on them.
I imagine it is not long before there are no more shared extents.
It's not much, but its a good start.
I recon it would be good if you guys added some more variants of this test
to try and cover more interesting reflink cases.

FYI, Josef also has an fsstress based test, but it is plain shell script
and I never got around to adapting it to an xfstest:
https://github.com/josefbacik/log-writes

Cheers,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Nov. 22, 2017, 9 p.m. UTC | #17

On Wed, Nov 22, 2017 at 10:40:07PM +0200, Amir Goldstein wrote:
> On Wed, Nov 22, 2017 at 8:31 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > On Wed, Nov 15, 2017 at 08:14:33AM +0200, Amir Goldstein wrote:
> [...]
> >>
> >> Last time you wrote about this bug you had a "hard question" about transaction
> >> reservation for the solution and said your're going to go have a think about it:
> >> https://marc.info/?l=linux-xfs&m=150766311924170&w=2
> >> Did you come to any conclusions?
> >> That sounds like one of those nasty CoW corner cases, so I'd be happy to know
> >> there is at least a well thought design for a solution - if not a fix.
> >
> > Sorry I let myself get distracted/stressed with the merge window;
> > hopefully the patch I sent out will address that problem?
> >
> 
> Problem reproduces better on a spinning rust I have at the office, so will
> give it a spin tomorrow.

Ok.

> >> Practically, I would love if that bug could be solved soon so that we can all
> >> start running generic/503 for more than a few iterations to stress
> >> test reflink/cow
> >> with power failure. Success on this front could be a big upside before
> >> turning off
> >> EXPERIMENTAL.
> >
> > Indeed!  What is the status of those tests, anyway?  Are they in xfstests?
> >
> 
> Yes.

Excellent!

> 2 fsx stress tests, with and without clones
> generic/455
> generic/457
> (replay group)
> 
> One regression test for ext4 crash bug that was already fixed
> generic/456
> 
> And one regression test for xfs reflink crash bug that you already fixed
> generic/458
> 
> So generic/457 is the one we should be hammering (fsx and reflink)

Ok, will do.

--D

> it creates 10 clones and runs fsx workers on them.
> I imagine it is not long before there are no more shared extents.
> It's not much, but its a good start.
> I recon it would be good if you guys added some more variants of this test
> to try and cover more interesting reflink cases.
> 
> FYI, Josef also has an fsstress based test, but it is plain shell script
> and I never got around to adapting it to an xfstest:
> https://github.com/josefbacik/log-writes
> 
> Cheers,
> Amir.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Amir Goldstein Nov. 23, 2017, 10:44 a.m. UTC | #18

On Wed, Nov 22, 2017 at 11:00 PM, Darrick J. Wong
<darrick.wong@oracle.com> wrote:
> On Wed, Nov 22, 2017 at 10:40:07PM +0200, Amir Goldstein wrote:
>> On Wed, Nov 22, 2017 at 8:31 PM, Darrick J. Wong
>> <darrick.wong@oracle.com> wrote:
>> > On Wed, Nov 15, 2017 at 08:14:33AM +0200, Amir Goldstein wrote:
>> [...]
>> >>
>> >> Last time you wrote about this bug you had a "hard question" about transaction
>> >> reservation for the solution and said your're going to go have a think about it:
>> >> https://marc.info/?l=linux-xfs&m=150766311924170&w=2
>> >> Did you come to any conclusions?
>> >> That sounds like one of those nasty CoW corner cases, so I'd be happy to know
>> >> there is at least a well thought design for a solution - if not a fix.
>> >
>> > Sorry I let myself get distracted/stressed with the merge window;
>> > hopefully the patch I sent out will address that problem?
>> >
>>
>> Problem reproduces better on a spinning rust I have at the office, so will
>> give it a spin tomorrow.
>

So far all good.
Spinning rust never survived 10 rounds of generic/457
and it is past the 100's round with your patch already.
Will keep it spinning for the weekend.

Thanks,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Darrick J. Wong Jan. 8, 2018, 9:43 p.m. UTC | #19

On Fri, Sep 01, 2017 at 08:19:52AM +1000, Dave Chinner wrote:
> On Wed, Aug 30, 2017 at 11:40:33PM -0700, Darrick J. Wong wrote:
> > On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote:
> > > But reject reflink + DAX file systems for now until the code to
> > > support reflinks on DAX is actually implemented.
> > > 
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/xfs/xfs_super.c | 8 +++-----
> > >  1 file changed, 3 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index 38aaacdbb8b3..92521032468e 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super(
> > >  		}
> > >  		if (xfs_sb_version_hasreflink(&mp->m_sb))
> > >  			xfs_alert(mp,
> > > -		"DAX and reflink have not been tested together!");
> > > +		"DAX and reflink can not be used together!");
> > > +			error = -EINVAL;
> > > +			goto out_filestream_unmount;
> > 
> > /This/ hunk seems fine, but...
> > 
> > >  	}
> > >  
> > >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super(
> > >  	"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
> > >  	}
> > >  
> > > -	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > > -		xfs_alert(mp,
> > > -	"EXPERIMENTAL reflink feature enabled. Use at your own risk!");
> > > -
> > 
> > ...I frankly would rather wait until we land and stabilize the incore
> > extent rework, because I'd rather not have the reflink story be: 1) we
> > declare reflink stable in 4.14; 2) immediately people start loading up
> > XFSes with sparse VM images that blow up on high order allocations and
> > then 3) we get a whole lot of complaints about it.  Then in 4.15 we 4)
> > land the incore extent map only now we're dragging those same users
> > through the mud while /that/ stabilizes, with the result 5) that
> > everyone thinks we've gone off the deep end and doesn't trust us
> > anymore.
> 
> Which is a lose-lose proposition. IOWs, to be avoided at all costs.
> 
> > I wish that didn't also mean waiting another 6 months for something that
> > none of us /developers/ have seen in practice,
> 
> Oh, I've seen it plenty of times in practice. Restoring large
> metadumps with (tens of) millions of tiny extents is the common
> cause for me.
> 
> > especially since the
> > enterprise distros will have plenty of time to backport all this stuff
> > before their next big releases if it /does/ become a problem.  It's fine
> > enough for me (and Christoph's customers, evidently) but is that enough?
> 
> FYI, this has been a problem on RHEL for a long time. It's a long
> standing, well known problem and that's one of the reasons why
> it's at the top of my list to fix right now.
> 
> > <shrug> Not sure what to do about my own fear factor. :)  Anyone want to
> > offer further comments?  A quick git history trip says sparse inodes
> > took about 13 months to go from initial commit to EXPERIMENTAL removed?
> 
> I always aimed to remove the experimental tag 4 kernel releases
> after the initial merge, assuming testing and required functionality
> was all there and complete early on in the stabilisation process.
> Generally I had a set of criteria that needed to be fulfilled before
> I'd remove the tag, and ticked them off as fixes were merged.
> 
> e.g.  CRCs were introduced in 3.10, the experimental tag was removed
> in 3.15, so 5 releases in that case. We didn't get CRCs working
> sanely until 3.11 and there was still a lot of userspace stuff that
> lagged (and having a fully working userspace was a criteria for
> removing experimental from the kernel code), so it took longer to
> stabilise than something smaller like sparse inodes.
> 
> Fixing the extent list memory allocation problem was always
> something on my list of "need to fix before reflink can be deployed
> in anger", so I'd much prefer that we are conservative in removing
> the experimental tag. It doesn't hurt anyone if we keep it on until
> we get the extent tree rework merged, and it's better to err on the
> side of keeping it too long than not long enough.

Hmm, it's been 4 months since we last picked up this thread.  We've
had 7 releases since reflink itself went upstream, which for most
features would be long enough to remove the EXPERIMENTAL tag for 4.16.

On the other hand, the extent list memory allocation fix series hasn't
had much time to stabilize, which could be a reason to leave the tag.

Thoughts?

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dave Chinner Jan. 8, 2018, 10:11 p.m. UTC | #20

On Mon, Jan 08, 2018 at 01:43:27PM -0800, Darrick J. Wong wrote:
> On Fri, Sep 01, 2017 at 08:19:52AM +1000, Dave Chinner wrote:
> > On Wed, Aug 30, 2017 at 11:40:33PM -0700, Darrick J. Wong wrote:
> > > On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote:
> > > > But reject reflink + DAX file systems for now until the code to
> > > > support reflinks on DAX is actually implemented.
> > > > 
> > > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > > ---
> > > >  fs/xfs/xfs_super.c | 8 +++-----
> > > >  1 file changed, 3 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > index 38aaacdbb8b3..92521032468e 100644
> > > > --- a/fs/xfs/xfs_super.c
> > > > +++ b/fs/xfs/xfs_super.c
> > > > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super(
> > > >  		}
> > > >  		if (xfs_sb_version_hasreflink(&mp->m_sb))
> > > >  			xfs_alert(mp,
> > > > -		"DAX and reflink have not been tested together!");
> > > > +		"DAX and reflink can not be used together!");
> > > > +			error = -EINVAL;
> > > > +			goto out_filestream_unmount;
> > > 
> > > /This/ hunk seems fine, but...
> > > 
> > > >  	}
> > > >  
> > > >  	if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > > > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super(
> > > >  	"EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!");
> > > >  	}
> > > >  
> > > > -	if (xfs_sb_version_hasreflink(&mp->m_sb))
> > > > -		xfs_alert(mp,
> > > > -	"EXPERIMENTAL reflink feature enabled. Use at your own risk!");
> > > > -
> > > 
> > > ...I frankly would rather wait until we land and stabilize the incore
> > > extent rework, because I'd rather not have the reflink story be: 1) we
> > > declare reflink stable in 4.14; 2) immediately people start loading up
> > > XFSes with sparse VM images that blow up on high order allocations and
> > > then 3) we get a whole lot of complaints about it.  Then in 4.15 we 4)
> > > land the incore extent map only now we're dragging those same users
> > > through the mud while /that/ stabilizes, with the result 5) that
> > > everyone thinks we've gone off the deep end and doesn't trust us
> > > anymore.
> > 
> > Which is a lose-lose proposition. IOWs, to be avoided at all costs.
> > 
> > > I wish that didn't also mean waiting another 6 months for something that
> > > none of us /developers/ have seen in practice,
> > 
> > Oh, I've seen it plenty of times in practice. Restoring large
> > metadumps with (tens of) millions of tiny extents is the common
> > cause for me.
> > 
> > > especially since the
> > > enterprise distros will have plenty of time to backport all this stuff
> > > before their next big releases if it /does/ become a problem.  It's fine
> > > enough for me (and Christoph's customers, evidently) but is that enough?
> > 
> > FYI, this has been a problem on RHEL for a long time. It's a long
> > standing, well known problem and that's one of the reasons why
> > it's at the top of my list to fix right now.
> > 
> > > <shrug> Not sure what to do about my own fear factor. :)  Anyone want to
> > > offer further comments?  A quick git history trip says sparse inodes
> > > took about 13 months to go from initial commit to EXPERIMENTAL removed?
> > 
> > I always aimed to remove the experimental tag 4 kernel releases
> > after the initial merge, assuming testing and required functionality
> > was all there and complete early on in the stabilisation process.
> > Generally I had a set of criteria that needed to be fulfilled before
> > I'd remove the tag, and ticked them off as fixes were merged.
> > 
> > e.g.  CRCs were introduced in 3.10, the experimental tag was removed
> > in 3.15, so 5 releases in that case. We didn't get CRCs working
> > sanely until 3.11 and there was still a lot of userspace stuff that
> > lagged (and having a fully working userspace was a criteria for
> > removing experimental from the kernel code), so it took longer to
> > stabilise than something smaller like sparse inodes.
> > 
> > Fixing the extent list memory allocation problem was always
> > something on my list of "need to fix before reflink can be deployed
> > in anger", so I'd much prefer that we are conservative in removing
> > the experimental tag. It doesn't hurt anyone if we keep it on until
> > we get the extent tree rework merged, and it's better to err on the
> > side of keeping it too long than not long enough.
> 
> Hmm, it's been 4 months since we last picked up this thread.  We've
> had 7 releases since reflink itself went upstream, which for most
> features would be long enough to remove the EXPERIMENTAL tag for 4.16.
> 
> On the other hand, the extent list memory allocation fix series hasn't
> had much time to stabilize, which could be a reason to leave the tag.
> 
> Thoughts?

I was thinking about this last night, actually. The extent rework
seems stable enough - I've been hammering large extent count
directories for the past week and I haven't come across any problems
so I think we're good to remove the EXPERIMENTAL tags now...

Cheers,

Dave.

xfs: remove experimental tag for reflinks

Commit Message

Comments

Patch