Message ID | 20170830145400.3681-1-hch@lst.de (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote: > But reject reflink + DAX file systems for now until the code to > support reflinks on DAX is actually implemented. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/xfs_super.c | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index 38aaacdbb8b3..92521032468e 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super( > } > if (xfs_sb_version_hasreflink(&mp->m_sb)) > xfs_alert(mp, > - "DAX and reflink have not been tested together!"); > + "DAX and reflink can not be used together!"); > + error = -EINVAL; > + goto out_filestream_unmount; /This/ hunk seems fine, but... > } > > if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super( > "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!"); > } > > - if (xfs_sb_version_hasreflink(&mp->m_sb)) > - xfs_alert(mp, > - "EXPERIMENTAL reflink feature enabled. Use at your own risk!"); > - ...I frankly would rather wait until we land and stabilize the incore extent rework, because I'd rather not have the reflink story be: 1) we declare reflink stable in 4.14; 2) immediately people start loading up XFSes with sparse VM images that blow up on high order allocations and then 3) we get a whole lot of complaints about it. Then in 4.15 we 4) land the incore extent map only now we're dragging those same users through the mud while /that/ stabilizes, with the result 5) that everyone thinks we've gone off the deep end and doesn't trust us anymore. I wish that didn't also mean waiting another 6 months for something that none of us /developers/ have seen in practice, especially since the enterprise distros will have plenty of time to backport all this stuff before their next big releases if it /does/ become a problem. It's fine enough for me (and Christoph's customers, evidently) but is that enough? <shrug> Not sure what to do about my own fear factor. :) Anyone want to offer further comments? A quick git history trip says sparse inodes took about 13 months to go from initial commit to EXPERIMENTAL removed? --D > error = xfs_mountfs(mp); > if (error) > goto out_filestream_unmount; > -- > 2.11.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 30, 2017 at 11:40:33PM -0700, Darrick J. Wong wrote: > On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote: > > But reject reflink + DAX file systems for now until the code to > > support reflinks on DAX is actually implemented. > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > --- > > fs/xfs/xfs_super.c | 8 +++----- > > 1 file changed, 3 insertions(+), 5 deletions(-) > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > index 38aaacdbb8b3..92521032468e 100644 > > --- a/fs/xfs/xfs_super.c > > +++ b/fs/xfs/xfs_super.c > > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super( > > } > > if (xfs_sb_version_hasreflink(&mp->m_sb)) > > xfs_alert(mp, > > - "DAX and reflink have not been tested together!"); > > + "DAX and reflink can not be used together!"); > > + error = -EINVAL; > > + goto out_filestream_unmount; > > /This/ hunk seems fine, but... > > > } > > > > if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super( > > "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!"); > > } > > > > - if (xfs_sb_version_hasreflink(&mp->m_sb)) > > - xfs_alert(mp, > > - "EXPERIMENTAL reflink feature enabled. Use at your own risk!"); > > - > > ...I frankly would rather wait until we land and stabilize the incore > extent rework, because I'd rather not have the reflink story be: 1) we > declare reflink stable in 4.14; 2) immediately people start loading up > XFSes with sparse VM images that blow up on high order allocations and > then 3) we get a whole lot of complaints about it. Then in 4.15 we 4) > land the incore extent map only now we're dragging those same users > through the mud while /that/ stabilizes, with the result 5) that > everyone thinks we've gone off the deep end and doesn't trust us > anymore. > > I wish that didn't also mean waiting another 6 months for something that > none of us /developers/ have seen in practice, especially since the > enterprise distros will have plenty of time to backport all this stuff > before their next big releases if it /does/ become a problem. It's fine > enough for me (and Christoph's customers, evidently) but is that enough? > > <shrug> Not sure what to do about my own fear factor. :) Anyone want to > offer further comments? A quick git history trip says sparse inodes > took about 13 months to go from initial commit to EXPERIMENTAL removed? > FWIW, I don't really have a strong opinion. To me, removing experimental means we feel the code has stabilized long enough in principle, there are no significant problems (i.e., corruption/crash vectors) that we are aware of and the feature is complete (full userspace tool support, etc). The in-core extent list thing seems like more of a general problem to me so I have no objection to removing the experimental tag from reflink if that's really the only known drawback. Of course, I don't object if there's developer preference to keep it for another cycle or two either. That aside, shouldn't we consider the rmapbt experimental tag first, or at least at the same time? It's been around for slightly longer. Brian > --D > > > error = xfs_mountfs(mp); > > if (error) > > goto out_filestream_unmount; > > -- > > 2.11.0 > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote: > FWIW, I don't really have a strong opinion. To me, removing experimental > means we feel the code has stabilized long enough in principle, there > are no significant problems (i.e., corruption/crash vectors) that we are > aware of and the feature is complete (full userspace tool support, etc). > The in-core extent list thing seems like more of a general problem to me Agreed so far. > That aside, shouldn't we consider the rmapbt experimental tag first, or > at least at the same time? It's been around for slightly longer. I've not done much testing on that or have experience with it in general, nor do I have a customer with a big QA team beating it hard, so I can't really comment on that one. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote: > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote: > > FWIW, I don't really have a strong opinion. To me, removing experimental > > means we feel the code has stabilized long enough in principle, there > > are no significant problems (i.e., corruption/crash vectors) that we are > > aware of and the feature is complete (full userspace tool support, etc). > > The in-core extent list thing seems like more of a general problem to me > > Agreed so far. <nod> Dave? Eric? Any perspective you'd like to offer? :) > > That aside, shouldn't we consider the rmapbt experimental tag first, or > > at least at the same time? It's been around for slightly longer. > > I've not done much testing on that or have experience with it in general, > nor do I have a customer with a big QA team beating it hard, so I can't > really comment on that one. rmapbt will remain EXPERIMENTAL because I still have more patches to send to finish the feature for realtime devices. Speaking of which, it's now been 53 weeks since the last dump of that, so I'll go do that now. :P FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it since adding the per-AG reservations. --D > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/31/17 10:31 AM, Darrick J. Wong wrote: > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote: >> On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote: >>> FWIW, I don't really have a strong opinion. To me, removing experimental >>> means we feel the code has stabilized long enough in principle, there >>> are no significant problems (i.e., corruption/crash vectors) that we are >>> aware of and the feature is complete (full userspace tool support, etc). >>> The in-core extent list thing seems like more of a general problem to me >> >> Agreed so far. > > <nod> Dave? Eric? Any perspective you'd like to offer? :) This is a bit of a naiive question, but how many applications are out there that can be used with reflink right now? This would obviously delay things a bit, but I had considered writing something up for LWN or Fedora Planet or $WHATEVER describing these new xfs features, how they can be used, and encourage some early-adopter testing. Try to get some buzz going and some real-world use. It's always a catch 22; nobody uses it until it's marked stable, but we never know if it's really stable until people outside the development community use it. ;) As for the allocation issues w/ the in core extent list, yeah, that worries me. -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote: > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote: > > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote: > > > FWIW, I don't really have a strong opinion. To me, removing experimental > > > means we feel the code has stabilized long enough in principle, there > > > are no significant problems (i.e., corruption/crash vectors) that we are > > > aware of and the feature is complete (full userspace tool support, etc). > > > The in-core extent list thing seems like more of a general problem to me > > > > Agreed so far. > > <nod> Dave? Eric? Any perspective you'd like to offer? :) > > > > That aside, shouldn't we consider the rmapbt experimental tag first, or > > > at least at the same time? It's been around for slightly longer. > > > > I've not done much testing on that or have experience with it in general, > > nor do I have a customer with a big QA team beating it hard, so I can't > > really comment on that one. > > rmapbt will remain EXPERIMENTAL because I still have more patches to > send to finish the feature for realtime devices. Speaking of which, > it's now been 53 weeks since the last dump of that, so I'll go do that > now. :P > > FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it > since adding the per-AG reservations. > My question then is do we want to encourage users to run reflink without rmapbt because of the latter being experimental, and only so because of a lack of realtime support? *shrug* Maybe it doesn't really matter. But realistically, how likely is it that the stability of forthcoming rmapbt+realtime code has any bearing on making rmapbt non-experimental in general? I suspect we're not going to leave it experimental for another however many months just to let the rt code sit around. I also suspect not many people will be actually using/testing that code outside of some of us, but maybe there are real users out there and I'm just not aware of them..? (This all coming from somebody who has CONFIG_XFS_RT disabled on his configs. ;) If realtime is the only barrier, ISTM we could remove the experimental rmapbt status and just disable rmapbt+rt for now. Then re-enable EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like the most likely end result to me anyways). Brian > --D > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 31, 2017 at 10:55:31AM -0500, Eric Sandeen wrote: > On 8/31/17 10:31 AM, Darrick J. Wong wrote: > > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote: > >> On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote: > >>> FWIW, I don't really have a strong opinion. To me, removing experimental > >>> means we feel the code has stabilized long enough in principle, there > >>> are no significant problems (i.e., corruption/crash vectors) that we are > >>> aware of and the feature is complete (full userspace tool support, etc). > >>> The in-core extent list thing seems like more of a general problem to me > >> > >> Agreed so far. > > > > <nod> Dave? Eric? Any perspective you'd like to offer? :) > > This is a bit of a naiive question, but how many applications are out > there that can be used with reflink right now? This would obviously Anything that uses cp --reflink, which could be any number of deploy scripts to VM "snapshot" tools. ISTR that cifs & nfs now both support reflink too, and now that ocfs2 also supports the ficlone* ioctls (sort of) we can expect wider use. dupremove (offline dedupe tool for btrfs) can also talk to xfs/ocfs2. > delay things a bit, but I had considered writing something up for > LWN or Fedora Planet or $WHATEVER describing these new xfs features, > how they can be used, and encourage some early-adopter testing. Try > to get some buzz going and some real-world use. I know there's already some dedicated experimenters using reflink... > It's always a catch 22; nobody uses it until it's marked stable, > but we never know if it's really stable until people outside the > development community use it. ;) > > As for the allocation issues w/ the in core extent list, yeah, that > worries me. Yeah. On the one hand it's a pre-existing condition that's orthogonal to reflink (and hey, we even put in cowextsize to mitigate the problems on VMs) but OTOH it probably gets worse with cow. --D > > -Eric > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 31, 2017 at 11:59:24AM -0400, Brian Foster wrote: > On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote: > > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote: > > > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote: > > > > FWIW, I don't really have a strong opinion. To me, removing experimental > > > > means we feel the code has stabilized long enough in principle, there > > > > are no significant problems (i.e., corruption/crash vectors) that we are > > > > aware of and the feature is complete (full userspace tool support, etc). > > > > The in-core extent list thing seems like more of a general problem to me > > > > > > Agreed so far. > > > > <nod> Dave? Eric? Any perspective you'd like to offer? :) > > > > > > That aside, shouldn't we consider the rmapbt experimental tag first, or > > > > at least at the same time? It's been around for slightly longer. > > > > > > I've not done much testing on that or have experience with it in general, > > > nor do I have a customer with a big QA team beating it hard, so I can't > > > really comment on that one. > > > > rmapbt will remain EXPERIMENTAL because I still have more patches to > > send to finish the feature for realtime devices. Speaking of which, > > it's now been 53 weeks since the last dump of that, so I'll go do that > > now. :P > > > > FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it > > since adding the per-AG reservations. > > > > My question then is do we want to encourage users to run reflink without > rmapbt because of the latter being experimental, and only so because of > a lack of realtime support? *shrug* Maybe it doesn't really matter. Probably not, I think realtime users are fairly infrequent and especially so on v5. The only reasons I can think of to extend our new features to rt are (a) to avoid screwing over the existing usecases and (b) I guess you could build a hybrid xfs between an SSD and a SMR drive wherein we always CoW from one end of the disk to the other. (That's crazy, but hey.) > But realistically, how likely is it that the stability of forthcoming > rmapbt+realtime code has any bearing on making rmapbt non-experimental > in general? The biggest code churn to add realtime rmap is reworking the btree code to support putting btree records in an inode fork. The rmap code itself is relatively unchanged aside from widening rm_startblock/rm_blockcount to 64 bits. > I suspect we're not going to leave it experimental for another however > many months just to let the rt code sit around. I also suspect not > many people will be actually using/testing that code outside of some > of us, but maybe there are real users out there and I'm just not aware > of them..? (This all coming from somebody who has CONFIG_XFS_RT > disabled on his configs. ;) /me has it enabled and (occasionally) runs with realtime to see what breaks. :) > If realtime is the only barrier, ISTM we could remove the experimental > rmapbt status and just disable rmapbt+rt for now. Then re-enable > EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like > the most likely end result to me anyways). It's already disabled. I suppose it's not /that/ big of a deal if old kernels reject certain feature combinations... ...by the way, we can't add a rt device to an already-mounted filesystem, right? --D > > Brian > > > --D > > > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Aug 31, 2017 at 01:09:36PM -0700, Darrick J. Wong wrote: > On Thu, Aug 31, 2017 at 11:59:24AM -0400, Brian Foster wrote: > > On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote: > > > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote: > > > > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote: > > > > > FWIW, I don't really have a strong opinion. To me, removing experimental > > > > > means we feel the code has stabilized long enough in principle, there > > > > > are no significant problems (i.e., corruption/crash vectors) that we are > > > > > aware of and the feature is complete (full userspace tool support, etc). > > > > > The in-core extent list thing seems like more of a general problem to me > > > > > > > > Agreed so far. > > > > > > <nod> Dave? Eric? Any perspective you'd like to offer? :) > > > > > > > > That aside, shouldn't we consider the rmapbt experimental tag first, or > > > > > at least at the same time? It's been around for slightly longer. > > > > > > > > I've not done much testing on that or have experience with it in general, > > > > nor do I have a customer with a big QA team beating it hard, so I can't > > > > really comment on that one. > > > > > > rmapbt will remain EXPERIMENTAL because I still have more patches to > > > send to finish the feature for realtime devices. Speaking of which, > > > it's now been 53 weeks since the last dump of that, so I'll go do that > > > now. :P > > > > > > FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it > > > since adding the per-AG reservations. > > > > > > > My question then is do we want to encourage users to run reflink without > > rmapbt because of the latter being experimental, and only so because of > > a lack of realtime support? *shrug* Maybe it doesn't really matter. > > Probably not, I think realtime users are fairly infrequent and > especially so on v5. The only reasons I can think of to extend our new > features to rt are (a) to avoid screwing over the existing usecases and > (b) I guess you could build a hybrid xfs between an SSD and a SMR drive > wherein we always CoW from one end of the disk to the other. (That's > crazy, but hey.) > Yeah, the only use cases I've really heard wrt to realtime any time in the recent past has been these kind of future facing experiments as opposed to traditional use. Not to say there aren't users out there, but I also can't recall seeing any bug reports or anything of that nature either any time recently. Maybe it's just perfect code. ;) I think Dave has thought about the SMR+RT thing in the past. Somebody else mentioned some other thing they were trying to do using RT to me at Vault, to which I suggested to post some rfc code for discussion (which obviously hasn't happened), but I honestly can't remember what that experiment even was. > > But realistically, how likely is it that the stability of forthcoming > > rmapbt+realtime code has any bearing on making rmapbt non-experimental > > in general? > > The biggest code churn to add realtime rmap is reworking the btree code > to support putting btree records in an inode fork. The rmap code itself > is relatively unchanged aside from widening rm_startblock/rm_blockcount > to 64 bits. > > > I suspect we're not going to leave it experimental for another however > > many months just to let the rt code sit around. I also suspect not > > many people will be actually using/testing that code outside of some > > of us, but maybe there are real users out there and I'm just not aware > > of them..? (This all coming from somebody who has CONFIG_XFS_RT > > disabled on his configs. ;) > > /me has it enabled and (occasionally) runs with realtime to see what > breaks. :) > Heh, I think I did that once in the past. That about sums up my experience with rt. :) > > If realtime is the only barrier, ISTM we could remove the experimental > > rmapbt status and just disable rmapbt+rt for now. Then re-enable > > EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like > > the most likely end result to me anyways). > > It's already disabled. I suppose it's not /that/ big of a deal if old > kernels reject certain feature combinations... > Er, right. :P That seems reasonable to me. Again, I don't feel terribly strongly about whether to remove the tag or not for rmapbt in general. It just seems a bit strange to me to open up reflink and not rmapbt just because we don't support the latter on realtime yet. > ...by the way, we can't add a rt device to an already-mounted > filesystem, right? > I thought it was a mkfs time thing, but I could be mistaken... Brian > --D > > > > > Brian > > > > > --D > > > > > > > -- > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > > the body of a message to majordomo@vger.kernel.org > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 30, 2017 at 11:40:33PM -0700, Darrick J. Wong wrote: > On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote: > > But reject reflink + DAX file systems for now until the code to > > support reflinks on DAX is actually implemented. > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > --- > > fs/xfs/xfs_super.c | 8 +++----- > > 1 file changed, 3 insertions(+), 5 deletions(-) > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > index 38aaacdbb8b3..92521032468e 100644 > > --- a/fs/xfs/xfs_super.c > > +++ b/fs/xfs/xfs_super.c > > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super( > > } > > if (xfs_sb_version_hasreflink(&mp->m_sb)) > > xfs_alert(mp, > > - "DAX and reflink have not been tested together!"); > > + "DAX and reflink can not be used together!"); > > + error = -EINVAL; > > + goto out_filestream_unmount; > > /This/ hunk seems fine, but... > > > } > > > > if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super( > > "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!"); > > } > > > > - if (xfs_sb_version_hasreflink(&mp->m_sb)) > > - xfs_alert(mp, > > - "EXPERIMENTAL reflink feature enabled. Use at your own risk!"); > > - > > ...I frankly would rather wait until we land and stabilize the incore > extent rework, because I'd rather not have the reflink story be: 1) we > declare reflink stable in 4.14; 2) immediately people start loading up > XFSes with sparse VM images that blow up on high order allocations and > then 3) we get a whole lot of complaints about it. Then in 4.15 we 4) > land the incore extent map only now we're dragging those same users > through the mud while /that/ stabilizes, with the result 5) that > everyone thinks we've gone off the deep end and doesn't trust us > anymore. Which is a lose-lose proposition. IOWs, to be avoided at all costs. > I wish that didn't also mean waiting another 6 months for something that > none of us /developers/ have seen in practice, Oh, I've seen it plenty of times in practice. Restoring large metadumps with (tens of) millions of tiny extents is the common cause for me. > especially since the > enterprise distros will have plenty of time to backport all this stuff > before their next big releases if it /does/ become a problem. It's fine > enough for me (and Christoph's customers, evidently) but is that enough? FYI, this has been a problem on RHEL for a long time. It's a long standing, well known problem and that's one of the reasons why it's at the top of my list to fix right now. > <shrug> Not sure what to do about my own fear factor. :) Anyone want to > offer further comments? A quick git history trip says sparse inodes > took about 13 months to go from initial commit to EXPERIMENTAL removed? I always aimed to remove the experimental tag 4 kernel releases after the initial merge, assuming testing and required functionality was all there and complete early on in the stabilisation process. Generally I had a set of criteria that needed to be fulfilled before I'd remove the tag, and ticked them off as fixes were merged. e.g. CRCs were introduced in 3.10, the experimental tag was removed in 3.15, so 5 releases in that case. We didn't get CRCs working sanely until 3.11 and there was still a lot of userspace stuff that lagged (and having a fully working userspace was a criteria for removing experimental from the kernel code), so it took longer to stabilise than something smaller like sparse inodes. Fixing the extent list memory allocation problem was always something on my list of "need to fix before reflink can be deployed in anger", so I'd much prefer that we are conservative in removing the experimental tag. It doesn't hurt anyone if we keep it on until we get the extent tree rework merged, and it's better to err on the side of keeping it too long than not long enough. Cheers, Dave.
On Thu, Aug 31, 2017 at 04:36:14PM -0400, Brian Foster wrote: > On Thu, Aug 31, 2017 at 01:09:36PM -0700, Darrick J. Wong wrote: > > On Thu, Aug 31, 2017 at 11:59:24AM -0400, Brian Foster wrote: > > > On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote: > > > > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote: > > > > > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote: > > > > > > FWIW, I don't really have a strong opinion. To me, removing experimental > > > > > > means we feel the code has stabilized long enough in principle, there > > > > > > are no significant problems (i.e., corruption/crash vectors) that we are > > > > > > aware of and the feature is complete (full userspace tool support, etc). > > > > > > The in-core extent list thing seems like more of a general problem to me > > > > > > > > > > Agreed so far. > > > > > > > > <nod> Dave? Eric? Any perspective you'd like to offer? :) > > > > > > > > > > That aside, shouldn't we consider the rmapbt experimental tag first, or > > > > > > at least at the same time? It's been around for slightly longer. > > > > > > > > > > I've not done much testing on that or have experience with it in general, > > > > > nor do I have a customer with a big QA team beating it hard, so I can't > > > > > really comment on that one. > > > > > > > > rmapbt will remain EXPERIMENTAL because I still have more patches to > > > > send to finish the feature for realtime devices. Speaking of which, > > > > it's now been 53 weeks since the last dump of that, so I'll go do that > > > > now. :P > > > > > > > > FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it > > > > since adding the per-AG reservations. > > > > > > > > > > My question then is do we want to encourage users to run reflink without > > > rmapbt because of the latter being experimental, and only so because of > > > a lack of realtime support? *shrug* Maybe it doesn't really matter. > > > > Probably not, I think realtime users are fairly infrequent and > > especially so on v5. The only reasons I can think of to extend our new > > features to rt are (a) to avoid screwing over the existing usecases and > > (b) I guess you could build a hybrid xfs between an SSD and a SMR drive > > wherein we always CoW from one end of the disk to the other. (That's > > crazy, but hey.) Or use RT files as VM image files with deterministic, low overhead allocation characteristics. And for that use case, I^Hwe want rmap and reflink on the realtime device so we can scrub and snapshot/send/recv those image files...... > Yeah, the only use cases I've really heard wrt to realtime any time in > the recent past has been these kind of future facing experiments as > opposed to traditional use. Not to say there aren't users out there, but Let's not forget that there are tens of millions of existing XFS RT filesystems out there in smart TVs and DVRs. > I also can't recall seeing any bug reports or anything of that nature > either any time recently. Maybe it's just perfect code. ;) No, just carefully constrained applications and a lot of QA. > I think Dave has thought about the SMR+RT thing in the past. Yes, and I wrote a document describing it for the device vendors that were screaming for us to do something to be able to go away and implement..... <crickets> Regardless, IIRC, there have been people using RT devices with SMR drives for some years now, mainly because the RT device could guarantee zone aligned and sized allocations for sequential write applications like video recording.... > > > If realtime is the only barrier, ISTM we could remove the experimental > > > rmapbt status and just disable rmapbt+rt for now. Then re-enable > > > EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like > > > the most likely end result to me anyways). > > > > It's already disabled. I suppose it's not /that/ big of a deal if old > > kernels reject certain feature combinations... Just add an incompat bit for rtrmapbt and rtreflink when they come along, and everything will be fine :P > > ...by the way, we can't add a rt device to an already-mounted > > filesystem, right? > > I thought it was a mkfs time thing, but I could be mistaken... It is at the moment, but there's nothing that prevents us from adding it dynamically if we wanted (same as journal resizing). e.g. via a remount to specify and open the rt device and growfs to resize it from zero to device size and allocate and initialise the bitmap and summary inodes. Cheers, Dave.
On Fri, Sep 01, 2017 at 08:58:39AM +1000, Dave Chinner wrote: > On Thu, Aug 31, 2017 at 04:36:14PM -0400, Brian Foster wrote: > > On Thu, Aug 31, 2017 at 01:09:36PM -0700, Darrick J. Wong wrote: > > > On Thu, Aug 31, 2017 at 11:59:24AM -0400, Brian Foster wrote: > > > > On Thu, Aug 31, 2017 at 08:31:48AM -0700, Darrick J. Wong wrote: > > > > > On Thu, Aug 31, 2017 at 03:30:19PM +0200, Christoph Hellwig wrote: > > > > > > On Thu, Aug 31, 2017 at 08:43:21AM -0400, Brian Foster wrote: > > > > > > > FWIW, I don't really have a strong opinion. To me, removing experimental > > > > > > > means we feel the code has stabilized long enough in principle, there > > > > > > > are no significant problems (i.e., corruption/crash vectors) that we are > > > > > > > aware of and the feature is complete (full userspace tool support, etc). > > > > > > > The in-core extent list thing seems like more of a general problem to me > > > > > > > > > > > > Agreed so far. > > > > > > > > > > <nod> Dave? Eric? Any perspective you'd like to offer? :) > > > > > > > > > > > > That aside, shouldn't we consider the rmapbt experimental tag first, or > > > > > > > at least at the same time? It's been around for slightly longer. > > > > > > > > > > > > I've not done much testing on that or have experience with it in general, > > > > > > nor do I have a customer with a big QA team beating it hard, so I can't > > > > > > really comment on that one. > > > > > > > > > > rmapbt will remain EXPERIMENTAL because I still have more patches to > > > > > send to finish the feature for realtime devices. Speaking of which, > > > > > it's now been 53 weeks since the last dump of that, so I'll go do that > > > > > now. :P > > > > > > > > > > FWIW I /also/ run rmapbt everywhere and haven't had any trouble with it > > > > > since adding the per-AG reservations. > > > > > > > > > > > > > My question then is do we want to encourage users to run reflink without > > > > rmapbt because of the latter being experimental, and only so because of > > > > a lack of realtime support? *shrug* Maybe it doesn't really matter. > > > > > > Probably not, I think realtime users are fairly infrequent and > > > especially so on v5. The only reasons I can think of to extend our new > > > features to rt are (a) to avoid screwing over the existing usecases and > > > (b) I guess you could build a hybrid xfs between an SSD and a SMR drive > > > wherein we always CoW from one end of the disk to the other. (That's > > > crazy, but hey.) > > Or use RT files as VM image files with deterministic, low overhead > allocation characteristics. And for that use case, I^Hwe want rmap > and reflink on the realtime device so we can scrub and > snapshot/send/recv those image files...... > > > Yeah, the only use cases I've really heard wrt to realtime any time in > > the recent past has been these kind of future facing experiments as > > opposed to traditional use. Not to say there aren't users out there, but > > Let's not forget that there are tens of millions of existing > XFS RT filesystems out there in smart TVs and DVRs. > Oh, I'm not suggesting there are zero users or that we should never port over the associated features to RT. I'm just suggesting that if we were going to untag reflink as experimental, it may not be worth holding back doing the same for rmapbt if lack of RT support is the only concern. I do question how many of these existing RT users may be on newer features like v5 filesystems (if supported?), or kernels that even support v5 filesystems for that matter. ;) > > I also can't recall seeing any bug reports or anything of that nature > > either any time recently. Maybe it's just perfect code. ;) > > No, just carefully constrained applications and a lot of QA. > I'd hope we have at least the same amount of QA for !RT releases. (I know from experience that the margins on many of those kinds of consumer devices don't exactly facilitate the level of QE we typically see at the enterprise level). If anything, I'd guess that whatever historical sense of reliability this feature offers for things like embedded environments stems more from using older kernels and code that basically never changes than anything else. The one time I tried using RT way back when, it fell on its face fairly hard and fairly quickly. Granted that is anecdotal and I don't recall the specifics anyways. Maybe its been fixed up since then. > > I think Dave has thought about the SMR+RT thing in the past. > > Yes, and I wrote a document describing it for the device vendors > that were screaming for us to do something to be able to go away and > implement..... <crickets> > > Regardless, IIRC, there have been people using RT devices with SMR > drives for some years now, mainly because the RT device could > guarantee zone aligned and sized allocations for sequential write > applications like video recording.... > Yep, I recall that discussion. > > > > If realtime is the only barrier, ISTM we could remove the experimental > > > > rmapbt status and just disable rmapbt+rt for now. Then re-enable > > > > EXPERIMENTAL just for rmapbt+rt when that code goes in (which seems like > > > > the most likely end result to me anyways). > > > > > > It's already disabled. I suppose it's not /that/ big of a deal if old > > > kernels reject certain feature combinations... > > Just add an incompat bit for rtrmapbt and rtreflink when they come > along, and everything will be fine :P > That sounds quite reasonable to me. Brian > > > ...by the way, we can't add a rt device to an already-mounted > > > filesystem, right? > > > > I thought it was a mkfs time thing, but I could be mistaken... > > It is at the moment, but there's nothing that prevents us from > adding it dynamically if we wanted (same as journal resizing). e.g. > via a remount to specify and open the rt device and growfs to resize > it from zero to device size and allocate and initialise the bitmap > and summary inodes. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote: > But reject reflink + DAX file systems for now until the code to > support reflinks on DAX is actually implemented. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > fs/xfs/xfs_super.c | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index 38aaacdbb8b3..92521032468e 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super( > } > if (xfs_sb_version_hasreflink(&mp->m_sb)) > xfs_alert(mp, > - "DAX and reflink have not been tested together!"); > + "DAX and reflink can not be used together!"); > + error = -EINVAL; > + goto out_filestream_unmount; > } > > if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super( > "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!"); > } > > - if (xfs_sb_version_hasreflink(&mp->m_sb)) > - xfs_alert(mp, > - "EXPERIMENTAL reflink feature enabled. Use at your own risk!"); Ok, now that the incore extent map rework has landed upstream, can we please get everyone's QA to stress the reflink/cow code vigorously during this release cycle so that we can re-consider this patch for 4.16? (And if you're particularly mean, force cowextsize = 1fsb to age the filesystem prematurely?) Right now I think the only problem I know about is that log recovery bug where the defer ops of replayed defer ops get logged in the wrong order, but that's it. --D > - > error = xfs_mountfs(mp); > if (error) > goto out_filestream_unmount; > -- > 2.11.0 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 15, 2017 at 3:10 AM, Darrick J. Wong <darrick.wong@oracle.com> wrote: > On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote: >> But reject reflink + DAX file systems for now until the code to >> support reflinks on DAX is actually implemented. >> >> Signed-off-by: Christoph Hellwig <hch@lst.de> >> --- >> fs/xfs/xfs_super.c | 8 +++----- >> 1 file changed, 3 insertions(+), 5 deletions(-) >> >> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c >> index 38aaacdbb8b3..92521032468e 100644 >> --- a/fs/xfs/xfs_super.c >> +++ b/fs/xfs/xfs_super.c >> @@ -1634,7 +1634,9 @@ xfs_fs_fill_super( >> } >> if (xfs_sb_version_hasreflink(&mp->m_sb)) >> xfs_alert(mp, >> - "DAX and reflink have not been tested together!"); >> + "DAX and reflink can not be used together!"); >> + error = -EINVAL; >> + goto out_filestream_unmount; >> } >> >> if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { >> @@ -1648,10 +1650,6 @@ xfs_fs_fill_super( >> "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!"); >> } >> >> - if (xfs_sb_version_hasreflink(&mp->m_sb)) >> - xfs_alert(mp, >> - "EXPERIMENTAL reflink feature enabled. Use at your own risk!"); > > Ok, now that the incore extent map rework has landed upstream, can we > please get everyone's QA to stress the reflink/cow code vigorously > during this release cycle so that we can re-consider this patch for > 4.16? > > (And if you're particularly mean, force cowextsize = 1fsb to age the > filesystem prematurely?) > > Right now I think the only problem I know about is that log recovery bug > where the defer ops of replayed defer ops get logged in the wrong order, > but that's it. > Last time you wrote about this bug you had a "hard question" about transaction reservation for the solution and said your're going to go have a think about it: https://marc.info/?l=linux-xfs&m=150766311924170&w=2 Did you come to any conclusions? That sounds like one of those nasty CoW corner cases, so I'd be happy to know there is at least a well thought design for a solution - if not a fix. Practically, I would love if that bug could be solved soon so that we can all start running generic/503 for more than a few iterations to stress test reflink/cow with power failure. Success on this front could be a big upside before turning off EXPERIMENTAL. Thanks, Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 15, 2017 at 08:14:33AM +0200, Amir Goldstein wrote: > On Wed, Nov 15, 2017 at 3:10 AM, Darrick J. Wong > <darrick.wong@oracle.com> wrote: > > On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote: > >> But reject reflink + DAX file systems for now until the code to > >> support reflinks on DAX is actually implemented. > >> > >> Signed-off-by: Christoph Hellwig <hch@lst.de> > >> --- > >> fs/xfs/xfs_super.c | 8 +++----- > >> 1 file changed, 3 insertions(+), 5 deletions(-) > >> > >> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > >> index 38aaacdbb8b3..92521032468e 100644 > >> --- a/fs/xfs/xfs_super.c > >> +++ b/fs/xfs/xfs_super.c > >> @@ -1634,7 +1634,9 @@ xfs_fs_fill_super( > >> } > >> if (xfs_sb_version_hasreflink(&mp->m_sb)) > >> xfs_alert(mp, > >> - "DAX and reflink have not been tested together!"); > >> + "DAX and reflink can not be used together!"); > >> + error = -EINVAL; > >> + goto out_filestream_unmount; > >> } > >> > >> if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > >> @@ -1648,10 +1650,6 @@ xfs_fs_fill_super( > >> "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!"); > >> } > >> > >> - if (xfs_sb_version_hasreflink(&mp->m_sb)) > >> - xfs_alert(mp, > >> - "EXPERIMENTAL reflink feature enabled. Use at your own risk!"); > > > > Ok, now that the incore extent map rework has landed upstream, can we > > please get everyone's QA to stress the reflink/cow code vigorously > > during this release cycle so that we can re-consider this patch for > > 4.16? > > > > (And if you're particularly mean, force cowextsize = 1fsb to age the > > filesystem prematurely?) > > > > Right now I think the only problem I know about is that log recovery bug > > where the defer ops of replayed defer ops get logged in the wrong order, > > but that's it. > > > > Last time you wrote about this bug you had a "hard question" about transaction > reservation for the solution and said your're going to go have a think about it: > https://marc.info/?l=linux-xfs&m=150766311924170&w=2 > Did you come to any conclusions? > That sounds like one of those nasty CoW corner cases, so I'd be happy to know > there is at least a well thought design for a solution - if not a fix. Sorry I let myself get distracted/stressed with the merge window; hopefully the patch I sent out will address that problem? > Practically, I would love if that bug could be solved soon so that we can all > start running generic/503 for more than a few iterations to stress > test reflink/cow > with power failure. Success on this front could be a big upside before > turning off > EXPERIMENTAL. Indeed! What is the status of those tests, anyway? Are they in xfstests? --D > Thanks, > Amir. > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 22, 2017 at 8:31 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote: > On Wed, Nov 15, 2017 at 08:14:33AM +0200, Amir Goldstein wrote: [...] >> >> Last time you wrote about this bug you had a "hard question" about transaction >> reservation for the solution and said your're going to go have a think about it: >> https://marc.info/?l=linux-xfs&m=150766311924170&w=2 >> Did you come to any conclusions? >> That sounds like one of those nasty CoW corner cases, so I'd be happy to know >> there is at least a well thought design for a solution - if not a fix. > > Sorry I let myself get distracted/stressed with the merge window; > hopefully the patch I sent out will address that problem? > Problem reproduces better on a spinning rust I have at the office, so will give it a spin tomorrow. >> Practically, I would love if that bug could be solved soon so that we can all >> start running generic/503 for more than a few iterations to stress >> test reflink/cow >> with power failure. Success on this front could be a big upside before >> turning off >> EXPERIMENTAL. > > Indeed! What is the status of those tests, anyway? Are they in xfstests? > Yes. 2 fsx stress tests, with and without clones generic/455 generic/457 (replay group) One regression test for ext4 crash bug that was already fixed generic/456 And one regression test for xfs reflink crash bug that you already fixed generic/458 So generic/457 is the one we should be hammering (fsx and reflink) it creates 10 clones and runs fsx workers on them. I imagine it is not long before there are no more shared extents. It's not much, but its a good start. I recon it would be good if you guys added some more variants of this test to try and cover more interesting reflink cases. FYI, Josef also has an fsstress based test, but it is plain shell script and I never got around to adapting it to an xfstest: https://github.com/josefbacik/log-writes Cheers, Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 22, 2017 at 10:40:07PM +0200, Amir Goldstein wrote: > On Wed, Nov 22, 2017 at 8:31 PM, Darrick J. Wong > <darrick.wong@oracle.com> wrote: > > On Wed, Nov 15, 2017 at 08:14:33AM +0200, Amir Goldstein wrote: > [...] > >> > >> Last time you wrote about this bug you had a "hard question" about transaction > >> reservation for the solution and said your're going to go have a think about it: > >> https://marc.info/?l=linux-xfs&m=150766311924170&w=2 > >> Did you come to any conclusions? > >> That sounds like one of those nasty CoW corner cases, so I'd be happy to know > >> there is at least a well thought design for a solution - if not a fix. > > > > Sorry I let myself get distracted/stressed with the merge window; > > hopefully the patch I sent out will address that problem? > > > > Problem reproduces better on a spinning rust I have at the office, so will > give it a spin tomorrow. Ok. > >> Practically, I would love if that bug could be solved soon so that we can all > >> start running generic/503 for more than a few iterations to stress > >> test reflink/cow > >> with power failure. Success on this front could be a big upside before > >> turning off > >> EXPERIMENTAL. > > > > Indeed! What is the status of those tests, anyway? Are they in xfstests? > > > > Yes. Excellent! > 2 fsx stress tests, with and without clones > generic/455 > generic/457 > (replay group) > > One regression test for ext4 crash bug that was already fixed > generic/456 > > And one regression test for xfs reflink crash bug that you already fixed > generic/458 > > So generic/457 is the one we should be hammering (fsx and reflink) Ok, will do. --D > it creates 10 clones and runs fsx workers on them. > I imagine it is not long before there are no more shared extents. > It's not much, but its a good start. > I recon it would be good if you guys added some more variants of this test > to try and cover more interesting reflink cases. > > FYI, Josef also has an fsstress based test, but it is plain shell script > and I never got around to adapting it to an xfstest: > https://github.com/josefbacik/log-writes > > Cheers, > Amir. > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 22, 2017 at 11:00 PM, Darrick J. Wong <darrick.wong@oracle.com> wrote: > On Wed, Nov 22, 2017 at 10:40:07PM +0200, Amir Goldstein wrote: >> On Wed, Nov 22, 2017 at 8:31 PM, Darrick J. Wong >> <darrick.wong@oracle.com> wrote: >> > On Wed, Nov 15, 2017 at 08:14:33AM +0200, Amir Goldstein wrote: >> [...] >> >> >> >> Last time you wrote about this bug you had a "hard question" about transaction >> >> reservation for the solution and said your're going to go have a think about it: >> >> https://marc.info/?l=linux-xfs&m=150766311924170&w=2 >> >> Did you come to any conclusions? >> >> That sounds like one of those nasty CoW corner cases, so I'd be happy to know >> >> there is at least a well thought design for a solution - if not a fix. >> > >> > Sorry I let myself get distracted/stressed with the merge window; >> > hopefully the patch I sent out will address that problem? >> > >> >> Problem reproduces better on a spinning rust I have at the office, so will >> give it a spin tomorrow. > So far all good. Spinning rust never survived 10 rounds of generic/457 and it is past the 100's round with your patch already. Will keep it spinning for the weekend. Thanks, Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Sep 01, 2017 at 08:19:52AM +1000, Dave Chinner wrote: > On Wed, Aug 30, 2017 at 11:40:33PM -0700, Darrick J. Wong wrote: > > On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote: > > > But reject reflink + DAX file systems for now until the code to > > > support reflinks on DAX is actually implemented. > > > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > > --- > > > fs/xfs/xfs_super.c | 8 +++----- > > > 1 file changed, 3 insertions(+), 5 deletions(-) > > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > index 38aaacdbb8b3..92521032468e 100644 > > > --- a/fs/xfs/xfs_super.c > > > +++ b/fs/xfs/xfs_super.c > > > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super( > > > } > > > if (xfs_sb_version_hasreflink(&mp->m_sb)) > > > xfs_alert(mp, > > > - "DAX and reflink have not been tested together!"); > > > + "DAX and reflink can not be used together!"); > > > + error = -EINVAL; > > > + goto out_filestream_unmount; > > > > /This/ hunk seems fine, but... > > > > > } > > > > > > if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > > > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super( > > > "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!"); > > > } > > > > > > - if (xfs_sb_version_hasreflink(&mp->m_sb)) > > > - xfs_alert(mp, > > > - "EXPERIMENTAL reflink feature enabled. Use at your own risk!"); > > > - > > > > ...I frankly would rather wait until we land and stabilize the incore > > extent rework, because I'd rather not have the reflink story be: 1) we > > declare reflink stable in 4.14; 2) immediately people start loading up > > XFSes with sparse VM images that blow up on high order allocations and > > then 3) we get a whole lot of complaints about it. Then in 4.15 we 4) > > land the incore extent map only now we're dragging those same users > > through the mud while /that/ stabilizes, with the result 5) that > > everyone thinks we've gone off the deep end and doesn't trust us > > anymore. > > Which is a lose-lose proposition. IOWs, to be avoided at all costs. > > > I wish that didn't also mean waiting another 6 months for something that > > none of us /developers/ have seen in practice, > > Oh, I've seen it plenty of times in practice. Restoring large > metadumps with (tens of) millions of tiny extents is the common > cause for me. > > > especially since the > > enterprise distros will have plenty of time to backport all this stuff > > before their next big releases if it /does/ become a problem. It's fine > > enough for me (and Christoph's customers, evidently) but is that enough? > > FYI, this has been a problem on RHEL for a long time. It's a long > standing, well known problem and that's one of the reasons why > it's at the top of my list to fix right now. > > > <shrug> Not sure what to do about my own fear factor. :) Anyone want to > > offer further comments? A quick git history trip says sparse inodes > > took about 13 months to go from initial commit to EXPERIMENTAL removed? > > I always aimed to remove the experimental tag 4 kernel releases > after the initial merge, assuming testing and required functionality > was all there and complete early on in the stabilisation process. > Generally I had a set of criteria that needed to be fulfilled before > I'd remove the tag, and ticked them off as fixes were merged. > > e.g. CRCs were introduced in 3.10, the experimental tag was removed > in 3.15, so 5 releases in that case. We didn't get CRCs working > sanely until 3.11 and there was still a lot of userspace stuff that > lagged (and having a fully working userspace was a criteria for > removing experimental from the kernel code), so it took longer to > stabilise than something smaller like sparse inodes. > > Fixing the extent list memory allocation problem was always > something on my list of "need to fix before reflink can be deployed > in anger", so I'd much prefer that we are conservative in removing > the experimental tag. It doesn't hurt anyone if we keep it on until > we get the extent tree rework merged, and it's better to err on the > side of keeping it too long than not long enough. Hmm, it's been 4 months since we last picked up this thread. We've had 7 releases since reflink itself went upstream, which for most features would be long enough to remove the EXPERIMENTAL tag for 4.16. On the other hand, the extent list memory allocation fix series hasn't had much time to stabilize, which could be a reason to leave the tag. Thoughts? --D > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 08, 2018 at 01:43:27PM -0800, Darrick J. Wong wrote: > On Fri, Sep 01, 2017 at 08:19:52AM +1000, Dave Chinner wrote: > > On Wed, Aug 30, 2017 at 11:40:33PM -0700, Darrick J. Wong wrote: > > > On Wed, Aug 30, 2017 at 04:54:00PM +0200, Christoph Hellwig wrote: > > > > But reject reflink + DAX file systems for now until the code to > > > > support reflinks on DAX is actually implemented. > > > > > > > > Signed-off-by: Christoph Hellwig <hch@lst.de> > > > > --- > > > > fs/xfs/xfs_super.c | 8 +++----- > > > > 1 file changed, 3 insertions(+), 5 deletions(-) > > > > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > > > > index 38aaacdbb8b3..92521032468e 100644 > > > > --- a/fs/xfs/xfs_super.c > > > > +++ b/fs/xfs/xfs_super.c > > > > @@ -1634,7 +1634,9 @@ xfs_fs_fill_super( > > > > } > > > > if (xfs_sb_version_hasreflink(&mp->m_sb)) > > > > xfs_alert(mp, > > > > - "DAX and reflink have not been tested together!"); > > > > + "DAX and reflink can not be used together!"); > > > > + error = -EINVAL; > > > > + goto out_filestream_unmount; > > > > > > /This/ hunk seems fine, but... > > > > > > > } > > > > > > > > if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { > > > > @@ -1648,10 +1650,6 @@ xfs_fs_fill_super( > > > > "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!"); > > > > } > > > > > > > > - if (xfs_sb_version_hasreflink(&mp->m_sb)) > > > > - xfs_alert(mp, > > > > - "EXPERIMENTAL reflink feature enabled. Use at your own risk!"); > > > > - > > > > > > ...I frankly would rather wait until we land and stabilize the incore > > > extent rework, because I'd rather not have the reflink story be: 1) we > > > declare reflink stable in 4.14; 2) immediately people start loading up > > > XFSes with sparse VM images that blow up on high order allocations and > > > then 3) we get a whole lot of complaints about it. Then in 4.15 we 4) > > > land the incore extent map only now we're dragging those same users > > > through the mud while /that/ stabilizes, with the result 5) that > > > everyone thinks we've gone off the deep end and doesn't trust us > > > anymore. > > > > Which is a lose-lose proposition. IOWs, to be avoided at all costs. > > > > > I wish that didn't also mean waiting another 6 months for something that > > > none of us /developers/ have seen in practice, > > > > Oh, I've seen it plenty of times in practice. Restoring large > > metadumps with (tens of) millions of tiny extents is the common > > cause for me. > > > > > especially since the > > > enterprise distros will have plenty of time to backport all this stuff > > > before their next big releases if it /does/ become a problem. It's fine > > > enough for me (and Christoph's customers, evidently) but is that enough? > > > > FYI, this has been a problem on RHEL for a long time. It's a long > > standing, well known problem and that's one of the reasons why > > it's at the top of my list to fix right now. > > > > > <shrug> Not sure what to do about my own fear factor. :) Anyone want to > > > offer further comments? A quick git history trip says sparse inodes > > > took about 13 months to go from initial commit to EXPERIMENTAL removed? > > > > I always aimed to remove the experimental tag 4 kernel releases > > after the initial merge, assuming testing and required functionality > > was all there and complete early on in the stabilisation process. > > Generally I had a set of criteria that needed to be fulfilled before > > I'd remove the tag, and ticked them off as fixes were merged. > > > > e.g. CRCs were introduced in 3.10, the experimental tag was removed > > in 3.15, so 5 releases in that case. We didn't get CRCs working > > sanely until 3.11 and there was still a lot of userspace stuff that > > lagged (and having a fully working userspace was a criteria for > > removing experimental from the kernel code), so it took longer to > > stabilise than something smaller like sparse inodes. > > > > Fixing the extent list memory allocation problem was always > > something on my list of "need to fix before reflink can be deployed > > in anger", so I'd much prefer that we are conservative in removing > > the experimental tag. It doesn't hurt anyone if we keep it on until > > we get the extent tree rework merged, and it's better to err on the > > side of keeping it too long than not long enough. > > Hmm, it's been 4 months since we last picked up this thread. We've > had 7 releases since reflink itself went upstream, which for most > features would be long enough to remove the EXPERIMENTAL tag for 4.16. > > On the other hand, the extent list memory allocation fix series hasn't > had much time to stabilize, which could be a reason to leave the tag. > > Thoughts? I was thinking about this last night, actually. The extent rework seems stable enough - I've been hammering large extent count directories for the past week and I haven't come across any problems so I think we're good to remove the EXPERIMENTAL tags now... Cheers, Dave.
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 38aaacdbb8b3..92521032468e 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1634,7 +1634,9 @@ xfs_fs_fill_super( } if (xfs_sb_version_hasreflink(&mp->m_sb)) xfs_alert(mp, - "DAX and reflink have not been tested together!"); + "DAX and reflink can not be used together!"); + error = -EINVAL; + goto out_filestream_unmount; } if (xfs_sb_version_hasrmapbt(&mp->m_sb)) { @@ -1648,10 +1650,6 @@ xfs_fs_fill_super( "EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!"); } - if (xfs_sb_version_hasreflink(&mp->m_sb)) - xfs_alert(mp, - "EXPERIMENTAL reflink feature enabled. Use at your own risk!"); - error = xfs_mountfs(mp); if (error) goto out_filestream_unmount;
But reject reflink + DAX file systems for now until the code to support reflinks on DAX is actually implemented. Signed-off-by: Christoph Hellwig <hch@lst.de> --- fs/xfs/xfs_super.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)