Message ID | 20181129060110.159878-25-sashal@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | None |
On Thu, Nov 29, 2018 at 01:00:59AM -0500, Sasha Levin wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> [ Upstream commit b450672fb66b4a991a5b55ee24209ac7ae7690ce ]
>
> If we are doing sub-block dio that extends EOF, we need to zero
> the unused tail of the block to initialise the data in it. If we
> do not zero the tail of the block, then an immediate mmap read of
> the EOF block will expose stale data beyond EOF to userspace. Found
> with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations.
>
> Fix this by detecting if the end of the DIO write is beyond EOF
> and zeroing the tail if necessary.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  fs/iomap.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index 8f7673a69273..407efdae3978 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -940,7 +940,14 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
>  		dio->submit.cookie = submit_bio(bio);
>  	} while (nr_pages);
>
> -	if (need_zeroout) {
> +	/*
> +	 * We need to zeroout the tail of a sub-block write if the extent type
> +	 * requires zeroing or the write extends beyond EOF. If we don't zero
> +	 * the block tail in the latter case, we can expose stale data via mmap
> +	 * reads of the EOF block.
> +	 */
> +	if (need_zeroout ||
> +	    ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) {
>  		/* zero out from the end of the write to the end of the block */
>  		pad = pos & (fs_block_size - 1);
>  		if (pad)

How do you propose to validate that this doesn't introduce new data corruptions in isolation? I've spent the last 4 weeks of my life and about 15 billion fsx ops chasing and validating the data corruption fixes we've pushed recently into the 4.19 and 4.20 codebase.

Cherry picking only one of the 50-odd patches we've committed into late 4.19 and 4.20 kernels to fix the problems we've found really seems like asking for trouble. If you're going to back port random data corruption fixes, then you need to spend a *lot* of time validating that it doesn't make things worse than they already are...

Cheers,

Dave.
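To make the failure mode described in the commit message concrete, here is a minimal reproducer sketch: a sub-block O_DIRECT write that extends EOF, followed by an mmap read of the EOF block. The mount point, file name, block size and buffer sizes are illustrative assumptions (the bug was actually found with fsx, not a hand-written program), so treat this as a sketch of the access pattern rather than the test that was used.

```c
/* Sketch of the access pattern described in the commit message: a
 * sub-block O_DIRECT write that extends EOF, then an mmap read of the
 * EOF block.  Assumes (hypothetically) a 1k-block filesystem mounted
 * at /mnt; the bug was actually found with fsx. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/testfile", O_RDWR | O_CREAT | O_TRUNC | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* 1536-byte DIO write at offset 0: on a 1k-block filesystem this
	 * ends halfway into the second block and extends EOF. */
	void *buf;
	if (posix_memalign(&buf, 4096, 1536))
		return 1;
	memset(buf, 0xab, 1536);
	if (pwrite(fd, buf, 1536, 0) != 1536) {
		perror("pwrite");
		return 1;
	}

	/* mmap read of the EOF block: offsets 1536..2047 are beyond EOF but
	 * inside the written block; without the tail zeroing they could
	 * show stale block contents instead of zeroes. */
	unsigned char *map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	for (int i = 1536; i < 2048; i++)
		if (map[i] != 0)
			printf("stale byte 0x%02x at offset %d\n", map[i], i);

	munmap(map, 4096);
	close(fd);
	return 0;
}
```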
On Thu, Nov 29, 2018 at 11:14:59PM +1100, Dave Chinner wrote:
>
> Cherry picking only one of the 50-odd patches we've committed into
> late 4.19 and 4.20 kernels to fix the problems we've found really
> seems like asking for trouble. If you're going to back port random
> data corruption fixes, then you need to spend a *lot* of time
> validating that it doesn't make things worse than they already
> are...

Any reason why we can't take the 50-odd patches in their entirety? It sounds like 4.19 isn't fully fixed, but 4.20-rc1 is? If so, what do you recommend we do to make 4.19 work properly?

thanks,

greg k-h
On Thu, Nov 29, 2018 at 01:47:56PM +0100, Greg KH wrote:
> On Thu, Nov 29, 2018 at 11:14:59PM +1100, Dave Chinner wrote:
> >
> > Cherry picking only one of the 50-odd patches we've committed into
> > late 4.19 and 4.20 kernels to fix the problems we've found really
> > seems like asking for trouble. If you're going to back port random
> > data corruption fixes, then you need to spend a *lot* of time
> > validating that it doesn't make things worse than they already
> > are...
>
> Any reason why we can't take the 50-odd patches in their entirety? It
> sounds like 4.19 isn't fully fixed, but 4.20-rc1 is? If so, what do you
> recommend we do to make 4.19 working properly?

You could pull all the fixes, but then you have a QA problem. Basically, we have multiple badly broken syscalls (FICLONERANGE, FIDEDUPERANGE and copy_file_range), and even 4.20-rc4 isn't fully fixed.

There were ~5 critical dedupe/clone data corruption fixes for XFS that went into 4.19-rc8.

There were ~30 patches that went into 4.20-rc1 that fixed the FICLONERANGE/FIDEDUPERANGE ioctls. That completely reworks the entire VFS infrastructure for those calls, and touches several filesystems as well. It fixes problems with setuid files, swap files, modifying immutable files, failure to enforce rlimit and max file size constraints, behaviour that didn't match man page descriptions, etc.

There were another ~10 patches that went into 4.20-rc4 that fixed yet more data corruption and API problems that we found when we enhanced fsx to use the above syscalls.

And I have another ~10 patches that I'm working on right now to fix the copy_file_range() implementation - it has all the same problems I listed above for FICLONERANGE/FIDEDUPERANGE and some other unique ones. I'm currently writing error condition tests for fstests so that we at least have some coverage of the conditions copy_file_range() is supposed to catch and fail. This might all make a late 4.20-rcX, but it's looking more like 4.21 at this point.

As to testing this stuff, I've spent several weeks now on this and so has Darrick. Between us we've done a huge amount of QA needed to verify that the problems are fixed and it is still ongoing. From #xfs a couple of days ago:

[28/11/18 16:59] * djwong hits 6 billion fsxops...
[28/11/18 17:07] <dchinner_> djwong: I've got about 3.75 billion ops running on a machine here....
[28/11/18 17:20] <djwong> note that's 1 billion fsxops x 6 machines
[28/11/18 17:21] <djwong> [xfsv4, xfsv5, xfsv5 w/ 1k blocks] * [directio fsx, buffered fsx]
[28/11/18 17:21] <dchinner_> Oh, I've got 3.75B x 4 instances on one filesystem :P
[28/11/18 17:22] <dchinner_> [direct io, buffered] x [small op lengths, large op lengths]

And this morning:

[30/11/18 08:53] <djwong> 7 billion fsxops...

I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops aggregate) to focus on testing the copy_file_range() changes, but Darrick's tests are still ongoing and have passed 40 billion ops in aggregate over the past few days.

The reason we are running these so long is that we've seen fsx data corruption failures after 12+ hours of runtime and hundreds of millions of ops. Hence the testing for backported fixes will need to replicate these test runs across multiple configurations for multiple days before we have any confidence that we've actually fixed the data corruptions and not introduced any new ones.

If you pull only a small subset of the fixes, fsx will still fail and we have no real way of actually verifying that there have been no regressions introduced by the backport. IOWs, there's a /massive/ amount of QA needed for ensuring that these backports work correctly.

Right now the XFS developers don't have the time or resources available to validate stable backports are correct and regression free because we are focussed on ensuring the upstream fixes we've already made (and are still writing) are solid and reliable.

Cheers,

Dave.
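For readers who have not used the interfaces Dave mentions: FICLONERANGE and FIDEDUPERANGE are ioctls for sharing and deduplicating file ranges, and copy_file_range(2) asks the kernel to copy a range between two file descriptors. Below is a minimal usage sketch of copy_file_range(); the paths and copy length are hypothetical, and it only shows the call whose semantics are being reworked, not any of the fstests coverage discussed above.

```c
/* Hedged sketch of a copy_file_range(2) call; paths and length are
 * made up for illustration.  Needs _GNU_SOURCE and glibc >= 2.27. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int in = open("/mnt/src", O_RDONLY);
	int out = open("/mnt/dst", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0) {
		perror("open");
		return 1;
	}

	/* Ask the kernel to copy 1 MiB from src to dst using the files'
	 * current offsets.  The call may copy less than requested, so a
	 * real caller loops until the whole range is done. */
	ssize_t ret = copy_file_range(in, NULL, out, NULL, 1 << 20, 0);
	if (ret < 0)
		perror("copy_file_range");
	else
		printf("copied %zd bytes\n", ret);

	close(in);
	close(out);
	return 0;
}
```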
On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote: > On Thu, Nov 29, 2018 at 01:47:56PM +0100, Greg KH wrote: > > On Thu, Nov 29, 2018 at 11:14:59PM +1100, Dave Chinner wrote: > > > > > > Cherry picking only one of the 50-odd patches we've committed into > > > late 4.19 and 4.20 kernels to fix the problems we've found really > > > seems like asking for trouble. If you're going to back port random > > > data corruption fixes, then you need to spend a *lot* of time > > > validating that it doesn't make things worse than they already > > > are... > > > > Any reason why we can't take the 50-odd patches in their entirety? It > > sounds like 4.19 isn't fully fixed, but 4.20-rc1 is? If so, what do you > > recommend we do to make 4.19 working properly? > > You coul dpull all the fixes, but then you have a QA problem. > Basically, we have multiple badly broken syscalls (FICLONERANGE, > FIDEDUPERANGE and copy_file_range), and even 4.20-rc4 isn't fully > fixed. > > There were ~5 critical dedupe/clone data corruption fixes for XFS > went into 4.19-rc8. Have any of those been tagged for stable? > There were ~30 patches that went into 4.20-rc1 that fixed the > FICLONERANGE/FIDEDUPERANGE ioctls. That completely reworks the > entire VFS infrastructure for those calls, and touches several > filesystems as well. It fixes problems with setuid files, swap > files, modifying immutable files, failure to enforce rlimit and > max file size constraints, behaviour that didn't match man page > descriptions, etc. > > There were another ~10 patches that went into 4.20-rc4 that fixed > yet more data corruption and API problems that we found when we > enhanced fsx to use the above syscalls. > > And I have another ~10 patches that I'm working on right now to fix > the copy_file_range() implementation - it has all the same problems > I listed above for FICLONERANGE/FIDEDUPERANGE and some other unique > ones. I'm currently writing error condition tests for fstests so > that we at least have some coverage of the conditions > copy_file_range() is supposed to catch and fail. This might all make > a late 4.20-rcX, but it's looking more like 4.21 at this point. > > As to testing this stuff, I've spend several weeks now on this and > so has Darrick. Between us we've done a huge amount of QA needed to > verify that the problems are fixed and it is still ongoing. From > #xfs a couple of days ago: > > [28/11/18 16:59] * djwong hits 6 billion fsxops... > [28/11/18 17:07] <dchinner_> djwong: I've got about 3.75 billion ops running on a machine here.... > [28/11/18 17:20] <djwong> note that's 1 billion fsxops x 6 machines > [28/11/18 17:21] <djwong> [xfsv4, xfsv5, xfsv5 w/ 1k blocks] * [directio fsx, buffered fsx] > [28/11/18 17:21] <dchinner_> Oh, I've got 3.75B x 4 instances on one filesystem :P > [28/11/18 17:22] <dchinner_> [direct io, buffered] x [small op lengths, large op lengths] > > And this morning: > > [30/11/18 08:53] <djwong> 7 billion fsxops... > > I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops > aggregate) to focus on testing the copy_file_range() changes, but > Darrick's tests are still ongoing and have passed 40 billion ops in > aggregate over the past few days. > > The reason we are running these so long is that we've seen fsx data > corruption failures after 12+ hours of runtime and hundreds of > millions of ops. 
Hence the testing for backported fixes will need to > replicate these test runs across multiple configurations for > multiple days before we have any confidence that we've actually > fixed the data corruptions and not introduced any new ones. > > If you pull only a small subset of the fixes, the fsx will still > fail and we have no real way of actually verifying that there have > been no regression introduced by the backport. IOWs, there's a > /massive/ amount of QA needed for ensuring that these backports work > correctly. > > Right now the XFS developers don't have the time or resources > available to validate stable backports are correct and regression > fre because we are focussed on ensuring the upstream fixes we've > already made (and are still writing) are solid and reliable. Ok, that's fine, so users of XFS should wait until the 4.20 release before relying on it? :) I understand your reluctance to want to backport anything, but it really feels like you are not even allowing for fixes that are "obviously right" to be backported either, even after they pass testing. Which isn't ok for your users. thanks, greg k-h
On Fri, Nov 30, 2018 at 09:22:03AM +0100, Greg KH wrote: >On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote: >> I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops >> aggregate) to focus on testing the copy_file_range() changes, but >> Darrick's tests are still ongoing and have passed 40 billion ops in >> aggregate over the past few days. >> >> The reason we are running these so long is that we've seen fsx data >> corruption failures after 12+ hours of runtime and hundreds of >> millions of ops. Hence the testing for backported fixes will need to >> replicate these test runs across multiple configurations for >> multiple days before we have any confidence that we've actually >> fixed the data corruptions and not introduced any new ones. >> >> If you pull only a small subset of the fixes, the fsx will still >> fail and we have no real way of actually verifying that there have >> been no regression introduced by the backport. IOWs, there's a >> /massive/ amount of QA needed for ensuring that these backports work >> correctly. >> >> Right now the XFS developers don't have the time or resources >> available to validate stable backports are correct and regression >> fre because we are focussed on ensuring the upstream fixes we've >> already made (and are still writing) are solid and reliable. > >Ok, that's fine, so users of XFS should wait until the 4.20 release >before relying on it? :) It's getting to the point that with the amount of known issues with XFS on LTS kernels it makes sense to mark it as CONFIG_BROKEN. >I understand your reluctance to want to backport anything, but it really >feels like you are not even allowing for fixes that are "obviously >right" to be backported either, even after they pass testing. Which >isn't ok for your users. Do the XFS maintainers expect users to always use the latest upstream kernel? -- Thanks, Sasha
On Fri, Nov 30, 2018 at 05:14:41AM -0500, Sasha Levin wrote: > On Fri, Nov 30, 2018 at 09:22:03AM +0100, Greg KH wrote: > > On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote: > > > I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops > > > aggregate) to focus on testing the copy_file_range() changes, but > > > Darrick's tests are still ongoing and have passed 40 billion ops in > > > aggregate over the past few days. > > > > > > The reason we are running these so long is that we've seen fsx data > > > corruption failures after 12+ hours of runtime and hundreds of > > > millions of ops. Hence the testing for backported fixes will need to > > > replicate these test runs across multiple configurations for > > > multiple days before we have any confidence that we've actually > > > fixed the data corruptions and not introduced any new ones. > > > > > > If you pull only a small subset of the fixes, the fsx will still > > > fail and we have no real way of actually verifying that there have > > > been no regression introduced by the backport. IOWs, there's a > > > /massive/ amount of QA needed for ensuring that these backports work > > > correctly. > > > > > > Right now the XFS developers don't have the time or resources > > > available to validate stable backports are correct and regression > > > fre because we are focussed on ensuring the upstream fixes we've > > > already made (and are still writing) are solid and reliable. I feel the need to contribute my own interpretation of what's been going on the last four months: What you're seeing is not the usual level of reluctance to backport fixes to LTS kernels, it's our own frustrations at the kernel community's systemic inability to QA new fs features properly. Four months ago (prior to 4.19) Zorro started digging into periodic test failures with shared/010, which resulted in some fixes to the btrfs dedupe and clone range ioctl implementations. He then saw the same failures on XFS. Dave and I stared at the btrfs patches for a while, then started looking at the xfs counterparts, and realized that nobody had ever added those commands to the fstests stressor programs, nor had anyone ever encoded into a test the side effects of a file remap (mtime update, removal of suid). Nor were there any tests to ensure that these ioctls couldn't be abused to violate system security and stability constraints. That's why I refactored a whole ton of vfs file remap code for 4.20, and (with the help of Dave and Brian and others) worked on fixing all the problems where fsx and fsstress demonstrate file corruption problems. Then we started asking the same questions of the copy_file_range system call, and discovered that yes, we have all of the same problems. We also discovered several failure cases that aren't mentioned in any documentation, which has complicated the generation of automatable tests. Worse yet, the stressor programs fell over even sooner with the fallback splice implementation. TLDR: New features show up in the vfs without a lot of design documentation, incomplete userspace interface manuals, and not much beyond trivial testing. So the problem I'm facing here is that the XFS team are singlehandedly trying to pay off years of accumulated technical debt in the vfs. We definitely had a role in adding to that debt, so we're fixing it. Dave is now refactoring the copy_file_range backend to implement all the necessary security and stability checks, and I'm still QAing all the stuff we've added to 4.20. 
We're not finished, where "finished" means that we can get /one/ kernel tree to go ~100 billion fsxops without burping up failures, and we've written fstests to check that said kernel can handle correctly all the weird side cases. Until all those fstests go upstream, I don't want to spread out into backporting and testing LTS kernels, even with test automation. By the time we're done with all our upstream work you ought to be able to autosel backport the whole mess into the LTS kernels /and/ fstests will be able to tell you if the autosel has succeeded without causing any obvious regressions. > > Ok, that's fine, so users of XFS should wait until the 4.20 release > > before relying on it? :) At the rate we're going, we're not going to finish until 4.21, but yes, let's wait until 4.20 is closer to release to start in on porting all of its fixes to 4.14/4.19. > It's getting to the point that with the amount of known issues with XFS > on LTS kernels it makes sense to mark it as CONFIG_BROKEN. These aren't all issues specific to XFS; some plague every fs in subtle weird ways that only show up with extreme testing. We need the extreme testing to flush out as many bugs as we can before enabling the feature by default. XFS reflink is not enabled by default and due to all this is not likely to get it any time soon. (That copy_file_range syscall should have been rigorously tested before it was turned on in the kernel...) > > I understand your reluctance to want to backport anything, but it really > > feels like you are not even allowing for fixes that are "obviously > > right" to be backported either, even after they pass testing. Which > > isn't ok for your users. > > Do the XFS maintainers expect users to always use the latest upstream > kernel? For features that are EXPERIMENTAL or aren't enabled by default, yes, they should be. --D > > -- > Thanks, > Sasha
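One way to picture the untested remap side effects Darrick mentions (mtime updates, removal of setuid bits) is a small check like the sketch below: clone a range into a setuid file with FICLONERANGE and verify that the suid bit is cleared, just as an ordinary write would clear it. The paths, the 64 KiB length and the check itself are reconstructed here for illustration, not copied from an actual fstests case.

```c
/* Illustration of one remap side effect mentioned above: cloning into a
 * setuid file should drop the suid bit, like an ordinary write does.
 * Paths and the 64 KiB length are hypothetical; offsets and lengths
 * passed to FICLONERANGE must be filesystem-block aligned. */
#include <fcntl.h>
#include <linux/fs.h>          /* FICLONERANGE, struct file_clone_range */
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	int src = open("/mnt/src", O_RDONLY);
	int dst = open("/mnt/suid_target", O_WRONLY);
	if (src < 0 || dst < 0) {
		perror("open");
		return 1;
	}

	struct file_clone_range fcr = {
		.src_fd      = src,
		.src_offset  = 0,
		.src_length  = 64 * 1024,
		.dest_offset = 0,
	};
	if (ioctl(dst, FICLONERANGE, &fcr) < 0) {
		perror("FICLONERANGE");
		return 1;
	}

	/* After the remap the setuid bit on the destination should be gone. */
	struct stat st;
	if (fstat(dst, &st) == 0)
		printf("suid bit %s after clone\n",
		       (st.st_mode & S_ISUID) ? "still set" : "cleared");

	close(src);
	close(dst);
	return 0;
}
```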
On Fri, Nov 30, 2018 at 09:22:03AM +0100, Greg KH wrote:
> On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote:
> > On Thu, Nov 29, 2018 at 01:47:56PM +0100, Greg KH wrote:
> > > On Thu, Nov 29, 2018 at 11:14:59PM +1100, Dave Chinner wrote:
> > > >
> > > > Cherry picking only one of the 50-odd patches we've committed into
> > > > late 4.19 and 4.20 kernels to fix the problems we've found really
> > > > seems like asking for trouble. If you're going to back port random
> > > > data corruption fixes, then you need to spend a *lot* of time
> > > > validating that it doesn't make things worse than they already
> > > > are...
> > >
> > > Any reason why we can't take the 50-odd patches in their entirety? It
> > > sounds like 4.19 isn't fully fixed, but 4.20-rc1 is? If so, what do you
> > > recommend we do to make 4.19 working properly?
> >
> > You coul dpull all the fixes, but then you have a QA problem.
> > Basically, we have multiple badly broken syscalls (FICLONERANGE,
> > FIDEDUPERANGE and copy_file_range), and even 4.20-rc4 isn't fully
> > fixed.
> >
> > There were ~5 critical dedupe/clone data corruption fixes for XFS
> > went into 4.19-rc8.
>
> Have any of those been tagged for stable?

None, because I have no confidence that the stable process will do the necessary QA to validate that such a significant backport is regression and data corruption free. The backport needs to be done as a complete series when we've finished the upstream work, because we can't test isolated patches adequately - fsx will fall over due to all the unfixed problems and not exercise the fixes that were backported.

Further, we just had a regression reported in one of the commits that the autosel bot has selected for automatic backports. It has been uncovered by overlay, which appears to do some unique things with the piece of crap that is do_splice_direct(). And Darrick just commented on #xfs that he's just noticed more bugs with FICLONERANGE and overlay.

IOWs, we're still finding broken stuff in this code and we are fixing it as fast as we can - we're still putting out fires. We most certainly don't need the added pressure of having you guys create more spot fires by breaking stable kernels with largely untested partial backports and having users exposed to whacky new data corruption issues.

So, no, it isn't tagged for stable kernels because "commit into mainline" != "this should be backported immediately". Backports of these fixes are going to be done largely as a function of time and resources, of which we have zero available right now. Doing backports right now is premature and ill-advised because we haven't finished finding and fixing all the bugs and regressions in this code.

> > Right now the XFS developers don't have the time or resources
> > available to validate stable backports are correct and regression
> > fre because we are focussed on ensuring the upstream fixes we've
> > already made (and are still writing) are solid and reliable.
>
> Ok, that's fine, so users of XFS should wait until the 4.20 release
> before relying on it? :)

Ok, Greg, that's *out of line*. I should throw the CoC at you, because I find that comment offensive, condescending, belittling, denigrating and insulting. Your smug and superior "I know what is right for you" attitude is completely inappropriate, and a little smiley face does not make it acceptable.

If you think your comment is funny, you've badly misjudged how much effort I've put into this (100-hour weeks for over a month now), how close I'm flying to burn out (again!), and how pissed off I am about this whole scenario.

We ended up here because we *trusted* that other people had implemented and tested their APIs and code properly before it got merged. We've been severely burnt, and we've been left to clean up the mess made by other people by ourselves. Instead of thanks, what we get is a "we know better" attitude and jokes implying our work is crap and we don't care about our users. That's just plain *insulting*.

If anyone is looking for a demonstration of everything that is wrong with the Linux kernel development culture, then they don't need to look any further.

> I understand your reluctance to want to backport anything, but it really
> feels like you are not even allowing for fixes that are "obviously
> right" to be backported either, even after they pass testing. Which
> isn't ok for your users.

It's worse for our users if we introduce regressions into stable kernels, which is exactly what this "obviously right" auto-backport would have done.

-Dave.
On Fri, Nov 30, 2018 at 05:14:41AM -0500, Sasha Levin wrote: > On Fri, Nov 30, 2018 at 09:22:03AM +0100, Greg KH wrote: > >On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote: > >>I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops > >>aggregate) to focus on testing the copy_file_range() changes, but > >>Darrick's tests are still ongoing and have passed 40 billion ops in > >>aggregate over the past few days. > >> > >>The reason we are running these so long is that we've seen fsx data > >>corruption failures after 12+ hours of runtime and hundreds of > >>millions of ops. Hence the testing for backported fixes will need to > >>replicate these test runs across multiple configurations for > >>multiple days before we have any confidence that we've actually > >>fixed the data corruptions and not introduced any new ones. > >> > >>If you pull only a small subset of the fixes, the fsx will still > >>fail and we have no real way of actually verifying that there have > >>been no regression introduced by the backport. IOWs, there's a > >>/massive/ amount of QA needed for ensuring that these backports work > >>correctly. > >> > >>Right now the XFS developers don't have the time or resources > >>available to validate stable backports are correct and regression > >>fre because we are focussed on ensuring the upstream fixes we've > >>already made (and are still writing) are solid and reliable. > > > >Ok, that's fine, so users of XFS should wait until the 4.20 release > >before relying on it? :) > > It's getting to the point that with the amount of known issues with XFS > on LTS kernels it makes sense to mark it as CONFIG_BROKEN. Really? Where are the bug reports? Cheers, Dave.
On Sat, Dec 01, 2018 at 08:50:05AM +1100, Dave Chinner wrote: >On Fri, Nov 30, 2018 at 05:14:41AM -0500, Sasha Levin wrote: >> On Fri, Nov 30, 2018 at 09:22:03AM +0100, Greg KH wrote: >> >On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote: >> >>I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops >> >>aggregate) to focus on testing the copy_file_range() changes, but >> >>Darrick's tests are still ongoing and have passed 40 billion ops in >> >>aggregate over the past few days. >> >> >> >>The reason we are running these so long is that we've seen fsx data >> >>corruption failures after 12+ hours of runtime and hundreds of >> >>millions of ops. Hence the testing for backported fixes will need to >> >>replicate these test runs across multiple configurations for >> >>multiple days before we have any confidence that we've actually >> >>fixed the data corruptions and not introduced any new ones. >> >> >> >>If you pull only a small subset of the fixes, the fsx will still >> >>fail and we have no real way of actually verifying that there have >> >>been no regression introduced by the backport. IOWs, there's a >> >>/massive/ amount of QA needed for ensuring that these backports work >> >>correctly. >> >> >> >>Right now the XFS developers don't have the time or resources >> >>available to validate stable backports are correct and regression >> >>fre because we are focussed on ensuring the upstream fixes we've >> >>already made (and are still writing) are solid and reliable. >> > >> >Ok, that's fine, so users of XFS should wait until the 4.20 release >> >before relying on it? :) >> >> It's getting to the point that with the amount of known issues with XFS >> on LTS kernels it makes sense to mark it as CONFIG_BROKEN. > >Really? Where are the bug reports? In 'git log'! You report these every time you fix something in upstream xfs but don't backport it to stable trees: $ git log --oneline v4.18-rc1..v4.18 fs/xfs d4a34e165557 xfs: properly handle free inodes in extent hint validators 9991274fddb9 xfs: Initialize variables in xfs_alloc_get_rec before using them d8cb5e423789 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation e53c4b598372 xfs: ensure post-EOF zeroing happens after zeroing part of a file a3a374bf1889 xfs: fix off-by-one error in xfs_rtalloc_query_range 232d0a24b0fc xfs: fix uninitialized field in rtbitmap fsmap backend 5bd88d153998 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write f62cb48e4319 xfs: don't allow insert-range to shift extents past the maximum offset aafe12cee0b1 xfs: don't trip over negative free space in xfs_reserve_blocks 10ee25268e1f xfs: allow empty transactions while frozen e53946dbd31a xfs: xfs_iflush_abort() can be called twice on cluster writeback failure 23fcb3340d03 xfs: More robust inode extent count validation e2ac836307e3 xfs: simplify xfs_bmap_punch_delalloc_range Since I'm assuming that at least some of them are based on actual issues users hit, and some of those apply to stable kernels, why would users want to use an XFS version which is knowingly buggy? -- Thanks, Sasha
> >> It's getting to the point that with the amount of known issues with XFS > >> on LTS kernels it makes sense to mark it as CONFIG_BROKEN. > > > >Really? Where are the bug reports? > > In 'git log'! You report these every time you fix something in upstream > xfs but don't backport it to stable trees: > > $ git log --oneline v4.18-rc1..v4.18 fs/xfs > d4a34e165557 xfs: properly handle free inodes in extent hint validators > 9991274fddb9 xfs: Initialize variables in xfs_alloc_get_rec before using them > d8cb5e423789 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation > e53c4b598372 xfs: ensure post-EOF zeroing happens after zeroing part of a file > a3a374bf1889 xfs: fix off-by-one error in xfs_rtalloc_query_range > 232d0a24b0fc xfs: fix uninitialized field in rtbitmap fsmap backend > 5bd88d153998 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write > f62cb48e4319 xfs: don't allow insert-range to shift extents past the maximum offset > aafe12cee0b1 xfs: don't trip over negative free space in xfs_reserve_blocks > 10ee25268e1f xfs: allow empty transactions while frozen > e53946dbd31a xfs: xfs_iflush_abort() can be called twice on cluster writeback failure > 23fcb3340d03 xfs: More robust inode extent count validation > e2ac836307e3 xfs: simplify xfs_bmap_punch_delalloc_range > > Since I'm assuming that at least some of them are based on actual issues > users hit, and some of those apply to stable kernels, why would users > want to use an XFS version which is knowingly buggy? > Sasha, There is one more point to consider. Until v4.16, reflink and rmapbt features were experimental: 76883f7988e6 xfs: remove experimental tag for reverse mapping 1e369b0e199b xfs: remove experimental tag for reflinks And MANY of the bug fixes flowing in through XFS tree to master are related to those new XFS features and also to vfs functionality that depends on them (e.g. clone/dedupe), so there MAY be no bug reports at all for XFS in stable trees. IMO users should NOT be expecting XFS to be stable with those features enabled (they are still disabled by default) when running on stable kernels below v4.16. Allow me to act as a self-appointed mediator here and say: There is obviously some bad blood between xfs developers and stable tree maintainers. The conflicts are caused by long standing frustration on both sides. We would all be better off with looking forward on how to improve the situation instead dwelling on past mistakes. This issue was on the agenda at the XFS team meeting on last LSF/MM. The path towards compliance has been laid out by xfs maintainers. Luis, Sasha and myself have been working to improve the filesystem test coverage for stable tree candidate patches. We have still some way to go. The stable candidate patches that triggered the recent flames was outside of the fs/xfs subsystem, which AUTOSEL already know to stay away from, so nobody had any intention to stir things up. At the end of the day, most xfs developers work for companies that ship enterprise distros and need to maintain stable trees, so I would hope that it is in the best interest of everyone involved to cooperate on the goal of better stable-xfs ecosystem. On my part, I would be happy if AUTOSEL could point me at candidate patch *series* for review instead of single patches. 
For that matter, it sure wouldn't hurt if an xfs developer sending out a patch series would cc:stable on the cover letter, and if a developer would be kind enough to add some backporting hints to the cover letter text, that would be very helpful indeed.

Thanks,

Amir.
On Sat, Dec 01, 2018 at 11:09:05AM +0200, Amir Goldstein wrote: >> >> It's getting to the point that with the amount of known issues with XFS >> >> on LTS kernels it makes sense to mark it as CONFIG_BROKEN. >> > >> >Really? Where are the bug reports? >> >> In 'git log'! You report these every time you fix something in upstream >> xfs but don't backport it to stable trees: >> >> $ git log --oneline v4.18-rc1..v4.18 fs/xfs >> d4a34e165557 xfs: properly handle free inodes in extent hint validators >> 9991274fddb9 xfs: Initialize variables in xfs_alloc_get_rec before using them >> d8cb5e423789 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation >> e53c4b598372 xfs: ensure post-EOF zeroing happens after zeroing part of a file >> a3a374bf1889 xfs: fix off-by-one error in xfs_rtalloc_query_range >> 232d0a24b0fc xfs: fix uninitialized field in rtbitmap fsmap backend >> 5bd88d153998 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write >> f62cb48e4319 xfs: don't allow insert-range to shift extents past the maximum offset >> aafe12cee0b1 xfs: don't trip over negative free space in xfs_reserve_blocks >> 10ee25268e1f xfs: allow empty transactions while frozen >> e53946dbd31a xfs: xfs_iflush_abort() can be called twice on cluster writeback failure >> 23fcb3340d03 xfs: More robust inode extent count validation >> e2ac836307e3 xfs: simplify xfs_bmap_punch_delalloc_range >> >> Since I'm assuming that at least some of them are based on actual issues >> users hit, and some of those apply to stable kernels, why would users >> want to use an XFS version which is knowingly buggy? >> > >Sasha, > >There is one more point to consider. >Until v4.16, reflink and rmapbt features were experimental: >76883f7988e6 xfs: remove experimental tag for reverse mapping >1e369b0e199b xfs: remove experimental tag for reflinks > >And MANY of the bug fixes flowing in through XFS tree to master >are related to those new XFS features and also to vfs functionality >that depends on them (e.g. clone/dedupe), so there MAY be no >bug reports at all for XFS in stable trees. > >IMO users should NOT be expecting XFS to be stable with those >features enabled (they are still disabled by default) >when running on stable kernels below v4.16. > >Allow me to act as a self-appointed mediator here and say: >There is obviously some bad blood between xfs developers and stable >tree maintainers. >The conflicts are caused by long standing frustration on both sides. >We would all be better off with looking forward on how to improve the >situation instead dwelling on past mistakes. >This issue was on the agenda at the XFS team meeting on last LSF/MM. >The path towards compliance has been laid out by xfs maintainers. >Luis, Sasha and myself have been working to improve the filesystem >test coverage for stable tree candidate patches. >We have still some way to go. > >The stable candidate patches that triggered the recent flames >was outside of the fs/xfs subsystem, which AUTOSEL already know >to stay away from, so nobody had any intention to stir things up. > >At the end of the day, most xfs developers work for companies that >ship enterprise distros and need to maintain stable trees, so I would >hope that it is in the best interest of everyone involved to cooperate >on the goal of better stable-xfs ecosystem. > >On my part, I would be happy if AUTOSEL could point me at >candidate patch *series* for review instead of single patches. 
I'm afraid it's not smart enough to do that :( I can grab an entire series if it selects a single patch in a series, but from my experience it's usually the wrong thing to do. >For that matter, it sure wouldn't hurt if an xfs developer sending >out a patch series would cc:stable on the cover letter and if a developer >would be kind enough to add some backporting hints to the cover letter >text that would be very helpful indeed. Given that we have folks (Luis, Amir, etc) working on it already, maybe a step in the right direction would be having the XFS folks tag fixes some other way ("#wants-a-backport"?) where this would give a hint that this should be backported after sufficient testing? We won't pick these commits to stable ourselves, but only after the XFS maintainers are satisfied that the commit was sufficiently tested on LTS trees? -- Thanks, Sasha
As someone who has done xfs stable backports for a while, I really don't think the autoselection is helpful at all. Someone who is vaguely familiar with the code needs to manually select the commits and QA them; that takes a fair amount of time, but it just needs that manual help if it is to work ok. Fortunately, I think we are about ready to have a new xfs stable maintainer lined up, if everything works out well.
On Sun, Dec 02, 2018 at 08:10:16AM -0800, Christoph Hellwig wrote:
> As someone who has done xfs stable backports for a while I really don't
> think the autoselection is helpful at all.

Autoselection for xfs patches has been turned off for a while; what triggered this email thread was a core vfs patch that was backported where it was not obvious that it had been created by the xfs developers because of a problem they had found.

> Someone who is vaguely familiar with the code needs to manually select
> the commits and QA them, which takes a fair amount of time, but just
> needs some manual help if it should work ok.
>
> I think we are about ready to have a new xfs stable maintainer lined up
> if everything works well fortunately.

That would be wonderful news.

thanks,

greg k-h
On Sat, Dec 01, 2018 at 08:45:48AM +1100, Dave Chinner wrote:
> > > Right now the XFS developers don't have the time or resources
> > > available to validate stable backports are correct and regression
> > > fre because we are focussed on ensuring the upstream fixes we've
> > > already made (and are still writing) are solid and reliable.
> >
> > Ok, that's fine, so users of XFS should wait until the 4.20 release
> > before relying on it? :)
>
> Ok, Greg, that's *out of line*.

Sorry, I did not mean it that way at all, I apologize. I do appreciate all the work you do on your subsystem; I was not criticizing that at all.

I was just trying to make a bad joke that it felt like no xfs patches should ever be accepted into stable kernels because more are always being fixed, so the treadmill wouldn't stop. It's like asking a processor developer "what chip to buy" and they always say "the next one is going to be great!" because that is what they are working on at the moment, yet you need to buy something today to get your work done.

That's all, no harm meant at all, sorry if it came across the wrong way.

greg k-h
On Sat, Dec 01, 2018 at 02:49:09AM -0500, Sasha Levin wrote: > On Sat, Dec 01, 2018 at 08:50:05AM +1100, Dave Chinner wrote: > >On Fri, Nov 30, 2018 at 05:14:41AM -0500, Sasha Levin wrote: > >>On Fri, Nov 30, 2018 at 09:22:03AM +0100, Greg KH wrote: > >>>On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote: > >>>>I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops > >>>>aggregate) to focus on testing the copy_file_range() changes, but > >>>>Darrick's tests are still ongoing and have passed 40 billion ops in > >>>>aggregate over the past few days. > >>>> > >>>>The reason we are running these so long is that we've seen fsx data > >>>>corruption failures after 12+ hours of runtime and hundreds of > >>>>millions of ops. Hence the testing for backported fixes will need to > >>>>replicate these test runs across multiple configurations for > >>>>multiple days before we have any confidence that we've actually > >>>>fixed the data corruptions and not introduced any new ones. > >>>> > >>>>If you pull only a small subset of the fixes, the fsx will still > >>>>fail and we have no real way of actually verifying that there have > >>>>been no regression introduced by the backport. IOWs, there's a > >>>>/massive/ amount of QA needed for ensuring that these backports work > >>>>correctly. > >>>> > >>>>Right now the XFS developers don't have the time or resources > >>>>available to validate stable backports are correct and regression > >>>>fre because we are focussed on ensuring the upstream fixes we've > >>>>already made (and are still writing) are solid and reliable. > >>> > >>>Ok, that's fine, so users of XFS should wait until the 4.20 release > >>>before relying on it? :) > >> > >>It's getting to the point that with the amount of known issues with XFS > >>on LTS kernels it makes sense to mark it as CONFIG_BROKEN. > > > >Really? Where are the bug reports? > > In 'git log'! You report these every time you fix something in upstream > xfs but don't backport it to stable trees: That is so wrong on so many levels I don't really know where to begin. I guess doing a *basic risk analysis* demonstrating that none of those fixes are backport candidates is a good start: > $ git log --oneline v4.18-rc1..v4.18 fs/xfs > d4a34e165557 xfs: properly handle free inodes in extent hint validators Found by QA with generic/229 on a non-standard config. Not user reported, unlikely to ever be seen by users. > 9991274fddb9 xfs: Initialize variables in xfs_alloc_get_rec before using them Cleaning up coverity reported issues to do with corruption log messages. No visible symptoms, Not user reported. > d8cb5e423789 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation Minor free space accounting issue, not user reported, doesn't affect normal operation. > e53c4b598372 xfs: ensure post-EOF zeroing happens after zeroing part of a file Found with fsx via generic/127. Not user reported, doesn't affect userspace operation at all. > a3a374bf1889 xfs: fix off-by-one error in xfs_rtalloc_query_range Regression fix for code introduced in 4.18-rc1. Not user reported because the code has never been released. > 232d0a24b0fc xfs: fix uninitialized field in rtbitmap fsmap backend Coverity warning fix, not user reported, not user impact. > 5bd88d153998 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write Fixes warning from generic/166, not user reported. Could affect users mixing direct IO with reflink, but we expect people using new functionality like reflink to be tracking TOT fairly closely anyway. 
> f62cb48e4319 xfs: don't allow insert-range to shift extents past the maximum offset

Found by QA w/ generic/465. Not user reported, only affects files in the exabyte range so not a real world problem....

> aafe12cee0b1 xfs: don't trip over negative free space in xfs_reserve_blocks

Found during ENOSPC stress tests that depleted the reserve pool. Not user reported, unlikely to ever be hit by users.

> 10ee25268e1f xfs: allow empty transactions while frozen

Removes a spurious warning when running the GETFSMAP ioctl on a frozen filesystem. Not user reported, highly unlikely any user will ever hit this as nothing but XFS utilities use GETFSMAP at the moment.

> e53946dbd31a xfs: xfs_iflush_abort() can be called twice on cluster writeback failure

Bug in corrupted filesystem handling, been there for ~15 years IIRC. Not user reported - found by one of our shutdown stress tests on a debug kernel (generic/388, IIRC). Highly unlikely to show up in the real world given how long the bug has been there.

> 23fcb3340d03 xfs: More robust inode extent count validation

Found by filesystem image fuzzing (i.e. intentional filesystem corruption). Not user reported, and the filesystem corruption that triggered this problem is so artificial there is really no chance of it ever occurring in the real world.

> e2ac836307e3 xfs: simplify xfs_bmap_punch_delalloc_range

Cleanup and simplification. Not a bug fix, not user reported, not a backport candidate.

IOWs, there isn't a single commit in this list that is user reported, nor anything that I'd consider a stable kernel backport candidate, because none of them affect normal user workloads. i.e. they've all been found by tools designed to break filesystems and exercise rarely travelled error paths.

> Since I'm assuming that at least some of them are based on actual issues
> users hit, and some of those apply to stable kernels, why would users
> want to use an XFS version which is knowingly buggy?

Your assumption is not only incorrect, it is fundamentally flawed. A list of commits containing bug fixes is not a list of bug reports from users.

IOWs, backporting them only increases the risk of regressions for users, it doesn't reduce the risk of users hitting problems or fix any problems that users are at risk of actually hitting. IOWs, all of these changes fall on the wrong side of the risk-benefit analysis equation.

Risk/benefit analysis is fundamental to software engineering processes. Running "git log" is not a risk analysis - it just provides a list of things that you need to perform an analysis on. Risk analysis takes time and effort, and to imply that it is not necessary and we should just backport everything makes the incorrect assumption that backporting carries no risk at all.

It seems to me that the stable kernel process measures itself on how many commits and how fast they are backported from mainline kernels, and the entire focus of improvement is on backporting /more/ commits /faster/. i.e. it's all about the speed and quantity of code being moved back to the "stable" kernels. What it should be doing is identifying and addressing bugs or flaws that put users at risk or that users are reporting.

Further, the speed at which backports occur (i.e. within a day or 3 of upstream commit) means that the code being backported hasn't had time to reach a wide testing audience and have regressions shaken out of it.
The whole purpose of having progressively stricter -rcX upstream kernel releases is to allow the new code to stabilise and shake out unforeseen regressions before it gets to users. The stable process is actually releasing upstream code to users before they can even get it in a released upstream kernel (i.e. a .0 kernel, not a -rcX).

IOWs, pulling code back to stable kernels before it's had a chance to stabilise and be more widely tested in the upstream kernel is entirely the wrong thing to be doing. Speed here does not improve stability, it just increases the risk of regressions and unforeseen bugs being introduced into the stable tree. And that's made worse by the fact that the backports bypass the -rcX process and the widespread upstream testing that goes along with it to catch those bugs and regressions. And that's made even worse by the fact that subsystems don't have control over what is backported anymore, so they may not even be aware that a fix for a fix needs to be sent back to stable kernels.

This is the issue here - the "stable kernel" criteria is not about stability - it's being optimised to shovel as much change as possible with /as little effort as possible/ back into older code bases. That's not a recipe for stability, especially considering the relative lack of QA the stable kernels get.

IMO, the whole set of linux kernel processes are being optimised around the wrong metrics - we count new features, the number of commits per release and the quantity of code that gets changed. We then optimise our processes to increase these metrics. IOWs, we're optimising for speed and rapid change, not quality, reliability and stability.

We are not measuring code quality improvements, how effective our code review is, we do not do post-mortem analysis of major failures and we most certainly don't change processes to avoid those problems in future, etc. And worst of all is that people who want better processes to improve code quality, testing, etc get shouted at because it may slow down the rate at which we change code. i.e. only "speed and quantity" seems to matter to the core upstream kernel development community.

As Darrick said, what we are seeing here is a result of "[...] the kernel community's systemic inability to QA new fs features properly." I'm taking that one step further - what we are seeing here is the kernel community's systemic inability to address fundamental engineering process deficiencies because "speed and quantity" are considered more important than the quality of the product being produced.

Cheers,

Dave.
On Mon, Dec 3, 2018 at 1:23 AM Dave Chinner <david@fromorbit.com> wrote: > > On Sat, Dec 01, 2018 at 02:49:09AM -0500, Sasha Levin wrote: > > On Sat, Dec 01, 2018 at 08:50:05AM +1100, Dave Chinner wrote: > > >On Fri, Nov 30, 2018 at 05:14:41AM -0500, Sasha Levin wrote: > > >>On Fri, Nov 30, 2018 at 09:22:03AM +0100, Greg KH wrote: > > >>>On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote: > > >>>>I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops > > >>>>aggregate) to focus on testing the copy_file_range() changes, but > > >>>>Darrick's tests are still ongoing and have passed 40 billion ops in > > >>>>aggregate over the past few days. > > >>>> > > >>>>The reason we are running these so long is that we've seen fsx data > > >>>>corruption failures after 12+ hours of runtime and hundreds of > > >>>>millions of ops. Hence the testing for backported fixes will need to > > >>>>replicate these test runs across multiple configurations for > > >>>>multiple days before we have any confidence that we've actually > > >>>>fixed the data corruptions and not introduced any new ones. > > >>>> > > >>>>If you pull only a small subset of the fixes, the fsx will still > > >>>>fail and we have no real way of actually verifying that there have > > >>>>been no regression introduced by the backport. IOWs, there's a > > >>>>/massive/ amount of QA needed for ensuring that these backports work > > >>>>correctly. > > >>>> > > >>>>Right now the XFS developers don't have the time or resources > > >>>>available to validate stable backports are correct and regression > > >>>>fre because we are focussed on ensuring the upstream fixes we've > > >>>>already made (and are still writing) are solid and reliable. > > >>> > > >>>Ok, that's fine, so users of XFS should wait until the 4.20 release > > >>>before relying on it? :) > > >> > > >>It's getting to the point that with the amount of known issues with XFS > > >>on LTS kernels it makes sense to mark it as CONFIG_BROKEN. > > > > > >Really? Where are the bug reports? > > > > In 'git log'! You report these every time you fix something in upstream > > xfs but don't backport it to stable trees: > > That is so wrong on so many levels I don't really know where to > begin. I guess doing a *basic risk analysis* demonstrating that none > of those fixes are backport candidates is a good start: > > > $ git log --oneline v4.18-rc1..v4.18 fs/xfs > > d4a34e165557 xfs: properly handle free inodes in extent hint validators > > Found by QA with generic/229 on a non-standard config. Not user > reported, unlikely to ever be seen by users. > > > 9991274fddb9 xfs: Initialize variables in xfs_alloc_get_rec before using them > > Cleaning up coverity reported issues to do with corruption log > messages. No visible symptoms, Not user reported. > > > d8cb5e423789 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation > > Minor free space accounting issue, not user reported, doesn't affect > normal operation. > > > e53c4b598372 xfs: ensure post-EOF zeroing happens after zeroing part of a file > > Found with fsx via generic/127. Not user reported, doesn't affect > userspace operation at all. > > > a3a374bf1889 xfs: fix off-by-one error in xfs_rtalloc_query_range > > Regression fix for code introduced in 4.18-rc1. Not user reported > because the code has never been released. > > > 232d0a24b0fc xfs: fix uninitialized field in rtbitmap fsmap backend > > Coverity warning fix, not user reported, not user impact. 
> > > 5bd88d153998 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write > > Fixes warning from generic/166, not user reported. Could affect > users mixing direct IO with reflink, but we expect people using > new functionality like reflink to be tracking TOT fairly closely > anyway. > > > f62cb48e4319 xfs: don't allow insert-range to shift extents past the maximum offset > > Found by QA w/ generic/465. Not user reported, only affects files in > the exabyte range so not a real world problem.... > > > aafe12cee0b1 xfs: don't trip over negative free space in xfs_reserve_blocks > > Found during ENOSPC stress tests that depeleted the reserve pool. > Not user reported, unlikely to ever be hit by users. > > > 10ee25268e1f xfs: allow empty transactions while frozen > > Removes a spurious warning when running GETFSMAP ioctl on a frozen > filesystem. Not user reported, highly unlikely any user will ever > hit this as nothing but XFs utilities use GETFSMAP at the moment. > > > e53946dbd31a xfs: xfs_iflush_abort() can be called twice on cluster writeback failure > > Bug in corrupted filesystem handling, been there for ~15 years IIRC. > Not user reported - found by one of our shutdown stress tests > on a debug kernel (generic/388, IIRC). Highly unlikely to show up in > the real world given how long the bug has been there. > > > 23fcb3340d03 xfs: More robust inode extent count validation > > Found by filesystem image fuzzing (i.e. intentional filesystem > corruption). Not user reported, and the filesystem corruption that > triggered this problem is so artificial there is really no chance of > it ever occurring in the real world. > > > e2ac836307e3 xfs: simplify xfs_bmap_punch_delalloc_range > > Cleanup and simplification. Not a bug fix, not user reported, not a > backport candidate. > > IOWs, there isn't a single commit in this list that is user > reported, nor anything that I'd consider a stable kernel backport > candidate because none of them affect normal user workloads. i.e. > they've all be found by tools designed to break filesystems and > exercise rarely travelled error paths. > > > Since I'm assuming that at least some of them are based on actual issues > > users hit, and some of those apply to stable kernels, why would users > > want to use an XFS version which is knowingly buggy? > > Your assumption is not only incorrect, it is fundamentally flawed. > A list of commits containing bug fixes is not a list of bug reports > from users. > Up to here, we are in complete agreement. Thank you for the effort you've put into showing how much effort it takes to properly review candidate patches. Further down, although I can tell that your harsh response is due to Sasha's provoking suggestion to set CONFIG_BROKEN, IMO, your response can be perceived as sliding a bit into the territory of telling someone else how to do their job. It is one thing to advocate that well tested distro kernels are a better choice for end users. Greg has always advocated the same. It is another thing to suggest that the kernel.org stable trees have no value because they are not being maintained with the same standards as the distro stable kernels. The time and place for XFS maintainers to make a judgement about stable trees is whether or not they are willing to look at bug reports reproduced on kernel.org stable tree kernels. Whether or not kernel.org stable trees are useful is a risk/benefit analysis that each and every downstream user should be doing themselves. 
And it is the responsibility of the the stable tree maintainer to make the choices that affect their downstream users. In my personal opinion, as a downstream kernel.org stable tree user, there will be great value in the community maintained stable trees, if and when filesystem test suites will be run regularly on stable tree candidates. Whether or not those stable trees include "minor" bug fixes, as the ones that you listed above, should not be the concern of XFS maintainer, it should be the concern of downstream users making their own risk/benefit analysis. I am very much aware of the paradigm that less changes == less risk, which is the corner stone of maintaining a stable/maint branch. But at the same time, you seem to be ignoring the fact that people often make mistakes when cherry-picking over selectively, because some patches in the series that look like meaningless re-factoring or ones that fix "minor" bugs may actually be required for a later bug fix and it is not always evident from reading the commit messages. So there is more to the risk/benefit analysis then what you present. There is no replacement for good test coverage. The XFS subsystem excels in that department, which makes the validation of stable XFS tree candidates with xfstests very valuable. There is no replacement for human review of stable tree patch candidates. *HOWEVER*! the purpose of this review should be to point out backporting bugs - it should not be to tell the stable tree maintainer which bugs are stable tree eligible and which bugs are not. Please join me in an early New Year's resolution: We shall all strive to make 4.19.y LTS kernel more reliable than previous LTS kernels w.r.t filesystems in general and XFS in particular. Cheers to that, Amir. > IOWs, backporting them only increases the risk of regressions for > users, it doesn't reduce the risk of users hitting problems or fix > any problems that users are at risk of actually hitting. IOWs, all > of these changes fall on the wrong side of the risk-benefit analysis > equation. > > Risk/benefit analysis is fundamental to software engineering > processes. Running "git log" is not a risk analysis - it's just > provides a list of things that you need to perform an analysis on. > Risk analsysis takes time and effort, and to imply that it is not > necessary and we should just backport everything makes the incorrect > assumption that backporting carries no risk at all. > > It seems to me that the stable kernel process measures itself on how > many commits an dhow fast they are backported from mainline kernels, > and the entire focus of improvement is on backporting /more/ commits > /faster/. i.e. it's all about the speed and quantity of code being > moved back to the "stable" kernels. What it should be doing is > identifying and addressing bugs or flaws that put users are risk or > that users are reporting. > > Further, the speed at which backports occur (i.e. within a day or 3 > of upstream commit) means that the code being backported hasn't had > time to reach a wide testing audience and have regressions shaken > out of it. The whole purpose of having progressively stricter -rcX > upstream kernel releases is to allow the new code to stabilise and > shake out unforseen regressions before it gets to users. The stable > process is actually releasing upstream code to users before they can > even get it in a released upstream kernel (i.e. a .0 kernel, not a > -rcX). 
> > IOWs, pulling code back to stable kernels before it's had a chance > to stabilise and be more widely tested in the upstream kernel is > entirely the wrong thing to be doing. Speed here does not improve > stability, it just increases the risk of regressions and unforeseen > bugs being introduced into the stable tree. And that's made worse by > the fact that the -rcX process, and the widespread upstream testing > that goes along with it, never gets the chance to catch those bugs and regressions. And that's > made even worse by the fact that subsystems don't have control over > what is backported anymore, so they may not even be aware that a fix > for a fix needs to be sent back to stable kernels. > > This is the issue here - the "stable kernel" criteria is not about > stability - it's being optimised to shovel as much change as > possible with /as little effort as possible/ back into older code > bases. That's not a recipe for stability, especially considering the > relative lack of QA the stable kernels get. > > IMO, the whole set of linux kernel processes are being optimised > around the wrong metrics - we count new features, the number of > commits per release and the quantity of code that gets changed. We > then optimise our processes to increase these metrics. IOWs, we're > optimising for speed and rapid change, not quality, reliability and > stability. > > We are not measuring code quality improvements, how effective our > code review is, we do not do post-mortem analysis of major failures > and we most certainly don't change processes to avoid those problems > in future, etc. And worst of all is that people who want better > processes to improve code quality, testing, etc. get shouted at > because it may slow down the rate at which we change code. i.e. only > "speed and quantity" seems to matter to the core upstream kernel > development community. > > As Darrick said, what we are seeing here is a result of "[...] the > kernel community's systemic inability to QA new fs features > properly." I'm taking that one step further - what we are seeing > here is the kernel community's systemic inability to address > fundamental engineering process deficiencies because "speed and > quantity" are considered more important than the quality of the > product being produced. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
On Mon, Dec 03, 2018 at 10:23:03AM +1100, Dave Chinner wrote: >On Sat, Dec 01, 2018 at 02:49:09AM -0500, Sasha Levin wrote: >> In 'git log'! You report these every time you fix something in upstream >> xfs but don't backport it to stable trees: > >That is so wrong on so many levels I don't really know where to >begin. I guess doing a *basic risk analysis* demonstrating that none >of those fixes are backport candidates is a good start: > >> $ git log --oneline v4.18-rc1..v4.18 fs/xfs >> d4a34e165557 xfs: properly handle free inodes in extent hint validators > >Found by QA with generic/229 on a non-standard config. Not user >reported, unlikely to ever be seen by users. > >> 9991274fddb9 xfs: Initialize variables in xfs_alloc_get_rec before using them > >Cleaning up coverity reported issues to do with corruption log >messages. No visible symptoms, Not user reported. > >> d8cb5e423789 xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation > >Minor free space accounting issue, not user reported, doesn't affect >normal operation. > >> e53c4b598372 xfs: ensure post-EOF zeroing happens after zeroing part of a file > >Found with fsx via generic/127. Not user reported, doesn't affect >userspace operation at all. > >> a3a374bf1889 xfs: fix off-by-one error in xfs_rtalloc_query_range > >Regression fix for code introduced in 4.18-rc1. Not user reported >because the code has never been released. > >> 232d0a24b0fc xfs: fix uninitialized field in rtbitmap fsmap backend > >Coverity warning fix, not user reported, not user impact. > >> 5bd88d153998 xfs: recheck reflink state after grabbing ILOCK_SHARED for a write > >Fixes warning from generic/166, not user reported. Could affect >users mixing direct IO with reflink, but we expect people using >new functionality like reflink to be tracking TOT fairly closely >anyway. > >> f62cb48e4319 xfs: don't allow insert-range to shift extents past the maximum offset > >Found by QA w/ generic/465. Not user reported, only affects files in >the exabyte range so not a real world problem.... > >> aafe12cee0b1 xfs: don't trip over negative free space in xfs_reserve_blocks > >Found during ENOSPC stress tests that depeleted the reserve pool. >Not user reported, unlikely to ever be hit by users. > >> 10ee25268e1f xfs: allow empty transactions while frozen > >Removes a spurious warning when running GETFSMAP ioctl on a frozen >filesystem. Not user reported, highly unlikely any user will ever >hit this as nothing but XFs utilities use GETFSMAP at the moment. > >> e53946dbd31a xfs: xfs_iflush_abort() can be called twice on cluster writeback failure > >Bug in corrupted filesystem handling, been there for ~15 years IIRC. >Not user reported - found by one of our shutdown stress tests >on a debug kernel (generic/388, IIRC). Highly unlikely to show up in >the real world given how long the bug has been there. > >> 23fcb3340d03 xfs: More robust inode extent count validation > >Found by filesystem image fuzzing (i.e. intentional filesystem >corruption). Not user reported, and the filesystem corruption that >triggered this problem is so artificial there is really no chance of >it ever occurring in the real world. > >> e2ac836307e3 xfs: simplify xfs_bmap_punch_delalloc_range > >Cleanup and simplification. Not a bug fix, not user reported, not a >backport candidate. > >IOWs, there isn't a single commit in this list that is user >reported, nor anything that I'd consider a stable kernel backport >candidate because none of them affect normal user workloads. i.e. 
>they've all been found by tools designed to break filesystems and >exercise rarely travelled error paths. I think that part of our disagreement is the whole "user reported" criterion. Looking at myself as an example, unless I experience an obvious corruption I can reproduce, I am most likely to just ignore it and recreate the filesystem. This is even more true for "enterprisy" workloads where data may be replicated across multiple filesystems, and if one of these fails then it's just silently discarded and replaced. User reports are hard to come by, not just for XFS but pretty much anywhere else in the kernel. Our debugging/reporting story is almost as bad as our QA ;) A few times above you used the word "unlikely" to indicate that a bug will never really be hit by real users. I strongly disagree with using this guess to decide if we're going to backport anything or not. Every time I meet with the FB folks I keep hearing how they end up hitting "once in a lifetime" bugs over and over on their infrastructure. Do we agree that the ideal solution would be backporting every fix, and having a solid QA system to validate it? Obviously it's not going to happen in the next year or two, but if we agree on the end goal then there's no point in continuing to argue about the steps in between :) >> Since I'm assuming that at least some of them are based on actual issues >> users hit, and some of those apply to stable kernels, why would users >> want to use an XFS version which is knowingly buggy? > >Your assumption is not only incorrect, it is fundamentally flawed. >A list of commits containing bug fixes is not a list of bug reports >from users. > >IOWs, backporting them only increases the risk of regressions for >users, it doesn't reduce the risk of users hitting problems or fix >any problems that users are at risk of actually hitting. IOWs, all >of these changes fall on the wrong side of the risk-benefit analysis >equation. > >Risk/benefit analysis is fundamental to software engineering >processes. Running "git log" is not a risk analysis - it just >provides a list of things that you need to perform an analysis on. >Risk analysis takes time and effort, and to imply that it is not >necessary and we should just backport everything makes the incorrect >assumption that backporting carries no risk at all. > >It seems to me that the stable kernel process measures itself on how >many commits and how fast they are backported from mainline kernels, >and the entire focus of improvement is on backporting /more/ commits >/faster/. i.e. it's all about the speed and quantity of code being >moved back to the "stable" kernels. What it should be doing is >identifying and addressing bugs or flaws that put users at risk or >that users are reporting. > >Further, the speed at which backports occur (i.e. within a day or 3 >of upstream commit) means that the code being backported hasn't had >time to reach a wide testing audience and have regressions shaken >out of it. The whole purpose of having progressively stricter -rcX >upstream kernel releases is to allow the new code to stabilise and >shake out unforeseen regressions before it gets to users. The stable >process is actually releasing upstream code to users before they can >even get it in a released upstream kernel (i.e. a .0 kernel, not a >-rcX). One of the concerns I have about stable trees, which we both share here, is that no one really uses Linus's tree: it's used as an integration tree, but very few people actually test their workloads on it.
Most testing ends up happening, sadly enough, on stable trees. I see it as an issue with our process, and I don't have a good idea of how to solve it. >IOWs, pulling code back to stable kernels before it's had a chance >to stabilise and be more widely tested in the upstream kernel is >entirely the wrong thing to be doing. Speed here does not improve >stability, it just increases the risk of regressions and unforeseen >bugs being introduced into the stable tree. And that's made worse by >the fact that the -rcX process, and the widespread upstream testing that >goes along with it, never gets the chance to catch those bugs and regressions. And that's >made even worse by the fact that subsystems don't have control over >what is backported anymore, so they may not even be aware that a fix >for a fix needs to be sent back to stable kernels. > >This is the issue here - the "stable kernel" criteria is not about >stability - it's being optimised to shovel as much change as >possible with /as little effort as possible/ back into older code >bases. That's not a recipe for stability, especially considering the >relative lack of QA the stable kernels get. > >IMO, the whole set of linux kernel processes are being optimised >around the wrong metrics - we count new features, the number of >commits per release and the quantity of code that gets changed. We >then optimise our processes to increase these metrics. IOWs, we're >optimising for speed and rapid change, not quality, reliability and >stability. > >We are not measuring code quality improvements, how effective our >code review is, we do not do post-mortem analysis of major failures >and we most certainly don't change processes to avoid those problems >in future, etc. And worst of all is that people who want better >processes to improve code quality, testing, etc. get shouted at >because it may slow down the rate at which we change code. i.e. only >"speed and quantity" seems to matter to the core upstream kernel >development community. > >As Darrick said, what we are seeing here is a result of "[...] the >kernel community's systemic inability to QA new fs features >properly." I'm taking that one step further - what we are seeing >here is the kernel community's systemic inability to address >fundamental engineering process deficiencies because "speed and >quantity" are considered more important than the quality of the >product being produced. This is a case where theory collides with the real world. Yes, our QA is lacking, but we don't have the option of not doing the current process. If we stop backporting until a future date when our QA problem is solved, we'll end up with what we had before: users stuck on ancient kernels without a way to upgrade. With the current model we're aware that bugs sneak through, but we try to deal with it by both improving our QA and encouraging users to do their own extensive QA. If we encourage users to update frequently we can keep improving our process, and the quality of kernels will keep getting better. We simply can't go back to the "enterprise distro" days. -- Thanks, Sasha
On Sun, Dec 2, 2018 at 9:09 PM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Sun, Dec 02, 2018 at 08:10:16AM -0800, Christoph Hellwig wrote: > > As someone who has done xfs stable backports for a while I really don't > > think the autoselection is helpful at all. > > autoselection for xfs patches has been turned off for a while, what > triggered this email thread was a core vfs patch that was backported > that was not obvious it was created by the xfs developers due to a > problem they had found. Sorry for hijacking this thread. Can you please also disable autoselection for MTD, UBI and UBIFS? fs/ubifs/ drivers/mtd/ include/linux/mtd/ include/uapi/mtd/
On Mon, Dec 03, 2018 at 03:41:27PM +0100, Richard Weinberger wrote: >On Sun, Dec 2, 2018 at 9:09 PM Greg KH <gregkh@linuxfoundation.org> wrote: >> >> On Sun, Dec 02, 2018 at 08:10:16AM -0800, Christoph Hellwig wrote: >> > As someone who has done xfs stable backports for a while I really don't >> > think the autoselection is helpful at all. >> >> autoselection for xfs patches has been turned off for a while, what >> triggered this email thread was a core vfs patch that was backported >> that was not obvious it was created by the xfs developers due to a >> problem they had found. > >Sorry for hijacking this thread. >Can you please also disable autoselection for MTD, UBI and UBIFS? > >fs/ubifs/ >drivers/mtd/ >include/linux/mtd/ >include/uapi/mtd/ Sure, done! -- Thanks, Sasha
On 2018-12-03 at 11:22, Sasha Levin wrote: > > This is a case where theory collides with the real world. Yes, our QA is > lacking, but we don't have the option of not doing the current process. > If we stop backporting until a future date when our QA problem is > solved we'll end up with what we had before: users stuck on ancient > kernels without a way to upgrade. > Sorry, but you seem to be living in a different "real world"... People stay on "ancient kernels" that "just works" instead of updating to a newer one that "hopefully/maybe/... works". > With the current model we're aware that bugs sneak through, but we try > to deal with it by both improving our QA, and encouraging users to do > their own extensive QA. If we encourage users to update frequently we > can keep improving our process and the quality of kernels will keep > getting better. And here you want to turn/force users into QA ... good luck with that. In reality they won't "update frequently"; instead they will stop updating when they have something that works... and start ignoring updates as they expect something "to break as usual", as they actually need to get some real work done too... > > We simply can't go back to the "enterprise distro" days. > Maybe so, but we should at least get back to having "stable" or "longterm" actually mean something again... Or what does it say when distros start thinking about ignoring (and some already do) stable/longterm trees because there are _way_ too many questionable changes coming through, even overriding maintainers to the point where they basically state "we don't care about monitoring stable trees anymore, as they add whatever they want anyway"... And pretending that every fix is important enough to backport, and saying that if you don't take everything you have an "insecure" kernel, won't help, as reality has shown from time to time that backports can/will open up a new issue instead for no good reason. Which for distros starts to mean that switching back to selectively taking fixes for _known_ security issues is considered the way better choice. End result: no-one cares about -stable trees -> no-one uses them -> a lot of wasted work for nothing... -- Thomas
On Mon, Dec 03, 2018 at 11:22:46PM +0159, Thomas Backlund wrote: > Den 2018-12-03 kl. 11:22, skrev Sasha Levin: > > > > > This is a case where theory collides with the real world. Yes, our QA is > > lacking, but we don't have the option of not doing the current process. > > If we stop backporting until a future data where our QA problem is > > solved we'll end up with what we had before: users stuck on ancient > > kernels without a way to upgrade. > > > > Sorry, but you seem to be living in a different "real world"... > > People stay on "ancient kernels" that "just works" instead of updating > to a newer one that "hopefully/maybe/... works" That's not good as those "ancient kernels" really just are "kernels with lots of known security bugs". It's your systems, I can't tell you what to do, but I will tell you that running older, unfixed kernels, is a known liability. Good luck! greg k-h
On Mon, Dec 03, 2018 at 11:22:46PM +0159, Thomas Backlund wrote: >Den 2018-12-03 kl. 11:22, skrev Sasha Levin: > >> >> This is a case where theory collides with the real world. Yes, our QA is >> lacking, but we don't have the option of not doing the current process. >> If we stop backporting until a future data where our QA problem is >> solved we'll end up with what we had before: users stuck on ancient >> kernels without a way to upgrade. >> > >Sorry, but you seem to be living in a different "real world"... > >People stay on "ancient kernels" that "just works" instead of updating >to a newer one that "hopefully/maybe/... works" If users are stuck at older kernels and refuse to update then there's not much I can do about it. They are knowingly staying on kernels with known issues and will end up paying a much bigger price later to update. >> With the current model we're aware that bugs sneak through, but we try >> to deal with it by both improving our QA, and encouraging users to do >> their own extensive QA. If we encourage users to update frequently we >> can keep improving our process and the quality of kernels will keep >> getting better. > >And here you want to turn/force users into QA ... good luck with that. Yes, users are expected to test their workloads with new kernels - I'm not sure why this is a surprise to anyone. Isn't it true for every other piece of software? I invite you to read Jon's great summary on LWN of a related session that happened during the maintainer's summit: https://lwn.net/Articles/769253/ . The conclusion reached was very similar. >In reality they wont "update frequently", instead they will stop >updating when they have something that works... and start ignoring >updates as they expect something "to break as usual" as they actually >need to get some real work done too... Again, this model was proven to be bad in the past, and if users keep following it then they're knowingly shooting themselves in the foot. > >> >> We simply can't go back to the "enterprise distro" days. >> > >Maybe so, but we should atleast get back to having "stable" or >"longterm" actually mean something again... > >Or what does it say when distros starts thinking about ignoring >(and some already do) stable/longterm trees because there is >_way_ too much questionable changes coming through, even overriding >maintainers to the point where they basically state "we dont care >about monitoring stable trees anymore, as they add whatever they want >anyway"... I'm assuming you mean "enterprise distros" here, as most of the community distros I'm aware of are tracking stable trees. Enterprise distros are a mix of everything: on one hand they would refuse most stable patches because they don't have any demand from customers to fix those bugs, but on the other hand they will update drivers and subsystems as a whole to create these frankenstein kernels that are very difficult to support. When your kernel is driven by paying customer demands it's difficult to argue for the technical merits of your process. >And pretending that every fix is important enough to backport, >and saying if you dont take everything you have an "unsecure" kernel >wont help, as reality has shown from time to time that backports >can/will open up a new issue instead for no good reason > >Wich for distros starts to mean, switch back to selectively taking fixes >for _known_ security issues are considered way better choice That was my exact thinking 2 years ago (see my stable-security project: https://lwn.net/Articles/683335/). 
I even had a back-and-forth with Greg on LKML when I was trying to argue your point: "Lets only take security fixes because no one cares about the other crap". If you're interested, I'd be happy to explain further why this was a complete flop. -- Thanks, Sasha
On Mon 2018-12-03 23:22:46, Thomas Backlund wrote: > On 2018-12-03 at 11:22, Sasha Levin wrote: > > > > > This is a case where theory collides with the real world. Yes, our QA is > > lacking, but we don't have the option of not doing the current process. > > If we stop backporting until a future date when our QA problem is > > solved we'll end up with what we had before: users stuck on ancient > > kernels without a way to upgrade. > > > > Sorry, but you seem to be living in a different "real world"... I have to agree here :-(. > People stay on "ancient kernels" that "just works" instead of updating > to a newer one that "hopefully/maybe/... works" Stable has rules the community agreed on; unfortunately, the stable team simply ignores those and has decided to do "whatever they please". The process went from "serious bugs that bother people only" to "hey, this looks like a bugfix, let's put it into the tree and see what it breaks"... :-(. Pavel
On Fri, Dec 28, 2018 at 09:06:24AM +0100, Pavel Machek wrote: > On Mon 2018-12-03 23:22:46, Thomas Backlund wrote: > > On 2018-12-03 at 11:22, Sasha Levin wrote: > > > > > > > > This is a case where theory collides with the real world. Yes, our QA is > > > lacking, but we don't have the option of not doing the current process. > > > If we stop backporting until a future date when our QA problem is > > > solved we'll end up with what we had before: users stuck on ancient > > > kernels without a way to upgrade. > > > > > > > Sorry, but you seem to be living in a different "real world"... > > I have to agree here :-(. > > > People stay on "ancient kernels" that "just works" instead of updating > > to a newer one that "hopefully/maybe/... works" > > Stable has rules the community agreed on; unfortunately, the stable team > simply ignores those and has decided to do "whatever they please". > > The process went from "serious bugs that bother people only" to "hey, this > looks like a bugfix, let's put it into the tree and see what it breaks"... Resulting in us having to tell users not to use stable kernels because they can contain broken commits from upstream that did not go through the maintainer tree and test cycles. https://marc.info/?l=linux-xfs&m=154544499507105&w=2 In this case, the broken commit to the fs/iomap.c code was merged upstream through the akpm tree, rather than through the XFS tree and test process in which previous changes to this code had been staged. It was then backported so fast and released so quickly that it hadn't got back into the XFS upstream tree test cycles until after it had already been committed to at least one stable kernel. We'd only just registered and confirmed a regression in post-rc7 upstream trees when the stable kernel containing the commit was released. It took us another couple of days to isolate the failing configuration and bisect it down to the commit. Only when I got "formlettered" for cc'ing the stable kernel list on the revert patch (because I wanted to make sure the stable kernel maintainers knew it was being reverted and so it wouldn't be backported) did I learn it had already been "auto-backported" and released in a stable kernel in under a week. Essentially, the "auto-backport" completely short-circuited the upstream QA process..... IOWs, if you were looking for a case study to demonstrate the failings of the current stable process, this is it. Cheers, Dave.
diff --git a/fs/iomap.c b/fs/iomap.c index 8f7673a69273..407efdae3978 100644 --- a/fs/iomap.c +++ b/fs/iomap.c @@ -940,7 +940,14 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, dio->submit.cookie = submit_bio(bio); } while (nr_pages); - if (need_zeroout) { + /* + * We need to zeroout the tail of a sub-block write if the extent type + * requires zeroing or the write extends beyond EOF. If we don't zero + * the block tail in the latter case, we can expose stale data via mmap + * reads of the EOF block. + */ + if (need_zeroout || + ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) { /* zero out from the end of the write to the end of the block */ pad = pos & (fs_block_size - 1); if (pad)
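For illustration only (not part of the patch or the thread): a minimal userspace sketch of the failure mode described by the comment in the hunk above, i.e. a sub-block O_DIRECT write that extends EOF, followed by an mmap read of the EOF block. The 4096-byte block size, the 512-byte write length and the "testfile" name are assumptions made for this sketch; the point is only that the bytes between the end of the write and the end of the block should read back as zeroes, which is what the added tail zeroing provides.

/*
 * Sketch only: sub-block O_DIRECT write that extends EOF, then an mmap
 * read of the EOF block.  Block size, write length and file name are
 * assumptions for the example, not taken from the thread.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const size_t blksz = 4096;	/* assumed filesystem block size */
	const size_t wlen = 512;	/* sub-block direct I/O write length */
	void *buf;
	int fd;

	/* O_DIRECT needs an aligned buffer; align to the block size */
	if (posix_memalign(&buf, blksz, blksz))
		return 1;
	memset(buf, 'A', wlen);

	fd = open("testfile", O_CREAT | O_TRUNC | O_RDWR | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* 512-byte DIO write at offset 0 moves EOF to 512, mid-block */
	if (pwrite(fd, buf, wlen, 0) != (ssize_t)wlen) {
		perror("pwrite");
		return 1;
	}

	/* map the whole EOF block and inspect the bytes beyond EOF */
	unsigned char *map = mmap(NULL, blksz, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * map[512..4095] lies beyond EOF but inside the EOF block.  With
	 * the block tail zeroed during the write, these bytes read back
	 * as zeroes; without it, stale block contents could show up here.
	 */
	for (size_t i = wlen; i < blksz; i++) {
		if (map[i] != 0) {
			printf("non-zero byte at offset %zu\n", i);
			break;
		}
	}

	munmap(map, blksz);
	close(fd);
	free(buf);
	return 0;
}

On a kernel with the fix applied the loop should print nothing; on an affected kernel the unzeroed tail of the EOF block may expose whatever data previously occupied that block on disk.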