Message ID | 173197064441.904310.18406008193922603782.stgit@frogsfrogsfrogs (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | [01/12] generic/757: fix various bugs in this test | expand |
On Mon, Nov 18, 2024 at 03:01:38PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Fix this test so the check doesn't fail on XFS, and restrict runtime to > 100 loops because otherwise this test takes many hours. > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > Reviewed-by: Christoph Hellwig <hch@lst.de> > --- > tests/generic/757 | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > > diff --git a/tests/generic/757 b/tests/generic/757 > index 0ff5a8ac00182b..9d41975bde07bb 100755 > --- a/tests/generic/757 > +++ b/tests/generic/757 > @@ -63,9 +63,14 @@ prev=$(_log_writes_mark_to_entry_number mkfs) > cur=$(_log_writes_find_next_fua $prev) > [ -z "$cur" ] && _fail "failed to locate next FUA write" > > -while [ ! -z "$cur" ]; do > +for ((i = 0; i < 100; i++)); do > _log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full > > + # xfs_repair won't run if the log is dirty > + if [ $FSTYP = "xfs" ]; then > + _scratch_mount > + _scratch_unmount > + fi Hi Darrick, I didn't merge this patch last week, due to we were still talking about the "discards" things: https://lore.kernel.org/fstests/20241115182821.s3pt4wmkueyjggx3@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/T/#u Do you think we need to do a force discards at here, or change the SCRATCH_DEV to dmthin to support discards? Thanks, Zorro > _check_scratch_fs > > prev=$cur >
On Thu, Nov 21, 2024 at 05:56:24PM +0800, Zorro Lang wrote: > I didn't merge this patch last week, due to we were still talking > about the "discards" things: > > https://lore.kernel.org/fstests/20241115182821.s3pt4wmkueyjggx3@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/T/#u > > Do you think we need to do a force discards at here, or change the > SCRATCH_DEV to dmthin to support discards? FYI, I'm seeing regular failures with generic/757 when using Darrick's not yet merged RT rmap support, but only with that. But the whole discard thing leaves me really confused, and the commit log in the patch references by the above link doesn't clear that up either. Why does dmlogwrites require discard for XFS (and apprently XFS only)? Note that discard is not required and often does not zero data. So if we need data to be zeroed we need to do that explicitly, and preferably in a way that is obvious.
On Thu, Nov 21, 2024 at 11:05:55AM +0100, Christoph Hellwig wrote: > But the whole discard thing leaves me really confused, and the commit > log in the patch references by the above link doesn't clear that up > either. > > Why does dmlogwrites require discard for XFS (and apprently XFS only)? > Note that discard is not required and often does not zero data. So > if we need data to be zeroed we need to do that explicitly, and > preferably in a way that is obvious. Ok, I found the problem. src/log-writes/log-writes.c does try to discard to zero, which is broken. This patch switches it to the proper zeroout ioctl, which fixes my generic/757 issues, and should also allow to revert commits to use dm-thin referenced in your link. diff --git a/src/log-writes/log-writes.c b/src/log-writes/log-writes.c index aa53473974d9..cb3ac962574c 100644 --- a/src/log-writes/log-writes.c +++ b/src/log-writes/log-writes.c @@ -31,7 +31,7 @@ static int discard_range(struct log *log, u64 start, u64 len) { u64 range[2] = { start, len }; - if (ioctl(log->replayfd, BLKDISCARD, &range) < 0) { + if (ioctl(log->replayfd, BLKZEROOUT, &range) < 0) { if (log_writes_verbose) printf("replay device doesn't support discard, " "switching to writing zeros\n");
On Thu, Nov 21, 2024 at 11:13:25AM +0100, Christoph Hellwig wrote:
> proper zeroout ioctl, which fixes my generic/757 issues, and should
It turns out then when I extrapolate my shortened 10 iteration run
to the 100 currently in the test it would take ~ 30 minutes. I'm
not sure that's really a reasonable run time for a single test in
the auto group.
On Thu, Nov 21, 2024 at 11:05:55AM +0100, Christoph Hellwig wrote: > On Thu, Nov 21, 2024 at 05:56:24PM +0800, Zorro Lang wrote: > > I didn't merge this patch last week, due to we were still talking > > about the "discards" things: > > > > https://lore.kernel.org/fstests/20241115182821.s3pt4wmkueyjggx3@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/T/#u > > > > Do you think we need to do a force discards at here, or change the > > SCRATCH_DEV to dmthin to support discards? > > FYI, I'm seeing regular failures with generic/757 when using Darrick's > not yet merged RT rmap support, but only with that. > > But the whole discard thing leaves me really confused, and the commit > log in the patch references by the above link doesn't clear that up > either. > > Why does dmlogwrites require discard for XFS (and apprently XFS only)? > Note that discard is not required and often does not zero data. So > if we need data to be zeroed we need to do that explicitly, and > preferably in a way that is obvious. > IIRC it was to accommodate the test program, which presumably used discard for efficiency reasons because it did a lot of context switching to different point-in-time variations of the fs. If the discard didn't actually zero the range (depending on the underlying test dev), then at least on XFS, we'd see odd recovery issues and whatnot from the fs going forward/back in time. Therefore the reason for using dm-thin was that it was an easy way to provide predictable behavior to the test program, where discards punch out blocks that subsequently return zeroes. I don't recall all the specifics, but I thought part of the reason for using discard over explicit zeroing was the latter made the test impractically slow. I could be misremembering, but if you want to change it I'd suggest to at least verify runtimes on some of the preexisting logwrites tests as well. Brian
On Thu, Nov 21, 2024 at 07:28:09AM -0500, Brian Foster wrote: > IIRC it was to accommodate the test program, which presumably used > discard for efficiency reasons because it did a lot of context switching > to different point-in-time variations of the fs. If the discard didn't > actually zero the range (depending on the underlying test dev), then at > least on XFS, we'd see odd recovery issues and whatnot from the fs going > forward/back in time. But the fundamental problem is that discard does not actually zero data. It sometimes might, but also often doesn't because it's not part of the contract. That's why my patch two switch to the zeroing ioctl is the right thing. Note that this doesn't mean explicitly writing zeroes, it still uses zeroing offloads available in the drivers. > I don't recall all the specifics, but I thought part of the reason for > using discard over explicit zeroing was the latter made the test > impractically slow. I could be misremembering, but if you want to change > it I'd suggest to at least verify runtimes on some of the preexisting > logwrites tests as well. I'm all for speeding up tests. But relying on a unspecified side effect of an operation and then requiring a driver that implements that side effect without documenting that isn't really good practice.
On Thu, Nov 21, 2024 at 02:12:39PM +0100, Christoph Hellwig wrote: > On Thu, Nov 21, 2024 at 07:28:09AM -0500, Brian Foster wrote: > > IIRC it was to accommodate the test program, which presumably used > > discard for efficiency reasons because it did a lot of context switching > > to different point-in-time variations of the fs. If the discard didn't > > actually zero the range (depending on the underlying test dev), then at > > least on XFS, we'd see odd recovery issues and whatnot from the fs going > > forward/back in time. > > But the fundamental problem is that discard does not actually zero > data. It sometimes might, but also often doesn't because it's not > part of the contract. That's why my patch two switch to the zeroing > ioctl is the right thing. Note that this doesn't mean explicitly > writing zeroes, it still uses zeroing offloads available in the drivers. > Ok. I get that wrt discard.. I'm saying that's why we inserted dm-thin into the stack, because it had predictable behavior that fell in line with this particular design quirk of the higher level test program. I've no issue with changing it so long as we maintain effectiveness of the test. I'm not familiar enough with the zeroing offloads in the drivers you mention to know whether that means this test can still run quickly in the general case, or only with select hardware..? > > I don't recall all the specifics, but I thought part of the reason for > > using discard over explicit zeroing was the latter made the test > > impractically slow. I could be misremembering, but if you want to change > > it I'd suggest to at least verify runtimes on some of the preexisting > > logwrites tests as well. > > I'm all for speeding up tests. But relying on a unspecified side effect > of an operation and then requiring a driver that implements that side > effect without documenting that isn't really good practice. > It's a hack to facilitate test coverage. It would obviously need to be revisited if behavior changed sufficiently to break the test. I'm not really sure what you're asking for wrt documentation. A quick scan of the git history shows the first such commit is 65cc9a235919 ("generic/482: use thin volume as data device"), the commit log for which seems to explain the reasoning. Brian
On Thu, Nov 21, 2024 at 07:28:09AM -0500, Brian Foster wrote: > On Thu, Nov 21, 2024 at 11:05:55AM +0100, Christoph Hellwig wrote: > > On Thu, Nov 21, 2024 at 05:56:24PM +0800, Zorro Lang wrote: > > > I didn't merge this patch last week, due to we were still talking > > > about the "discards" things: > > > > > > https://lore.kernel.org/fstests/20241115182821.s3pt4wmkueyjggx3@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/T/#u > > > > > > Do you think we need to do a force discards at here, or change the > > > SCRATCH_DEV to dmthin to support discards? > > > > FYI, I'm seeing regular failures with generic/757 when using Darrick's > > not yet merged RT rmap support, but only with that. > > > > But the whole discard thing leaves me really confused, and the commit > > log in the patch references by the above link doesn't clear that up > > either. > > > > Why does dmlogwrites require discard for XFS (and apprently XFS only)? > > Note that discard is not required and often does not zero data. So > > if we need data to be zeroed we need to do that explicitly, and > > preferably in a way that is obvious. > > > > IIRC it was to accommodate the test program, which presumably used > discard for efficiency reasons because it did a lot of context switching > to different point-in-time variations of the fs. If the discard didn't > actually zero the range (depending on the underlying test dev), then at > least on XFS, we'd see odd recovery issues and whatnot from the fs going > forward/back in time. Yes, that's my recollection too -- performing a logwrite replay of an old mark means that you can end up with blocks with the correct fs uuid but an LSN that's higher than anything in the log. Recovery will then skip the block replay, which is not correct. I suppose we could fix log recovery to treat incoming block LSNs that are higher than the log head as if there were no block contents at all. OTOH going backwards in time isn't usually a concern...right? > Therefore the reason for using dm-thin was that it was an easy way to > provide predictable behavior to the test program, where discards punch > out blocks that subsequently return zeroes. Yep. The test needs to reset the block device to a zeroed state. Discards get us there quickly, but only if discard_zeroes_data==1. Hence bolting dm-thinp (where this is guaranteed) onto the logwrites tests. > I don't recall all the specifics, but I thought part of the reason for > using discard over explicit zeroing was the latter made the test > impractically slow. I could be misremembering, but if you want to change > it I'd suggest to at least verify runtimes on some of the preexisting > logwrites tests as well. Not sure -- I think BLKZEROOUT will cause allocations and real disk writes if we're not careful. --D > Brian > >
On Thu, Nov 21, 2024 at 11:52:48AM +0100, Christoph Hellwig wrote: > On Thu, Nov 21, 2024 at 11:13:25AM +0100, Christoph Hellwig wrote: > > proper zeroout ioctl, which fixes my generic/757 issues, and should > > It turns out then when I extrapolate my shortened 10 iteration run > to the 100 currently in the test it would take ~ 30 minutes. I'm > not sure that's really a reasonable run time for a single test in > the auto group. Yeah, perhaps I should adjust this one to use TIME_FACTOR too? while _soak_loop_running $((10 * TIME_FACTOR)); do # timeconsuming stuff done (and then you can SOAK_DURATION=10s to limit the runtime) --D
On Thu, Nov 21, 2024 at 08:33:55AM -0800, Darrick J. Wong wrote: > On Thu, Nov 21, 2024 at 11:52:48AM +0100, Christoph Hellwig wrote: > > On Thu, Nov 21, 2024 at 11:13:25AM +0100, Christoph Hellwig wrote: > > > proper zeroout ioctl, which fixes my generic/757 issues, and should > > > > It turns out then when I extrapolate my shortened 10 iteration run > > to the 100 currently in the test it would take ~ 30 minutes. I'm > > not sure that's really a reasonable run time for a single test in > > the auto group. > > Yeah, perhaps I should adjust this one to use TIME_FACTOR too? > > while _soak_loop_running $((10 * TIME_FACTOR)); do > # timeconsuming stuff > done > > (and then you can SOAK_DURATION=10s to limit the runtime) Oh, that's with your BLKDISCARD -> BLKZEROOUT change applied, isn't it? On my system, 100 loops takes 96 seconds with discard and 639 seconds with zeroout. So yes, we should shift this test to use thinp. I was going to say that we should also make _log_writes_init skip the test if the block device has discard_zeroes_data==0, but apparently it's now hardwired to zero and (AFAICT) there's no way to tell if unmap actually zeroes even for software defined devices like thinp where we *know* that works. :( --D
On Thu, Nov 21, 2024 at 09:11:56AM -0500, Brian Foster wrote: > > I'm all for speeding up tests. But relying on a unspecified side effect > > of an operation and then requiring a driver that implements that side > > effect without documenting that isn't really good practice. > > > > It's a hack to facilitate test coverage. It would obviously need to be > revisited if behavior changed sufficiently to break the test. > > I'm not really sure what you're asking for wrt documentation. A quick > scan of the git history shows the first such commit is 65cc9a235919 > ("generic/482: use thin volume as data device"), the commit log for > which seems to explain the reasoning. A comment on _log_writes_init that it must only be used by dm-thin because it relies on the undocumented behavior that dm-trim zeroes all blocks discarded. Or even better my moving the dm-think setup boilerplate into the log writes helpers, so that it gets done automatically.
On Thu, Nov 21, 2024 at 08:04:15AM -0800, Darrick J. Wong wrote: > > IIRC it was to accommodate the test program, which presumably used > > discard for efficiency reasons because it did a lot of context switching > > to different point-in-time variations of the fs. If the discard didn't > > actually zero the range (depending on the underlying test dev), then at > > least on XFS, we'd see odd recovery issues and whatnot from the fs going > > forward/back in time. > > Yes, that's my recollection too -- performing a logwrite replay of an > old mark means that you can end up with blocks with the correct fs uuid > but an LSN that's higher than anything in the log. Recovery will then > skip the block replay, which is not correct. > > I suppose we could fix log recovery to treat incoming block LSNs that > are higher than the log head as if there were no block contents at all. > OTOH going backwards in time isn't usually a concern...right? It's probably the best we can do. Recover as far as everything validated and then give up. > > > Therefore the reason for using dm-thin was that it was an easy way to > > provide predictable behavior to the test program, where discards punch > > out blocks that subsequently return zeroes. > > Yep. The test needs to reset the block device to a zeroed state. > Discards get us there quickly, but only if discard_zeroes_data==1. > Hence bolting dm-thinp (where this is guaranteed) onto the logwrites > tests. discard_zeroes_data was unfortunately always broken as no standard gives you any such guarantee. The best you get is a guarantee that it returns zeroes if it actually deallocated the block, but if it deallocates a given block or not is a black box. > > > I don't recall all the specifics, but I thought part of the reason for > > using discard over explicit zeroing was the latter made the test > > impractically slow. I could be misremembering, but if you want to change > > it I'd suggest to at least verify runtimes on some of the preexisting > > logwrites tests as well. > > Not sure -- I think BLKZEROOUT will cause allocations and real disk > writes if we're not careful. If the device reports a queue/write_zeroes_max_bytes it supports a hardware offload. That could still write zeroes to the media if the device is stupid enough, but hopefully not many are.
On Thu, Nov 21, 2024 at 09:19:24AM -0800, Darrick J. Wong wrote: > Oh, that's with your BLKDISCARD -> BLKZEROOUT change applied, isn't it? > On my system, 100 loops takes 96 seconds with discard and 639 seconds > with zeroout. The discard path is guaranteed to fail after the first iteration, but that at least happens very quickly..
On Fri, Nov 22, 2024 at 01:31:33PM +0100, Christoph Hellwig wrote: > On Thu, Nov 21, 2024 at 09:11:56AM -0500, Brian Foster wrote: > > > I'm all for speeding up tests. But relying on a unspecified side effect > > > of an operation and then requiring a driver that implements that side > > > effect without documenting that isn't really good practice. > > > > > > > It's a hack to facilitate test coverage. It would obviously need to be > > revisited if behavior changed sufficiently to break the test. > > > > I'm not really sure what you're asking for wrt documentation. A quick > > scan of the git history shows the first such commit is 65cc9a235919 > > ("generic/482: use thin volume as data device"), the commit log for > > which seems to explain the reasoning. > > A comment on _log_writes_init that it must only be used by dm-thin > because it relies on the undocumented behavior that dm-trim zeroes > all blocks discarded. > > Or even better my moving the dm-think setup boilerplate into the log > writes helpers, so that it gets done automatically. > A related idea might be to incorporate your BLKZEROOUT fix so the core tool is fundamentally correct, but then wrap the existing discard behavior in a param or something that the dm-thin oriented tests can pass to enable it as a fast zero hack/optimization. But that all seems reasonable to me either way. I'm not sure that's something I would have fully abstracted into the logwrites stuff initially, but here we are ~5 years later and it seems pretty much every additional logwrites test has wanted the same treatment. If whoever wants to convert this newer test over wants to start by refactoring things that way, that sounds like a welcome cleanup to me. Brian
On Fri, Nov 22, 2024 at 08:49:42AM -0500, Brian Foster wrote: > On Fri, Nov 22, 2024 at 01:31:33PM +0100, Christoph Hellwig wrote: > > On Thu, Nov 21, 2024 at 09:11:56AM -0500, Brian Foster wrote: > > > > I'm all for speeding up tests. But relying on a unspecified side effect > > > > of an operation and then requiring a driver that implements that side > > > > effect without documenting that isn't really good practice. > > > > > > > > > > It's a hack to facilitate test coverage. It would obviously need to be > > > revisited if behavior changed sufficiently to break the test. > > > > > > I'm not really sure what you're asking for wrt documentation. A quick > > > scan of the git history shows the first such commit is 65cc9a235919 > > > ("generic/482: use thin volume as data device"), the commit log for > > > which seems to explain the reasoning. > > > > A comment on _log_writes_init that it must only be used by dm-thin > > because it relies on the undocumented behavior that dm-trim zeroes > > all blocks discarded. > > > > Or even better my moving the dm-think setup boilerplate into the log > > writes helpers, so that it gets done automatically. > > > > A related idea might be to incorporate your BLKZEROOUT fix so the core > tool is fundamentally correct, but then wrap the existing discard > behavior in a param or something that the dm-thin oriented tests can > pass to enable it as a fast zero hack/optimization. > > But that all seems reasonable to me either way. I'm not sure that's > something I would have fully abstracted into the logwrites stuff > initially, but here we are ~5 years later and it seems pretty much every > additional logwrites test has wanted the same treatment. If whoever > wants to convert this newer test over wants to start by refactoring > things that way, that sounds like a welcome cleanup to me. Ugh, I just want to fix this stupid test and move on with the bugfixes, not refactor every logwrites user in the codebase just to reduce one test's runtime from hours to 90s. It's not as simple as making the logwrites init function set up thinp on its own, because there's at least one test out there (generic/470) that takes care of its own discarding, and then there's whatever the strange stuff that the tests/btrfs/ users do -- it looks fairly simple, but I don't really want to go digging into that just to make sure I didn't break their testing. I'll send what I have currently, which adds a warning about running logwrites on a device that supports discard but isn't thinp... in addition to fixing the xfs log recovery thing, and in addition to fixing the loop duration. I guess I can add yet another patch to switch the replay program to use BLKDISCARD if the _init function thinks it's ok, but seriously... you guys need to send start sending patches implementing the new functionality that you suggest. --D > Brian > >
On Fri, Nov 22, 2024 at 08:13:47AM -0800, Darrick J. Wong wrote: > I guess I can add yet another patch to switch the replay program to use > BLKDISCARD if the _init function thinks it's ok, but seriously... you > guys need to send start sending patches implementing the new > functionality that you suggest. I can look into the refactoring and the comments, I wasn't planning to offload the work onto you, just making sure we have robust infrastructure.
On Fri, Nov 22, 2024 at 08:13:47AM -0800, Darrick J. Wong wrote: > On Fri, Nov 22, 2024 at 08:49:42AM -0500, Brian Foster wrote: > > On Fri, Nov 22, 2024 at 01:31:33PM +0100, Christoph Hellwig wrote: > > > On Thu, Nov 21, 2024 at 09:11:56AM -0500, Brian Foster wrote: > > > > > I'm all for speeding up tests. But relying on a unspecified side effect > > > > > of an operation and then requiring a driver that implements that side > > > > > effect without documenting that isn't really good practice. > > > > > > > > > > > > > It's a hack to facilitate test coverage. It would obviously need to be > > > > revisited if behavior changed sufficiently to break the test. > > > > > > > > I'm not really sure what you're asking for wrt documentation. A quick > > > > scan of the git history shows the first such commit is 65cc9a235919 > > > > ("generic/482: use thin volume as data device"), the commit log for > > > > which seems to explain the reasoning. > > > > > > A comment on _log_writes_init that it must only be used by dm-thin > > > because it relies on the undocumented behavior that dm-trim zeroes > > > all blocks discarded. > > > > > > Or even better my moving the dm-think setup boilerplate into the log > > > writes helpers, so that it gets done automatically. > > > > > > > A related idea might be to incorporate your BLKZEROOUT fix so the core > > tool is fundamentally correct, but then wrap the existing discard > > behavior in a param or something that the dm-thin oriented tests can > > pass to enable it as a fast zero hack/optimization. > > > > But that all seems reasonable to me either way. I'm not sure that's > > something I would have fully abstracted into the logwrites stuff > > initially, but here we are ~5 years later and it seems pretty much every > > additional logwrites test has wanted the same treatment. If whoever > > wants to convert this newer test over wants to start by refactoring > > things that way, that sounds like a welcome cleanup to me. > > Ugh, I just want to fix this stupid test and move on with the bugfixes, > not refactor every logwrites user in the codebase just to reduce one > test's runtime from hours to 90s. > > It's not as simple as making the logwrites init function set up thinp on > its own, because there's at least one test out there (generic/470) that > takes care of its own discarding, and then there's whatever the strange > stuff that the tests/btrfs/ users do -- it looks fairly simple, but I > don't really want to go digging into that just to make sure I didn't > break their testing. > > I'll send what I have currently, which adds a warning about running > logwrites on a device that supports discard but isn't thinp... in > addition to fixing the xfs log recovery thing, and in addition to fixing > the loop duration. > > I guess I can add yet another patch to switch the replay program to use > BLKDISCARD if the _init function thinks it's ok, but seriously... you > guys need to send start sending patches implementing the new > functionality that you suggest. > Sorry, I should have been more clear. I certainly don't insist on it as an immediate change or to gatekeep the current patch. I'm just acking the idea, and I think it's perfectly fair to say "more time consuming than I have time for right now" if you planned to just fixup the test itself. I may get to it opportunistically someday, or if hch cares enough about it he's certainly capable of picking it up sooner. For future reference, I'm generally not trying to tell people what to do with their patches or force work on people, etc. I realize we have a tendency to do that. I don't like it either. It would be nice if we had a clearer way to express/discuss an idea without implying it as a demand. Brian > --D > > > Brian > > > > >
On Fri, Nov 22, 2024 at 11:33:24AM -0500, Brian Foster wrote: > On Fri, Nov 22, 2024 at 08:13:47AM -0800, Darrick J. Wong wrote: > > On Fri, Nov 22, 2024 at 08:49:42AM -0500, Brian Foster wrote: > > > On Fri, Nov 22, 2024 at 01:31:33PM +0100, Christoph Hellwig wrote: > > > > On Thu, Nov 21, 2024 at 09:11:56AM -0500, Brian Foster wrote: > > > > > > I'm all for speeding up tests. But relying on a unspecified side effect > > > > > > of an operation and then requiring a driver that implements that side > > > > > > effect without documenting that isn't really good practice. > > > > > > > > > > > > > > > > It's a hack to facilitate test coverage. It would obviously need to be > > > > > revisited if behavior changed sufficiently to break the test. > > > > > > > > > > I'm not really sure what you're asking for wrt documentation. A quick > > > > > scan of the git history shows the first such commit is 65cc9a235919 > > > > > ("generic/482: use thin volume as data device"), the commit log for > > > > > which seems to explain the reasoning. > > > > > > > > A comment on _log_writes_init that it must only be used by dm-thin > > > > because it relies on the undocumented behavior that dm-trim zeroes > > > > all blocks discarded. > > > > > > > > Or even better my moving the dm-think setup boilerplate into the log > > > > writes helpers, so that it gets done automatically. > > > > > > > > > > A related idea might be to incorporate your BLKZEROOUT fix so the core > > > tool is fundamentally correct, but then wrap the existing discard > > > behavior in a param or something that the dm-thin oriented tests can > > > pass to enable it as a fast zero hack/optimization. > > > > > > But that all seems reasonable to me either way. I'm not sure that's > > > something I would have fully abstracted into the logwrites stuff > > > initially, but here we are ~5 years later and it seems pretty much every > > > additional logwrites test has wanted the same treatment. If whoever > > > wants to convert this newer test over wants to start by refactoring > > > things that way, that sounds like a welcome cleanup to me. > > > > Ugh, I just want to fix this stupid test and move on with the bugfixes, > > not refactor every logwrites user in the codebase just to reduce one > > test's runtime from hours to 90s. > > > > It's not as simple as making the logwrites init function set up thinp on > > its own, because there's at least one test out there (generic/470) that > > takes care of its own discarding, and then there's whatever the strange > > stuff that the tests/btrfs/ users do -- it looks fairly simple, but I > > don't really want to go digging into that just to make sure I didn't > > break their testing. > > > > I'll send what I have currently, which adds a warning about running > > logwrites on a device that supports discard but isn't thinp... in > > addition to fixing the xfs log recovery thing, and in addition to fixing > > the loop duration. > > > > I guess I can add yet another patch to switch the replay program to use > > BLKDISCARD if the _init function thinks it's ok, but seriously... you > > guys need to send start sending patches implementing the new > > functionality that you suggest. > > > > Sorry, I should have been more clear. I certainly don't insist on it as > an immediate change or to gatekeep the current patch. I'm just acking > the idea, and I think it's perfectly fair to say "more time consuming > than I have time for right now" if you planned to just fixup the test > itself. I may get to it opportunistically someday, or if hch cares > enough about it he's certainly capable of picking it up sooner. > > For future reference, I'm generally not trying to tell people what to do > with their patches or force work on people, etc. I realize we have a > tendency to do that. I don't like it either. It would be nice if we had > a clearer way to express/discuss an idea without implying it as a > demand. /me suggests "Here's something that we ought to do, though as a separate patchset: clean up all the $fubar to be $less_fubar. In the meantime this patch is good enough for now." --D > Brian > > > --D > > > > > Brian > > > > > > > > >
diff --git a/tests/generic/757 b/tests/generic/757 index 0ff5a8ac00182b..9d41975bde07bb 100755 --- a/tests/generic/757 +++ b/tests/generic/757 @@ -63,9 +63,14 @@ prev=$(_log_writes_mark_to_entry_number mkfs) cur=$(_log_writes_find_next_fua $prev) [ -z "$cur" ] && _fail "failed to locate next FUA write" -while [ ! -z "$cur" ]; do +for ((i = 0; i < 100; i++)); do _log_writes_replay_log_range $cur $SCRATCH_DEV >> $seqres.full + # xfs_repair won't run if the log is dirty + if [ $FSTYP = "xfs" ]; then + _scratch_mount + _scratch_unmount + fi _check_scratch_fs prev=$cur