
[13/23] generic/650: revert SOAK_DURATION changes

Message ID 173706974273.1927324.11899201065662863518.stgit@frogsfrogsfrogs (mailing list archive)
State New
Series [01/23] generic/476: fix fsstress process management

Commit Message

Darrick J. Wong Jan. 16, 2025, 11:28 p.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

Prior to commit 8973af00ec21, in the absence of an explicit
SOAK_DURATION, this test would run 2500 fsstress operations each of ten
times through the loop body.  On the author's machines, this kept the
runtime to about 30s total.  Oddly, this was changed to 30s per loop
body with no specific justification in the middle of an fsstress process
management change.

On the author's machine, this explodes the runtime from ~30s to 420s.
Put things back the way they were.
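
(For reference, the arithmetic behind those numbers, using the defaults
visible in the patch below:)

# before 8973af00ec21: 10 loop iterations x 2500 ops per iteration,
#                      ~30s total on the author's machine
# after:  SOAK_DURATION defaults to 300, so each iteration runs with
#         --duration=$((300 / 10)) = 30s, i.e. at least 300s of soak
#         time; ~420s observed once per-iteration setup is included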

Cc: <fstests@vger.kernel.org> # v2024.12.08
Fixes: 8973af00ec212f ("fstests: cleanup fsstress process management")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
 tests/generic/650 |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

Comments

Dave Chinner Jan. 21, 2025, 4:57 a.m. UTC | #1
On Thu, Jan 16, 2025 at 03:28:33PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Prior to commit 8973af00ec21, in the absence of an explicit
> SOAK_DURATION, this test would run 2500 fsstress operations each of ten
> times through the loop body.  On the author's machines, this kept the
> runtime to about 30s total.  Oddly, this was changed to 30s per loop
> body with no specific justification in the middle of an fsstress process
> management change.

I'm pretty sure that was because when you run g/650 on a machine
with 64p, the number of ops performed on the filesystem is
nr_cpus * 2500 * nr_loops.

In that case, each loop was taking over 90s to run, so the overall
runtime was up in the 15-20 minute mark. I wanted to cap the runtime
of each loop to min(nr_ops, SOAK_DURATION) so that it ran in about 5
minutes in the worst case i.e. (nr_loops * SOAK_DURATION).
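
(To spell out the arithmetic: 64 processes x 2500 ops each is 160,000
ops per loop iteration, or 1.6 million ops over the ten iterations -
hence the ~90s per loop and the 15-20 minute total runtime.)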

I probably misunderstood how -n nr_ops vs --duration=30 interact;
I expected it to run until either were exhausted, not for duration
to override nr_ops as implied by this:

> On the author's machine, this explodes the runtime from ~30s to 420s.
> Put things back the way they were.

Yeah, OK, that's exactly what keep_running() does - duration
overrides nr_ops.

Ok, so keeping or reverting the change will simply make different
people unhappy because of the excessive runtime the test has at
either end of the CPU count spectrum - what's the best way to go
about providing the desired min(nr_ops, max loop time) behaviour?
Do we simply cap the maximum process count to keep the number of ops
down to something reasonable (e.g. 16), or something else?
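
For the capping option, it could be as small as something like this
(a sketch against the generic/650 code shown in the patch below; only
the cap line is new, and the cap value is obviously debatable):

# bound the fsstress parallelism so the total op count doesn't
# explode on very large machines
test "$nr_cpus" -gt 16 && nr_cpus=16
fsstress_args+=(-p $nr_cpus)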

-Dave.
Theodore Ts'o Jan. 21, 2025, 1 p.m. UTC | #2
On Tue, Jan 21, 2025 at 03:57:23PM +1100, Dave Chinner wrote:
> I probably misunderstood how -n nr_ops vs --duration=30 interact;
> I expected it to run until either were exhausted, not for duration
> to override nr_ops as implied by this:

There are (at least) two ways that a soak duration is being used
today; one is where someone wants to run a very long soak for hours
and where if you go long by an hour or two it's no big deal.  The
other is where you are specifying a soak duration as part of a smoke
test (using the smoketest group), where you might be hoping to keep
the overall run time to 15-20 minutes and so you set SOAK_DURATION to
3m.

(This was based on some research that Darrick did which showed that
running the original 5 tests in the smoketest group gave you most of
the code coverage of running all of the quick group, which had
ballooned from 15 minutes many years ago to an hour or more.  I just
noticed that we've since added two more tests to the smoketest group;
it might be worth checking whether those two new tests added to the
smoketest group significantly improve code coverage or not.  It
would be unfortunate if the runtime bloat that happened to the quick
group also happens to the smoketest group...)

The bottom line is in addition to trying to design semantics for users
who might be at either end of the CPU count spectrum, we should also
consider that SOAK_DURATION could be set for values ranging from
minutes to hours.

Thanks,

						- Ted
Dave Chinner Jan. 21, 2025, 10:15 p.m. UTC | #3
On Tue, Jan 21, 2025 at 08:00:27AM -0500, Theodore Ts'o wrote:
> On Tue, Jan 21, 2025 at 03:57:23PM +1100, Dave Chinner wrote:
> > I probably misunderstood how -n nr_ops vs --duration=30 interact;
> > I expected it to run until either were exhausted, not for duration
> > to override nr_ops as implied by this:
> 
> There are (at least) two ways that a soak duration is being used
> today; one is where someone wants to run a very long soak for hours
> and where if you go long by an hour or two it's no big deal.  The
> other is where you are specifying a soak duration as part of a smoke
> test (using the smoketest group), where you might be hoping to keep
> the overall run time to 15-20 minutes and so you set SOAK_DURATION to
> 3m.

check-parallel on my 64p machine runs the full auto group test in
under 10 minutes.

i.e. if you have a typical modern server (64-128p, 256GB RAM and a
couple of NVMe SSDs), then check-parallel allows a full test run in
the same time that './check -g smoketest' will run....

> (This was based on some research that Darrick did which showed that
> running the original 5 tests in the smoketest group gave you most of
> the code coverage of running all of the quick group, which had
> ballooned from 15 minutes many years ago to an hour or more.  I just
> noticed that we've since added two more tests to the smoketest group;
> it might be worth checking whether those two new tests added to the
> smoketest group significantly improve code coverage or not.  It
> would be unfortunate if the runtime bloat that happened to the quick
> group also happens to the smoketest group...)

Yes, and I've previously made the point about how check-parallel
changes the way we should be looking at dev-test cycles. We no
longer have to care that auto group testing takes 4 hours to run and
have to work around that with things like smoketest groups. If you
can run the whole auto test group in 10-15 minutes, then we don't
need "quick", "smoketest", etc to reduce dev-test cycle time
anymore...

> The bottom line is in addition to trying to design semantics for users
> who might be at either end of the CPU count spectrum, we should also
> consider that SOAK_DURATION could be set for values ranging from
> minutes to hours.

I don't see much point in testing for hours with check-parallel. The
whole point of it is to enable iteration across the entire fs test
matrix as fast as possible.

If you want to do long running soak tests, then keep using check for
that. If you want to run the auto group test across 100 different
mkfs option combinations, then that is where check-parallel comes in
- it'll take a few hours to do this instead of a week.

-Dave.
Darrick J. Wong Jan. 22, 2025, 3:49 a.m. UTC | #4
On Tue, Jan 21, 2025 at 03:57:23PM +1100, Dave Chinner wrote:
> On Thu, Jan 16, 2025 at 03:28:33PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Prior to commit 8973af00ec21, in the absence of an explicit
> > SOAK_DURATION, this test would run 2500 fsstress operations each of ten
> > times through the loop body.  On the author's machines, this kept the
> > runtime to about 30s total.  Oddly, this was changed to 30s per loop
> > body with no specific justification in the middle of an fsstress process
> > management change.
> 
> I'm pretty sure that was because when you run g/650 on a machine
> with 64p, the number of ops performed on the filesystem is
> nr_cpus * 2500 * nr_loops.

Where does that happen?

Oh, heh.  -n is the number of ops *per process*.

> In that case, each loop was taking over 90s to run, so the overall
> runtime was up in the 15-20 minute mark. I wanted to cap the runtime
> of each loop to min(nr_ops, SOAK_DURATION) so that it ran in about 5
> minutes in the worst case i.e. (nr_loops * SOAK_DURATION).
> 
> I probably misunderstood how -n nr_ops vs --duration=30 interact;
> I expected it to run until either were exhausted, not for duration
> to override nr_ops as implied by this:

Yeah, SOAK_DURATION overrides pretty much everything.

> > On the author's machine, this explodes the runtime from ~30s to 420s.
> > Put things back the way they were.
> 
> Yeah, OK, that's exactly what keep_running() does - duration
> overrides nr_ops.
> 
> Ok, so keeping or reverting the change will simply make different
> people unhappy because of the excessive runtime the test has at
> either end of the CPU count spectrum - what's the best way to go
> about providing the desired min(nr_ops, max loop time) behaviour?
> Do we simply cap the maximum process count to keep the number of ops
> down to something reasonable (e.g. 16), or something else?

How about running fsstress with --duration=3 if SOAK_DURATION isn't set?
That should keep the runtime to 30 seconds or so even on larger
machines:

if [ -n "$SOAK_DURATION" ]; then
	test "$SOAK_DURATION" -lt 10 && SOAK_DURATION=10
	fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
else
	# run for 3s per iteration max for a default runtime of ~30s.
	fsstress_args+=(--duration=3)
fi

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Darrick J. Wong Jan. 22, 2025, 3:51 a.m. UTC | #5
On Wed, Jan 22, 2025 at 09:15:48AM +1100, Dave Chinner wrote:
> On Tue, Jan 21, 2025 at 08:00:27AM -0500, Theodore Ts'o wrote:
> > On Tue, Jan 21, 2025 at 03:57:23PM +1100, Dave Chinner wrote:
> > > I probably misunderstood how -n nr_ops vs --duration=30 interact;
> > > I expected it to run until either were exhausted, not for duration
> > > to override nr_ops as implied by this:
> > 
> > There are (at least) two ways that a soak duration is being used
> > today; one is where someone wants to run a very long soak for hours
> > and where if you go long by an hour or two it's no big deal.  The
> > other is where you are specifying a soak duration as part of a smoke
> > test (using the smoketest group), where you might be hoping to keep
> > the overall run time to 15-20 minutes and so you set SOAK_DURATION to
> > 3m.
> 
> check-parallel on my 64p machine runs the full auto group test in
> under 10 minutes.
> 
> i.e. if you have a typical modern server (64-128p, 256GB RAM and a
> couple of NVMe SSDs), then check-parallel allows a full test run in
> the same time that './check -g smoketest' will run....
> 
> > (This was based on some research that Darrick did which showed that
> > running the original 5 tests in the smoketest group gave you most of
> > the code coverage of running all of the quick group, which had
> > ballooned from 15 minutes many years ago to an hour or more.  I just
> > noticed that we've since added two more tests to the smoketest group;
> > it might be worth checking whether those two new tests added to the
> > smoketest group significantly improve code coverage or not.  It
> > would be unfortunate if the runtime bloat that happened to the quick
> > group also happens to the smoketest group...)
> 
> Yes, and I've previously made the point about how check-parallel
> changes the way we should be looking at dev-test cycles. We no
> longer have to care that auto group testing takes 4 hours to run and
> have to work around that with things like smoketest groups. If you
> can run the whole auto test group in 10-15 minutes, then we don't
> need "quick", "smoketest", etc to reduce dev-test cycle time
> anymore...
> 
> > The bottom line is in addition to trying to design semantics for users
> > who might be at either end of the CPU count spectrum, we should also
> > consider that SOAK_DURATION could be set for values ranging from
> > minutes to hours.
> 
> I don't see much point in testing for hours with check-parallel. The
> whole point of it is to enable iteration across the entire fs test
> matrix as fast as possible.

I do -- running all the soak tests in parallel on a (probably old lower
spec) machine.  Parallelism is all right for a lot of things.

--D

> If you want to do long running soak tests, then keep using check for
> that. If you want to run the auto group test across 100 different
> mkfs option combinations, then that is where check-parallel comes in
> - it'll take a few hours to do this instead of a week.
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Theodore Ts'o Jan. 22, 2025, 4:08 a.m. UTC | #6
On Wed, Jan 22, 2025 at 09:15:48AM +1100, Dave Chinner wrote:
> check-parallel on my 64p machine runs the full auto group test in
> under 10 minutes.
> 
> i.e. if you have a typical modern server (64-128p, 256GB RAM and a
> couple of NVMe SSDs), then check-parallel allows a full test run in
> the same time that './check -g smoketest' will run....

Interesting.  I would have thought that even with NVMe SSDs, you'd be
I/O speed constrained, especially given that some of the tests
(especially the ENOSPC hitters) can take quite a lot of time to fill
the storage device, even if they are using fallocate.

How do you have your test and scratch devices configured?

> Yes, and I've previously made the point about how check-parallel
> changes the way we should be looking at dev-test cycles. We no
> longer have to care that auto group testing takes 4 hours to run and
> have to work around that with things like smoketest groups. If you
> can run the whole auto test group in 10-15 minutes, then we don't
> need "quick", "smoketest", etc to reduce dev-test cycle time
> anymore...

Well, yes, if the only consideration is test run time latency.

I can think of two off-setting considerations.  The first is if you
care about cost.  The cheapest you can get a 64 CPU, 24 GiB VM on
Google Cloud is $3.04 USD/hour (n1-standard-64 in an Iowa data center),
so ten minutes of run time is about 51 cents USD (ignoring the storage
costs).  Not bad.  But running xfs/4k through the auto group on an
e2-standard-2 VM takes 3.2 hours; the e2-standard-2 VM is much
cheaper, coming in at $0.087651 USD/hour.  That translates to 28
cents for the VM, and it doesn't take into account the fact that you
almost certainly need much more expensive, high-performance storage to
support the 64 CPU VM.  So if you don't care about time to run
completion (for example, if I'm monitoring the 5.15, 6.1, 6.6, and
6.12 LTS rc git trees, and kicking off a build whenever Greg or
Sasha updates them), using a serialized xfstests is going to be
cheaper because you can use less expensive cloud resources.

The second concern is that for a certain class of failures (UBSAN,
KCSAN, Lockdep, RCU soft lockups, WARN_ON, BUG_ON, and other
panics/OOPSes), if you are running 64 tests in parallel it might not be
obvious which test caused the failure.  Today, even if the test VM
crashes or hangs, I can have my test manager (which runs on an e2-small
VM costing $0.021913 USD/hour and can manage dozens of test VMs at the
same time) restart the test VM, and we know which test is at fault,
and we mark that particular test with the JUnit XML status of
"error" (as distinct from "success" or "failure").  If there are 64
tests running in parallel and I want automated recovery when the
test appliance hangs or crashes, life gets a lot more complicated.....
I suppose we could have the human (or test automation) try to run each
individual test that had been running at the time of the crash, but
that's a lot more complicated, and what if the tests pass when run
one at a time?  I guess we should be happy that check-parallel found a
bug that plain check didn't find, but the human being still has to
root cause the failure.

Cheers,

						- Ted
Dave Chinner Jan. 22, 2025, 4:12 a.m. UTC | #7
On Tue, Jan 21, 2025 at 07:49:44PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 21, 2025 at 03:57:23PM +1100, Dave Chinner wrote:
> > On Thu, Jan 16, 2025 at 03:28:33PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > Prior to commit 8973af00ec21, in the absence of an explicit
> > > SOAK_DURATION, this test would run 2500 fsstress operations each of ten
> > > times through the loop body.  On the author's machines, this kept the
> > > runtime to about 30s total.  Oddly, this was changed to 30s per loop
> > > body with no specific justification in the middle of an fsstress process
> > > management change.
> > 
> > I'm pretty sure that was because when you run g/650 on a machine
> > with 64p, the number of ops performed on the filesystem is
> > nr_cpus * 2500 * nr_loops.
> 
> Where does that happen?
> 
> Oh, heh.  -n is the number of ops *per process*.

Yeah, I just noticed another case of this:

Ten slowest tests - runtime in seconds:
generic/750 559
generic/311 486
.....

generic/750 does:

nr_cpus=$((LOAD_FACTOR * 4))
nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)

So the total op count actually grows quadratically with load factor:

Load factor	nr_cpus		nr_ops		total ops
1		4		100k		400k
2		8		200k		1.6M
3		12		300k		3.6M
4		16		400k		6.4M

and so on.

I suspect that there are other similar cpu scaling issues
lurking across the many fsstress tests...
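
If we wanted the total to grow linearly with LOAD_FACTOR instead of
quadratically, one option (just a sketch) is to stop multiplying the
per-process op count by the process count:

nr_cpus=$((LOAD_FACTOR * 4))
# per-process ops no longer scale with nr_cpus, so the total is
# 25000 * TIME_FACTOR * nr_cpus ops, i.e. linear in LOAD_FACTOR
nr_ops=$((25000 * TIME_FACTOR))
fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)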

> > > On the author's machine, this explodes the runtime from ~30s to 420s.
> > > Put things back the way they were.
> > 
> > Yeah, OK, that's exactly what keep_running() does - duration
> > overrides nr_ops.
> > 
> > Ok, so keeping or reverting the change will simply make different
> > people unhappy because of the excessive runtime the test has at
> > either end of the CPU count spectrum - what's the best way to go
> > about providing the desired min(nr_ops, max loop time) behaviour?
> > Do we simply cap the maximum process count to keep the number of ops
> > down to something reasonable (e.g. 16), or something else?
> 
> How about running fsstress with --duration=3 if SOAK_DURATION isn't set?
> That should keep the runtime to 30 seconds or so even on larger
> machines:
> 
> if [ -n "$SOAK_DURATION" ]; then
> 	test "$SOAK_DURATION" -lt 10 && SOAK_DURATION=10
> 	fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
> else
> 	# run for 3s per iteration max for a default runtime of ~30s.
> 	fsstress_args+=(--duration=3)
> fi

Yeah, that works for me.

As a rainy day project, perhaps we should look to convert all the
fsstress invocations to be time bound rather than running a specific
number of ops. i.e. hard code nr_ops=<some huge number> in
_run_fsstress_bg() and the tests only need to define parallelism and
runtime.

This would make the test runtimes more deterministic across machines
with vastly different capabilities and largely make "test xyz is
slow on my test machine" reports go away.

Thoughts?

-Dave.
Darrick J. Wong Jan. 22, 2025, 4:37 a.m. UTC | #8
On Wed, Jan 22, 2025 at 03:12:11PM +1100, Dave Chinner wrote:
> On Tue, Jan 21, 2025 at 07:49:44PM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 21, 2025 at 03:57:23PM +1100, Dave Chinner wrote:
> > > On Thu, Jan 16, 2025 at 03:28:33PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > Prior to commit 8973af00ec21, in the absence of an explicit
> > > > SOAK_DURATION, this test would run 2500 fsstress operations each of ten
> > > > times through the loop body.  On the author's machines, this kept the
> > > > runtime to about 30s total.  Oddly, this was changed to 30s per loop
> > > > body with no specific justification in the middle of an fsstress process
> > > > management change.
> > > 
> > > I'm pretty sure that was because when you run g/650 on a machine
> > > with 64p, the number of ops performed on the filesystem is
> > > nr_cpus * 2500 * nr_loops.
> > 
> > Where does that happen?
> > 
> > Oh, heh.  -n is the number of ops *per process*.
> 
> Yeah, I just noticed another case of this:
> 
> Ten slowest tests - runtime in seconds:
> generic/750 559
> generic/311 486
> .....
> 
> generic/750 does:
> 
> nr_cpus=$((LOAD_FACTOR * 4))
> nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> 
> So the total op count actually grows quadratically with load factor:
> 
> Load factor	nr_cpus		nr_ops		total ops
> 1		4		100k		400k
> 2		8		200k		1.6M
> 3		12		300k		3.6M
> 4		16		400k		6.4M
> 
> and so on.
> 
> I suspect that there are other similar cpu scaling issues
> lurking across the many fsstress tests...
> 
> > > > On the author's machine, this explodes the runtime from ~30s to 420s.
> > > > Put things back the way they were.
> > > 
> > > Yeah, OK, that's exactly what keep_running() does - duration
> > > overrides nr_ops.
> > > 
> > > Ok, so keeping or reverting the change will simply make different
> > > people unhappy because of the excessive runtime the test has at
> > > either end of the CPU count spectrum - what's the best way to go
> > > about providing the desired min(nr_ops, max loop time) behaviour?
> > > Do we simply cap the maximum process count to keep the number of ops
> > > down to something reasonable (e.g. 16), or something else?
> > 
> > How about running fsstress with --duration=3 if SOAK_DURATION isn't set?
> > That should keep the runtime to 30 seconds or so even on larger
> > machines:
> > 
> > if [ -n "$SOAK_DURATION" ]; then
> > 	test "$SOAK_DURATION" -lt 10 && SOAK_DURATION=10
> > 	fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
> > else
> > 	# run for 3s per iteration max for a default runtime of ~30s.
> > 	fsstress_args+=(--duration=3)
> > fi
> 
> Yeah, that works for me.
> 
> As a rainy day project, perhaps we should look to convert all the
> fsstress invocations to be time bound rather than running a specific
> number of ops. i.e. hard code nr_ops=<some huge number> in
> _run_fstress_bg() and the tests only need to define parallelism and
> runtime.

I /think/ the only ones that do that are generic/1220 generic/476
generic/642 generic/750.  I could drop the nr_cpus term from the nr_ops
calculation.

> This would make the test runtimes more deterministic across machines
> with vastly different capabilities and largely make "test xyz is
> slow on my test machine" reports go away.
> 
> Thoughts?

I'm fine with _run_fsstress injecting --duration=30 if no other duration
argument is passed in.
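
Something along these lines in the helper, perhaps (just a sketch; the
real structure of _run_fsstress in common/rc may differ):

_run_fsstress()
{
	local args=("$@")

	# bound the run to ~30s unless the caller already chose a cap
	case " ${args[*]} " in
	*" --duration"*) ;;
	*) args+=(--duration=30) ;;
	esac

	$FSSTRESS_PROG "${args[@]}"
}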

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>
Dave Chinner Jan. 22, 2025, 6:01 a.m. UTC | #9
On Tue, Jan 21, 2025 at 11:08:39PM -0500, Theodore Ts'o wrote:
> On Wed, Jan 22, 2025 at 09:15:48AM +1100, Dave Chinner wrote:
> > check-parallel on my 64p machine runs the full auto group test in
> > under 10 minutes.
> > 
> > i.e. if you have a typical modern server (64-128p, 256GB RAM and a
> > couple of NVMe SSDs), then check-parallel allows a full test run in
> > the same time that './check -g smoketest' will run....
> 
> Interesting.  I would have thought that even with NVMe SSDs, you'd be
> I/O speed constrained, especially given that some of the tests
> (especially the ENOSPC hitters) can take quite a lot of time to fill
> the storage device, even if they are using fallocate.

You haven't looked at how check-parallel works, have you? :/

> How do you have your test and scratch devices configured?

Please go and read the check-parallel script. It does all the
per-runner process test and scratch device configuration itself
using loop devices.

> > Yes, and I've previously made the point about how check-parallel
> > changes the way we should be looking at dev-test cycles. We no
> > longer have to care that auto group testing takes 4 hours to run and
> > have to work around that with things like smoketest groups. If you
> > can run the whole auto test group in 10-15 minutes, then we don't
> > need "quick", "smoketest", etc to reduce dev-test cycle time
> > anymore...
> 
> Well, yes, if the only consideration is test run time latency.

Sure.

> I can think of two off-setting considerations.  The first is if you
> care about cost.

Which I really don't care about.

That's something for a QE organisation to worry about, and it's up
to them to make the best use of the tools they have within the
budget they have.

> The second concern is that for a certain class of failures (UBSAN,
> KCSAN, Lockdep, RCU soft lockups, WARN_ON, BUG_ON, and other
> panics/OOPSes), if you are running 64 tests in parallel it might not be
> obvious which test caused the failure.

Then multiple tests will fail with the same dmesg error, but it's
generally pretty clear which of the tests caused it. Yes, it's a bit
more work to isolate the specific test, but it's not hard or any
different to how a test failure is debugged now.

If you want to automate such failures, then my process is to grep
the log files for all the tests that failed with a dmesg error then
run them again using check instead of check-parallel.  Then I get
exactly which test generated the dmesg output without having to put
time into trying to work out which test triggered the failure.
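
Roughly this, for example (just a sketch; the paths assume the default
results/ layout without sections and the grep patterns will vary):

# collect every test whose captured dmesg shows a splat, then re-run
# just those serially under plain check
bad=$(grep -lE 'WARNING:|BUG:|UBSAN|INFO: task' results/*/*.dmesg |
	sed -e 's|^results/||' -e 's|\.dmesg$||')
./check $bad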

> Today, even if the test VM
> crashes or hangs, I can have my test manager (which runs on an e2-small
> VM costing $0.021913 USD/hour and can manage dozens of test VMs at the
> same time) restart the test VM, and we know which test is at fault,
> and we mark that particular test with the JUnit XML status of
> "error" (as distinct from "success" or "failure").  If there are 64
> tests running in parallel and I want automated recovery when the
> test appliance hangs or crashes, life gets a lot more complicated.....

Not really. Both dmesg and the results files will have tracked all
the tests inflight when the system crashes, so it's just an extra
step to extract all those tests and run them again using check
and/or check-parallel to further isolate which test caused the
failure....

I'm sure this could be automated eventually, but that's way down my
priority list right now.

> I suppose we could have the human (or test automation) try to run each
> individual test that had been running at the time of the crash, but
> that's a lot more complicated, and what if the tests pass when run
> one at a time?  I guess we should be happy that check-parallel found a
> bug that plain check didn't find, but the human being still has to
> root cause the failure.

Yes. This is no different to a test that is flaky or completely
fails when run serially by check multiple times. You still need a
human to find the root cause of the failure.

Nobody is being forced to change their tooling or processes to use
check-parallel if they don't want or need to. It is an alternative
method for running the tests within the fstests suite - if using
check meets your needs, there is no reason to use check-parallel or
even care that it exists...

-Dave.
Darrick J. Wong Jan. 22, 2025, 7:02 a.m. UTC | #10
On Wed, Jan 22, 2025 at 05:01:47PM +1100, Dave Chinner wrote:
> On Tue, Jan 21, 2025 at 11:08:39PM -0500, Theodore Ts'o wrote:
> > On Wed, Jan 22, 2025 at 09:15:48AM +1100, Dave Chinner wrote:
> > > check-parallel on my 64p machine runs the full auto group test in
> > > under 10 minutes.
> > > 
> > > i.e. if you have a typical modern server (64-128p, 256GB RAM and a
> > > couple of NVMe SSDs), then check-parallel allows a full test run in
> > > the same time that './check -g smoketest' will run....
> > 
> > Interesting.  I would have thought that even with NVMe SSDs, you'd be
> > I/O speed constrained, especially given that some of the tests
> > (especially the ENOSPC hitters) can take quite a lot of time to fill
> > the storage device, even if they are using fallocate.
> 
> You haven't looked at how check-parallel works, have you? :/
> 
> > How do you have your test and scratch devices configured?
> 
> Please go and read the check-parallel script. It does all the
> per-runner process test and scratch device configuration itself
> using loop devices.
> 
> > > Yes, and I've previously made the point about how check-parallel
> > > changes the way we should be looking at dev-test cycles. We no
> > > longer have to care that auto group testing takes 4 hours to run and
> > > have to work around that with things like smoketest groups. If you
> > > can run the whole auto test group in 10-15 minutes, then we don't
> > > need "quick", "smoketest", etc to reduce dev-test cycle time
> > > anymore...
> > 
> > Well, yes, if the only consideration is test run time latency.
> 
> Sure.
> 
> > I can think of two off-setting considerations.  The first is if you
> > care about cost.
> 
> Which I really don't care about.
> 
> That's something for a QE organisation to worry about, and it's up
> to them to make the best use of the tools they have within the
> budget they have.
> 
> > The second concern is that for a certain class of failures (UBSAN,
> > KCSAN, Lockdep, RCU soft lockups, WARN_ON, BUG_ON, and other
> > panics/OOPSes), if you are running 64 tests in parallel it might not be
> > obvious which test caused the failure.
> 
> Then multiple tests will fail with the same dmesg error, but it's
> generally pretty clear which of the tests caused it. Yes, it's a bit
> more work to isolate the specific test, but it's not hard or any
> different to how a test failure is debugged now.
> 
> If you want to automate such failures, then my process is to grep
> the log files for all the tests that failed with a dmesg error then
> run them again using check instead of check-parallel.  Then I get
> exactly which test generated the dmesg output without having to put
> time into trying to work out which test triggered the failure.
> 
> > Today, even if the test VM
> > crashes or hangs, I can have my test manager (which runs on an e2-small
> > VM costing $0.021913 USD/hour and can manage dozens of test VMs at the
> > same time) restart the test VM, and we know which test is at fault,
> > and we mark that particular test with the JUnit XML status of
> > "error" (as distinct from "success" or "failure").  If there are 64
> > tests running in parallel and I want automated recovery when the
> > test appliance hangs or crashes, life gets a lot more complicated.....
> 
> Not really. Both dmesg and the results files will have tracked all
> the tests inflight when the system crashes, so it's just an extra
> step to extract all those tests and run them again using check
> and/or check-parallel to further isolate which test caused the
> failure....

That reminds me to go see if ./check actually fsyncs the state and
report files and whatnot between tests, so that we have a better chance
of figuring out where exactly fstests blew up the machine.

(Luckily xfs is stable enough I haven't had a machine explode in quite
some time, good job everyone! :))

--D

> I'm sure this could be automated eventually, but that's way down my
> priority list right now.
> 
> > I suppose we could have the human (or test automation) try to run each
> > individual test that had been running at the time of the crash, but
> > that's a lot more complicated, and what if the tests pass when run
> > one at a time?  I guess we should be happy that check-parallel found a
> > bug that plain check didn't find, but the human being still has to
> > root cause the failure.
> 
> Yes. This is no different to a test that is flaky or completely
> fails when run serially by check multiple times. You still need a
> human to find the root cause of the failure.
> 
> Nobody is being forced to change their tooling or processes to use
> check-parallel if they don't want or need to. It is an alternative
> method for running the tests within the fstests suite - if using
> check meets your needs, there is no reason to use check-parallel or
> even care that it exists...
> 
> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
>

Patch

diff --git a/tests/generic/650 b/tests/generic/650
index 60f86fdf518961..d376488f2fedeb 100755
--- a/tests/generic/650
+++ b/tests/generic/650
@@ -68,11 +68,8 @@  test "$nr_cpus" -gt 1024 && nr_cpus="$nr_hotplug_cpus"
 fsstress_args+=(-p $nr_cpus)
 if [ -n "$SOAK_DURATION" ]; then
 	test "$SOAK_DURATION" -lt 10 && SOAK_DURATION=10
-else
-	# run for 30s per iteration max
-	SOAK_DURATION=300
+	fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
 fi
-fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
 
 nr_ops=$((2500 * TIME_FACTOR))
 fsstress_args+=(-n $nr_ops)