| Message ID | 168123682679.4086541.13812285218510940665.stgit@frogsfrogsfrogs (mailing list archive) |
| --- | --- |
| Series | fstests: direct specification of looping test duration |
On Tue, Apr 11, 2023 at 11:13:46AM -0700, Darrick J. Wong wrote:
> Hi all,
>
> One of the things that I do as a maintainer is to designate a handful
> of VMs to run fstests for unusually long periods of time. This
> practice I call long term soak testing. There are actually three
> separate fleets for this -- one runs alongside the nightly builds, one
> runs alongside weekly rebases, and the last one runs stable releases.
>
> My interactions with all three fleets are pretty much the same -- load
> current builds of software, and try to run the exerciser tests for a
> duration of time -- 12 hours, 6.5 days, 30 days, etc. TIME_FACTOR does
> not work well for this usage model, because it is difficult to guess
> the correct time factor given that the VMs are heterogeneous and the
> IO completion rate is not perfectly predictable.
>
> Worse yet, if you want to run (say) all the recoveryloop tests on one
> VM (because recoveryloop is prone to crashing), it's impossible to set
> a TIME_FACTOR so that each loop test gets equal runtime. That can be
> hacked around with config sections, but that doesn't solve the first
> problem.
>
> This series introduces a new configuration variable, SOAK_DURATION,
> that allows test runners to directly control various long soak and
> looping recovery tests. This is intended to be an alternative to
> TIME_FACTOR, since that variable usually adjusts operation counts,
> which are proportional to runtime but otherwise not a direct measure
> of time.
>
> With this override in place, I can configure the long soak fleet to
> run for exactly as long as I want them to, and they actually hit the
> time budget targets. The recoveryloop fleet now divides looping-test
> time equally among the four tests in that group so that they all get
> ~3 hours of coverage every night.
>
> There are more tests that could use this than I actually modified
> here, but I've done enough to show this off as a proof of concept.
>
> If you're going to start using this mess, you probably ought to just
> pull from my git trees, which are linked below.
>
> This is an extraordinary way to destroy everything. Enjoy!
> Comments and questions are, as always, welcome.
>
> --D
>
> fstests git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=soak-duration
> ---
>  check                 |   14 +++++++++
>  common/config         |    7 ++++
>  common/fuzzy          |    7 ++++
>  common/rc             |   34 +++++++++++++++++++++
>  common/report         |    1 +
>  ltp/fsstress.c        |   78 +++++++++++++++++++++++++++++++++++++++++++++++--
>  ltp/fsx.c             |   50 +++++++++++++++++++++++++++++++
>  src/soak_duration.awk |   23 ++++++++++++++
>  tests/generic/019     |    1 +
>  tests/generic/388     |    2 +
>  tests/generic/475     |    2 +
>  tests/generic/476     |    7 +++-
>  tests/generic/482     |    5 +++
>  tests/generic/521     |    1 +
>  tests/generic/522     |    1 +
>  tests/generic/642     |    1 +
>  tests/generic/648     |    8 +++--
>  17 files changed, 229 insertions(+), 13 deletions(-)
>  create mode 100644 src/soak_duration.awk

The set looks good to me (the second commit has a different var name,
but fine by me)

Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
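[Editorial note: the cover letter's scheme of dividing a fixed nightly time budget equally among the looping tests can be sketched as a few lines of shell. The 12-hour budget and the count of four recoveryloop tests come from the cover letter; the variable name SOAK_DURATION comes from the series itself, but the surrounding script (how the value is computed and exported) is purely illustrative, not code from the patches.]

```shell
#!/bin/sh
# Illustrative sketch: split a nightly soak budget equally among the
# four recoveryloop tests described in the cover letter, and export the
# per-test share as SOAK_DURATION (seconds) for fstests to consume.
# Only the SOAK_DURATION name is from the series; the rest is assumed.

NIGHTLY_BUDGET_SECS=$((12 * 3600))   # 12-hour soak window
NUM_LOOP_TESTS=4                     # recoveryloop tests in the group

SOAK_DURATION=$((NIGHTLY_BUDGET_SECS / NUM_LOOP_TESTS))
export SOAK_DURATION

echo "each looping test gets ${SOAK_DURATION}s (~$((SOAK_DURATION / 3600))h)"
```

This reproduces the "~3 hours of coverage every night" figure from the cover letter: 43200 seconds divided four ways is 10800 seconds per test.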
On Thu, Apr 13, 2023 at 12:48:36PM +0200, Andrey Albershteyn wrote:
> On Tue, Apr 11, 2023 at 11:13:46AM -0700, Darrick J. Wong wrote:
> > Hi all,
> >
> > [...]
> >
> >  17 files changed, 229 insertions(+), 13 deletions(-)
> >  create mode 100644 src/soak_duration.awk
>
> The set looks good to me (the second commit has different var name,
> but fine by me)

Which variable name, specifically?

--D

> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com>
>
> --
> - Andrey
On Thu, Apr 13, 2023 at 07:47:08AM -0700, Darrick J. Wong wrote:
> On Thu, Apr 13, 2023 at 12:48:36PM +0200, Andrey Albershteyn wrote:
> > >  tests/generic/648     |    8 +++--
> > >  17 files changed, 229 insertions(+), 13 deletions(-)
> > >  create mode 100644 src/soak_duration.awk
> >
> > The set looks good to me (the second commit has different var name,
> > but fine by me)
>
> Which variable name, specifically?

STRESS_DURATION in the commit message
On Thu, Apr 13, 2023 at 05:43:52PM +0200, Andrey Albershteyn wrote:
> On Thu, Apr 13, 2023 at 07:47:08AM -0700, Darrick J. Wong wrote:
> > On Thu, Apr 13, 2023 at 12:48:36PM +0200, Andrey Albershteyn wrote:
> > > >  tests/generic/648     |    8 +++--
> > > >  17 files changed, 229 insertions(+), 13 deletions(-)
> > > >  create mode 100644 src/soak_duration.awk
> > >
> > > The set looks good to me (the second commit has different var name,
> > > but fine by me)
> >
> > Which variable name, specifically?
>
> STRESS_DURATION in the commit message

Ah, will fix and resend. Thanks for noticing!

--D

> --
> - Andrey
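[Editorial note: the diffstat adds a new src/soak_duration.awk helper, whose contents are not shown in this thread. A duration-to-seconds conversion of the kind such a helper suggests might look like the following shell sketch. The suffix grammar (s/m/h/d) and the function name are assumptions, not taken from the patches.]

```shell
# Hypothetical sketch of turning a human-readable duration such as
# "30m" or "2h" into seconds -- the sort of conversion a helper like
# src/soak_duration.awk implies. The s/m/h/d suffixes are assumed.
to_seconds() {
    val=${1%?}          # everything except the last character
    suffix=${1#"$val"}  # the trailing unit character, if any
    case "$suffix" in
        s) echo "$val" ;;
        m) echo $((val * 60)) ;;
        h) echo $((val * 3600)) ;;
        d) echo $((val * 86400)) ;;
        *) echo "$1" ;;  # bare number: treat as seconds already
    esac
}

to_seconds 30m   # prints 1800
to_seconds 2h    # prints 7200
```

A wrapper like this would let a test runner write SOAK_DURATION=30m in a config file while the tests themselves keep working in plain seconds.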