[net-next,v2,2/2] selftests: net: ksft: support marking tests as disruptive

Message ID 20240730223932.3432862-2-sdf@fomichev.me (mailing list archive)
State New
Series [net-next,v2,1/2] selftests: net-drv: exercise queue stats when the device is down

Commit Message

Stanislav Fomichev July 30, 2024, 10:39 p.m. UTC
Add new @ksft_disruptive decorator to mark the tests that might
be disruptive to the system. Depending on how well the previous
test works in the CI we might want to disable disruptive tests
by default and only let the developers run them manually.

KSFT framework runs disruptive tests by default. DISRUPTIVE=False
environment (or config file) can be used to disable these tests.
ksft_setup should be called by the test cases that want to use
new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).

In the future we can add similar decorators to, for example, avoid
running slow tests all the time. And/or have some option to run
only 'fast' tests for some sort of smoke test scenario.

  $ DISRUPTIVE=False ./stats.py
  KTAP version 1
  1..5
  ok 1 stats.check_pause
  ok 2 stats.check_fec
  ok 3 stats.pkt_byte_sum
  ok 4 stats.qstat_by_ifindex
  ok 5 stats.check_down # SKIP marked as disruptive
  # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:1 error:0

v2:
- convert from cli argument to env variable (Jakub)

Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
--
Cc: Shuah Khan <shuah@kernel.org>
Cc: Joe Damato <jdamato@fastly.com>
Cc: Petr Machata <petrm@nvidia.com>
Cc: linux-kselftest@vger.kernel.org
---
 .../selftests/drivers/net/lib/py/env.py       |  5 +--
 tools/testing/selftests/drivers/net/stats.py  |  2 ++
 tools/testing/selftests/net/lib/py/ksft.py    | 32 +++++++++++++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)

Comments

Petr Machata July 31, 2024, 11:55 a.m. UTC | #1
Stanislav Fomichev <sdf@fomichev.me> writes:

> Add new @ksft_disruptive decorator to mark the tests that might
> be disruptive to the system. Depending on how well the previous
> test works in the CI we might want to disable disruptive tests
> by default and only let the developers run them manually.
>
> KSFT framework runs disruptive tests by default. DISRUPTIVE=False
> environment (or config file) can be used to disable these tests.
> ksft_setup should be called by the test cases that want to use
> new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).

Is that something that tests would want to genuinely do, manage this
stuff by hand? I don't really mind having the helper globally
accessible, but by default I'd keep it inside env.py and expect others to
inherit appropriately.

> @@ -127,6 +129,36 @@ KSFT_RESULT_ALL = True
>              KSFT_RESULT = False
>  
>  
> +def ksft_disruptive(func):
> +    """
> +    Decorator that marks the test as disruptive (e.g. the test
> +    that can down the interface). Disruptive tests can be skipped
> +    by passing DISRUPTIVE=False environment variable.
> +    """
> +
> +    @functools.wraps(func)
> +    def wrapper(*args, **kwargs):
> +        if not KSFT_DISRUPTIVE:
> +            raise KsftSkipEx(f"marked as disruptive")

Since this is a skip, it will fail the overall run. But that happened
because the user themselves set DISRUPTIVE=0 to avoid, um, disruption to
the system. I think it should either be xfail, or something else
dedicated that conveys the idea that we didn't run the test, but that's
fine.

Using xfail for this somehow doesn't seem correct, nothing failed. Maybe
we need KsftOmitEx, which would basically be an xfail with a more
appropriate name?

> +def ksft_setup(env):
> +    """
> +    Setup test framework global state from the environment.
> +    """
> +
> +    def get_bool(env, name):
> +        return env.get(name, "").lower() in ["true", "1"]

"yes" should also be considered, for compatibility with the bash
selftests.

It's also odd that 0 is false, 1 is true, but 2 is false again. How
about something like this?

    def get_bool(env, name):
        value = env.get(name, "").lower()
        if value in ["yes", "true"]:
            return True
        if value in ["no", "false"]:
            return False

        try:
            return bool(int(value))
        except ValueError:
            raise Exception(f"Invalid value for {name}: {value!r}")

So that people at least know if they set it to nonsense that it's
nonsense?

Dunno. The bash selftests just take "yes" and don't care about being
very user friendly in that regard at all. _load_env_file() likewise
looks like it just takes strings and doesn't care about the semantics.
So I don't feel too strongly about this at all. Besides the "yes" bit,
that should be recognized.

Stanislav Fomichev July 31, 2024, 8:47 p.m. UTC | #2
On 07/31, Petr Machata wrote:
> 
> Stanislav Fomichev <sdf@fomichev.me> writes:
> 
> > Add new @ksft_disruptive decorator to mark the tests that might
> > be disruptive to the system. Depending on how well the previous
> > test works in the CI we might want to disable disruptive tests
> > by default and only let the developers run them manually.
> >
> > KSFT framework runs disruptive tests by default. DISRUPTIVE=False
> > environment (or config file) can be used to disable these tests.
> > ksft_setup should be called by the test cases that want to use
> > new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).
> 
> Is that something that tests would want to genuinely do, manage this
> stuff by hand? I don't really mind having the helper globally
> accessible, but default I'd keep it inside env.py and expect others to
> inherit appropriately.

Hard to say how well it's gonna work tbh. But at least from
what I've seen, large code bases (outside of kernel) usually
have some way to attach metadata to the testcase to indicate
various things. For example, this is how the timeout
can be controlled:

https://bazel.build/reference/test-encyclopedia#role-test-runner

So I'd imagine we can eventually have @ksft_short/@ksft_long to
control that using similar techniques.
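
A rough sketch of what such metadata decorators could look like. Nothing
here exists in ksft.py: ksft_tag, ksft_short, ksft_long and select are
hypothetical names used only for illustration.

```python
def ksft_tag(tag):
    """Hypothetical decorator factory: attach a metadata tag to a test."""
    def decorate(func):
        # Keep any tags added by other decorators and add this one.
        func.ksft_tags = getattr(func, "ksft_tags", frozenset()) | {tag}
        return func
    return decorate


# Hypothetical convenience decorators built on top of ksft_tag().
ksft_short = ksft_tag("short")
ksft_long = ksft_tag("long")


def select(cases, wanted):
    """Return only the test cases that carry the wanted tag."""
    return [c for c in cases if wanted in getattr(c, "ksft_tags", ())]


@ksft_short
def check_fast():
    pass


@ksft_long
def check_soak():
    pass
```

A smoke-test run could then call select(cases, "short") before handing the
list to the runner; untagged tests are left out of every selection here,
which is a design choice of this sketch rather than anything decided in the
thread.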

Regarding keeping it inside env.py: can you expand more on what
you mean by having the default in env.py?

> > @@ -127,6 +129,36 @@ KSFT_RESULT_ALL = True
> >              KSFT_RESULT = False
> >  
> >  
> > +def ksft_disruptive(func):
> > +    """
> > +    Decorator that marks the test as disruptive (e.g. the test
> > +    that can down the interface). Disruptive tests can be skipped
> > +    by passing DISRUPTIVE=False environment variable.
> > +    """
> > +
> > +    @functools.wraps(func)
> > +    def wrapper(*args, **kwargs):
> > +        if not KSFT_DISRUPTIVE:
> > +            raise KsftSkipEx(f"marked as disruptive")
> 
> Since this is a skip, it will fail the overall run. But that happened
> because the user themselves set DISRUPTIVE=0 to avoid, um, disruption to
> the system. I think it should either be xfail, or something else
> dedicated that conveys the idea that we didn't run the test, but that's
> fine.
> 
> Using xfail for this somehow doesn't seem correct, nothing failed. Maybe
> we need KsftOmitEx, which would basically be an xfail with a more
> appropriate name?

Are you sure skip will fail the overall run? At least looking at
tools/testing/selftests/net/lib/py/ksft.py, both skip and xfail are
considered KSFT_RESULT=True. Or am I looking at the wrong place?

> > +def ksft_setup(env):
> > +    """
> > +    Setup test framework global state from the environment.
> > +    """
> > +
> > +    def get_bool(env, name):
> > +        return env.get(name, "").lower() in ["true", "1"]
> 
> "yes" should alse be considered, for compatibility with the bash
> selftests.
> 
> It's also odd that 0 is false, 1 is true, but 2 is false again. How
> about something like this?
> 
>     def get_bool(env, name):
>         value = env.get(name, "").lower()
>         if value in ["yes", "true"]:
>             return True
>         if value in ["no", "false"]:
>             return False
> 
>         try:
>             return bool(int(value))
>         except:
>             raise something something invalid value
> 
> So that people at least know if they set it to nonsense that it's
> nonsense?
> 
> Dunno. The bash selftests just take "yes" and don't care about being
> very user friendly in that regard at all. _load_env_file() likewise
> looks like it just takes strings and doesn't care about the semantics.
> So I don't feel too strongly about this at all. Besides the "yes" bit,
> that should be recognized.

Sure, will do!

(will also apply your suggestions for 1/2 so won't reply separately)

Petr Machata Aug. 1, 2024, 8:36 a.m. UTC | #3
Stanislav Fomichev <sdf@fomichev.me> writes:

> On 07/31, Petr Machata wrote:
>> 
>> Stanislav Fomichev <sdf@fomichev.me> writes:
>> 
>> > Add new @ksft_disruptive decorator to mark the tests that might
>> > be disruptive to the system. Depending on how well the previous
>> > test works in the CI we might want to disable disruptive tests
>> > by default and only let the developers run them manually.
>> >
>> > KSFT framework runs disruptive tests by default. DISRUPTIVE=False
>> > environment (or config file) can be used to disable these tests.
>> > ksft_setup should be called by the test cases that want to use
>> > new decorator (ksft_setup is only called via NetDrvEnv/NetDrvEpEnv for now).
>> 
>> Is that something that tests would want to genuinely do, manage this
>> stuff by hand? I don't really mind having the helper globally
>> accessible, but default I'd keep it inside env.py and expect others to
>> inherit appropriately.
>
> Hard to say how well it's gonna work tbh. But at least from
> what I've seen, large code bases (outside of kernel) usually
> have some way to attach metadata to the testcase to indicate
> various things. For example, this is how the timeout
> can be controlled:
>
> https://bazel.build/reference/test-encyclopedia#role-test-runner
>
> So I'd imagine we can eventually have @kstf_short/@ksft_long to
> control that using similar techniques.
>
> Regarding keeping it inside env.py: can you expand more on what
> you mean by having the default in env.py?

I'm looking into it now and I missed how this is layered. ksft.py is the
comparatively general piece of code, and env.py is something
specifically for driver testing. It makes sense for ksft_setup() to be
where it is, because not-driver tests might want to be marked disruptive
as well. It also makes sense that env.py invokes the general helper.

All is good.

>> > @@ -127,6 +129,36 @@ KSFT_RESULT_ALL = True
>> >              KSFT_RESULT = False
>> >  
>> >  
>> > +def ksft_disruptive(func):
>> > +    """
>> > +    Decorator that marks the test as disruptive (e.g. the test
>> > +    that can down the interface). Disruptive tests can be skipped
>> > +    by passing DISRUPTIVE=False environment variable.
>> > +    """
>> > +
>> > +    @functools.wraps(func)
>> > +    def wrapper(*args, **kwargs):
>> > +        if not KSFT_DISRUPTIVE:
>> > +            raise KsftSkipEx(f"marked as disruptive")
>> 
>> Since this is a skip, it will fail the overall run. But that happened
>> because the user themselves set DISRUPTIVE=0 to avoid, um, disruption to
>> the system. I think it should either be xfail, or something else
>> dedicated that conveys the idea that we didn't run the test, but that's
>> fine.
>> 
>> Using xfail for this somehow doesn't seem correct, nothing failed. Maybe
>> we need KsftOmitEx, which would basically be an xfail with a more
>> appropriate name?
>
> Are you sure skip will fail the overall run? At least looking at
> tools/testing/selftests/net/lib/py/ksft.py, both skip and xfail are
> considered KSFT_RESULT=True. Or am I looking at the wrong place?

You seem to be right about the exit code. This was discussed some time
ago, that SKIP is considered a sort of a failure. As the person running
the test you would want to go in and fix whatever configuration issue is
preventing the test from running. I'm not sure how it works in practice,
whether people look for skips in the test log explicitly or rely on exit
codes.

Maybe Jakub can chime in, since he's the one that cajoled me into
handling this whole SKIP / XFAIL business properly in bash selftests.

Jakub Kicinski Aug. 1, 2024, 2:07 p.m. UTC | #4
On Thu, 1 Aug 2024 10:36:18 +0200 Petr Machata wrote:
> You seem to be right about the exit code. This was discussed some time
> ago, that SKIP is considered a sort of a failure. As the person running
> the test you would want to go in and fix whatever configuration issue is
> preventing the test from running. I'm not sure how it works in practice,
> whether people look for skips in the test log explicitly or rely on exit
> codes.
> 
> Maybe Jakub can chime in, since he's the one that cajoled me into
> handling this whole SKIP / XFAIL business properly in bash selftests.

For HW testing there are a lot more variables than just "is there some
tool missing in the VM image". Not sure how well we can do in detecting
HW capabilities and XFAILing without making the tests super long.
And this case itself is not very clear cut. On one hand, you expect 
the test not to run if it's disruptive and executor can't deal with
disruptive - IOW it's an eXpected FAIL. But it is an executor
limitation, the device/driver could have been tested if it wasn't
for the executor, so not entirely dissimilar to a tool missing.

Either way - no strong opinion as of yet, we need someone to actually
continuously run these to get experience :(

Petr Machata Aug. 1, 2024, 9:31 p.m. UTC | #5
Jakub Kicinski <kuba@kernel.org> writes:

> On Thu, 1 Aug 2024 10:36:18 +0200 Petr Machata wrote:
>> You seem to be right about the exit code. This was discussed some time
>> ago, that SKIP is considered a sort of a failure. As the person running
>> the test you would want to go in and fix whatever configuration issue is
>> preventing the test from running. I'm not sure how it works in practice,
>> whether people look for skips in the test log explicitly or rely on exit
>> codes.
>> 
>> Maybe Jakub can chime in, since he's the one that cajoled me into
>> handling this whole SKIP / XFAIL business properly in bash selftests.
>
> For HW testing there is a lot more variables than just "is there some
> tool missing in the VM image". Not sure how well we can do in detecting
> HW capabilities and XFAILing without making the tests super long.
> And this case itself is not very clear cut. On one hand, you expect 
> the test not to run if it's disruptive and executor can't deal with
> disruptive - IOW it's an eXpected FAIL. But it is an executor
> limitation, the device/driver could have been tested if it wasn't
> for the executor, so not entirely dissimilar to a tool missing.
>
> Either way - no strong opinion as of yet, we need someone to actually
> continuously run these to get experience :(

After sending my response I realized we talked about this once already.
Apparently I forgot.

I think it's odd that SKIP is a fail in one framework but a pass in
another. But XFAIL is not a good name for something that was not even
run. And if we add something like "omit", nobody will know what it
means.

Ho hum.

Let's keep SKIP as passing in Python tests then...
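
For reference, the accounting being kept here can be modelled roughly as
below. This is a simplified stand-in, not the real ksft.py code: SkipEx,
XfailEx and run() are assumed names, and the only point is that skip and
xfail are counted but do not flip the exit status.

```python
class SkipEx(Exception):
    """Stand-in for KsftSkipEx (assumed name)."""


class XfailEx(Exception):
    """Stand-in for KsftXfailEx (assumed name)."""


def run(cases):
    """Run test callables; return (totals, exit_code)."""
    totals = {"pass": 0, "fail": 0, "xfail": 0, "skip": 0}
    all_ok = True
    for case in cases:
        try:
            case()
            outcome = "pass"
        except SkipEx:
            outcome = "skip"      # counted, but the run still passes
        except XfailEx:
            outcome = "xfail"     # counted, but the run still passes
        except Exception:
            outcome = "fail"      # only a real failure fails the run
            all_ok = False
        totals[outcome] += 1
    return totals, (0 if all_ok else 1)
```

Under this model a run with four passes and one skip, as in the stats.py
output from the commit message, exits with status 0.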

Patch

diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index a5e800b8f103..1ea9bb695e94 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -4,6 +4,7 @@  import os
 import time
 from pathlib import Path
 from lib.py import KsftSkipEx, KsftXfailEx
+from lib.py import ksft_setup
 from lib.py import cmd, ethtool, ip
 from lib.py import NetNS, NetdevSimDev
 from .remote import Remote
@@ -14,7 +15,7 @@  from .remote import Remote
 
     src_dir = Path(src_path).parent.resolve()
     if not (src_dir / "net.config").exists():
-        return env
+        return ksft_setup(env)
 
     with open((src_dir / "net.config").as_posix(), 'r') as fp:
         for line in fp.readlines():
@@ -30,7 +31,7 @@  from .remote import Remote
             if len(pair) != 2:
                 raise Exception("Can't parse configuration line:", full_file)
             env[pair[0]] = pair[1]
-    return env
+    return ksft_setup(env)
 
 
 class NetDrvEnv:
diff --git a/tools/testing/selftests/drivers/net/stats.py b/tools/testing/selftests/drivers/net/stats.py
index 93f9204f51c4..4c58080cf893 100755
--- a/tools/testing/selftests/drivers/net/stats.py
+++ b/tools/testing/selftests/drivers/net/stats.py
@@ -3,6 +3,7 @@ 
 
 from lib.py import ksft_run, ksft_exit, ksft_pr
 from lib.py import ksft_ge, ksft_eq, ksft_in, ksft_true, ksft_raises, KsftSkipEx, KsftXfailEx
+from lib.py import ksft_disruptive
 from lib.py import EthtoolFamily, NetdevFamily, RtnlFamily, NlError
 from lib.py import NetDrvEnv
 from lib.py import ip, defer
@@ -134,6 +135,7 @@  rtnl = RtnlFamily()
     ksft_eq(cm.exception.nl_msg.extack['bad-attr'], '.ifindex')
 
 
+@ksft_disruptive
 def check_down(cfg) -> None:
     try:
         qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
index f26c20df9db4..a9a24ea77226 100644
--- a/tools/testing/selftests/net/lib/py/ksft.py
+++ b/tools/testing/selftests/net/lib/py/ksft.py
@@ -1,6 +1,7 @@ 
 # SPDX-License-Identifier: GPL-2.0
 
 import builtins
+import functools
 import inspect
 import sys
 import time
@@ -10,6 +11,7 @@  from .utils import global_defer_queue
 
 KSFT_RESULT = None
 KSFT_RESULT_ALL = True
+KSFT_DISRUPTIVE = True
 
 
 class KsftFailEx(Exception):
@@ -127,6 +129,36 @@  KSFT_RESULT_ALL = True
             KSFT_RESULT = False
 
 
+def ksft_disruptive(func):
+    """
+    Decorator that marks the test as disruptive (e.g. the test
+    that can down the interface). Disruptive tests can be skipped
+    by passing DISRUPTIVE=False environment variable.
+    """
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        if not KSFT_DISRUPTIVE:
+            raise KsftSkipEx(f"marked as disruptive")
+        return func(*args, **kwargs)
+    return wrapper
+
+
+def ksft_setup(env):
+    """
+    Setup test framework global state from the environment.
+    """
+
+    def get_bool(env, name):
+        return env.get(name, "").lower() in ["true", "1"]
+
+    if "DISRUPTIVE" in env:
+        global KSFT_DISRUPTIVE
+        KSFT_DISRUPTIVE = get_bool(env, "DISRUPTIVE")
+
+    return env
+
+
 def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
     cases = cases or []