Message ID: 20240821223012.3757828-1-vipinsh@google.com
Series: KVM selftests runner for running more than just default
Oops! Adding arch mailing lists and maintainers which have an arch folder in tools/testing/selftests/kvm

On Wed, Aug 21, 2024 at 3:30 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> This series introduces a KVM selftests runner to make it easier to
> run selftests with some interesting configurations and to provide some
> enhancements over the existing kselftest runner.
>
> I would like to get early feedback from the community and see if this
> is something which can be useful for improving KVM selftests coverage
> and worthwhile investing time in. Some specific questions:
>
> 1. Should this be done?
> 2. Which features are a must?
> 3. Any other way to write the test configuration compared to what is done here?
>
> Note, the Python code written for the runner is not optimized, but it
> shows how this runner can be useful.
>
> What are the goals?
> - Run tests with more than just the default settings of KVM module
>   parameters and the test itself.
> - Capture issues which only show up when certain combinations of module
>   parameters and test options are used.
> - Provide a minimum level of testing which can be standardised for KVM patches.
> - Run tests in parallel.
> - Dump output in a hierarchical folder structure for easier tracking of
>   failure/success output.
> - Feel free to add yours :)
>
> Why not use/extend kselftest?
> - Other submodules' goals might not align, and it's going to be difficult to
>   capture a broader set of requirements.
> - Instead of a test configuration, we would need separate shell scripts
>   which act as tests for each test-arg and module-parameter
>   combination. This would easily pollute the KVM selftests directory.
> - It is easier to enhance features using Python packages than shell scripts.
>
> What does this runner do?
> - Reads a test configuration file (tests.json in patch 1).
>   Configurations are written in JSON as a hierarchy, where multiple suites
>   exist and each suite contains multiple tests.
> - Provides a way to execute tests inside a suite in parallel.
> - Provides a way to dump output to a folder in a hierarchical manner.
> - Allows running selected suites, or selected tests in a specific suite.
> - Allows some setup and teardown for test suites and tests.
> - A timeout can be provided to limit test execution duration.
> - Allows running test suites or tests on a specific architecture only.
>
> The runner is written in Python and the goal is to only use standard
> library constructs. This runner will work on Python 3.6 and up.
>
> What does a test configuration file look like?
> Test configurations are written in JSON as it is easy to read and has
> built-in package support in Python. The root level is a JSON array denoting
> suites, and each suite can contain multiple tests in it using a JSON array.
>
> [
>   {
>     "suite": "dirty_log_perf_tests",
>     "timeout_s": 300,
>     "arch": "x86_64",
>     "setup": "echo Setting up suite",
>     "teardown": "echo tearing down suite",
>     "tests": [
>       {
>         "name": "dirty_log_perf_test_max_vcpu_no_manual_protect",
>         "command": "./dirty_log_perf_test -v $(grep -c ^processor /proc/cpuinfo) -g",
>         "arch": "x86_64",
>         "setup": "echo Setting up test",
>         "teardown": "echo tearing down test",
>         "timeout_s": 5
>       }
>     ]
>   }
> ]
>
> Usage:
> The runner "runner.py" and the test configuration "tests.json" live in
> the tools/testing/selftests/kvm directory.
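For illustration, a minimal sketch (not the posted runner.py, and the function name is made up) of how such a tests.json could be loaded and filtered by architecture using only the standard library:

```python
import json
import platform

def load_suites(path):
    """Load the JSON test configuration: a top-level array of suites,
    each optionally carrying an "arch" filter and a "tests" array."""
    with open(path) as f:
        suites = json.load(f)
    arch = platform.machine()
    runnable = []
    for suite in suites:
        # A suite or test with no "arch" key runs everywhere; otherwise
        # it must match the architecture of the current machine.
        if suite.get("arch", arch) != arch:
            continue
        tests = [t for t in suite["tests"] if t.get("arch", arch) == arch]
        runnable.append({**suite, "tests": tests})
    return runnable
```

Per-test settings such as "timeout_s" would presumably fall back to the suite-level value when absent, mirroring the hierarchy in the example above.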
>
> To run serially:
> ./runner.py tests.json
>
> To run specific test suites:
> ./runner.py tests.json dirty_log_perf_tests x86_sanity_tests
>
> To run a specific test in a suite:
> ./runner.py tests.json x86_sanity_tests/vmx_msrs_test
>
> To run everything in parallel (runs tests inside a suite in parallel):
> ./runner.py -j 10 tests.json
>
> To dump output to disk:
> ./runner.py -j 10 tests.json -o sample_run
>
> Sample output (after removing the timestamp, process ID, and logging
> level columns):
>
> ./runner.py tests.json -j 10 -o sample_run
> PASSED: dirty_log_perf_tests/dirty_log_perf_test_max_vcpu_no_manual_protect
> PASSED: dirty_log_perf_tests/dirty_log_perf_test_max_vcpu_manual_protect
> PASSED: dirty_log_perf_tests/dirty_log_perf_test_max_vcpu_manual_protect_random_access
> PASSED: dirty_log_perf_tests/dirty_log_perf_test_max_10_vcpu_hugetlb
> PASSED: x86_sanity_tests/vmx_msrs_test
> SKIPPED: x86_sanity_tests/private_mem_conversions_test
> FAILED: x86_sanity_tests/apic_bus_clock_test
> PASSED: x86_sanity_tests/dirty_log_page_splitting_test
> --------------------------------------------------------------------------
> Test runner result:
> 1) dirty_log_perf_tests:
>    1) PASSED: dirty_log_perf_test_max_vcpu_no_manual_protect
>    2) PASSED: dirty_log_perf_test_max_vcpu_manual_protect
>    3) PASSED: dirty_log_perf_test_max_vcpu_manual_protect_random_access
>    4) PASSED: dirty_log_perf_test_max_10_vcpu_hugetlb
> 2) x86_sanity_tests:
>    1) PASSED: vmx_msrs_test
>    2) SKIPPED: private_mem_conversions_test
>    3) FAILED: apic_bus_clock_test
>    4) PASSED: dirty_log_page_splitting_test
> --------------------------------------------------------------------------
>
> Directory structure created:
>
> sample_run/
> |-- dirty_log_perf_tests
> |   |-- dirty_log_perf_test_max_10_vcpu_hugetlb
> |   |   |-- command.stderr
> |   |   |-- command.stdout
> |   |   |-- setup.stderr
> |   |   |-- setup.stdout
> |   |   |-- teardown.stderr
> |   |   `-- teardown.stdout
> |   |-- dirty_log_perf_test_max_vcpu_manual_protect
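The -j style parallelism inside a suite could be sketched with the standard library alone, roughly like this (illustrative helper names, not the posted patch; the `PIPE`/`universal_newlines` form keeps Python 3.6 compatibility, and exit code 4 is the kselftest skip code KSFT_SKIP):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_test(test):
    """Run one test's command, honoring its optional per-test timeout_s."""
    try:
        proc = subprocess.run(test["command"], shell=True,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              universal_newlines=True,
                              timeout=test.get("timeout_s"))
    except subprocess.TimeoutExpired:
        return (test["name"], "TIMED_OUT")
    if proc.returncode == 0:
        return (test["name"], "PASSED")
    if proc.returncode == 4:  # kselftest skip exit code (KSFT_SKIP)
        return (test["name"], "SKIPPED")
    return (test["name"], "FAILED")

def run_suite(suite, jobs=1):
    """Run all tests in one suite, at most `jobs` at a time."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_test, suite["tests"]))
```

Results would then be aggregated across suites into the summary shown above.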
> |   |   |-- command.stderr
> |   |   `-- command.stdout
> |   |-- dirty_log_perf_test_max_vcpu_manual_protect_random_access
> |   |   |-- command.stderr
> |   |   `-- command.stdout
> |   `-- dirty_log_perf_test_max_vcpu_no_manual_protect
> |       |-- command.stderr
> |       `-- command.stdout
> `-- x86_sanity_tests
>     |-- apic_bus_clock_test
>     |   |-- command.stderr
>     |   `-- command.stdout
>     |-- dirty_log_page_splitting_test
>     |   |-- command.stderr
>     |   |-- command.stdout
>     |   |-- setup.stderr
>     |   |-- setup.stdout
>     |   |-- teardown.stderr
>     |   `-- teardown.stdout
>     |-- private_mem_conversions_test
>     |   |-- command.stderr
>     |   `-- command.stdout
>     `-- vmx_msrs_test
>         |-- command.stderr
>         `-- command.stdout
>
>
> Some other features for the future:
> - Provide a "precheck" command option in the JSON, which can filter/skip tests
>   if certain conditions are not met.
> - An iteration option in the runner. This will allow the same test suites to
>   run again.
>
> Vipin Sharma (1):
>   KVM: selftestsi: Create KVM selftests runnner to run interesting tests
>
>  tools/testing/selftests/kvm/runner.py  | 282 +++++++++++++++++++++++++
>  tools/testing/selftests/kvm/tests.json |  60 ++++++
>  2 files changed, 342 insertions(+)
>  create mode 100755 tools/testing/selftests/kvm/runner.py
>  create mode 100644 tools/testing/selftests/kvm/tests.json
>
>
> base-commit: de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed
> --
> 2.46.0.184.g6999bdac58-goog
>
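The hierarchical sample_run/ layout shown in the cover letter could be produced by a small helper along these lines (a sketch with an illustrative name, not code from the posted patch):

```python
import os

def dump_output(output_dir, suite_name, test_name, stage, proc):
    """Write <stage>.stdout and <stage>.stderr under output_dir/suite/test,
    mirroring the sample_run/ tree; `stage` is "setup", "command", or
    "teardown", and `proc` is a completed subprocess result with text
    stdout/stderr."""
    test_dir = os.path.join(output_dir, suite_name, test_name)
    os.makedirs(test_dir, exist_ok=True)
    with open(os.path.join(test_dir, stage + ".stdout"), "w") as f:
        f.write(proc.stdout)
    with open(os.path.join(test_dir, stage + ".stderr"), "w") as f:
        f.write(proc.stderr)
```

Tests without setup/teardown commands would simply get only the command.* files, as in the tree above.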
On Thu, Aug 22, 2024 at 1:55 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> Oops! Adding arch mailing lists and maintainers which have an arch folder
> in tools/testing/selftests/kvm
>
> [...]

Had an offline discussion with Sean, providing a summary of what we
discussed (Sean, correct me if something is not aligned with our
discussion):

We need to have a roadmap for the runner in terms of the features we support.

Phase 1: Having a basic selftest runner is useful which can:

- Run tests in parallel.
- Provide a summary of what passed and failed, or only in case of failure.
- Dump output which can be easily accessed and parsed.
- Allow running with different command line parameters.

The current patch does more than this and can be simplified.

Phase 2: Environment setup via runner

The current patch allows writing "setup" commands at the test suite and test
level in the json config file to set up the environment needed by a
test to run. This might not be ideal as some settings are exposed
differently on different platforms.

For example, to enable TDP:
- Intel needs ept=Y
- AMD needs npt=Y
- ARM: always on.

To enable APIC virtualization:
- Intel needs enable_apicv=Y
- AMD needs avic=Y

To enable/disable nested virtualization, both have the same file name
"nested" in their module params directory which should be changed.

These kinds of settings become more verbose and unnecessary on other
platforms. Instead, the runner should have some programming constructs
(API, command line options, defaults) to enable these options in a
generic way. For example, enable/disable nested can be exposed as a
command line option --enable_nested; then, based on the platform, the
runner can update the corresponding module param or ignore it.

This will easily extend to providing sane configurations on the
corresponding platforms without lots of hardcoding in JSON. These
individual constructs will provide a generic view/option to run a KVM
feature, and under the hood will do things differently based on the
platform it is running on, like arm, x86-intel, x86-amd, s390, etc.

Phase 3: Provide a collection of interesting configurations

Specific individual constructs can be combined in a meaningful way to
provide interesting configurations to run on a platform. For example, the
user doesn't need to specify each individual configuration; instead,
some prebuilt configurations can be exposed like
--stress_test_shadow_mmu, --test_basic_nested

Tests need to handle the environment in which they are running
gracefully, which many tests already do but not exhaustively.
If some setting is not provided or set up properly for their execution,
then they should fail/skip accordingly. The runner will not be responsible
for prechecking things on the tests' behalf.

Next steps:
1. Consensus on the above phases and features.
2. Start development.

Thanks,
Vipin
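The Phase 2 idea of translating a generic runner option into the right per-platform KVM module parameter could be sketched like this (the table layout and vendor keys are illustrative; the kvm_intel/kvm_amd param names come from the discussion above):

```python
# Maps a generic feature knob to (module, parameter) per platform vendor.
# A missing entry means the platform doesn't expose the knob (e.g. TDP on
# ARM is always on), so the runner would silently ignore the request.
MODULE_PARAMS = {
    "tdp":    {"intel": ("kvm_intel", "ept"),
               "amd":   ("kvm_amd", "npt")},
    "apicv":  {"intel": ("kvm_intel", "enable_apicv"),
               "amd":   ("kvm_amd", "avic")},
    "nested": {"intel": ("kvm_intel", "nested"),
               "amd":   ("kvm_amd", "nested")},
}

def param_path(feature, vendor):
    """Return the sysfs path of the module param controlling `feature` on
    this vendor's platform, or None if there is nothing to configure."""
    entry = MODULE_PARAMS.get(feature, {}).get(vendor)
    if entry is None:
        return None
    module, param = entry
    return "/sys/module/{}/parameters/{}".format(module, param)
```

A --enable_nested style option would then resolve to the right file (or to a no-op) without any platform-specific hardcoding in the JSON config.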
On Fri, Nov 01, 2024, Vipin Sharma wrote:
> Had an offline discussion with Sean, providing a summary of what we
> discussed (Sean, correct me if something is not aligned with our
> discussion):
>
> We need to have a roadmap for the runner in terms of the features we support.
>
> Phase 1: Having a basic selftest runner is useful which can:
>
> - Run tests in parallel.

Maybe with a (very conservative) per test timeout? Selftests generally don't
have the same problems as KVM-Unit-Tests (KUT), as selftests are a little
better at guarding against waiting indefinitely, i.e. I don't think we need
a configurable timeout. But a 120 second timeout or so would be helpful.

E.g. I recently was testing a patch (of mine) that had a "minor" bug where it
caused KVM to do a remote TLB flush on *every* SPTE update in the shadow MMU,
which manifested as hilariously long runtimes for max_guest_memory_test. I
was _this_ close to not catching the bug (which would have been quite
embarrassing), because my hack-a-scripts don't use timeouts (I only noticed
because a completely unrelated bug was causing failures).

> - Provide a summary of what passed and failed, or only in case of failure.

I think a summary is always warranted. And for failures, it would be helpful
to spit out _what_ test failed, versus the annoying KUT runner's behavior of
stating only the number of passes/failures, which forces the user to go
spelunking just to find out what (sub)test failed.

I also think the runner should have a "heartbeat" mechanism, i.e. something
that communicates to the user that forward progress is being made. And IMO,
that mechanism should also spit out skips and failures (this could be
optional though). One of the flaws with the KUT runner is that it's either
super noisy or super quiet. E.g.
my mess of bash outputs this when running selftests in parallel (trimmed for
brevity):

Running selftests with npt_disabled
Waiting for 'access_tracking_perf_test', PID '92317'
Waiting for 'amx_test', PID '92318'
SKIPPED amx_test
Waiting for 'apic_bus_clock_test', PID '92319'
Waiting for 'coalesced_io_test', PID '92321'
Waiting for 'cpuid_test', PID '92324'
...
Waiting for 'hyperv_svm_test', PID '92552'
SKIPPED hyperv_svm_test
Waiting for 'hyperv_tlb_flush', PID '92563'
FAILED hyperv_tlb_flush : ret ='254'
Random seed: 0x6b8b4567
==== Test Assertion Failure ====
  x86_64/hyperv_tlb_flush.c:117: val == expected
  pid=92731 tid=93548 errno=4 - Interrupted system call
     1 0x0000000000411566: assert_on_unhandled_exception at processor.c:627
     2 0x000000000040889a: _vcpu_run at kvm_util.c:1649
     3  (inlined by) vcpu_run at kvm_util.c:1660
     4 0x00000000004041a1: vcpu_thread at hyperv_tlb_flush.c:548
     5 0x000000000043a305: start_thread at pthread_create.o:?
     6 0x000000000045f857: __clone3 at ??:?
  val == expected
Waiting for 'kvm_binary_stats_test', PID '92579'
...
SKIPPED vmx_preemption_timer_test
Waiting for 'vmx_set_nested_state_test', PID '93316'
SKIPPED vmx_set_nested_state_test
Waiting for 'vmx_tsc_adjust_test', PID '93329'
SKIPPED vmx_tsc_adjust_test
Waiting for 'xapic_ipi_test', PID '93350'
Waiting for 'xapic_state_test', PID '93360'
Waiting for 'xcr0_cpuid_test', PID '93374'
Waiting for 'xen_shinfo_test', PID '93391'
Waiting for 'xen_vmcall_test', PID '93405'
Waiting for 'xss_msr_test', PID '93420'

It's far from perfect, e.g. it just waits in alphabetical order, but it gives
me easy to read feedback, and a signal that tests are indeed running and
completing.

> - Dump output which can be easily accessed and parsed.

And persist the output/logs somewhere, e.g. so that the user can triage
failures after the fact.

> - Allow running with different command line parameters.

Command line parameters for tests? If so, I would put this in phase 3. I.e.
make the goal of Phase 1 purely about running tests in parallel.

> The current patch does more than this and can be simplified.
>
> Phase 2: Environment setup via runner
>
> The current patch allows writing "setup" commands at the test suite and test
> level in the json config file to set up the environment needed by a
> test to run. This might not be ideal as some settings are exposed
> differently on different platforms.
>
> For example, to enable TDP:
> - Intel needs ept=Y
> - AMD needs npt=Y
> - ARM: always on.
>
> To enable APIC virtualization:
> - Intel needs enable_apicv=Y
> - AMD needs avic=Y
>
> To enable/disable nested virtualization, both have the same file name
> "nested" in their module params directory which should be changed.
>
> These kinds of settings become more verbose and unnecessary on other
> platforms. Instead, the runner should have some programming constructs
> (API, command line options, defaults) to enable these options in a
> generic way. For example, enable/disable nested can be exposed as a
> command line option --enable_nested; then, based on the platform, the
> runner can update the corresponding module param or ignore it.
>
> This will easily extend to providing sane configurations on the
> corresponding platforms without lots of hardcoding in JSON. These
> individual constructs will provide a generic view/option to run a KVM
> feature, and under the hood will do things differently based on the
> platform it is running on, like arm, x86-intel, x86-amd, s390, etc.

My main input on this front is that the runner needs to configure module
params (and other environment settings) _on behalf of the user_, i.e. in
response to a command line option (to the runner), not in response to
per-test configurations.

One of my complaints with our internal infrastructure is that the testcases
themselves can dictate environment settings. There are certainly benefits to
that approach, but it really only makes sense at scale where there are many
machines available, i.e.
where the runner can achieve parallelism by running tests on multiple
machines, and where the complexity of managing the environment on a per-test
basis is worth the payout.

For the upstream runner, I want to cater to developers, i.e. to people that
are running tests on one or two machines. And I want the runner to rip
through tests as fast as possible, i.e. I don't want tests to get serialized
because each one insists on being a special snowflake and doesn't play nice
with other children. Organizations that have a fleet of systems can pony up
the resources to develop their own support (on top?).

Selftests can and do check for module params, and should and do use
TEST_REQUIRE() to skip when a module param isn't set as needed. Extending
that to arbitrary sysfs knobs should be trivial. I.e. if we get _failures_
because of an incompatible environment, then it's a test bug.

> Phase 3: Provide a collection of interesting configurations
>
> Specific individual constructs can be combined in a meaningful way to
> provide interesting configurations to run on a platform. For example, the
> user doesn't need to specify each individual configuration; instead,
> some prebuilt configurations can be exposed like
> --stress_test_shadow_mmu, --test_basic_nested

IMO, this shouldn't be baked into the runner, i.e. should not surface as
dedicated command line options. Users shouldn't need to modify the runner
just to bring their own configuration.

I also think configurations should be discoverable, e.g. not hardcoded like
KUT's unittest.cfg. A very real problem with KUT's approach is that testing
different combinations is frustratingly difficult, because running a
testcase with a different configuration requires modifying a file that is
tracked by git. There are underlying issues with KUT that essentially
necessitate that approach, e.g. x86 has several testcases that fail if run
without the exact right config. But that's just another reason to NOT follow
KUT's pattern, e.g.
to force us to write robust tests. E.g. instead of per-config command line
options, let the user specify a file, and/or a directory (using a well-known
filename pattern to detect configs).

> Tests need to handle the environment in which they are running
> gracefully, which many tests already do but not exhaustively. If some
> setting is not provided or set up properly for their execution, then
> they should fail/skip accordingly.

This belongs in phase 2.

> The runner will not be responsible for prechecking things on the tests' behalf.
>
> Next steps:
> 1. Consensus on the above phases and features.
> 2. Start development.
>
> Thanks,
> Vipin
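The discoverable-configuration idea — accept either a single config file or a directory scanned for a well-known filename pattern — might look something like this (the *.test.json pattern and function name are hypothetical, purely to illustrate the suggestion):

```python
import glob
import os

def discover_configs(path):
    """Treat `path` as a single config file, or as a directory to scan
    for config files matching a well-known pattern (here, *.test.json).
    Returns a sorted list so runs are deterministic."""
    if os.path.isdir(path):
        return sorted(glob.glob(os.path.join(path, "*.test.json")))
    return [path]
```

This keeps user-specific combinations out of git-tracked files: anyone can drop their own configs into a directory without modifying the runner, addressing the unittest.cfg problem described above.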