[2/4] docs: Add migration tests documentation

Message ID 20241017143211.17771-3-farosas@suse.de (mailing list archive)
State New
Series tests/qtest: Move the bulk of migration tests into a separate target

Commit Message

Fabiano Rosas Oct. 17, 2024, 2:32 p.m. UTC
Add documentation about how to write, run and debug migration tests.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 docs/devel/testing/index.rst     |   1 +
 docs/devel/testing/main.rst      |  13 ++
 docs/devel/testing/migration.rst | 275 +++++++++++++++++++++++++++++++
 docs/devel/testing/qtest.rst     |   1 +
 4 files changed, 290 insertions(+)
 create mode 100644 docs/devel/testing/migration.rst

Patch

diff --git a/docs/devel/testing/index.rst b/docs/devel/testing/index.rst
index 45eb4a7181..e46a33e9e8 100644
--- a/docs/devel/testing/index.rst
+++ b/docs/devel/testing/index.rst
@@ -9,6 +9,7 @@  testing infrastructure.
 
    main
    qtest
+   migration
    functional
    avocado
    acpi-bits
diff --git a/docs/devel/testing/main.rst b/docs/devel/testing/main.rst
index 09725e8ea9..f238926300 100644
--- a/docs/devel/testing/main.rst
+++ b/docs/devel/testing/main.rst
@@ -96,6 +96,19 @@  QTest cases can be executed with
 
    make check-qtest
 
+Migration
+~~~~~~~~~
+
+Migration tests are part of QTest, but are run independently.  Refer
+to :doc:`migration` for more details.
+
+Migration test cases can be executed with
+
+.. code::
+
+   make check-migration
+   make check-migration-quick
+
 Writing portable test cases
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Both unit tests and qtests can run on POSIX hosts as well as Windows hosts.
diff --git a/docs/devel/testing/migration.rst b/docs/devel/testing/migration.rst
new file mode 100644
index 0000000000..e877a979bf
--- /dev/null
+++ b/docs/devel/testing/migration.rst
@@ -0,0 +1,275 @@ 
+.. _migration:
+
+Migration tests
+===============
+
+Migration tests are part of QTest, but have some particularities of
+their own, such as:
+
+- Extended test time due to the need to exercise the iterative phase
+  of migration;
+- Extra requirements on the QEMU binary being used due to
+  :ref:`cross-version migration <cross-version-tests>`;
+- The use of a custom binary for the guest code to test memory
+  integrity (see :ref:`guest-code`).
+
+Invocation
+----------
+
+Migration tests can be run with:
+
+.. code::
+
+   make check-migration
+   make check-migration-quick
+
+or directly:
+
+.. code::
+
+   # all tests
+   QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test
+
+   # single test
+   QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p /x86_64/migration/bad_dest
+
+   # all tests under /multifd (note no trailing slash)
+   QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -r /x86_64/migration/multifd
+
+for cross-version tests (see :ref:`cross-version-tests`):
+
+.. code::
+
+   # old QEMU -> new QEMU
+   QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_BINARY_SRC=./old/qemu-system-x86_64 ./tests/qtest/migration-test
+   QTEST_QEMU_BINARY_DST=./qemu-system-x86_64 QTEST_QEMU_BINARY=./old/qemu-system-x86_64 ./tests/qtest/migration-test
+
+   # new QEMU -> old QEMU (backwards migration)
+   QTEST_QEMU_BINARY_SRC=./qemu-system-x86_64 QTEST_QEMU_BINARY=./old/qemu-system-x86_64 ./tests/qtest/migration-test
+   QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_BINARY_DST=./old/qemu-system-x86_64 ./tests/qtest/migration-test
+
+   # both _SRC and _DST variants are supported for convenience
+
+.. _cross-version-tests:
+
+Cross-version tests
+~~~~~~~~~~~~~~~~~~~
+
+To detect compatibility issues between different QEMU versions, all
+tests from migration-test can be executed with two different QEMU
+versions. The common machine type between the two versions is used.
+
+To set up cross-version tests, a previous build of QEMU must be kept,
+e.g.:
+
+.. code::
+
+   # build current code
+   mkdir build
+   cd build
+   ../configure; make
+
+   # build previous version
+   cd ../
+   mkdir build-9.1
+   git checkout v9.1.0
+   cd build-9.1
+   ../configure; make
+
+To avoid issues with newly added features and new tests, it is highly
+recommended to run the tests from the source directory of the *older*
+version being tested.
+
+.. code::
+
+   ./build/qemu-system-x86_64 --version
+   QEMU emulator version 9.1.50
+
+   ./build-9.1/qemu-system-x86_64 --version
+   QEMU emulator version 9.1.0
+
+   cd build-9.1
+   QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_BINARY_DST=../build/qemu-system-x86_64 ./tests/qtest/migration-test
+
+
+How to write migration tests
+----------------------------
+
+Add a test function (prefixed with ``test_``) that gets registered
+with QTest using the ``migration_test_add*()`` helpers.
+
+The choice between ``migration_test_add()`` and
+``migration_test_add_quick()`` affects whether a test is executed via
+``make check``. The former helper adds a test to the "slow" set, which
+only runs via ``make check-migration``, while the latter allows a test
+to run via ``make check`` or ``make check-migration-quick``.
+
+A new test should preferably not add too much time to ``make
+check``. In general, the quick set is appropriate for code that
+depends on external libraries or on pieces outside of migration, and
+for tests that are fast (e.g. a test that only checks the migration
+URL and exits). The slow set is appropriate for code that is core to
+the migration framework or changes infrequently, and for tests that
+are considerably slower (e.g. tests that run several migration iterations).
+
+.. code::
+
+  migration_test_add("/migration/multifd/tcp/plain/cancel", test_multifd_tcp_cancel);
+
+There is no formal grammar for the definition of the test paths, but
+an informal rule is followed for consistency. Usually:
+
+``/migration/<multifd|precopy|postcopy>/<url type>/<test-specific>/``
+
+Bear in mind that the path string affects test execution order and
+filtering when using the ``-r`` flag.
+
+For simpler tests, the test function can set up the test arguments in
+the ``MigrateCommon`` structure and call into a common test
+routine. Currently there are two common test routines:
+
+ - ``test_precopy_common`` - for generic precopy migration
+ - ``test_file_common`` - for migration using the file: URL
+
+The general structure of a test routine is:
+
+- call ``test_migrate_start()`` to initialize the two QEMU
+  instances, usually named "from" for the source machine and "to" for
+  the destination machine;
+
+- define the migration duration (roughly speaking, either quick or
+  slow) by altering the convergence parameters with
+  ``migrate_ensure[_non]_converge()``;
+
+- wait for the machines to be in the desired state with the ``wait_for_*``
+  helpers;
+
+- migrate with ``migrate_qmp()/migrate_incoming_qmp()/migrate_qmp_fail()``;
+
+- check that guest memory was not corrupted and clean up the QEMU
+  instances with ``test_migrate_end()``.
+
+If using the common test routines, the ``.start_hook`` and ``.finish_hook``
+callbacks can be used to perform test-specific tasks.
+
+.. _guest-code:
+
+About guest code
+----------------
+
+The tests all use a custom, architecture-specific binary as the guest
+code. This code, known as a-b-kernel or a-b-bootblock, constantly
+iterates over the guest memory, writing a number to the start of each
+guest page, incrementing it as it loops around (i.e. a generation
+count). This allows the tests to catch memory corruption errors that
+occur during migration as every page's first byte must have the same
+value, except at the point where the transition happens.
+
+Whenever guest memory is migrated incorrectly, the test will output
+the address and number of pages that present a value inconsistent with
+the generation count, e.g.:
+
+.. code::
+
+  Memory content inconsistency at d53000 first_byte = 27 last_byte = 26 current = 27 hit_edge = 1
+  Memory content inconsistency at d54000 first_byte = 27 last_byte = 26 current = 27 hit_edge = 1
+  Memory content inconsistency at d55000 first_byte = 27 last_byte = 26 current = 27 hit_edge = 1
+  and in another 4929 pages
+
+In the scenario above, ``first_byte`` shows that the current
+generation number is 27, therefore all pages should have 27 as their
+first byte.
+
+Since ``hit_edge = 1``, the transition point was found, i.e. the guest
+was stopped for migration while not all pages had yet been updated to
+the new generation count. So 26 is also a valid byte to find in some pages.
+
+The inconsistency here is that ``last_byte``, i.e. the previous
+generation count, is smaller than the ``current`` byte, which should not
+be possible. This would indicate a memory layout such as:
+
+.. code::
+
+  0xb00000 | 27 00 00 ...
+  ...
+  0xc00000 | 27 00 00 ...
+  ...
+  0xd00000 | 27 00 00 ...
+  0x?????? | 26 00 00 ... <-- pages around this addr weren't migrated correctly
+  ...
+  0xd53000 | 27 00 00 ...
+  0xd54000 | 27 00 00 ...
+  0xd55000 | 27 00 00 ...
+  ...
+
+The a-b code is located at ``tests/migration/<arch>``.
+
+Troubleshooting
+---------------
+
+Migration tests usually run as part of ``make check``, which most
+likely was not invoked with the verbose flag, so the first thing to
+check is the test log from meson (``meson-logs/testlog.txt``).
+
+There, look for the last "Running" entry, which will be the current
+test. Notice whether the failing program is one of the QEMU instances
+or the migration-test itself.
+
+E.g.:
+
+.. code::
+
+  # Running /s390x/migration/precopy/unix/plain
+  # Using machine type: s390-ccw-virtio-9.2
+  # starting QEMU: exec ./qemu-system-s390x -qtest ...
+  # starting QEMU: exec ./qemu-system-s390x -qtest ...
+  ----------------------------------- stderr -----------------------------------
+  migration-test: ../tests/qtest/migration-test.c:1712: test_precopy_common: Assertion `0' failed.
+
+  (test program exited with status code -6)
+
+.. code::
+
+  # Running /x86_64/migration/bad_dest
+  # Using machine type: pc-q35-9.2
+  # starting QEMU: exec ./qemu-system-x86_64 -qtest ...
+  # starting QEMU: exec ./qemu-system-x86_64 -qtest ...
+  ----------------------------------- stderr -----------------------------------
+  Broken pipe
+  ../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
+
+  (test program exited with status code -6)
+
+The above is usually not enough to determine what happened, so
+re-running the test directly is helpful:
+
+.. code::
+
+   QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p /x86_64/migration/bad_dest
+
+There are also the ``QTEST_LOG`` and ``QTEST_TRACE`` variables for increased
+logging and tracing.
+
+The QTEST_QEMU_BINARY environment variable can be abused to hook GDB
+or valgrind into the invocation:
+
+.. code::
+
+   QTEST_QEMU_BINARY='gdb -q --ex "set pagination off" --ex "set print thread-events off" \
+   --ex "handle SIGUSR1 noprint" --ex "break <breakpoint>" --ex "run" --ex "quit \$_exitcode" \
+   --args ./qemu-system-x86_64' ./tests/qtest/migration-test -p /x86_64/migration/multifd/file/mapped-ram/fdset/dio
+
+.. code::
+
+   QTEST_QEMU_BINARY='valgrind -q --leak-check=full --show-leak-kinds=definite,indirect \
+   ./qemu-system-x86_64' ./tests/qtest/migration-test -r /x86_64/migration
+
+Whenever a test fails, it will leave behind a temporary
+directory. This is useful for file migrations to inspect the generated
+migration file:
+
+.. code::
+
+   $ file /tmp/migration-test-X496U2/migfile
+   /tmp/migration-test-X496U2/migfile: QEMU suspend to disk image
+   $ hexdump -C /tmp/migration-test-X496U2/migfile | less
diff --git a/docs/devel/testing/qtest.rst b/docs/devel/testing/qtest.rst
index c5b8546b3e..4665c160b6 100644
--- a/docs/devel/testing/qtest.rst
+++ b/docs/devel/testing/qtest.rst
@@ -5,6 +5,7 @@  QTest Device Emulation Testing Framework
 .. toctree::
 
    qgraph
+   migration
 
 QTest is a device emulation testing framework.  It can be very useful to test
 device models; it could also control certain aspects of QEMU (such as virtual