[00/17] replay: Fixes and avocado test updates

Message ID 20241220104220.2007786-1-npiggin@gmail.com (mailing list archive)

Nicholas Piggin Dec. 20, 2024, 10:42 a.m. UTC
Hi,

This is another round of the replay fixes previously posted here:

https://lore.kernel.org/qemu-devel/20240813050638.446172-1-npiggin@gmail.com/

A bunch of those fixes have been merged, but there are still some
outstanding here.

Dropped from the series is the net announce change, which seemed to
be the main issue Pavel had so far:

https://lore.kernel.org/qemu-devel/6e9b8e49-f00f-46fc-bbf8-4af27e0c3906@ispras.ru/

New in this series is a reworking of the replay BH APIs, because people
didn't like having the replay_xxx APIs spread throughout the tree. The
new APIs also have some assertions added to catch un-converted users
when replay is enabled, because a missed conversion is far harder to
debug once it surfaces as a replay failure.
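
To illustrate the idea only, here is a toy model of a BH scheduling call
that asserts on un-converted callers when replay is enabled. It is not
the code in this series: every name below (ToyBhKind, toy_bh_schedule,
toy_replay_enabled and so on) is made up, and the assumption that each
caller tags its BH as guest-visible or host-only is mine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is NOT QEMU's actual API. */
typedef enum {
    TOY_BH_UNCONVERTED = 0, /* caller has not declared anything yet */
    TOY_BH_GUEST_VISIBLE,   /* affects guest state: must go through replay */
    TOY_BH_HOST_ONLY,       /* host-side housekeeping: invisible to guest */
} ToyBhKind;

typedef void ToyBhFunc(void *opaque);

static bool toy_replay_enabled; /* stands in for "record/replay is active" */
static int toy_logged_events;   /* stands in for the replay event log */

/* Example callback: bumps the int counter passed as opaque. */
static void toy_count_cb(void *opaque)
{
    (*(int *)opaque)++;
}

static void toy_bh_schedule(ToyBhKind kind, ToyBhFunc *cb, void *opaque)
{
    if (toy_replay_enabled) {
        /* Fail at the call site instead of as a later replay divergence. */
        assert(kind != TOY_BH_UNCONVERTED);
        if (kind == TOY_BH_GUEST_VISIBLE) {
            toy_logged_events++; /* a real implementation would log it */
        }
    }
    cb(opaque); /* a real main loop would defer this call */
}
```

In this model an un-converted caller trips the assertion as soon as it
schedules a BH with replay enabled, which points straight at the
offending call site.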

These new API assertions caught a hw/ide replay bug; fixing it resolves
some replay_linux test hangs. With a couple of fixes to the replay_linux
test case as well, all tests now pass, including the aarch64 tests; see
here:

  https://gitlab.com/npiggin/qemu/-/jobs/8695386122

(In that run a couple of the x86_64 tests were disabled to make room for
the aarch64 tests, because gitlab seems to kill the job after 1 hour and
they can't all fit.)

ppc64 also passes replay_linux after a couple of ppc64 fixes. I'll post
a patch to add the ppc64 test later, once everything has worked its way
through.

Thanks,
Nick


Nicholas Piggin (17):
  replay: Fix migration use of clock for statistics
  replay: Fix migration replay_mutex locking
  async: rework async event API for replay
  util/main-loop: Convert to new bh API
  util/thread-pool: Convert to new bh API
  util/aio-wait: Convert to new bh API
  async/coroutine: Convert to new bh API
  migration: Convert to new bh API
  monitor: Convert to new bh API
  qmp: Convert to new bh API
  block: Convert to new bh API
  hw/ide: Fix record-replay and convert to new bh API
  hw/scsi: Convert to new bh API
  async: add debugging assertions for record/replay in bh APIs
  tests/avocado/replay_linux: Fix compile error
  tests/avocado/replay_linux: Fix cdrom device setup
  tests/avocado/replay_linux: remove the timeout expected guards

 docs/devel/replay.rst              |  7 +--
 include/block/aio.h                | 44 ++++++++++++++++--
 include/sysemu/replay.h            |  2 +-
 backends/rng-builtin.c             |  2 +-
 block.c                            |  4 +-
 block/blkreplay.c                  | 10 +++-
 block/block-backend.c              | 24 ++++++----
 block/io.c                         |  5 +-
 block/iscsi.c                      |  5 +-
 block/nfs.c                        | 10 ++--
 block/null.c                       |  4 +-
 block/nvme.c                       |  8 ++--
 hw/ide/core.c                      |  9 ++--
 hw/net/virtio-net.c                | 14 +++---
 hw/scsi/scsi-bus.c                 | 14 ++++--
 job.c                              |  3 +-
 migration/migration.c              | 17 +++++--
 migration/savevm.c                 | 15 +++---
 monitor/monitor.c                  |  3 +-
 monitor/qmp.c                      |  5 +-
 qapi/qmp-dispatch.c                |  5 +-
 replay/replay-events.c             | 29 +++++-------
 stubs/replay-tools.c               |  2 +-
 util/aio-wait.c                    |  3 +-
 util/async.c                       | 75 ++++++++++++++++++++++++++++--
 util/main-loop.c                   |  2 +-
 util/thread-pool.c                 |  8 ++--
 scripts/block-coroutine-wrapper.py |  3 +-
 tests/avocado/replay_linux.py      |  9 ++--
 29 files changed, 245 insertions(+), 96 deletions(-)

Pavel Dovgalyuk Dec. 20, 2024, 11:42 a.m. UTC | #1
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
