[RFC,v5,0/3] migration: reduce time of loading non-iterable vmstate

Message ID: 20230117115511.3215273-1-xuchuangxclwt@bytedance.com

Message

Chuang Xu Jan. 17, 2023, 11:55 a.m. UTC
In this version:

- rename rcu_read_locked() to rcu_read_is_locked() (sketched after this list).
- adjust the sanity check in address_space_to_flatview().
- improve some comments.
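
For reference, the new helper looks roughly like this (a sketch based on the
per-thread rcu_reader depth counter in include/qemu/rcu.h; the exact form in
the patch may differ):

static inline bool rcu_read_is_locked(void)
{
    struct rcu_reader_data *p_rcu_reader = get_ptr_rcu_reader();

    /* depth is incremented by rcu_read_lock(), so a non-zero value means
     * the calling thread is inside an RCU read-side critical section. */
    return p_rcu_reader->depth > 0;
}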

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (measured from the timestamp at which the source qemu
stops to the timestamp at which the target qemu starts). Most of that time
is spent committing memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. As the number of
devices grows, this patch greatly improves performance.
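
Conceptually, the change looks like the sketch below (illustrative only: the
wrapper name is hypothetical, and the actual patches hook begin/commit into
the incoming-migration path):

/*
 * Illustrative sketch, not the literal patch: pack every memory region
 * change made while loading device (non-iterable) vmstate into a single
 * transaction, so the flatview is rebuilt once at the final commit
 * instead of once per change.
 */
static int load_vmstate_packed(QEMUFile *f, MigrationIncomingState *mis)
{
    int ret;

    memory_region_transaction_begin();    /* defer flatview rebuilds */
    ret = qemu_loadvm_state_main(f, mis);
    memory_region_transaction_commit();   /* one rebuild + listener commit */

    return ret;
}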

Here are the test1 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 16-queue vhost-net devices
  - 16 4-queue vhost-user-blk devices.

	time of loading non-iterable vmstate     downtime
before		about 150 ms			  740+ ms
after		about 30 ms			  630+ ms

(These results differ from those of v1. Someone may have changed something
on my host, but that does not affect what the numbers show about the
optimization's effect.)


In test2, we keep the number of devices the same as in test1 and reduce the
number of queues per device:

Here are the test2 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 8 1-queue vhost-net devices
  - 16 1-queue vhost-user-blk devices.

	time of loading non-iterable vmstate     downtime
before		about 90 ms			 about 250 ms
after		about 25 ms			 about 160 ms



In test3, we keep the number of queues per device the same as in test1 and
reduce the number of devices:

Here are the test3 results:
test info:
- Host
  - Intel(R) Xeon(R) Platinum 8260 CPU
  - NVIDIA Mellanox ConnectX-5
- VM
  - 32 CPUs 128GB RAM VM
  - 1 16-queue vhost-net device
  - 1 4-queue vhost-user-blk device.

	time of loading non-iterable vmstate     downtime
before		about 20 ms			 about 70 ms
after		about 11 ms			 about 60 ms


As the test results above show, both the number of queues and the number
of devices have a large impact on the time spent loading non-iterable
vmstate. More devices and queues lead to more mr commits, and the time
consumed by flatview reconstruction grows accordingly.

Please review, Chuang.

[v4]

- attach more information in the cover letter.
- remove changes on virtio_load.
- add rcu_read_locked() to detect holding of rcu lock.

[v3]

- move virtio_load_check_delay() from virtio_memory_listener_commit() to 
  virtio_vmstate_change().
- add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
  will be called when delay_check is true.

[v2]

- rebase to latest upstream.
- add sanity check to address_space_to_flatview().
- postpone the init of the vring cache until migration's loading completes. 

[v1]

The duration of loading non-iterable vmstate accounts for a significant
portion of downtime (measured from the timestamp at which the source qemu
stops to the timestamp at which the target qemu starts). Most of that time
is spent committing memory region changes repeatedly.

This patch packs all the memory region changes made while loading
non-iterable vmstate into a single memory transaction. As the number of
devices grows, this patch greatly improves performance.

Here are the test results:
test vm info:
- 32 CPUs 128GB RAM
- 8 16-queue vhost-net devices
- 16 4-queue vhost-user-blk devices.

	time of loading non-iterable vmstate
before		about 210 ms
after		about 40 ms

Comments

Peter Xu Jan. 17, 2023, 3:41 p.m. UTC | #1
On Tue, Jan 17, 2023 at 07:55:08PM +0800, Chuang Xu wrote:
> In this version:
> 
> - rename rcu_read_locked() to rcu_read_is_locked().
> - adjust the sanity check in address_space_to_flatview().
> - improve some comments.

Acked-by: Peter Xu <peterx@redhat.com>
Juan Quintela Feb. 2, 2023, 11:07 a.m. UTC | #2
Chuang Xu <xuchuangxclwt@bytedance.com> wrote:
> In this version:
>
> - rename rcu_read_locked() to rcu_read_is_locked().
> - adjust the sanity check in address_space_to_flatview().
> - improve some comments.
>
> The duration of loading non-iterable vmstate accounts for a significant
> portion of downtime (starting with the timestamp of source qemu stop and
> ending with the timestamp of target qemu start). Most of the time is spent
> committing memory region changes repeatedly.
>
> This patch packs all the changes to memory region during the period of	
> loading non-iterable vmstate in a single memory transaction. With the
> increase of devices, this patch will greatly improve the performance.
>
> Here are the test1 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 16-queue vhost-net device
>   - 16 4-queue vhost-user-blk device.
>
> 	time of loading non-iterable vmstate     downtime
> before		about 150 ms			  740+ ms
> after		about 30 ms			  630+ ms
>
> (This result is different from that of v1. It may be that someone has 
> changed something on my host.., but it does not affect the display of 
> the optimization effect.)
>
>
> In test2, we keep the number of the device the same as test1, reduce the 
> number of queues per device:
>
> Here are the test2 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 1-queue vhost-net device
>   - 16 1-queue vhost-user-blk device.
>
> 	time of loading non-iterable vmstate     downtime
> before		about 90 ms			 about 250 ms
>
> after		about 25 ms			 about 160 ms
>
>
>
> In test3, we keep the number of queues per device the same as test1, reduce 
> the number of devices:
>
> Here are the test3 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 1 16-queue vhost-net device
>   - 1 4-queue vhost-user-blk device.
>
> 	time of loading non-iterable vmstate     downtime
> before		about 20 ms			 about 70 ms
> after		about 11 ms			 about 60 ms
>
>
> As we can see from the test results above, both the number of queues and 
> the number of devices have a great impact on the time of loading non-iterable 
> vmstate. The growth of the number of devices and queues will lead to more 
> mr commits, and the time consumption caused by the flatview reconstruction 
> will also increase.
>
> Please review, Chuang.

Hi

As noted in the review, I agree with the patches, but I am waiting for Paolo
to review the rcu change (which I think is trivial, but I am not
the rcu maintainer).

If it happens that you need to send another version, I think you
can change the RFC to PATCH.

Again, very good job.

Later, Juan.


> [v4]
>
> - attach more information in the cover letter.
> - remove changes on virtio_load.
> - add rcu_read_locked() to detect holding of rcu lock.
>
> [v3]
>
> - move virtio_load_check_delay() from virtio_memory_listener_commit() to 
>   virtio_vmstate_change().
> - add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
>   will be called when delay_check is true.
>
> [v2]
>
> - rebase to latest upstream.
> - add sanity check to address_space_to_flatview().
> - postpone the init of the vring cache until migration's loading completes. 
>
> [v1]
>
> The duration of loading non-iterable vmstate accounts for a significant
> portion of downtime (starting with the timestamp of source qemu stop and
> ending with the timestamp of target qemu start). Most of the time is spent
> committing memory region changes repeatedly.
>
> This patch packs all the changes to memory region during the period of
> loading non-iterable vmstate in a single memory transaction. With the
> increase of devices, this patch will greatly improve the performance.
>
> Here are the test results:
> test vm info:
> - 32 CPUs 128GB RAM
> - 8 16-queue vhost-net device
> - 16 4-queue vhost-user-blk device.
>
> 	time of loading non-iterable vmstate
> before		about 210 ms
> after		about 40 ms
Juan Quintela Feb. 15, 2023, 5 p.m. UTC | #3
Chuang Xu <xuchuangxclwt@bytedance.com> wrote:
> In this version:
>
> - rename rcu_read_locked() to rcu_read_is_locked().
> - adjust the sanity check in address_space_to_flatview().
> - improve some comments.

queued series.

>
> The duration of loading non-iterable vmstate accounts for a significant
> portion of downtime (starting with the timestamp of source qemu stop and
> ending with the timestamp of target qemu start). Most of the time is spent
> committing memory region changes repeatedly.
>
> This patch packs all the changes to memory region during the period of	
> loading non-iterable vmstate in a single memory transaction. With the
> increase of devices, this patch will greatly improve the performance.
>
> Here are the test1 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 16-queue vhost-net device
>   - 16 4-queue vhost-user-blk device.
>
> 	time of loading non-iterable vmstate     downtime
> before		about 150 ms			  740+ ms
> after		about 30 ms			  630+ ms
>
> (This result is different from that of v1. It may be that someone has 
> changed something on my host.., but it does not affect the display of 
> the optimization effect.)
>
>
> In test2, we keep the number of the device the same as test1, reduce the 
> number of queues per device:
>
> Here are the test2 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 1-queue vhost-net device
>   - 16 1-queue vhost-user-blk device.
>
> 	time of loading non-iterable vmstate     downtime
> before		about 90 ms			 about 250 ms
>
> after		about 25 ms			 about 160 ms
>
>
>
> In test3, we keep the number of queues per device the same as test1, reduce 
> the number of devices:
>
> Here are the test3 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 1 16-queue vhost-net device
>   - 1 4-queue vhost-user-blk device.
>
> 	time of loading non-iterable vmstate     downtime
> before		about 20 ms			 about 70 ms
> after		about 11 ms			 about 60 ms
>
>
> As we can see from the test results above, both the number of queues and 
> the number of devices have a great impact on the time of loading non-iterable 
> vmstate. The growth of the number of devices and queues will lead to more 
> mr commits, and the time consumption caused by the flatview reconstruction 
> will also increase.
>
> Please review, Chuang.
>
> [v4]
>
> - attach more information in the cover letter.
> - remove changes on virtio_load.
> - add rcu_read_locked() to detect holding of rcu lock.
>
> [v3]
>
> - move virtio_load_check_delay() from virtio_memory_listener_commit() to 
>   virtio_vmstate_change().
> - add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
>   will be called when delay_check is true.
>
> [v2]
>
> - rebase to latest upstream.
> - add sanity check to address_space_to_flatview().
> - postpone the init of the vring cache until migration's loading completes. 
>
> [v1]
>
> The duration of loading non-iterable vmstate accounts for a significant
> portion of downtime (starting with the timestamp of source qemu stop and
> ending with the timestamp of target qemu start). Most of the time is spent
> committing memory region changes repeatedly.
>
> This patch packs all the changes to memory region during the period of
> loading non-iterable vmstate in a single memory transaction. With the
> increase of devices, this patch will greatly improve the performance.
>
> Here are the test results:
> test vm info:
> - 32 CPUs 128GB RAM
> - 8 16-queue vhost-net device
> - 16 4-queue vhost-user-blk device.
>
> 	time of loading non-iterable vmstate
> before		about 210 ms
> after		about 40 ms
Claudio Fontana Feb. 15, 2023, 5:06 p.m. UTC | #4
On 1/17/23 12:55, Chuang Xu wrote:
> In this version:
> 
> - rename rcu_read_locked() to rcu_read_is_locked().
> - adjust the sanity check in address_space_to_flatview().
> - improve some comments.
> 
> The duration of loading non-iterable vmstate accounts for a significant
> portion of downtime (starting with the timestamp of source qemu stop and
> ending with the timestamp of target qemu start). Most of the time is spent
> committing memory region changes repeatedly.
> 
> This patch packs all the changes to memory region during the period of	
> loading non-iterable vmstate in a single memory transaction. With the
> increase of devices, this patch will greatly improve the performance.
> 
> Here are the test1 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 16-queue vhost-net device
>   - 16 4-queue vhost-user-blk device.
> 
> 	time of loading non-iterable vmstate     downtime
> before		about 150 ms			  740+ ms
> after		about 30 ms			  630+ ms
> 
> (This result is different from that of v1. It may be that someone has 
> changed something on my host.., but it does not affect the display of 
> the optimization effect.)
> 
> 
> In test2, we keep the number of the device the same as test1, reduce the 
> number of queues per device:
> 
> Here are the test2 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 1-queue vhost-net device
>   - 16 1-queue vhost-user-blk device.
> 
> 	time of loading non-iterable vmstate     downtime
> before		about 90 ms			 about 250 ms
> 
> after		about 25 ms			 about 160 ms
> 
> 
> 
> In test3, we keep the number of queues per device the same as test1, reduce 
> the number of devices:
> 
> Here are the test3 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 1 16-queue vhost-net device
>   - 1 4-queue vhost-user-blk device.
> 
> 	time of loading non-iterable vmstate     downtime
> before		about 20 ms			 about 70 ms
> after		about 11 ms			 about 60 ms
> 
> 
> As we can see from the test results above, both the number of queues and 
> the number of devices have a great impact on the time of loading non-iterable 
> vmstate. The growth of the number of devices and queues will lead to more 
> mr commits, and the time consumption caused by the flatview reconstruction 
> will also increase.
> 
> Please review, Chuang.
> 
> [v4]
> 
> - attach more information in the cover letter.
> - remove changes on virtio_load.
> - add rcu_read_locked() to detect holding of rcu lock.
> 
> [v3]
> 
> - move virtio_load_check_delay() from virtio_memory_listener_commit() to 
>   virtio_vmstate_change().
> - add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
>   will be called when delay_check is true.
> 
> [v2]
> 
> - rebase to latest upstream.
> - add sanity check to address_space_to_flatview().
> - postpone the init of the vring cache until migration's loading completes. 
> 
> [v1]
> 
> The duration of loading non-iterable vmstate accounts for a significant
> portion of downtime (starting with the timestamp of source qemu stop and
> ending with the timestamp of target qemu start). Most of the time is spent
> committing memory region changes repeatedly.
> 
> This patch packs all the changes to memory region during the period of
> loading non-iterable vmstate in a single memory transaction. With the
> increase of devices, this patch will greatly improve the performance.
> 
> Here are the test results:
> test vm info:
> - 32 CPUs 128GB RAM
> - 8 16-queue vhost-net device
> - 16 4-queue vhost-user-blk device.
> 
> 	time of loading non-iterable vmstate
> before		about 210 ms
> after		about 40 ms
> 
> 

great improvements on the load times, congrats!

Claudio
Juan Quintela Feb. 15, 2023, 7:10 p.m. UTC | #5
Chuang Xu <xuchuangxclwt@bytedance.com> wrote:
> In this version:

Hi

I had to drop this.  It breaks migration of dbus-vmstate.

144/179 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover             ERROR           5.66s   killed by signal 6 SIGABRT
>>> G_TEST_DBUS_DAEMON=/mnt/code/qemu/multifd/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=145 /scratch/qemu/multifd/x64/tests/qtest/virtio-net-failover --tap -k
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-x86_64: /mnt/code/qemu/multifd/include/exec/memory.h:1112: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.
Broken pipe
../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)

(test program exited with status code -6)

TAP parsing error: Too few tests run (expected 23, got 12)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

Can you take a look at this?

I reproduced it with "make check" and qemu compiled with the configure
line attached.

Later, Juan.

configure --enable-trace-backends=log
	--prefix=/usr
	--sysconfdir=/etc/sysconfig/
	--audio-drv-list=pa
	--target-list=x86_64-softmmu
	--with-coroutine=ucontext
	--with-git-submodules=validate
	--enable-fdt=system
	--enable-alsa
	--enable-attr
	--enable-auth-pam
	--enable-avx2
	--enable-avx512f
	--enable-bochs
	--enable-bpf
	--enable-brlapi
	--disable-bsd-user
	--enable-bzip2
	--enable-cap-ng
	--disable-capstone
	--disable-cfi
	--disable-cfi-debug
	--enable-cloop
	--disable-cocoa
	--enable-containers
	--disable-coreaudio
	--enable-coroutine-pool
	--enable-crypto-afalg
	--enable-curl
	--enable-curses
	--enable-dbus-display
	--enable-debug-info
	--disable-debug-mutex
	--disable-debug-stack-usage
	--disable-debug-tcg
	--enable-dmg
	--disable-docs
	--disable-dsound
	--enable-fdt
	--disable-fuse
	--disable-fuse-lseek
	--disable-fuzzing
	--disable-gcov
	--enable-gettext
	--enable-gio
	--enable-glusterfs
	--enable-gnutls
	--disable-gprof
	--enable-gtk
	--disable-guest-agent
	--disable-guest-agent-msi
	--disable-hax
	--disable-hvf
	--enable-iconv
	--enable-install-blobs
	--enable-jack
	--enable-keyring
	--enable-kvm
	--enable-l2tpv3
	--enable-libdaxctl
	--enable-libiscsi
	--enable-libnfs
	--enable-libpmem
	--enable-libssh
	--enable-libudev
	--enable-libusb
	--enable-linux-aio
	--enable-linux-io-uring
	--disable-linux-user
	--enable-live-block-migration
	--disable-lto
	--disable-lzfse
	--enable-lzo
	--disable-malloc-trim
	--enable-membarrier
	--enable-module-upgrades
	--enable-modules
	--enable-mpath
	--enable-multiprocess
	--disable-netmap
	--enable-nettle
	--enable-numa
	--disable-nvmm
	--enable-opengl
	--enable-oss
	--enable-pa
	--enable-parallels
	--enable-pie
	--enable-plugins
	--enable-png
	--disable-profiler
	--enable-pvrdma
	--enable-qcow1
	--enable-qed
	--disable-qom-cast-debug
	--enable-rbd
	--enable-rdma
	--enable-replication
	--enable-rng-none
	--disable-safe-stack
	--disable-sanitizers
	--enable-stack-protector
	--enable-sdl
	--enable-sdl-image
	--enable-seccomp
	--enable-selinux
	--enable-slirp
	--enable-slirp-smbd
	--enable-smartcard
	--enable-snappy
	--enable-sparse
	--enable-spice
	--enable-spice-protocol
	--enable-system
	--enable-tcg
	--disable-tcg-interpreter
	--disable-tools
	--enable-tpm
	--disable-tsan
	--disable-u2f
	--enable-usb-redir
	--disable-user
	--disable-vde
	--enable-vdi
	--enable-vhost-crypto
	--enable-vhost-kernel
	--enable-vhost-net
	--enable-vhost-user
	--enable-vhost-user-blk-server
	--enable-vhost-vdpa
	--enable-virglrenderer
	--enable-virtfs
	--enable-virtiofsd
	--enable-vnc
	--enable-vnc-jpeg
	--enable-vnc-sasl
	--enable-vte
	--enable-vvfat
	--enable-werror
	--disable-whpx
	--enable-xen
	--enable-xen-pci-passthrough
	--enable-xkbcommon
	--enable-zstd
	--disable-gcrypt
Chuang Xu Feb. 16, 2023, 3:41 p.m. UTC | #6
Hi, Juan

Thanks for your test results!

On 2023/2/16 上午3:10, Juan Quintela wrote:
> Chuang Xu wrote:
>> In this version:
> Hi
>
> I had to drop this. It breaks migration of dbus-vmstate.

Previously I focused only on precopy migration tests in the normal
scenario and did not run qtest, so I must apologize for my
inexperience.

>
> 144/179 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover ERROR 5.66s killed by signal 6 SIGABRT
>>>> G_TEST_DBUS_DAEMON=/mnt/code/qemu/multifd/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=145 /scratch/qemu/multifd/x64/tests/qtest/virtio-net-failover --tap -k
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> stderr:
> qemu-system-x86_64: /mnt/code/qemu/multifd/include/exec/memory.h:1112: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.
> Broken pipe
> ../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
>
> (test program exited with status code -6)
>
> TAP parsing error: Too few tests run (expected 23, got 12)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

>
> Can you take a look at this?
>
> I reproduced it with "make check" and qemu compiled with the configure
> line attached.
>
> Later, Juan.
>
> configure --enable-trace-backends=log
> --prefix=/usr
> --sysconfdir=/etc/sysconfig/
> --audio-drv-list=pa
> --target-list=x86_64-softmmu
> --with-coroutine=ucontext
> --with-git-submodules=validate
> --enable-fdt=system
> --enable-alsa
> --enable-attr
> --enable-auth-pam
> --enable-avx2
> --enable-avx512f
> --enable-bochs
> --enable-bpf
> --enable-brlapi
> --disable-bsd-user
> --enable-bzip2
> --enable-cap-ng
> --disable-capstone
> --disable-cfi
> --disable-cfi-debug
> --enable-cloop
> --disable-cocoa
> --enable-containers
> --disable-coreaudio
> --enable-coroutine-pool
> --enable-crypto-afalg
> --enable-curl
> --enable-curses
> --enable-dbus-display
> --enable-debug-info
> --disable-debug-mutex
> --disable-debug-stack-usage
> --disable-debug-tcg
> --enable-dmg
> --disable-docs
> --disable-dsound
> --enable-fdt
> --disable-fuse
> --disable-fuse-lseek
> --disable-fuzzing
> --disable-gcov
> --enable-gettext
> --enable-gio
> --enable-glusterfs
> --enable-gnutls
> --disable-gprof
> --enable-gtk
> --disable-guest-agent
> --disable-guest-agent-msi
> --disable-hax
> --disable-hvf
> --enable-iconv
> --enable-install-blobs
> --enable-jack
> --enable-keyring
> --enable-kvm
> --enable-l2tpv3
> --enable-libdaxctl
> --enable-libiscsi
> --enable-libnfs
> --enable-libpmem
> --enable-libssh
> --enable-libudev
> --enable-libusb
> --enable-linux-aio
> --enable-linux-io-uring
> --disable-linux-user
> --enable-live-block-migration
> --disable-lto
> --disable-lzfse
> --enable-lzo
> --disable-malloc-trim
> --enable-membarrier
> --enable-module-upgrades
> --enable-modules
> --enable-mpath
> --enable-multiprocess
> --disable-netmap
> --enable-nettle
> --enable-numa
> --disable-nvmm
> --enable-opengl
> --enable-oss
> --enable-pa
> --enable-parallels
> --enable-pie
> --enable-plugins
> --enable-png
> --disable-profiler
> --enable-pvrdma
> --enable-qcow1
> --enable-qed
> --disable-qom-cast-debug
> --enable-rbd
> --enable-rdma
> --enable-replication
> --enable-rng-none
> --disable-safe-stack
> --disable-sanitizers
> --enable-stack-protector
> --enable-sdl
> --enable-sdl-image
> --enable-seccomp
> --enable-selinux
> --enable-slirp
> --enable-slirp-smbd
> --enable-smartcard
> --enable-snappy
> --enable-sparse
> --enable-spice
> --enable-spice-protocol
> --enable-system
> --enable-tcg
> --disable-tcg-interpreter
> --disable-tools
> --enable-tpm
> --disable-tsan
> --disable-u2f
> --enable-usb-redir
> --disable-user
> --disable-vde
> --enable-vdi
> --enable-vhost-crypto
> --enable-vhost-kernel
> --enable-vhost-net
> --enable-vhost-user
> --enable-vhost-user-blk-server
> --enable-vhost-vdpa
> --enable-virglrenderer
> --enable-virtfs
> --enable-virtiofsd
> --enable-vnc
> --enable-vnc-jpeg
> --enable-vnc-sasl
> --enable-vte
> --enable-vvfat
> --enable-werror
> --disable-whpx
> --enable-xen
> --enable-xen-pci-passthrough
> --enable-xkbcommon
> --enable-zstd
> --disable-gcrypt

I ran qtest with a configuration based on your environment, and it reported
two errors.

Error 1 (the same as yours):

QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=87 G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh /data00/migration/qemu-open/build/tests/qtest/virtio-net-failover --tap -k
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

stderr:
qemu-system-x86_64: /data00/migration/qemu-open/include/exec/memory.h:1114: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.
Broken pipe
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)

(test program exited with status code -6)

TAP parsing error: Too few tests run (expected 23, got 12)

Coredump backtrace:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f3af64a8535 in __GI_abort () at abort.c:79
#2  0x00007f3af64a840f in __assert_fail_base (fmt=0x7f3af6609ef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=1114, function=<optimized out>) at assert.c:92
#3  0x00007f3af64b61a2 in __GI___assert_fail (assertion=assertion@entry=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=file@entry=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=line@entry=1114, function=function@entry=0x55d9426cdce0 <__PRETTY_FUNCTION__.20039> "address_space_to_flatview") at assert.c:101
#4  0x000055d942373853 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/exec/memory.h:1112
#5  0x000055d9423746f5 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/qemu/rcu.h:126
#6  address_space_set_flatview (as=as@entry=0x55d944738648) at ../softmmu/memory.c:1029
#7  0x000055d94237ace3 in address_space_update_topology (as=0x55d944738648) at ../softmmu/memory.c:1080
#8  address_space_init (as=as@entry=0x55d944738648, root=root@entry=0x55d9447386a0, name=name@entry=0x55d9447384f0 "virtio-net-pci") at ../softmmu/memory.c:3082
#9  0x000055d942151e43 in do_pci_register_device (errp=0x7f3aef7fe850, devfn=<optimized out>, name=0x55d9444b6c40 "virtio-net-pci", pci_dev=0x55d944738410) at ../hw/pci/pci.c:1145
#10 pci_qdev_realize (qdev=0x55d944738410, errp=0x7f3aef7fe850) at ../hw/pci/pci.c:2036
#11 0x000055d942404a8f in device_set_realized (obj=<optimized out>, value=true, errp=0x7f3aef7feae0) at ../hw/core/qdev.c:510
#12 0x000055d942407e36 in property_set_bool (obj=0x55d944738410, v=<optimized out>, name=<optimized out>, opaque=0x55d9444c71d0, errp=0x7f3aef7feae0) at ../qom/object.c:2285
#13 0x000055d94240a0e3 in object_property_set (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", v=v@entry=0x55d9452f7a00, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1420
#14 0x000055d94240d15f in object_property_set_qobject (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=0x55d945306cb0, errp=errp@entry=0x7f3aef7feae0) at ../qom/qom-qobject.c:28
#15 0x000055d94240a354 in object_property_set_bool (obj=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=true, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1489
#16 0x000055d94240427c in qdev_realize (dev=<optimized out>, bus=bus@entry=0x55d945141400, errp=errp@entry=0x7f3aef7feae0) at ../hw/core/qdev.c:292
#17 0x000055d9421ef4a0 in qdev_device_add_from_qdict (opts=0x55d945309c00, from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7f3aef7feae0) at /data00/migration/qemu-open/include/hw/qdev-core.h:17
#18 0x000055d942311c85 in failover_add_primary (errp=0x7f3aef7fead8, n=0x55d9454e8530) at ../hw/net/virtio-net.c:933
#19 virtio_net_set_features (vdev=<optimized out>, features=4611687122246533156) at ../hw/net/virtio-net.c:1004
#20 0x000055d94233d248 in virtio_set_features_nocheck (vdev=vdev@entry=0x55d9454e8530, val=val@entry=4611687122246533156) at ../hw/virtio/virtio.c:2851
#21 0x000055d942342eae in virtio_load (vdev=0x55d9454e8530, f=0x55d944700de0, version_id=11) at ../hw/virtio/virtio.c:3027
#22 0x000055d942207601 in vmstate_load_state (f=f@entry=0x55d944700de0, vmsd=0x55d9429baba0 <vmstate_virtio_net>, opaque=0x55d9454e8530, version_id=11) at ../migration/vmstate.c:137
#23 0x000055d942222672 in vmstate_load (f=0x55d944700de0, se=0x55d94561b700) at ../migration/savevm.c:919
#24 0x000055d942222927 in qemu_loadvm_section_start_full (f=f@entry=0x55d944700de0, mis=0x55d9444c23e0) at ../migration/savevm.c:2503
#25 0x000055d942225cc8 in qemu_loadvm_state_main (f=f@entry=0x55d944700de0, mis=mis@entry=0x55d9444c23e0) at ../migration/savevm.c:2729
#26 0x000055d942227195 in qemu_loadvm_state (f=0x55d944700de0) at ../migration/savevm.c:2816
#27 0x000055d94221480e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:606
#28 0x000055d94257d2ab in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177
#29 0x00007f3af64d2c80 in __correctly_grouped_prefixwc (begin=0x2 <error: Cannot access memory at address 0x2>, end=0x0, thousands=0 L'\000', grouping=0x7f3af64bd8eb <__GI_raise+267> "H\213\214$\b\001") at grouping.c:171
#30 0x0000000000000000 in ?? ()


It seems that when address_space_to_flatview() is called, an mr transaction
is in progress and the rcu read lock is not held.

I need to think further about the conditions of the sanity check, or about
whether we can hold the rcu read lock before address_space_init() to solve
the problem.
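
For reference, the check that fires is the one this series adds to
address_space_to_flatview(); reconstructed from the assertion message above,
it looks roughly like this:

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * A flatview user must either hold the BQL with no transaction in
     * progress (the flatview is up to date), or be inside an RCU read
     * section (the old flatview stays valid until a grace period ends).
     */
    assert((!memory_region_transaction_in_progress() &&
            qemu_mutex_iothread_locked()) ||
           rcu_read_is_locked());
    return qatomic_rcu_read(&as->current_map);
}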


Error 2:

ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
180/180 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 146.32s killed by signal 6 SIGABRT
>>> QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=250 G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh /data00/migration/qemu-open/build/tests/qtest/migration-test --tap -k
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

qemu-system-x86_64: ../softmmu/memory.c:1094: memory_region_transaction_commit: Assertion `qemu_mutex_iothread_locked()' failed.
**
ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)

(test program exited with status code -6)

Coredump backtrace:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fed5c14d535 in __GI_abort () at abort.c:79
#2  0x00007fed5c14d40f in __assert_fail_base (fmt=0x7fed5c2aeef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561bc4152424 "qemu_mutex_iothread_locked()", file=0x561bc41ae94b "../softmmu/memory.c", line=1094, function=<optimized out>) at assert.c:92
#3  0x00007fed5c15b1a2 in __GI___assert_fail (assertion=assertion@entry=0x561bc4152424 "qemu_mutex_iothread_locked()", file=file@entry=0x561bc41ae94b "../softmmu/memory.c", line=line@entry=1094, function=function@entry=0x561bc41afca0 <__PRETTY_FUNCTION__.38746> "memory_region_transaction_commit") at assert.c:101
#4  0x0000561bc3e5a053 in memory_region_transaction_commit () at ../softmmu/memory.c:1094
#5  0x0000561bc3d07b55 in qemu_loadvm_state_main (f=f@entry=0x561bc6443aa0, mis=mis@entry=0x561bc62028a0) at ../migration/savevm.c:2789
#6  0x0000561bc3d08e46 in postcopy_ram_listen_thread (opaque=opaque@entry=0x561bc62028a0) at ../migration/savevm.c:1922
#7  0x0000561bc404b3da in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:505
#8  0x00007fed5c2f2fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9  0x00007fed5c22406f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Error 2 is related to postcopy. I don't know much about the postcopy code,
so I need some time to look at this part of the code.

Later I will send another email to discuss it with Peter.

Copying Peter.

Thanks!
Chuang Xu Feb. 17, 2023, 8:11 a.m. UTC | #7
Hi, Peter!

In my last email to Juan, I mentioned two errors.
Now I want to discuss them with you.

On 2023/2/16 下午11:41, Chuang Xu wrote:
> I ran qtest with reference to your environment, and finally reported 
> two errors.
>
> Error 1(the same as yours):
>
>  QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=87 
> G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh 
> /data00/migration/qemu-open/build/tests/qtest/virtio-net-failover 
> --tap -k
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― 
> ✀ 
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> stderr:
> qemu-system-x86_64: 
> /data00/migration/qemu-open/include/exec/memory.h:1114: 
> address_space_to_flatview: Assertion 
> `(!memory_region_transaction_in_progress() && 
> qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.
> Broken pipe
> ../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from 
> signal 6 (Aborted) (core dumped)
>
> (test program exited with status code -6)
>
> TAP parsing error: Too few tests run (expected 23, got 12)
>
> Coredump backtrace:
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f3af64a8535 in __GI_abort () at abort.c:79
> #2  0x00007f3af64a840f in __assert_fail_base (fmt=0x7f3af6609ef0 
> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55d9425f48a8 
> "(!memory_region_transaction_in_progress() && 
> qemu_mutex_iothread_locked()) || rcu_read_is_locked()",
>     file=0x55d9425f4870 
> "/data00/migration/qemu-open/include/exec/memory.h", line=1114, 
> function=<optimized out>) at assert.c:92
> #3  0x00007f3af64b61a2 in __GI___assert_fail 
> (assertion=assertion@entry=0x55d9425f48a8 
> "(!memory_region_transaction_in_progress() && 
> qemu_mutex_iothread_locked()) || rcu_read_is_locked()",
>     file=file@entry=0x55d9425f4870 
> "/data00/migration/qemu-open/include/exec/memory.h", 
> line=line@entry=1114, function=function@entry=0x55d9426cdce0 
> <__PRETTY_FUNCTION__.20039> "address_space_to_flatview") at assert.c:101
> #4  0x000055d942373853 in address_space_to_flatview 
> (as=0x55d944738648) at 
> /data00/migration/qemu-open/include/exec/memory.h:1112
> #5  0x000055d9423746f5 in address_space_to_flatview 
> (as=0x55d944738648) at /data00/migration/qemu-open/include/qemu/rcu.h:126
> #6  address_space_set_flatview (as=as@entry=0x55d944738648) at 
> ../softmmu/memory.c:1029
> #7  0x000055d94237ace3 in address_space_update_topology 
> (as=0x55d944738648) at ../softmmu/memory.c:1080
> #8  address_space_init (as=as@entry=0x55d944738648, 
> root=root@entry=0x55d9447386a0, name=name@entry=0x55d9447384f0 
> "virtio-net-pci") at ../softmmu/memory.c:3082
> #9  0x000055d942151e43 in do_pci_register_device (errp=0x7f3aef7fe850, 
> devfn=<optimized out>, name=0x55d9444b6c40 "virtio-net-pci", 
> pci_dev=0x55d944738410) at ../hw/pci/pci.c:1145
> #10 pci_qdev_realize (qdev=0x55d944738410, errp=0x7f3aef7fe850) at 
> ../hw/pci/pci.c:2036
> #11 0x000055d942404a8f in device_set_realized (obj=<optimized out>, 
> value=true, errp=0x7f3aef7feae0) at ../hw/core/qdev.c:510
> #12 0x000055d942407e36 in property_set_bool (obj=0x55d944738410, 
> v=<optimized out>, name=<optimized out>, opaque=0x55d9444c71d0, 
> errp=0x7f3aef7feae0) at ../qom/object.c:2285
> #13 0x000055d94240a0e3 in object_property_set 
> (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 
> "realized", v=v@entry=0x55d9452f7a00, errp=errp@entry=0x7f3aef7feae0) 
> at ../qom/object.c:1420
> #14 0x000055d94240d15f in object_property_set_qobject 
> (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 
> "realized", value=value@entry=0x55d945306cb0, 
> errp=errp@entry=0x7f3aef7feae0) at ../qom/qom-qobject.c:28
> #15 0x000055d94240a354 in object_property_set_bool 
> (obj=0x55d944738410, name=name@entry=0x55d942670c23 "realized", 
> value=value@entry=true, errp=errp@entry=0x7f3aef7feae0) at 
> ../qom/object.c:1489
> #16 0x000055d94240427c in qdev_realize (dev=<optimized out>, 
> bus=bus@entry=0x55d945141400, errp=errp@entry=0x7f3aef7feae0) at 
> ../hw/core/qdev.c:292
> #17 0x000055d9421ef4a0 in qdev_device_add_from_qdict 
> (opts=0x55d945309c00, from_json=<optimized out>, errp=<optimized out>, 
> errp@entry=0x7f3aef7feae0) at 
> /data00/migration/qemu-open/include/hw/qdev-core.h:17
> #18 0x000055d942311c85 in failover_add_primary (errp=0x7f3aef7fead8, 
> n=0x55d9454e8530) at ../hw/net/virtio-net.c:933
> #19 virtio_net_set_features (vdev=<optimized out>, 
> features=4611687122246533156) at ../hw/net/virtio-net.c:1004
> #20 0x000055d94233d248 in virtio_set_features_nocheck 
> (vdev=vdev@entry=0x55d9454e8530, val=val@entry=4611687122246533156) at 
> ../hw/virtio/virtio.c:2851
> #21 0x000055d942342eae in virtio_load (vdev=0x55d9454e8530, 
> f=0x55d944700de0, version_id=11) at ../hw/virtio/virtio.c:3027
> #22 0x000055d942207601 in vmstate_load_state 
> (f=f@entry=0x55d944700de0, vmsd=0x55d9429baba0 <vmstate_virtio_net>, 
> opaque=0x55d9454e8530, version_id=11) at ../migration/vmstate.c:137
> #23 0x000055d942222672 in vmstate_load (f=0x55d944700de0, 
> se=0x55d94561b700) at ../migration/savevm.c:919
> #24 0x000055d942222927 in qemu_loadvm_section_start_full 
> (f=f@entry=0x55d944700de0, mis=0x55d9444c23e0) at 
> ../migration/savevm.c:2503
> #25 0x000055d942225cc8 in qemu_loadvm_state_main 
> (f=f@entry=0x55d944700de0, mis=mis@entry=0x55d9444c23e0) at 
> ../migration/savevm.c:2729
> #26 0x000055d942227195 in qemu_loadvm_state (f=0x55d944700de0) at 
> ../migration/savevm.c:2816
> #27 0x000055d94221480e in process_incoming_migration_co 
> (opaque=<optimized out>) at ../migration/migration.c:606
> #28 0x000055d94257d2ab in coroutine_trampoline (i0=<optimized out>, 
> i1=<optimized out>) at ../util/coroutine-ucontext.c:177
> #29 0x00007f3af64d2c80 in __correctly_grouped_prefixwc (begin=0x2 
> <error: Cannot access memory at address 0x2>, end=0x0, thousands=0 
> L'\000', grouping=0x7f3af64bd8eb <__GI_raise+267> "H\213\214$\b\001") 
> at grouping.c:171
> #30 0x0000000000000000 in ?? ()
>
>
> It seems that when address_space_to_flatview() is called, there is mr 
> transaction
> in progress, and the rcu read lock is not held.
>
> I need to further consider the conditions for sanity check or whether 
> we can hold the
> rcu read lock before address_space_init() to solve the problem.
>
>
> Error 2:
>
> ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: 
> assertion failed: (g_test_timer_elapsed() < 
> MIGRATION_STATUS_WAIT_TIMEOUT) ERROR
> 180/180 qemu:qtest+qtest-x86_64 / 
> qtest-x86_64/migration-test                  ERROR 146.32s   killed by 
> signal 6 SIGABRT
>>>> QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=250 
>>>> G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh 
>>>> /data00/migration/qemu-open/build/tests/qtest/migration-test --tap -k
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― 
> ✀ 
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> qemu-system-x86_64: ../softmmu/memory.c:1094: 
> memory_region_transaction_commit: Assertion 
> `qemu_mutex_iothread_locked()' failed.
> **
> ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: 
> assertion failed: (g_test_timer_elapsed() < 
> MIGRATION_STATUS_WAIT_TIMEOUT)
> ../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from 
> signal 6 (Aborted) (core dumped)
>
> (test program exited with status code -6)
>
> Coredump backtrace:
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007fed5c14d535 in __GI_abort () at abort.c:79
> #2  0x00007fed5c14d40f in __assert_fail_base (fmt=0x7fed5c2aeef0 
> "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561bc4152424 
> "qemu_mutex_iothread_locked()", file=0x561bc41ae94b 
> "../softmmu/memory.c", line=1094, function=<optimized out>) at 
> assert.c:92
> #3  0x00007fed5c15b1a2 in __GI___assert_fail 
> (assertion=assertion@entry=0x561bc4152424 
> "qemu_mutex_iothread_locked()", file=file@entry=0x561bc41ae94b 
> "../softmmu/memory.c", line=line@entry=1094,
>     function=function@entry=0x561bc41afca0 <__PRETTY_FUNCTION__.38746> 
> "memory_region_transaction_commit") at assert.c:101
> #4  0x0000561bc3e5a053 in memory_region_transaction_commit () at 
> ../softmmu/memory.c:1094
> #5  0x0000561bc3d07b55 in qemu_loadvm_state_main 
> (f=f@entry=0x561bc6443aa0, mis=mis@entry=0x561bc62028a0) at 
> ../migration/savevm.c:2789
> #6  0x0000561bc3d08e46 in postcopy_ram_listen_thread 
> (opaque=opaque@entry=0x561bc62028a0) at ../migration/savevm.c:1922
> #7  0x0000561bc404b3da in qemu_thread_start (args=<optimized out>) at 
> ../util/qemu-thread-posix.c:505
> #8  0x00007fed5c2f2fa3 in start_thread (arg=<optimized out>) at 
> pthread_create.c:486
> #9  0x00007fed5c22406f in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
>
> Error2 is related to postcopy. I don't know much about the code of 
> postcopy.
> So I need some time to look at this part of code.
>
> And later I will send another email to discuss it with Peter.
>
> Copy Peter.
>
> Thanks!

Error 1 was triggered by our sanity check. I tried adding RCU_READ_LOCK_GUARD()
in address_space_init() and it works, but I'm not sure this code change is
appropriate. If it is not, we may need to consider a different
sanity check.

Error 2 is related to postcopy. I read the official postcopy documentation
(I hope it is the latest) and learned that two threads call
qemu_loadvm_state_main() during postcopy. The one called from the main thread
takes the BQL, while the one called from the ram_listen thread does not.
The latter's call to memory_region_transaction_commit() checks that the BQL
is held, which triggers the assertion. Creating a new function
qemu_loadvm_state_ram_listen() without memory_region_transaction_commit()
would solve this error.

I don't know whether you suggest using this patch for postcopy. If it is
applicable to postcopy, do we need to consider more details, given the
difference between how postcopy and precopy load device state?

Looking forward to your reply.

Thanks.
Peter Xu Feb. 17, 2023, 3:52 p.m. UTC | #8
Hello, Chuang,

On Fri, Feb 17, 2023 at 04:11:19PM +0800, Chuang Xu wrote:
> Error 1 was triggered by our sanity check. I try to add RCU_READ_LOCK_GUARD()
> in address_space_init() and it works. But I'm not sure if this code change is
> appropriate. If this change is not appropriate, we may need to consider other
> sanity check.

I'd suggest not adding RCU locks without a good reason.

address_space_init() is definitely a special context because the AS is
exclusively owned by the caller before it returns.  It means no RCU
protection is needed at all because no one else is touching it; nor do we
need qatomic_rcu_read() when reading.

So I suggest we directly reference current_map, even though that'll need a
rich comment:

 static void address_space_set_flatview(AddressSpace *as)
 {
-    FlatView *old_view = address_space_to_flatview(as);
+    /*
+     * NOTE: we don't use RCU flavoured of address_space_to_flatview()
+     * because we exclusively own as->current_map here: it's either during
+     * init of an address space, or during commit() with BQL held.
+     */
+    FlatView *old_view = as->current_map;

We could add an address_space_to_flatview_raw(), but since we'll directly
modify as->current_map very soon in the same function, we may not even bother.

> 
> Error 2 was related to postcopy. I read the official document of postcopy
> (I hope it is the latest) and learned that two threads will call
> qemu_loadvm_state_main() in the process of postcopy. The one called by main thread
> will take the BQL, and the one called by ram_listen thread won't take the BQL.
> The latter checks whether the BQL is held when calling memory_region_transaction_commit(),
> thus triggering the assertion. Creating a new function qemu_loadvm_state_ram_listen()
> without memory_region_transaction_commit() will solve this error.

Sounds right, because that whole qemu_loadvm_state_main() process shouldn't
load any device state or anything that requires the BQL at all; in most cases
it should only be leftover RAM states.

I think we only want to optimize precopy, not the postcopy phase. Note that
it should include the phase of transferring from precopy to postcopy too, so
it does cover postcopy, just not the "post" phase of migration. If you look,
that transition is a nested call to qemu_loadvm_state_main() with a whole
MIG_CMD_PACKAGED package, which does get covered, and which is the real
meat for postcopy on device transitions.

So in short: instead of creating qemu_loadvm_state_ram_listen(), how about
modifying your patch 3, instead of changing inside qemu_loadvm_state_main()
we can do that for qemu_loadvm_state() only (so you can wrap the begin()
and commit() over qemu_loadvm_state_main() there)?
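
Something like this, as a sketch only (the existing setup and cleanup in
qemu_loadvm_state() are elided):

int qemu_loadvm_state(QEMUFile *f)
{
    MigrationIncomingState *mis = migration_incoming_get_current();
    int ret;

    /* ... existing header checks and setup ... */

    /*
     * Wrap only this outer call: the nested qemu_loadvm_state_main() for
     * MIG_CMD_PACKAGED runs inside this scope, while the postcopy listen
     * thread's own call stays unwrapped, as it doesn't hold the BQL.
     */
    memory_region_transaction_begin();
    ret = qemu_loadvm_state_main(f, mis);
    memory_region_transaction_commit();

    /* ... existing cleanup ... */
    return ret;
}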

> 
> I don't know if you suggest using this patch in postcopy. If this patch is applicable to
> postcopy, considering the difference between how postcopy and precheck load device state,
> do we need to consider more details?

See above.  Yes I definitely hope postcopy will be covered too.

Thanks!
Chuang Xu Feb. 20, 2023, 9:53 a.m. UTC | #9
Hi, Juan

On 2023/2/16 上午3:10, Juan Quintela wrote:
> Chuang Xu <xuchuangxclwt@bytedance.com> wrote:
>> In this version:
> Hi
>
> I had to drop this.  It breaks migration of dbus-vmstate.
>
> 144/179 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover             ERROR           5.66s   killed by signal 6 SIGABRT
>>>> G_TEST_DBUS_DAEMON=/mnt/code/qemu/multifd/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=145 /scratch/qemu/multifd/x64/tests/qtest/virtio-net-failover --tap -k
> ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
> stderr:
> qemu-system-x86_64: /mnt/code/qemu/multifd/include/exec/memory.h:1112: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.
> Broken pipe
> ../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
>
> (test program exited with status code -6)
>
> TAP parsing error: Too few tests run (expected 23, got 12)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> Can you take a look at this?
>
> I reproduced it with "make check" and qemu compiled with the configure
> line attached.
>
> Later, Juan.
>
> configure --enable-trace-backends=log
> 	--prefix=/usr
> 	--sysconfdir=/etc/sysconfig/
> 	--audio-drv-list=pa
> 	--target-list=x86_64-softmmu
> 	--with-coroutine=ucontext
> 	--with-git-submodules=validate
> 	--enable-fdt=system
> 	--enable-alsa
> 	--enable-attr
> 	--enable-auth-pam
> 	--enable-avx2
> 	--enable-avx512f
> 	--enable-bochs
> 	--enable-bpf
> 	--enable-brlapi
> 	--disable-bsd-user
> 	--enable-bzip2
> 	--enable-cap-ng
> 	--disable-capstone
> 	--disable-cfi
> 	--disable-cfi-debug
> 	--enable-cloop
> 	--disable-cocoa
> 	--enable-containers
> 	--disable-coreaudio
> 	--enable-coroutine-pool
> 	--enable-crypto-afalg
> 	--enable-curl
> 	--enable-curses
> 	--enable-dbus-display
> 	--enable-debug-info
> 	--disable-debug-mutex
> 	--disable-debug-stack-usage
> 	--disable-debug-tcg
> 	--enable-dmg
> 	--disable-docs
> 	--disable-dsound
> 	--enable-fdt
> 	--disable-fuse
> 	--disable-fuse-lseek
> 	--disable-fuzzing
> 	--disable-gcov
> 	--enable-gettext
> 	--enable-gio
> 	--enable-glusterfs
> 	--enable-gnutls
> 	--disable-gprof
> 	--enable-gtk
> 	--disable-guest-agent
> 	--disable-guest-agent-msi
> 	--disable-hax
> 	--disable-hvf
> 	--enable-iconv
> 	--enable-install-blobs
> 	--enable-jack
> 	--enable-keyring
> 	--enable-kvm
> 	--enable-l2tpv3
> 	--enable-libdaxctl
> 	--enable-libiscsi
> 	--enable-libnfs
> 	--enable-libpmem
> 	--enable-libssh
> 	--enable-libudev
> 	--enable-libusb
> 	--enable-linux-aio
> 	--enable-linux-io-uring
> 	--disable-linux-user
> 	--enable-live-block-migration
> 	--disable-lto
> 	--disable-lzfse
> 	--enable-lzo
> 	--disable-malloc-trim
> 	--enable-membarrier
> 	--enable-module-upgrades
> 	--enable-modules
> 	--enable-mpath
> 	--enable-multiprocess
> 	--disable-netmap
> 	--enable-nettle
> 	--enable-numa
> 	--disable-nvmm
> 	--enable-opengl
> 	--enable-oss
> 	--enable-pa
> 	--enable-parallels
> 	--enable-pie
> 	--enable-plugins
> 	--enable-png
> 	--disable-profiler
> 	--enable-pvrdma
> 	--enable-qcow1
> 	--enable-qed
> 	--disable-qom-cast-debug
> 	--enable-rbd
> 	--enable-rdma
> 	--enable-replication
> 	--enable-rng-none
> 	--disable-safe-stack
> 	--disable-sanitizers
> 	--enable-stack-protector
> 	--enable-sdl
> 	--enable-sdl-image
> 	--enable-seccomp
> 	--enable-selinux
> 	--enable-slirp
> 	--enable-slirp-smbd
> 	--enable-smartcard
> 	--enable-snappy
> 	--enable-sparse
> 	--enable-spice
> 	--enable-spice-protocol
> 	--enable-system
> 	--enable-tcg
> 	--disable-tcg-interpreter
> 	--disable-tools
> 	--enable-tpm
> 	--disable-tsan
> 	--disable-u2f
> 	--enable-usb-redir
> 	--disable-user
> 	--disable-vde
> 	--enable-vdi
> 	--enable-vhost-crypto
> 	--enable-vhost-kernel
> 	--enable-vhost-net
> 	--enable-vhost-user
> 	--enable-vhost-user-blk-server
> 	--enable-vhost-vdpa
> 	--enable-virglrenderer
> 	--enable-virtfs
> 	--enable-virtiofsd
> 	--enable-vnc
> 	--enable-vnc-jpeg
> 	--enable-vnc-sasl
> 	--enable-vte
> 	--enable-vvfat
> 	--enable-werror
> 	--disable-whpx
> 	--enable-xen
> 	--enable-xen-pci-passthrough
> 	--enable-xkbcommon
> 	--enable-zstd
> 	--disable-gcrypt

I'll fix this error in v6.

Besides the test mentioned in your email, are there any other cases that
need to be tested? I'd like to run a full round of tests before I send v6.

Thanks!
Juan Quintela Feb. 20, 2023, 12:07 p.m. UTC | #10
Chuang Xu <xuchuangxclwt@bytedance.com> wrote:
> Hi, Juan

>> 	--target-list=x86_64-softmmu

Compile without this line; that will compile all the system emulators.
If you pass "make check" there, I would think that you have done your
part.

Thanks, Juan.
Chuang Xu Feb. 20, 2023, 1:36 p.m. UTC | #11
Hi, Peter

On 2023/2/17 下午11:52, Peter Xu wrote:
> Hello, Chuang,
>
> On Fri, Feb 17, 2023 at 04:11:19PM +0800, Chuang Xu wrote:
>> Error 1 was triggered by our sanity check. I try to add RCU_READ_LOCK_GUARD()
>> in address_space_init() and it works. But I'm not sure if this code change is
>> appropriate. If this change is not appropriate, we may need to consider other
>> sanity check.
> I'd suggest not adding RCU locks without a good reason.
>
> address_space_init() is definitely a special context because the AS is
> exclusively owned by the caller before it returns.  It means no RCU
> protection needed at all because no one else is touching it; neither do we
> need qatomic_rcu_read() when read.
>
> So I suggest we directly reference current_map, even though that'll need a
> rich comment:
>
>   static void address_space_set_flatview(AddressSpace *as)
>   {
> -    FlatView *old_view = address_space_to_flatview(as);
> +    /*
> +     * NOTE: we don't use RCU flavoured of address_space_to_flatview()
> +     * because we exclusively own as->current_map here: it's either during
> +     * init of an address space, or during commit() with BQL held.
> +     */
> +    FlatView *old_view = as->current_map;
>
> We can have address_space_to_flatview_raw() but since we'll directly modify
> as->current_map very soon in the same function, so may not even bother.

I agree with you.

But now I am facing a new problem: our sanity check is not as reliable as we thought.

My current code passes all the tests Juan mentioned in his email. But if I
configure with no options and run 'make check', my patch triggers an error in
the ahci migration test:

G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=1 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /data00/migration/qemu-open/build/tests/qtest/ahci-test --tap -k
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address
qemu-system-x86_64: Failed to load ich9_ahci:ahci
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1f.2/ich9_ahci'
qemu-system-x86_64: load of migration failed: Operation not permitted
Migration did not complete, status: failed

It seems that ahci_state_post_load() ends up calling memory_access_is_direct(),
which reads mr->ram. Because the mr commit is delayed, this value doesn't
match expectations. Here is the call chain, for reading along in the code:
->ahci_state_post_load
->ahci_cond_start_engines
->ahci_map_fis_address
->map_page
->dma_memory_map
->address_space_map
->memory_access_is_direct
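
For context, memory_access_is_direct() looks roughly like this (assuming its
usual shape in include/exec/memory.h). With the commit delayed, the address
lookup presumably resolves against a stale flatview, so the mr checked here
is not yet the guest's RAM region and the mapping fails:

static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
{
    /* Direct (non-MMIO) access requires plain RAM, or ROMD for reads. */
    if (is_write) {
        return memory_region_is_ram(mr) && !mr->readonly &&
               !mr->rom_device && !memory_region_is_ram_device(mr);
    } else {
        return (memory_region_is_ram(mr) &&
                !memory_region_is_ram_device(mr)) ||
               memory_region_is_romd(mr);
    }
}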

I think we need a memory_region_transaction_commit_force() to force pending
mr transactions to be committed while loading vmstate. The function is
designed like this:

/*
 * memory_region_transaction_commit_force() is designed to
 * force the mr transaction to be committed in the process
 * of loading vmstate.
 */
void memory_region_transaction_commit_force(void)
{
    AddressSpace *as;
    unsigned int memory_region_transaction_depth_copy =
                                    memory_region_transaction_depth;

    /*
     * Temporarily replace memory_region_transaction_depth with 0 to
     * prevent memory_region_transaction_commit_force() and
     * address_space_to_flatview() from calling each other recursively.
     */
    memory_region_transaction_depth = 0;

    assert(qemu_mutex_iothread_locked());

    if (memory_region_update_pending) {
        flatviews_reset();

        MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);

        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_set_flatview(as);
            address_space_update_ioeventfds(as);
        }
        memory_region_update_pending = false;
        ioeventfd_update_pending = false;
        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
    } else if (ioeventfd_update_pending) {
        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_update_ioeventfds(as);
        }
        ioeventfd_update_pending = false;
    }

    /* recover memory_region_transaction_depth */
    memory_region_transaction_depth = memory_region_transaction_depth_copy;
}
Now there are two options for using this function:

1. call it in address_space_to_flatview():

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Before using any flatview, check whether we're during a memory
     * region transaction. If so, force commit the memory transaction.
     */
    if (memory_region_transaction_in_progress()) {
        memory_region_transaction_commit_force();
    }
    return qatomic_rcu_read(&as->current_map);
}
2. call it before each post_load()
I prefer to use the former one, it is more general than the latter.
And with this function, the sanity check is not necessary any more.
Although we may inevitably call 
memory_region_transaction_commit_force()||||
several times, in my actual test, the number of calls to ||
|memory_region_transaction_commit_force|() is still insignificant compared
with the number of calls to|memory_region_transaction_commit()| before
optimization. As a result, This code won't affect the optimization effect,
but it can ensure reliability. |

|

>> Error 2 was related to postcopy. I read the official document of postcopy
>> (I hope it is the latest) and learned that two threads will call
>> qemu_loadvm_state_main() in the process of postcopy. The one called by main thread
>> will take the BQL, and the one called by ram_listen thread won't take the BQL.
>> The latter checks whether the BQL is held when calling memory_region_transaction_commit(),
>> thus triggering the assertion. Creating a new function qemu_loadvm_state_ram_listen()
>> without memory_region_transaction_commit() will solve this error.
> Sounds right, because the whole qemu_loadvm_state_main() process shouldn't
> load any device state or anything that requires BQL at all; in most cases
> that should be only RAM states leftovers.
>
> I think we only want to optimize precopy but not the postcopy phase. Note!
> it should include the phase when transferring precopy -> postcopy too, so
> it's covering postcopy, just not covering the "post" phase of migration -
> if you see that's a nested call to qemu_loadvm_state_main() with a whole
> MIG_CMD_PACKAGED package which is actually got covered, which is the real
> meat for postcopy on device transitions.
>
> So in short: instead of creating qemu_loadvm_state_ram_listen(), how about
> modifying your patch 3, instead of changing inside qemu_loadvm_state_main()
> we can do that for qemu_loadvm_state() only (so you can wrap the begin()
> and commit() over qemu_loadvm_state_main() there)?

In Patch v6 I'll adopt the change you suggested, Thanks!

>> I don't know if you suggest using this patch in postcopy. If this patch is applicable to
>> postcopy, considering the difference between how postcopy and precheck load device state,
>> do we need to consider more details?
> See above.  Yes I definitely hope postcopy will be covered too.
>
> Thanks!
>
Chuang Xu Feb. 21, 2023, 3:38 a.m. UTC | #12
Hi, Peter

It seems that there is a problem with the code format in my last email.
I adjusted the format and am resending it to you. Hope the format of this
email won't be wrong again..  :)

On 2023/2/17 11:52 p.m., Peter Xu wrote:
> Hello, Chuang,
>
> On Fri, Feb 17, 2023 at 04:11:19PM +0800, Chuang Xu wrote:
>> Error 1 was triggered by our sanity check. I try to add RCU_READ_LOCK_GUARD()
>> in address_space_init() and it works. But I'm not sure if this code change is
>> appropriate. If this change is not appropriate, we may need to consider other
>> sanity check.
> I'd suggest not adding RCU locks without a good reason.
>
> address_space_init() is definitely a special context because the AS is
> exclusively owned by the caller before it returns.  It means no RCU
> protection needed at all because no one else is touching it; neither do we
> need qatomic_rcu_read() when read.
>
> So I suggest we directly reference current_map, even though that'll need a
> rich comment:
>
>   static void address_space_set_flatview(AddressSpace *as)
>   {
> -    FlatView *old_view = address_space_to_flatview(as);
> +    /*
> +     * NOTE: we don't use RCU flavoured of address_space_to_flatview()
> +     * because we exclusively own as->current_map here: it's either during
> +     * init of an address space, or during commit() with BQL held.
> +     */
> +    FlatView *old_view = as->current_map;
>
> We can have address_space_to_flatview_raw() but since we'll directly modify
> as->current_map very soon in the same function, so may not even bother.
I agree with you.

But now I am facing a new problem: our sanity check is not as reliable
as we thought.

My current code passes all the tests that Juan mentioned in his email,
but if I configure with nothing and run 'make check', my patch triggers
an error in the AHCI migration test:

G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=1 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /data00/migration/qemu-open/build/tests/qtest/ahci-test --tap -k
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr:
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address
qemu-system-x86_64: Failed to load ich9_ahci:ahci
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1f.2/ich9_ahci'
qemu-system-x86_64: load of migration failed: Operation not permitted
Migration did not complete, status: failed

It seems that ahci_state_post_load() will call memory_access_is_direct()
and access mr->ram. Because the mr transaction is delayed, this value
doesn't match expectations. Here is the call chain for reference:

ahci_state_post_load
 -> ahci_cond_start_engines
  -> ahci_map_fis_address
   -> map_page
    -> dma_memory_map
     -> address_space_map
      -> memory_access_is_direct
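
For reference, the check at the end of that chain has roughly this shape
(a sketch of the helper in include/exec/memory.h, not an exact quote).
With the commit delayed, the translation still goes through the stale
FlatView, so the MemoryRegion this test sees is not the guest RAM it
should be, and the mapping attempt fails:

static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
{
    if (is_write) {
        return memory_region_is_ram(mr) && !mr->readonly &&
               !memory_region_is_ram_device(mr);
    } else {
        return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
               memory_region_is_romd(mr);
    }
}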


I think we need a memory_region_transaction_commit_force() to force pending
mr transactions to be committed while loading vmstate. The function is
designed like this:

/*
 * memory_region_transaction_commit_force() is designed to
 * force the mr transaction to be committed in the process
 * of loading vmstate.
 */
void memory_region_transaction_commit_force(void)
{
    AddressSpace *as;
    unsigned int memory_region_transaction_depth_copy =
        memory_region_transaction_depth;

    /*
     * Temporarily replace memory_region_transaction_depth with 0 to prevent
     * memory_region_transaction_commit_force() and address_space_to_flatview()
     * from calling each other recursively.
     */
    memory_region_transaction_depth = 0;

    assert(qemu_mutex_iothread_locked());

    if (memory_region_update_pending) {
        flatviews_reset();

        MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);

        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_set_flatview(as);
            address_space_update_ioeventfds(as);
        }
        memory_region_update_pending = false;
        ioeventfd_update_pending = false;
        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
    } else if (ioeventfd_update_pending) {
        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_update_ioeventfds(as);
        }
        ioeventfd_update_pending = false;
    }

    /* recover memory_region_transaction_depth */
    memory_region_transaction_depth = memory_region_transaction_depth_copy;
}

Now there are two options to use this function:
1. call it in address_space_to_flatview():

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Before using any flatview, check whether we're during a memory
     * region transaction. If so, force commit the memory region
     * transaction.
     */
    if (memory_region_transaction_in_progress()) {
        memory_region_transaction_commit_force();
    }
    return qatomic_rcu_read(&as->current_map);
}

2. call it before each post_load()
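
For comparison, a minimal sketch of what option 2 could look like (the call
site in vmstate_load_state() is only an illustration, not part of this
series):

/* hypothetical call site in vmstate_load_state() */
if (vmsd->post_load) {
    /*
     * Flush any delayed mr transaction so that callbacks like
     * ahci_state_post_load() see a current FlatView (this assumes
     * the BQL is held on this path).
     */
    memory_region_transaction_commit_force();
    ret = vmsd->post_load(opaque, version_id);
}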

I prefer the former; it is more general than the latter.
And with this function, the sanity check is no longer necessary.
Although we may inevitably call memory_region_transaction_commit_force()
several times, in my actual tests the number of calls to
memory_region_transaction_commit_force() is still insignificant compared
with the number of calls to memory_region_transaction_commit() before the
optimization. As a result, this code won't affect the optimization effect,
but it does ensure reliability.
>> Error 2 was related to postcopy. I read the official document of postcopy
>> (I hope it is the latest) and learned that two threads will call
>> qemu_loadvm_state_main() in the process of postcopy. The one called by main thread
>> will take the BQL, and the one called by ram_listen thread won't take the BQL.
>> The latter checks whether the BQL is held when calling memory_region_transaction_commit(),
>> thus triggering the assertion. Creating a new function qemu_loadvm_state_ram_listen()
>> without memory_region_transaction_commit() will solve this error.
> Sounds right, because the whole qemu_loadvm_state_main() process shouldn't
> load any device state or anything that requires BQL at all; in most cases
> that should be only RAM states leftovers.
>
> I think we only want to optimize precopy but not the postcopy phase. Note!
> it should include the phase when transferring precopy -> postcopy too, so
> it's covering postcopy, just not covering the "post" phase of migration -
> if you see that's a nested call to qemu_loadvm_state_main() with a whole
> MIG_CMD_PACKAGED package which is actually got covered, which is the real
> meat for postcopy on device transitions.
>
> So in short: instead of creating qemu_loadvm_state_ram_listen(), how about
> modifying your patch 3, instead of changing inside qemu_loadvm_state_main()
> we can do that for qemu_loadvm_state() only (so you can wrap the begin()
> and commit() over qemu_loadvm_state_main() there)?
In patch v6 I'll adopt the change you suggested. Thanks!
>> I don't know if you suggest using this patch in postcopy. If this patch is applicable to
>> postcopy, considering the difference between how postcopy and precopy load device state,
>> do we need to consider more details?
> See above.  Yes I definitely hope postcopy will be covered too.
>
> Thanks!
>
Chuang Xu Feb. 21, 2023, 8:57 a.m. UTC | #13
Hi, Peter

This email is a supplement to the previous one.

On 2023/2/21 11:38 a.m., Chuang Xu wrote:
>
> I think we need a memory_region_transaction_commit_force() to force pending
> mr transactions to be committed while loading vmstate. The function is
> designed like this:
>
> /*
>  * memory_region_transaction_commit_force() is designed to
>  * force the mr transaction to be committed in the process
>  * of loading vmstate.
>  */
> void memory_region_transaction_commit_force(void)
> {
>     AddressSpace *as;
>     unsigned int memory_region_transaction_depth_copy = 
> memory_region_transaction_depth;
>
>     /*
>      * Temporarily replace memory_region_transaction_depth with 0 to 
> prevent
>      * memory_region_transaction_commit_force() and 
> address_space_to_flatview()
>      * call each other recursively.
>      */
>     memory_region_transaction_depth = 0;
>
>     assert(qemu_mutex_iothread_locked());
>
>
>     if (memory_region_update_pending) {
>         flatviews_reset();
>
>         MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
>
>         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>             address_space_set_flatview(as);
>             address_space_update_ioeventfds(as);
>         }
>         memory_region_update_pending = false;
>         ioeventfd_update_pending = false;
>         MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
>     } else if (ioeventfd_update_pending) {
>         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>             address_space_update_ioeventfds(as);
>         }
>         ioeventfd_update_pending = false;
>     }
>
>     /* recover memory_region_transaction_depth */
>     memory_region_transaction_depth = 
> memory_region_transaction_depth_copy;
> }
>
> Now there are two options to use this function:
> 1. call it in address_space_to_flatview():
>
> static inline FlatView *address_space_to_flatview(AddressSpace *as)
> {
>     /*
>      * Before using any flatview, check whether we're during a memory
>      * region transaction. If so, force commit the memory region 
> transaction.
>      */
>     if (memory_region_transaction_in_progress())

Here we need to add a check that the BQL is held, otherwise a thread
running here without the BQL will trigger the assertion in
memory_region_transaction_commit_force().
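
Something along these lines, as a sketch (assuming the forced commit is
only legal with the BQL held):

    if (qemu_mutex_iothread_locked() &&
        memory_region_transaction_in_progress()) {
        memory_region_transaction_commit_force();
    }
    return qatomic_rcu_read(&as->current_map);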

I'm not sure whether this condition is sufficient. At least for the mr
accesses in the load thread it is (because the load thread holds the BQL
when accessing mr), but for other cases it seems that we will return to
our discussion on the sanity check..

Another point I worry about is whether the number of mr transaction commits
increases in some other scenarios because of this forced commit. So far,
I haven't seen a plain virtual machine lifecycle trigger the forced commit.
I did a little test: replace commit_force() with abort() and run qtest.
Almost all the errors I can see are related to migration..

> memory_region_transaction_commit_force();
>     return qatomic_rcu_read(&as->current_map);
> }
>
> 2. call it before each post_load()
>
> I prefer the former; it is more general than the latter.
> And with this function, the sanity check is no longer necessary.
> Although we may inevitably call memory_region_transaction_commit_force()
> several times, in my actual tests the number of calls to
> memory_region_transaction_commit_force() is still insignificant compared
> with the number of calls to memory_region_transaction_commit() before the
> optimization. As a result, this code won't affect the optimization effect,
> but it does ensure reliability.
>
Looking forward to your opinion. Thanks!
Peter Xu Feb. 21, 2023, 8:36 p.m. UTC | #14
On Tue, Feb 21, 2023 at 04:57:30PM +0800, Chuang Xu wrote:
> Hi, Peter

Hi, Chuang,

> 
> This email is a supplement to the previous one.

AFAICT I mostly agree with you, but see a few more comments below.

> 
> On 2023/2/21 11:38 a.m., Chuang Xu wrote:
> > 
> > I think we need a memory_region_transaction_commit_force() to force pending
> > mr transactions to be committed while loading vmstate. The function is
> > designed like this:
> > 
> > /*
> >  * memory_region_transaction_commit_force() is designed to
> >  * force the mr transaction to be committed in the process
> >  * of loading vmstate.
> >  */
> > void memory_region_transaction_commit_force(void)

I would call this memory_region_transaction_do_commit(), and I don't think
the manipulation of memory_region_transaction_depth is needed here, since we
don't release the BQL during the whole process; changing that depth isn't
needed at all to me.

So, I think we can...

> > {
> >     AddressSpace *as;
> >     unsigned int memory_region_transaction_depth_copy =
> > memory_region_transaction_depth;
> > 
> >     /*
> >      * Temporarily replace memory_region_transaction_depth with 0 to
> > prevent
> >      * memory_region_transaction_commit_force() and
> > address_space_to_flatview()
> >      * call each other recursively.
> >      */
> >     memory_region_transaction_depth = 0;

... drop here ...

> > 
> >     assert(qemu_mutex_iothread_locked());
> > 
> > 
> >     if (memory_region_update_pending) {
> >         flatviews_reset();
> > 
> >         MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
> > 
> >         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
> >             address_space_set_flatview(as);
> >             address_space_update_ioeventfds(as);
> >         }
> >         memory_region_update_pending = false;
> >         ioeventfd_update_pending = false;
> >         MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
> >     } else if (ioeventfd_update_pending) {
> >         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
> >             address_space_update_ioeventfds(as);
> >         }
> >         ioeventfd_update_pending = false;
> >     }
> > 
> >     /* recover memory_region_transaction_depth */
> >     memory_region_transaction_depth =
> > memory_region_transaction_depth_copy;

... drop here ...

> > }

... then call this new memory_region_transaction_do_commit() in
memory_region_transaction_commit().

void memory_region_transaction_commit(void)
{
    assert(memory_region_transaction_depth);
    --memory_region_transaction_depth;
    memory_region_transaction_do_commit();
}

Then...

> > 
> > Now there are two options to use this function:
> > 1. call it in address_space_to_flatview():
> > 
> > static inline FlatView *address_space_to_flatview(AddressSpace *as)
> > {
> >     /*
> >      * Before using any flatview, check whether we're during a memory
> >      * region transaction. If so, force commit the memory region
> > transaction.
> >      */
> >     if (memory_region_transaction_in_progress())
> 
> Here we need to add a check that the BQL is held, otherwise a thread
> running here without the BQL will trigger the assertion in
> memory_region_transaction_commit_force().
> 
> I'm not sure whether this condition is sufficient. At least for the mr
> accesses in the load thread it is (because the load thread holds the BQL
> when accessing mr), but for other cases it seems that we will return to
> our discussion on the sanity check..

Yes, I think the sanity checks are actually good stuff.

I would think it's nice to implement address_space_to_flatview() like this.  I
guess we don't have a use case to fetch the flatview during a memory
update procedure, but I also don't see why it can't be supported.

/* Need to be called with either BQL or RCU read lock held */
static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    if (qemu_mutex_iothread_locked()) {
        /* We exclusively own the flatview now.. */
        if (memory_region_transaction_in_progress()) {
            /*
             * Fetch the flatview within a transaction in-progress, it
             * means current_map may not be the latest, we need to update
             * immediately to make sure the caller won't see obsolete
             * mapping.
             */
            memory_region_transaction_do_commit();
        }

        /* No further protection needed to access current_map */
        return as->current_map;
    }

    /* Otherwise we must have had the RCU lock or something went wrong */
    assert(rcu_read_is_locked());

    return qatomic_rcu_read(&as->current_map);
}
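
As a usage illustration (hypothetical callers, not from the series):

/* BQL context, e.g. device emulation or the commit path itself */
assert(qemu_mutex_iothread_locked());
FlatView *fv = address_space_to_flatview(as);

/* non-BQL context, e.g. a helper thread: RCU read lock required */
WITH_RCU_READ_LOCK_GUARD() {
    FlatView *fv2 = address_space_to_flatview(as);
    /* use fv2 only inside the guarded section */
}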

Then IIUC everything should start to run normal again, with the same hope
that it will keep the benefit of your whole idea.  Does that look sane to
you?

> 
> Another point I worry about is whether the number of mr transaction commits
> increases in some other scenarios because of this forced commit. So far,
> I haven't seen a plain virtual machine lifecycle trigger the forced commit.

It won't be so bad; with the help of the existing memory_region_update_pending
and ioeventfd_update_pending flags, the commit() will be a no-op if both are unset.

> I did a little test: replace commit_force() with abort() and run qtest.
> Almost all the errors I can see are related to migration..
> 
> > memory_region_transaction_commit_force();
> >     return qatomic_rcu_read(&as->current_map);
> > }
> > 
> > 2. call it before each post_load()
> > 
> > I prefer the former; it is more general than the latter.
> > And with this function, the sanity check is no longer necessary.
> > Although we may inevitably call memory_region_transaction_commit_force()
> > several times, in my actual tests the number of calls to
> > memory_region_transaction_commit_force() is still insignificant compared
> > with the number of calls to memory_region_transaction_commit() before the
> > optimization. As a result, this code won't affect the optimization effect,
> > but it does ensure reliability.

Yes, the 2nd option doesn't sound appealing to me, because we have so many
post_load()s that it could quickly defeat your original purpose, I think.

Thanks,
Chuang Xu Feb. 22, 2023, 6:27 a.m. UTC | #15
Hi, Peter

On 2023/2/22 4:36 a.m., Peter Xu wrote:
>> On 2023/2/21 11:38 a.m., Chuang Xu wrote:
>>> I think we need a memory_region_transaction_commit_force() to force pending
>>> mr transactions to be committed while loading vmstate. The function is
>>> designed like this:
>>>
>>> /*
>>>   * memory_region_transaction_commit_force() is designed to
>>>   * force the mr transaction to be committed in the process
>>>   * of loading vmstate.
>>>   */
>>> void memory_region_transaction_commit_force(void)
> I would call this memory_region_transaction_do_commit(), and I don't think
> the manipulation of memory_region_transaction_depth is needed here since we
> don't release BQL during the whole process, so changing that depth isn't
> needed at all to me.
>
> So, I think we can...
>
>>> {
>>>      AddressSpace *as;
>>>      unsigned int memory_region_transaction_depth_copy =
>>> memory_region_transaction_depth;
>>>
>>>      /*
>>>       * Temporarily replace memory_region_transaction_depth with 0 to
>>> prevent
>>>       * memory_region_transaction_commit_force() and
>>> address_space_to_flatview()
>>>       * call each other recursively.
>>>       */
>>>      memory_region_transaction_depth = 0;
> ... drop here ...

Note that, as I mentioned in the comment, we temporarily replace this value
to prevent commit() and address_space_to_flatview() from calling each other
recursively and eventually overflowing the stack.

Part of the coredump call stack is attached here:

#8  0x0000558de5a998b5 in memory_region_transaction_do_commit () at ../softmmu/memory.c:1131
#9  0x0000558de5a99dfd in address_space_to_flatview (as=0x558de6516060 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1130
#10 address_space_get_flatview (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:810
#11 0x0000558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:836
#12 0x0000558de5a99900 in memory_region_transaction_do_commit () at ../softmmu/memory.c:1137
#13 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#14 0x0000558de5a99dfd in address_space_to_flatview (as=0x558de6516060 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1130
#15 address_space_get_flatview (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:810
#16 0x0000558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:836
#17 0x0000558de5a99900 in memory_region_transaction_do_commit () at ../softmmu/memory.c:1137
#18 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#19 0x0000558de5a99dfd in address_space_to_flatview (as=0x558de6516060 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1130
#20 address_space_get_flatview (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:810
#21 0x0000558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:836
#22 0x0000558de5a99900 in memory_region_transaction_do_commit () at ../softmmu/memory.c:1137
#23 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#24 0x0000558de5a99dfd in address_space_to_flatview (as=0x558de6516060 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1130

So I think we need to change the depth here.
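
For context, memory_region_transaction_in_progress() here presumably reduces
to a depth check, which is exactly why zeroing the depth breaks the cycle (a
sketch, not the literal patch):

static bool memory_region_transaction_in_progress(void)
{
    /* stays true throughout commit unless the depth is forced to 0 above */
    return memory_region_transaction_depth != 0;
}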

>
>>>      assert(qemu_mutex_iothread_locked());
>>>
>>>
>>>      if (memory_region_update_pending) {
>>>          flatviews_reset();
>>>
>>>          MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
>>>
>>>          QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>>>              address_space_set_flatview(as);
>>>              address_space_update_ioeventfds(as);
>>>          }
>>>          memory_region_update_pending = false;
>>>          ioeventfd_update_pending = false;
>>>          MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
>>>      } else if (ioeventfd_update_pending) {
>>>          QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>>>              address_space_update_ioeventfds(as);
>>>          }
>>>          ioeventfd_update_pending = false;
>>>      }
>>>
>>>      /* recover memory_region_transaction_depth */
>>>      memory_region_transaction_depth =
>>> memory_region_transaction_depth_copy;
> ... drop here ...
>
>>> }
> ... then call this new memory_region_transaction_do_commit() in
> memory_region_transaction_commit().
>
> void memory_region_transaction_commit(void)
> {
>      assert(memory_region_transaction_depth);
>      --memory_region_transaction_depth;
>      memory_region_transaction_do_commit();
> }
>
> Then...
>
>>> Now there are two options to use this function:
>>> 1. call it in address_space_to_flatview():
>>>
>>> static inline FlatView *address_space_to_flatview(AddressSpace *as)
>>> {
>>>      /*
>>>       * Before using any flatview, check whether we're during a memory
>>>       * region transaction. If so, force commit the memory region
>>> transaction.
>>>       */
>>>      if (memory_region_transaction_in_progress())
>> Here we need to add a check that the BQL is held, otherwise a thread
>> running here without the BQL will trigger the assertion in
>> memory_region_transaction_commit_force().
>>
>> I'm not sure whether this condition is sufficient. At least for the mr
>> accesses in the load thread it is (because the load thread holds the BQL
>> when accessing mr), but for other cases it seems that we will return to
>> our discussion on the sanity check..
> Yes, I think the sanity checks are actually good stuff.
>
> I would think it's nice to impl address_space_to_flatview() like this.  I
> guess we don't have an use case to fetch the flatview during a memory
> update procedure, but I also don't see why it can't be supported.
>
> /* Need to be called with either BQL or RCU read lock held */
> static inline FlatView *address_space_to_flatview(AddressSpace *as)
> {
>      if (qemu_mutex_iothread_locked()) {
>          /* We exclusively own the flatview now.. */
>          if (memory_region_transaction_in_progress()) {
>              /*
>               * Fetch the flatview within a transaction in-progress, it
>               * means current_map may not be the latest, we need to update
>               * immediately to make sure the caller won't see obsolete
>               * mapping.
>               */
>              memory_region_transaction_do_commit();
>          }
>
>          /* No further protection needed to access current_map */
>          return as->current_map;
>      }
>
>      /* Otherwise we must have had the RCU lock or something went wrong */
>      assert(rcu_read_is_locked());
>
>      return qatomic_rcu_read(&as->current_map);
> }
>
> Then IIUC everything should start to run normal again, with the same hope
> that it will keep the benefit of your whole idea.  Does that look sane to
> you?
>
Yes, I mostly agree with you, except for the part about transaction_depth.

Thanks!
Peter Xu Feb. 22, 2023, 3:57 p.m. UTC | #16
On Wed, Feb 22, 2023 at 02:27:55PM +0800, Chuang Xu wrote:
> Hi, Peter

Hi, Chuang,

> Note that, as I mentioned in the comment, we temporarily replace this value
> to prevent commit() and address_space_to_flatview() from calling each other
> recursively and eventually overflowing the stack.

Sorry to have overlooked that part.  IMHO here it's not about the depth,
but rather that we don't even need any RCU protection when updating
ioeventfds because we exclusively own the FlatView* too there.

I wanted to describe what I had in mind but instead I figured a patch may
be needed to be accurate (with some cleanups alongside), hence attached.
IIUC it can work with what I suggested before without fiddling with depth.
Please have a look.  The patch should apply cleanly to master branch so if
it works it can be your 1st patch too.

PS: Paolo - I know I asked this before, but it'll be good to have your
review comment on anything above.

Thanks,
Chuang Xu Feb. 23, 2023, 3:28 a.m. UTC | #17
Hi, Peter

On 2023/2/22 11:57 p.m., Peter Xu wrote:
> On Wed, Feb 22, 2023 at 02:27:55PM +0800, Chuang Xu wrote:
>> Hi, Peter
> Hi, Chuang,
>
>> Note that, as I mentioned in the comment, we temporarily replace this value
>> to prevent commit() and address_space_to_flatview() from calling each other
>> recursively and eventually overflowing the stack.
> Sorry to have overlooked that part.  IMHO here it's not about the depth,
> but rather that we don't even need any RCU protection when updating
> ioeventfds because we exclusively own the FlatView* too there.
>
> I wanted to describe what I had in mind but instead I figured a patch may
> be needed to be accurate (with some cleanups alongside), hence attached.
> IIUC it can work with what I suggested before without fiddling with depth.
> Please have a look.  The patch should apply cleanly to master branch so if
> it works it can be your 1st patch too.
>
> PS: Paolo - I know I asked this before, but it'll be good to have your
> review comment on anything above.
>
> Thanks,
>
Here are two problems I can see:

1. It is inappropriate to use assert(qemu_mutex_iothread_locked()
&& !memory_region_update_pending) in update_ioeventfds().

For example, when entering commit(), if memory_region_update_pending
is true, the assertion will be triggered immediately when update_ioeventfds
is called.

2. The problem of stack overflow has not been solved. There are
too many places where address_space_to_flatview() may be called.

Here is another coredump stack:

#8  0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
#9  0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
#10 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d6837390,
     adding=adding@entry=false) at ../softmmu/memory.c:955
#11 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
#12 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
#13 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
#14 0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
#15 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d67f8d90,
     adding=adding@entry=false) at ../softmmu/memory.c:955
#16 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
#17 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
#18 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
#19 0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
#20 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d67ba790,
     adding=adding@entry=false) at ../softmmu/memory.c:955
#21 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
#22 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
#23 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154

And this may not be the only case where stack overflow occurs.
Thus, changing the depth value is the safest way, I think.

Thanks!
Peter Xu Feb. 25, 2023, 3:32 p.m. UTC | #18
On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:
> Hi, Peter

Hi, Chuang,

> 
> On 2023/2/22 11:57 p.m., Peter Xu wrote:
> > On Wed, Feb 22, 2023 at 02:27:55PM +0800, Chuang Xu wrote:
> > > Hi, Peter
> > Hi, Chuang,
> > 
> > > Note that, as I mentioned in the comment, we temporarily replace this value
> > > to prevent commit() and address_space_to_flatview() from calling each other
> > > recursively and eventually overflowing the stack.
> > Sorry to have overlooked that part.  IMHO here it's not about the depth,
> > but rather that we don't even need any RCU protection when updating
> > ioeventfds because we exclusively own the FlatView* too there.
> > 
> > I wanted to describe what I had in mind but instead I figured a patch may
> > be needed to be accurate (with some cleanups alongside), hence attached.
> > IIUC it can work with what I suggested before without fiddling with depth.
> > Please have a look.  The patch should apply cleanly to master branch so if
> > it works it can be your 1st patch too.
> > 
> > PS: Paolo - I know I asked this before, but it'll be good to have your
> > review comment on anything above.
> > 
> > Thanks,
> > 
> Here are two problems I can see:
> 
> 1. It is inappropriate to use assert(qemu_mutex_iothread_locked()
> && !memory_region_update_pending) in update_ioeventfds().
> 
> For example, when entering commit(), if memory_region_update_pending
> is true, the assertion will be triggered immediately when update_ioeventfds
> is called.

I don't see why it's wrong, and that's exactly what I wanted to guarantee:
if memory_region_update_pending==true when updating ioeventfds, we may
have some serious problem.

For this, I split my patch into two parts and I put this change into the
last patch.  See the attachment.  If you worry about this, you can e.g. try
applying only patch 1 for now, but I still don't see how it could trigger.

> 
> 2. The problem of stack overflow has not been solved. There are
> too many places where address_space_to_flatview() may be called.
> 
> Here are another coredump stack:
> 
> #8  0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
> #9  0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
> #10 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d6837390,
>     adding=adding@entry=false) at ../softmmu/memory.c:955
> #11 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
> #12 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
> #13 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
> #14 0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
> #15 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d67f8d90,
>     adding=adding@entry=false) at ../softmmu/memory.c:955
> #16 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
> #17 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
> #18 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
> #19 0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
> #20 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d67ba790,
>     adding=adding@entry=false) at ../softmmu/memory.c:955
> #21 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
> #22 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
> #23 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
> 
> And this may not be the only case where stack overflow occurs.
> Thus, changing the depth value is the safest way I think.

I really think changing depth is a hack... :(

Here IMHO the problem is that we have other calls to
address_space_to_flatview() during commit() that we missed, and that caused
the issue.  There aren't a lot of those, and sorry for missing yet another one.

So let me try one more time on this (with patch 1; I think I've got two
places missed in the previous attempt).  Let's see how it goes.

We may want to add a tracepoint or have some way to know when an enforced
commit() is triggered during the vm load phase.  I just noticed that you
worried about having enforced commit() too often beating the original
purpose, and I think yes, that's something to worry about.
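
Something as small as this would do (hypothetical names, following the
usual trace-events convention):

# softmmu/trace-events (hypothetical entry)
memory_region_transaction_do_commit_forced(void) ""

/* call it from the enforced-commit path in softmmu/memory.c */
trace_memory_region_transaction_do_commit_forced();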

I still believe the AHCI condition is rare (since e.g. you passed all of
Juan's tests even before we had this "do_commit" stuff), but in short: I
think it would still be interesting if you could capture all the users of
the enforced commit(), like the AHCI case you quoted before, and list them
in the cover letter of your next post (along with new perf measurements, to
make sure the worry doesn't hold that too many enforced commits would
invalidate the whole idea).

When I was digging this out, I also found an RCU issue when using
address_space_to_flatview() (when the BQL was not taken), only in the
memory_region_clear_dirty_bitmap() path.  I hope the new assertion
(rcu_read_is_locked()) won't trigger on some of your use cases already,
but anyway feel free to ignore this whole paragraph for now unless you see
some assert(rcu_read_is_locked()) being triggered.  I plan to post an RFC
for fixing the RCU usage and I'll have you copied just for reference (it
may be a separate issue from what you're working on).

Thanks,
Chuang Xu Feb. 27, 2023, 1:19 p.m. UTC | #19
Hi, Peter

On 2023/2/25 11:32 p.m., Peter Xu wrote:
> On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:
>> Hi, Peter
> Hi, Chuang,
>
>> On 2023/2/22 11:57 p.m., Peter Xu wrote:
> I don't see why it's wrong, and that's exactly what I wanted to guarantee:
> if memory_region_update_pending==true when updating ioeventfds, we may
> have some serious problem.
>
> For this, I split my patch into two parts and I put this change into the
> last patch.  See the attachment.  If you worry about this, you can e.g. try
> applying only patch 1 for now, but I still don't see how it could trigger.

Sorry, I made some mistakes in my previous understanding of the code. However,
if this assertion is added, it does indeed trigger a coredump in qtest with
my patches. Here is the coredump (it is the only one I found):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fa5e7b17535 in __GI_abort () at abort.c:79
#2  0x00007fa5e7b1740f in __assert_fail_base (fmt=0x7fa5e7c78ef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
     assertion=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending", file=0x56305fc028cb "../softmmu/memory.c", line=855, function=<optimized out>) at assert.c:92
#3  0x00007fa5e7b251a2 in __GI___assert_fail (assertion=assertion@entry=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending",
     file=file@entry=0x56305fc028cb "../softmmu/memory.c", line=line@entry=855, function=function@entry=0x56305fc03cc0 <__PRETTY_FUNCTION__.38596> "address_space_update_ioeventfds")
     at assert.c:101
#4  0x000056305f8a9194 in address_space_update_ioeventfds (as=as@entry=0x563061293648) at ../softmmu/memory.c:855
#5  0x000056305f8afe4f in address_space_init (as=as@entry=0x563061293648, root=root@entry=0x5630612936a0, name=name@entry=0x5630612934f0 "virtio-net-pci") at ../softmmu/memory.c:3172
#6  0x000056305f686e43 in do_pci_register_device (errp=0x7fa5e4f39850, devfn=<optimized out>, name=0x563061011c40 "virtio-net-pci", pci_dev=0x563061293410) at ../hw/pci/pci.c:1145
#7  pci_qdev_realize (qdev=0x563061293410, errp=0x7fa5e4f39850) at ../hw/pci/pci.c:2036
#8  0x000056305f939daf in device_set_realized (obj=<optimized out>, value=true, errp=0x7fa5e4f39ae0) at ../hw/core/qdev.c:510
#9  0x000056305f93d156 in property_set_bool (obj=0x563061293410, v=<optimized out>, name=<optimized out>, opaque=0x5630610221d0, errp=0x7fa5e4f39ae0) at ../qom/object.c:2285
#10 0x000056305f93f403 in object_property_set (obj=obj@entry=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", v=v@entry=0x563061e52a00, errp=errp@entry=0x7fa5e4f39ae0)
     at ../qom/object.c:1420
#11 0x000056305f94247f in object_property_set_qobject (obj=obj@entry=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", value=value@entry=0x563061e61cb0,
     errp=errp@entry=0x7fa5e4f39ae0) at ../qom/qom-qobject.c:28
#12 0x000056305f93f674 in object_property_set_bool (obj=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", value=value@entry=true, errp=errp@entry=0x7fa5e4f39ae0)
     at ../qom/object.c:1489
#13 0x000056305f93959c in qdev_realize (dev=<optimized out>, bus=bus@entry=0x563061c9c400, errp=errp@entry=0x7fa5e4f39ae0) at ../hw/core/qdev.c:292
#14 0x000056305f7244a0 in qdev_device_add_from_qdict (opts=0x563061e64c00, from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7fa5e4f39ae0)
     at /data00/migration/qemu-open/include/hw/qdev-core.h:17
#15 0x000056305f846c75 in failover_add_primary (errp=0x7fa5e4f39ad8, n=0x563062043530) at ../hw/net/virtio-net.c:933
#16 virtio_net_set_features (vdev=<optimized out>, features=4611687122246533156) at ../hw/net/virtio-net.c:1004
#17 0x000056305f872238 in virtio_set_features_nocheck (vdev=vdev@entry=0x563062043530, val=val@entry=4611687122246533156) at ../hw/virtio/virtio.c:2851
#18 0x000056305f877e9e in virtio_load (vdev=0x563062043530, f=0x56306125bde0, version_id=11) at ../hw/virtio/virtio.c:3027
#19 0x000056305f73c601 in vmstate_load_state (f=f@entry=0x56306125bde0, vmsd=0x56305fef16c0 <vmstate_virtio_net>, opaque=0x563062043530, version_id=11) at ../migration/vmstate.c:137
#20 0x000056305f757672 in vmstate_load (f=0x56306125bde0, se=0x563062176700) at ../migration/savevm.c:919
#21 0x000056305f757905 in qemu_loadvm_section_start_full (f=f@entry=0x56306125bde0, mis=0x56306101d3e0) at ../migration/savevm.c:2503
#22 0x000056305f75aca8 in qemu_loadvm_state_main (f=f@entry=0x56306125bde0, mis=mis@entry=0x56306101d3e0) at ../migration/savevm.c:2719
#23 0x000056305f75c17a in qemu_loadvm_state (f=0x56306125bde0) at ../migration/savevm.c:2809
#24 0x000056305f74980e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:606
#25 0x000056305fab25cb in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177

> I really think changing depth is a hack... :(
>
> Here IMHO the problem is that we have other calls to
> address_space_to_flatview() during commit() that we missed, and that caused
> the issue.  There aren't a lot of those, and sorry for missing yet another one.
>
> So let me try one more time on this (with patch 1; I think I've got two
> places missed in the previous attempt).  Let's see how it goes.
>
> We may want to add a tracepoint or have some way to know when an enforced
> commit() is triggered during the vm load phase.  I just noticed that you
> worried about having enforced commit() too often beating the original
> purpose, and I think yes, that's something to worry about.
>
> I still believe the AHCI condition is rare (since e.g. you passed all of
> Juan's tests even before we had this "do_commit" stuff), but in short: I
> think it would still be interesting if you could capture all the users of
> the enforced commit(), like the AHCI case you quoted before, and list them
> in the cover letter of your next post (along with new perf measurements, to
> make sure the worry doesn't hold that too many enforced commits would
> invalidate the whole idea).
>
> When I was digging this out, I also found an RCU issue when using
> address_space_to_flatview() (when the BQL was not taken), only in the
> memory_region_clear_dirty_bitmap() path.  I hope the new assertion
> (rcu_read_is_locked()) won't trigger on some of your use cases already,
> but anyway feel free to ignore this whole paragraph for now unless you see
> some assert(rcu_read_is_locked()) being triggered.  I plan to post an RFC
> for fixing the RCU usage and I'll have you copied just for reference (it
> may be a separate issue from what you're working on).
>
> Thanks,
>
Unfortunately, there is another case of stack overflow...

#8  memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
#9  0x00005610e96d3665 in address_space_to_flatview (as=0x5610e9ece820 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1119
#10 address_space_get_flatview (as=0x5610e9ece820 <address_space_memory>) at ../softmmu/memory.c:818
#11 0x00005610e96dfa14 in address_space_cache_init (cache=cache@entry=0x56111cdee410, as=<optimized out>, addr=addr@entry=1048576, len=len@entry=4096, is_write=false)
     at ../softmmu/physmem.c:3350
#12 0x00005610e96a0928 in virtio_init_region_cache (vdev=vdev@entry=0x5610eb72bf10, n=n@entry=0) at ../hw/virtio/virtio.c:247
#13 0x00005610e96a0db4 in virtio_memory_listener_commit (listener=0x5610eb72bff8) at ../hw/virtio/virtio.c:3592
#14 0x00005610e96d2e5e in address_space_update_flatviews_all () at ../softmmu/memory.c:1126
#15 memory_region_transaction_do_commit () at ../softmmu/memory.c:1145

Fortunately, this is probably the last one (at least according to the qtest results).
Peter Xu Feb. 27, 2023, 8:56 p.m. UTC | #20
On Mon, Feb 27, 2023 at 09:19:39PM +0800, Chuang Xu wrote:
> Hi, Peter

Hi, Chuang,

> 
> On 2023/2/25 11:32 p.m., Peter Xu wrote:
> > On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:
> > > Hi, Peter
> > Hi, Chuang,
> > 
> > On 2023/2/22 11:57 p.m., Peter Xu wrote:
> > I don't see why it's wrong, and that's exactly what I wanted to guarantee:
> > if memory_region_update_pending==true when updating ioeventfds, we may
> > have some serious problem.
> > 
> > For this, I split my patch into two parts and I put this change into the
> > last patch.  See the attachment.  If you worry about this, you can e.g. try
> > applying only patch 1 for now, but I still don't see how it could trigger.
> 
> Sorry, I made some mistakes in my previous understanding of the code. However,
> if this assertion is added, it does indeed trigger a coredump in qtest with
> my patches. Here is the coredump (it is the only one I found):
> 
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007fa5e7b17535 in __GI_abort () at abort.c:79
> #2  0x00007fa5e7b1740f in __assert_fail_base (fmt=0x7fa5e7c78ef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
>     assertion=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending", file=0x56305fc028cb "../softmmu/memory.c", line=855, function=<optimized out>) at assert.c:92
> #3  0x00007fa5e7b251a2 in __GI___assert_fail (assertion=assertion@entry=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending",
>     file=file@entry=0x56305fc028cb "../softmmu/memory.c", line=line@entry=855, function=function@entry=0x56305fc03cc0 <__PRETTY_FUNCTION__.38596> "address_space_update_ioeventfds")
>     at assert.c:101
> #4  0x000056305f8a9194 in address_space_update_ioeventfds (as=as@entry=0x563061293648) at ../softmmu/memory.c:855
> #5  0x000056305f8afe4f in address_space_init (as=as@entry=0x563061293648, root=root@entry=0x5630612936a0, name=name@entry=0x5630612934f0 "virtio-net-pci") at ../softmmu/memory.c:3172
> #6  0x000056305f686e43 in do_pci_register_device (errp=0x7fa5e4f39850, devfn=<optimized out>, name=0x563061011c40 "virtio-net-pci", pci_dev=0x563061293410) at ../hw/pci/pci.c:1145
> #7  pci_qdev_realize (qdev=0x563061293410, errp=0x7fa5e4f39850) at ../hw/pci/pci.c:2036
> #8  0x000056305f939daf in device_set_realized (obj=<optimized out>, value=true, errp=0x7fa5e4f39ae0) at ../hw/core/qdev.c:510
> #9  0x000056305f93d156 in property_set_bool (obj=0x563061293410, v=<optimized out>, name=<optimized out>, opaque=0x5630610221d0, errp=0x7fa5e4f39ae0) at ../qom/object.c:2285
> #10 0x000056305f93f403 in object_property_set (obj=obj@entry=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", v=v@entry=0x563061e52a00, errp=errp@entry=0x7fa5e4f39ae0)
>     at ../qom/object.c:1420
> #11 0x000056305f94247f in object_property_set_qobject (obj=obj@entry=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", value=value@entry=0x563061e61cb0,
>     errp=errp@entry=0x7fa5e4f39ae0) at ../qom/qom-qobject.c:28
> #12 0x000056305f93f674 in object_property_set_bool (obj=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", value=value@entry=true, errp=errp@entry=0x7fa5e4f39ae0)
>     at ../qom/object.c:1489
> #13 0x000056305f93959c in qdev_realize (dev=<optimized out>, bus=bus@entry=0x563061c9c400, errp=errp@entry=0x7fa5e4f39ae0) at ../hw/core/qdev.c:292
> #14 0x000056305f7244a0 in qdev_device_add_from_qdict (opts=0x563061e64c00, from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7fa5e4f39ae0)
>     at /data00/migration/qemu-open/include/hw/qdev-core.h:17
> #15 0x000056305f846c75 in failover_add_primary (errp=0x7fa5e4f39ad8, n=0x563062043530) at ../hw/net/virtio-net.c:933
> #16 virtio_net_set_features (vdev=<optimized out>, features=4611687122246533156) at ../hw/net/virtio-net.c:1004
> #17 0x000056305f872238 in virtio_set_features_nocheck (vdev=vdev@entry=0x563062043530, val=val@entry=4611687122246533156) at ../hw/virtio/virtio.c:2851
> #18 0x000056305f877e9e in virtio_load (vdev=0x563062043530, f=0x56306125bde0, version_id=11) at ../hw/virtio/virtio.c:3027
> #19 0x000056305f73c601 in vmstate_load_state (f=f@entry=0x56306125bde0, vmsd=0x56305fef16c0 <vmstate_virtio_net>, opaque=0x563062043530, version_id=11) at ../migration/vmstate.c:137
> #20 0x000056305f757672 in vmstate_load (f=0x56306125bde0, se=0x563062176700) at ../migration/savevm.c:919
> #21 0x000056305f757905 in qemu_loadvm_section_start_full (f=f@entry=0x56306125bde0, mis=0x56306101d3e0) at ../migration/savevm.c:2503
> #22 0x000056305f75aca8 in qemu_loadvm_state_main (f=f@entry=0x56306125bde0, mis=mis@entry=0x56306101d3e0) at ../migration/savevm.c:2719
> #23 0x000056305f75c17a in qemu_loadvm_state (f=0x56306125bde0) at ../migration/savevm.c:2809
> #24 0x000056305f74980e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:606
> #25 0x000056305fab25cb in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177

I thought about something like this one and assumed it was fine, but I
forgot this can trigger after your other patch is applied..  It means we need
to drop the 2nd patch that I provided in the last reply.

> 
> > I really think changing depth is a hack... :(
> > 
> > Here IMHO the problem is that we have other calls to
> > address_space_to_flatview() during commit() that we missed, and that caused
> > the issue.  There aren't a lot of those, and sorry for missing yet another one.
> > 
> > So let me try one more time on this (with patch 1; I think I've got two
> > places missed in the previous attempt).  Let's see how it goes.
> > 
> > We may want to add a tracepoint or have some way to know when an enforced
> > commit() is triggered during the vm load phase.  I just noticed that you
> > worried about having enforced commit() too often beating the original
> > purpose, and I think yes, that's something to worry about.
> > 
> > I still believe the AHCI condition is rare (since e.g. you passed all of
> > Juan's tests even before we had this "do_commit" stuff), but in short: I
> > think it would still be interesting if you could capture all the users of
> > the enforced commit(), like the AHCI case you quoted before, and list them
> > in the cover letter of your next post (along with new perf measurements, to
> > make sure the worry doesn't hold that too many enforced commits would
> > invalidate the whole idea).
> > 
> > When I was digging this out, I also found an RCU issue when using
> > address_space_to_flatview() (when the BQL was not taken), only in the
> > memory_region_clear_dirty_bitmap() path.  I hope the new assertion
> > (rcu_read_is_locked()) won't trigger on some of your use cases already,
> > but anyway feel free to ignore this whole paragraph for now unless you see
> > some assert(rcu_read_is_locked()) being triggered.  I plan to post an RFC
> > for fixing the RCU usage and I'll have you copied just for reference (it
> > may be a separate issue from what you're working on).
> > 
> > Thanks,
> > 
> Unfortunately, there is another case of stack overflow...
> 
> #8  memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
> #9  0x00005610e96d3665 in address_space_to_flatview (as=0x5610e9ece820 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1119
> #10 address_space_get_flatview (as=0x5610e9ece820 <address_space_memory>) at ../softmmu/memory.c:818
> #11 0x00005610e96dfa14 in address_space_cache_init (cache=cache@entry=0x56111cdee410, as=<optimized out>, addr=addr@entry=1048576, len=len@entry=4096, is_write=false)
>     at ../softmmu/physmem.c:3350
> #12 0x00005610e96a0928 in virtio_init_region_cache (vdev=vdev@entry=0x5610eb72bf10, n=n@entry=0) at ../hw/virtio/virtio.c:247
> #13 0x00005610e96a0db4 in virtio_memory_listener_commit (listener=0x5610eb72bff8) at ../hw/virtio/virtio.c:3592
> #14 0x00005610e96d2e5e in address_space_update_flatviews_all () at ../softmmu/memory.c:1126
> #15 memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
> 
> Fortunately, this is probably the last one (at least according to the qtest results).