Message ID: 20230117115511.3215273-1-xuchuangxclwt@bytedance.com
Series: migration: reduce time of loading non-iterable vmstate
On Tue, Jan 17, 2023 at 07:55:08PM +0800, Chuang Xu wrote: > In this version: > > - rename rcu_read_locked() to rcu_read_is_locked(). > - adjust the sanity check in address_space_to_flatview(). > - improve some comments. Acked-by: Peter Xu <peterx@redhat.com>
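A quick sketch of what the renamed helper amounts to: QEMU's RCU read lock nests, so an "is it locked?" query has to be answered from a per-thread nesting counter. The struct below is illustrative only, not QEMU's actual rcu.h internals (which additionally publish a grace-period snapshot on the outermost lock):

#include <stdbool.h>

/* Illustrative per-thread RCU reader state with a nesting counter. */
typedef struct RcuReaderState {
    unsigned depth; /* nested rcu_read_lock() calls currently active */
} RcuReaderState;

static __thread RcuReaderState rcu_reader;

static inline void rcu_read_lock(void)
{
    rcu_reader.depth++;
}

static inline void rcu_read_unlock(void)
{
    rcu_reader.depth--;
}

/* The renamed helper: a predicate, hence the "is" in the name. */
static inline bool rcu_read_is_locked(void)
{
    return rcu_reader.depth > 0;
}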
Chuang Xu <xuchuangxclwt@bytedance.com> wrote: > In this version: > > - rename rcu_read_locked() to rcu_read_is_locked(). > - adjust the sanity check in address_space_to_flatview(). > - improve some comments. > > The duration of loading non-iterable vmstate accounts for a significant > portion of downtime (starting with the timestamp of source qemu stop and > ending with the timestamp of target qemu start). Most of the time is spent > committing memory region changes repeatedly. > > This patch packs all the changes to memory region during the period of > loading non-iterable vmstate in a single memory transaction. With the > increase of devices, this patch will greatly improve the performance. > > Here are the test1 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 8 16-queue vhost-net device > - 16 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 150 ms 740+ ms > after about 30 ms 630+ ms > > (This result is different from that of v1. It may be that someone has > changed something on my host.., but it does not affect the display of > the optimization effect.) > > > In test2, we keep the number of the device the same as test1, reduce the > number of queues per device: > > Here are the test2 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 8 1-queue vhost-net device > - 16 1-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 90 ms about 250 ms > > after about 25 ms about 160 ms > > > > In test3, we keep the number of queues per device the same as test1, reduce > the number of devices: > > Here are the test3 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 1 16-queue vhost-net device > - 1 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 20 ms about 70 ms > after about 11 ms about 60 ms > > > As we can see from the test results above, both the number of queues and > the number of devices have a great impact on the time of loading non-iterable > vmstate. The growth of the number of devices and queues will lead to more > mr commits, and the time consumption caused by the flatview reconstruction > will also increase. > > Please review, Chuang. Hi As said in the review, I agree with the patches, but I am waiting for Paolo to review the RCU change (which I think is trivial, but I am not the RCU maintainer). If you need to send another version, I think you can change the RFC tag to PATCH. Again, very good job. Later, Juan. > [v4] > > - attach more information in the cover letter. > - remove changes on virtio_load. > - add rcu_read_locked() to detect holding of rcu lock. > > [v3] > > - move virtio_load_check_delay() from virtio_memory_listener_commit() to > virtio_vmstate_change(). > - add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() > will be called when delay_check is true. > > [v2] > > - rebase to latest upstream. > - add sanity check to address_space_to_flatview(). > - postpone the init of the vring cache until migration's loading completes. > > [v1] > > The duration of loading non-iterable vmstate accounts for a significant > portion of downtime (starting with the timestamp of source qemu stop and > ending with the timestamp of target qemu start). 
Most of the time is spent > committing memory region changes repeatedly. > > This patch packs all the changes to memory region during the period of > loading non-iterable vmstate in a single memory transaction. With the > increase of devices, this patch will greatly improve the performance. > > Here are the test results: > test vm info: > - 32 CPUs 128GB RAM > - 8 16-queue vhost-net device > - 16 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate > before about 210 ms > after about 40 ms
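The core of the series is a single begin/commit pair around the device-load loop. Below is a rough sketch of the shape, using the real memory_region_transaction_begin()/memory_region_transaction_commit() API; where exactly the pair lives is refined later in this thread, so treat this as the idea rather than the final placement:

/*
 * Before: every device load that touches memory regions triggers its
 * own implicit commit, and every commit rebuilds the flatviews, so the
 * cost grows with the number of devices and queues.
 *
 * After (fragment inside the incoming-migration load path): queue all
 * memory region updates during the load and rebuild once.
 */
memory_region_transaction_begin();

ret = qemu_loadvm_state_main(f, mis);   /* loads the non-iterable vmstate */

/*
 * One commit: the flatviews are reconstructed a single time and the
 * memory listeners (KVM memslots, vhost, ...) are notified a single time.
 */
memory_region_transaction_commit();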
Chuang Xu <xuchuangxclwt@bytedance.com> wrote: > In this version: > > - rename rcu_read_locked() to rcu_read_is_locked(). > - adjust the sanity check in address_space_to_flatview(). > - improve some comments. queued series. > > The duration of loading non-iterable vmstate accounts for a significant > portion of downtime (starting with the timestamp of source qemu stop and > ending with the timestamp of target qemu start). Most of the time is spent > committing memory region changes repeatedly. > > This patch packs all the changes to memory region during the period of > loading non-iterable vmstate in a single memory transaction. With the > increase of devices, this patch will greatly improve the performance. > > Here are the test1 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 8 16-queue vhost-net device > - 16 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 150 ms 740+ ms > after about 30 ms 630+ ms > > (This result is different from that of v1. It may be that someone has > changed something on my host.., but it does not affect the display of > the optimization effect.) > > > In test2, we keep the number of the device the same as test1, reduce the > number of queues per device: > > Here are the test2 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 8 1-queue vhost-net device > - 16 1-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 90 ms about 250 ms > > after about 25 ms about 160 ms > > > > In test3, we keep the number of queues per device the same as test1, reduce > the number of devices: > > Here are the test3 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 1 16-queue vhost-net device > - 1 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 20 ms about 70 ms > after about 11 ms about 60 ms > > > As we can see from the test results above, both the number of queues and > the number of devices have a great impact on the time of loading non-iterable > vmstate. The growth of the number of devices and queues will lead to more > mr commits, and the time consumption caused by the flatview reconstruction > will also increase. > > Please review, Chuang. > > [v4] > > - attach more information in the cover letter. > - remove changes on virtio_load. > - add rcu_read_locked() to detect holding of rcu lock. > > [v3] > > - move virtio_load_check_delay() from virtio_memory_listener_commit() to > virtio_vmstate_change(). > - add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() > will be called when delay_check is true. > > [v2] > > - rebase to latest upstream. > - add sanity check to address_space_to_flatview(). > - postpone the init of the vring cache until migration's loading completes. > > [v1] > > The duration of loading non-iterable vmstate accounts for a significant > portion of downtime (starting with the timestamp of source qemu stop and > ending with the timestamp of target qemu start). Most of the time is spent > committing memory region changes repeatedly. > > This patch packs all the changes to memory region during the period of > loading non-iterable vmstate in a single memory transaction. With the > increase of devices, this patch will greatly improve the performance. 
> > Here are the test results: > test vm info: > - 32 CPUs 128GB RAM > - 8 16-queue vhost-net device > - 16 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate > before about 210 ms > after about 40 ms
On 1/17/23 12:55, Chuang Xu wrote: > In this version: > > - rename rcu_read_locked() to rcu_read_is_locked(). > - adjust the sanity check in address_space_to_flatview(). > - improve some comments. > > The duration of loading non-iterable vmstate accounts for a significant > portion of downtime (starting with the timestamp of source qemu stop and > ending with the timestamp of target qemu start). Most of the time is spent > committing memory region changes repeatedly. > > This patch packs all the changes to memory region during the period of > loading non-iterable vmstate in a single memory transaction. With the > increase of devices, this patch will greatly improve the performance. > > Here are the test1 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 8 16-queue vhost-net device > - 16 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 150 ms 740+ ms > after about 30 ms 630+ ms > > (This result is different from that of v1. It may be that someone has > changed something on my host.., but it does not affect the display of > the optimization effect.) > > > In test2, we keep the number of the device the same as test1, reduce the > number of queues per device: > > Here are the test2 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 8 1-queue vhost-net device > - 16 1-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 90 ms about 250 ms > > after about 25 ms about 160 ms > > > > In test3, we keep the number of queues per device the same as test1, reduce > the number of devices: > > Here are the test3 results: > test info: > - Host > - Intel(R) Xeon(R) Platinum 8260 CPU > - NVIDIA Mellanox ConnectX-5 > - VM > - 32 CPUs 128GB RAM VM > - 1 16-queue vhost-net device > - 1 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate downtime > before about 20 ms about 70 ms > after about 11 ms about 60 ms > > > As we can see from the test results above, both the number of queues and > the number of devices have a great impact on the time of loading non-iterable > vmstate. The growth of the number of devices and queues will lead to more > mr commits, and the time consumption caused by the flatview reconstruction > will also increase. > > Please review, Chuang. > > [v4] > > - attach more information in the cover letter. > - remove changes on virtio_load. > - add rcu_read_locked() to detect holding of rcu lock. > > [v3] > > - move virtio_load_check_delay() from virtio_memory_listener_commit() to > virtio_vmstate_change(). > - add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() > will be called when delay_check is true. > > [v2] > > - rebase to latest upstream. > - add sanity check to address_space_to_flatview(). > - postpone the init of the vring cache until migration's loading completes. > > [v1] > > The duration of loading non-iterable vmstate accounts for a significant > portion of downtime (starting with the timestamp of source qemu stop and > ending with the timestamp of target qemu start). Most of the time is spent > committing memory region changes repeatedly. > > This patch packs all the changes to memory region during the period of > loading non-iterable vmstate in a single memory transaction. With the > increase of devices, this patch will greatly improve the performance. 
> > Here are the test results: > test vm info: > - 32 CPUs 128GB RAM > - 8 16-queue vhost-net device > - 16 4-queue vhost-user-blk device. > > time of loading non-iterable vmstate > before about 210 ms > after about 40 ms > > great improvements on the load times, congrats! Claudio
Chuang Xu <xuchuangxclwt@bytedance.com> wrote: > In this version: Hi I had to drop this. It breaks migration of dbus-vmstate. 144/179 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover ERROR 5.66s killed by signal 6 SIGABRT >>> G_TEST_DBUS_DAEMON=/mnt/code/qemu/multifd/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=145 /scratch/qemu/multifd/x64/tests/qtest/virtio-net-failover --tap -k ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― stderr: qemu-system-x86_64: /mnt/code/qemu/multifd/include/exec/memory.h:1112: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed. Broken pipe ../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) (test program exited with status code -6) TAP parsing error: Too few tests run (expected 23, got 12) ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― Can you take a look at this? I reproduced it with "make check" and qemu compiled with the configure line attached. Later, Juan. configure --enable-trace-backends=log --prefix=/usr --sysconfdir=/etc/sysconfig/ --audio-drv-list=pa --target-list=x86_64-softmmu --with-coroutine=ucontext --with-git-submodules=validate --enable-fdt=system --enable-alsa --enable-attr --enable-auth-pam --enable-avx2 --enable-avx512f --enable-bochs --enable-bpf --enable-brlapi --disable-bsd-user --enable-bzip2 --enable-cap-ng --disable-capstone --disable-cfi --disable-cfi-debug --enable-cloop --disable-cocoa --enable-containers --disable-coreaudio --enable-coroutine-pool --enable-crypto-afalg --enable-curl --enable-curses --enable-dbus-display --enable-debug-info --disable-debug-mutex --disable-debug-stack-usage --disable-debug-tcg --enable-dmg --disable-docs --disable-dsound --enable-fdt --disable-fuse --disable-fuse-lseek --disable-fuzzing --disable-gcov --enable-gettext --enable-gio --enable-glusterfs --enable-gnutls --disable-gprof --enable-gtk --disable-guest-agent --disable-guest-agent-msi --disable-hax --disable-hvf --enable-iconv --enable-install-blobs --enable-jack --enable-keyring --enable-kvm --enable-l2tpv3 --enable-libdaxctl --enable-libiscsi --enable-libnfs --enable-libpmem --enable-libssh --enable-libudev --enable-libusb --enable-linux-aio --enable-linux-io-uring --disable-linux-user --enable-live-block-migration --disable-lto --disable-lzfse --enable-lzo --disable-malloc-trim --enable-membarrier --enable-module-upgrades --enable-modules --enable-mpath --enable-multiprocess --disable-netmap --enable-nettle --enable-numa --disable-nvmm --enable-opengl --enable-oss --enable-pa --enable-parallels --enable-pie --enable-plugins --enable-png --disable-profiler --enable-pvrdma --enable-qcow1 --enable-qed --disable-qom-cast-debug --enable-rbd --enable-rdma --enable-replication --enable-rng-none --disable-safe-stack --disable-sanitizers --enable-stack-protector --enable-sdl --enable-sdl-image --enable-seccomp --enable-selinux --enable-slirp --enable-slirp-smbd --enable-smartcard --enable-snappy --enable-sparse --enable-spice --enable-spice-protocol --enable-system --enable-tcg --disable-tcg-interpreter 
--disable-tools --enable-tpm --disable-tsan --disable-u2f --enable-usb-redir --disable-user --disable-vde --enable-vdi --enable-vhost-crypto --enable-vhost-kernel --enable-vhost-net --enable-vhost-user --enable-vhost-user-blk-server --enable-vhost-vdpa --enable-virglrenderer --enable-virtfs --enable-virtiofsd --enable-vnc --enable-vnc-jpeg --enable-vnc-sasl --enable-vte --enable-vvfat --enable-werror --disable-whpx --enable-xen --enable-xen-pci-passthrough --enable-xkbcommon --enable-zstd --disable-gcrypt
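For reference, the check that fires can be read straight out of the log above. Reconstructed as code, the sanity check this series adds to address_space_to_flatview() has the following shape (the return statement matches the variant quoted later in this thread; the comments are an editorial gloss):

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Reading the flatview is only safe if either:
     *  - no memory region transaction is in flight AND the caller holds
     *    the BQL, so as->current_map cannot change underneath us, or
     *  - the caller is inside an RCU read-side critical section, so any
     *    map it sees stays alive until the critical section ends.
     */
    assert((!memory_region_transaction_in_progress() &&
            qemu_mutex_iothread_locked()) ||
           rcu_read_is_locked());
    return qatomic_rcu_read(&as->current_map);
}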
Hi, Juan Thanks for your test results! On 2023/2/16 3:10 AM, Juan Quintela wrote: > Chuang Xu wrote: >> In this version: > Hi > > I had to drop this. It breaks migration of dbus-vmstate. Previously, I only focused on the precopy migration test in the normal scenario, but did not run qtest. So I need to apologize for my inexperience. > > 144/179 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover ERROR 5.66s killed by signal 6 SIGABRT >>>> G_TEST_DBUS_DAEMON=/mnt/code/qemu/multifd/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=145 /scratch/qemu/multifd/x64/tests/qtest/virtio-net-failover --tap -k > ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > stderr: > qemu-system-x86_64: /mnt/code/qemu/multifd/include/exec/memory.h:1112: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed. > Broken pipe > ../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) > > (test program exited with status code -6) > > TAP parsing error: Too few tests run (expected 23, got 12) > ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > > Can you take a look at this? > > I reproduced it with "make check" and qemu compiled with the configure > line attached. > > Later, Juan. > > configure --enable-trace-backends=log > --prefix=/usr > --sysconfdir=/etc/sysconfig/ > --audio-drv-list=pa > --target-list=x86_64-softmmu > --with-coroutine=ucontext > --with-git-submodules=validate > --enable-fdt=system > --enable-alsa > --enable-attr > --enable-auth-pam > --enable-avx2 > --enable-avx512f > --enable-bochs > --enable-bpf > --enable-brlapi > --disable-bsd-user > --enable-bzip2 > --enable-cap-ng > --disable-capstone > --disable-cfi > --disable-cfi-debug > --enable-cloop > --disable-cocoa > --enable-containers > --disable-coreaudio > --enable-coroutine-pool > --enable-crypto-afalg > --enable-curl > --enable-curses > --enable-dbus-display > --enable-debug-info > --disable-debug-mutex > --disable-debug-stack-usage > --disable-debug-tcg > --enable-dmg > --disable-docs > --disable-dsound > --enable-fdt > --disable-fuse > --disable-fuse-lseek > --disable-fuzzing > --disable-gcov > --enable-gettext > --enable-gio > --enable-glusterfs > --enable-gnutls > --disable-gprof > --enable-gtk > --disable-guest-agent > --disable-guest-agent-msi > --disable-hax > --disable-hvf > --enable-iconv > --enable-install-blobs > --enable-jack > --enable-keyring > --enable-kvm > --enable-l2tpv3 > --enable-libdaxctl > --enable-libiscsi > --enable-libnfs > --enable-libpmem > --enable-libssh > --enable-libudev > --enable-libusb > --enable-linux-aio > --enable-linux-io-uring > --disable-linux-user > --enable-live-block-migration > --disable-lto > --disable-lzfse > --enable-lzo > --disable-malloc-trim > --enable-membarrier > --enable-module-upgrades > --enable-modules > --enable-mpath > --enable-multiprocess > --disable-netmap > --enable-nettle > --enable-numa > --disable-nvmm > --enable-opengl > --enable-oss > --enable-pa > --enable-parallels > --enable-pie > --enable-plugins > --enable-png > --disable-profiler > --enable-pvrdma > --enable-qcow1 > --enable-qed > --disable-qom-cast-debug > --enable-rbd > --enable-rdma > --enable-replication > --enable-rng-none > --disable-safe-stack > --disable-sanitizers > --enable-stack-protector > --enable-sdl > --enable-sdl-image > --enable-seccomp > --enable-selinux > --enable-slirp > --enable-slirp-smbd > --enable-smartcard > --enable-snappy > --enable-sparse > --enable-spice > --enable-spice-protocol > --enable-system > --enable-tcg > --disable-tcg-interpreter > --disable-tools > --enable-tpm > --disable-tsan > --disable-u2f > --enable-usb-redir > --disable-user > --disable-vde > --enable-vdi > --enable-vhost-crypto > --enable-vhost-kernel > --enable-vhost-net > --enable-vhost-user > --enable-vhost-user-blk-server > --enable-vhost-vdpa > --enable-virglrenderer > --enable-virtfs > --enable-virtiofsd > --enable-vnc > --enable-vnc-jpeg > --enable-vnc-sasl > --enable-vte > --enable-vvfat > --enable-werror > --disable-whpx > --enable-xen > --enable-xen-pci-passthrough > --enable-xkbcommon > --enable-zstd > --disable-gcrypt I ran qtest with reference to your environment, and finally reported two errors. Error 1 (the same as yours): QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=87 G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh /data00/migration/qemu-open/build/tests/qtest/virtio-net-failover --tap -k ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― stderr: qemu-system-x86_64: /data00/migration/qemu-open/include/exec/memory.h:1114: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed. 
Broken pipe ../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) (test program exited with status code -6) TAP parsing error: Too few tests run (expected 23, got 12) Coredump backtrace: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f3af64a8535 in __GI_abort () at abort.c:79 #2 0x00007f3af64a840f in __assert_fail_base (fmt=0x7f3af6609ef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=1114, function=<optimized out>) at assert.c:92 #3 0x00007f3af64b61a2 in __GI___assert_fail (assertion=assertion@entry=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=file@entry=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=line@entry=1114, function=function@entry=0x55d9426cdce0 <__PRETTY_FUNCTION__.20039> "address_space_to_flatview") at assert.c:101 #4 0x000055d942373853 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/exec/memory.h:1112 #5 0x000055d9423746f5 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/qemu/rcu.h:126 #6 address_space_set_flatview (as=as@entry=0x55d944738648) at ../softmmu/memory.c:1029 #7 0x000055d94237ace3 in address_space_update_topology (as=0x55d944738648) at ../softmmu/memory.c:1080 #8 address_space_init (as=as@entry=0x55d944738648, root=root@entry=0x55d9447386a0, name=name@entry=0x55d9447384f0 "virtio-net-pci") at ../softmmu/memory.c:3082 #9 0x000055d942151e43 in do_pci_register_device (errp=0x7f3aef7fe850, devfn=<optimized out>, name=0x55d9444b6c40 "virtio-net-pci", pci_dev=0x55d944738410) at ../hw/pci/pci.c:1145 #10 pci_qdev_realize (qdev=0x55d944738410, errp=0x7f3aef7fe850) at ../hw/pci/pci.c:2036 #11 0x000055d942404a8f in device_set_realized (obj=<optimized out>, value=true, errp=0x7f3aef7feae0) at ../hw/core/qdev.c:510 #12 0x000055d942407e36 in property_set_bool (obj=0x55d944738410, v=<optimized out>, name=<optimized out>, opaque=0x55d9444c71d0, errp=0x7f3aef7feae0) at ../qom/object.c:2285 #13 0x000055d94240a0e3 in object_property_set (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", v=v@entry=0x55d9452f7a00, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1420 #14 0x000055d94240d15f in object_property_set_qobject (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=0x55d945306cb0, errp=errp@entry=0x7f3aef7feae0) at ../qom/qom-qobject.c:28 #15 0x000055d94240a354 in object_property_set_bool (obj=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=true, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1489 #16 0x000055d94240427c in qdev_realize (dev=<optimized out>, bus=bus@entry=0x55d945141400, errp=errp@entry=0x7f3aef7feae0) at ../hw/core/qdev.c:292 #17 0x000055d9421ef4a0 in qdev_device_add_from_qdict (opts=0x55d945309c00, from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7f3aef7feae0) at /data00/migration/qemu-open/include/hw/qdev-core.h:17 #18 0x000055d942311c85 in failover_add_primary (errp=0x7f3aef7fead8, n=0x55d9454e8530) at ../hw/net/virtio-net.c:933 #19 virtio_net_set_features (vdev=<optimized out>, features=4611687122246533156) at ../hw/net/virtio-net.c:1004 #20 0x000055d94233d248 in virtio_set_features_nocheck (vdev=vdev@entry=0x55d9454e8530, val=val@entry=4611687122246533156) at ../hw/virtio/virtio.c:2851 #21 0x000055d942342eae in virtio_load (vdev=0x55d9454e8530, f=0x55d944700de0, version_id=11) at ../hw/virtio/virtio.c:3027 #22 0x000055d942207601 in vmstate_load_state (f=f@entry=0x55d944700de0, vmsd=0x55d9429baba0 <vmstate_virtio_net>, opaque=0x55d9454e8530, version_id=11) at ../migration/vmstate.c:137 #23 0x000055d942222672 in vmstate_load (f=0x55d944700de0, se=0x55d94561b700) at ../migration/savevm.c:919 #24 0x000055d942222927 in qemu_loadvm_section_start_full (f=f@entry=0x55d944700de0, mis=0x55d9444c23e0) at ../migration/savevm.c:2503 #25 0x000055d942225cc8 in qemu_loadvm_state_main (f=f@entry=0x55d944700de0, mis=mis@entry=0x55d9444c23e0) at ../migration/savevm.c:2729 #26 0x000055d942227195 in qemu_loadvm_state (f=0x55d944700de0) at ../migration/savevm.c:2816 #27 0x000055d94221480e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:606 #28 0x000055d94257d2ab in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177 #29 0x00007f3af64d2c80 in __correctly_grouped_prefixwc (begin=0x2 <error: Cannot access memory at address 0x2>, end=0x0, thousands=0 L'\000', grouping=0x7f3af64bd8eb <__GI_raise+267> "H\213\214$\b\001") at grouping.c:171 #30 0x0000000000000000 in ?? () It seems that when address_space_to_flatview() is called, there is mr transaction in progress, and the rcu read lock is not held. I need to further consider the conditions for sanity check or whether we can hold the rcu read lock before address_space_init() to solve the problem. Error 2: ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) ERROR 180/180 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 146.32s killed by signal 6 SIGABRT >>> QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=250 G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh /data00/migration/qemu-open/build/tests/qtest/migration-test --tap -k ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― qemu-system-x86_64: ../softmmu/memory.c:1094: memory_region_transaction_commit: Assertion `qemu_mutex_iothread_locked()' failed. 
** ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT) ../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) (test program exited with status code -6) Coredump backtrace: #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007fed5c14d535 in __GI_abort () at abort.c:79 #2 0x00007fed5c14d40f in __assert_fail_base (fmt=0x7fed5c2aeef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561bc4152424 "qemu_mutex_iothread_locked()", file=0x561bc41ae94b "../softmmu/memory.c", line=1094, function=<optimized out>) at assert.c:92 #3 0x00007fed5c15b1a2 in __GI___assert_fail (assertion=assertion@entry=0x561bc4152424 "qemu_mutex_iothread_locked()", file=file@entry=0x561bc41ae94b "../softmmu/memory.c", line=line@entry=1094, function=function@entry=0x561bc41afca0 <__PRETTY_FUNCTION__.38746> "memory_region_transaction_commit") at assert.c:101 #4 0x0000561bc3e5a053 in memory_region_transaction_commit () at ../softmmu/memory.c:1094 #5 0x0000561bc3d07b55 in qemu_loadvm_state_main (f=f@entry=0x561bc6443aa0, mis=mis@entry=0x561bc62028a0) at ../migration/savevm.c:2789 #6 0x0000561bc3d08e46 in postcopy_ram_listen_thread (opaque=opaque@entry=0x561bc62028a0) at ../migration/savevm.c:1922 #7 0x0000561bc404b3da in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:505 #8 0x00007fed5c2f2fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486 #9 0x00007fed5c22406f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Error 2 is related to postcopy. I don't know much about the code of postcopy, so I need some time to look at this part of the code. And later I will send another email to discuss it with Peter. Copy Peter. Thanks!
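The assertion in frame #4 is the first thing memory_region_transaction_commit() does in the patched tree, per the trace. An excerpt-shaped sketch of the relevant lines (abbreviated, not the full function):

void memory_region_transaction_commit(void)
{
    assert(memory_region_transaction_depth);
    /* memory.c:1094 in the trace above: committing requires the BQL,
     * which the postcopy RAM listen thread does not hold. */
    assert(qemu_mutex_iothread_locked());

    --memory_region_transaction_depth;
    if (!memory_region_transaction_depth) {
        /* ... rebuild flatviews and notify the memory listeners ... */
    }
}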
Hi, Peter! In my last email to Juan, I mentioned two errors. Now I want to discuss them with you. On 2023/2/16 11:41 PM, Chuang Xu wrote: > I ran qtest with reference to your environment, and finally reported > two errors. > > Error 1 (the same as yours): > > QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=87 > G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh > /data00/migration/qemu-open/build/tests/qtest/virtio-net-failover > --tap -k > ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > ✀ > ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > stderr: > qemu-system-x86_64: > /data00/migration/qemu-open/include/exec/memory.h:1114: > address_space_to_flatview: Assertion > `(!memory_region_transaction_in_progress() && > qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed. > Broken pipe > ../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from > signal 6 (Aborted) (core dumped) > > (test program exited with status code -6) > > TAP parsing error: Too few tests run (expected 23, got 12) > > Coredump backtrace: > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 > #1 0x00007f3af64a8535 in __GI_abort () at abort.c:79 > #2 0x00007f3af64a840f in __assert_fail_base (fmt=0x7f3af6609ef0 > "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55d9425f48a8 > "(!memory_region_transaction_in_progress() && > qemu_mutex_iothread_locked()) || rcu_read_is_locked()", > file=0x55d9425f4870 > "/data00/migration/qemu-open/include/exec/memory.h", line=1114, > function=<optimized out>) at assert.c:92 > #3 0x00007f3af64b61a2 in __GI___assert_fail > (assertion=assertion@entry=0x55d9425f48a8 > "(!memory_region_transaction_in_progress() && > qemu_mutex_iothread_locked()) || rcu_read_is_locked()", > file=file@entry=0x55d9425f4870 > "/data00/migration/qemu-open/include/exec/memory.h", > line=line@entry=1114, function=function@entry=0x55d9426cdce0 > <__PRETTY_FUNCTION__.20039> "address_space_to_flatview") at assert.c:101 > #4 0x000055d942373853 in address_space_to_flatview > (as=0x55d944738648) at > /data00/migration/qemu-open/include/exec/memory.h:1112 > #5 0x000055d9423746f5 in address_space_to_flatview > (as=0x55d944738648) at /data00/migration/qemu-open/include/qemu/rcu.h:126 > #6 address_space_set_flatview (as=as@entry=0x55d944738648) at > ../softmmu/memory.c:1029 > #7 0x000055d94237ace3 in address_space_update_topology > (as=0x55d944738648) at ../softmmu/memory.c:1080 > #8 address_space_init (as=as@entry=0x55d944738648, > root=root@entry=0x55d9447386a0, name=name@entry=0x55d9447384f0 > "virtio-net-pci") at ../softmmu/memory.c:3082 > #9 0x000055d942151e43 in do_pci_register_device (errp=0x7f3aef7fe850, > devfn=<optimized out>, name=0x55d9444b6c40 "virtio-net-pci", > pci_dev=0x55d944738410) at ../hw/pci/pci.c:1145 > #10 pci_qdev_realize (qdev=0x55d944738410, errp=0x7f3aef7fe850) at > ../hw/pci/pci.c:2036 > #11 0x000055d942404a8f in device_set_realized (obj=<optimized out>, > value=true, errp=0x7f3aef7feae0) at ../hw/core/qdev.c:510 > #12 0x000055d942407e36 in property_set_bool (obj=0x55d944738410, > v=<optimized out>, name=<optimized out>, opaque=0x55d9444c71d0, > errp=0x7f3aef7feae0) at ../qom/object.c:2285 > #13 0x000055d94240a0e3 in object_property_set > (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 > "realized", v=v@entry=0x55d9452f7a00, 
errp=errp@entry=0x7f3aef7feae0) > at ../qom/object.c:1420 > #14 0x000055d94240d15f in object_property_set_qobject > (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 > "realized", value=value@entry=0x55d945306cb0, > errp=errp@entry=0x7f3aef7feae0) at ../qom/qom-qobject.c:28 > #15 0x000055d94240a354 in object_property_set_bool > (obj=0x55d944738410, name=name@entry=0x55d942670c23 "realized", > value=value@entry=true, errp=errp@entry=0x7f3aef7feae0) at > ../qom/object.c:1489 > #16 0x000055d94240427c in qdev_realize (dev=<optimized out>, > bus=bus@entry=0x55d945141400, errp=errp@entry=0x7f3aef7feae0) at > ../hw/core/qdev.c:292 > #17 0x000055d9421ef4a0 in qdev_device_add_from_qdict > (opts=0x55d945309c00, from_json=<optimized out>, errp=<optimized out>, > errp@entry=0x7f3aef7feae0) at > /data00/migration/qemu-open/include/hw/qdev-core.h:17 > #18 0x000055d942311c85 in failover_add_primary (errp=0x7f3aef7fead8, > n=0x55d9454e8530) at ../hw/net/virtio-net.c:933 > #19 virtio_net_set_features (vdev=<optimized out>, > features=4611687122246533156) at ../hw/net/virtio-net.c:1004 > #20 0x000055d94233d248 in virtio_set_features_nocheck > (vdev=vdev@entry=0x55d9454e8530, val=val@entry=4611687122246533156) at > ../hw/virtio/virtio.c:2851 > #21 0x000055d942342eae in virtio_load (vdev=0x55d9454e8530, > f=0x55d944700de0, version_id=11) at ../hw/virtio/virtio.c:3027 > #22 0x000055d942207601 in vmstate_load_state > (f=f@entry=0x55d944700de0, vmsd=0x55d9429baba0 <vmstate_virtio_net>, > opaque=0x55d9454e8530, version_id=11) at ../migration/vmstate.c:137 > #23 0x000055d942222672 in vmstate_load (f=0x55d944700de0, > se=0x55d94561b700) at ../migration/savevm.c:919 > #24 0x000055d942222927 in qemu_loadvm_section_start_full > (f=f@entry=0x55d944700de0, mis=0x55d9444c23e0) at > ../migration/savevm.c:2503 > #25 0x000055d942225cc8 in qemu_loadvm_state_main > (f=f@entry=0x55d944700de0, mis=mis@entry=0x55d9444c23e0) at > ../migration/savevm.c:2729 > #26 0x000055d942227195 in qemu_loadvm_state (f=0x55d944700de0) at > ../migration/savevm.c:2816 > #27 0x000055d94221480e in process_incoming_migration_co > (opaque=<optimized out>) at ../migration/migration.c:606 > #28 0x000055d94257d2ab in coroutine_trampoline (i0=<optimized out>, > i1=<optimized out>) at ../util/coroutine-ucontext.c:177 > #29 0x00007f3af64d2c80 in __correctly_grouped_prefixwc (begin=0x2 > <error: Cannot access memory at address 0x2>, end=0x0, thousands=0 > L'\000', grouping=0x7f3af64bd8eb <__GI_raise+267> "H\213\214$\b\001") > at grouping.c:171 > #30 0x0000000000000000 in ?? () > > > It seems that when address_space_to_flatview() is called, there is mr > transaction > in progress, and the rcu read lock is not held. > > I need to further consider the conditions for sanity check or whether > we can hold the > rcu read lock before address_space_init() to solve the problem. 
> > > Error 2: > > ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: > assertion failed: (g_test_timer_elapsed() < > MIGRATION_STATUS_WAIT_TIMEOUT) ERROR > 180/180 qemu:qtest+qtest-x86_64 / > qtest-x86_64/migration-test ERROR 146.32s killed by > signal 6 SIGABRT >>>> QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=250 >>>> G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh >>>> /data00/migration/qemu-open/build/tests/qtest/migration-test --tap -k > ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > ✀ > ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > qemu-system-x86_64: ../softmmu/memory.c:1094: > memory_region_transaction_commit: Assertion > `qemu_mutex_iothread_locked()' failed. > ** > ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: > assertion failed: (g_test_timer_elapsed() < > MIGRATION_STATUS_WAIT_TIMEOUT) > ../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from > signal 6 (Aborted) (core dumped) > > (test program exited with status code -6) > > Coredump backtrace: > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 > #1 0x00007fed5c14d535 in __GI_abort () at abort.c:79 > #2 0x00007fed5c14d40f in __assert_fail_base (fmt=0x7fed5c2aeef0 > "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561bc4152424 > "qemu_mutex_iothread_locked()", file=0x561bc41ae94b > "../softmmu/memory.c", line=1094, function=<optimized out>) at > assert.c:92 > #3 0x00007fed5c15b1a2 in __GI___assert_fail > (assertion=assertion@entry=0x561bc4152424 > "qemu_mutex_iothread_locked()", file=file@entry=0x561bc41ae94b > "../softmmu/memory.c", line=line@entry=1094, > function=function@entry=0x561bc41afca0 <__PRETTY_FUNCTION__.38746> > "memory_region_transaction_commit") at assert.c:101 > #4 0x0000561bc3e5a053 in memory_region_transaction_commit () at > ../softmmu/memory.c:1094 > #5 0x0000561bc3d07b55 in qemu_loadvm_state_main > (f=f@entry=0x561bc6443aa0, mis=mis@entry=0x561bc62028a0) at > ../migration/savevm.c:2789 > #6 0x0000561bc3d08e46 in postcopy_ram_listen_thread > (opaque=opaque@entry=0x561bc62028a0) at ../migration/savevm.c:1922 > #7 0x0000561bc404b3da in qemu_thread_start (args=<optimized out>) at > ../util/qemu-thread-posix.c:505 > #8 0x00007fed5c2f2fa3 in start_thread (arg=<optimized out>) at > pthread_create.c:486 > #9 0x00007fed5c22406f in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > > > Error2 is related to postcopy. I don't know much about the code of > postcopy. > So I need some time to look at this part of code. > > And later I will send another email to discuss it with Peter. > > Copy Peter. > > Thanks! Error 1 was triggered by our sanity check. I try to add RCU_READ_LOCK_GUARD() in address_space_init() and it works. But I'm not sure if this code change is appropriate. If this change is not appropriate, we may need to consider other sanity check. Error 2 was related to postcopy. I read the official document of postcopy (I hope it is the latest) and learned that two threads will call qemu_loadvm_state_main() in the process of postcopy. The one called by main thread will take the BQL, and the one called by ram_listen thread won't take the BQL. The latter checks whether the BQL is held when calling memory_region_transaction_commit(), thus triggering the assertion. 
Creating a new function qemu_loadvm_state_ram_listen() without memory_region_transaction_commit() will solve this error. I don't know if you suggest using this patch in postcopy. If this patch is applicable to postcopy, considering the difference between how postcopy and precopy load device state, do we need to consider more details? Looking forward to your reply. Thanks.
Hello, Chuang, On Fri, Feb 17, 2023 at 04:11:19PM +0800, Chuang Xu wrote: > Error 1 was triggered by our sanity check. I try to add RCU_READ_LOCK_GUARD() > in address_space_init() and it works. But I'm not sure if this code change is > appropriate. If this change is not appropriate, we may need to consider other > sanity check. I'd suggest not adding RCU locks without a good reason. address_space_init() is definitely a special context because the AS is exclusively owned by the caller before it returns. It means no RCU protection needed at all because no one else is touching it; neither do we need qatomic_rcu_read() when read. So I suggest we directly reference current_map, even though that'll need a rich comment:

 static void address_space_set_flatview(AddressSpace *as)
 {
-    FlatView *old_view = address_space_to_flatview(as);
+    /*
+     * NOTE: we don't use the RCU flavour of address_space_to_flatview()
+     * because we exclusively own as->current_map here: it's either during
+     * init of an address space, or during commit() with BQL held.
+     */
+    FlatView *old_view = as->current_map;

We can have address_space_to_flatview_raw() but since we'll directly modify as->current_map very soon in the same function, we may not even bother. > > Error 2 was related to postcopy. I read the official document of postcopy > (I hope it is the latest) and learned that two threads will call > qemu_loadvm_state_main() in the process of postcopy. The one called by main thread > will take the BQL, and the one called by ram_listen thread won't take the BQL. > The latter checks whether the BQL is held when calling memory_region_transaction_commit(), > thus triggering the assertion. Creating a new function qemu_loadvm_state_ram_listen() > without memory_region_transaction_commit() will solve this error. Sounds right, because the whole qemu_loadvm_state_main() process shouldn't load any device state or anything that requires BQL at all; in most cases that should be only RAM state leftovers. I think we only want to optimize precopy but not the postcopy phase. Note! it should include the phase when transferring precopy -> postcopy too, so it's covering postcopy, just not covering the "post" phase of migration - if you see, that's a nested call to qemu_loadvm_state_main() with a whole MIG_CMD_PACKAGED package which actually gets covered, which is the real meat for postcopy on device transitions. So in short: instead of creating qemu_loadvm_state_ram_listen(), how about modifying your patch 3, instead of changing inside qemu_loadvm_state_main() we can do that for qemu_loadvm_state() only (so you can wrap the begin() and commit() over qemu_loadvm_state_main() there)? > > I don't know if you suggest using this patch in postcopy. If this patch is applicable to > postcopy, considering the difference between how postcopy and precopy load device state, > do we need to consider more details? See above. Yes I definitely hope postcopy will be covered too. Thanks!
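Peter's suggestion, spelled out as a sketch against migration/savevm.c (heavily abbreviated; only the placement of the transaction pair is the point, and the helper calls shown around it are indicative, not a verbatim excerpt):

int qemu_loadvm_state(QEMUFile *f)
{
    MigrationIncomingState *mis = migration_incoming_get_current();
    int ret;

    /* ... header checks, qemu_loadvm_state_setup(), etc. ... */

    /*
     * Wrap here rather than inside qemu_loadvm_state_main(): the
     * postcopy RAM listen thread calls qemu_loadvm_state_main()
     * directly without the BQL and must stay outside the transaction.
     * The nested MIG_CMD_PACKAGED load during the precopy->postcopy
     * transition runs under this call, so device transitions are
     * still batched.
     */
    memory_region_transaction_begin();
    ret = qemu_loadvm_state_main(f, mis);
    memory_region_transaction_commit();

    /* ... cleanup, cpu_synchronize_all_post_init(), etc. ... */

    return ret;
}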
Hi, Juan On 2023/2/16 3:10 AM, Juan Quintela wrote: > Chuang Xu <xuchuangxclwt@bytedance.com> wrote: >> In this version: > Hi > > I had to drop this. It breaks migration of dbus-vmstate. > > 144/179 qemu:qtest+qtest-x86_64 / qtest-x86_64/virtio-net-failover ERROR 5.66s killed by signal 6 SIGABRT >>>> G_TEST_DBUS_DAEMON=/mnt/code/qemu/multifd/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=145 /scratch/qemu/multifd/x64/tests/qtest/virtio-net-failover --tap -k > ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > stderr: > qemu-system-x86_64: /mnt/code/qemu/multifd/include/exec/memory.h:1112: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed. > Broken pipe > ../../../../mnt/code/qemu/multifd/tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped) > > (test program exited with status code -6) > > TAP parsing error: Too few tests run (expected 23, got 12) > ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― > > Can you take a look at this? > > I reproduced it with "make check" and qemu compiled with the configure > line attached. > > Later, Juan. > > configure --enable-trace-backends=log > --prefix=/usr > --sysconfdir=/etc/sysconfig/ > --audio-drv-list=pa > --target-list=x86_64-softmmu > --with-coroutine=ucontext > --with-git-submodules=validate > --enable-fdt=system > --enable-alsa > --enable-attr > --enable-auth-pam > --enable-avx2 > --enable-avx512f > --enable-bochs > --enable-bpf > --enable-brlapi > --disable-bsd-user > --enable-bzip2 > --enable-cap-ng > --disable-capstone > --disable-cfi > --disable-cfi-debug > --enable-cloop > --disable-cocoa > --enable-containers > --disable-coreaudio > --enable-coroutine-pool > --enable-crypto-afalg > --enable-curl > --enable-curses > --enable-dbus-display > --enable-debug-info > --disable-debug-mutex > --disable-debug-stack-usage > --disable-debug-tcg > --enable-dmg > --disable-docs > --disable-dsound > --enable-fdt > --disable-fuse > --disable-fuse-lseek > --disable-fuzzing > --disable-gcov > --enable-gettext > --enable-gio > --enable-glusterfs > --enable-gnutls > --disable-gprof > --enable-gtk > --disable-guest-agent > --disable-guest-agent-msi > --disable-hax > --disable-hvf > --enable-iconv > --enable-install-blobs > --enable-jack > --enable-keyring > --enable-kvm > --enable-l2tpv3 > --enable-libdaxctl > --enable-libiscsi > --enable-libnfs > --enable-libpmem > --enable-libssh > --enable-libudev > --enable-libusb > --enable-linux-aio > --enable-linux-io-uring > --disable-linux-user > --enable-live-block-migration > --disable-lto > --disable-lzfse > --enable-lzo > --disable-malloc-trim > --enable-membarrier > --enable-module-upgrades > --enable-modules > --enable-mpath > --enable-multiprocess > --disable-netmap > --enable-nettle > --enable-numa > --disable-nvmm > --enable-opengl > --enable-oss > --enable-pa > --enable-parallels > --enable-pie > --enable-plugins > --enable-png > --disable-profiler > --enable-pvrdma > --enable-qcow1 > --enable-qed > --disable-qom-cast-debug > --enable-rbd > --enable-rdma > --enable-replication > --enable-rng-none > --disable-safe-stack 
> --disable-sanitizers > --enable-stack-protector > --enable-sdl > --enable-sdl-image > --enable-seccomp > --enable-selinux > --enable-slirp > --enable-slirp-smbd > --enable-smartcard > --enable-snappy > --enable-sparse > --enable-spice > --enable-spice-protocol > --enable-system > --enable-tcg > --disable-tcg-interpreter > --disable-tools > --enable-tpm > --disable-tsan > --disable-u2f > --enable-usb-redir > --disable-user > --disable-vde > --enable-vdi > --enable-vhost-crypto > --enable-vhost-kernel > --enable-vhost-net > --enable-vhost-user > --enable-vhost-user-blk-server > --enable-vhost-vdpa > --enable-virglrenderer > --enable-virtfs > --enable-virtiofsd > --enable-vnc > --enable-vnc-jpeg > --enable-vnc-sasl > --enable-vte > --enable-vvfat > --enable-werror > --disable-whpx > --enable-xen > --enable-xen-pci-passthrough > --enable-xkbcommon > --enable-zstd > --disable-gcrypt I'll fix this error in v6. In addition to the test mentioned in your email, are there any other conditions that need to be tested? I hope to have a full test before I send v6. Thanks!
Chuang Xu <xuchuangxclwt@bytedance.com> wrote: > Hi, Juan >> --target-list=x86_64-softmmu Compile without this line; that will compile all system emulators. If you pass "make check" there, I would think that you have done your part. Thanks, Juan.
Hi, Peter On 2023/2/17 11:52 PM, Peter Xu wrote: > Hello, Chuang, > > On Fri, Feb 17, 2023 at 04:11:19PM +0800, Chuang Xu wrote: >> Error 1 was triggered by our sanity check. I try to add RCU_READ_LOCK_GUARD() >> in address_space_init() and it works. But I'm not sure if this code change is >> appropriate. If this change is not appropriate, we may need to consider other >> sanity check. > I'd suggest not adding RCU locks without a good reason. > > address_space_init() is definitely a special context because the AS is > exclusively owned by the caller before it returns. It means no RCU > protection needed at all because no one else is touching it; neither do we > need qatomic_rcu_read() when read. > > So I suggest we directly reference current_map, even though that'll need a > rich comment: > > static void address_space_set_flatview(AddressSpace *as) > { > - FlatView *old_view = address_space_to_flatview(as); > + /* > + * NOTE: we don't use the RCU flavour of address_space_to_flatview() > + * because we exclusively own as->current_map here: it's either during > + * init of an address space, or during commit() with BQL held. > + */ > + FlatView *old_view = as->current_map; > > We can have address_space_to_flatview_raw() but since we'll directly modify > as->current_map very soon in the same function, we may not even bother. I agree with you. But now I am facing a new problem. Our sanity check is not as reliable as we think. Although my current code can pass all the tests that Juan told me about in the email, if I configure with nothing and run 'make check', my patch triggers an error in the ahci migrate test: G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=1 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /data00/migration/qemu-open/build/tests/qtest/ahci-test --tap -k ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― stderr: qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address qemu-system-x86_64: Failed to load ich9_ahci:ahci qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1f.2/ich9_ahci' qemu-system-x86_64: load of migration failed: Operation not permitted Migration did not complete, status: failed It seems that ahci_state_post_load() will call memory_access_is_direct() and access mr->ram. Due to the mr transaction delay, this value doesn't meet expectations. And here is the call chain for you to read the code:

->ahci_state_post_load
->ahci_cond_start_engines
->ahci_map_fis_address
->map_page
->dma_memory_map
->address_space_map
->memory_access_is_direct

I think we need a memory_region_transaction_commit_force() to force commit some transactions when loading vmstate. This function is designed like this:

/*
 * memory_region_transaction_commit_force() is designed to
 * force the mr transaction to be committed in the process
 * of loading vmstate.
 */
void memory_region_transaction_commit_force(void)
{
    AddressSpace *as;
    unsigned int memory_region_transaction_depth_copy =
        memory_region_transaction_depth;

    /*
     * Temporarily replace memory_region_transaction_depth with 0 to prevent
     * memory_region_transaction_commit_force() and address_space_to_flatview()
     * call each other recursively.
     */
    memory_region_transaction_depth = 0;

    assert(qemu_mutex_iothread_locked());

    if (memory_region_update_pending) {
        flatviews_reset();

        MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);

        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_set_flatview(as);
            address_space_update_ioeventfds(as);
        }
        memory_region_update_pending = false;
        ioeventfd_update_pending = false;
        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
    } else if (ioeventfd_update_pending) {
        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_update_ioeventfds(as);
        }
        ioeventfd_update_pending = false;
    }

    /* recover memory_region_transaction_depth */
    memory_region_transaction_depth = memory_region_transaction_depth_copy;
}

Now there are two options to use this function: 1. call it in address_space_to_flatview():

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Before using any flatview, check whether we're during a memory
     * region transaction. If so, force commit the memory region
     * transaction.
     */
    if (memory_region_transaction_in_progress()) {
        memory_region_transaction_commit_force();
    }
    return qatomic_rcu_read(&as->current_map);
}

2. call it before each post_load() I prefer the former one; it is more general than the latter. And with this function, the sanity check is not necessary any more. Although we may inevitably call memory_region_transaction_commit_force() several times, in my actual test, the number of calls to memory_region_transaction_commit_force() is still insignificant compared with the number of calls to memory_region_transaction_commit() before the optimization. As a result, this code won't affect the optimization effect, but it can ensure reliability. >> Error 2 was related to postcopy. I read the official document of postcopy >> (I hope it is the latest) and learned that two threads will call >> qemu_loadvm_state_main() in the process of postcopy. The one called by main thread >> will take the BQL, and the one called by ram_listen thread won't take the BQL. >> The latter checks whether the BQL is held when calling memory_region_transaction_commit(), >> thus triggering the assertion. Creating a new function qemu_loadvm_state_ram_listen() >> without memory_region_transaction_commit() will solve this error. > Sounds right, because the whole qemu_loadvm_state_main() process shouldn't > load any device state or anything that requires BQL at all; in most cases > that should be only RAM state leftovers. > > I think we only want to optimize precopy but not the postcopy phase. Note! > it should include the phase when transferring precopy -> postcopy too, so > it's covering postcopy, just not covering the "post" phase of migration - > if you see, that's a nested call to qemu_loadvm_state_main() with a whole > MIG_CMD_PACKAGED package which actually gets covered, which is the real > meat for postcopy on device transitions. > > So in short: instead of creating qemu_loadvm_state_ram_listen(), how about > modifying your patch 3, instead of changing inside qemu_loadvm_state_main() > we can do that for qemu_loadvm_state() only (so you can wrap the begin() > and commit() over qemu_loadvm_state_main() there)? In patch v6 I'll adopt the change you suggested. Thanks! >> I don't know if you suggest using this patch in postcopy. If this patch is applicable to >> postcopy, considering the difference between how postcopy and precopy load device state, >> do we need to consider more details? > See above. Yes I definitely hope postcopy will be covered too. > > Thanks! >
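One way to picture how the force-commit interacts with the wrapped load (a schematic timeline built from the functions defined above, not code from the series; the device names are made up):

memory_region_transaction_begin();        /* vmstate load starts          */

/* Device A's load touches memory regions: updates are queued only.      */

/* Device B's post_load maps guest memory:
 *   address_space_map()
 *     -> address_space_to_flatview()
 *        sees memory_region_transaction_in_progress() == true
 *     -> memory_region_transaction_commit_force()
 *        flushes the queued updates so B reads an up-to-date flatview;
 *        the saved/restored depth keeps the outer transaction open.     */

/* Devices C..N keep batching into the still-open transaction.           */

memory_region_transaction_commit();       /* one final flush at the end   */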
Hi, Peter It seems that there is a problem with the code format in my last email. I adjusted the format and resend this to you. Hope the format of this email won't be wrong again.. :) On 2023/2/17 下午11:52, Peter Xu wrote: > Hello, Chuang, > > On Fri, Feb 17, 2023 at 04:11:19PM +0800, Chuang Xu wrote: >> Error 1 was triggered by our sanity check. I try to add RCU_READ_LOCK_GUARD() >> in address_space_init() and it works. But I'm not sure if this code change is >> appropriate. If this change is not appropriate, we may need to consider other >> sanity check. > I'd suggest not adding RCU locks without a good reason. > > address_space_init() is definitely a special context because the AS is > exclusively owned by the caller before it returns. It means no RCU > protection needed at all because no one else is touching it; neither do we > need qatomic_rcu_read() when read. > > So I suggest we directly reference current_map, even though that'll need a > rich comment: > > static void address_space_set_flatview(AddressSpace *as) > { > - FlatView *old_view = address_space_to_flatview(as); > + /* > + * NOTE: we don't use RCU flavoured of address_space_to_flatview() > + * because we exclusively own as->current_map here: it's either during > + * init of an address space, or during commit() with BQL held. > + */ > + FlatView *old_view = as->current_map; > > We can have address_space_to_flatview_raw() but since we'll directly modify > as->current_map very soon in the same function, so may not even bother. I agree with you. But now I am facing a new problem. Our sanity check is not as reliable as we think. Although my current code can pass all the tests that Juan told me in the email. But if I configure with nothing and run 'make check', My patch triggers error in ahci migrate test: G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh QTEST_QEMU_BINARY=./qemu-system-x86_64 QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=1 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /data00/migration/qemu-open/build/tests/qtest/ahci-test --tap -k ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― stderr: qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS receive buffer address qemu-system-x86_64: Failed to load ich9_ahci:ahci qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:1f.2/ich9_ahci' qemu-system-x86_64: load of migration failed: Operation not permitted Migration did not complete, status: failed It seems that ahci_state_post_load() will call memory_access_is_direct() and access mr->ram. Due to mr transaction delay, this value doesn't meet expectations. And Here is the call chain for you to read the code: ->ahci_state_post_load ->ahci_cond_start_engines ->ahci_map_fis_address ->map_page ->dma_memory_map ->address_space_map ->memory_access_is_direct I think we need a memory_region_transaction_commit_force() to force commit some transactions when load vmstate. This function is designed like this: /* * memory_region_transaction_commit_force() is desgined to * force the mr transaction to be commited in the process * of loading vmstate. 
 */
void memory_region_transaction_commit_force(void)
{
    AddressSpace *as;
    unsigned int memory_region_transaction_depth_copy =
                                    memory_region_transaction_depth;

    /*
     * Temporarily replace memory_region_transaction_depth with 0 to prevent
     * memory_region_transaction_commit_force() and address_space_to_flatview()
     * from calling each other recursively.
     */
    memory_region_transaction_depth = 0;

    assert(qemu_mutex_iothread_locked());

    if (memory_region_update_pending) {
        flatviews_reset();

        MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);

        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_set_flatview(as);
            address_space_update_ioeventfds(as);
        }
        memory_region_update_pending = false;
        ioeventfd_update_pending = false;
        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
    } else if (ioeventfd_update_pending) {
        QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
            address_space_update_ioeventfds(as);
        }
        ioeventfd_update_pending = false;
    }

    /* recover memory_region_transaction_depth */
    memory_region_transaction_depth = memory_region_transaction_depth_copy;
}

Now there are two options for using this function:

1. call it in address_space_to_flatview():

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Before using any flatview, check whether we're during a memory
     * region transaction. If so, force commit the memory region
     * transaction.
     */
    if (memory_region_transaction_in_progress()) {
        memory_region_transaction_commit_force();
    }
    return qatomic_rcu_read(&as->current_map);
}

2. call it before each post_load()

I prefer the former; it is more general than the latter. And with this function, the sanity check is no longer necessary. Although we may inevitably call memory_region_transaction_commit_force() several times, in my actual tests the number of calls to memory_region_transaction_commit_force() is still insignificant compared with the number of calls to memory_region_transaction_commit() before the optimization. As a result, this code won't hurt the optimization effect, but it can ensure reliability.

>> Error 2 was related to postcopy. I read the official document of postcopy
>> (I hope it is the latest) and learned that two threads will call
>> qemu_loadvm_state_main() in the process of postcopy. The one called by the main thread
>> will take the BQL, and the one called by the ram_listen thread won't take the BQL.
>> The latter checks whether the BQL is held when calling memory_region_transaction_commit(),
>> thus triggering the assertion. Creating a new function qemu_loadvm_state_ram_listen()
>> without memory_region_transaction_commit() will solve this error.
> Sounds right, because the whole qemu_loadvm_state_main() process shouldn't
> load any device state or anything that requires the BQL at all; in most cases
> that should be only RAM state leftovers.
>
> I think we only want to optimize precopy but not the postcopy phase. Note!
> It should include the phase when transferring precopy -> postcopy too, so
> it's covering postcopy, just not covering the "post" phase of migration:
> if you look, there's a nested call to qemu_loadvm_state_main() with a whole
> MIG_CMD_PACKAGED package, which actually gets covered, and that is the real
> meat for postcopy on device transitions.
>
> So in short: instead of creating qemu_loadvm_state_ram_listen(), how about
> modifying your patch 3 so that instead of changing the inside of
> qemu_loadvm_state_main() we do it for qemu_loadvm_state() only (so you can
> wrap the begin() and commit() around qemu_loadvm_state_main() there)?
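If I read the suggestion right, the wrap would look roughly like this (just a sketch assuming the current shape of migration/savevm.c; the real function also validates the stream header and handles failure cleanup):

int qemu_loadvm_state(QEMUFile *f)
{
    MigrationIncomingState *mis = migration_incoming_get_current();
    int ret;

    /*
     * One big transaction around the whole BQL-held load: memory region
     * changes made while loading device states are batched and committed
     * once at the end, instead of once per region change.
     */
    memory_region_transaction_begin();
    ret = qemu_loadvm_state_main(f, mis);
    memory_region_transaction_commit();

    return ret;
}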
In Patch v6 I'll adopt the change you suggested, thanks!

>> I don't know if you suggest using this patch in postcopy. If this patch is applicable to
>> postcopy, considering the difference between how postcopy and precopy load device state,
>> do we need to consider more details?
> See above. Yes, I definitely hope postcopy will be covered too.
>
> Thanks!
>
Hi, Peter

This email is a supplement to the previous one.

On 2023/2/21 at 11:38 AM, Chuang Xu wrote:
>
> I think we need a memory_region_transaction_commit_force() to force commit
> some transactions when loading the vmstate. This function is designed like this:
>
> /*
>  * memory_region_transaction_commit_force() is designed to
>  * force the mr transaction to be committed in the process
>  * of loading vmstate.
>  */
> void memory_region_transaction_commit_force(void)
> {
>     AddressSpace *as;
>     unsigned int memory_region_transaction_depth_copy =
>                                     memory_region_transaction_depth;
>
>     /*
>      * Temporarily replace memory_region_transaction_depth with 0 to prevent
>      * memory_region_transaction_commit_force() and address_space_to_flatview()
>      * from calling each other recursively.
>      */
>     memory_region_transaction_depth = 0;
>
>     assert(qemu_mutex_iothread_locked());
>
>     if (memory_region_update_pending) {
>         flatviews_reset();
>
>         MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
>
>         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>             address_space_set_flatview(as);
>             address_space_update_ioeventfds(as);
>         }
>         memory_region_update_pending = false;
>         ioeventfd_update_pending = false;
>         MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
>     } else if (ioeventfd_update_pending) {
>         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>             address_space_update_ioeventfds(as);
>         }
>         ioeventfd_update_pending = false;
>     }
>
>     /* recover memory_region_transaction_depth */
>     memory_region_transaction_depth = memory_region_transaction_depth_copy;
> }
>
> Now there are two options for using this function:
> 1. call it in address_space_to_flatview():
>
> static inline FlatView *address_space_to_flatview(AddressSpace *as)
> {
>     /*
>      * Before using any flatview, check whether we're during a memory
>      * region transaction. If so, force commit the memory region
>      * transaction.
>      */
>     if (memory_region_transaction_in_progress())

Here we need to add the condition that the BQL is held, otherwise a thread running here without the BQL will trigger the assertion in memory_region_transaction_commit_force() (see the sketch at the end of this mail).

I'm not sure whether this condition is sufficient. At least for the mr accesses in the load thread it is sufficient (because the load thread holds the BQL when accessing mr), but for other cases it seems we are back to our discussion on the sanity check..

Another point I worry about is whether the number of mr transaction commits increases in some other scenarios because of this force commit. So far, though, I haven't seen a simple virtual machine lifecycle trigger this force commit. I did a little test: I replaced commit_force() with abort() and ran qtest. Almost all the errors I can see are related to migration..

>         memory_region_transaction_commit_force();
>     return qatomic_rcu_read(&as->current_map);
> }
>
> 2. call it before each post_load()
>
> I prefer the former; it is more general than the latter.
> And with this function, the sanity check is no longer necessary.
> Although we may inevitably call memory_region_transaction_commit_force()
> several times, in my actual tests the number of calls to
> memory_region_transaction_commit_force() is still insignificant compared
> with the number of calls to memory_region_transaction_commit() before
> the optimization. As a result, this code won't hurt the optimization
> effect, but it can ensure reliability.
>

Looking forward to your opinion, Thanks!
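Here is the sketch of the BQL-conditioned check mentioned above. Note that memory_region_transaction_in_progress() is not spelled out anywhere in this thread, so its body below is an assumption:

static bool memory_region_transaction_in_progress(void)
{
    /* Assumed definition: a begin() without a matching commit() yet. */
    return memory_region_transaction_depth != 0;
}

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * Only force-commit from a context that holds the BQL; a caller
     * without the BQL would otherwise trip the assertion inside
     * memory_region_transaction_commit_force().
     */
    if (qemu_mutex_iothread_locked() &&
        memory_region_transaction_in_progress()) {
        memory_region_transaction_commit_force();
    }
    return qatomic_rcu_read(&as->current_map);
}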
On Tue, Feb 21, 2023 at 04:57:30PM +0800, Chuang Xu wrote:
> Hi, Peter

Hi, Chuang,

> This email is a supplement to the previous one.

AFAICT I mostly agree with you, but see a few more comments below.

> On 2023/2/21 at 11:38 AM, Chuang Xu wrote:
> >
> > I think we need a memory_region_transaction_commit_force() to force commit
> > some transactions when loading the vmstate. This function is designed like this:
> >
> > /*
> >  * memory_region_transaction_commit_force() is designed to
> >  * force the mr transaction to be committed in the process
> >  * of loading vmstate.
> >  */
> > void memory_region_transaction_commit_force(void)

I would call this memory_region_transaction_do_commit(), and I don't think
the manipulation of memory_region_transaction_depth is needed here, since we
don't release the BQL during the whole process, so changing that depth isn't
needed at all to me.

So, I think we can...

> > {
> >     AddressSpace *as;
> >     unsigned int memory_region_transaction_depth_copy =
> >                                     memory_region_transaction_depth;
> >
> >     /*
> >      * Temporarily replace memory_region_transaction_depth with 0 to prevent
> >      * memory_region_transaction_commit_force() and address_space_to_flatview()
> >      * from calling each other recursively.
> >      */
> >     memory_region_transaction_depth = 0;

... drop here ...

> >
> >     assert(qemu_mutex_iothread_locked());
> >
> >     if (memory_region_update_pending) {
> >         flatviews_reset();
> >
> >         MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
> >
> >         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
> >             address_space_set_flatview(as);
> >             address_space_update_ioeventfds(as);
> >         }
> >         memory_region_update_pending = false;
> >         ioeventfd_update_pending = false;
> >         MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
> >     } else if (ioeventfd_update_pending) {
> >         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
> >             address_space_update_ioeventfds(as);
> >         }
> >         ioeventfd_update_pending = false;
> >     }
> >
> >     /* recover memory_region_transaction_depth */
> >     memory_region_transaction_depth = memory_region_transaction_depth_copy;

... drop here ...

> > }

... then call this new memory_region_transaction_do_commit() in
memory_region_transaction_commit():

void memory_region_transaction_commit(void)
{
    assert(memory_region_transaction_depth);
    --memory_region_transaction_depth;

    memory_region_transaction_do_commit();
}

Then...

> > Now there are two options for using this function:
> > 1. call it in address_space_to_flatview():
> >
> > static inline FlatView *address_space_to_flatview(AddressSpace *as)
> > {
> >     /*
> >      * Before using any flatview, check whether we're during a memory
> >      * region transaction. If so, force commit the memory region
> >      * transaction.
> >      */
> >     if (memory_region_transaction_in_progress())
>
> Here we need to add the condition that the BQL is held, otherwise a thread running
> here without the BQL will trigger the assertion in
> memory_region_transaction_commit_force().
>
> I'm not sure whether this condition is sufficient. At least for the mr accesses
> in the load thread it is sufficient (because the load thread holds the BQL
> when accessing mr), but for other cases it seems we are back to
> our discussion on the sanity check..

Yes, I think the sanity checks are actually good stuff.

I would think it's nice to impl address_space_to_flatview() like this. I
guess we don't have a use case for fetching the flatview during a memory
update procedure, but I also don't see why it can't be supported.
/* Need to be called with either BQL or RCU read lock held */
static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    if (qemu_mutex_iothread_locked()) {
        /* We exclusively own the flatview now.. */
        if (memory_region_transaction_in_progress()) {
            /*
             * Fetching the flatview within a transaction in-progress
             * means current_map may not be the latest; we need to update
             * immediately to make sure the caller won't see an obsolete
             * mapping.
             */
            memory_region_transaction_do_commit();
        }

        /* No further protection needed to access current_map */
        return as->current_map;
    }

    /* Otherwise we must have had the RCU lock or something went wrong */
    assert(rcu_read_is_locked());

    return qatomic_rcu_read(&as->current_map);
}

Then IIUC everything should start to run normally again, with the same hope
that it will keep the benefit of your whole idea. Does that look sane to you?

>
> Another point I worry about is whether the number of mr transaction commits
> increases in some other scenarios because of this force commit. So far, though,
> I haven't seen a simple virtual machine lifecycle trigger this force commit.

It won't be so bad; with the help of the existing memory_region_update_pending
and ioeventfd_update_pending flags, the commit() will be a no-op if both are
unset (sketched below).

> I did a little test: I replaced commit_force() with abort() and ran qtest.
> Almost all the errors I can see are related to migration..
>
> >         memory_region_transaction_commit_force();
> >     return qatomic_rcu_read(&as->current_map);
> > }
> >
> > 2. call it before each post_load()
> >
> > I prefer the former; it is more general than the latter.
> > And with this function, the sanity check is no longer necessary.
> > Although we may inevitably call memory_region_transaction_commit_force()
> > several times, in my actual tests the number of calls to
> > memory_region_transaction_commit_force() is still insignificant compared
> > with the number of calls to memory_region_transaction_commit() before
> > the optimization. As a result, this code won't hurt the optimization
> > effect, but it can ensure reliability.

Yes, the 2nd option doesn't sound appealing to me, because we have so many
post_load()s and it could quickly defeat your original purpose, I think.

Thanks,
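For reference, the no-op path mentioned above can be read as the following sketch; it merely restates the guards already present in the commit code quoted earlier:

void memory_region_transaction_do_commit(void)
{
    assert(qemu_mutex_iothread_locked());

    if (!memory_region_update_pending && !ioeventfd_update_pending) {
        return;  /* nothing pending: an enforced commit costs almost nothing */
    }

    /*
     * Otherwise: flatviews_reset(), the listener begin/commit calls and
     * the ioeventfd updates, exactly as in the function quoted earlier.
     */
}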
Hi, Peter

On 2023/2/22 at 4:36 AM, Peter Xu wrote:
>> On 2023/2/21 at 11:38 AM, Chuang Xu wrote:
>>> I think we need a memory_region_transaction_commit_force() to force commit
>>> some transactions when loading the vmstate. This function is designed like this:
>>>
>>> /*
>>>  * memory_region_transaction_commit_force() is designed to
>>>  * force the mr transaction to be committed in the process
>>>  * of loading vmstate.
>>>  */
>>> void memory_region_transaction_commit_force(void)
> I would call this memory_region_transaction_do_commit(), and I don't think
> the manipulation of memory_region_transaction_depth is needed here, since we
> don't release the BQL during the whole process, so changing that depth isn't
> needed at all to me.
>
> So, I think we can...
>
>>> {
>>>     AddressSpace *as;
>>>     unsigned int memory_region_transaction_depth_copy =
>>>                                     memory_region_transaction_depth;
>>>
>>>     /*
>>>      * Temporarily replace memory_region_transaction_depth with 0 to prevent
>>>      * memory_region_transaction_commit_force() and address_space_to_flatview()
>>>      * from calling each other recursively.
>>>      */
>>>     memory_region_transaction_depth = 0;
> ... drop here ...

Note that, as I mentioned in the comment, we temporarily replace this value to prevent commit() and address_space_to_flatview() from calling each other recursively and eventually overflowing the stack. Part of the coredump call stack is attached here:

#8  0x0000558de5a998b5 in memory_region_transaction_do_commit () at ../softmmu/memory.c:1131
#9  0x0000558de5a99dfd in address_space_to_flatview (as=0x558de6516060 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1130
#10 address_space_get_flatview (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:810
#11 0x0000558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:836
#12 0x0000558de5a99900 in memory_region_transaction_do_commit () at ../softmmu/memory.c:1137
#13 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#14 0x0000558de5a99dfd in address_space_to_flatview (as=0x558de6516060 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1130
#15 address_space_get_flatview (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:810
#16 0x0000558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:836
#17 0x0000558de5a99900 in memory_region_transaction_do_commit () at ../softmmu/memory.c:1137
#18 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#19 0x0000558de5a99dfd in address_space_to_flatview (as=0x558de6516060 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1130
#20 address_space_get_flatview (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:810
#21 0x0000558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 <address_space_memory>) at ../softmmu/memory.c:836
#22 0x0000558de5a99900 in memory_region_transaction_do_commit () at ../softmmu/memory.c:1137
#23 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#24 0x0000558de5a99dfd in address_space_to_flatview (as=0x558de6516060 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1130

So I think we need to change the depth here.
>
>>>     assert(qemu_mutex_iothread_locked());
>>>
>>>     if (memory_region_update_pending) {
>>>         flatviews_reset();
>>>
>>>         MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);
>>>
>>>         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>>>             address_space_set_flatview(as);
>>>             address_space_update_ioeventfds(as);
>>>         }
>>>         memory_region_update_pending = false;
>>>         ioeventfd_update_pending = false;
>>>         MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
>>>     } else if (ioeventfd_update_pending) {
>>>         QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>>>             address_space_update_ioeventfds(as);
>>>         }
>>>         ioeventfd_update_pending = false;
>>>     }
>>>
>>>     /* recover memory_region_transaction_depth */
>>>     memory_region_transaction_depth = memory_region_transaction_depth_copy;
> ... drop here ...
>
>>> }
> ... then call this new memory_region_transaction_do_commit() in
> memory_region_transaction_commit():
>
> void memory_region_transaction_commit(void)
> {
>     assert(memory_region_transaction_depth);
>     --memory_region_transaction_depth;
>
>     memory_region_transaction_do_commit();
> }
>
> Then...
>
>>> Now there are two options for using this function:
>>> 1. call it in address_space_to_flatview():
>>>
>>> static inline FlatView *address_space_to_flatview(AddressSpace *as)
>>> {
>>>     /*
>>>      * Before using any flatview, check whether we're during a memory
>>>      * region transaction. If so, force commit the memory region
>>>      * transaction.
>>>      */
>>>     if (memory_region_transaction_in_progress())
>> Here we need to add the condition that the BQL is held, otherwise a thread running
>> here without the BQL will trigger the assertion in
>> memory_region_transaction_commit_force().
>>
>> I'm not sure whether this condition is sufficient. At least for the mr accesses
>> in the load thread it is sufficient (because the load thread holds the BQL
>> when accessing mr), but for other cases it seems we are back to
>> our discussion on the sanity check..
> Yes, I think the sanity checks are actually good stuff.
>
> I would think it's nice to impl address_space_to_flatview() like this. I
> guess we don't have a use case for fetching the flatview during a memory
> update procedure, but I also don't see why it can't be supported.
>
> /* Need to be called with either BQL or RCU read lock held */
> static inline FlatView *address_space_to_flatview(AddressSpace *as)
> {
>     if (qemu_mutex_iothread_locked()) {
>         /* We exclusively own the flatview now.. */
>         if (memory_region_transaction_in_progress()) {
>             /*
>              * Fetching the flatview within a transaction in-progress
>              * means current_map may not be the latest; we need to update
>              * immediately to make sure the caller won't see an obsolete
>              * mapping.
>              */
>             memory_region_transaction_do_commit();
>         }
>
>         /* No further protection needed to access current_map */
>         return as->current_map;
>     }
>
>     /* Otherwise we must have had the RCU lock or something went wrong */
>     assert(rcu_read_is_locked());
>
>     return qatomic_rcu_read(&as->current_map);
> }
>
> Then IIUC everything should start to run normally again, with the same hope
> that it will keep the benefit of your whole idea. Does that look sane to
> you?

Yes, I mostly agree with you, except for the part about transaction_depth. Thanks!
On Wed, Feb 22, 2023 at 02:27:55PM +0800, Chuang Xu wrote:
> Hi, Peter

Hi, Chuang,

> Note that, as I mentioned in the comment, we temporarily replace this value
> to prevent commit() and address_space_to_flatview() from calling each other
> recursively, and eventually overflowing the stack.

Sorry to have overlooked that part. IMHO here it's not about the depth, but rather that we don't even need any RCU protection when updating ioeventfds, because we exclusively own the FlatView* too there.

I wanted to describe what I had in mind, but instead I figured a patch may be needed to be accurate (with some cleanups alongside), hence attached. IIUC it can work with what I suggested before without fiddling with the depth. Please have a look. The patch should apply cleanly to the master branch, so if it works it can be your 1st patch too.

PS: Paolo - I know I asked this before, but it'll be good to have your review comment on anything above.

Thanks,
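The attachment itself is not preserved here, so the details below are assumptions; but the core of the idea sketches out as making the ioeventfd update bypass the RCU accessor entirely:

static void address_space_update_ioeventfds(AddressSpace *as)
{
    /*
     * Only called from commit paths with the BQL held, where we
     * exclusively own the FlatView: read current_map directly instead
     * of going through the RCU accessor, so this can never re-enter
     * address_space_to_flatview() -> do_commit().
     */
    FlatView *view = as->current_map;

    /* ... walk view and add/delete the ioeventfds as before ... */
}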
Hi, Peter

On 2023/2/22 at 11:57 PM, Peter Xu wrote:
> On Wed, Feb 22, 2023 at 02:27:55PM +0800, Chuang Xu wrote:
>> Hi, Peter
> Hi, Chuang,
>
>> Note that, as I mentioned in the comment, we temporarily replace this value
>> to prevent commit() and address_space_to_flatview() from calling each other
>> recursively, and eventually overflowing the stack.
> Sorry to have overlooked that part. IMHO here it's not about the depth,
> but rather that we don't even need any RCU protection when updating
> ioeventfds, because we exclusively own the FlatView* too there.
>
> I wanted to describe what I had in mind, but instead I figured a patch may
> be needed to be accurate (with some cleanups alongside), hence attached.
> IIUC it can work with what I suggested before without fiddling with the depth.
> Please have a look. The patch should apply cleanly to the master branch, so if
> it works it can be your 1st patch too.
>
> PS: Paolo - I know I asked this before, but it'll be good to have your
> review comment on anything above.
>
> Thanks,
>

Here are two problems I can see:

1. It is inappropriate to use assert(qemu_mutex_iothread_locked() && !memory_region_update_pending) in update_ioeventfds().

For example, when entering commit(), if memory_region_update_pending is true, the assertion will be triggered immediately when update_ioeventfds is called.

2. The problem of stack overflow has not been solved. There are too many places where address_space_to_flatview() may be called.

Here is another coredump stack:

#8  0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
#9  0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
#10 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d6837390, adding=adding@entry=false) at ../softmmu/memory.c:955
#11 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
#12 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
#13 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
#14 0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
#15 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d67f8d90, adding=adding@entry=false) at ../softmmu/memory.c:955
#16 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
#17 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
#18 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
#19 0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
#20 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d67ba790, adding=adding@entry=false) at ../softmmu/memory.c:955
#21 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
#22 0x000055a3a769e870 in address_space_update_flatview_all ()
    at ../softmmu/memory.c:1107
#23 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154

And this may not be the only case where a stack overflow occurs. Thus, changing the depth value is the safest way, I think.

Thanks!
On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:
> Hi, Peter

Hi, Chuang,

> On 2023/2/22 at 11:57 PM, Peter Xu wrote:
> > On Wed, Feb 22, 2023 at 02:27:55PM +0800, Chuang Xu wrote:
> > > Hi, Peter
> > Hi, Chuang,
> >
> > > Note that, as I mentioned in the comment, we temporarily replace this value
> > > to prevent commit() and address_space_to_flatview() from calling each other
> > > recursively, and eventually overflowing the stack.
> > Sorry to have overlooked that part. IMHO here it's not about the depth,
> > but rather that we don't even need any RCU protection when updating
> > ioeventfds, because we exclusively own the FlatView* too there.
> >
> > I wanted to describe what I had in mind, but instead I figured a patch may
> > be needed to be accurate (with some cleanups alongside), hence attached.
> > IIUC it can work with what I suggested before without fiddling with the depth.
> > Please have a look. The patch should apply cleanly to the master branch, so if
> > it works it can be your 1st patch too.
> >
> > PS: Paolo - I know I asked this before, but it'll be good to have your
> > review comment on anything above.
> >
> > Thanks,
> >
> Here are two problems I can see:
>
> 1. It is inappropriate to use assert(qemu_mutex_iothread_locked()
> && !memory_region_update_pending) in update_ioeventfds().
>
> For example, when entering commit(), if memory_region_update_pending
> is true, the assertion will be triggered immediately when update_ioeventfds
> is called.

I don't see why it's wrong; that's exactly what I wanted to guarantee, because if memory_region_update_pending==true when updating ioeventfds, we may have a serious problem.

For this, I split my patch into two parts and put this change into the last patch. See the attachment. If you worry about this, you can e.g. try applying patch 1 only later, but I still don't see how it could trigger.

>
> 2. The problem of stack overflow has not been solved. There are
> too many places where address_space_to_flatview() may be called.
>
> Here is another coredump stack:
>
> #8  0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
> #9  0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
> #10 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d6837390, adding=adding@entry=false) at ../softmmu/memory.c:955
> #11 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
> #12 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
> #13 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
> #14 0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
> #15 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d67f8d90, adding=adding@entry=false) at ../softmmu/memory.c:955
> #16 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
> #17 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
> #18 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
> #19 0x000055a3a769fd75 in address_space_to_flatview (as=0x55a3a7ede180 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1118
> #20 address_space_update_topology_pass (as=as@entry=0x55a3a7ede180 <address_space_memory>, old_view=old_view@entry=0x55a3a9d44990, new_view=new_view@entry=0x55a3d67ba790, adding=adding@entry=false) at ../softmmu/memory.c:955
> #21 0x000055a3a76a007c in address_space_set_flatview (as=as@entry=0x55a3a7ede180 <address_space_memory>) at ../softmmu/memory.c:1062
> #22 0x000055a3a769e870 in address_space_update_flatview_all () at ../softmmu/memory.c:1107
> #23 0x000055a3a769ed85 in memory_region_transaction_commit_force () at ../softmmu/memory.c:1154
>
> And this may not be the only case where a stack overflow occurs.
> Thus, changing the depth value is the safest way, I think.

I really think changing depth is a hack... :(

Here IMHO the problem is that we have other missing calls to address_space_to_flatview() during commit(), and those caused the issue. There aren't a lot of them, and sorry to have missed yet another one.

So let me try one more time on this (with patch 1; I think I had missed two places in the previous attempt). Let's see how it goes.

We may want to add a tracepoint, or have some other way to know when an enforced commit() is triggered during the vm load phase (sketched below). I just noticed that you worried about having enforced commit() too often, to the point where it would beat the original purpose, and I think yes, that's something to worry about.

I still believe the AHCI condition is rare (since e.g. you've passed all of Juan's tests even before we had this "do_commit" stuff), but in short: I think it would still be interesting if you can capture all the users of enforced commit(), like the AHCI case you quoted before, and list them in the cover letter of your next post (along with new perf measurements, to make sure your worry does not come true and too many enforced commits don't invalidate the whole idea).
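As an illustration, such a tracepoint could hang off the enforced-commit path in address_space_to_flatview(); the event name here is hypothetical and would need a matching declaration in softmmu/trace-events:

    if (qemu_mutex_iothread_locked()) {
        if (memory_region_transaction_in_progress()) {
            /*
             * Hypothetical trace event; it would need an entry in
             * softmmu/trace-events, roughly:
             *   memory_region_transaction_do_commit_enforced(void) ""
             */
            trace_memory_region_transaction_do_commit_enforced();
            memory_region_transaction_do_commit();
        }
        return as->current_map;
    }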
When I was digging this out, I also found an RCU issue when using address_space_to_flatview() (when the BQL was not taken), only in the memory_region_clear_dirty_bitmap() path. I hope the new assertion (rcu_read_is_locked()) won't trigger on some of the use cases for you already, but anyway, feel free to ignore this whole paragraph for now unless you see some assert(rcu_read_is_locked()) being triggered. I plan to post an RFC for fixing the RCU usage and I'll have you copied just for reference (it may be a separate issue from what you're working on).

Thanks,
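For reference, the usual shape of such an RCU fix is simply to take the read lock around the flatview access; RCU_READ_LOCK_GUARD() already exists in include/qemu/rcu.h, and the caller below is hypothetical:

static void clear_dirty_bitmap_example(AddressSpace *as)
{
    RCU_READ_LOCK_GUARD();  /* satisfies the assert(rcu_read_is_locked()) */
    FlatView *view = address_space_to_flatview(as);

    /*
     * ... walk view and clear the dirty bits; view must not be used
     * outside this RCU critical section ...
     */
}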
Hi, Peter

On 2023/2/25 at 11:32 PM, Peter Xu wrote:
> On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:
>> Hi, Peter
> Hi, Chuang,
>
>> On 2023/2/22 at 11:57 PM, Peter Xu wrote:
> I don't see why it's wrong; that's exactly what I wanted to guarantee,
> because if memory_region_update_pending==true when updating ioeventfds, we may
> have a serious problem.
>
> For this, I split my patch into two parts and put this change into the
> last patch. See the attachment. If you worry about this, you can e.g. try
> applying patch 1 only later, but I still don't see how it could trigger.

Sorry, I made some mistakes in my previous understanding of the code. However, if this assertion is added, it will indeed trigger some coredumps in qtest with my patches. Here is the coredump (this is the only one I found):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fa5e7b17535 in __GI_abort () at abort.c:79
#2  0x00007fa5e7b1740f in __assert_fail_base (fmt=0x7fa5e7c78ef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending", file=0x56305fc028cb "../softmmu/memory.c", line=855, function=<optimized out>) at assert.c:92
#3  0x00007fa5e7b251a2 in __GI___assert_fail (assertion=assertion@entry=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending", file=file@entry=0x56305fc028cb "../softmmu/memory.c", line=line@entry=855, function=function@entry=0x56305fc03cc0 <__PRETTY_FUNCTION__.38596> "address_space_update_ioeventfds") at assert.c:101
#4  0x000056305f8a9194 in address_space_update_ioeventfds (as=as@entry=0x563061293648) at ../softmmu/memory.c:855
#5  0x000056305f8afe4f in address_space_init (as=as@entry=0x563061293648, root=root@entry=0x5630612936a0, name=name@entry=0x5630612934f0 "virtio-net-pci") at ../softmmu/memory.c:3172
#6  0x000056305f686e43 in do_pci_register_device (errp=0x7fa5e4f39850, devfn=<optimized out>, name=0x563061011c40 "virtio-net-pci", pci_dev=0x563061293410) at ../hw/pci/pci.c:1145
#7  pci_qdev_realize (qdev=0x563061293410, errp=0x7fa5e4f39850) at ../hw/pci/pci.c:2036
#8  0x000056305f939daf in device_set_realized (obj=<optimized out>, value=true, errp=0x7fa5e4f39ae0) at ../hw/core/qdev.c:510
#9  0x000056305f93d156 in property_set_bool (obj=0x563061293410, v=<optimized out>, name=<optimized out>, opaque=0x5630610221d0, errp=0x7fa5e4f39ae0) at ../qom/object.c:2285
#10 0x000056305f93f403 in object_property_set (obj=obj@entry=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", v=v@entry=0x563061e52a00, errp=errp@entry=0x7fa5e4f39ae0) at ../qom/object.c:1420
#11 0x000056305f94247f in object_property_set_qobject (obj=obj@entry=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", value=value@entry=0x563061e61cb0, errp=errp@entry=0x7fa5e4f39ae0) at ../qom/qom-qobject.c:28
#12 0x000056305f93f674 in object_property_set_bool (obj=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", value=value@entry=true, errp=errp@entry=0x7fa5e4f39ae0) at ../qom/object.c:1489
#13 0x000056305f93959c in qdev_realize (dev=<optimized out>, bus=bus@entry=0x563061c9c400, errp=errp@entry=0x7fa5e4f39ae0) at ../hw/core/qdev.c:292
#14 0x000056305f7244a0 in qdev_device_add_from_qdict (opts=0x563061e64c00, from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7fa5e4f39ae0) at /data00/migration/qemu-open/include/hw/qdev-core.h:17
#15 0x000056305f846c75 in failover_add_primary (errp=0x7fa5e4f39ad8, n=0x563062043530) at ../hw/net/virtio-net.c:933
#16 virtio_net_set_features
(vdev=<optimized out>, features=4611687122246533156) at ../hw/net/virtio-net.c:1004
#17 0x000056305f872238 in virtio_set_features_nocheck (vdev=vdev@entry=0x563062043530, val=val@entry=4611687122246533156) at ../hw/virtio/virtio.c:2851
#18 0x000056305f877e9e in virtio_load (vdev=0x563062043530, f=0x56306125bde0, version_id=11) at ../hw/virtio/virtio.c:3027
#19 0x000056305f73c601 in vmstate_load_state (f=f@entry=0x56306125bde0, vmsd=0x56305fef16c0 <vmstate_virtio_net>, opaque=0x563062043530, version_id=11) at ../migration/vmstate.c:137
#20 0x000056305f757672 in vmstate_load (f=0x56306125bde0, se=0x563062176700) at ../migration/savevm.c:919
#21 0x000056305f757905 in qemu_loadvm_section_start_full (f=f@entry=0x56306125bde0, mis=0x56306101d3e0) at ../migration/savevm.c:2503
#22 0x000056305f75aca8 in qemu_loadvm_state_main (f=f@entry=0x56306125bde0, mis=mis@entry=0x56306101d3e0) at ../migration/savevm.c:2719
#23 0x000056305f75c17a in qemu_loadvm_state (f=0x56306125bde0) at ../migration/savevm.c:2809
#24 0x000056305f74980e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:606
#25 0x000056305fab25cb in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177

> I really think changing depth is a hack... :(
>
> Here IMHO the problem is that we have other missing calls to
> address_space_to_flatview() during commit(), and those caused the issue.
> There aren't a lot of them, and sorry to have missed yet another one.
>
> So let me try one more time on this (with patch 1; I think I had missed two
> places in the previous attempt). Let's see how it goes.
>
> We may want to add a tracepoint, or have some other way to know when an enforced
> commit() is triggered during the vm load phase. I just noticed that you worried
> about having enforced commit() too often, to the point where it would beat the
> original purpose, and I think yes, that's something to worry about.
>
> I still believe the AHCI condition is rare (since e.g. you've passed all of Juan's
> tests even before we had this "do_commit" stuff), but in short: I think it
> would still be interesting if you can capture all the users of enforced
> commit(), like the AHCI case you quoted before, and list them in the cover
> letter of your next post (along with new perf measurements, to make sure
> your worry does not come true and too many enforced commits don't invalidate
> the whole idea).
>
> When I was digging this out, I also found an RCU issue when using
> address_space_to_flatview() (when the BQL was not taken), only in the
> memory_region_clear_dirty_bitmap() path. I hope the new assertion
> (rcu_read_is_locked()) won't trigger on some of the use cases for you
> already, but anyway, feel free to ignore this whole paragraph for now unless
> you see some assert(rcu_read_is_locked()) being triggered. I plan to
> post an RFC for fixing the RCU usage and I'll have you copied just for reference
> (it may be a separate issue from what you're working on).
>
> Thanks,
>
Unfortunately, there is another case of stack overflow...
#8  memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
#9  0x00005610e96d3665 in address_space_to_flatview (as=0x5610e9ece820 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1119
#10 address_space_get_flatview (as=0x5610e9ece820 <address_space_memory>) at ../softmmu/memory.c:818
#11 0x00005610e96dfa14 in address_space_cache_init (cache=cache@entry=0x56111cdee410, as=<optimized out>, addr=addr@entry=1048576, len=len@entry=4096, is_write=false) at ../softmmu/physmem.c:3350
#12 0x00005610e96a0928 in virtio_init_region_cache (vdev=vdev@entry=0x5610eb72bf10, n=n@entry=0) at ../hw/virtio/virtio.c:247
#13 0x00005610e96a0db4 in virtio_memory_listener_commit (listener=0x5610eb72bff8) at ../hw/virtio/virtio.c:3592
#14 0x00005610e96d2e5e in address_space_update_flatviews_all () at ../softmmu/memory.c:1126
#15 memory_region_transaction_do_commit () at ../softmmu/memory.c:1145

Fortunately, this is probably the last one (at least according to the qtest results).
On Mon, Feb 27, 2023 at 09:19:39PM +0800, Chuang Xu wrote:
> Hi, Peter

Hi, Chuang,

> On 2023/2/25 at 11:32 PM, Peter Xu wrote:
> > On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:
> > > Hi, Peter
> > Hi, Chuang,
> >
> > > On 2023/2/22 at 11:57 PM, Peter Xu wrote:
> > I don't see why it's wrong; that's exactly what I wanted to guarantee,
> > because if memory_region_update_pending==true when updating ioeventfds, we may
> > have a serious problem.
> >
> > For this, I split my patch into two parts and put this change into the
> > last patch. See the attachment. If you worry about this, you can e.g. try
> > applying patch 1 only later, but I still don't see how it could trigger.
>
> Sorry, I made some mistakes in my previous understanding of the code. However,
> if this assertion is added, it will indeed trigger some coredumps in qtest with
> my patches. Here is the coredump (this is the only one I found):
>
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007fa5e7b17535 in __GI_abort () at abort.c:79
> #2  0x00007fa5e7b1740f in __assert_fail_base (fmt=0x7fa5e7c78ef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending", file=0x56305fc028cb "../softmmu/memory.c", line=855, function=<optimized out>) at assert.c:92
> #3  0x00007fa5e7b251a2 in __GI___assert_fail (assertion=assertion@entry=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending", file=file@entry=0x56305fc028cb "../softmmu/memory.c", line=line@entry=855, function=function@entry=0x56305fc03cc0 <__PRETTY_FUNCTION__.38596> "address_space_update_ioeventfds") at assert.c:101
> #4  0x000056305f8a9194 in address_space_update_ioeventfds (as=as@entry=0x563061293648) at ../softmmu/memory.c:855
> #5  0x000056305f8afe4f in address_space_init (as=as@entry=0x563061293648, root=root@entry=0x5630612936a0, name=name@entry=0x5630612934f0 "virtio-net-pci") at ../softmmu/memory.c:3172
> #6  0x000056305f686e43 in do_pci_register_device (errp=0x7fa5e4f39850, devfn=<optimized out>, name=0x563061011c40 "virtio-net-pci", pci_dev=0x563061293410) at ../hw/pci/pci.c:1145
> #7  pci_qdev_realize (qdev=0x563061293410, errp=0x7fa5e4f39850) at ../hw/pci/pci.c:2036
> #8  0x000056305f939daf in device_set_realized (obj=<optimized out>, value=true, errp=0x7fa5e4f39ae0) at ../hw/core/qdev.c:510
> #9  0x000056305f93d156 in property_set_bool (obj=0x563061293410, v=<optimized out>, name=<optimized out>, opaque=0x5630610221d0, errp=0x7fa5e4f39ae0) at ../qom/object.c:2285
> #10 0x000056305f93f403 in object_property_set (obj=obj@entry=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", v=v@entry=0x563061e52a00, errp=errp@entry=0x7fa5e4f39ae0) at ../qom/object.c:1420
> #11 0x000056305f94247f in object_property_set_qobject (obj=obj@entry=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", value=value@entry=0x563061e61cb0, errp=errp@entry=0x7fa5e4f39ae0) at ../qom/qom-qobject.c:28
> #12 0x000056305f93f674 in object_property_set_bool (obj=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", value=value@entry=true, errp=errp@entry=0x7fa5e4f39ae0) at ../qom/object.c:1489
> #13 0x000056305f93959c in qdev_realize (dev=<optimized out>, bus=bus@entry=0x563061c9c400, errp=errp@entry=0x7fa5e4f39ae0) at ../hw/core/qdev.c:292
> #14 0x000056305f7244a0 in qdev_device_add_from_qdict (opts=0x563061e64c00, from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7fa5e4f39ae0) at
> /data00/migration/qemu-open/include/hw/qdev-core.h:17
> #15 0x000056305f846c75 in failover_add_primary (errp=0x7fa5e4f39ad8, n=0x563062043530) at ../hw/net/virtio-net.c:933
> #16 virtio_net_set_features (vdev=<optimized out>, features=4611687122246533156) at ../hw/net/virtio-net.c:1004
> #17 0x000056305f872238 in virtio_set_features_nocheck (vdev=vdev@entry=0x563062043530, val=val@entry=4611687122246533156) at ../hw/virtio/virtio.c:2851
> #18 0x000056305f877e9e in virtio_load (vdev=0x563062043530, f=0x56306125bde0, version_id=11) at ../hw/virtio/virtio.c:3027
> #19 0x000056305f73c601 in vmstate_load_state (f=f@entry=0x56306125bde0, vmsd=0x56305fef16c0 <vmstate_virtio_net>, opaque=0x563062043530, version_id=11) at ../migration/vmstate.c:137
> #20 0x000056305f757672 in vmstate_load (f=0x56306125bde0, se=0x563062176700) at ../migration/savevm.c:919
> #21 0x000056305f757905 in qemu_loadvm_section_start_full (f=f@entry=0x56306125bde0, mis=0x56306101d3e0) at ../migration/savevm.c:2503
> #22 0x000056305f75aca8 in qemu_loadvm_state_main (f=f@entry=0x56306125bde0, mis=mis@entry=0x56306101d3e0) at ../migration/savevm.c:2719
> #23 0x000056305f75c17a in qemu_loadvm_state (f=0x56306125bde0) at ../migration/savevm.c:2809
> #24 0x000056305f74980e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:606
> #25 0x000056305fab25cb in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177

I thought about something like this one and assumed it was fine, but I forgot that this can trigger after your other patch is applied.. It means we need to drop the 2nd patch that I provided in the last reply.

> > I really think changing depth is a hack... :(
> >
> > Here IMHO the problem is that we have other missing calls to
> > address_space_to_flatview() during commit(), and those caused the issue.
> > There aren't a lot of them, and sorry to have missed yet another one.
> >
> > So let me try one more time on this (with patch 1; I think I had missed two
> > places in the previous attempt). Let's see how it goes.
> >
> > We may want to add a tracepoint, or have some other way to know when an enforced
> > commit() is triggered during the vm load phase. I just noticed that you worried
> > about having enforced commit() too often, to the point where it would beat the
> > original purpose, and I think yes, that's something to worry about.
> >
> > I still believe the AHCI condition is rare (since e.g. you've passed all of Juan's
> > tests even before we had this "do_commit" stuff), but in short: I think it
> > would still be interesting if you can capture all the users of enforced
> > commit(), like the AHCI case you quoted before, and list them in the cover
> > letter of your next post (along with new perf measurements, to make sure
> > your worry does not come true and too many enforced commits don't invalidate
> > the whole idea).
> >
> > When I was digging this out, I also found an RCU issue when using
> > address_space_to_flatview() (when the BQL was not taken), only in the
> > memory_region_clear_dirty_bitmap() path. I hope the new assertion
> > (rcu_read_is_locked()) won't trigger on some of the use cases for you
> > already, but anyway, feel free to ignore this whole paragraph for now unless
> > you see some assert(rcu_read_is_locked()) being triggered. I plan to
> > post an RFC for fixing the RCU usage and I'll have you copied just for reference
> > (it may be a separate issue from what you're working on).
> >
> > Thanks,
> >
> Unfortunately, there is another case of stack overflow...
>
> #8  memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
> #9  0x00005610e96d3665 in address_space_to_flatview (as=0x5610e9ece820 <address_space_memory>) at /data00/migration/qemu-open/include/exec/memory.h:1119
> #10 address_space_get_flatview (as=0x5610e9ece820 <address_space_memory>) at ../softmmu/memory.c:818
> #11 0x00005610e96dfa14 in address_space_cache_init (cache=cache@entry=0x56111cdee410, as=<optimized out>, addr=addr@entry=1048576, len=len@entry=4096, is_write=false) at ../softmmu/physmem.c:3350
> #12 0x00005610e96a0928 in virtio_init_region_cache (vdev=vdev@entry=0x5610eb72bf10, n=n@entry=0) at ../hw/virtio/virtio.c:247
> #13 0x00005610e96a0db4 in virtio_memory_listener_commit (listener=0x5610eb72bff8) at ../hw/virtio/virtio.c:3592
> #14 0x00005610e96d2e5e in address_space_update_flatviews_all () at ../softmmu/memory.c:1126
> #15 memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
>
> Fortunately, this is probably the last one (at least according to the qtest results).