Message ID: 20240616010357.2874662-1-dmitry.osipenko@collabora.com (mailing list archive)
Series: Support blob memory and venus on qemu
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> Hello,
>
> This series enables Vulkan Venus context support on virtio-gpu.
>
> All virglrenderer and almost all Linux kernel prerequisite changes
> needed by Venus are already in upstream. For the kernel there is a pending
> KVM patchset that fixes mapping of compound pages needed for DRM drivers
> using TTM [1], otherwise hostmem blob mapping will fail with a KVM error
> from Qemu.

So I've been experimenting with Aarch64 TCG on an Intel host like this:

  ./qemu-system-aarch64 \
     -M virt -cpu cortex-a76 \
     -device virtio-net-pci,netdev=unet \
     -netdev user,id=unet,hostfwd=tcp::2222-:22 \
     -m 8192 \
     -object memory-backend-memfd,id=mem,size=8G,share=on \
     -serial mon:stdio \
     -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
     -append "console=ttyAMA0" \
     -device qemu-xhci -device usb-kbd -device usb-tablet \
     -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
     -display sdl,gl=on \
     -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map

And I've noticed a couple of things.
First, trying to launch vkmark to run a KMS mode test fails with:

  vkr_context_add_object: 5 -> 0x7f24b81d7198
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  vkr_context_add_object: 6 -> 0x7f24b81d7240
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  vkr_context_add_object: 7 -> 0x7f24b81d71e0
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:48 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac690:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac570:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x101d64300:40 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac590:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac720:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac740:18 write:1 attrs:0x1
  virtio_gpu_cmd_res_back_attach res 0x5, 4 entries
  address_space_map as:0x561b48ec48c0 addr 0x109fd5000:2b000 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x102200000:100000 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x100e00000:200000 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x10a000000:bd000 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:48 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac690:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac720:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac740:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac570:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac590:18 write:1 attrs:0x1
  virtio_gpu_cmd_res_back_attach res 0x6, 2 entries
  address_space_map as:0x561b48ec48c0 addr 0x10a0bd000:343000 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x11a800000:a5000 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  vkr_context_add_object: 8 -> 0x7f24b81d7180
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5
  virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error
  virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed

More interestingly, when shutting stuff down we see weirdness like:

  address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1
  virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance
  virgl_render_server[1875931]: vkr: destroying device with valid objects
  vkr_context_remove_object: -7438602987017907480
  vkr_context_remove_object: 7
  vkr_context_remove_object: 5

which indicates something has gone very wrong. I'm not super familiar with
the memory allocation patterns, but should stuff that is attached via
virtio_gpu_cmd_res_back_attach() be findable in the list of resources?

I tried running under rr to debug further, but weirdly I can't get working
graphics with that. I did try running under ThreadSanitizer, which complained
about a potential data race:

  vkr_context_add_object: 1 -> 0x7b2c00000288
  vkr_context_add_object: 2 -> 0x7b2c00000270
  vkr_context_add_object: 3 -> 0x7b3800007f28
  vkr_context_add_object: 4 -> 0x7b3800007fa0
  vkr_context_add_object: 5 -> 0x7b48000103f8
  vkr_context_add_object: 6 -> 0x7b48000104a0
  vkr_context_add_object: 7 -> 0x7b4800010440
  virtio_gpu_cmd_res_back_attach res 0x5
  virtio_gpu_cmd_res_back_attach res 0x6
  vkr_context_add_object: 8 -> 0x7b48000103e0
  virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
  virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
  virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
  ==================
  WARNING: ThreadSanitizer: data race (pid=1751256)
    Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
      #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 iov_to_buf_full /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/iov.c:51:13 (qemu-system-aarch64+0x19839cf) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 iov_to_buf /home/alex/lsrc/qemu.git/include/qemu/iov.h:62:16 (qemu-system-aarch64+0xe3db91) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 virtio_gpu_virgl_process_cmd /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu-virgl.c:914:5 (qemu-system-aarch64+0xe3d178) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 virtio_gpu_process_cmdq /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:1055:9 (qemu-system-aarch64+0xe308ca) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 virtio_gpu_gl_handle_ctrl /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu-gl.c:100:5 (qemu-system-aarch64+0xe3c8fd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 virtio_gpu_ctrl_bh /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:1134:5 (qemu-system-aarch64+0xe3173a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 aio_bh_call /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:171:5 (qemu-system-aarch64+0x19643e7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 aio_bh_poll /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:218:13 (qemu-system-aarch64+0x1964723) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 aio_dispatch /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/aio-posix.c:423:5 (qemu-system-aarch64+0x192ab55) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #10 aio_ctx_dispatch /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:360:5 (qemu-system-aarch64+0x1966d94) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #11 g_main_context_dispatch <null> (libglib-2.0.so.0+0x547a8) (BuildId: 9f90bd7bbfcf84a1f1c5a6102f70e6264837b9d4)
      #12 os_host_main_loop_wait /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/main-loop.c:310:5 (qemu-system-aarch64+0x1967f14) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #13 main_loop_wait /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/main-loop.c:589:11 (qemu-system-aarch64+0x1967d78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #14 qemu_main_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/runstate.c:795:9 (qemu-system-aarch64+0xce803c) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #15 qemu_default_main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:37:14 (qemu-system-aarch64+0x1583c15) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #16 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:48:12 (qemu-system-aarch64+0x1583c8a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Previous write of size 8 at 0x7f7fa0ea9138 by thread T3:
      #0 memset <null> (qemu-system-aarch64+0x41fbdd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 helper_dc_zva /home/alex/lsrc/qemu.git/builds/system.threadsan/../../target/arm/tcg/helper-a64.c:974:5 (qemu-system-aarch64+0x1305506) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 <null> <null> (0x7f82e14300aa)
      #3 cpu_loop_exec_tb /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:917:10 (qemu-system-aarch64+0x15383fd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 cpu_exec_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1031:13 (qemu-system-aarch64+0x153746f) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 cpu_exec_setjmp /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1048:12 (qemu-system-aarch64+0x15354db) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 cpu_exec /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1074:11 (qemu-system-aarch64+0x1535040) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 tcg_cpu_exec /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops.c:78:11 (qemu-system-aarch64+0x157aabe) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 mttcg_cpu_thread_fn /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops-mttcg.c:95:17 (qemu-system-aarch64+0x157bb7b) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 qemu_thread_start /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:541:9 (qemu-system-aarch64+0x19363d1) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Mutex M0 (0x562c55066680) created at:
      #0 pthread_mutex_init <null> (qemu-system-aarch64+0x41746f) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 qemu_mutex_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:71:11 (qemu-system-aarch64+0x19345d9) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 qemu_init_cpu_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/cpus.c:423:5 (qemu-system-aarch64+0xccd145) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 qemu_init_subsystems /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/runstate.c:825:5 (qemu-system-aarch64+0xce83a7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 qemu_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2795:5 (qemu-system-aarch64+0xcea0f3) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:47:5 (qemu-system-aarch64+0x1583c78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Thread T3 'CPU 0/TCG' (tid=1751259, running) created by main thread at:
      #0 pthread_create <null> (qemu-system-aarch64+0x415c6d) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 qemu_thread_create /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:581:11 (qemu-system-aarch64+0x193619b) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 mttcg_start_vcpu_thread /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops-mttcg.c:144:5 (qemu-system-aarch64+0x157b940) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 qemu_init_vcpu /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/cpus.c:680:5 (qemu-system-aarch64+0xcce7a0) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 arm_cpu_realizefn /home/alex/lsrc/qemu.git/builds/system.threadsan/../../target/arm/cpu.c:2588:5 (qemu-system-aarch64+0xfd9023) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 device_set_realized /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/qdev.c:510:13 (qemu-system-aarch64+0x158e205) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 property_set_bool /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:2354:5 (qemu-system-aarch64+0x159dffb) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 object_property_set /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:1463:5 (qemu-system-aarch64+0x159a6a7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 object_property_set_qobject /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/qom-qobject.c:28:10 (qemu-system-aarch64+0x15a35e2) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 object_property_set_bool /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:1533:15 (qemu-system-aarch64+0x159b00e) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #10 qdev_realize /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/qdev.c:291:12 (qemu-system-aarch64+0x158b989) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #11 machvirt_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/arm/virt.c:2295:9 (qemu-system-aarch64+0xef0028) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #12 machine_run_board_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/machine.c:1583:5 (qemu-system-aarch64+0x607380) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #13 qemu_init_board /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2621:5 (qemu-system-aarch64+0xce9b30) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #14 qmp_x_exit_preconfig /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2713:5 (qemu-system-aarch64+0xce98df) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #15 qemu_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:3759:9 (qemu-system-aarch64+0xced7ca) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #16 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:47:5 (qemu-system-aarch64+0x1583c78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

  SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
  ==================
  ==================
  WARNING: ThreadSanitizer: data race (pid=1751256)
    Write of size 8 at 0x7f7fa0ea9158 by main thread (mutexes: write M0):
      #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 iov_from_buf_full /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/iov.c:32:13 (qemu-system-aarch64+0x1983771) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 iov_from_buf /home/alex/lsrc/qemu.git/include/qemu/iov.h:49:16 (qemu-system-aarch64+0xe2c471) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 virtio_gpu_ctrl_response /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:173:9 (qemu-system-aarch64+0xe2c2bb) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 virtio_gpu_ctrl_response_nodata /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:192:5 (qemu-system-aarch64+0xe2c5b1) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 virtio_gpu_virgl_process_cmd /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu-virgl.c:1006:9 (qemu-system-aarch64+0xe3da2a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 virtio_gpu_process_cmdq /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:1055:9 (qemu-system-aarch64+0xe308ca) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 virtio_gpu_gl_handle_ctrl /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu-gl.c:100:5 (qemu-system-aarch64+0xe3c8fd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 virtio_gpu_ctrl_bh /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:1134:5 (qemu-system-aarch64+0xe3173a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 aio_bh_call /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:171:5 (qemu-system-aarch64+0x19643e7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #10 aio_bh_poll /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:218:13 (qemu-system-aarch64+0x1964723) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #11 aio_dispatch /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/aio-posix.c:423:5 (qemu-system-aarch64+0x192ab55) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #12 aio_ctx_dispatch /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:360:5 (qemu-system-aarch64+0x1966d94) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #13 g_main_context_dispatch <null> (libglib-2.0.so.0+0x547a8) (BuildId: 9f90bd7bbfcf84a1f1c5a6102f70e6264837b9d4)
      #14 os_host_main_loop_wait /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/main-loop.c:310:5 (qemu-system-aarch64+0x1967f14) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #15 main_loop_wait /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/main-loop.c:589:11 (qemu-system-aarch64+0x1967d78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #16 qemu_main_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/runstate.c:795:9 (qemu-system-aarch64+0xce803c) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #17 qemu_default_main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:37:14 (qemu-system-aarch64+0x1583c15) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #18 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:48:12 (qemu-system-aarch64+0x1583c8a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Previous write of size 8 at 0x7f7fa0ea9158 by thread T3:
      #0 memset <null> (qemu-system-aarch64+0x41fbdd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 helper_dc_zva /home/alex/lsrc/qemu.git/builds/system.threadsan/../../target/arm/tcg/helper-a64.c:974:5 (qemu-system-aarch64+0x1305506) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 <null> <null> (0x7f82e1360e74)
      #3 cpu_loop_exec_tb /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:917:10 (qemu-system-aarch64+0x15383fd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 cpu_exec_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1031:13 (qemu-system-aarch64+0x153746f) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 cpu_exec_setjmp /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1048:12 (qemu-system-aarch64+0x15354db) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 cpu_exec /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1074:11 (qemu-system-aarch64+0x1535040) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 tcg_cpu_exec /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops.c:78:11 (qemu-system-aarch64+0x157aabe) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 mttcg_cpu_thread_fn /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops-mttcg.c:95:17 (qemu-system-aarch64+0x157bb7b) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 qemu_thread_start /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:541:9 (qemu-system-aarch64+0x19363d1) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Mutex M0 (0x562c55066680) created at:
      #0 pthread_mutex_init <null> (qemu-system-aarch64+0x41746f) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 qemu_mutex_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:71:11 (qemu-system-aarch64+0x19345d9) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 qemu_init_cpu_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/cpus.c:423:5 (qemu-system-aarch64+0xccd145) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 qemu_init_subsystems /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/runstate.c:825:5 (qemu-system-aarch64+0xce83a7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 qemu_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2795:5 (qemu-system-aarch64+0xcea0f3) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:47:5 (qemu-system-aarch64+0x1583c78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Thread T3 'CPU 0/TCG' (tid=1751259, running) created by main thread at:
      #0 pthread_create <null> (qemu-system-aarch64+0x415c6d) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 qemu_thread_create /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:581:11 (qemu-system-aarch64+0x193619b) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 mttcg_start_vcpu_thread /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops-mttcg.c:144:5 (qemu-system-aarch64+0x157b940) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 qemu_init_vcpu /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/cpus.c:680:5 (qemu-system-aarch64+0xcce7a0) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 arm_cpu_realizefn /home/alex/lsrc/qemu.git/builds/system.threadsan/../../target/arm/cpu.c:2588:5 (qemu-system-aarch64+0xfd9023) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 device_set_realized /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/qdev.c:510:13 (qemu-system-aarch64+0x158e205) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 property_set_bool /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:2354:5 (qemu-system-aarch64+0x159dffb) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 object_property_set /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:1463:5 (qemu-system-aarch64+0x159a6a7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 object_property_set_qobject /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/qom-qobject.c:28:10 (qemu-system-aarch64+0x15a35e2) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 object_property_set_bool /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:1533:15 (qemu-system-aarch64+0x159b00e) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #10 qdev_realize /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/qdev.c:291:12 (qemu-system-aarch64+0x158b989) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #11 machvirt_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/arm/virt.c:2295:9 (qemu-system-aarch64+0xef0028) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #12 machine_run_board_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/machine.c:1583:5 (qemu-system-aarch64+0x607380) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #13 qemu_init_board /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2621:5 (qemu-system-aarch64+0xce9b30) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #14 qmp_x_exit_preconfig /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2713:5 (qemu-system-aarch64+0xce98df) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #15 qemu_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:3759:9 (qemu-system-aarch64+0xced7ca) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #16 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:47:5 (qemu-system-aarch64+0x1583c78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

  SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy

This could be a false positive, or it could be a race between the guest
kernel clearing memory while we are still doing virtio_gpu_ctrl_response.
What do you think?
> > [1] https://lore.kernel.org/kvm/20240229025759.1187910-1-stevensd@google.com/ > > You'll need to use recent Mesa version containing patch that removes > dependency on cross-device feature from Venus that isn't supported by > Qemu [2]. > > [2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/087e9a96d13155e26987befae78b6ccbb7ae242b > > Example Qemu cmdline that enables Venus: > > qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \ > -machine q35,accel=kvm,memory-backend=mem1 \ > -object memory-backend-memfd,id=mem1,size=8G -m 8G > > > Changes from V13 to V14 > > - Fixed erronous fall-through in renderer_state's switch-case that was > spotted by Marc-André Lureau. > > - Reworked HOSTMEM_MR_FINISH_UNMAPPING handling as was suggested by > Akihiko Odaki. Now it shares the same code path with HOSTMEM_MR_MAPPED. > > - Made use of g_autofree in virgl_cmd_resource_create_blob() as was > suggested by Akihiko Odaki. > > - Removed virtio_gpu_virgl_deinit() and moved all deinit code to > virtio_gpu_gl_device_unrealize() as was suggested by Marc-André Lureau. > > - Replaced HAVE_FEATURE in mseon.build with virglrenderer's VERSION_MAJOR > check as was suggested by Marc-André Lureau. > > - Added trace event for cmd-suspension as was suggested by Marc-André Lureau. > > - Added patch to replace in-flight printf's with trace events as was > suggested by Marc-André Lureau > > Changes from V12 to V13 > > - Replaced `res->async_unmap_in_progress` flag with a mapping state, > moved it to the virtio_gpu_virgl_hostmem_region like was suggested > by Akihiko Odaki. > > - Renamed blob_unmap function and added back cmd_suspended argument > to it. Suggested by Akihiko Odaki. > > - Reordered VirtIOGPUGL refactoring patches to minimize code changes > like was suggested by Akihiko Odaki. > > - Replaced gl->renderer_inited with gl->renderer_state, like was suggested > by Alex Bennée. > > - Added gl->renderer state resetting to gl_device_unrealize(), for > consistency. 
Suggested by Alex Bennée. > > - Added rb's from Alex and Manos. > > - Fixed compiling with !HAVE_VIRGL_RESOURCE_BLOB. > > Changes from V11 to V12 > > - Fixed virgl_cmd_resource_create_blob() error handling. Now it doesn't > corrupt resource list and releases resource properly on error. Thanks > to Akihiko Odaki for spotting the bug. > > - Added new patch that handles virtio_gpu_virgl_init() failure gracefully, > fixing Qemu crash. Besides fixing the crash, it allows to implement > a cleaner virtio_gpu_virgl_deinit(). > > - virtio_gpu_virgl_deinit() now assumes that previously virgl was > initialized successfully when it was inited at all. Suggested by > Akihiko Odaki. > > - Fixed missed freeing of print_stats timer in virtio_gpu_virgl_deinit() > > - Added back blob unmapping or RESOURCE_UNREF that was requested > by Akihiko Odaki. Added comment to the code explaining how > async unmapping works. Added back `res->async_unmap_in_progress` > flag and added comment telling why it's needed. > > - Moved cmdq_resume_bh to VirtIOGPUGL and made coding style changes > suggested by Akihiko Odaki. > > - Added patches that move fence_poll and print_stats timers to VirtIOGPUGL > for consistency with cmdq_resume_bh. > > Changes from V10 to V11 > > - Replaced cmd_resume bool in struct ctrl_command with > "cmd->finished + !VIRTIO_GPU_FLAG_FENCE" checking as was requested > by Akihiko Odaki. > > - Reworked virgl_cmd_resource_unmap/unref_blob() to avoid re-adding > the 'async_unmap_in_progress' flag that was dropped in v9: > > 1. virgl_cmd_resource_[un]map_blob() now doesn't check itself whether > resource was previously mapped and lets virglrenderer to do the > checking. > > 2. error returned by virgl_renderer_resource_unmap() is now handled > and reported properly, previously the error wasn't checked. The > virgl_renderer_resource_unmap() fails if resource wasn't mapped. > > 3. 
virgl_cmd_resource_unref_blob() now doesn't allow to unref resource > that is mapped, it's a error condition if guest didn't unmap resource > before doing the unref. Previously unref was implicitly unmapping > resource. > > Changes from V9 to V10 > > - Dropped 'async_unmap_in_progress' variable and switched to use > aio_bh_new() isntead of oneshot variant in the "blob commands" patch. > > - Further improved error messages by printing error code when actual error > occurrs and using ERR_UNSPEC instead of ERR_ENOMEM when we don't really > know if it was ENOMEM for sure. > > - Added vdc->unrealize for the virtio GL device and freed virgl data. > > - Dropped UUID and doc/migration patches. UUID feature isn't needed > anymore, instead we changed Mesa Venus driver to not require UUID. > > - Renamed virtio-gpu-gl "vulkan" property name back to "venus". > > Changes from V8 to V9 > > - Added resuming of cmdq processing when hostmem MR is freed, > as was suggested by Akihiko Odaki. > > - Added more error messages, suggested by Akihiko Odaki > > - Dropped superfluous 'res->async_unmap_completed', suggested > by Akihiko Odaki. > > - Kept using cmd->suspended flag. Akihiko Odaki suggested to make > virtio_gpu_virgl_process_cmd() return false if cmd processing is > suspended, but it's not easy to implement due to ubiquitous > VIRTIO_GPU_FILL_CMD() macros that returns void, requiring to change > all the virtio-gpu processing code. > > - Added back virtio_gpu_virgl_resource as was requested by Akihiko Odaki, > though I'm not convinced it's really needed. > > - Switched to use GArray, renamed capset2_max_ver/size vars and moved > "vulkan" property definition to the virtio-gpu-gl device in the Venus > patch, like was suggested by Akihiko Odaki. > > - Moved UUID to virtio_gpu_virgl_resource and dropped UUID save/restore > since it will require bumping VM version and virgl device isn't miratable > anyways. 
> > - Fixed exposing UUID feature with Rutabaga > > - Dropped linux-headers update patch because headers were already updated > in Qemu/staging. > > - Added patch that updates virtio migration doc with a note about virtio-gpu > migration specifics, suggested by Akihiko Odaki. > > - Addressed coding style issue noticed by Akihiko Odaki > > Changes from V7 to V8 > > - Supported suspension of virtio-gpu commands processing and made > unmapping of hostmem region asynchronous by blocking/suspending > cmd processing until region is unmapped. Suggested by Akihiko Odaki. > > - Fixed arm64 building of x86 targets using updated linux-headers. > Corrected the update script. Thanks to Rob Clark for reporting > the issue. > > - Added new patch that makes registration of virgl capsets dynamic. > Requested by Antonio Caggiano and Pierre-Eric Pelloux-Prayer. > > - Venus capset now isn't advertised if Vulkan is disabled with vulkan=false > > Changes from V6 to V7 > > - Used scripts/update-linux-headers.sh to update Qemu headers based > on Linux v6.8-rc3 that adds Venus capset definition to virtio-gpu > protocol, was requested by Peter Maydel > > - Added r-bs that were given to v6 patches. Corrected missing s-o-bs > > - Dropped context_init Qemu's virtio-gpu device configuration flag, > was suggested by Marc-André Lureau > > - Added missing error condition checks spotted by Marc-André Lureau > and Akihiko Odaki, and few more > > - Returned back res->mr referencing to memory_region_init_ram_ptr() like > was suggested by Akihiko Odaki. Incorporated fix suggested by Pierre-Eric > to specify the MR name > > - Dropped the virgl_gpu_resource wrapper, cleaned up and simplified > patch that adds blob-cmd support > > - Fixed improper blob resource removal from resource list on resource_unref > that was spotted by Akihiko Odaki > > - Change order of the blob patches, was suggested by Akihiko Odaki. 
>   The cmd_set_scanout_blob support is enabled first
>
> - Factored out a patch that adds resource management support to
>   virtio-gpu-gl, as was requested by Marc-André Lureau
>
> - Simplified and improved the UUID support patch, dropped the hash
>   table as we don't need it for now. Moved QemuUUID to
>   virtio_gpu_simple_resource. This all was suggested by Akihiko Odaki
>   and Marc-André Lureau
>
> - Dropped the console_has_gl() check, suggested by Akihiko Odaki
>
> - Reworked Meson checking of libvirglrenderer features, made new
>   features available based on the virglrenderer pkgconfig version
>   instead of checking symbols in the header. This should fix the build
>   error with older virglrenderer versions reported by Alex Bennée
>
> - Made enabling of the Venus context configurable via the new
>   virtio-gpu device "vulkan=true" flag, suggested by Marc-André Lureau.
>   The flag is disabled by default because it requires the blob and
>   hostmem options to be enabled and configured
>
> Changes from V5 to V6
>
> - Move macro configurations under virgl.found() and rename
>   HAVE_VIRGL_CONTEXT_CREATE_WITH_FLAGS.
>
> - Handle the case where context_init is disabled.
>
> - Enable context_init by default.
>
> - Move virtio_gpu_virgl_resource_unmap() into
>   virgl_cmd_resource_unmap_blob().
>
> - Introduce a new struct virgl_gpu_resource to store virgl-specific
>   members.
>
> - Remove error handling of g_new0, because glib will abort() on OOM.
>
> - Set resource uuid as option.
>
> - Implement an optional subsection of
>   vmstate_virtio_gpu_resource_uuid_state for virtio live migration.
>
> - Use g_int_hash/g_int_equal instead of the default
>
> - Add scanout_blob function for virtio-gpu-virgl
>
> - Resolve the memory leak on virtio-gpu-virgl
>
> - Remove the unstable API flags check because virglrenderer is already
>   1.0
>
> - Squash the render server flag support into "Initialize Venus"
>
> Changes from V4 (virtio gpu V4) to V5
>
> - Inverted patches 5 and 6 because we should configure
>   HAVE_VIRGL_CONTEXT_INIT first.
>
> - Validate the owner of the memory region to avoid slowing down DMA.
>
> - Use memory_region_init_ram_ptr() instead of
>   memory_region_init_ram_device_ptr().
>
> - Adjust the sequence to allocate the gpu resource before virglrenderer
>   resource creation
>
> - Add virtio migration handling for uuid.
>
> - Send kernel patch to define VIRTIO_GPU_CAPSET_VENUS.
>   https://lore.kernel.org/lkml/20230915105918.3763061-1-ray.huang@amd.com/
>
> - Add a meson check to make sure the unstable APIs are defined from
>   0.9.0.
>
> Changes from V1 to V2 (virtio gpu V4)
>
> - Remove unused #include "hw/virtio/virtio-iommu.h"
>
> - Add a local function, called virgl_resource_destroy(), that is used
>   to release a vgpu resource on error paths and in resource_unref.
>
> - Remove virtio_gpu_virgl_resource_unmap from
>   virtio_gpu_cleanup_mapping(), since this function won't be called on
>   blob resources and also because blob resources are unmapped via
>   virgl_cmd_resource_unmap_blob().
>
> - In virgl_cmd_resource_create_blob(), do proper cleanup in error paths
>   and move QTAILQ_INSERT_HEAD(&g->reslist, res, next) to after the
>   resource has been fully initialized.
>
> - The memory region has a different life-cycle from virtio gpu
>   resources, i.e. it cannot be released synchronously along with the
>   vgpu resource. So the field "region" was changed to a pointer and is
>   allocated dynamically when the blob is mapped. Also, since the
>   pointer can be used to indicate whether the blob is mapped, the
>   explicit "mapped" field was removed.
>
> - In virgl_cmd_resource_map_blob(), add a check on the value of
>   res->region, to prevent it being called twice on the same resource.
>
> - Add a patch to enable automatic deallocation of memory regions to
>   resolve use-after-free memory corruption with a reference.
>
> Antonio Caggiano (2):
>   virtio-gpu: Handle resource blob commands
>   virtio-gpu: Support Venus context
>
> Dmitry Osipenko (8):
>   virtio-gpu: Use trace events for tracking number of in-flight fences
>   virtio-gpu: Move fence_poll timer to VirtIOGPUGL
>   virtio-gpu: Move print_stats timer to VirtIOGPUGL
>   virtio-gpu: Handle virtio_gpu_virgl_init() failure
>   virtio-gpu: Unrealize GL device
>   virtio-gpu: Use pkgconfig version to decide which virgl features are
>     available
>   virtio-gpu: Don't require udmabuf when blobs and virgl are enabled
>   virtio-gpu: Support suspension of commands processing
>
> Huang Rui (2):
>   virtio-gpu: Support context-init feature with virglrenderer
>   virtio-gpu: Add virgl resource management
>
> Pierre-Eric Pelloux-Prayer (1):
>   virtio-gpu: Register capsets dynamically
>
> Robert Beckett (1):
>   virtio-gpu: Support blob scanout using dmabuf fd
>
>  hw/display/trace-events        |   3 +
>  hw/display/virtio-gpu-gl.c     |  62 +++-
>  hw/display/virtio-gpu-virgl.c  | 589 +++++++++++++++++++++++++++++++--
>  hw/display/virtio-gpu.c        |  44 ++-
>  include/hw/virtio/virtio-gpu.h |  32 +-
>  meson.build                    |   5 +-
>  6 files changed, 678 insertions(+), 57 deletions(-)
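For reference, the pkgconfig-version-based feature gating mentioned in the changelog (replacing header symbol probing) could look roughly like this in meson.build. This is only a sketch: the macro name is taken from the changelog, and the exact logic in QEMU's meson.build differs.

```meson
virgl = dependency('virglrenderer', method: 'pkg-config', required: false)
if virgl.found()
  # Gate newer virgl features on the pkgconfig version instead of
  # probing for symbols in the installed headers.
  if virgl.version().version_compare('>=0.9.0')
    config_host_data.set('HAVE_VIRGL_CONTEXT_CREATE_WITH_FLAGS', 1)
  endif
endif
```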
On 6/19/24 20:37, Alex Bennée wrote: > So I've been experimenting with Aarch64 TCG with an Intel backend like > this: > > ./qemu-system-aarch64 \ > -M virt -cpu cortex-a76 \ > -device virtio-net-pci,netdev=unet \ > -netdev user,id=unet,hostfwd=tcp::2222-:22 \ > -m 8192 \ > -object memory-backend-memfd,id=mem,size=8G,share=on \ > -serial mon:stdio \ > -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \ > -append "console=ttyAMA0" \ > -device qemu-xhci -device usb-kbd -device usb-tablet \ > -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \ > -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map > > And I've noticed a couple of things. First trying to launch vkmark to > run a KMS mode test fails with: > ... > virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5 > virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error > virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed > > More interestingly when shutting stuff down we see weirdness like: > > address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1 > virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance > virgl_render_server[1875931]: vkr: destroying device with valid objects > vkr_context_remove_object: -7438602987017907480 > vkr_context_remove_object: 7 > vkr_context_remove_object: 5 > > which indicates something has gone very wrong. I'm not super familiar > with the memory allocation patterns but should stuff that is done as > virtio_gpu_cmd_res_back_attach() be find-able in the list of resources? This is expected to fail. Vkmark creates shmem virgl GBM FB BO on guest that isn't exportable on host. AFAICT, more code changes should be needed to support this case. 
Note that the "destroying device with valid objects" msg is fine; it
won't hurt to silence it in Venus to avoid confusion. It will happen
every time a guest application is closed without explicitly releasing
every VK object.

> I tried running under RR to further debug but weirdly I can't get
> working graphics with that. I did try running under threadsan which
> complained about a potential data race:
>
> vkr_context_add_object: 1 -> 0x7b2c00000288
> vkr_context_add_object: 2 -> 0x7b2c00000270
> vkr_context_add_object: 3 -> 0x7b3800007f28
> vkr_context_add_object: 4 -> 0x7b3800007fa0
> vkr_context_add_object: 5 -> 0x7b48000103f8
> vkr_context_add_object: 6 -> 0x7b48000104a0
> vkr_context_add_object: 7 -> 0x7b4800010440
> virtio_gpu_cmd_res_back_attach res 0x5
> virtio_gpu_cmd_res_back_attach res 0x6
> vkr_context_add_object: 8 -> 0x7b48000103e0
> virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
> virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
> virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
> ==================
> WARNING: ThreadSanitizer: data race (pid=1751256)
>   Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>     #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
..
> ==================
> SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
>
> This could be a false positive or it could be a race between the guest
> kernel clearing memory while we are still doing
> virtio_gpu_ctrl_response.
>
> What do you think?

The memcpy warning looks a bit suspicious, but is likely harmless. I
don't see such a warning with TSAN and an x86 VM.
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes: > On 6/19/24 20:37, Alex Bennée wrote: >> So I've been experimenting with Aarch64 TCG with an Intel backend like >> this: >> >> ./qemu-system-aarch64 \ >> -M virt -cpu cortex-a76 \ >> -device virtio-net-pci,netdev=unet \ >> -netdev user,id=unet,hostfwd=tcp::2222-:22 \ >> -m 8192 \ >> -object memory-backend-memfd,id=mem,size=8G,share=on \ >> -serial mon:stdio \ >> -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \ >> -append "console=ttyAMA0" \ >> -device qemu-xhci -device usb-kbd -device usb-tablet \ >> -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \ >> -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map >> >> And I've noticed a couple of things. First trying to launch vkmark to >> run a KMS mode test fails with: >> > ... >> virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5 >> virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error >> virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed >> >> More interestingly when shutting stuff down we see weirdness like: >> >> address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1 >> virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance >> virgl_render_server[1875931]: vkr: destroying device with valid objects >> vkr_context_remove_object: -7438602987017907480 >> vkr_context_remove_object: 7 >> vkr_context_remove_object: 5 >> >> which indicates something has gone very wrong. I'm not super familiar >> with the memory allocation patterns but should stuff that is done as >> virtio_gpu_cmd_res_back_attach() be find-able in the list of resources? > > This is expected to fail. Vkmark creates shmem virgl GBM FB BO on guest > that isn't exportable on host. 
> AFAICT, more code changes should be
> needed to support this case.

There are a lot of acronyms there. If this is pure guest memory why
isn't it exportable to the host? Or should the underlying mesa library
be making sure the allocation happens from the shared region?

Is vkmark particularly special here?

> Note that "destroying device with valid objects" msg is fine, won't hurt
> to silence it in Venus to avoid confusion. It will happen every time
> guest application is closed without explicitly releasing every VK
> object.

I was more concerned with:

>> vkr_context_remove_object: -7438602987017907480

which looks like a corruption of the object ids (or maybe an off-by-one)

>
>> I tried running under RR to further debug but weirdly I can't get
>> working graphics with that. I did try running under threadsan which
>> complained about a potential data race:
>>
>> vkr_context_add_object: 1 -> 0x7b2c00000288
>> vkr_context_add_object: 2 -> 0x7b2c00000270
>> vkr_context_add_object: 3 -> 0x7b3800007f28
>> vkr_context_add_object: 4 -> 0x7b3800007fa0
>> vkr_context_add_object: 5 -> 0x7b48000103f8
>> vkr_context_add_object: 6 -> 0x7b48000104a0
>> vkr_context_add_object: 7 -> 0x7b4800010440
>> virtio_gpu_cmd_res_back_attach res 0x5
>> virtio_gpu_cmd_res_back_attach res 0x6
>> vkr_context_add_object: 8 -> 0x7b48000103e0
>> virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
>> virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
>> virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
>> ==================
>> WARNING: ThreadSanitizer: data race (pid=1751256)
>>   Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>>     #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
> ..
>> ================== >> SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy >> >> This could be a false positive or it could be a race between the guest >> kernel clearing memory while we are still doing >> virtio_gpu_ctrl_response. >> >> What do you think? > > The memcpy warning looks a bit suspicion, but likely is harmless. I > don't see such warning with TSAN and x86 VM. TSAN can only pick up these interactions with TCG guests because it can track guest memory accesses. With a KVM guest we have no visibility of the guest accesses.
On 6/21/24 11:59, Alex Bennée wrote: > Dmitry Osipenko <dmitry.osipenko@collabora.com> writes: > >> On 6/19/24 20:37, Alex Bennée wrote: >>> So I've been experimenting with Aarch64 TCG with an Intel backend like >>> this: >>> >>> ./qemu-system-aarch64 \ >>> -M virt -cpu cortex-a76 \ >>> -device virtio-net-pci,netdev=unet \ >>> -netdev user,id=unet,hostfwd=tcp::2222-:22 \ >>> -m 8192 \ >>> -object memory-backend-memfd,id=mem,size=8G,share=on \ >>> -serial mon:stdio \ >>> -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \ >>> -append "console=ttyAMA0" \ >>> -device qemu-xhci -device usb-kbd -device usb-tablet \ >>> -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \ >>> -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map >>> >>> And I've noticed a couple of things. First trying to launch vkmark to >>> run a KMS mode test fails with: >>> >> ... >>> virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5 >>> virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error >>> virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed >>> >>> More interestingly when shutting stuff down we see weirdness like: >>> >>> address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1 >>> virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance >>> virgl_render_server[1875931]: vkr: destroying device with valid objects >>> vkr_context_remove_object: -7438602987017907480 >>> vkr_context_remove_object: 7 >>> vkr_context_remove_object: 5 >>> >>> which indicates something has gone very wrong. I'm not super familiar >>> with the memory allocation patterns but should stuff that is done as >>> virtio_gpu_cmd_res_back_attach() be find-able in the list of resources? >> >> This is expected to fail. 
>> Vkmark creates shmem virgl GBM FB BO on guest
>> that isn't exportable on host. AFAICT, more code changes should be
>> needed to support this case.
>
> There are a lot of acronyms there. If this is pure guest memory why
> isn't it exportable to the host? Or should the underlying mesa library
> be making sure the allocation happens from the shared region?
>
> Is vkmark particularly special here?

Actually, you could get it to work to some degree if you compile
virglrenderer with -Dminigbm_allocation=true. On the host use a
GTK/Wayland display.

Vkmark isn't special. It's virglrenderer that has room for improvement.
ChromeOS doesn't use KMS in VMs, so proper KMS support was never a
priority for Venus.

>> Note that "destroying device with valid objects" msg is fine, won't hurt
>> to silence it in Venus to avoid confusion. It will happen every time
>> guest application is closed without explicitly releasing every VK
>> object.
>
> I was more concerned with:
>
>>> vkr_context_remove_object: -7438602987017907480
>
> which looks like a corruption of the object ids (or maybe an offby one)

At first this appeared to be a valid value; otherwise Venus should've
crashed Qemu with a debug assert if the ID was invalid. But I've never
seen such odd IDs in my testing.

>>> I tried running under RR to further debug but weirdly I can't get
>>> working graphics with that.
>>> I did try running under threadsan which
>>> complained about a potential data race:
>>>
>>> vkr_context_add_object: 1 -> 0x7b2c00000288
>>> vkr_context_add_object: 2 -> 0x7b2c00000270
>>> vkr_context_add_object: 3 -> 0x7b3800007f28
>>> vkr_context_add_object: 4 -> 0x7b3800007fa0
>>> vkr_context_add_object: 5 -> 0x7b48000103f8
>>> vkr_context_add_object: 6 -> 0x7b48000104a0
>>> vkr_context_add_object: 7 -> 0x7b4800010440
>>> virtio_gpu_cmd_res_back_attach res 0x5
>>> virtio_gpu_cmd_res_back_attach res 0x6
>>> vkr_context_add_object: 8 -> 0x7b48000103e0
>>> virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
>>> virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
>>> virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>> ==================
>>> WARNING: ThreadSanitizer: data race (pid=1751256)
>>>   Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>>>     #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
>> ..
>>> ==================
>>> SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
>>>
>>> This could be a false positive or it could be a race between the guest
>>> kernel clearing memory while we are still doing
>>> virtio_gpu_ctrl_response.
>>>
>>> What do you think?
>>
>> The memcpy warning looks a bit suspicion, but likely is harmless. I
>> don't see such warning with TSAN and x86 VM.
>
> TSAN can only pick up these interactions with TCG guests because it can
> track guest memory accesses. With a KVM guest we have no visibility of
> the guest accesses.

I couldn't reproduce this issue with my KVM/TCG/ARM64 setups. For x86 I
checked both KVM and TCG; TSAN only warns about virtio-net memcpy's for
me.
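For anyone wanting to try the -Dminigbm_allocation suggestion above, that means rebuilding virglrenderer with the option enabled. A sketch of the build invocation, assuming it is run from a virglrenderer source checkout:

```sh
meson setup build -Dminigbm_allocation=true
ninja -C build
sudo ninja -C build install
```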
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes: > On 6/21/24 11:59, Alex Bennée wrote: >> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes: >> >>> On 6/19/24 20:37, Alex Bennée wrote: >>>> So I've been experimenting with Aarch64 TCG with an Intel backend like >>>> this: >>>> >>>> ./qemu-system-aarch64 \ >>>> -M virt -cpu cortex-a76 \ >>>> -device virtio-net-pci,netdev=unet \ >>>> -netdev user,id=unet,hostfwd=tcp::2222-:22 \ >>>> -m 8192 \ >>>> -object memory-backend-memfd,id=mem,size=8G,share=on \ >>>> -serial mon:stdio \ >>>> -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \ >>>> -append "console=ttyAMA0" \ >>>> -device qemu-xhci -device usb-kbd -device usb-tablet \ >>>> -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \ >>>> -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map >>>> >>>> And I've noticed a couple of things. First trying to launch vkmark to >>>> run a KMS mode test fails with: >>>> >>> ... >>>> virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5 >>>> virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error >>>> virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed >>>> >>>> More interestingly when shutting stuff down we see weirdness like: >>>> >>>> address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1 >>>> virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance >>>> virgl_render_server[1875931]: vkr: destroying device with valid objects >>>> vkr_context_remove_object: -7438602987017907480 >>>> vkr_context_remove_object: 7 >>>> vkr_context_remove_object: 5 >>>> >>>> which indicates something has gone very wrong. 
I'm not super familiar >>>> with the memory allocation patterns but should stuff that is done as >>>> virtio_gpu_cmd_res_back_attach() be find-able in the list of resources? >>> >>> This is expected to fail. Vkmark creates shmem virgl GBM FB BO on guest >>> that isn't exportable on host. AFAICT, more code changes should be >>> needed to support this case. >> >> There are a lot of acronyms there. If this is pure guest memory why >> isn't it exportable to the host? Or should the underlying mesa library >> be making sure the allocation happens from the shared region? >> >> Is vkmark particularly special here? > > Actually, you could get it to work to a some degree if you'll compile > virglrenderer with -Dminigbm_allocation=true. On host use GTK/Wayland > display. I'll give that a go. > Vkmark isn't special. It's virglrenderer that has a room for > improvement. ChromeOS doesn't use KMS in VMs, proper KMS support was > never a priority for Venus. Is there a tracking bug for KMS support for Venus? Or Venus should work fine if virglrenderer can export the buffer to the host? <snip> >>>> >>>> This could be a false positive or it could be a race between the guest >>>> kernel clearing memory while we are still doing >>>> virtio_gpu_ctrl_response. >>>> >>>> What do you think? >>> >>> The memcpy warning looks a bit suspicion, but likely is harmless. I >>> don't see such warning with TSAN and x86 VM. >> >> TSAN can only pick up these interactions with TCG guests because it can >> track guest memory accesses. With a KVM guest we have no visibility of >> the guest accesses. > > I couldn't reproduce this issue with my KVM/TCG/ARM64 setups. Fox x86 I > checked both KVM and TCG, TSAN only warns about vitio-net memcpy's for > me. Hmm OK. I'll keep an eye out as I test the next version.