
[v14,00/14] Support blob memory and venus on qemu

Message ID 20240616010357.2874662-1-dmitry.osipenko@collabora.com

Message

Dmitry Osipenko June 16, 2024, 1:03 a.m. UTC
Hello,

This series enables Vulkan Venus context support on virtio-gpu.

All virglrenderer and almost all Linux kernel prerequisite changes
needed by Venus are already upstream. For the kernel there is a pending
KVM patchset that fixes mapping of compound pages needed by DRM drivers
using TTM [1], otherwise hostmem blob mapping will fail with a KVM error
from Qemu.

[1] https://lore.kernel.org/kvm/20240229025759.1187910-1-stevensd@google.com/

You'll need a recent Mesa version containing the patch that removes the
Venus dependency on the cross-device feature, which isn't supported by
Qemu [2].

[2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/087e9a96d13155e26987befae78b6ccbb7ae242b

Example Qemu cmdline that enables Venus:

  qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \
      -machine q35,accel=kvm,memory-backend=mem1 \
      -object memory-backend-memfd,id=mem1,size=8G -m 8G


Changes from V13 to V14

- Fixed erroneous fall-through in renderer_state's switch-case that was
  spotted by Marc-André Lureau.

- Reworked HOSTMEM_MR_FINISH_UNMAPPING handling as was suggested by
  Akihiko Odaki. Now it shares the same code path with HOSTMEM_MR_MAPPED.

- Made use of g_autofree in virgl_cmd_resource_create_blob() as was
  suggested by Akihiko Odaki.

- Removed virtio_gpu_virgl_deinit() and moved all deinit code to
  virtio_gpu_gl_device_unrealize() as was suggested by Marc-André Lureau.

- Replaced HAVE_FEATURE in meson.build with virglrenderer's VERSION_MAJOR
  check as was suggested by Marc-André Lureau.

- Added trace event for cmd-suspension as was suggested by Marc-André Lureau.

- Added patch to replace in-flight printfs with trace events, as was
  suggested by Marc-André Lureau.

Changes from V12 to V13

- Replaced `res->async_unmap_in_progress` flag with a mapping state and
  moved it to the virtio_gpu_virgl_hostmem_region, as was suggested
  by Akihiko Odaki.

- Renamed the blob_unmap function and added back the cmd_suspended
  argument to it. Suggested by Akihiko Odaki.

- Reordered VirtIOGPUGL refactoring patches to minimize code changes,
  as was suggested by Akihiko Odaki.

- Replaced gl->renderer_inited with gl->renderer_state, as was suggested
  by Alex Bennée.

- Added gl->renderer state resetting to gl_device_unrealize(), for
  consistency. Suggested by Alex Bennée.

- Added rb's from Alex and Manos.

- Fixed compiling with !HAVE_VIRGL_RESOURCE_BLOB.

Changes from V11 to V12

- Fixed virgl_cmd_resource_create_blob() error handling. Now it doesn't
  corrupt the resource list and releases the resource properly on error.
  Thanks to Akihiko Odaki for spotting the bug.

- Added new patch that handles virtio_gpu_virgl_init() failure gracefully,
  fixing a Qemu crash. Besides fixing the crash, it allows implementing
  a cleaner virtio_gpu_virgl_deinit().

- virtio_gpu_virgl_deinit() now assumes that virgl was previously
  initialized successfully, if it was initialized at all. Suggested by
  Akihiko Odaki.

- Fixed missing freeing of the print_stats timer in virtio_gpu_virgl_deinit().

- Added back blob unmapping on RESOURCE_UNREF that was requested
  by Akihiko Odaki. Added a comment to the code explaining how
  async unmapping works. Added back the `res->async_unmap_in_progress`
  flag with a comment telling why it's needed.

- Moved cmdq_resume_bh to VirtIOGPUGL and made coding style changes
  suggested by Akihiko Odaki.

- Added patches that move fence_poll and print_stats timers to VirtIOGPUGL
  for consistency with cmdq_resume_bh.

Changes from V10 to V11

- Replaced cmd_resume bool in struct ctrl_command with
  "cmd->finished + !VIRTIO_GPU_FLAG_FENCE" checking as was requested
  by Akihiko Odaki.

- Reworked virgl_cmd_resource_unmap/unref_blob() to avoid re-adding
  the 'async_unmap_in_progress' flag that was dropped in v9:

    1. virgl_cmd_resource_[un]map_blob() now doesn't check itself whether
       the resource was previously mapped and lets virglrenderer do the
       checking.

    2. the error returned by virgl_renderer_resource_unmap() is now handled
       and reported properly; previously the error wasn't checked. The
       virgl_renderer_resource_unmap() call fails if the resource wasn't
       mapped.

    3. virgl_cmd_resource_unref_blob() now doesn't allow unreffing a
       resource that is mapped; it's an error condition if the guest didn't
       unmap the resource before doing the unref. Previously, unref was
       implicitly unmapping the resource.

Changes from V9 to V10

- Dropped the 'async_unmap_in_progress' variable and switched to using
  aio_bh_new() instead of the oneshot variant in the "blob commands" patch.

- Further improved error messages by printing the error code when the
  actual error occurs and using ERR_UNSPEC instead of ERR_ENOMEM when we
  don't really know whether it was ENOMEM for sure.

- Added vdc->unrealize for the virtio GL device and freed virgl data.

- Dropped UUID and doc/migration patches. UUID feature isn't needed
  anymore, instead we changed Mesa Venus driver to not require UUID.

- Renamed virtio-gpu-gl "vulkan" property name back to "venus".

Changes from V8 to V9

- Added resuming of cmdq processing when hostmem MR is freed,
  as was suggested by Akihiko Odaki.

- Added more error messages, suggested by Akihiko Odaki.

- Dropped superfluous 'res->async_unmap_completed', suggested
  by Akihiko Odaki.

- Kept using the cmd->suspended flag. Akihiko Odaki suggested making
  virtio_gpu_virgl_process_cmd() return false if cmd processing is
  suspended, but that's not easy to implement due to the ubiquitous
  VIRTIO_GPU_FILL_CMD() macro, which returns void and would require
  changing all the virtio-gpu processing code.

- Added back virtio_gpu_virgl_resource as was requested by Akihiko Odaki,
  though I'm not convinced it's really needed.

- Switched to using GArray, renamed the capset2_max_ver/size vars and
  moved the "vulkan" property definition to the virtio-gpu-gl device in
  the Venus patch, as was suggested by Akihiko Odaki.

- Moved UUID to virtio_gpu_virgl_resource and dropped the UUID save/restore
  since it would require bumping the VM version and the virgl device isn't
  migratable anyway.

- Fixed exposing the UUID feature with Rutabaga.

- Dropped linux-headers update patch because headers were already updated
  in Qemu/staging.

- Added patch that updates virtio migration doc with a note about virtio-gpu
  migration specifics, suggested by Akihiko Odaki.

- Addressed a coding style issue noticed by Akihiko Odaki.

Changes from V7 to V8

- Supported suspension of virtio-gpu command processing and made
  unmapping of the hostmem region asynchronous by blocking/suspending
  cmd processing until the region is unmapped. Suggested by Akihiko Odaki.

- Fixed arm64 building of x86 targets using updated linux-headers.
  Corrected the update script. Thanks to Rob Clark for reporting
  the issue.

- Added new patch that makes registration of virgl capsets dynamic.
  Requested by Antonio Caggiano and Pierre-Eric Pelloux-Prayer.

- The Venus capset now isn't advertised if Vulkan is disabled with
  vulkan=false.

Changes from V6 to V7

- Used scripts/update-linux-headers.sh to update Qemu headers based
  on Linux v6.8-rc3, which adds the Venus capset definition to the
  virtio-gpu protocol; was requested by Peter Maydell.

- Added r-bs that were given to v6 patches. Corrected missing s-o-bs.

- Dropped the context_init Qemu virtio-gpu device configuration flag,
  as was suggested by Marc-André Lureau.

- Added missing error condition checks spotted by Marc-André Lureau
  and Akihiko Odaki, and a few more.

- Returned res->mr referencing back to memory_region_init_ram_ptr(), as
  was suggested by Akihiko Odaki. Incorporated the fix suggested by
  Pierre-Eric to specify the MR name.

- Dropped the virgl_gpu_resource wrapper, cleaned up and simplified the
  patch that adds blob-cmd support.

- Fixed improper blob resource removal from the resource list on
  resource_unref, which was spotted by Akihiko Odaki.

- Changed the order of the blob patches, as was suggested by Akihiko Odaki.
  The cmd_set_scanout_blob support is enabled first.

- Factored out the patch that adds resource management support to
  virtio-gpu-gl, as was requested by Marc-André Lureau.

- Simplified and improved the UUID support patch, dropped the hash table
  as we don't need it for now. Moved QemuUUID to virtio_gpu_simple_resource.
  All this was suggested by Akihiko Odaki and Marc-André Lureau.

- Dropped the console_has_gl() check, suggested by Akihiko Odaki.

- Reworked Meson checking of virglrenderer features; new features are now
  made available based on the virglrenderer pkgconfig version instead of
  checking symbols in the header. This should fix the build error with
  older virglrenderer versions reported by Alex Bennée.

- Made enabling of the Venus context configurable via a new virtio-gpu
  device "vulkan=true" flag, as suggested by Marc-André Lureau. The flag
  is disabled by default because it requires the blob and hostmem options
  to be enabled and configured.

Changes from V5 to V6

- Move macro configurations under virgl.found() and rename
  HAVE_VIRGL_CONTEXT_CREATE_WITH_FLAGS.

- Handle the case where context_init is disabled.

- Enable context_init by default.

- Move virtio_gpu_virgl_resource_unmap() into
  virgl_cmd_resource_unmap_blob().

- Introduce new struct virgl_gpu_resource to store virgl specific members.

- Remove error handling of g_new0(), because glib will abort() on OOM.

- Set the resource UUID as an option.

- Implement optional subsection of vmstate_virtio_gpu_resource_uuid_state
  for virtio live migration.

- Use g_int_hash/g_int_equal instead of the default.

- Add a scanout_blob function for virtio-gpu-virgl.

- Resolve the memory leak on virtio-gpu-virgl.

- Remove the unstable API flags check because virglrenderer is already 1.0.

- Squash the render server flag support into "Initialize Venus".

Changes from V4 (virtio gpu V4) to V5

- Inverted patches 5 and 6 because we should configure
  HAVE_VIRGL_CONTEXT_INIT first.

- Validate owner of memory region to avoid slowing down DMA.

- Use memory_region_init_ram_ptr() instead of
  memory_region_init_ram_device_ptr().

- Adjust the sequence to allocate the gpu resource before virglrenderer
  resource creation.

- Add virtio migration handling for uuid.

- Send kernel patch to define VIRTIO_GPU_CAPSET_VENUS.
  https://lore.kernel.org/lkml/20230915105918.3763061-1-ray.huang@amd.com/

- Add a meson check to make sure the unstable APIs are defined from 0.9.0.

Changes from V1 to V2 (virtio gpu V4)

- Remove unused #include "hw/virtio/virtio-iommu.h"

- Add a local function, called virgl_resource_destroy(), that is used
  to release a vgpu resource on error paths and in resource_unref.

- Remove virtio_gpu_virgl_resource_unmap from virtio_gpu_cleanup_mapping(),
  since that function won't be called on blob resources, and also because
  blob resources are unmapped via virgl_cmd_resource_unmap_blob().

- In virgl_cmd_resource_create_blob(), do proper cleanup in error paths
  and move QTAILQ_INSERT_HEAD(&g->reslist, res, next) after the resource
  has been fully initialized.

- A memory region has a different life-cycle from virtio gpu resources,
  i.e. it cannot be released synchronously along with the vgpu resource.
  So the "region" field was changed to a pointer that is allocated
  dynamically when the blob is mapped.
  Also, since the pointer can be used to indicate whether the blob
  is mapped, the explicit "mapped" field was removed.

- In virgl_cmd_resource_map_blob(), add a check on the value of
  res->region to prevent it being called twice on the same resource.

- Add a patch to enable automatic deallocation of memory regions to resolve
  use-after-free memory corruption with a reference.


Antonio Caggiano (2):
  virtio-gpu: Handle resource blob commands
  virtio-gpu: Support Venus context

Dmitry Osipenko (8):
  virtio-gpu: Use trace events for tracking number of in-flight fences
  virtio-gpu: Move fence_poll timer to VirtIOGPUGL
  virtio-gpu: Move print_stats timer to VirtIOGPUGL
  virtio-gpu: Handle virtio_gpu_virgl_init() failure
  virtio-gpu: Unrealize GL device
  virtio-gpu: Use pkgconfig version to decide which virgl features are
    available
  virtio-gpu: Don't require udmabuf when blobs and virgl are enabled
  virtio-gpu: Support suspension of commands processing

Huang Rui (2):
  virtio-gpu: Support context-init feature with virglrenderer
  virtio-gpu: Add virgl resource management

Pierre-Eric Pelloux-Prayer (1):
  virtio-gpu: Register capsets dynamically

Robert Beckett (1):
  virtio-gpu: Support blob scanout using dmabuf fd

 hw/display/trace-events        |   3 +
 hw/display/virtio-gpu-gl.c     |  62 +++-
 hw/display/virtio-gpu-virgl.c  | 589 +++++++++++++++++++++++++++++++--
 hw/display/virtio-gpu.c        |  44 ++-
 include/hw/virtio/virtio-gpu.h |  32 +-
 meson.build                    |   5 +-
 6 files changed, 678 insertions(+), 57 deletions(-)

Comments

Alex Bennée June 19, 2024, 5:37 p.m. UTC | #1
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> Hello,
>
> This series enables Vulkan Venus context support on virtio-gpu.
>
> All virglrenderer and almost all Linux kernel prerequisite changes
> needed by Venus are already upstream. For the kernel there is a pending
> KVM patchset that fixes mapping of compound pages needed by DRM drivers
> using TTM [1], otherwise hostmem blob mapping will fail with a KVM error
> from Qemu.

So I've been experimenting with Aarch64 TCG with an Intel backend like
this:

./qemu-system-aarch64 \
           -M virt -cpu cortex-a76 \
           -device virtio-net-pci,netdev=unet \
           -netdev user,id=unet,hostfwd=tcp::2222-:22 \
           -m 8192 \
           -object memory-backend-memfd,id=mem,size=8G,share=on \
           -serial mon:stdio \
           -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
           -append "console=ttyAMA0" \
           -device qemu-xhci -device usb-kbd -device usb-tablet \
           -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
           -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map 

And I've noticed a couple of things. First, trying to launch vkmark to
run a KMS mode test fails with:

  vkr_context_add_object: 5 -> 0x7f24b81d7198                                                                                                                                  
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1                                                                                                    
  vkr_context_add_object: 6 -> 0x7f24b81d7240                                                                                                                                  
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1                                                                                                    
  vkr_context_add_object: 7 -> 0x7f24b81d71e0                                                                                                                                  
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:48 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac690:18 write:1 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac570:20 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x101d64300:40 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac590:18 write:1 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac720:20 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac740:18 write:1 attrs:0x1                                                                                                    
  virtio_gpu_cmd_res_back_attach res 0x5, 4 entries                                                                                                                            
  address_space_map as:0x561b48ec48c0 addr 0x109fd5000:2b000 write:0 attrs:0x1                                                                                                 
  address_space_map as:0x561b48ec48c0 addr 0x102200000:100000 write:0 attrs:0x1                                                                                                
  address_space_map as:0x561b48ec48c0 addr 0x100e00000:200000 write:0 attrs:0x1                                                                                                
  address_space_map as:0x561b48ec48c0 addr 0x10a000000:bd000 write:0 attrs:0x1                                                                                                 
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:48 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac690:18 write:1 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac720:20 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:20 write:0 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac740:18 write:1 attrs:0x1                                                                                                    
  address_space_map as:0x561b48ec48c0 addr 0x1008ac570:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac590:18 write:1 attrs:0x1
  virtio_gpu_cmd_res_back_attach res 0x6, 2 entries                        
  address_space_map as:0x561b48ec48c0 addr 0x10a0bd000:343000 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x11a800000:a5000 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1 
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  vkr_context_add_object: 8 -> 0x7f24b81d7180                              
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1   
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1    
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac648:20 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x109dc5be0:18 write:0 attrs:0x1
  address_space_map as:0x561b48ec48c0 addr 0x1008ac668:18 write:1 attrs:0x1
  virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5
  virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error 
  virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed

More interestingly when shutting stuff down we see weirdness like:

  address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1                                                                                                    
  virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance                                                                                       
  virgl_render_server[1875931]: vkr: destroying device with valid objects                                                                                                      
  vkr_context_remove_object: -7438602987017907480                                                                                                                              
  vkr_context_remove_object: 7                                                                                                                                                 
  vkr_context_remove_object: 5       

which indicates something has gone very wrong. I'm not super familiar
with the memory allocation patterns, but should stuff that is done via
virtio_gpu_cmd_res_back_attach() be findable in the list of resources?

I tried running under rr to further debug, but weirdly I can't get
working graphics with that. I did try running under ThreadSanitizer,
which complained about a potential data race:

  vkr_context_add_object: 1 -> 0x7b2c00000288
  vkr_context_add_object: 2 -> 0x7b2c00000270
  vkr_context_add_object: 3 -> 0x7b3800007f28
  vkr_context_add_object: 4 -> 0x7b3800007fa0
  vkr_context_add_object: 5 -> 0x7b48000103f8
  vkr_context_add_object: 6 -> 0x7b48000104a0
  vkr_context_add_object: 7 -> 0x7b4800010440
  virtio_gpu_cmd_res_back_attach res 0x5
  virtio_gpu_cmd_res_back_attach res 0x6
  vkr_context_add_object: 8 -> 0x7b48000103e0
  virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
  virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
  virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
  ==================
  WARNING: ThreadSanitizer: data race (pid=1751256)
    Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
      #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 iov_to_buf_full /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/iov.c:51:13 (qemu-system-aarch64+0x19839cf) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 iov_to_buf /home/alex/lsrc/qemu.git/include/qemu/iov.h:62:16 (qemu-system-aarch64+0xe3db91) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 virtio_gpu_virgl_process_cmd /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu-virgl.c:914:5 (qemu-system-aarch64+0xe3d178) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 virtio_gpu_process_cmdq /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:1055:9 (qemu-system-aarch64+0xe308ca) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 virtio_gpu_gl_handle_ctrl /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu-gl.c:100:5 (qemu-system-aarch64+0xe3c8fd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 virtio_gpu_ctrl_bh /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:1134:5 (qemu-system-aarch64+0xe3173a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 aio_bh_call /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:171:5 (qemu-system-aarch64+0x19643e7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 aio_bh_poll /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:218:13 (qemu-system-aarch64+0x1964723) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 aio_dispatch /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/aio-posix.c:423:5 (qemu-system-aarch64+0x192ab55) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #10 aio_ctx_dispatch /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:360:5 (qemu-system-aarch64+0x1966d94) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #11 g_main_context_dispatch <null> (libglib-2.0.so.0+0x547a8) (BuildId: 9f90bd7bbfcf84a1f1c5a6102f70e6264837b9d4)
      #12 os_host_main_loop_wait /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/main-loop.c:310:5 (qemu-system-aarch64+0x1967f14) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #13 main_loop_wait /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/main-loop.c:589:11 (qemu-system-aarch64+0x1967d78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #14 qemu_main_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/runstate.c:795:9 (qemu-system-aarch64+0xce803c) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #15 qemu_default_main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:37:14 (qemu-system-aarch64+0x1583c15) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #16 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:48:12 (qemu-system-aarch64+0x1583c8a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Previous write of size 8 at 0x7f7fa0ea9138 by thread T3:
      #0 memset <null> (qemu-system-aarch64+0x41fbdd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 helper_dc_zva /home/alex/lsrc/qemu.git/builds/system.threadsan/../../target/arm/tcg/helper-a64.c:974:5 (qemu-system-aarch64+0x1305506) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 <null> <null> (0x7f82e14300aa)
      #3 cpu_loop_exec_tb /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:917:10 (qemu-system-aarch64+0x15383fd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 cpu_exec_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1031:13 (qemu-system-aarch64+0x153746f) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 cpu_exec_setjmp /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1048:12 (qemu-system-aarch64+0x15354db) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 cpu_exec /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1074:11 (qemu-system-aarch64+0x1535040) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 tcg_cpu_exec /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops.c:78:11 (qemu-system-aarch64+0x157aabe) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 mttcg_cpu_thread_fn /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops-mttcg.c:95:17 (qemu-system-aarch64+0x157bb7b) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 qemu_thread_start /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:541:9 (qemu-system-aarch64+0x19363d1) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Mutex M0 (0x562c55066680) created at:
      #0 pthread_mutex_init <null> (qemu-system-aarch64+0x41746f) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 qemu_mutex_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:71:11 (qemu-system-aarch64+0x19345d9) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 qemu_init_cpu_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/cpus.c:423:5 (qemu-system-aarch64+0xccd145) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 qemu_init_subsystems /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/runstate.c:825:5 (qemu-system-aarch64+0xce83a7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 qemu_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2795:5 (qemu-system-aarch64+0xcea0f3) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:47:5 (qemu-system-aarch64+0x1583c78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Thread T3 'CPU 0/TCG' (tid=1751259, running) created by main thread at:
      #0 pthread_create <null> (qemu-system-aarch64+0x415c6d) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 qemu_thread_create /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:581:11 (qemu-system-aarch64+0x193619b) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 mttcg_start_vcpu_thread /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops-mttcg.c:144:5 (qemu-system-aarch64+0x157b940) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 qemu_init_vcpu /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/cpus.c:680:5 (qemu-system-aarch64+0xcce7a0) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 arm_cpu_realizefn /home/alex/lsrc/qemu.git/builds/system.threadsan/../../target/arm/cpu.c:2588:5 (qemu-system-aarch64+0xfd9023) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 device_set_realized /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/qdev.c:510:13 (qemu-system-aarch64+0x158e205) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 property_set_bool /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:2354:5 (qemu-system-aarch64+0x159dffb) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 object_property_set /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:1463:5 (qemu-system-aarch64+0x159a6a7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 object_property_set_qobject /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/qom-qobject.c:28:10 (qemu-system-aarch64+0x15a35e2) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 object_property_set_bool /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:1533:15 (qemu-system-aarch64+0x159b00e) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #10 qdev_realize /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/qdev.c:291:12 (qemu-system-aarch64+0x158b989) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #11 machvirt_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/arm/virt.c:2295:9 (qemu-system-aarch64+0xef0028) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #12 machine_run_board_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/machine.c:1583:5 (qemu-system-aarch64+0x607380) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #13 qemu_init_board /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2621:5 (qemu-system-aarch64+0xce9b30) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #14 qmp_x_exit_preconfig /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2713:5 (qemu-system-aarch64+0xce98df) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #15 qemu_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:3759:9 (qemu-system-aarch64+0xced7ca) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #16 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:47:5 (qemu-system-aarch64+0x1583c78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

  SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
  ==================
  ==================
  WARNING: ThreadSanitizer: data race (pid=1751256)
    Write of size 8 at 0x7f7fa0ea9158 by main thread (mutexes: write M0):
      #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 iov_from_buf_full /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/iov.c:32:13 (qemu-system-aarch64+0x1983771) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 iov_from_buf /home/alex/lsrc/qemu.git/include/qemu/iov.h:49:16 (qemu-system-aarch64+0xe2c471) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 virtio_gpu_ctrl_response /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:173:9 (qemu-system-aarch64+0xe2c2bb) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 virtio_gpu_ctrl_response_nodata /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:192:5 (qemu-system-aarch64+0xe2c5b1) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 virtio_gpu_virgl_process_cmd /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu-virgl.c:1006:9 (qemu-system-aarch64+0xe3da2a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 virtio_gpu_process_cmdq /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:1055:9 (qemu-system-aarch64+0xe308ca) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 virtio_gpu_gl_handle_ctrl /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu-gl.c:100:5 (qemu-system-aarch64+0xe3c8fd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 virtio_gpu_ctrl_bh /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/display/virtio-gpu.c:1134:5 (qemu-system-aarch64+0xe3173a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 aio_bh_call /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:171:5 (qemu-system-aarch64+0x19643e7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #10 aio_bh_poll /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:218:13 (qemu-system-aarch64+0x1964723) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #11 aio_dispatch /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/aio-posix.c:423:5 (qemu-system-aarch64+0x192ab55) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #12 aio_ctx_dispatch /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/async.c:360:5 (qemu-system-aarch64+0x1966d94) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #13 g_main_context_dispatch <null> (libglib-2.0.so.0+0x547a8) (BuildId: 9f90bd7bbfcf84a1f1c5a6102f70e6264837b9d4)
      #14 os_host_main_loop_wait /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/main-loop.c:310:5 (qemu-system-aarch64+0x1967f14) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #15 main_loop_wait /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/main-loop.c:589:11 (qemu-system-aarch64+0x1967d78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #16 qemu_main_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/runstate.c:795:9 (qemu-system-aarch64+0xce803c) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #17 qemu_default_main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:37:14 (qemu-system-aarch64+0x1583c15) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #18 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:48:12 (qemu-system-aarch64+0x1583c8a) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Previous write of size 8 at 0x7f7fa0ea9158 by thread T3:
      #0 memset <null> (qemu-system-aarch64+0x41fbdd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 helper_dc_zva /home/alex/lsrc/qemu.git/builds/system.threadsan/../../target/arm/tcg/helper-a64.c:974:5 (qemu-system-aarch64+0x1305506) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 <null> <null> (0x7f82e1360e74)
      #3 cpu_loop_exec_tb /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:917:10 (qemu-system-aarch64+0x15383fd) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 cpu_exec_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1031:13 (qemu-system-aarch64+0x153746f) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 cpu_exec_setjmp /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1048:12 (qemu-system-aarch64+0x15354db) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 cpu_exec /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/cpu-exec.c:1074:11 (qemu-system-aarch64+0x1535040) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 tcg_cpu_exec /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops.c:78:11 (qemu-system-aarch64+0x157aabe) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 mttcg_cpu_thread_fn /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops-mttcg.c:95:17 (qemu-system-aarch64+0x157bb7b) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 qemu_thread_start /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:541:9 (qemu-system-aarch64+0x19363d1) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Mutex M0 (0x562c55066680) created at:
      #0 pthread_mutex_init <null> (qemu-system-aarch64+0x41746f) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 qemu_mutex_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:71:11 (qemu-system-aarch64+0x19345d9) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 qemu_init_cpu_loop /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/cpus.c:423:5 (qemu-system-aarch64+0xccd145) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 qemu_init_subsystems /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/runstate.c:825:5 (qemu-system-aarch64+0xce83a7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 qemu_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2795:5 (qemu-system-aarch64+0xcea0f3) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:47:5 (qemu-system-aarch64+0x1583c78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

    Thread T3 'CPU 0/TCG' (tid=1751259, running) created by main thread at:
      #0 pthread_create <null> (qemu-system-aarch64+0x415c6d) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #1 qemu_thread_create /home/alex/lsrc/qemu.git/builds/system.threadsan/../../util/qemu-thread-posix.c:581:11 (qemu-system-aarch64+0x193619b) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #2 mttcg_start_vcpu_thread /home/alex/lsrc/qemu.git/builds/system.threadsan/../../accel/tcg/tcg-accel-ops-mttcg.c:144:5 (qemu-system-aarch64+0x157b940) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #3 qemu_init_vcpu /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/cpus.c:680:5 (qemu-system-aarch64+0xcce7a0) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #4 arm_cpu_realizefn /home/alex/lsrc/qemu.git/builds/system.threadsan/../../target/arm/cpu.c:2588:5 (qemu-system-aarch64+0xfd9023) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #5 device_set_realized /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/qdev.c:510:13 (qemu-system-aarch64+0x158e205) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #6 property_set_bool /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:2354:5 (qemu-system-aarch64+0x159dffb) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #7 object_property_set /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:1463:5 (qemu-system-aarch64+0x159a6a7) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #8 object_property_set_qobject /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/qom-qobject.c:28:10 (qemu-system-aarch64+0x15a35e2) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #9 object_property_set_bool /home/alex/lsrc/qemu.git/builds/system.threadsan/../../qom/object.c:1533:15 (qemu-system-aarch64+0x159b00e) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #10 qdev_realize /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/qdev.c:291:12 (qemu-system-aarch64+0x158b989) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #11 machvirt_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/arm/virt.c:2295:9 (qemu-system-aarch64+0xef0028) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #12 machine_run_board_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../hw/core/machine.c:1583:5 (qemu-system-aarch64+0x607380) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #13 qemu_init_board /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2621:5 (qemu-system-aarch64+0xce9b30) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #14 qmp_x_exit_preconfig /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:2713:5 (qemu-system-aarch64+0xce98df) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #15 qemu_init /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/vl.c:3759:9 (qemu-system-aarch64+0xced7ca) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
      #16 main /home/alex/lsrc/qemu.git/builds/system.threadsan/../../system/main.c:47:5 (qemu-system-aarch64+0x1583c78) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)

  SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy

This could be a false positive, or it could be a genuine race between
the guest kernel clearing memory and QEMU still writing the response
in virtio_gpu_ctrl_response().

What do you think?


>
> [1] https://lore.kernel.org/kvm/20240229025759.1187910-1-stevensd@google.com/
>
> You'll need to use recent Mesa version containing patch that removes
> dependency on cross-device feature from Venus that isn't supported by
> Qemu [2].
>
> [2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/087e9a96d13155e26987befae78b6ccbb7ae242b
>
> Example Qemu cmdline that enables Venus:
>
>   qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \
>       -machine q35,accel=kvm,memory-backend=mem1 \
>       -object memory-backend-memfd,id=mem1,size=8G -m 8G
>
>
> Changes from V13 to V14
>
> - Fixed erroneous fall-through in renderer_state's switch-case that was
>   spotted by Marc-André Lureau.
>
> - Reworked HOSTMEM_MR_FINISH_UNMAPPING handling as was suggested by
>   Akihiko Odaki. Now it shares the same code path with HOSTMEM_MR_MAPPED.
>
> - Made use of g_autofree in virgl_cmd_resource_create_blob() as was
>   suggested by Akihiko Odaki.
>
> - Removed virtio_gpu_virgl_deinit() and moved all deinit code to
>   virtio_gpu_gl_device_unrealize() as was suggested by Marc-André Lureau.
>
> - Replaced HAVE_FEATURE in meson.build with virglrenderer's VERSION_MAJOR
>   check as was suggested by Marc-André Lureau.
>
> - Added trace event for cmd-suspension as was suggested by Marc-André Lureau.
>
> - Added patch to replace in-flight printf's with trace events as was
>   suggested by Marc-André Lureau
>
> Changes from V12 to V13
>
> - Replaced `res->async_unmap_in_progress` flag with a mapping state,
>   moved it to the virtio_gpu_virgl_hostmem_region as was suggested
>   by Akihiko Odaki.
>
> - Renamed blob_unmap function and added back cmd_suspended argument
>   to it. Suggested by Akihiko Odaki.
>
> - Reordered VirtIOGPUGL refactoring patches to minimize code changes,
>   as was suggested by Akihiko Odaki.
>
> - Replaced gl->renderer_inited with gl->renderer_state, as was suggested
>   by Alex Bennée.
>
> - Added gl->renderer state resetting to gl_device_unrealize(), for
>   consistency. Suggested by Alex Bennée.
>
> - Added rb's from Alex and Manos.
>
> - Fixed compiling with !HAVE_VIRGL_RESOURCE_BLOB.
>
> Changes from V11 to V12
>
> - Fixed virgl_cmd_resource_create_blob() error handling. Now it doesn't
>   corrupt the resource list and releases the resource properly on error.
>   Thanks to Akihiko Odaki for spotting the bug.
>
> - Added new patch that handles virtio_gpu_virgl_init() failure gracefully,
>   fixing a Qemu crash. Besides fixing the crash, it allows implementing
>   a cleaner virtio_gpu_virgl_deinit().
>
> - virtio_gpu_virgl_deinit() now assumes that virgl was previously
>   initialized successfully, if it was initialized at all. Suggested by
>   Akihiko Odaki.
>
> - Fixed missed freeing of print_stats timer in virtio_gpu_virgl_deinit()
>
> - Added back blob unmapping or RESOURCE_UNREF that was requested
>   by Akihiko Odaki. Added comment to the code explaining how
>   async unmapping works. Added back `res->async_unmap_in_progress`
>   flag and added comment telling why it's needed.
>
> - Moved cmdq_resume_bh to VirtIOGPUGL and made coding style changes
>   suggested by Akihiko Odaki.
>
> - Added patches that move fence_poll and print_stats timers to VirtIOGPUGL
>   for consistency with cmdq_resume_bh.
>
> Changes from V10 to V11
>
> - Replaced cmd_resume bool in struct ctrl_command with
>   "cmd->finished + !VIRTIO_GPU_FLAG_FENCE" checking as was requested
>   by Akihiko Odaki.
>
> - Reworked virgl_cmd_resource_unmap/unref_blob() to avoid re-adding
>   the 'async_unmap_in_progress' flag that was dropped in v9:
>
>     1. virgl_cmd_resource_[un]map_blob() no longer checks itself whether
>        the resource was previously mapped and lets virglrenderer do the
>        checking.
>
>     2. The error returned by virgl_renderer_resource_unmap() is now handled
>        and reported properly; previously the error wasn't checked.
>        virgl_renderer_resource_unmap() fails if the resource wasn't mapped.
>
>     3. virgl_cmd_resource_unref_blob() no longer allows unreferencing a
>        resource that is mapped; it's an error condition if the guest didn't
>        unmap the resource before the unref. Previously, unref implicitly
>        unmapped the resource.
>
> Changes from V9 to V10
>
> - Dropped 'async_unmap_in_progress' variable and switched to use
>   aio_bh_new() instead of the oneshot variant in the "blob commands" patch.
>
> - Further improved error messages by printing the error code when the
>   actual error occurs and using ERR_UNSPEC instead of ERR_ENOMEM when we
>   don't really know if it was ENOMEM for sure.
>
> - Added vdc->unrealize for the virtio GL device and freed virgl data.
>
> - Dropped UUID and doc/migration patches. The UUID feature isn't needed
>   anymore; instead, we changed the Mesa Venus driver to not require UUID.
>
> - Renamed virtio-gpu-gl "vulkan" property name back to "venus".
>
> Changes from V8 to V9
>
> - Added resuming of cmdq processing when hostmem MR is freed,
>   as was suggested by Akihiko Odaki.
>
> - Added more error messages, suggested by Akihiko Odaki
>
> - Dropped superfluous 'res->async_unmap_completed', suggested
>   by Akihiko Odaki.
>
> - Kept using the cmd->suspended flag. Akihiko Odaki suggested making
>   virtio_gpu_virgl_process_cmd() return false if cmd processing is
>   suspended, but it's not easy to implement due to the ubiquitous
>   VIRTIO_GPU_FILL_CMD() macro that returns void, which would require
>   changing all the virtio-gpu processing code.
>
> - Added back virtio_gpu_virgl_resource as was requested by Akihiko Odaki,
>   though I'm not convinced it's really needed.
>
> - Switched to use GArray, renamed capset2_max_ver/size vars and moved
>   "vulkan" property definition to the virtio-gpu-gl device in the Venus
>   patch, like was suggested by Akihiko Odaki.
>
> - Moved UUID to virtio_gpu_virgl_resource and dropped UUID save/restore
>   since it would require bumping the VM version and the virgl device
>   isn't migratable anyway.
>
> - Fixed exposing UUID feature with Rutabaga
>
> - Dropped linux-headers update patch because headers were already updated
>   in Qemu/staging.
>
> - Added patch that updates virtio migration doc with a note about virtio-gpu
>   migration specifics, suggested by Akihiko Odaki.
>
> - Addressed coding style issue noticed by Akihiko Odaki
>
> Changes from V7 to V8
>
> - Supported suspension of virtio-gpu commands processing and made
>   unmapping of hostmem region asynchronous by blocking/suspending
>   cmd processing until region is unmapped. Suggested by Akihiko Odaki.
>
> - Fixed arm64 building of x86 targets using updated linux-headers.
>   Corrected the update script. Thanks to Rob Clark for reporting
>   the issue.
>
> - Added new patch that makes registration of virgl capsets dynamic.
>   Requested by Antonio Caggiano and Pierre-Eric Pelloux-Prayer.
>
> - Venus capset now isn't advertised if Vulkan is disabled with vulkan=false
>
> Changes from V6 to V7
>
> - Used scripts/update-linux-headers.sh to update Qemu headers based
>   on Linux v6.8-rc3, which adds the Venus capset definition to the
>   virtio-gpu protocol, as was requested by Peter Maydell
>
> - Added r-bs that were given to v6 patches. Corrected missing s-o-bs
>
> - Dropped the context_init Qemu virtio-gpu device configuration flag,
>   as was suggested by Marc-André Lureau
>
> - Added missing error condition checks spotted by Marc-André Lureau
>   and Akihiko Odaki, and a few more
>
> - Returned back res->mr referencing to memory_region_init_ram_ptr() as
>   was suggested by Akihiko Odaki. Incorporated fix suggested by
>   Pierre-Eric to specify the MR name
>
> - Dropped the virgl_gpu_resource wrapper, cleaned up and simplified
>   patch that adds blob-cmd support
>
> - Fixed improper blob resource removal from resource list on resource_unref
>   that was spotted by Akihiko Odaki
>
> - Changed order of the blob patches, as was suggested by Akihiko Odaki.
>   The cmd_set_scanout_blob support is enabled first
>
> - Factored out patch that adds resource management support to virtio-gpu-gl,
>   as was requested by Marc-André Lureau
>
> - Simplified and improved the UUID support patch, dropped the hash table
>   as we don't need it for now. Moved QemuUUID to virtio_gpu_simple_resource.
>   This all was suggested by Akihiko Odaki and Marc-André Lureau
>
> - Dropped console_has_gl() check, suggested by Akihiko Odaki
>
> - Reworked Meson checking of virglrenderer features, made new features
>   available based on the virglrenderer pkgconfig version instead of
>   checking symbols in the header. This should fix the build error with
>   older virglrenderer versions, reported by Alex Bennée
>
> - Made enabling of Venus context configurable via a new virtio-gpu device
>   "vulkan=true" flag, suggested by Marc-André Lureau. The flag is disabled
>   by default because it requires blob and hostmem options to be enabled
>   and configured
>
> Changes from V5 to V6
>
> - Move macros configurations under virgl.found() and rename
>   HAVE_VIRGL_CONTEXT_CREATE_WITH_FLAGS.
>
> - Handle the case while context_init is disabled.
>
> - Enable context_init by default.
>
> - Move virtio_gpu_virgl_resource_unmap() into
>   virgl_cmd_resource_unmap_blob().
>
> - Introduce new struct virgl_gpu_resource to store virgl specific members.
>
> - Remove error handling of g_new0, because glib will abort() on OOM.
>
> - Set resource uuid as option.
>
> - Implement optional subsection of vmstate_virtio_gpu_resource_uuid_state
>   for virtio live migration.
>
> - Use g_int_hash/g_int_equal instead of the default
>
> - Add scanout_blob function for virtio-gpu-virgl
>
> - Resolve the memory leak on virtio-gpu-virgl
>
> - Remove the unstable API flags check because virglrenderer is already 1.0
>
> - Squash the render server flag support into "Initialize Venus"
>
> Changes from V4 (virtio gpu V4) to V5
>
> - Inverted patches 5 and 6 because we should configure
>   HAVE_VIRGL_CONTEXT_INIT first.
>
> - Validate owner of memory region to avoid slowing down DMA.
>
> - Use memory_region_init_ram_ptr() instead of
>   memory_region_init_ram_device_ptr().
>
> - Adjust sequence to allocate gpu resource before virglrender resource
>   creation
>
> - Add virtio migration handling for uuid.
>
> - Send kernel patch to define VIRTIO_GPU_CAPSET_VENUS.
>   https://lore.kernel.org/lkml/20230915105918.3763061-1-ray.huang@amd.com/
>
> - Add a meson check to make sure unstable APIs are defined from 0.9.0.
>
> Changes from V1 to V2 (virtio gpu V4)
>
> - Remove unused #include "hw/virtio/virtio-iommu.h"
>
> - Add a local function, called virgl_resource_destroy(), that is used
>   to release a vgpu resource on error paths and in resource_unref.
>
> - Remove virtio_gpu_virgl_resource_unmap from
>   virtio_gpu_cleanup_mapping(),
>   since this function won't be called on blob resources and also because
>   blob resources are unmapped via virgl_cmd_resource_unmap_blob().
>
> - In virgl_cmd_resource_create_blob(), do proper cleanup in error paths
>   and move QTAILQ_INSERT_HEAD(&g->reslist, res, next) after the resource
>   has been fully initialized.
>
> - The memory region has a different life-cycle from virtio gpu resources,
>   i.e. it cannot be released synchronously along with the vgpu resource.
>   So, here the field "region" was changed to a pointer and is allocated
>   dynamically when the blob is mapped.
>   Also, since the pointer can be used to indicate whether the blob
>   is mapped, the explicit field "mapped" was removed.
>
> - In virgl_cmd_resource_map_blob(), add a check on the value of
>   res->region, to prevent it being called twice on the same resource.
>
> - Add a patch to enable automatic deallocation of memory regions to resolve
>   use-after-free memory corruption with a reference.
>
>
> Antonio Caggiano (2):
>   virtio-gpu: Handle resource blob commands
>   virtio-gpu: Support Venus context
>
> Dmitry Osipenko (8):
>   virtio-gpu: Use trace events for tracking number of in-flight fences
>   virtio-gpu: Move fence_poll timer to VirtIOGPUGL
>   virtio-gpu: Move print_stats timer to VirtIOGPUGL
>   virtio-gpu: Handle virtio_gpu_virgl_init() failure
>   virtio-gpu: Unrealize GL device
>   virtio-gpu: Use pkgconfig version to decide which virgl features are
>     available
>   virtio-gpu: Don't require udmabuf when blobs and virgl are enabled
>   virtio-gpu: Support suspension of commands processing
>
> Huang Rui (2):
>   virtio-gpu: Support context-init feature with virglrenderer
>   virtio-gpu: Add virgl resource management
>
> Pierre-Eric Pelloux-Prayer (1):
>   virtio-gpu: Register capsets dynamically
>
> Robert Beckett (1):
>   virtio-gpu: Support blob scanout using dmabuf fd
>
>  hw/display/trace-events        |   3 +
>  hw/display/virtio-gpu-gl.c     |  62 +++-
>  hw/display/virtio-gpu-virgl.c  | 589 +++++++++++++++++++++++++++++++--
>  hw/display/virtio-gpu.c        |  44 ++-
>  include/hw/virtio/virtio-gpu.h |  32 +-
>  meson.build                    |   5 +-
>  6 files changed, 678 insertions(+), 57 deletions(-)
Dmitry Osipenko June 20, 2024, 5:34 p.m. UTC | #2
On 6/19/24 20:37, Alex Bennée wrote:
> So I've been experimenting with Aarch64 TCG with an Intel backend like
> this:
> 
> ./qemu-system-aarch64 \
>            -M virt -cpu cortex-a76 \
>            -device virtio-net-pci,netdev=unet \
>            -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>            -m 8192 \
>            -object memory-backend-memfd,id=mem,size=8G,share=on \
>            -serial mon:stdio \
>            -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
>            -append "console=ttyAMA0" \
>            -device qemu-xhci -device usb-kbd -device usb-tablet \
>            -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
>            -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map 
> 
> And I've noticed a couple of things. First trying to launch vkmark to
> run a KMS mode test fails with:
> 
...
>   virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5
>   virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error 
>   virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed
> 
> More interestingly when shutting stuff down we see weirdness like:
> 
>   address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1
>   virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance
>   virgl_render_server[1875931]: vkr: destroying device with valid objects
>   vkr_context_remove_object: -7438602987017907480
>   vkr_context_remove_object: 7
>   vkr_context_remove_object: 5
> 
> which indicates something has gone very wrong. I'm not super familiar
> with the memory allocation patterns but should stuff that is done as
> virtio_gpu_cmd_res_back_attach() be find-able in the list of resources?

This is expected to fail. Vkmark creates a shmem virgl GBM FB BO on the
guest that isn't exportable on the host. AFAICT, more code changes would
be needed to support this case.

Note that "destroying device with valid objects" msg is fine, won't hurt
to silence it in Venus to avoid confusion. It will happen every time
guest application is closed without explicitly releasing every VK object.

> I tried running under RR to further debug but weirdly I can't get
> working graphics with that. I did try running under threadsan which
> complained about a potential data race:
> 
>   vkr_context_add_object: 1 -> 0x7b2c00000288
>   vkr_context_add_object: 2 -> 0x7b2c00000270
>   vkr_context_add_object: 3 -> 0x7b3800007f28
>   vkr_context_add_object: 4 -> 0x7b3800007fa0
>   vkr_context_add_object: 5 -> 0x7b48000103f8
>   vkr_context_add_object: 6 -> 0x7b48000104a0
>   vkr_context_add_object: 7 -> 0x7b4800010440
>   virtio_gpu_cmd_res_back_attach res 0x5
>   virtio_gpu_cmd_res_back_attach res 0x6
>   vkr_context_add_object: 8 -> 0x7b48000103e0
>   virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
>   virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
>   virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
>   ==================
>   WARNING: ThreadSanitizer: data race (pid=1751256)
>     Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>       #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
..
>   ==================
>   SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
> 
> This could be a false positive, or it could be a genuine race between
> the guest kernel clearing memory and QEMU still writing the response
> in virtio_gpu_ctrl_response().
> 
> What do you think?

The memcpy warning looks a bit suspicious, but is likely harmless. I
don't see such a warning with TSAN and an x86 VM.
Alex Bennée June 21, 2024, 8:59 a.m. UTC | #3
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 6/19/24 20:37, Alex Bennée wrote:
>> So I've been experimenting with Aarch64 TCG with an Intel backend like
>> this:
>> 
>> ./qemu-system-aarch64 \
>>            -M virt -cpu cortex-a76 \
>>            -device virtio-net-pci,netdev=unet \
>>            -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>            -m 8192 \
>>            -object memory-backend-memfd,id=mem,size=8G,share=on \
>>            -serial mon:stdio \
>>            -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
>>            -append "console=ttyAMA0" \
>>            -device qemu-xhci -device usb-kbd -device usb-tablet \
>>            -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
>>            -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map 
>> 
>> And I've noticed a couple of things. First trying to launch vkmark to
>> run a KMS mode test fails with:
>> 
> ...
>>   virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5
>>   virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error 
>>   virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed
>> 
>> More interestingly when shutting stuff down we see weirdness like:
>> 
>>   address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1
>>   virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance
>>   virgl_render_server[1875931]: vkr: destroying device with valid objects
>>   vkr_context_remove_object: -7438602987017907480
>>   vkr_context_remove_object: 7
>>   vkr_context_remove_object: 5
>> 
>> which indicates something has gone very wrong. I'm not super familiar
>> with the memory allocation patterns but should stuff that is done as
>> virtio_gpu_cmd_res_back_attach() be find-able in the list of resources?
>
> This is expected to fail. Vkmark creates a shmem virgl GBM FB BO on the
> guest that isn't exportable on the host. AFAICT, more code changes would
> be needed to support this case.

There are a lot of acronyms there. If this is pure guest memory, why
isn't it exportable to the host? Or should the underlying Mesa library
be making sure the allocation happens from the shared region?

Is vkmark particularly special here?


> Note that "destroying device with valid objects" msg is fine, won't hurt
> to silence it in Venus to avoid confusion. It will happen every time
> guest application is closed without explicitly releasing every VK
> object.

I was more concerned with:

>>   vkr_context_remove_object: -7438602987017907480                                                                                                                              

which looks like a corruption of the object ids (or maybe an off-by-one)

>
>> I tried running under RR to further debug but weirdly I can't get
>> working graphics with that. I did try running under threadsan which
>> complained about a potential data race:
>> 
>>   vkr_context_add_object: 1 -> 0x7b2c00000288
>>   vkr_context_add_object: 2 -> 0x7b2c00000270
>>   vkr_context_add_object: 3 -> 0x7b3800007f28
>>   vkr_context_add_object: 4 -> 0x7b3800007fa0
>>   vkr_context_add_object: 5 -> 0x7b48000103f8
>>   vkr_context_add_object: 6 -> 0x7b48000104a0
>>   vkr_context_add_object: 7 -> 0x7b4800010440
>>   virtio_gpu_cmd_res_back_attach res 0x5
>>   virtio_gpu_cmd_res_back_attach res 0x6
>>   vkr_context_add_object: 8 -> 0x7b48000103e0
>>   virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
>>   virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
>>   virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>   ==================
>>   WARNING: ThreadSanitizer: data race (pid=1751256)
>>     Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>>       #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
> ..
>>   ==================
>>   SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
>> 
>> This could be a false positive or it could be a race between the guest
>> kernel clearing memory while we are still doing
>> virtio_gpu_ctrl_response.
>> 
>> What do you think?
>
> The memcpy warning looks a bit suspicious, but is likely harmless. I
> don't see such a warning with TSAN and an x86 VM.

TSAN can only pick up these interactions with TCG guests because it can
track guest memory accesses. With a KVM guest we have no visibility of
the guest accesses.
Dmitry Osipenko June 21, 2024, 10:25 p.m. UTC | #4
On 6/21/24 11:59, Alex Bennée wrote:
> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
> 
>> On 6/19/24 20:37, Alex Bennée wrote:
>>> So I've been experimenting with Aarch64 TCG with an Intel backend like
>>> this:
>>>
>>> ./qemu-system-aarch64 \
>>>            -M virt -cpu cortex-a76 \
>>>            -device virtio-net-pci,netdev=unet \
>>>            -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>>            -m 8192 \
>>>            -object memory-backend-memfd,id=mem,size=8G,share=on \
>>>            -serial mon:stdio \
>>>            -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
>>>            -append "console=ttyAMA0" \
>>>            -device qemu-xhci -device usb-kbd -device usb-tablet \
>>>            -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
>>>            -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map 
>>>
>>> And I've noticed a couple of things. First trying to launch vkmark to
>>> run a KMS mode test fails with:
>>>
>> ...
>>>   virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5
>>>   virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error 
>>>   virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>>
>>> More interestingly when shutting stuff down we see weirdness like:
>>>
>>>   address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1                                                                                                    
>>>   virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance                                                                                       
>>>   virgl_render_server[1875931]: vkr: destroying device with valid objects                                                                                                      
>>>   vkr_context_remove_object: -7438602987017907480                                                                                                                              
>>>   vkr_context_remove_object: 7                                                                                                                                                 
>>>   vkr_context_remove_object: 5       
>>>
>>> which indicates something has gone very wrong. I'm not super familiar
>>> with the memory allocation patterns but should stuff that is done as
>>> virtio_gpu_cmd_res_back_attach() be find-able in the list of resources?
>>
>> This is expected to fail. Vkmark creates shmem virgl GBM FB BO on guest
>> that isn't exportable on host. AFAICT, more code changes should be
>> needed to support this case.
> 
> There are a lot of acronyms there. If this is pure guest memory why
> isn't it exportable to the host? Or should the underlying mesa library
> be making sure the allocation happens from the shared region?
> 
> Is vkmark particularly special here?

Actually, you could get it to work to some degree if you compile
virglrenderer with -Dminigbm_allocation=true. On the host, use a
GTK/Wayland display.
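
An untested sketch of what that build might look like; the option name
is taken from the suggestion above, while the meson invocation and
build directory are assumptions:

```shell
# Assumed virglrenderer build sketch: enable the minigbm allocation path.
# The -Dminigbm_allocation=true option is from the message above; the
# build directory name is illustrative.
meson setup build -Dminigbm_allocation=true
meson compile -C build
```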

Vkmark isn't special. It's virglrenderer that has room for
improvement. ChromeOS doesn't use KMS in VMs, so proper KMS support was
never a priority for Venus.

>> Note that "destroying device with valid objects" msg is fine, won't hurt
>> to silence it in Venus to avoid confusion. It will happen every time
>> guest application is closed without explicitly releasing every VK
>> object.
> 
> I was more concerned with:
> 
>>>   vkr_context_remove_object: -7438602987017907480                                                                                                                              
> 
> which looks like a corruption of the object ids (or maybe an off-by-one)

At first glance this appears to be a valid value, otherwise Venus
should've crashed Qemu with a debug assert for an invalid ID. But I
never see such odd IDs in my testing.
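
For what it's worth, reading the printed id back as an unsigned 64-bit
value makes it look more like pointer-sized garbage than an off-by-one
on a small handle; a quick check in plain Python, just reinterpreting
the number from the log:

```python
# The id printed by vkr_context_remove_object was formatted as a signed
# 64-bit integer; mask it to recover the raw two's-complement bit pattern.
signed_id = -7438602987017907480
unsigned_id = signed_id & 0xFFFFFFFFFFFFFFFF
print(hex(unsigned_id))  # 0x98c4c5fea5a352e8, nothing like the handles 5 or 7
```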

>>> I tried running under RR to further debug but weirdly I can't get
>>> working graphics with that. I did try running under threadsan which
>>> complained about a potential data race:
>>>
>>>   vkr_context_add_object: 1 -> 0x7b2c00000288
>>>   vkr_context_add_object: 2 -> 0x7b2c00000270
>>>   vkr_context_add_object: 3 -> 0x7b3800007f28
>>>   vkr_context_add_object: 4 -> 0x7b3800007fa0
>>>   vkr_context_add_object: 5 -> 0x7b48000103f8
>>>   vkr_context_add_object: 6 -> 0x7b48000104a0
>>>   vkr_context_add_object: 7 -> 0x7b4800010440
>>>   virtio_gpu_cmd_res_back_attach res 0x5
>>>   virtio_gpu_cmd_res_back_attach res 0x6
>>>   vkr_context_add_object: 8 -> 0x7b48000103e0
>>>   virgl_render_server[1751430]: vkr: failed to import resource: invalid res_id 5
>>>   virgl_render_server[1751430]: vkr: vkAllocateMemory resulted in CS error
>>>   virgl_render_server[1751430]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>>   ==================
>>>   WARNING: ThreadSanitizer: data race (pid=1751256)
>>>     Read of size 8 at 0x7f7fa0ea9138 by main thread (mutexes: write M0):
>>>       #0 memcpy <null> (qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025)
>> ..
>>>   ==================
>>>   SUMMARY: ThreadSanitizer: data race (/home/alex/lsrc/qemu.git/builds/system.threadsan/qemu-system-aarch64+0x41fede) (BuildId: 0bab171e77cb6782341ee3407e44af7267974025) in __interceptor_memcpy
>>>
>>> This could be a false positive or it could be a race between the guest
>>> kernel clearing memory while we are still doing
>>> virtio_gpu_ctrl_response.
>>>
>>> What do you think?
>>
>> The memcpy warning looks a bit suspicious, but is likely harmless. I
>> don't see such a warning with TSAN and an x86 VM.
> 
> TSAN can only pick up these interactions with TCG guests because it can
> track guest memory accesses. With a KVM guest we have no visibility of
> the guest accesses. 

I couldn't reproduce this issue with my KVM/TCG/ARM64 setups. For x86 I
checked both KVM and TCG; TSAN only warns about virtio-net memcpy's for me.
Alex Bennée June 23, 2024, 4:44 p.m. UTC | #5
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 6/21/24 11:59, Alex Bennée wrote:
>> Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:
>> 
>>> On 6/19/24 20:37, Alex Bennée wrote:
>>>> So I've been experimenting with Aarch64 TCG with an Intel backend like
>>>> this:
>>>>
>>>> ./qemu-system-aarch64 \
>>>>            -M virt -cpu cortex-a76 \
>>>>            -device virtio-net-pci,netdev=unet \
>>>>            -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>>>>            -m 8192 \
>>>>            -object memory-backend-memfd,id=mem,size=8G,share=on \
>>>>            -serial mon:stdio \
>>>>            -kernel ~/lsrc/linux.git/builds/arm64.initramfs/arch/arm64/boot/Image \
>>>>            -append "console=ttyAMA0" \
>>>>            -device qemu-xhci -device usb-kbd -device usb-tablet \
>>>>            -device virtio-gpu-gl-pci,blob=true,venus=true,hostmem=4G \
>>>>            -display sdl,gl=on -d plugin,guest_errors,trace:virtio_gpu_cmd_res_create_blob,trace:virtio_gpu_cmd_res_back_\*,trace:virtio_gpu_cmd_res_xfer_toh_3d,trace:virtio_gpu_cmd_res_xfer_fromh_3d,trace:address_space_map 
>>>>
>>>> And I've noticed a couple of things. First trying to launch vkmark to
>>>> run a KMS mode test fails with:
>>>>
>>> ...
>>>>   virgl_render_server[1875931]: vkr: failed to import resource: invalid res_id 5
>>>>   virgl_render_server[1875931]: vkr: vkAllocateMemory resulted in CS error 
>>>>   virgl_render_server[1875931]: vkr: ring_submit_cmd: vn_dispatch_command failed
>>>>
>>>> More interestingly when shutting stuff down we see weirdness like:
>>>>
>>>>   address_space_map as:0x561b48ec48c0 addr 0x1008ac4b0:18 write:1 attrs:0x1                                                                                                    
>>>>   virgl_render_server[1875931]: vkr: destroying context 3 (vkmark) with a valid instance                                                                                       
>>>>   virgl_render_server[1875931]: vkr: destroying device with valid objects                                                                                                      
>>>>   vkr_context_remove_object: -7438602987017907480                                                                                                                              
>>>>   vkr_context_remove_object: 7                                                                                                                                                 
>>>>   vkr_context_remove_object: 5       
>>>>
>>>> which indicates something has gone very wrong. I'm not super familiar
>>>> with the memory allocation patterns but should stuff that is done as
>>>> virtio_gpu_cmd_res_back_attach() be find-able in the list of resources?
>>>
>>> This is expected to fail. Vkmark creates shmem virgl GBM FB BO on guest
>>> that isn't exportable on host. AFAICT, more code changes should be
>>> needed to support this case.
>> 
>> There are a lot of acronyms there. If this is pure guest memory why
>> isn't it exportable to the host? Or should the underlying mesa library
>> be making sure the allocation happens from the shared region?
>> 
>> Is vkmark particularly special here?
>
> Actually, you could get it to work to some degree if you compile
> virglrenderer with -Dminigbm_allocation=true. On the host, use a
> GTK/Wayland display.

I'll give that a go.

> Vkmark isn't special. It's virglrenderer that has room for
> improvement. ChromeOS doesn't use KMS in VMs, so proper KMS support was
> never a priority for Venus.

Is there a tracking bug for KMS support for Venus? Or should Venus work
fine if virglrenderer can export the buffer to the host?

<snip>
>>>>
>>>> This could be a false positive or it could be a race between the guest
>>>> kernel clearing memory while we are still doing
>>>> virtio_gpu_ctrl_response.
>>>>
>>>> What do you think?
>>>
>>> The memcpy warning looks a bit suspicious, but is likely harmless. I
>>> don't see such a warning with TSAN and an x86 VM.
>> 
>> TSAN can only pick up these interactions with TCG guests because it can
>> track guest memory accesses. With a KVM guest we have no visibility of
>> the guest accesses. 
>
> I couldn't reproduce this issue with my KVM/TCG/ARM64 setups. For x86 I
> checked both KVM and TCG; TSAN only warns about virtio-net memcpy's for
> me.

Hmm OK. I'll keep an eye out as I test the next version.