[v12,00/13] Support blob memory and venus on qemu

Message ID: 20240519212712.2605419-1-dmitry.osipenko@collabora.com

Dmitry Osipenko May 19, 2024, 9:26 p.m. UTC
Hello,

This series enables Vulkan Venus context support on virtio-gpu.

All virglrenderer and almost all Linux kernel prerequisite changes
needed by Venus are already upstream. For the kernel, there is a pending
KVM patchset that fixes mapping of compound pages needed for DRM drivers
using TTM [1]; otherwise, hostmem blob mapping will fail with a KVM error
from Qemu.

[1] https://lore.kernel.org/kvm/20240229025759.1187910-1-stevensd@google.com/

You'll need to use a recent Mesa version containing the patch that removes
the dependency on the cross-device feature from Venus, which isn't supported
by Qemu [2].

[2] https://gitlab.freedesktop.org/mesa/mesa/-/commit/087e9a96d13155e26987befae78b6ccbb7ae242b

Example Qemu cmdline that enables Venus:

  qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \
      -machine q35,accel=kvm,memory-backend=mem1 \
      -object memory-backend-memfd,id=mem1,size=8G -m 8G


Changes from V11 to V12

- Fixed virgl_cmd_resource_create_blob() error handling. It no longer
  corrupts the resource list and releases the resource properly on error.
  Thanks to Akihiko Odaki for spotting the bug.

- Added new patch that handles virtio_gpu_virgl_init() failure gracefully,
  fixing a Qemu crash. Besides fixing the crash, it allows implementing
  a cleaner virtio_gpu_virgl_deinit().

- virtio_gpu_virgl_deinit() now assumes that virgl was initialized
  successfully if it was initialized at all. Suggested by Akihiko Odaki.

- Fixed missing freeing of the print_stats timer in virtio_gpu_virgl_deinit()

- Added back blob unmapping on RESOURCE_UNREF, as requested
  by Akihiko Odaki. Added a comment to the code explaining how
  async unmapping works. Added back the `res->async_unmap_in_progress`
  flag and added a comment explaining why it's needed.

- Moved cmdq_resume_bh to VirtIOGPUGL and made coding style changes
  suggested by Akihiko Odaki.

- Added patches that move fence_poll and print_stats timers to VirtIOGPUGL
  for consistency with cmdq_resume_bh.

Changes from V10 to V11

- Replaced cmd_resume bool in struct ctrl_command with
  "cmd->finished + !VIRTIO_GPU_FLAG_FENCE" checking as was requested
  by Akihiko Odaki.

- Reworked virgl_cmd_resource_unmap/unref_blob() to avoid re-adding
  the 'async_unmap_in_progress' flag that was dropped in v9:

    1. virgl_cmd_resource_[un]map_blob() no longer checks whether the
       resource was previously mapped and lets virglrenderer do the
       checking.

    2. The error returned by virgl_renderer_resource_unmap() is now handled
       and reported properly; previously the error wasn't checked.
       virgl_renderer_resource_unmap() fails if the resource wasn't mapped.

    3. virgl_cmd_resource_unref_blob() no longer allows unreferencing a
       resource that is mapped; it's an error condition if the guest didn't
       unmap the resource before doing the unref. Previously unref
       implicitly unmapped the resource.

Changes from V9 to V10

- Dropped 'async_unmap_in_progress' variable and switched to use
  aio_bh_new() instead of the oneshot variant in the "blob commands" patch.

- Further improved error messages by printing the error code when an actual
  error occurs and using ERR_UNSPEC instead of ERR_ENOMEM when we don't
  really know if it was ENOMEM for sure.

- Added vdc->unrealize for the virtio GL device and freed virgl data.

- Dropped UUID and doc/migration patches. UUID feature isn't needed
  anymore, instead we changed Mesa Venus driver to not require UUID.

- Renamed virtio-gpu-gl "vulkan" property name back to "venus".

Changes from V8 to V9

- Added resuming of cmdq processing when hostmem MR is freed,
  as was suggested by Akihiko Odaki.

- Added more error messages, suggested by Akihiko Odaki

- Dropped superfluous 'res->async_unmap_completed', suggested
  by Akihiko Odaki.

- Kept using cmd->suspended flag. Akihiko Odaki suggested making
  virtio_gpu_virgl_process_cmd() return false if cmd processing is
  suspended, but it's not easy to implement due to the ubiquitous
  VIRTIO_GPU_FILL_CMD() macros that return void, which would require
  changing all the virtio-gpu processing code.

- Added back virtio_gpu_virgl_resource as was requested by Akihiko Odaki,
  though I'm not convinced it's really needed.

- Switched to use GArray, renamed capset2_max_ver/size vars and moved
  "vulkan" property definition to the virtio-gpu-gl device in the Venus
  patch, as suggested by Akihiko Odaki.

- Moved UUID to virtio_gpu_virgl_resource and dropped UUID save/restore
  since it would require bumping the VM version and the virgl device isn't
  migratable anyway.

- Fixed exposing UUID feature with Rutabaga

- Dropped linux-headers update patch because headers were already updated
  in Qemu/staging.

- Added patch that updates virtio migration doc with a note about virtio-gpu
  migration specifics, suggested by Akihiko Odaki.

- Addressed coding style issue noticed by Akihiko Odaki

Changes from V7 to V8

- Supported suspension of virtio-gpu commands processing and made
  unmapping of hostmem region asynchronous by blocking/suspending
  cmd processing until region is unmapped. Suggested by Akihiko Odaki.

- Fixed arm64 building of x86 targets using updated linux-headers.
  Corrected the update script. Thanks to Rob Clark for reporting
  the issue.

- Added new patch that makes registration of virgl capsets dynamic.
  Requested by Antonio Caggiano and Pierre-Eric Pelloux-Prayer.

- Venus capset now isn't advertised if Vulkan is disabled with vulkan=false

Changes from V6 to V7

- Used scripts/update-linux-headers.sh to update Qemu headers based
  on Linux v6.8-rc3, which adds the Venus capset definition to the
  virtio-gpu protocol, as requested by Peter Maydell

- Added r-bs that were given to v6 patches. Corrected missing s-o-bs

- Dropped context_init Qemu's virtio-gpu device configuration flag,
  was suggested by Marc-André Lureau

- Added missing error condition checks spotted by Marc-André Lureau
  and Akihiko Odaki, and few more

- Restored res->mr referencing to memory_region_init_ram_ptr(), as
  suggested by Akihiko Odaki. Incorporated fix suggested by Pierre-Eric
  to specify the MR name

- Dropped the virgl_gpu_resource wrapper, cleaned up and simplified
  patch that adds blob-cmd support

- Fixed improper blob resource removal from resource list on resource_unref
  that was spotted by Akihiko Odaki

- Change order of the blob patches, was suggested by Akihiko Odaki.
  The cmd_set_scanout_blob support is enabled first

- Factored out patch that adds resource management support to virtio-gpu-gl,
  was requested by Marc-André Lureau

- Simplified and improved the UUID support patch, dropped the hash table
  as we don't need it for now. Moved QemuUUID to virtio_gpu_simple_resource.
  This all was suggested by Akihiko Odaki and Marc-André Lureau

- Dropped console_has_gl() check, suggested by Akihiko Odaki

- Reworked Meson checking of libvirglrenderer features, made new features
  available based on the virglrenderer pkgconfig version instead of checking
  symbols in the header. This should fix the build error with older
  virglrenderer versions, reported by Alex Bennée

- Made enabling of Venus context configurable via new virtio-gpu device
  "vulkan=true" flag, suggested by Marc-André Lureau. The flag is disabled
  by default because it requires blob and hostmem options to be enabled
  and configured
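
The version-based feature detection mentioned in this changelog can be
approximated outside Meson. This is a minimal sketch only, assuming the
pkg-config module is named virglrenderer and using 1.0.0 purely as an
illustrative threshold (the thresholds the series actually uses are not
stated in this cover letter); version_ge is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on `sort -V` (version sort, e.g. from GNU coreutils).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Ask pkg-config for the installed virglrenderer version, if any.
if ver=$(pkg-config --modversion virglrenderer 2>/dev/null); then
    if version_ge "$ver" 1.0.0; then   # threshold is illustrative only
        echo "virglrenderer $ver: version-gated features could be enabled"
    else
        echo "virglrenderer $ver: too old for version-gated features"
    fi
else
    echo "virglrenderer not found via pkg-config"
fi
```

Gating on the package version rather than probing header symbols keeps the
check independent of which headers happen to be on the include path.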

Changes from V5 to V6

- Move macros configurations under virgl.found() and rename
  HAVE_VIRGL_CONTEXT_CREATE_WITH_FLAGS.

- Handle the case while context_init is disabled.

- Enable context_init by default.

- Move virtio_gpu_virgl_resource_unmap() into
  virgl_cmd_resource_unmap_blob().

- Introduce new struct virgl_gpu_resource to store virgl specific members.

- Remove error handling of g_new0, because glib will abort() on OOM.

- Set resource uuid as option.

- Implement optional subsection of vmstate_virtio_gpu_resource_uuid_state
  for virtio live migration.

- Use g_int_hash/g_int_equal instead of the default

- Add scanout_blob function for virtio-gpu-virgl

- Resolve the memory leak on virtio-gpu-virgl

- Remove the unstable API flags check because virglrenderer is already 1.0

- Squash the render server flag support into "Initialize Venus"

Changes from V4 (virtio gpu V4) to V5

- Inverted patch 5 and 6 because we should configure
  HAVE_VIRGL_CONTEXT_INIT firstly.

- Validate owner of memory region to avoid slowing down DMA.

- Use memory_region_init_ram_ptr() instead of
  memory_region_init_ram_device_ptr().

- Adjust sequence to allocate gpu resource before virglrenderer resource
  creation

- Add virtio migration handling for uuid.

- Send kernel patch to define VIRTIO_GPU_CAPSET_VENUS.
  https://lore.kernel.org/lkml/20230915105918.3763061-1-ray.huang@amd.com/

- Add meson check to make sure unstable APIs defined from 0.9.0.

Changes from V1 to V2 (virtio gpu V4)

- Remove unused #include "hw/virtio/virtio-iommu.h"

- Add a local function, called virgl_resource_destroy(), that is used
  to release a vgpu resource on error paths and in resource_unref.

- Remove virtio_gpu_virgl_resource_unmap from
  virtio_gpu_cleanup_mapping(),
  since this function won't be called on blob resources and also because
  blob resources are unmapped via virgl_cmd_resource_unmap_blob().

- In virgl_cmd_resource_create_blob(), do proper cleanup in error paths
  and move QTAILQ_INSERT_HEAD(&g->reslist, res, next) after the resource
  has been fully initialized.

- Memory region has a different life-cycle from virtio gpu resources
  i.e. cannot be released synchronously along with the vgpu resource.
  So, here the field "region" was changed to a pointer and is allocated
  dynamically when the blob is mapped.
  Also, since the pointer can be used to indicate whether the blob
  is mapped, the explicit field "mapped" was removed.

- In virgl_cmd_resource_map_blob(), add check on the value of
  res->region, to prevent being called twice on the same resource.

- Add a patch to enable automatic deallocation of memory regions to resolve
  use-after-free memory corruption with a reference.

Antonio Caggiano (2):
  virtio-gpu: Handle resource blob commands
  virtio-gpu: Support Venus context

Dmitry Osipenko (7):
  virtio-gpu: Unrealize GL device
  virtio-gpu: Handle virtio_gpu_virgl_init() failure
  virtio-gpu: Use pkgconfig version to decide which virgl features are
    available
  virtio-gpu: Don't require udmabuf when blobs and virgl are enabled
  virtio-gpu: Support suspension of commands processing
  virtio-gpu: Move fence_poll timer to VirtIOGPUGL
  virtio-gpu: Move print_stats timer to VirtIOGPUGL

Huang Rui (2):
  virtio-gpu: Support context-init feature with virglrenderer
  virtio-gpu: Add virgl resource management

Pierre-Eric Pelloux-Prayer (1):
  virtio-gpu: Register capsets dynamically

Robert Beckett (1):
  virtio-gpu: Support blob scanout using dmabuf fd

 hw/display/virtio-gpu-gl.c     |  38 ++-
 hw/display/virtio-gpu-virgl.c  | 594 +++++++++++++++++++++++++++++++--
 hw/display/virtio-gpu.c        |  35 +-
 include/hw/virtio/virtio-gpu.h |  22 +-
 meson.build                    |  10 +-
 5 files changed, 653 insertions(+), 46 deletions(-)

Comments

Alex Bennée May 21, 2024, 1:15 p.m. UTC | #1
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> Example Qemu cmdline that enables Venus:
>
>   qemu-system-x86_64 -device virtio-vga-gl,hostmem=4G,blob=true,venus=true \
>       -machine q35,accel=kvm,memory-backend=mem1 \
>       -object memory-backend-memfd,id=mem1,size=8G -m 8G

What is the correct device for non-x86 guests? We have virtio-gpu-gl-pci
but when doing that I get:

  -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true
  qemu-system-aarch64: -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true: opengl is not available

According to 37f86af087 (virtio-gpu: move virgl realize + properties):

  Drop the virgl property, the virtio-gpu-gl-device has virgl enabled no
  matter what.  Just use virtio-gpu-device instead if you don't want
  enable virgl and opengl.  This simplifies the logic and reduces the test
  matrix.

but that's not a good solution because that needs virtio-mmio and there
are reasons to have a PCI device (for one thing no ambiguity about
discovery).
Alex Bennée May 21, 2024, 2:57 p.m. UTC | #2
Alex Bennée <alex.bennee@linaro.org> writes:

> What is the correct device for non-x86 guests? We have virtio-gpu-gl-pci
> but when doing that I get:
>
>   -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true
>   qemu-system-aarch64: -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true: opengl is not available

Oops my mistake forgetting:

  --display gtk,gl=on

Although I do see a lot of eglMakeCurrent failures.
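
For readers following along, the two invocations discussed so far differ
in the GPU device name, and both need a GL-enabled display. A minimal sketch
of that choice, assuming only the device names quoted in this thread
(virtio-vga-gl in the x86 example, virtio-gpu-gl-pci on aarch64); the helper
name venus_gpu_args is hypothetical and not part of Qemu:

```shell
#!/bin/sh
# Hypothetical helper (not part of Qemu): print the GPU and display
# arguments used in this thread for a given guest architecture.
venus_gpu_args() {
    arch=$1
    case "$arch" in
        x86_64)  dev=virtio-vga-gl ;;      # device from the cover letter
        aarch64) dev=virtio-gpu-gl-pci ;;  # PCI device tried in comment #1
        *) echo "unsupported arch: $arch" >&2; return 1 ;;
    esac
    # In this thread the "opengl is not available" error was traced to a
    # missing gl=on on the display, so it is always included here.
    printf '%s\n' "-device $dev,hostmem=4G,blob=true,venus=true -display gtk,gl=on"
}

venus_gpu_args aarch64
```

The remaining options (machine type, memory backend, storage) are left out
on purpose, since they vary between the setups shown in this thread.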
Dmitry Osipenko May 22, 2024, 12:02 a.m. UTC | #3
On 5/21/24 17:57, Alex Bennée wrote:
> Oops my mistake forgetting:
> 
>   --display gtk,gl=on
> 
> Although I do see a lot of eglMakeCurrent failures.

Please post the full Qemu cmdline you're using
Alex Bennée May 22, 2024, 9 a.m. UTC | #4
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> Please post the full Qemu cmdline you're using

With:

  ./qemu-system-aarch64 \
           -machine type=virt,virtualization=on,pflash0=rom,pflash1=efivars \
           -cpu neoverse-n1 \
           -smp 4 \
           -accel tcg \
           -device virtio-net-pci,netdev=unet \
           -device virtio-scsi-pci \
           -device scsi-hd,drive=hd \
           -netdev user,id=unet,hostfwd=tcp::2222-:22 \
           -blockdev driver=raw,node-name=hd,file.driver=host_device,file.filename=/dev/zen-ssd2/trixie-arm64,discard=unmap \
           -serial mon:stdio \
           -blockdev node-name=rom,driver=file,filename=(pwd)/pc-bios/edk2-aarch64-code.fd,read-only=true \
           -blockdev node-name=efivars,driver=file,filename=$HOME/images/qemu-arm64-efivars \
           -m 8192 \
           -object memory-backend-memfd,id=mem,size=8G,share=on \
           -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true \
           -display gtk,gl=on,show-cursor=on -vga none \
           -device qemu-xhci -device usb-kbd -device usb-tablet

I get a boot up with a lot of:

                                                                                                                                                                             
  (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed

  (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed

  (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed

  (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed

In the guest I run:

  meson devenv -C /root/lsrc/graphics/mesa.git/build fish

to bring in the latest Mesa (with virtio enabled). Running vulkaninfo
reports two cards:

  ==========
  VULKANINFO
  ==========

  Vulkan Instance Version: 1.3.280


  Instance Extensions: count = 14
  -------------------------------
  VK_EXT_debug_report                    : extension revision 10
  VK_EXT_debug_utils                     : extension revision 2
  VK_EXT_headless_surface                : extension revision 1
  VK_KHR_device_group_creation           : extension revision 1
  VK_KHR_external_fence_capabilities     : extension revision 1
  VK_KHR_external_memory_capabilities    : extension revision 1
  VK_KHR_external_semaphore_capabilities : extension revision 1
  VK_KHR_get_physical_device_properties2 : extension revision 2
  VK_KHR_get_surface_capabilities2       : extension revision 1
  VK_KHR_portability_enumeration         : extension revision 1
  VK_KHR_surface                         : extension revision 25
  VK_KHR_surface_protected_capabilities  : extension revision 1
  VK_KHR_wayland_surface                 : extension revision 6
  VK_LUNARG_direct_driver_loading        : extension revision 1

  Instance Layers: count = 2
  --------------------------
  VK_LAYER_MESA_device_select Linux device selection layer 1.3.211  version 1
  VK_LAYER_MESA_overlay       Mesa Overlay layer           1.3.211  version 1

  Devices:
  ========
  GPU0:
          apiVersion         = 1.3.230
          driverVersion      = 24.1.99
          vendorID           = 0x8086
          deviceID           = 0xa780
          deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
          deviceName         = Virtio-GPU Venus (Intel(R) Graphics (RPL-S))
          driverID           = DRIVER_ID_MESA_VENUS
          driverName         = venus
          driverInfo         = Mesa 24.2.0-devel (git-0b582449f0)
          conformanceVersion = 1.3.0.0
          deviceUUID         = 29d2e940-a1a0-3054-0f9a-9f7dec52a084
          driverUUID         = 3694c390-f245-612a-12ce-7d3a99127622
  GPU1:
          apiVersion         = 1.2.0
          driverVersion      = 24.1.99
          vendorID           = 0x10005
          deviceID           = 0x0000
          deviceType         = PHYSICAL_DEVICE_TYPE_CPU
          deviceName         = Virtio-GPU Venus (llvmpipe (LLVM 15.0.6, 256 bits))
          driverID           = DRIVER_ID_MESA_VENUS
          driverName         = venus
          driverInfo         = Mesa 24.2.0-devel (git-0b582449f0)
          conformanceVersion = 1.3.0.0
          deviceUUID         = 5fb5c03f-c537-f0fe-a7e6-9cd5866acb8d
          driverUUID         = 3694c390-f245-612a-12ce-7d3a99127622

Running weston and then vkcube-wayland reports it's selecting "GPU 0:
Virtio-GPU Venus (Intel(R) Graphics (RPL-S))" but otherwise produces no
output.

If I run with "-display sdl,gl=on,show-cursor=on" and the same other
command line options the results for vulkaninfo are the same. However
vkcube-wayland gets a little further and draws the initial cube on the
screen before locking up with:

 MESA-VIRTIO: debug: stuck in fence wait with iter at xxxx

where xxxx grows each time it prints. On shutting down I see some virgl
errors interspersed with the systemd logs:

  [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
  [  OK  ] Stopped systemd-logind.service - User Login Management.
  virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
  [  475.257111] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
  [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
  [  OK  ] Stopped target network-online.target - Network is Online.
  [  OK  ] Stopped target remote-fs.target - Remote File Systems.
  [  OK  ] Stopped NetworkManager-wait-online…vice - Network Manager Wait Online.
           Stopping avahi-daemon.service - Avahi mDNS/DNS-SD Stack...
           Stopping cups.service - CUPS Scheduler...
           Stopping user-runtime-dir@0.servic…er Runtime Directory /run/user/0...
  [  OK  ] Stopped avahi-daemon.service - Avahi mDNS/DNS-SD Stack.
  [  OK  ] Stopped cups.service - CUPS Scheduler.
  virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
  [  475.357543] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
  [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
  [  OK  ] Stopped target network.target - Network.
  [  OK  ] Stopped target nss-user-lookup.target - User and Group Name Lookups.
           Stopping NetworkManager.service - Network Manager...
           Stopping networking.service - Raise network interfaces...
           Stopping wpa_supplicant.service - WPA supplicant...
  [  OK  ] Stopped wpa_supplicant.service - WPA supplicant.
  virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
  [  493.585261] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
  [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
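
The hex values in these messages can be read against the virtio-gpu protocol
definitions. A minimal lookup sketch follows, with values taken from the
command and response enums in the Linux uapi header
(include/uapi/linux/virtio_gpu.h) as I understand them; only a handful of
entries are listed, and the function name virtio_gpu_name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: map a few virtio-gpu command/response codes to the
# enum names used in include/uapi/linux/virtio_gpu.h (partial table).
virtio_gpu_name() {
    case "$1" in
        0x207)  echo VIRTIO_GPU_CMD_SUBMIT_3D ;;
        0x208)  echo VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB ;;
        0x209)  echo VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB ;;
        0x1100) echo VIRTIO_GPU_RESP_OK_NODATA ;;
        0x1200) echo VIRTIO_GPU_RESP_ERR_UNSPEC ;;
        0x1201) echo VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY ;;
        *)      echo "unknown ($1)" ;;
    esac
}

echo "command 0x209   -> $(virtio_gpu_name 0x209)"
echo "response 0x1200 -> $(virtio_gpu_name 0x1200)"
```

Read this way, the repeated message is an unspecified error returned for a
blob unmap command; the log alone does not establish the cause.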
Dmitry Osipenko May 26, 2024, 11:46 p.m. UTC | #5
On 5/22/24 12:00, Alex Bennée wrote:
> With:
> 
>   ./qemu-system-aarch64 \
>            -machine type=virt,virtualization=on,pflash0=rom,pflash1=efivars \
>            -cpu neoverse-n1 \
>            -smp 4 \
>            -accel tcg \
>            -device virtio-net-pci,netdev=unet \
>            -device virtio-scsi-pci \
>            -device scsi-hd,drive=hd \
>            -netdev user,id=unet,hostfwd=tcp::2222-:22 \
>            -blockdev driver=raw,node-name=hd,file.driver=host_device,file.filename=/dev/zen-ssd2/trixie-arm64,discard=unmap \
>            -serial mon:stdio \
>            -blockdev node-name=rom,driver=file,filename=(pwd)/pc-bios/edk2-aarch64-code.fd,read-only=true \
>            -blockdev node-name=efivars,driver=file,filename=$HOME/images/qemu-arm64-efivars \
>            -m 8192 \
>            -object memory-backend-memfd,id=mem,size=8G,share=on \
>            -device virtio-gpu-gl-pci,hostmem=4G,blob=true,venus=true \
>            -display gtk,gl=on,show-cursor=on -vga none \
>            -device qemu-xhci -device usb-kbd -device usb-tablet
> 
> I get a boot up with a lot of:
> 
>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>
>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>
>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
>
>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed
> 
> In the guest I run:
> 
>   meson devenv -C /root/lsrc/graphics/mesa.git/build fish
> 
> to bring in the latest Mesa (with virtio enabled). Running vulkaninfo
> reports two cards:
> 
>   ==========                                                                            
>   VULKANINFO                        
>   ==========                                                                            
> 
>   Vulkan Instance Version: 1.3.280                                                      
> 
> 
>   Instance Extensions: count = 14
>   -------------------------------
>   VK_EXT_debug_report                    : extension revision 10
>   VK_EXT_debug_utils                     : extension revision 2
>   VK_EXT_headless_surface                : extension revision 1
>   VK_KHR_device_group_creation           : extension revision 1
>   VK_KHR_external_fence_capabilities     : extension revision 1
>   VK_KHR_external_memory_capabilities    : extension revision 1
>   VK_KHR_external_semaphore_capabilities : extension revision 1
>   VK_KHR_get_physical_device_properties2 : extension revision 2
>   VK_KHR_get_surface_capabilities2       : extension revision 1
>   VK_KHR_portability_enumeration         : extension revision 1
>   VK_KHR_surface                         : extension revision 25
>   VK_KHR_surface_protected_capabilities  : extension revision 1
>   VK_KHR_wayland_surface                 : extension revision 6
>   VK_LUNARG_direct_driver_loading        : extension revision 1
> 
>   Instance Layers: count = 2
>   --------------------------
>   VK_LAYER_MESA_device_select Linux device selection layer 1.3.211  version 1
>   VK_LAYER_MESA_overlay       Mesa Overlay layer           1.3.211  version 1
> 
>   Devices:
>   ========
>   GPU0:
>           apiVersion         = 1.3.230
>           driverVersion      = 24.1.99
>           vendorID           = 0x8086
>           deviceID           = 0xa780
>           deviceType         = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
>           deviceName         = Virtio-GPU Venus (Intel(R) Graphics (RPL-S))
>           driverID           = DRIVER_ID_MESA_VENUS
>           driverName         = venus
>           driverInfo         = Mesa 24.2.0-devel (git-0b582449f0)
>           conformanceVersion = 1.3.0.0
>           deviceUUID         = 29d2e940-a1a0-3054-0f9a-9f7dec52a084
>           driverUUID         = 3694c390-f245-612a-12ce-7d3a99127622
>   GPU1:
>           apiVersion         = 1.2.0
>           driverVersion      = 24.1.99
>           vendorID           = 0x10005
>           deviceID           = 0x0000
>           deviceType         = PHYSICAL_DEVICE_TYPE_CPU
>           deviceName         = Virtio-GPU Venus (llvmpipe (LLVM 15.0.6, 256 bits))
>           driverID           = DRIVER_ID_MESA_VENUS
>           driverName         = venus
>           driverInfo         = Mesa 24.2.0-devel (git-0b582449f0)
>           conformanceVersion = 1.3.0.0
>           deviceUUID         = 5fb5c03f-c537-f0fe-a7e6-9cd5866acb8d
>           driverUUID         = 3694c390-f245-612a-12ce-7d3a99127622
> 
> Running weston and then vkcube-wayland reports it's selecting "GPU 0:
> Virtio-GPU Venus (Intel(R) Graphics (RPL-S))" but otherwise produces no
> output.
> 
> If I run with "-display sdl,gl=on,show-cursor=on" and the same other
> command line options the results for vulkaninfo are the same. However
> vkcube-wayland gets a little further and draws the initial cube on the
> screen before locking up with:
> 
>  MESA-VIRTIO: debug: stuck in fence wait with iter at xxxx
> 
> where xxxx grows each time it prints. On shutting down I see some virgl
> errors interspersed with the systemd logs:
> 
>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>   [  OK  ] Stopped systemd-logind.service - User Login Management.
>   virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
>   [  475.257111] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>   [  OK  ] Stopped target network-online.target - Network is Online.
>   [  OK  ] Stopped target remote-fs.target - Remote File Systems.
>   [  OK  ] Stopped NetworkManager-wait-online…vice - Network Manager Wait Online.
>            Stopping avahi-daemon.service - Avahi mDNS/DNS-SD Stack...
>            Stopping cups.service - CUPS Scheduler...
>            Stopping user-runtime-dir@0.servic…er Runtime Directory /run/user/0...
>   [  OK  ] Stopped avahi-daemon.service - Avahi mDNS/DNS-SD Stack.
>   [  OK  ] Stopped cups.service - CUPS Scheduler.
>   virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
>   [  475.357543] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>   [  OK  ] Stopped target network.target - Network.
>   [  OK  ] Stopped target nss-user-lookup.target - User and Group Name Lookups.
>            Stopping NetworkManager.service - Network Manager...
>            Stopping networking.service - Raise network interfaces...
>            Stopping wpa_supplicant.service - WPA supplicant...
>   [  OK  ] Stopped wpa_supplicant.service - WPA supplicant.
>   virtio_gpu_virgl_process_cmd: ctrl 0x209, error 0x1200
>   [  493.585261] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
>   [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x209)
> 

I've reproduced this with qemu-system-aarch64. Vkcube works for a second
and then stops, and Qemu freezes completely after closing and re-running
vkcube. This doesn't feel like a problem with venus, but rather with arm64.
For now I don't know where the bug is; I will take a closer look.
Dmitry Osipenko May 27, 2024, 12:07 a.m. UTC | #6
On 5/22/24 12:00, Alex Bennée wrote:
> I get a boot up with a lot of:
>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed   
>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed   
>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed   
>   (qemu:1545322): Gdk-WARNING **: 09:26:09.470: eglMakeCurrent failed   

I have the same problem with GTK and arm64/UEFI. Something is resetting
the virtio-gpu device during boot (maybe the UEFI firmware) and that
doesn't work properly with GTK. I'd expect x86 to have the same issue,
but I don't recall x86 having it.
Dmitry Osipenko May 27, 2024, 12:52 a.m. UTC | #7
On 5/27/24 02:46, Dmitry Osipenko wrote:
> I've reproduced this with qemu-system-aarch64. Vkcube works for a second
> and then stops, and Qemu freezes completely after closing and re-running
> vkcube. This doesn't feel like a problem with venus, but rather with arm64.
> For now I don't know where the bug is; I will take a closer look.

Interestingly, on another try vkcube now works with no issues.
Alex Bennée June 5, 2024, 2:47 p.m. UTC | #8
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> I've reproduced this with qemu-system-aarch64. Vkcube works for a second
> and then stops, and Qemu freezes completely after closing and re-running
> vkcube. This doesn't feel like a problem with venus, but rather with arm64.
> For now I don't know where the bug is; I will take a closer look.

I'm guessing this is some sort of resource leak. If I run vkcube-wayland
in the guest, it complains about being stuck on a fence with the iterator
going up. However, on the host I see:

  virtio_gpu_fence_ctrl fence 0x13f1, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f2, type 0x207
  virtio_gpu_fence_resp fence 0x13f1
  virtio_gpu_fence_resp fence 0x13f2
  virtio_gpu_fence_ctrl fence 0x13f3, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f4, type 0x207
  virtio_gpu_fence_resp fence 0x13f3
  virtio_gpu_fence_resp fence 0x13f4
  virtio_gpu_fence_ctrl fence 0x13f5, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f6, type 0x207
  virtio_gpu_fence_resp fence 0x13f5
  virtio_gpu_fence_resp fence 0x13f6
  virtio_gpu_fence_ctrl fence 0x13f7, type 0x207
  virtio_gpu_fence_ctrl fence 0x13f8, type 0x207
  virtio_gpu_fence_resp fence 0x13f7
  virtio_gpu_fence_resp fence 0x13f8
  virtio_gpu_fence_ctrl fence 0x13f9, type 0x204
  virtio_gpu_fence_resp fence 0x13f9

which looks like it's going OK. However, when I hit Ctrl-C in the guest it
kills QEMU:

  virtio_gpu_fence_ctrl fence 0x13fc, type 0x207
  virtio_gpu_fence_ctrl fence 0x13fd, type 0x207
  virtio_gpu_fence_ctrl fence 0x13fe, type 0x204
  virtio_gpu_fence_ctrl fence 0x13ff, type 0x207
  virtio_gpu_fence_ctrl fence 0x1400, type 0x207
  virtio_gpu_fence_resp fence 0x13fc
  virtio_gpu_fence_resp fence 0x13fd
  virtio_gpu_fence_resp fence 0x13fe
  virtio_gpu_fence_resp fence 0x13ff
  virtio_gpu_fence_resp fence 0x1400
  qemu-system-aarch64: ../../subprojects/virglrenderer/src/virglrenderer.c:1282: virgl_renderer_resource_unmap: Assertion `!ret' failed.
  fish: Job 1, './qemu-system-aarch64 \' terminated by signal     -machine type=virt,virtuali… (    -cpu neoverse-n1 \)
  fish: Job     -smp 4 \, '    -accel tcg \' terminated by signal     -device virtio-net-pci,netd… (    -device virtio-scsi-pci \)
  fish: Job     -device scsi-hd,drive=hd \, '    -netdev user,id=unet,hostfw…' terminated by signal     -blockdev driver=raw,node-n… (    -serial mon:stdio \)
  fish: Job     -blockdev node-name=rom,dri…, '    -blockdev node-name=efivars…' terminated by signal     -m 8192 \ (    -object memory-backend-memf…)
  fish: Job     -device virtio-gpu-gl-pci,h…, '    -display sdl,gl=on,show-cur…' terminated by signal     -device qemu-xhci -device u… (    -kernel /home/alex/lsrc/lin…)
  fish: Job     -d guest_errors,unimp,trace…, 'SIGABRT' terminated by signal Abort ()
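
One way to confirm that every fence in a trace like the ones above received a response is to pair up the `virtio_gpu_fence_ctrl` and `virtio_gpu_fence_resp` lines. The following is only a sketch: `outstanding_fences` is a hypothetical helper (not part of QEMU), and it assumes the trace lines have exactly the format shown above. The sample input is the trace that precedes the assertion.

```python
import re

# Matches the two trace-event formats seen in the host output above.
FENCE_RE = re.compile(r"virtio_gpu_fence_(ctrl|resp) fence (0x[0-9a-fA-F]+)")


def outstanding_fences(trace_lines):
    """Return the sorted fence IDs that were submitted but never answered."""
    pending = set()
    for line in trace_lines:
        match = FENCE_RE.search(line)
        if match is None:
            continue  # not a fence trace line
        kind, fence_id = match.group(1), int(match.group(2), 16)
        if kind == "ctrl":
            pending.add(fence_id)
        else:
            pending.discard(fence_id)
    return sorted(pending)


# The last fence trace before the assertion, copied from the log above.
trace = """\
virtio_gpu_fence_ctrl fence 0x13fc, type 0x207
virtio_gpu_fence_ctrl fence 0x13fd, type 0x207
virtio_gpu_fence_ctrl fence 0x13fe, type 0x204
virtio_gpu_fence_ctrl fence 0x13ff, type 0x207
virtio_gpu_fence_ctrl fence 0x1400, type 0x207
virtio_gpu_fence_resp fence 0x13fc
virtio_gpu_fence_resp fence 0x13fd
virtio_gpu_fence_resp fence 0x13fe
virtio_gpu_fence_resp fence 0x13ff
virtio_gpu_fence_resp fence 0x1400
""".splitlines()

print([hex(f) for f in outstanding_fences(trace)])  # prints []
```

For this excerpt the list is empty, which matches the observation that the host side answers every fence it is given before the assertion fires.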

The backtrace (and the 18G size of the core file!) indicates a leak:

  (gdb) bt
  #0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
  #1  0x00007f0fa68a9e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
  #2  0x00007f0fa685afb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
  #3  0x00007f0fa6845472 in __GI_abort () at ./stdlib/abort.c:79
  #4  0x00007f0fa6845395 in __assert_fail_base
      (fmt=0x7f0fa69b9a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x55c3e1b0762d "!ret", file=file@entry=0x55c3e1d306f0 "../../subprojects/virglrenderer/src/virglrenderer.c", line=line@entry=1282, function=function@entry=0x55c3e1d30910 <__PRETTY_FUNCTION__.2> "virgl_renderer_resource_unmap") at ./assert/assert.c:92
  #5  0x00007f0fa6853eb2 in __GI___assert_fail
      (assertion=assertion@entry=0x55c3e1b0762d "!ret", file=file@entry=0x55c3e1d306f0 "../../subprojects/virglrenderer/src/virglrenderer.c", line=line@entry=1282, function=function@entry=0x55c3e1d30910 <__PRETTY_FUNCTION__.2> "virgl_renderer_resource_unmap") at ./assert/assert.c:101
  #6  0x000055c3e1958b50 in virgl_renderer_resource_unmap (res_handle=<optimized out>) at ../../subprojects/virglrenderer/src/virglrenderer.c:1282
  #7  0x000055c3e13d8507 in virtio_gpu_virgl_unmap_resource_blob (g=g@entry=0x55c3e5fed600, res=0x55c3e6e67b60, cmd_suspended=cmd_suspended@entry=0x7ffd5d720aaf)
      at ../../hw/display/virtio-gpu-virgl.c:188
  #8  0x000055c3e13d9af4 in virgl_cmd_resource_unmap_blob (cmd_suspended=0x7ffd5d720aaf, cmd=0x55c3e5bd9710, g=0x55c3e5fed600) at ../../hw/display/virtio-gpu-virgl.c:797
  #9  virtio_gpu_virgl_process_cmd (g=0x55c3e5fed600, cmd=0x55c3e5bd9710) at ../../hw/display/virtio-gpu-virgl.c:979
  #10 0x000055c3e13d6019 in virtio_gpu_process_cmdq (g=0x55c3e5fed600) at ../../hw/display/virtio-gpu.c:1055
  #11 0x000055c3e190c646 in aio_bh_poll (ctx=ctx@entry=0x55c3e4c03710) at ../../util/async.c:218
  #12 0x000055c3e18f562e in aio_dispatch (ctx=0x55c3e4c03710) at ../../util/aio-posix.c:423
  #13 0x000055c3e190c2ce in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../../util/async.c:360
  #14 0x00007f0fa8b047a9 in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
  #15 0x000055c3e190db78 in glib_pollfds_poll () at ../../util/main-loop.c:287
  #16 os_host_main_loop_wait (timeout=1882878) at ../../util/main-loop.c:310
  #17 main_loop_wait (nonblocking=nonblocking@entry=0) at ../../util/main-loop.c:589
  #18 0x000055c3e1348ac9 in qemu_main_loop () at ../../system/runstate.c:796
  #19 0x000055c3e174f786 in qemu_default_main () at ../../system/main.c:37
  #20 0x00007f0fa684624a in __libc_start_call_main (main=main@entry=0x55c3e10286e0 <main>, argc=argc@entry=47, argv=argv@entry=0x7ffd5d720f18)
      at ../sysdeps/nptl/libc_start_call_main.h:58
  #21 0x00007f0fa6846305 in __libc_start_main_impl
      (main=0x55c3e10286e0 <main>, argc=47, argv=0x7ffd5d720f18, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd5d720f08)
      at ../csu/libc-start.c:360
  #22 0x000055c3e102a3f1 in _start ()
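
The numeric command and response codes in these logs can be decoded with a small lookup table. The sketch below assumes the values of `enum virtio_gpu_ctrl_type` from the virtio-gpu UAPI header (`linux/virtio_gpu.h`) and lists only the codes that appear in this thread; `decode` is a hypothetical helper for illustration.

```python
# Subset of enum virtio_gpu_ctrl_type (assumed from linux/virtio_gpu.h),
# limited to the codes seen in the logs in this thread.
CTRL_TYPES = {
    0x0204: "VIRTIO_GPU_CMD_RESOURCE_CREATE_3D",
    0x0207: "VIRTIO_GPU_CMD_SUBMIT_3D",
    0x0208: "VIRTIO_GPU_CMD_RESOURCE_MAP_BLOB",
    0x0209: "VIRTIO_GPU_CMD_RESOURCE_UNMAP_BLOB",
    0x1200: "VIRTIO_GPU_RESP_ERR_UNSPEC",
}


def decode(code):
    """Map a numeric ctrl/response code to its symbolic name."""
    return CTRL_TYPES.get(code, f"unknown (0x{code:04x})")


for code in (0x207, 0x204, 0x209, 0x1200):
    print(f"0x{code:04x} -> {decode(code)}")
```

Under that assumption, "command 0x209" with "response 0x1200" in the guest shutdown log is a RESOURCE_UNMAP_BLOB request failing with an unspecified error, which is the same command path (`virgl_cmd_resource_unmap_blob`) that appears in the backtrace.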
Dmitry Osipenko June 5, 2024, 4:29 p.m. UTC | #9
On 6/5/24 17:47, Alex Bennée wrote:
....
> I'm guessing this is some sort of resource leak. If I run vkcube-wayland
> in the guest, it complains about being stuck on a fence with the iterator
> going up. However, on the host I see:
> 
>   virtio_gpu_fence_ctrl fence 0x13f1, type 0x207
>   virtio_gpu_fence_ctrl fence 0x13f2, type 0x207
>   virtio_gpu_fence_resp fence 0x13f1
>   virtio_gpu_fence_resp fence 0x13f2
>   virtio_gpu_fence_ctrl fence 0x13f3, type 0x207
>   virtio_gpu_fence_ctrl fence 0x13f4, type 0x207
>   virtio_gpu_fence_resp fence 0x13f3
>   virtio_gpu_fence_resp fence 0x13f4
>   virtio_gpu_fence_ctrl fence 0x13f5, type 0x207
>   virtio_gpu_fence_ctrl fence 0x13f6, type 0x207
>   virtio_gpu_fence_resp fence 0x13f5
>   virtio_gpu_fence_resp fence 0x13f6
>   virtio_gpu_fence_ctrl fence 0x13f7, type 0x207
>   virtio_gpu_fence_ctrl fence 0x13f8, type 0x207
>   virtio_gpu_fence_resp fence 0x13f7
>   virtio_gpu_fence_resp fence 0x13f8
>   virtio_gpu_fence_ctrl fence 0x13f9, type 0x204
>   virtio_gpu_fence_resp fence 0x13f9
> 
> which looks like it's going ok. However when I hit Ctrl-C in the guest it
> kills QEMU:
> 
>   virtio_gpu_fence_ctrl fence 0x13fc, type 0x207
>   virtio_gpu_fence_ctrl fence 0x13fd, type 0x207
>   virtio_gpu_fence_ctrl fence 0x13fe, type 0x204
>   virtio_gpu_fence_ctrl fence 0x13ff, type 0x207
>   virtio_gpu_fence_ctrl fence 0x1400, type 0x207
>   virtio_gpu_fence_resp fence 0x13fc
>   virtio_gpu_fence_resp fence 0x13fd
>   virtio_gpu_fence_resp fence 0x13fe
>   virtio_gpu_fence_resp fence 0x13ff
>   virtio_gpu_fence_resp fence 0x1400
>   qemu-system-aarch64: ../../subprojects/virglrenderer/src/virglrenderer.c:1282: virgl_renderer_resource_unmap: Assertion `!ret' failed.
>   fish: Job 1, './qemu-system-aarch64 \' terminated by signal     -machine type=virt,virtuali… (    -cpu neoverse-n1 \)
>   fish: Job     -smp 4 \, '    -accel tcg \' terminated by signal     -device virtio-net-pci,netd… (    -device virtio-scsi-pci \)
>   fish: Job     -device scsi-hd,drive=hd \, '    -netdev user,id=unet,hostfw…' terminated by signal     -blockdev driver=raw,node-n… (    -serial mon:stdio \)
>   fish: Job     -blockdev node-name=rom,dri…, '    -blockdev node-name=efivars…' terminated by signal     -m 8192 \ (    -object memory-backend-memf…)
>   fish: Job     -device virtio-gpu-gl-pci,h…, '    -display sdl,gl=on,show-cur…' terminated by signal     -device qemu-xhci -device u… (    -kernel /home/alex/lsrc/lin…)
>   fish: Job     -d guest_errors,unimp,trace…, 'SIGABRT' terminated by signal Abort ()
> 
> The backtrace (and the 18G size of the core file!) indicates a leak:

The unmap debug assert indicates that the BO wasn't mapped because the
mapping failed, likely due to OOM. You won't hit this abort with a release
build of libvirglrenderer. The leak likely happens due to an unsignalled fence.

Please try running vkcube with the fence-feedback feature disabled:

 # VN_PERF_NO_FENCE_FEEDBACK=1 vkcube-wayland

It fixes the hang for me. We have had problems with the combination of this
Venus optimization feature and the Intel ANV driver for a long time and hoped
that they were fixed by now; apparently the issue was only masked.
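
To check a longer host trace for fences that never got a response, something along these lines may help. It is only a sketch: it assumes the trace lines look exactly like the virtio_gpu_fence_ctrl / virtio_gpu_fence_resp output quoted above, the helper name unsignalled_fences is made up for illustration, and it only reflects the host-side view, not what the guest driver is waiting on.

```python
import re

# Matches the two trace events quoted in this thread. The exact line format is
# an assumption taken from that quoted output.
FENCE_RE = re.compile(r"virtio_gpu_fence_(ctrl|resp) fence (0x[0-9a-fA-F]+)")


def unsignalled_fences(trace_lines):
    """Return fence IDs that were submitted (ctrl) but never answered (resp)."""
    pending = []
    for line in trace_lines:
        m = FENCE_RE.search(line)
        if not m:
            continue  # unrelated trace or log line
        kind, fence = m.group(1), int(m.group(2), 16)
        if kind == "ctrl":
            pending.append(fence)
        elif fence in pending:
            pending.remove(fence)
    return pending


# A few lines copied from the host trace quoted above.
sample = """\
virtio_gpu_fence_ctrl fence 0x13f7, type 0x207
virtio_gpu_fence_ctrl fence 0x13f8, type 0x207
virtio_gpu_fence_resp fence 0x13f7
virtio_gpu_fence_resp fence 0x13f8
virtio_gpu_fence_ctrl fence 0x13f9, type 0x204
virtio_gpu_fence_resp fence 0x13f9
""".splitlines()

print([hex(f) for f in unsignalled_fences(sample)])
```

An empty list means every fence QEMU saw was also signalled back, so a stuck fence in the guest would have to be explained somewhere other than this trace.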
Alex Bennée June 5, 2024, 5:37 p.m. UTC | #10
Dmitry Osipenko <dmitry.osipenko@collabora.com> writes:

> On 6/5/24 17:47, Alex Bennée wrote:
> ....
>> I'm guessing some sort of resource leak, if I run vkcube-wayland in the
>> guest it complains about being stuck on a fence with the iterator going
>> up. However on the host I see:
>> 
<snip>
>> 
>> The backtrace (and the 18G size of the core file!) indicates a leak:
>
> The unmap debug assert indicates that the BO wasn't mapped because the
> mapping failed, likely due to OOM. You won't hit this abort with a release
> build of libvirglrenderer.

AFAIK I should be building a release build (or at least I hope that is
what the wrapper I posted does):

  Message-Id: <20240605133527.529950-1-alex.bennee@linaro.org>
  Date: Wed,  5 Jun 2024 14:35:27 +0100
  Subject: [RFC PATCH] subprojects: add a wrapper for libvirglrenderer
  From: =?UTF-8?q?Alex=20Benn=C3=A9e?= <alex.bennee@linaro.org>

Maybe I need to explicitly set buildtype=release in the default options?

> The leak likely happens due to an unsignalled fence.
>
> Please try running vkcube with the fence-feedback feature disabled:
>
>  # VN_PERF_NO_FENCE_FEEDBACK=1 vkcube-wayland
>
> It fixes the hang for me. We have had problems with the combination of this
> Venus optimization feature and the Intel ANV driver for a long time and hoped
> that they were fixed by now; apparently the issue was only masked.

That doesn't help, still causes the crash:

  virtio_gpu_fence_ctrl fence 0xdfd, type 0x204                                         
  virtio_gpu_fence_ctrl fence 0xdfe, type 0x207                            
  virtio_gpu_fence_ctrl fence 0xdff, type 0x207                        
  virtio_gpu_fence_ctrl fence 0xe00, type 0x207                              
  virtio_gpu_fence_ctrl fence 0xe01, type 0x207                       
  virtio_gpu_fence_ctrl fence 0xe02, type 0x207                                
  virtio_gpu_fence_ctrl fence 0xe03, type 0x207                  
  virtio_gpu_fence_resp fence 0xdfd                                                     
  virtio_gpu_fence_resp fence 0xdfe                                                     
  virtio_gpu_fence_resp fence 0xdff                                                     
  virtio_gpu_fence_resp fence 0xe00                                                     
  virtio_gpu_fence_resp fence 0xe01                                                     
  virtio_gpu_fence_resp fence 0xe02                                                     
  virtio_gpu_fence_resp fence 0xe03                                                     
  stats: vq req  100,   7 -- 3D   25 (19560)                                            
  vrend_renderer_resource_unmap: invalid bits 0x83                              
  virgl_renderer_resource_unmap: unexpected ret = -22, pipe:0x555559e5d0c0 fd_type:0

  Thread 1 "qemu-system-aar" received signal SIGABRT, Aborted.                                                                                                                 
  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
  44      ./nptl/pthread_kill.c: No such file or directory.                          

I think 0x83 means VREND_STORAGE_GL_MEMOBJ | VREND_STORAGE_GL_TEXTURE |
VREND_STORAGE_GUEST_MEMORY.

(I note that has_bits is meant to take its arguments as (mask, bit), but I
don't think the order makes any difference here.)
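
For reference, a rough way to decode the two numbers from the log (0x83 and -22). The bit positions below are an assumption: only the three flags named above are listed, the layout is not quoted anywhere in this thread, and it should be checked against the VREND_STORAGE_* definitions in the virglrenderer checkout being built. The helper name decode_storage_bits is made up for illustration.

```python
import errno

# ASSUMED bit positions for the three flags named in this thread. Verify them
# against the VREND_STORAGE_* definitions in your virglrenderer source tree.
ASSUMED_STORAGE_BITS = {
    0: "VREND_STORAGE_GUEST_MEMORY",
    1: "VREND_STORAGE_GL_TEXTURE",
    7: "VREND_STORAGE_GL_MEMOBJ",
}


def decode_storage_bits(value):
    """Return the names of the set bits, lowest bit first."""
    names = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            # Bits without an assumed name are reported by position only.
            names.append(ASSUMED_STORAGE_BITS.get(bit, f"<bit {bit}>"))
        bit += 1
    return names


print(" | ".join(decode_storage_bits(0x83)))
# The "unexpected ret = -22" in the log is a negated errno value.
print(errno.errorcode[22])
```

Under that assumed layout, 0x83 has bits 0, 1 and 7 set, which matches the three flags above, and errno 22 is EINVAL.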