
[v3,00/10] vhost-user: Lift Max Ram Slots Limitation

Message ID 1588473683-27067-1-git-send-email-raphael.norwitz@nutanix.com
Series vhost-user: Lift Max Ram Slots Limitation

Message

Raphael Norwitz May 19, 2020, 12:25 p.m. UTC
In QEMU today, a VM with a vhost-user device can hot add memory a
maximum of 8 times. See these threads, among others:

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html
    https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html

[2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html

This series introduces a new protocol feature,
VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, which, when enabled, lifts the
restriction on the maximum number of RAM slots imposed by vhost-user.
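Whether QEMU uses the new per-region messages depends on protocol feature negotiation. A feature is usable only if both QEMU and the backend advertise it. The sketch below shows that check under stated assumptions. The bit number 15 is the value VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS (the feature's name as of v3) has in the published vhost-user specification; this cover letter does not state it. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit number taken from the vhost-user specification (assumption). */
#define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15

/* Hypothetical helper: a protocol feature is usable only if both sides
 * advertise it, so the negotiated mask is the intersection. */
static uint64_t negotiate_protocol_features(uint64_t frontend, uint64_t backend)
{
    return frontend & backend;
}

/* Hypothetical helper: true if the negotiated mask enables the per-region
 * ADD/REM_MEM_REG messages; otherwise the frontend keeps sending a single
 * SET_MEM_TABLE message. */
static bool use_per_region_messages(uint64_t negotiated)
{
    return (negotiated >> VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS) & 1;
}
```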

Without vhost-user, a QEMU VM can support 256 RAM slots (for ACPI targets),
or potentially more (the KVM maximum is 512). Each region requires a file
descriptor to be sent over the socket. If all regions were sent in a single
message, upwards of 256 file descriptors could be opened in the backend process
at once. Opening that many fds could easily push the process past its open fd
limit, especially since one backend process may run multiple vhost threads
exposing different devices to different QEMU instances. Therefore, to lift the
limit safely, the regions should be transmitted over multiple messages.
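File descriptors travel over the UNIX domain socket as SCM_RIGHTS ancillary data. Each descriptor received becomes a new open fd in the backend process. The sketch below uses plain POSIX sendmsg()/recvmsg(), not QEMU's actual vhost-user socket code, and the helper names are hypothetical. It sends one region payload with exactly one fd, which is the shape of the per-region messages this series proposes.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send 'len' payload bytes plus exactly one fd as SCM_RIGHTS data.
 * Returns 0 on success, -1 on error or short write. */
static int send_with_one_fd(int sock, const void *buf, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    union {                               /* union forces cmsghdr alignment */
        char bytes[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;
    memset(&control, 0, sizeof(control));

    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.bytes,
        .msg_controllen = sizeof(control.bytes),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == (ssize_t)len ? 0 : -1;
}

/* Receive a payload and at most one fd (*fd_out is -1 if none arrived).
 * Returns the number of payload bytes read, or -1 on error. */
static ssize_t recv_with_one_fd(int sock, void *buf, size_t len, int *fd_out)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    union {
        char bytes[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control.bytes,
        .msg_controllen = sizeof(control.bytes),
    };

    *fd_out = -1;
    ssize_t n = recvmsg(sock, &msg, 0);
    if (n < 0) {
        return -1;
    }
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            memcpy(fd_out, CMSG_DATA(c), sizeof(int));  /* one fd expected */
        }
    }
    return n;
}
```

Because each message carries a single descriptor, the receiver can map or close it before the next one arrives. One large message would instead deliver every fd at once.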

In addition, the VHOST_USER_SET_MEM_TABLE message was not reused because it
grows with the number of regions and becomes very large. In practice, such large
messages have caused problems (truncated messages), and in the past the community
seems to have opted for smaller fixed-size messages where possible. Vrings, for
example, are sent to the backend individually rather than in one large message.
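The sketch below puts rough numbers on the size argument by computing payload sizes from region layouts modeled on the vhost-user specification:

- four 64-bit fields per region;
- a 32-bit count plus 32 bits of padding for the whole table;
- 64 bits of padding before a single-region payload.

These layouts and the helper names are assumptions for illustration, not taken from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Region description modeled on the vhost-user spec (assumption). */
typedef struct {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
} RegionDesc;

/* SET_MEM_TABLE-style payload: 32-bit region count, 32 bits of padding,
 * then one description per region, so it grows linearly with nregions. */
static size_t mem_table_payload_size(size_t nregions)
{
    return 2 * sizeof(uint32_t) + nregions * sizeof(RegionDesc);
}

/* ADD/REM_MEM_REG-style payload: 64 bits of padding plus one description,
 * independent of how many regions the guest has in total. */
static size_t single_region_payload_size(void)
{
    return sizeof(uint64_t) + sizeof(RegionDesc);
}
```

Under these assumptions, a 512-region table payload would be 16392 bytes. Every single-region message stays at 40 bytes.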

The implementation is explained in more detail in the commit messages, but
at a high level the new protocol feature works as follows:
- If the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS feature is enabled,
  QEMU will send multiple VHOST_USER_ADD_MEM_REG and
  VHOST_USER_REM_MEM_REG messages to map and unmap individual memory
  regions instead of one large VHOST_USER_SET_MEM_TABLE message containing
  all memory regions.
- The vhost-user struct maintains a 'shadow state' of memory regions
  already sent to the backend. Each time vhost_user_set_mem_table is called,
  the shadow state is compared with the new device state. A
  VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
  not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
  for each region in the device state but not the shadow state. After
  these messages have been sent, the shadow state will be updated to
  reflect the new device state.
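The shadow-state comparison described above can be sketched as follows. This illustrates the algorithm only and is not the code in hw/virtio/vhost-user.c. The Region type, the three fields used to decide whether two regions match, and the send callback are hypothetical. A real implementation would also pass each added region's file descriptor.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region identity: two regions match if all three agree. */
typedef struct {
    uint64_t gpa;   /* guest physical address */
    uint64_t size;  /* region size in bytes */
    uint64_t hva;   /* userspace (host virtual) address */
} Region;

typedef enum { SEND_REM_MEM_REG, SEND_ADD_MEM_REG } SendKind;

/* Stand-in for sending VHOST_USER_REM_MEM_REG / VHOST_USER_ADD_MEM_REG. */
typedef void (*send_fn)(SendKind kind, const Region *r, void *opaque);

static bool same_region(const Region *a, const Region *b)
{
    return a->gpa == b->gpa && a->size == b->size && a->hva == b->hva;
}

static bool contains(const Region *set, size_t n, const Region *r)
{
    for (size_t i = 0; i < n; i++) {
        if (same_region(&set[i], r)) {
            return true;
        }
    }
    return false;
}

/*
 * Send a removal for each shadow region the device no longer has, then an
 * addition for each device region the shadow lacks, then overwrite the
 * shadow with the device state. 'shadow' must hold at least 'ndev' entries
 * and must not alias 'dev'. Returns the new number of shadow entries.
 */
static size_t sync_regions(Region *shadow, size_t nshadow,
                           const Region *dev, size_t ndev,
                           send_fn send, void *opaque)
{
    for (size_t i = 0; i < nshadow; i++) {
        if (!contains(dev, ndev, &shadow[i])) {
            send(SEND_REM_MEM_REG, &shadow[i], opaque);
        }
    }
    for (size_t i = 0; i < ndev; i++) {
        if (!contains(shadow, nshadow, &dev[i])) {
            send(SEND_ADD_MEM_REG, &dev[i], opaque);
        }
    }
    for (size_t i = 0; i < ndev; i++) {
        shadow[i] = dev[i];
    }
    return ndev;
}
```

Sending removals before additions frees slots on the backend before new regions claim them. This matters once the backend enforces a fixed slot limit.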

The series consists of 10 changes:
1. Add helper to populate vhost-user message regions:
    This change adds a helper to populate a VhostUserMemoryRegion from a
    struct vhost_memory_region, which needs to be done in multiple places
    in this series.

2. Add vhost-user helper to get MemoryRegion data
    This change adds a helper to get a pointer to a MemoryRegion struct, along
    with its offset address and associated file descriptor. This helper is used to
    simplify other vhost-user code paths and will be needed elsewhere in this
    series.

3. Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    This change adds the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    protocol feature. At this point, if negotiated, the feature only allows the
    backend to limit the number of max ram slots to a number less than
    VHOST_MEMORY_MAX_NREGIONS = 8.

4. Transmit vhost-user memory regions individually
    With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    protocol feature is enabled, QEMU will send regions to the backend using
    individual VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
    messages.
    The max number of ram slots supported is still limited to 8.

5. Lift max memory slots limit imposed by vhost-user
    With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    protocol feature is enabled, the backend can support a configurable number of
    ram slots up to the maximum allowed by the target platform.

6. Refactor out libvhost-user fault generation logic
    This cleanup moves some logic from vu_set_mem_table_exec_postcopy() to a
    separate helper, which will be needed elsewhere.

7. Support ram slot configuration in libvhost-user
    This change adds support for processing VHOST_USER_GET_MAX_MEMSLOTS
    messages in libvhost-user.
    The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature is not yet
    enabled in libvhost-user, so at this point this change is non-functional.

8. Support adding individual regions in libvhost-user
    This change adds libvhost-user support for mapping in new memory regions
    when receiving VHOST_USER_ADD_MEM_REG messages.
    The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature is not yet
    enabled in libvhost-user, so at this point this change is non-functional.

9. Support individual region unmap in libvhost-user
    This change adds libvhost-user support for unmapping removed memory regions
    when receiving VHOST_USER_REM_MEM_REG messages.
    The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature is not yet
    enabled in libvhost-user, so at this point this change is non-functional.

10. Lift max ram slots limit in libvhost-user
    This change makes libvhost-user try to negotiate the
    VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature, and allows
    backends built using libvhost-user to support hot adding memory up to
    32 times.
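Patches 8 to 10 give libvhost-user per-region add and remove handling within a fixed slot budget. Below is a minimal sketch of such a region table. It is independent of libvhost-user's real data structures: DevRegion, RegionTable, table_add, table_remove and MAX_RAM_SLOTS are hypothetical names, and only the limit of 32 comes from patch 10. The mmap()/munmap() of each region's fd is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_RAM_SLOTS 32  /* hypothetical name; the value matches patch 10 */

typedef struct {
    uint64_t gpa, size, hva, mmap_offset;
} DevRegion;

typedef struct {
    DevRegion regions[MAX_RAM_SLOTS];
    size_t nregions;
} RegionTable;

/* On an ADD_MEM_REG-style request: append the region, or fail if every
 * slot is taken. A real backend would also mmap() the received fd here. */
static bool table_add(RegionTable *t, const DevRegion *r)
{
    if (t->nregions >= MAX_RAM_SLOTS) {
        return false;
    }
    t->regions[t->nregions++] = *r;
    return true;
}

/* On a REM_MEM_REG-style request: find the matching region, drop it and
 * shift the tail down so the array stays dense. A real backend would
 * munmap() the region first. Returns false if no region matched. */
static bool table_remove(RegionTable *t, const DevRegion *r)
{
    for (size_t i = 0; i < t->nregions; i++) {
        DevRegion *cur = &t->regions[i];
        if (cur->gpa == r->gpa && cur->size == r->size && cur->hva == r->hva) {
            memmove(cur, cur + 1, (t->nregions - i - 1) * sizeof(*cur));
            t->nregions--;
            return true;
        }
    }
    return false;
}
```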

The changes were tested with the vhost-user-bridge sample.

Changes since V2:
    * Add support for VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
      for backends built with libvhost-user.
    * Add support for postcopy live migration when the
      VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature has
      been negotiated.
    * Add support for backends which want to support both
      VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS and
      VHOST_USER_PROTOCOL_F_REPLY_ACK.
    * Change the feature name from VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS
      to VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, along with any
      associated variable names.
    * Log a more descriptive message if the backend lowers the max ram
      slots limit on reconnect.

Changes since V1:
    * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
      to prevent corruption
    * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
      startup and cache the returned value so that QEMU does not need to
      query the backend every time vhost_backend_memslots_limit is called.
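The v1 caching change and the v2 reconnect message might look roughly like the sketch below. query_max_memslots_fn is a hypothetical stand-in for the VHOST_USER_GET_MAX_MEMSLOTS round trip. Rejecting a reconnect when the new limit is below the number of regions already in use is an assumption of this sketch. The cover letter only says that a more descriptive message is logged.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the VHOST_USER_GET_MAX_MEMSLOTS round trip;
 * returns the backend's slot limit, or <= 0 on error. */
typedef int (*query_max_memslots_fn)(void *backend);

typedef struct {
    int cached_max;      /* 0 until the first successful query */
    int regions_in_use;  /* regions currently mapped for this device */
} SlotCache;

/* Called once per (re)connect: query the backend a single time and cache
 * the answer so later limit checks need no socket round trip. */
static bool slot_cache_connect(SlotCache *c, query_max_memslots_fn query,
                               void *backend)
{
    int max = query(backend);
    if (max <= 0) {
        return false;
    }
    if (c->cached_max > 0 && max < c->cached_max) {
        fprintf(stderr,
                "vhost-user backend lowered max ram slots from %d to %d "
                "on reconnect (%d in use)\n",
                c->cached_max, max, c->regions_in_use);
        if (c->regions_in_use > max) {
            return false;  /* assumption of this sketch: refuse */
        }
    }
    c->cached_max = max;
    return true;
}

/* Hot path: answer from the cache whenever the limit is consulted. */
static int slot_cache_limit(const SlotCache *c)
{
    return c->cached_max;
}
```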

Best,
Raphael

Raphael Norwitz (10):
  Add helper to populate vhost-user message regions
  Add vhost-user helper to get MemoryRegion data
  Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
  Transmit vhost-user memory regions individually
  Lift max memory slots limit imposed by vhost-user
  Refactor out libvhost-user fault generation logic
  Support ram slot configuration in libvhost-user
  Support adding individual regions in libvhost-user
  Support individual region unmap in libvhost-user
  Lift max ram slots limit in libvhost-user

 contrib/libvhost-user/libvhost-user.c | 341 ++++++++++++++----
 contrib/libvhost-user/libvhost-user.h |  24 +-
 docs/interop/vhost-user.rst           |  44 +++
 hw/virtio/vhost-user.c                | 634 ++++++++++++++++++++++++++++------
 include/hw/virtio/vhost-user.h        |   1 +
 5 files changed, 869 insertions(+), 175 deletions(-)

Comments

no-reply@patchew.org May 19, 2020, 4:46 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/1588473683-27067-1-git-send-email-raphael.norwitz@nutanix.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      backends/rng-builtin.o
  CC      backends/tpm.o
  CC      backends/rng-random.o
/tmp/qemu-test/src/contrib/libvhost-user/libvhost-user.c:671:42: error: taking address of packed member 'payload' of class or structure 'VhostUserMsg' may result in an unaligned pointer value [-Werror,-Waddress-of-packed-member]
    VhostUserMemoryRegion *msg_region = &vmsg->payload.memreg.region;
                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/qemu-test/src/contrib/libvhost-user/libvhost-user.c:784:42: error: taking address of packed member 'payload' of class or structure 'VhostUserMsg' may result in an unaligned pointer value [-Werror,-Waddress-of-packed-member]
    VhostUserMemoryRegion *msg_region = &vmsg->payload.memreg.region;
                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: contrib/libvhost-user/libvhost-user.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=191c6e078de943cda1edffe21640ae27', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-rh8txb3w/src/docker-src.2020-05-19-12.42.53.27187:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=191c6e078de943cda1edffe21640ae27
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-rh8txb3w/src'
make: *** [docker-run-test-debug@fedora] Error 2

real    3m21.201s
user    0m8.007s


The full log is available at
http://patchew.org/logs/1588473683-27067-1-git-send-email-raphael.norwitz@nutanix.com/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
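Both errors above come from -Waddress-of-packed-member. VhostUserMsg is a packed struct, so a pointer into its payload may be misaligned. One common workaround is to copy the packed member into an aligned local variable and work on the copy. The sketch below uses hypothetical stand-in types rather than libvhost-user's real definitions; the 12-byte header is an assumption modeled on the message header.

```c
#include <stdint.h>

typedef struct {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
    uint64_t userspace_addr;
    uint64_t mmap_offset;
} RegionDesc;

typedef struct {
    uint64_t padding;
    RegionDesc region;
} MemRegMsg;

/* Packed message with a 12-byte header: the payload's 64-bit fields land
 * at offsets that are not multiples of 8. */
typedef struct __attribute__((packed)) {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
    union {
        uint64_t u64;
        MemRegMsg memreg;
    } payload;
} PackedMsg;

/* Instead of: RegionDesc *r = &msg->payload.memreg.region;  (warns)
 * copy the member into an aligned local; the compiler emits loads that
 * are safe for the unaligned source. */
static uint64_t region_end(const PackedMsg *msg)
{
    RegionDesc r = msg->payload.memreg.region;
    return r.guest_phys_addr + r.memory_size;
}
```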