mbox series

[v4,00/10] vhost-user: Lift Max Ram Slots Limitation

Message ID 1588533678-23450-1-git-send-email-raphael.norwitz@nutanix.com (mailing list archive)
Headers show
Series vhost-user: Lift Max Ram Slots Limitation | expand

Message

Raphael Norwitz May 21, 2020, 5 a.m. UTC
In QEMU today, a VM with a vhost-user device can hot add memory a
maximum of 8 times. See these threads, among others:

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html
    https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html

[2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html

This series introduces a new protocol feature
VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the
restriction on the maximum number RAM slots imposed by vhost-user.

Without vhost-user, a Qemu VM can support 256 ram slots (for ACPI targets),
or potentially more (the KVM max is 512). With each region, a file descriptor
must be sent over the socket. If that many regions are sent in a single message
there could be upwards of 256 file descriptors being opened in the backend process
at once. Opening that many fds could easily push the process past the open fd limit,
especially considering one backend process could have multiple vhost threads,
exposing different devices to different Qemu instances. Therefore to safely lift the
limit, transmitting regions should be split up over multiple messages.

In addition, the VHOST_USER_SET_MEM_TABLE message was not reused because
as the number of regions grows, the message becomes very large. In practice, such
large messages caused problems (truncated messages) and in the past it seems
the community has opted for smaller fixed size messages where possible. VRINGs,
for example, are sent to the backend individually instead of in one massive
message.

The implementation details are explained in more detail in the commit
messages, but at a high level the new protocol feature works as follows:
- If the VHOST_USER_PROTCOL_F_CONFIGURE_MEM_SLOTS feature is enabled,
  QEMU will send multiple VHOST_USER_ADD_MEM_REG and
  VHOST_USER_REM_MEM_REG messages to map and unmap individual memory
 regions instead of one large VHOST_USER_SET_MEM_TABLE message containing
  all memory regions.
- The vhost-user struct maintains a ’shadow state’ of memory regions
  already sent to the guest. Each time vhost_user_set_mem_table is called,
  the shadow state is compared with the new device state. A
  VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
  not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
  for each region in the device state but not the shadow state. After
  these messages have been sent, the shadow state will be updated to
  reflect the new device state.

The series consists of 10 changes:
1. Add helper to populate vhost-user message regions:
    This change adds a helper to populate a VhostUserMemoryRegion from a
    struct vhost_memory_region, which needs to be done in multiple places in
    in this series.

2. Add vhost-user helper to get MemoryRegion data
    This changes adds a helper to get a pointer to a MemoryRegion struct, along
    with it's offset address and associated file descriptor. This helper is used to
    simplify other vhost-user code paths and will be needed elsewhere in this
    series.

3. Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    This change adds the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    protocol feature. At this point, if negotiated, the feature only allows the
    backend to limit the number of max ram slots to a number less than
    VHOST_MEMORY_MAX_NREGIONS = 8.

4. Transmit vhost-user memory regions individually
    With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    protocol feature is enabled, Qemu will send regions to the backend using
    individual VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
    messages.
    The max number of ram slots supported is still limited to 8.

5. Lift max memory slots imposed by vhost-user
    With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
    protocol feature is enabled, the backend can support a configurable number of
    ram slots up to the maximum allowed by the target platform.

6. Refactor out libvhost-user fault generation logic
    This cleanup moves some logic from vu_set_mem_table_exec_postcopy() to a
    separate helper, which will be needed elsewhere.

7. Support ram slot configuration in libvhost-user
   This change adds support for processing VHOST_USER_GET_MAX_MEMSLOTS
    messages in libvhost-user.
    The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
    enabled in libvhost-user, so at this point this change is non-functional.

8. Support adding individual regions in libvhost-user
    This change adds libvhost-user support for mapping in new memory regions
    when receiving VHOST_USER_ADD_MEM_REG messages.
    The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
    enabled in libvhost-user, so at this point this change is non-functional.

9. Support individual region unmap in libvhost-user
    This change adds libvhost-user support for unmapping removed memory regions
    when receiving VHOST_USER_REM_MEM_REG messages.
    The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
    enabled in libvhost-user, so at this point this change is non-functional.

10. Lift max ram slots limit in libvhost-user
   This change makes libvhost-user try to negotiate the
   VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, and adds support for
   backends built using libvhost-user to support hot adding memory up to the
   32 times.

The changes were tested with the vhost-user-bridge sample.

Changes since V3:
    * Fixed compiler warnings caused by using pointers to packed elements
       (flagged by patchew building with -Waddress-of-packed-member)

Changes since V2:
    * Add support for VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
       for backends build with libvhost-user
    * Add support for postcopy live-migration when the
       VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature has
       been negotiated.
    * Add support for backends which want to support both
       VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS and
       VHOST_USER_PROTOCOL_F_REPLY_ACK
    * Change feature name from VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS
        to VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, and any associated
        variable names.
    *Log a more descriptive message if the backend lowers the max ram slots limit
       on reconnect.

Changes since V1:
    * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
      to prevent corruption
    * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
      startup and cache the returned value so that QEMU does not need to
      query the backend every time vhost_backend_memslots_limit is called.

Best,
Raphael

Raphael Norwitz (10):
  Add helper to populate vhost-user message regions
  Add vhost-user helper to get MemoryRegion data
  Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
  Transmit vhost-user memory regions individually
  Lift max memory slots limit imposed by vhost-user
  Refactor out libvhost-user fault generation logic
  Support ram slot configuration in libvhost-user
  Support adding individual regions in libvhost-user
  Support individual region unmap in libvhost-user
  Lift max ram slots limit in libvhost-user

 contrib/libvhost-user/libvhost-user.c | 341 ++++++++++++++----
 contrib/libvhost-user/libvhost-user.h |  24 +-
 docs/interop/vhost-user.rst           |  44 +++
 hw/virtio/vhost-user.c                | 638 ++++++++++++++++++++++++++++------
 include/hw/virtio/vhost-user.h        |   1 +
 5 files changed, 873 insertions(+), 175 deletions(-)

Comments

Raphael Norwitz June 4, 2020, 4:11 a.m. UTC | #1
ping

On Thu, May 21, 2020 at 1:00 AM Raphael Norwitz
<raphael.norwitz@nutanix.com> wrote:
>
> In QEMU today, a VM with a vhost-user device can hot add memory a
> maximum of 8 times. See these threads, among others:
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html
>     https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html
>
> [2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html
>
> This series introduces a new protocol feature
> VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS which, when enabled, lifts the
> restriction on the maximum number RAM slots imposed by vhost-user.
>
> Without vhost-user, a Qemu VM can support 256 ram slots (for ACPI targets),
> or potentially more (the KVM max is 512). With each region, a file descriptor
> must be sent over the socket. If that many regions are sent in a single message
> there could be upwards of 256 file descriptors being opened in the backend process
> at once. Opening that many fds could easily push the process past the open fd limit,
> especially considering one backend process could have multiple vhost threads,
> exposing different devices to different Qemu instances. Therefore to safely lift the
> limit, transmitting regions should be split up over multiple messages.
>
> In addition, the VHOST_USER_SET_MEM_TABLE message was not reused because
> as the number of regions grows, the message becomes very large. In practice, such
> large messages caused problems (truncated messages) and in the past it seems
> the community has opted for smaller fixed size messages where possible. VRINGs,
> for example, are sent to the backend individually instead of in one massive
> message.
>
> The implementation details are explained in more detail in the commit
> messages, but at a high level the new protocol feature works as follows:
> - If the VHOST_USER_PROTCOL_F_CONFIGURE_MEM_SLOTS feature is enabled,
>   QEMU will send multiple VHOST_USER_ADD_MEM_REG and
>   VHOST_USER_REM_MEM_REG messages to map and unmap individual memory
>  regions instead of one large VHOST_USER_SET_MEM_TABLE message containing
>   all memory regions.
> - The vhost-user struct maintains a ’shadow state’ of memory regions
>   already sent to the guest. Each time vhost_user_set_mem_table is called,
>   the shadow state is compared with the new device state. A
>   VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
>   not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
>   for each region in the device state but not the shadow state. After
>   these messages have been sent, the shadow state will be updated to
>   reflect the new device state.
>
> The series consists of 10 changes:
> 1. Add helper to populate vhost-user message regions:
>     This change adds a helper to populate a VhostUserMemoryRegion from a
>     struct vhost_memory_region, which needs to be done in multiple places in
>     in this series.
>
> 2. Add vhost-user helper to get MemoryRegion data
>     This changes adds a helper to get a pointer to a MemoryRegion struct, along
>     with it's offset address and associated file descriptor. This helper is used to
>     simplify other vhost-user code paths and will be needed elsewhere in this
>     series.
>
> 3. Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>     This change adds the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>     protocol feature. At this point, if negotiated, the feature only allows the
>     backend to limit the number of max ram slots to a number less than
>     VHOST_MEMORY_MAX_NREGIONS = 8.
>
> 4. Transmit vhost-user memory regions individually
>     With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>     protocol feature is enabled, Qemu will send regions to the backend using
>     individual VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
>     messages.
>     The max number of ram slots supported is still limited to 8.
>
> 5. Lift max memory slots imposed by vhost-user
>     With this change, if the VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>     protocol feature is enabled, the backend can support a configurable number of
>     ram slots up to the maximum allowed by the target platform.
>
> 6. Refactor out libvhost-user fault generation logic
>     This cleanup moves some logic from vu_set_mem_table_exec_postcopy() to a
>     separate helper, which will be needed elsewhere.
>
> 7. Support ram slot configuration in libvhost-user
>    This change adds support for processing VHOST_USER_GET_MAX_MEMSLOTS
>     messages in libvhost-user.
>     The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
>     enabled in libvhost-user, so at this point this change is non-functional.
>
> 8. Support adding individual regions in libvhost-user
>     This change adds libvhost-user support for mapping in new memory regions
>     when receiving VHOST_USER_ADD_MEM_REG messages.
>     The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
>     enabled in libvhost-user, so at this point this change is non-functional.
>
> 9. Support individual region unmap in libvhost-user
>     This change adds libvhost-user support for unmapping removed memory regions
>     when receiving VHOST_USER_REM_MEM_REG messages.
>     The VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol is not yet
>     enabled in libvhost-user, so at this point this change is non-functional.
>
> 10. Lift max ram slots limit in libvhost-user
>    This change makes libvhost-user try to negotiate the
>    VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, and adds support for
>    backends built using libvhost-user to support hot adding memory up to the
>    32 times.
>
> The changes were tested with the vhost-user-bridge sample.
>
> Changes since V3:
>     * Fixed compiler warnings caused by using pointers to packed elements
>        (flagged by patchew building with -Waddress-of-packed-member)
>
> Changes since V2:
>     * Add support for VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>        for backends build with libvhost-user
>     * Add support for postcopy live-migration when the
>        VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS protocol feature has
>        been negotiated.
>     * Add support for backends which want to support both
>        VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS and
>        VHOST_USER_PROTOCOL_F_REPLY_ACK
>     * Change feature name from VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS
>         to VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS, and any associated
>         variable names.
>     *Log a more descriptive message if the backend lowers the max ram slots limit
>        on reconnect.
>
> Changes since V1:
>     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
>       to prevent corruption
>     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
>       startup and cache the returned value so that QEMU does not need to
>       query the backend every time vhost_backend_memslots_limit is called.
>
> Best,
> Raphael
>
> Raphael Norwitz (10):
>   Add helper to populate vhost-user message regions
>   Add vhost-user helper to get MemoryRegion data
>   Add VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS
>   Transmit vhost-user memory regions individually
>   Lift max memory slots limit imposed by vhost-user
>   Refactor out libvhost-user fault generation logic
>   Support ram slot configuration in libvhost-user
>   Support adding individual regions in libvhost-user
>   Support individual region unmap in libvhost-user
>   Lift max ram slots limit in libvhost-user
>
>  contrib/libvhost-user/libvhost-user.c | 341 ++++++++++++++----
>  contrib/libvhost-user/libvhost-user.h |  24 +-
>  docs/interop/vhost-user.rst           |  44 +++
>  hw/virtio/vhost-user.c                | 638 ++++++++++++++++++++++++++++------
>  include/hw/virtio/vhost-user.h        |   1 +
>  5 files changed, 873 insertions(+), 175 deletions(-)
>
> --
> 1.8.3.1
>