
[v2,0/3] vhost-user: Lift Max Ram Slots Limitation

Message ID 1579143426-18305-1-git-send-email-raphael.norwitz@nutanix.com (mailing list archive)

Message

Raphael Norwitz Jan. 16, 2020, 2:57 a.m. UTC
In QEMU today, a VM with a vhost-user device can hot-add memory a
maximum of 8 times. See these threads, among others:

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01046.html
    https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01236.html

[2] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg04656.html

This series introduces a new protocol feature,
VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS, which, when enabled, lifts the
restriction on the maximum number of RAM slots imposed by vhost-user.

The series consists of 3 changes:
1. Fixed assert in vhost_user_set_mem_table_postcopy:
   This is a bug fix in the postcopy migration path.
2. Refactor vhost_user_set_mem_table functions:
   This is a non-functional change refactoring the
   vhost_user_set_mem_table and vhost_user_set_mem_table_postcopy
   functions so that the feature can be added more cleanly.
3. Lift max memory slots limit imposed by vhost-user:
   This change introduces the new protocol feature
   VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS.

The implementation details are explained in more detail in the commit
messages, but at a high level the new protocol feature works as follows:
- If the VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS feature is enabled, QEMU will
  send multiple VHOST_USER_ADD_MEM_REG and VHOST_USER_REM_MEM_REG
  messages to map and unmap individual memory regions instead of one large
  VHOST_USER_SET_MEM_TABLE message containing all memory regions.
- The vhost-user struct maintains a 'shadow state' of memory regions
  already sent to the backend. Each time vhost_user_set_mem_table is called,
  the shadow state is compared with the new device state. A
  VHOST_USER_REM_MEM_REG will be sent for each region in the shadow state
  not in the device state. Then, a VHOST_USER_ADD_MEM_REG will be sent
  for each region in the device state but not the shadow state. After
  these messages have been sent, the shadow state will be updated to
  reflect the new device state.

The VHOST_USER_SET_MEM_TABLE message was not reused because, as the number
of regions grows, the message becomes very large. In practice, such large
messages caused problems (truncated messages), and in the past the
community has opted for smaller fixed-size messages where possible. VRINGs,
for example, are sent to the backend individually instead of in one massive
message.

Current Limitations:
- postcopy migration is not supported when
  VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS has been negotiated.
- VHOST_USER_PROTOCOL_F_CONFIGURE_SLOTS cannot be negotiated when
  VHOST_USER_PROTOCOL_F_REPLY_ACK has also been negotiated.

Both of these limitations are due to resource constraints. They are not
imposed for technical reasons.

Changes since V1:
    * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
      to prevent corruption
    * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
      startup and cache the returned value so that QEMU does not need to
      query the backend every time vhost_backend_memslots_limit is called.

Best,
Raphael

Raphael Norwitz (3):
  Fixed assert in vhost_user_set_mem_table_postcopy
  Refactor vhost_user_set_mem_table functions
  Lift max memory slots limit imposed by vhost-user

 docs/interop/vhost-user.rst |  43 +++++
 hw/virtio/vhost-user.c      | 385 +++++++++++++++++++++++++++++++++-----------
 2 files changed, 336 insertions(+), 92 deletions(-)

Comments

Raphael Norwitz Jan. 31, 2020, 9:21 p.m. UTC | #1
Ping

On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> [...]
Michael S. Tsirkin Feb. 6, 2020, 8:33 a.m. UTC | #2
On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> [...]
> Changes since V1:
>     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
>       to prevent corruption
>     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
>       startup and cache the returned value so that QEMU does not need to
>       query the backend every time vhost_backend_memslots_limit is called.

I'm a bit confused about what happens on reconnect.
Can you clarify pls?


> Best,
> Raphael
> 
> Raphael Norwitz (3):
>   Fixed assert in vhost_user_set_mem_table_postcopy
>   Refactor vhost_user_set_mem_table functions
>   Lift max memory slots limit imposed by vhost-user
> 
>  docs/interop/vhost-user.rst |  43 +++++
>  hw/virtio/vhost-user.c      | 385 +++++++++++++++++++++++++++++++++-----------
>  2 files changed, 336 insertions(+), 92 deletions(-)
> 
> -- 
> 1.8.3.1
Raphael Norwitz Feb. 9, 2020, 5:14 p.m. UTC | #3
On Thu, Feb 06, 2020 at 03:33:13AM -0500, Michael S. Tsirkin wrote:
> 
> On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> > 
> > Changes since V1:
> >     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
> >       to prevent corruption
> >     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
> >       startup and cache the returned value so that QEMU does not need to
> >       query the backend every time vhost_backend_memslots_limit is called.
> 
> I'm a bit confused about what happens on reconnect.
> Can you clarify pls?
> 
From what I can see, backends which support reconnect call vhost_dev_init,
which then calls vhost_user_backend_init(), as vhost-user-blk does here:
https://github.com/qemu/qemu/blob/master/hw/block/vhost-user-blk.c#L315. The
ram slots limit is fetched in vhost_user_backend_init() so every time the
device reconnects the limit should be refetched.
Michael S. Tsirkin Feb. 10, 2020, 4:04 p.m. UTC | #4
On Sun, Feb 09, 2020 at 12:14:42PM -0500, Raphael Norwitz wrote:
> On Thu, Feb 06, 2020 at 03:33:13AM -0500, Michael S. Tsirkin wrote:
> > 
> > On Wed, Jan 15, 2020 at 09:57:03PM -0500, Raphael Norwitz wrote:
> > > 
> > > Changes since V1:
> > >     * Kept the assert in vhost_user_set_mem_table_postcopy, but moved it
> > >       to prevent corruption
> > >     * Made QEMU send a single VHOST_USER_GET_MAX_MEMSLOTS message at
> > >       startup and cache the returned value so that QEMU does not need to
> > >       query the backend every time vhost_backend_memslots_limit is called.
> > 
> > I'm a bit confused about what happens on reconnect.
> > Can you clarify pls?
> > 
> From what I can see, backends which support reconnect call vhost_dev_init,
> which then calls vhost_user_backend_init(), as vhost-user-blk does here:
> https://github.com/qemu/qemu/blob/master/hw/block/vhost-user-blk.c#L315. The
> ram slots limit is fetched in vhost_user_backend_init() so every time the
> device reconnects the limit should be refetched. 

Right. Point is, we might have validated using an old limit.
Reconnect needs to verify limit did not change or at least
did not decrease.
Raphael Norwitz Feb. 19, 2020, 5:33 a.m. UTC | #5
On Mon, Feb 10, 2020 at 11:04:28AM -0500, Michael S. Tsirkin wrote:
> 
> On Sun, Feb 09, 2020 at 12:14:42PM -0500, Raphael Norwitz wrote:
> > On Thu, Feb 06, 2020 at 03:33:13AM -0500, Michael S. Tsirkin wrote:
> > > [...]
> > > I'm a bit confused about what happens on reconnect.
> > > Can you clarify pls?
> > > 
> > From what I can see, backends which support reconnect call vhost_dev_init,
> > which then calls vhost_user_backend_init(), as vhost-user-blk does here:
> > https://github.com/qemu/qemu/blob/master/hw/block/vhost-user-blk.c#L315. The
> > ram slots limit is fetched in vhost_user_backend_init() so every time the
> > device reconnects the limit should be refetched. 
> 
> Right. Point is, we might have validated using an old limit.
> Reconnect needs to verify limit did not change or at least
> did not decrease.
> 
> -- 
> MST
Good point - I did not consider this case. Could we keep the slots limit in
the VhostUserState instead?

Say vhost_user_init() initializes the limit inside the VhostUserState to 0. Then,
vhost_user_backend_init() checks whether this limit is 0. If so, this is the initial
connection and qemu fetches the limit from the backend, ensures the returned
value is nonzero, and sets the limit in the VhostUserState. If not, qemu knows this
is a reconnect and queries the backend's slots limit. If the returned value does
not equal the limit in the VhostUserState, vhost_user_backend_init() returns an
error.

Thoughts?
Michael S. Tsirkin Feb. 19, 2020, 10:08 a.m. UTC | #6
On Wed, Feb 19, 2020 at 12:33:24AM -0500, Raphael Norwitz wrote:
> On Mon, Feb 10, 2020 at 11:04:28AM -0500, Michael S. Tsirkin wrote:
> > 
> > [...]
> > 
> > Right. Point is, we might have validated using an old limit.
> > Reconnect needs to verify limit did not change or at least
> > did not decrease.
> > 
> > -- 
> > MST
> Good point - I did not consider this case. Could we keep the slots limit in
> the VhostUserState instead?
> 
> Say vhost_user_init() initializes the limit inside the VhostUserState to 0. Then,
> vhost_user_backend_init() checks if this limit is 0. If so, this is the initial
> connection and qemu fetches the limit from the backend, ensures the returned
> value is nonzero, and sets the limit in the VhostUserState. If not, qemu knows this
> is a reconnect and queries the backend's slots limit. If the returned value does
> not equal the limit in the VhostUserState, vhost_user_backend_init() returns an
> error.
> 
> Thoughts?

Right.
Or, if we really want to, check that the backend value is >= the saved one.
Basically the same thing we do with features.