[v2,2/2] vhost-user: add a request-reply lock

Message ID 20240828100914.105728-3-ppandit@redhat.com (mailing list archive)
State New
Series Postcopy migration and vhost-user errors

Commit Message

Prasad Pandit Aug. 28, 2024, 10:09 a.m. UTC
From: Prasad Pandit <pjp@fedoraproject.org>

QEMU threads use vhost_user_write/read calls to send
and receive request/reply messages from a vhost-user
device. When multiple threads communicate with the
same vhost-user device, they can receive each other's
messages, resulting in an erroneous state.

When fault_thread exits upon completion of Postcopy
migration, it sends a 'postcopy_end' message to the
vhost-user device. But sometimes the 'postcopy_end'
message is sent while the vhost device is being set up
via vhost_dev_start().

     Thread-1                           Thread-2

 vhost_dev_start                    postcopy_ram_incoming_cleanup
 vhost_device_iotlb_miss            postcopy_notify
 vhost_backend_update_device_iotlb  vhost_user_postcopy_notifier
 vhost_user_send_device_iotlb_msg   vhost_user_postcopy_end
 process_message_reply              process_message_reply
 vhost_user_read                    vhost_user_read
 vhost_user_read_header             vhost_user_read_header
 "Fail to update device iotlb"      "Failed to receive reply to postcopy_end"

This creates confusion when the vhost-user device receives
the 'postcopy_end' message while it is trying to update IOTLB entries.

 vhost_user_read_header:
  700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
 vhost_device_iotlb_miss:
  700871,700871: Fail to update device iotlb
 vhost_user_postcopy_end:
  700871,700900: Failed to receive reply to postcopy_end
 vhost_user_read_header:
  700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.

Here the fault thread seems to end the postcopy migration
while another thread is starting the vhost-user device.

Add a mutex lock to hold for one request-reply cycle
and avoid such a race condition.

Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
---
 hw/virtio/vhost-user.c         | 74 ++++++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-user.h |  3 ++
 2 files changed, 77 insertions(+)

v2:
 - Place QEMU_LOCK_GUARD near the vhost_user_write() calls; holding
   the lock for longer makes some tests fail during rpmbuild(8).
 - rpmbuild(8) fails for some SRPMs, not all. RHEL-9 SRPM builds with
   this patch, whereas Fedora SRPM does not build.
 - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
   on RHEL-9, but not on Fedora-40 machine.
 - koji builds are successful with this patch
   https://koji.fedoraproject.org/koji/taskinfo?taskID=122254011
   https://koji.fedoraproject.org/koji/taskinfo?taskID=122252369

v1: Use QEMU_LOCK_GUARD(), rename lock variable
 - https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppandit@redhat.com/

v0:
 - https://lore.kernel.org/all/Zo_9OlX0pV0paFj7@x1n/
 - https://lore.kernel.org/all/20240720153808-mutt-send-email-mst@kernel.org/

--
2.46.0
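
The race described in the commit message can be sketched outside QEMU. What follows is a minimal illustration using plain pthreads and a socketpair, not the vhost-user code itself: Channel, Worker, request_reply() and run_two_workers() are hypothetical names, and the "device" is a toy echo loop. Two threads share one channel, and each holds the lock across its own write and the matching read, so neither can consume the other's reply.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for VhostUserState: one channel shared by threads. */
typedef struct {
    int fd;                          /* front-end side of the socketpair */
    pthread_mutex_t req_reply_lock;  /* held for one request-reply cycle */
} Channel;

typedef struct {
    Channel *ch;
    uint32_t id;       /* tag sent as the request, expected back as the reply */
    int rounds;
    int mismatches;    /* replies that did not carry this worker's tag */
} Worker;

/* Toy back end: echo each 4-byte request until the peer closes. */
static void *device_thread(void *arg)
{
    int fd = *(int *)arg;
    uint32_t req;

    while (read(fd, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
        if (write(fd, &req, sizeof(req)) != (ssize_t)sizeof(req)) {
            break;
        }
    }
    return NULL;
}

/* One request-reply cycle; the lock covers both the write and the read. */
static int request_reply(Channel *ch, uint32_t req, uint32_t *reply)
{
    int ok;

    pthread_mutex_lock(&ch->req_reply_lock);
    ok = write(ch->fd, &req, sizeof(req)) == (ssize_t)sizeof(req) &&
         read(ch->fd, reply, sizeof(*reply)) == (ssize_t)sizeof(*reply);
    pthread_mutex_unlock(&ch->req_reply_lock);
    return ok ? 0 : -1;
}

static void *worker_thread(void *arg)
{
    Worker *w = arg;

    for (int i = 0; i < w->rounds; i++) {
        uint32_t reply = 0;
        if (request_reply(w->ch, w->id, &reply) < 0 || reply != w->id) {
            w->mismatches++;
        }
    }
    return NULL;
}

/* Run two workers against one device; return the count of wrong replies. */
int run_two_workers(int rounds)
{
    int sv[2];
    pthread_t dev, t1, t2;
    Channel ch;
    Worker w1, w2;

    assert(rounds > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    ch.fd = sv[0];
    pthread_mutex_init(&ch.req_reply_lock, NULL);
    w1 = (Worker){ &ch, 1, rounds, 0 };
    w2 = (Worker){ &ch, 2, rounds, 0 };

    pthread_create(&dev, NULL, device_thread, &sv[1]);
    pthread_create(&t1, NULL, worker_thread, &w1);
    pthread_create(&t2, NULL, worker_thread, &w2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    close(sv[0]);                /* EOF ends the device loop */
    pthread_join(dev, NULL);
    close(sv[1]);
    pthread_mutex_destroy(&ch.req_reply_lock);
    return w1.mismatches + w2.mismatches;
}
```

Calling run_two_workers(1000) should return 0 in this sketch. If the lock calls in request_reply() were dropped, one worker could read the reply meant for the other, which is the kind of cross-talk the log above suggests.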

Comments

Michael S. Tsirkin Aug. 28, 2024, 11:15 a.m. UTC | #1
On Wed, Aug 28, 2024 at 03:39:14PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> QEMU threads use vhost_user_write/read calls to send
> and receive request/reply messages from a vhost-user
> device. When multiple threads communicate with the
> same vhost-user device, they can receive each other's
> messages, resulting in an erroneous state.
> 
> When fault_thread exits upon completion of Postcopy
> migration, it sends a 'postcopy_end' message to the
> vhost-user device. But sometimes 'postcopy_end' message
> is sent while vhost device is being setup via
> vhost_dev_start().
> 
>      Thread-1                           Thread-2
> 
>  vhost_dev_start                    postcopy_ram_incoming_cleanup
>  vhost_device_iotlb_miss            postcopy_notify
>  vhost_backend_update_device_iotlb  vhost_user_postcopy_notifier
>  vhost_user_send_device_iotlb_msg   vhost_user_postcopy_end
>  process_message_reply              process_message_reply
>  vhost_user_read                    vhost_user_read
>  vhost_user_read_header             vhost_user_read_header
>  "Fail to update device iotlb"      "Failed to receive reply to postcopy_end"
> 
> This creates confusion when vhost-user device receives
> 'postcopy_end' message while it is trying to update IOTLB entries.
> 
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
>  vhost_device_iotlb_miss:
>   700871,700871: Fail to update device iotlb
>  vhost_user_postcopy_end:
>   700871,700900: Failed to receive reply to postcopy_end
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> 
> Here fault thread seems to end the postcopy migration
> while another thread is starting the vhost-user device.
> 
> Add a mutex lock to hold for one request-reply cycle
> and avoid such race condition.
> 
> Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
> ---
>  hw/virtio/vhost-user.c         | 74 ++++++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-user.h |  3 ++
>  2 files changed, 77 insertions(+)
> 
> v2:
>  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
>    the lock for longer fails some tests during rpmbuild(8).

what do you mean fails rpmbuild? that qemu with this
patch can not be compiled?

>  - rpmbuild(8) fails for some SRPMs, not all. RHEL-9 SRPM builds with
>    this patch, whereas Fedora SRPM does not build.
>  - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
>    on RHEL-9, but not on Fedora-40 machine.
>  - koji builds successful with this patch
>    https://koji.fedoraproject.org/koji/taskinfo?taskID=122254011
>    https://koji.fedoraproject.org/koji/taskinfo?taskID=122252369
> 
> v1: Use QEMU_LOCK_GUARD(), rename lock variable
>  - https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppandit@redhat.com/
> 
> v0:
>  - https://lore.kernel.org/all/Zo_9OlX0pV0paFj7@x1n/
>  - https://lore.kernel.org/all/20240720153808-mutt-send-email-mst@kernel.org/
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 00561daa06..7b030ae2cd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -24,6 +24,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/uuid.h"
>  #include "qemu/sockets.h"
> +#include "qemu/lockable.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/cryptodev.h"
>  #include "migration/postcopy-ram.h"
> @@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>          .hdr.size = sizeof(msg.payload.log),
>      };
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      /* Send only once with first queue pair */
>      if (dev->vq_index != 0) {
>          return 0;
> @@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
>                                 bool reply_supported)
>  {
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
>      struct vhost_memory_region *shadow_reg;
>      int i, fd, shadow_reg_idx, ret;
>      ram_addr_t offset;
> @@ -685,6 +691,8 @@ static int send_remove_regions(struct vhost_dev *dev,
>              vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
>              msg->payload.mem_reg.region = region_buffer;
> 
> +            QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>              ret = vhost_user_write(dev, msg, NULL, 0);
>              if (ret < 0) {
>                  return ret;
> @@ -718,6 +726,7 @@ static int send_add_regions(struct vhost_dev *dev,
>                              bool reply_supported, bool track_ramblocks)
>  {
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
>      int i, fd, ret, reg_idx, reg_fd_idx;
>      struct vhost_memory_region *reg;
>      MemoryRegion *mr;
> @@ -746,6 +755,8 @@ static int send_add_regions(struct vhost_dev *dev,
>              vhost_user_fill_msg_region(&region_buffer, reg, offset);
>              msg->payload.mem_reg.region = region_buffer;
> 
> +            QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>              ret = vhost_user_write(dev, msg, &fd, 1);
>              if (ret < 0) {
>                  return ret;
> @@ -893,6 +904,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>                                               bool config_mem_slots)
>  {
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
>      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>      size_t fd_num = 0;
>      VhostUserMsg msg_reply;
> @@ -926,6 +938,8 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>              return ret;
>          }
> 
> +        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>          ret = vhost_user_write(dev, &msg, fds, fd_num);
>          if (ret < 0) {
>              return ret;
> @@ -1005,6 +1019,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>                                      struct vhost_memory *mem)
>  {
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
>      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>      size_t fd_num = 0;
>      bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
> @@ -1044,6 +1059,8 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>              return ret;
>          }
> 
> +        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>          ret = vhost_user_write(dev, &msg, fds, fd_num);
>          if (ret < 0) {
>              return ret;
> @@ -1089,6 +1106,10 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
>          return 0;
>      }
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -1138,6 +1159,10 @@ static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
>          }
>      }
> 
> +/*  struct vhost_user *u = dev->opaque;
> + *  struct VhostUserState *us = u->user;
> + *  QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> + */
>      ret = vhost_user_write(dev, msg, NULL, 0);
>      if (ret < 0) {
>          return ret;


What is this comment saying?

> @@ -1277,6 +1302,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>          .hdr.size = sizeof(msg.payload.state),
>      };
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> 
>      VhostUserHostNotifier *n = fetch_notifier(u->user, ring->index);
>      if (n) {
> @@ -1669,6 +1696,9 @@ int vhost_user_get_shared_object(struct vhost_dev *dev, unsigned char *uuid,
>      };
>      memcpy(msg.payload.object.uuid, uuid, sizeof(msg.payload.object.uuid));
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -1889,6 +1919,9 @@ static int vhost_setup_backend_channel(struct vhost_dev *dev)
>          msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>      }
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, &sv[1], 1);
>      if (ret) {
>          goto out;
> @@ -1993,6 +2026,9 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
>          .hdr.flags = VHOST_USER_VERSION,
>      };
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          error_setg(errp, "Failed to send postcopy_advise to vhost");
> @@ -2051,6 +2087,9 @@ static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
> 
>      trace_vhost_user_postcopy_listen();
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          error_setg(errp, "Failed to send postcopy_listen to vhost");
> @@ -2080,6 +2119,9 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
> 
>      trace_vhost_user_postcopy_end_entry();
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          error_setg(errp, "Failed to send postcopy_end to vhost");
> @@ -2372,6 +2414,10 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
>          msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>      }
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -2396,6 +2442,10 @@ static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
>          .payload.iotlb = *imsg,
>      };
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -2428,6 +2478,10 @@ static int vhost_user_get_config(struct vhost_dev *dev, uint8_t *config,
> 
>      assert(config_len <= VHOST_USER_MAX_CONFIG_SIZE);
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      msg.payload.config.offset = 0;
>      msg.payload.config.size = config_len;
>      ret = vhost_user_write(dev, &msg, NULL, 0);
> @@ -2492,6 +2546,10 @@ static int vhost_user_set_config(struct vhost_dev *dev, const uint8_t *data,
>      p = msg.payload.config.region;
>      memcpy(p, data, size);
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -2570,6 +2628,10 @@ static int vhost_user_crypto_create_session(struct vhost_dev *dev,
>          }
>      }
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      msg.payload.session.op_code = backend_info->op_code;
>      msg.payload.session.session_id = backend_info->session_id;
>      ret = vhost_user_write(dev, &msg, NULL, 0);
> @@ -2662,6 +2724,9 @@ static int vhost_user_get_inflight_fd(struct vhost_dev *dev,
>          return 0;
>      }
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -2757,6 +2822,7 @@ bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp)
>      user->memory_slots = 0;
>      user->notifiers = g_ptr_array_new_full(VIRTIO_QUEUE_MAX / 4,
>                                             &vhost_user_state_destroy);
> +    qemu_mutex_init(&user->vhost_user_request_reply_lock);
>      return true;
>  }
> 
> @@ -2769,6 +2835,7 @@ void vhost_user_cleanup(VhostUserState *user)
>      user->notifiers = (GPtrArray *) g_ptr_array_free(user->notifiers, true);
>      memory_region_transaction_commit();
>      user->chr = NULL;
> +    qemu_mutex_destroy(&user->vhost_user_request_reply_lock);
>  }
> 
> 
> @@ -2902,6 +2969,9 @@ static int vhost_user_set_device_state_fd(struct vhost_dev *dev,
>          return -ENOTSUP;
>      }
> 
> +    struct VhostUserState *us = vu->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, &fd, 1);
>      close(fd);
>      if (ret < 0) {
> @@ -2965,6 +3035,10 @@ static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
>          return -ENOTSUP;
>      }
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          error_setg_errno(errp, -ret,
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index 324cd8663a..e96f12d449 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -67,6 +67,9 @@ typedef struct VhostUserState {
>      GPtrArray *notifiers;
>      int memory_slots;
>      bool supports_config;
> +
> +    /* Hold lock for a request-reply cycle */
> +    QemuMutex vhost_user_request_reply_lock;
>  } VhostUserState;
> 
>  /**
> --
> 2.46.0
Prasad Pandit Aug. 29, 2024, 5:39 a.m. UTC | #2
On Wed, 28 Aug 2024 at 16:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> >  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
> >    the lock for longer fails some tests during rpmbuild(8).
>
> what do you mean fails rpmbuild? that qemu with this patch can not be compiled?

* In V1 of this patch, QEMU_LOCK_GUARD was placed near the beginning
of the function. But that caused some unit tests to fail, reporting
TIMEOUT errors. In this V2, QEMU_LOCK_GUARD is placed near the
vhost_user_write() calls, to reduce the time that the lock is held.

* Both (V1 & V2) compile well, but fail at the '%check' stage while
running unit tests (on some machines), i.e. the rpm package is not built.
rpmbuild(8) on an F40 machine failed, but a koji scratch build with the
same SRPM worked fine. Those scratch builds are shared above. The RHEL-9
SRPM built well on a RHEL-9 host, but failed to build on an F40 machine,
reporting failure at the '%check' stage of rpmbuild(8).

Thank you.
---
  - Prasad
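
The difference between the V1 and V2 guard placement described above can be illustrated with a small scope-guard sketch. It assumes that QEMU_LOCK_GUARD behaves like a scope guard that unlocks when the enclosing scope ends. The LOCK_GUARD macro below is a hypothetical pthread analogue built on the GCC/Clang cleanup attribute, not QEMU's implementation, and it allows only one guard per scope.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Cleanup handler: runs automatically when the guard variable leaves scope. */
static void guard_unlock(pthread_mutex_t **m)
{
    if (*m) {
        pthread_mutex_unlock(*m);
    }
}

static pthread_mutex_t *lock_and_return(pthread_mutex_t *m)
{
    pthread_mutex_lock(m);
    return m;
}

/* Hypothetical guard: lock now, unlock at the end of the enclosing scope. */
#define LOCK_GUARD(m) \
    pthread_mutex_t *lock_guard_ __attribute__((cleanup(guard_unlock))) = \
        lock_and_return(m)

/* Return 1 if the mutex is currently held, 0 if it was free. */
static int is_held(pthread_mutex_t *m)
{
    if (pthread_mutex_trylock(m) == EBUSY) {
        return 1;
    }
    pthread_mutex_unlock(m);
    return 0;
}

/* V1-style placement: the guard sits at the top of the function. */
int guard_at_top(pthread_mutex_t *m, int skip)
{
    LOCK_GUARD(m);
    if (skip) {
        return is_held(m);   /* lock is held although nothing is sent */
    }
    return is_held(m);
}

/* V2-style placement: the guard sits right before the write would happen. */
int guard_near_write(pthread_mutex_t *m, int skip)
{
    if (skip) {
        return is_held(m);   /* early return never takes the lock */
    }
    LOCK_GUARD(m);
    return is_held(m);
}
```

With a guard at the top, every path through the function holds the lock, including early returns and any preparation work. With the guard next to the write, only the request-reply portion is covered. In both variants the mutex is free again once the function returns.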
Michael S. Tsirkin Aug. 29, 2024, 6:23 a.m. UTC | #3
On Thu, Aug 29, 2024 at 11:09:44AM +0530, Prasad Pandit wrote:
> On Wed, 28 Aug 2024 at 16:45, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
> > >    the lock for longer fails some tests during rpmbuild(8).
> >
> > what do you mean fails rpmbuild? that qemu with this patch can not be compiled?
> 
> * In V1 of this patch, QEMU_LOCK_GUARD was placed near beginning of
> the function. But that caused some unit tests to fail reporting
> TIMEOUT errors. In this V2, QEMU_LOCK_GUARD is placed near
> vhost_user_write() calls, to reduce the time that lock is held.
> 
> * Both (V1 & V2) compile well, but fail at '%check' stage while
> running unit tests (on some machines), ie. rpm package is not built.
> rpmbuild(8) on F40 machine failed, but koji scratch build with the
> same SRPM worked fine. Those scratch builds are shared above. RHEL-9
> SRPM built well on RHEL-9 host, but failed to build on F40 machine
> reporting failure at '%check' stage of rpmbuild(8).
> 
> Thank you.
> ---
>   - Prasad

Weird.  Seems to indicate some kind of deadlock?
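
One way a lock held for too long could turn into a deadlock is nested acquisition: a function takes the lock and then calls a helper that takes the same lock. Whether that is what happens in the failing tests is not established in this thread; the sketch below only illustrates the general pattern. It uses a POSIX error-checking mutex so that the second acquisition reports EDEADLK instead of hanging; req_lock, inner_request() and outer_request_nested() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t req_lock;

/* Create the lock as an error-checking mutex so a relock is reported. */
void init_errorcheck_lock(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&req_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical helper that takes the request-reply lock itself. */
static int inner_request(void)
{
    int err = pthread_mutex_lock(&req_lock);

    if (err) {
        return err;          /* EDEADLK: this thread already holds the lock */
    }
    /* ... the write and the read of the reply would go here ... */
    pthread_mutex_unlock(&req_lock);
    return 0;
}

/* Hypothetical caller that holds the lock and then calls the helper. */
int outer_request_nested(void)
{
    int err;

    pthread_mutex_lock(&req_lock);
    err = inner_request();   /* second acquisition by the same thread */
    pthread_mutex_unlock(&req_lock);
    return err;
}
```

With a default (non-error-checking) mutex the nested pthread_mutex_lock() would typically block forever rather than return an error, which from a test harness would look like a timeout.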
Michael S. Tsirkin Aug. 29, 2024, 7:42 a.m. UTC | #4
On Wed, Aug 28, 2024 at 03:39:14PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> QEMU threads use vhost_user_write/read calls to send
> and receive request/reply messages from a vhost-user
> device. When multiple threads communicate with the
> same vhost-user device, they can receive each other's
> messages, resulting in an erroneous state.
> 
> When fault_thread exits upon completion of Postcopy
> migration, it sends a 'postcopy_end' message to the
> vhost-user device. But sometimes 'postcopy_end' message
> is sent while vhost device is being setup via
> vhost_dev_start().

So maybe vhost_user_postcopy_end should take the BQL?

>      Thread-1                           Thread-2
> 
>  vhost_dev_start                    postcopy_ram_incoming_cleanup
>  vhost_device_iotlb_miss            postcopy_notify
>  vhost_backend_update_device_iotlb  vhost_user_postcopy_notifier
>  vhost_user_send_device_iotlb_msg   vhost_user_postcopy_end
>  process_message_reply              process_message_reply
>  vhost_user_read                    vhost_user_read
>  vhost_user_read_header             vhost_user_read_header
>  "Fail to update device iotlb"      "Failed to receive reply to postcopy_end"
> 
> This creates confusion when vhost-user device receives
> 'postcopy_end' message while it is trying to update IOTLB entries.
> 
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
>  vhost_device_iotlb_miss:
>   700871,700871: Fail to update device iotlb
>  vhost_user_postcopy_end:
>   700871,700900: Failed to receive reply to postcopy_end
>  vhost_user_read_header:
>   700871,700871: Failed to read msg header. Flags 0x0 instead of 0x5.
> 
> Here fault thread seems to end the postcopy migration
> while another thread is starting the vhost-user device.
> 
> Add a mutex lock to hold for one request-reply cycle
> and avoid such race condition.
> 
> Fixes: 46343570c06e ("vhost+postcopy: Wire up POSTCOPY_END notify")
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>


CC Author and reviewer of the offending commit.


> ---
>  hw/virtio/vhost-user.c         | 74 ++++++++++++++++++++++++++++++++++
>  include/hw/virtio/vhost-user.h |  3 ++
>  2 files changed, 77 insertions(+)
> 
> v2:
>  - Place QEMU_LOCK_GUARD near the vhost_user_write() calls, holding
>    the lock for longer fails some tests during rpmbuild(8).
>  - rpmbuild(8) fails for some SRPMs, not all. RHEL-9 SRPM builds with
>    this patch, whereas Fedora SRPM does not build.
>  - The host OS also seems to affect rpmbuild(8). Some SRPMs build well
>    on RHEL-9, but not on Fedora-40 machine.
>  - koji builds successful with this patch
>    https://koji.fedoraproject.org/koji/taskinfo?taskID=122254011
>    https://koji.fedoraproject.org/koji/taskinfo?taskID=122252369
> 
> v1: Use QEMU_LOCK_GUARD(), rename lock variable
>  - https://lore.kernel.org/qemu-devel/20240808095147.291626-3-ppandit@redhat.com/
> 
> v0:
>  - https://lore.kernel.org/all/Zo_9OlX0pV0paFj7@x1n/
>  - https://lore.kernel.org/all/20240720153808-mutt-send-email-mst@kernel.org/
> 
> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> index 00561daa06..7b030ae2cd 100644
> --- a/hw/virtio/vhost-user.c
> +++ b/hw/virtio/vhost-user.c
> @@ -24,6 +24,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/uuid.h"
>  #include "qemu/sockets.h"
> +#include "qemu/lockable.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/cryptodev.h"
>  #include "migration/postcopy-ram.h"
> @@ -446,6 +447,10 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
>          .hdr.size = sizeof(msg.payload.log),
>      };
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      /* Send only once with first queue pair */
>      if (dev->vq_index != 0) {
>          return 0;
> @@ -664,6 +669,7 @@ static int send_remove_regions(struct vhost_dev *dev,
>                                 bool reply_supported)
>  {
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
>      struct vhost_memory_region *shadow_reg;
>      int i, fd, shadow_reg_idx, ret;
>      ram_addr_t offset;
> @@ -685,6 +691,8 @@ static int send_remove_regions(struct vhost_dev *dev,
>              vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
>              msg->payload.mem_reg.region = region_buffer;
> 
> +            QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>              ret = vhost_user_write(dev, msg, NULL, 0);
>              if (ret < 0) {
>                  return ret;
> @@ -718,6 +726,7 @@ static int send_add_regions(struct vhost_dev *dev,
>                              bool reply_supported, bool track_ramblocks)
>  {
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
>      int i, fd, ret, reg_idx, reg_fd_idx;
>      struct vhost_memory_region *reg;
>      MemoryRegion *mr;
> @@ -746,6 +755,8 @@ static int send_add_regions(struct vhost_dev *dev,
>              vhost_user_fill_msg_region(&region_buffer, reg, offset);
>              msg->payload.mem_reg.region = region_buffer;
> 
> +            QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>              ret = vhost_user_write(dev, msg, &fd, 1);
>              if (ret < 0) {
>                  return ret;
> @@ -893,6 +904,7 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>                                               bool config_mem_slots)
>  {
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
>      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>      size_t fd_num = 0;
>      VhostUserMsg msg_reply;
> @@ -926,6 +938,8 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
>              return ret;
>          }
> 
> +        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>          ret = vhost_user_write(dev, &msg, fds, fd_num);
>          if (ret < 0) {
>              return ret;
> @@ -1005,6 +1019,7 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>                                      struct vhost_memory *mem)
>  {
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
>      int fds[VHOST_MEMORY_BASELINE_NREGIONS];
>      size_t fd_num = 0;
>      bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
> @@ -1044,6 +1059,8 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
>              return ret;
>          }
> 
> +        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>          ret = vhost_user_write(dev, &msg, fds, fd_num);
>          if (ret < 0) {
>              return ret;
> @@ -1089,6 +1106,10 @@ static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
>          return 0;
>      }
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -1138,6 +1159,10 @@ static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
>          }
>      }
> 
> +/*  struct vhost_user *u = dev->opaque;
> + *  struct VhostUserState *us = u->user;
> + *  QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> + */
>      ret = vhost_user_write(dev, msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -1277,6 +1302,8 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
>          .hdr.size = sizeof(msg.payload.state),
>      };
>      struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> 
>      VhostUserHostNotifier *n = fetch_notifier(u->user, ring->index);
>      if (n) {
> @@ -1669,6 +1696,9 @@ int vhost_user_get_shared_object(struct vhost_dev *dev, unsigned char *uuid,
>      };
>      memcpy(msg.payload.object.uuid, uuid, sizeof(msg.payload.object.uuid));
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -1889,6 +1919,9 @@ static int vhost_setup_backend_channel(struct vhost_dev *dev)
>          msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>      }
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, &sv[1], 1);
>      if (ret) {
>          goto out;
> @@ -1993,6 +2026,9 @@ static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
>          .hdr.flags = VHOST_USER_VERSION,
>      };
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          error_setg(errp, "Failed to send postcopy_advise to vhost");
> @@ -2051,6 +2087,9 @@ static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)
> 
>      trace_vhost_user_postcopy_listen();
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          error_setg(errp, "Failed to send postcopy_listen to vhost");
> @@ -2080,6 +2119,9 @@ static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)
> 
>      trace_vhost_user_postcopy_end_entry();
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          error_setg(errp, "Failed to send postcopy_end to vhost");
> @@ -2372,6 +2414,10 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
>          msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
>      }
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -2396,6 +2442,10 @@ static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
>          .payload.iotlb = *imsg,
>      };
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -2428,6 +2478,10 @@ static int vhost_user_get_config(struct vhost_dev *dev, uint8_t *config,
> 
>      assert(config_len <= VHOST_USER_MAX_CONFIG_SIZE);
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      msg.payload.config.offset = 0;
>      msg.payload.config.size = config_len;
>      ret = vhost_user_write(dev, &msg, NULL, 0);
> @@ -2492,6 +2546,10 @@ static int vhost_user_set_config(struct vhost_dev *dev, const uint8_t *data,
>      p = msg.payload.config.region;
>      memcpy(p, data, size);
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -2570,6 +2628,10 @@ static int vhost_user_crypto_create_session(struct vhost_dev *dev,
>          }
>      }
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      msg.payload.session.op_code = backend_info->op_code;
>      msg.payload.session.session_id = backend_info->session_id;
>      ret = vhost_user_write(dev, &msg, NULL, 0);
> @@ -2662,6 +2724,9 @@ static int vhost_user_get_inflight_fd(struct vhost_dev *dev,
>          return 0;
>      }
> 
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          return ret;
> @@ -2757,6 +2822,7 @@ bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp)
>      user->memory_slots = 0;
>      user->notifiers = g_ptr_array_new_full(VIRTIO_QUEUE_MAX / 4,
>                                             &vhost_user_state_destroy);
> +    qemu_mutex_init(&user->vhost_user_request_reply_lock);
>      return true;
>  }
> 
> @@ -2769,6 +2835,7 @@ void vhost_user_cleanup(VhostUserState *user)
>      user->notifiers = (GPtrArray *) g_ptr_array_free(user->notifiers, true);
>      memory_region_transaction_commit();
>      user->chr = NULL;
> +    qemu_mutex_destroy(&user->vhost_user_request_reply_lock);
>  }
> 
> 
> @@ -2902,6 +2969,9 @@ static int vhost_user_set_device_state_fd(struct vhost_dev *dev,
>          return -ENOTSUP;
>      }
> 
> +    struct VhostUserState *us = vu->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, &fd, 1);
>      close(fd);
>      if (ret < 0) {
> @@ -2965,6 +3035,10 @@ static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
>          return -ENOTSUP;
>      }
> 
> +    struct vhost_user *u = dev->opaque;
> +    struct VhostUserState *us = u->user;
> +    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
> +
>      ret = vhost_user_write(dev, &msg, NULL, 0);
>      if (ret < 0) {
>          error_setg_errno(errp, -ret,
> diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
> index 324cd8663a..e96f12d449 100644
> --- a/include/hw/virtio/vhost-user.h
> +++ b/include/hw/virtio/vhost-user.h
> @@ -67,6 +67,9 @@ typedef struct VhostUserState {
>      GPtrArray *notifiers;
>      int memory_slots;
>      bool supports_config;
> +
> +    /* Hold lock for a request-reply cycle */
> +    QemuMutex vhost_user_request_reply_lock;
>  } VhostUserState;
> 
>  /**
> --
> 2.46.0
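
The hunks quoted above take the lock with QEMU_LOCK_GUARD(), so it is dropped on every return path of the function, including the early `return ret;` error exits. A minimal standalone sketch of that scope-bound behaviour follows, assuming GCC or Clang (it relies on the `cleanup` attribute) and a plain pthread mutex. REQ_LOCK_GUARD, guard_lock(), guard_unlock(), fake_request() and lock_is_free() are hypothetical names for illustration only, not QEMU code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-device request-reply lock. */
static pthread_mutex_t request_reply_lock = PTHREAD_MUTEX_INITIALIZER;

/* Cleanup handler: runs automatically when the guard variable leaves scope. */
static void guard_unlock(pthread_mutex_t **m)
{
    if (*m) {
        pthread_mutex_unlock(*m);
    }
}

/* Take the lock and hand back the pointer that the guard variable stores. */
static pthread_mutex_t *guard_lock(pthread_mutex_t *m)
{
    pthread_mutex_lock(m);
    return m;
}

/* Hypothetical macro imitating a scope-bound lock guard (GCC/Clang only). */
#define REQ_LOCK_GUARD(m)                                           \
    pthread_mutex_t *req_guard_                                     \
        __attribute__((cleanup(guard_unlock), unused)) = guard_lock(m)

/* Hypothetical request function with an early-return error path. */
static int fake_request(int fail_write)
{
    REQ_LOCK_GUARD(&request_reply_lock);

    if (fail_write) {
        return -EIO;    /* lock is released here by the cleanup handler */
    }
    return 0;           /* ... and here as well */
}

/* Returns 1 if the lock can be taken, i.e. no return path leaked it. */
static int lock_is_free(void)
{
    if (pthread_mutex_trylock(&request_reply_lock) != 0) {
        return 0;
    }
    pthread_mutex_unlock(&request_reply_lock);
    return 1;
}
```

Calling fake_request(1) takes the error path, and lock_is_free() should still report the mutex as free afterwards. The flip side of the scope-bound form is that the lock stays held until the end of the enclosing block, which is what lets it cover a whole request-reply cycle.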
Prasad Pandit Aug. 29, 2024, 9:15 a.m. UTC | #5
Hello Michael,

On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> Weird.  Seems to indicate some kind of deadlock?

* Such a deadlock should occur across all environments I guess, not
sure why it happens selectively. It is strange.

> So maybe vhost_user_postcopy_end should take the BQL?
===
diff --git a/migration/savevm.c b/migration/savevm.c
index e7c1215671..31acda3818 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
          */
         qemu_event_wait(&mis->main_thread_load_event);
     }
+    bql_lock();
     postcopy_ram_incoming_cleanup(mis);
+    bql_unlock();

     if (load_res < 0) {
         /*
===

* Actually, the BQL patch above was tested and it worked fine, but I'm
not sure it is an acceptable solution. Another concern was that taking
the BQL could make things more complicated, so a local vhost-user
specific lock should be better.

...wdyt?
---
  - Prasad
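
To illustrate what a local, vhost-user specific lock buys: a minimal standalone sketch in which two threads share one channel to a backend and a mutex is held across each write/read pair, so a thread can only read the reply to its own request. It uses a socketpair and pthreads rather than QEMU APIs. backend_thread(), frontend_thread(), run_request_reply_demo() and the 32-bit echo protocol are assumptions made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define CYCLES 1000

static int chan[2];     /* chan[0]: frontend end, chan[1]: backend end */
static pthread_mutex_t request_reply_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical backend: echoes each 32-bit request back as its reply. */
static void *backend_thread(void *arg)
{
    uint32_t req;

    (void)arg;
    while (read(chan[1], &req, sizeof(req)) == (ssize_t)sizeof(req)) {
        if (write(chan[1], &req, sizeof(req)) != (ssize_t)sizeof(req)) {
            break;
        }
    }
    return NULL;        /* read() returned 0 or failed: channel closed */
}

/* Frontend thread: send a tagged request and expect the same tag back. */
static void *frontend_thread(void *arg)
{
    uint32_t id = (uint32_t)(uintptr_t)arg;
    intptr_t mismatches = 0;

    for (uint32_t i = 0; i < CYCLES; i++) {
        uint32_t req = (id << 16) | i;
        uint32_t reply = 0;

        /* Hold the lock for one full request-reply cycle. */
        pthread_mutex_lock(&request_reply_lock);
        if (write(chan[0], &req, sizeof(req)) != (ssize_t)sizeof(req) ||
            read(chan[0], &reply, sizeof(reply)) != (ssize_t)sizeof(reply) ||
            reply != req) {
            mismatches++;
        }
        pthread_mutex_unlock(&request_reply_lock);
    }
    return (void *)mismatches;
}

/* Returns the total number of mismatched replies, or -1 on setup failure. */
static int run_request_reply_demo(void)
{
    pthread_t backend, t1, t2;
    void *m1 = NULL, *m2 = NULL;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) != 0) {
        return -1;
    }
    pthread_create(&backend, NULL, backend_thread, NULL);
    pthread_create(&t1, NULL, frontend_thread, (void *)(uintptr_t)1);
    pthread_create(&t2, NULL, frontend_thread, (void *)(uintptr_t)2);
    pthread_join(t1, &m1);
    pthread_join(t2, &m2);
    close(chan[0]);     /* backend sees EOF and exits its loop */
    pthread_join(backend, NULL);
    close(chan[1]);
    return (int)((intptr_t)m1 + (intptr_t)m2);
}
```

run_request_reply_demo() returns the number of mismatched replies, which should be 0 with the lock in place. Dropping the lock/unlock pair can let the two threads read each other's replies, which is the kind of mix-up the commit message describes.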
Michael S. Tsirkin Aug. 29, 2024, 9:21 a.m. UTC | #6
On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> Hello Michael,
> 
> On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Weird.  Seems to indicate some kind of deadlock?
> 
> * Such a deadlock should occur across all environments I guess, not
> sure why it happens selectively. It is strange.

Some kind of race?

> > So maybe vhost_user_postcopy_end should take the BQL?
> ===
> diff --git a/migration/savevm.c b/migration/savevm.c
> index e7c1215671..31acda3818 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>           */
>          qemu_event_wait(&mis->main_thread_load_event);
>      }
> +    bql_lock();
>      postcopy_ram_incoming_cleanup(mis);
> +    bql_unlock();
> 
>      if (load_res < 0) {
>          /*
> ===
> 
> * Actually a BQL patch above was tested and it worked fine. But not
> sure if it is an acceptable solution. Another contention was taking
> BQL could make things more complicated, so a local vhost-user specific
> lock should be better.
> 
> ...wdyt?
> ---
>   - Prasad

Keep it simple, is my advice. Not causing regressions is good.
Peter Xu Aug. 29, 2024, 2:29 p.m. UTC | #7
On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> Hello Michael,
> 
> On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> > Weird.  Seems to indicate some kind of deadlock?
> 
> * Such a deadlock should occur across all environments I guess, not
> sure why it happens selectively. It is strange.
> 
> > So maybe vhost_user_postcopy_end should take the BQL?
> ===
> diff --git a/migration/savevm.c b/migration/savevm.c
> index e7c1215671..31acda3818 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
>           */
>          qemu_event_wait(&mis->main_thread_load_event);
>      }
> +    bql_lock();
>      postcopy_ram_incoming_cleanup(mis);
> +    bql_unlock();
> 
>      if (load_res < 0) {
>          /*
> ===
> 
> * Actually a BQL patch above was tested and it worked fine. But not
> sure if it is an acceptable solution. Another contention was taking
> BQL could make things more complicated, so a local vhost-user specific
> lock should be better.
> 
> ...wdyt?

I think Michael was suggesting taking bql in vhost_user_postcopy_end(), not
in postcopy code directly.  I've recently been looking at how to make precopy
load take the bql even less, and even run in a separate thread. The above is
definitely going backwards, as we already discussed internally.

I cherish that postcopy doesn't need to take bql on its own in most paths, and
we shouldn't add an unnecessary bql requirement that applies even if
vhost-user isn't used.

Personally I still prefer we look into why a separate mutex won't work and
why that timed out; that could be part of whoever is going to investigate
the whole issue (including the hang later on). Otherwise I'm ok from
migration pov that we take bql in the vhost-user hook, but not in savevm.c.

Thanks,
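
A minimal sketch of the alternative discussed here: taking the big lock inside the vhost-user hook rather than in the migration code, so paths that never reach the hook pay nothing. It is standalone and uses a pthread mutex with a thread-local flag as a stand-in for the BQL. big_lock(), big_unlock(), big_locked() and postcopy_end_hook() are hypothetical names. It also assumes the hook may be reached both with and without the lock held, which this thread does not confirm.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for a global "big" lock; not the real BQL implementation. */
static pthread_mutex_t big_lock_mutex = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool big_lock_held;   /* per-thread ownership flag */

static void big_lock(void)
{
    pthread_mutex_lock(&big_lock_mutex);
    big_lock_held = true;
}

static void big_unlock(void)
{
    big_lock_held = false;
    pthread_mutex_unlock(&big_lock_mutex);
}

static bool big_locked(void)
{
    return big_lock_held;
}

static int hook_calls;  /* only touched while the big lock is held */

/* Hypothetical hook: takes the big lock only if the caller does not hold it. */
static int postcopy_end_hook(void)
{
    bool was_locked = big_locked();

    if (!was_locked) {
        big_lock();
    }
    hook_calls++;       /* the request-reply exchange would happen here */
    if (!was_locked) {
        big_unlock();
    }
    return 0;
}

/* Caller that does not hold the big lock, like a migration helper thread. */
static void *cleanup_thread(void *arg)
{
    (void)arg;
    postcopy_end_hook();
    return NULL;
}

/* Exercises both kinds of caller; returns 0 if the lock state stays sane. */
static int run_hook_demo(void)
{
    pthread_t t;
    bool still_held;

    if (pthread_create(&t, NULL, cleanup_thread, NULL) != 0) {
        return -1;
    }
    pthread_join(t, NULL);

    big_lock();                     /* caller that already holds the lock */
    postcopy_end_hook();
    still_held = big_locked();      /* the hook must not have dropped it */
    big_unlock();

    return (hook_calls == 2 && still_held && !big_locked()) ? 0 : -1;
}
```

The migration thread itself stays lock-free in this arrangement. Only the hook body is serialized against other holders of the big lock.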
Michael S. Tsirkin Aug. 29, 2024, 3:05 p.m. UTC | #8
On Thu, Aug 29, 2024 at 10:29:24AM -0400, Peter Xu wrote:
> On Thu, Aug 29, 2024 at 02:45:45PM +0530, Prasad Pandit wrote:
> > Hello Michael,
> > 
> > On Thu, 29 Aug 2024 at 13:12, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > Weird.  Seems to indicate some kind of deadlock?
> > 
> > * Such a deadlock should occur across all environments I guess, not
> > sure why it happens selectively. It is strange.
> > 
> > > So maybe vhost_user_postcopy_end should take the BQL?
> > ===
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index e7c1215671..31acda3818 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2050,7 +2050,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
> >           */
> >          qemu_event_wait(&mis->main_thread_load_event);
> >      }
> > +    bql_lock();
> >      postcopy_ram_incoming_cleanup(mis);
> > +    bql_unlock();
> > 
> >      if (load_res < 0) {
> >          /*
> > ===
> > 
> > * Actually a BQL patch above was tested and it worked fine. But not
> > sure if it is an acceptable solution. Another contention was taking
> > BQL could make things more complicated, so a local vhost-user specific
> > lock should be better.
> > 
> > ...wdyt?
> 
> I think Michael was suggesting taking bql in vhost_user_postcopy_end(), not
> in postcopy code directly.

maybe that's better, ok.

>  I'm recently looking at how to make precopy
> load even take less bql and even make it a separate thread. Above is
> definitely going backwards, per we discussed already internally.


At the same time a small bugfix is better, can be backported.


> I cherish postcopy doesn't need to take bql on its own in most paths, and
> we shouldn't add unnecessary bql requirement even if vhost-user isn't used.
> 
> Personally I still prefer we look into why a separate mutex won't work and
> why that timed out; that could be part of whoever is going to investigate
> the whole issue (including the hang later on). Otherwise I'm ok from
> migration pov that we take bql in the vhost-user hook, but not in savevm.c.
> 
> Thanks,

ok

> -- 
> Peter Xu
Peter Xu Aug. 29, 2024, 8:29 p.m. UTC | #9
On Thu, Aug 29, 2024 at 11:05:15AM -0400, Michael S. Tsirkin wrote:
> > Personally I still prefer we look into why a separate mutex won't work and
> > why that timed out; that could be part of whoever is going to investigate
> > the whole issue (including the hang later on). Otherwise I'm ok from
> > migration pov that we take bql in the vhost-user hook, but not in savevm.c.
> 
> ok

Just a heads-up in case someone keeps looking into the hang problem: I'm
not sure whether the brew build failure on the test case is related to the
hang issue we observed, or whether it is the hang issue itself, i.e. the
failure is a timeout because one qemu hung.

IOW, whoever cannot reproduce the hang might use the mutex patch to
reproduce it, if we want to figure out the last missing piece of the puzzle.

Thanks,
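
For whoever picks up the hang: one generic way a non-recursive mutex produces a hang is a nested call path re-acquiring it on the same thread. Whether that is what happens here is not established in this thread. A minimal standalone sketch of one way to surface such a path while debugging, using a POSIX error-checking mutex that returns EDEADLK instead of blocking. checked_lock_init(), outer_request() and inner_request() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Debug variant of a request-reply lock: relocking is reported, not hung. */
static pthread_mutex_t checked_lock;

static int checked_lock_init(void)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret != 0) {
        return ret;
    }
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (ret == 0) {
        ret = pthread_mutex_init(&checked_lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Hypothetical helper that takes the lock for its own request-reply cycle. */
static int inner_request(void)
{
    int ret = pthread_mutex_lock(&checked_lock);

    if (ret != 0) {
        return ret;     /* EDEADLK if this thread already holds the lock */
    }
    pthread_mutex_unlock(&checked_lock);
    return 0;
}

/* Hypothetical caller that holds the lock and then calls the helper. */
static int outer_request(void)
{
    int ret = pthread_mutex_lock(&checked_lock);

    if (ret != 0) {
        return ret;
    }
    ret = inner_request();          /* nested acquisition on the same thread */
    pthread_mutex_unlock(&checked_lock);
    return ret;
}
```

With a default mutex the nested lock in outer_request() would typically block forever on Linux, which from the outside looks like a timeout. The error-checking type turns that into an error code that can be logged together with the call path.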

Patch

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 00561daa06..7b030ae2cd 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -24,6 +24,7 @@ 
 #include "qemu/main-loop.h"
 #include "qemu/uuid.h"
 #include "qemu/sockets.h"
+#include "qemu/lockable.h"
 #include "sysemu/runstate.h"
 #include "sysemu/cryptodev.h"
 #include "migration/postcopy-ram.h"
@@ -446,6 +447,10 @@  static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
         .hdr.size = sizeof(msg.payload.log),
     };

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     /* Send only once with first queue pair */
     if (dev->vq_index != 0) {
         return 0;
@@ -664,6 +669,7 @@  static int send_remove_regions(struct vhost_dev *dev,
                                bool reply_supported)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     struct vhost_memory_region *shadow_reg;
     int i, fd, shadow_reg_idx, ret;
     ram_addr_t offset;
@@ -685,6 +691,8 @@  static int send_remove_regions(struct vhost_dev *dev,
             vhost_user_fill_msg_region(&region_buffer, shadow_reg, 0);
             msg->payload.mem_reg.region = region_buffer;

+            QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
             ret = vhost_user_write(dev, msg, NULL, 0);
             if (ret < 0) {
                 return ret;
@@ -718,6 +726,7 @@  static int send_add_regions(struct vhost_dev *dev,
                             bool reply_supported, bool track_ramblocks)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     int i, fd, ret, reg_idx, reg_fd_idx;
     struct vhost_memory_region *reg;
     MemoryRegion *mr;
@@ -746,6 +755,8 @@  static int send_add_regions(struct vhost_dev *dev,
             vhost_user_fill_msg_region(&region_buffer, reg, offset);
             msg->payload.mem_reg.region = region_buffer;

+            QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
             ret = vhost_user_write(dev, msg, &fd, 1);
             if (ret < 0) {
                 return ret;
@@ -893,6 +904,7 @@  static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
                                              bool config_mem_slots)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     int fds[VHOST_MEMORY_BASELINE_NREGIONS];
     size_t fd_num = 0;
     VhostUserMsg msg_reply;
@@ -926,6 +938,8 @@  static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
             return ret;
         }

+        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
         ret = vhost_user_write(dev, &msg, fds, fd_num);
         if (ret < 0) {
             return ret;
@@ -1005,6 +1019,7 @@  static int vhost_user_set_mem_table(struct vhost_dev *dev,
                                     struct vhost_memory *mem)
 {
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
     int fds[VHOST_MEMORY_BASELINE_NREGIONS];
     size_t fd_num = 0;
     bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
@@ -1044,6 +1059,8 @@  static int vhost_user_set_mem_table(struct vhost_dev *dev,
             return ret;
         }

+        QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
         ret = vhost_user_write(dev, &msg, fds, fd_num);
         if (ret < 0) {
             return ret;
@@ -1089,6 +1106,10 @@  static int vhost_user_get_u64(struct vhost_dev *dev, int request, uint64_t *u64)
         return 0;
     }

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -1138,6 +1159,10 @@  static int vhost_user_write_sync(struct vhost_dev *dev, VhostUserMsg *msg,
         }
     }

+/*  struct vhost_user *u = dev->opaque;
+ *  struct VhostUserState *us = u->user;
+ *  QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+ */
     ret = vhost_user_write(dev, msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -1277,6 +1302,8 @@  static int vhost_user_get_vring_base(struct vhost_dev *dev,
         .hdr.size = sizeof(msg.payload.state),
     };
     struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);

     VhostUserHostNotifier *n = fetch_notifier(u->user, ring->index);
     if (n) {
@@ -1669,6 +1696,9 @@  int vhost_user_get_shared_object(struct vhost_dev *dev, unsigned char *uuid,
     };
     memcpy(msg.payload.object.uuid, uuid, sizeof(msg.payload.object.uuid));

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -1889,6 +1919,9 @@  static int vhost_setup_backend_channel(struct vhost_dev *dev)
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, &sv[1], 1);
     if (ret) {
         goto out;
@@ -1993,6 +2026,9 @@  static int vhost_user_postcopy_advise(struct vhost_dev *dev, Error **errp)
         .hdr.flags = VHOST_USER_VERSION,
     };

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_advise to vhost");
@@ -2051,6 +2087,9 @@  static int vhost_user_postcopy_listen(struct vhost_dev *dev, Error **errp)

     trace_vhost_user_postcopy_listen();

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_listen to vhost");
@@ -2080,6 +2119,9 @@  static int vhost_user_postcopy_end(struct vhost_dev *dev, Error **errp)

     trace_vhost_user_postcopy_end_entry();

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg(errp, "Failed to send postcopy_end to vhost");
@@ -2372,6 +2414,10 @@  static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
         msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
     }

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2396,6 +2442,10 @@  static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
         .payload.iotlb = *imsg,
     };

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2428,6 +2478,10 @@  static int vhost_user_get_config(struct vhost_dev *dev, uint8_t *config,

     assert(config_len <= VHOST_USER_MAX_CONFIG_SIZE);

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     msg.payload.config.offset = 0;
     msg.payload.config.size = config_len;
     ret = vhost_user_write(dev, &msg, NULL, 0);
@@ -2492,6 +2546,10 @@  static int vhost_user_set_config(struct vhost_dev *dev, const uint8_t *data,
     p = msg.payload.config.region;
     memcpy(p, data, size);

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2570,6 +2628,10 @@  static int vhost_user_crypto_create_session(struct vhost_dev *dev,
         }
     }

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     msg.payload.session.op_code = backend_info->op_code;
     msg.payload.session.session_id = backend_info->session_id;
     ret = vhost_user_write(dev, &msg, NULL, 0);
@@ -2662,6 +2724,9 @@  static int vhost_user_get_inflight_fd(struct vhost_dev *dev,
         return 0;
     }

+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         return ret;
@@ -2757,6 +2822,7 @@  bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp)
     user->memory_slots = 0;
     user->notifiers = g_ptr_array_new_full(VIRTIO_QUEUE_MAX / 4,
                                            &vhost_user_state_destroy);
+    qemu_mutex_init(&user->vhost_user_request_reply_lock);
     return true;
 }

@@ -2769,6 +2835,7 @@  void vhost_user_cleanup(VhostUserState *user)
     user->notifiers = (GPtrArray *) g_ptr_array_free(user->notifiers, true);
     memory_region_transaction_commit();
     user->chr = NULL;
+    qemu_mutex_destroy(&user->vhost_user_request_reply_lock);
 }


@@ -2902,6 +2969,9 @@  static int vhost_user_set_device_state_fd(struct vhost_dev *dev,
         return -ENOTSUP;
     }

+    struct VhostUserState *us = vu->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, &fd, 1);
     close(fd);
     if (ret < 0) {
@@ -2965,6 +3035,10 @@  static int vhost_user_check_device_state(struct vhost_dev *dev, Error **errp)
         return -ENOTSUP;
     }

+    struct vhost_user *u = dev->opaque;
+    struct VhostUserState *us = u->user;
+    QEMU_LOCK_GUARD(&us->vhost_user_request_reply_lock);
+
     ret = vhost_user_write(dev, &msg, NULL, 0);
     if (ret < 0) {
         error_setg_errno(errp, -ret,
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 324cd8663a..e96f12d449 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -67,6 +67,9 @@  typedef struct VhostUserState {
     GPtrArray *notifiers;
     int memory_slots;
     bool supports_config;
+
+    /* Hold lock for a request-reply cycle */
+    QemuMutex vhost_user_request_reply_lock;
 } VhostUserState;

 /**