Message ID | 20250207184212.20831-1-dongli.zhang@oracle.com (mailing list archive) |
---|---|
Series | vhost-scsi: log write descriptors for live migration (and two bugfix) |

Thanks to the suggestion from Mike, I am going to re-send v2 with:

1. Re-base on top of the below patchset.

[PATCH v2 0/8] vhost-scsi: Memory reduction patches
https://yhbt.net/lore/target-devel/20241203191705.19431-1-michael.christie@oracle.com/

The patchset applies and builds cleanly on top of commit 87a132e73910
("Merge tag 'mm-hotfixes-stable-2025-02-19-17-49' of
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm").

2. Don't allocate the per-cmd log buffers until VHOST_F_LOG_ALL is set,
either by taking advantage of vhost_scsi_set_features() or by following
the idea of the patch below.

[PATCH v2 5/8] vhost-scsi: Dynamically allocate scatterlists
https://yhbt.net/lore/target-devel/20241203191705.19431-6-michael.christie@oracle.com/

Thank you very much!

Dongli Zhang

On 2/7/25 10:41 AM, Dongli Zhang wrote:
> The live migration with vhost-scsi has been enabled by QEMU commit
> b3e89c941a85 ("vhost-scsi: Allow user to enable migration"), which
> thoroughly explains the workflow that QEMU collaborates with vhost-scsi on
> the live migration.
>
> Although it logs dirty data for the used ring, it doesn't log any write
> descriptor (VRING_DESC_F_WRITE).
>
> In comparison, vhost-net logs write descriptors via vhost_log_write(). The
> SPDK (vhost-user-scsi backend) also logs write descriptors via
> vhost_log_req_desc().
>
> As a result, there is likely data mismatch between memory and vhost-scsi
> disk during the live migration.
>
> 1. Suppose there is high workload and high memory usage. Suppose some
> systemd userspace pages are swapped out to the swap disk.
>
> 2. Upon request from systemd, the kernel reads some pages from the swap
> disk to the memory via vhost-scsi.
>
> 3. Although those userspace pages' data are updated, they are not marked as
> dirty by vhost-scsi (this is the bug). They are not going to migrate to the
> target host during memory transfer iterations.
>
> 4. Suppose systemd doesn't write to those pages any longer. Those pages
> never get the chance to be dirty or migrated any longer.
>
> 5. Once the guest VM is resumed on the target host, because of the lack of
> those dirty pages' data, the systemd may run into abnormal status, i.e.,
> there may be systemd segfault.
>
> Log all write descriptors to fix the issue.
>
> In addition, the patchset also fixes two bugs in vhost-scsi.
>
> Dongli Zhang (log descriptor, suggested by Joao Martins):
>   vhost: modify vhost_log_write() for broader users
>   vhost-scsi: adjust vhost_scsi_get_desc() to log vring descriptors
>   vhost-scsi: cache log buffer in I/O queue vhost_scsi_cmd
>   vhost-scsi: log I/O queue write descriptors
>   vhost-scsi: log control queue write descriptors
>   vhost-scsi: log event queue write descriptors
>   vhost: add WARNING if log_num is more than limit
>
> Dongli Zhang (vhost-scsi bugfix):
>   vhost-scsi: protect vq->log_used with vq->mutex
>   vhost-scsi: Fix vhost_scsi_send_bad_target()
>
>  drivers/vhost/net.c   |   2 +-
>  drivers/vhost/scsi.c  | 191 +++++++++++++++++++++++++++++++++++++++------
>  drivers/vhost/vhost.c |  46 ++++++++---
>  drivers/vhost/vhost.h |   2 +-
>  4 files changed, 206 insertions(+), 35 deletions(-)
>
>
> base-commit: 5c8c229261f14159b54b9a32f12e5fa89d88b905
>
> Thank you very much!
>
> Dongli Zhang
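
The idea of "logging write descriptors" in the quoted cover letter can be illustrated with a small userspace sketch. This is not the kernel's vhost_log_write() and not code from the patchset: the names `wdesc`, `dirty_bitmap`, `log_write`, `log_write_descs` and `page_is_dirty` are hypothetical, addresses are treated as plain guest-physical offsets, and the 4 KiB page size and 64-page guest are assumptions chosen for illustration. It only shows the principle that every page covered by a device-writable descriptor must be marked in the dirty bitmap, or the migration code never learns that the device changed it.

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define SKETCH_NPAGES     64   /* assumption: tiny guest, 64 pages */

/* Hypothetical, simplified stand-in for one vring descriptor. */
struct wdesc {
    uint64_t addr;          /* guest-physical address of the buffer */
    uint32_t len;           /* buffer length in bytes */
    int      device_writes; /* nonzero models VRING_DESC_F_WRITE */
};

/* One bit per guest page; a set bit means "must be re-sent". */
static uint8_t dirty_bitmap[SKETCH_NPAGES / 8];

/* Mark every page touched by [addr, addr + len) as dirty. */
static void log_write(uint64_t addr, uint32_t len)
{
    uint64_t first, last, p;

    if (len == 0)
        return;
    first = addr >> SKETCH_PAGE_SHIFT;
    last = (addr + len - 1) >> SKETCH_PAGE_SHIFT;
    for (p = first; p <= last && p < SKETCH_NPAGES; p++)
        dirty_bitmap[p / 8] |= (uint8_t)(1u << (p % 8));
}

/* Log only the descriptors the device writes into. */
static void log_write_descs(const struct wdesc *d, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (d[i].device_writes)
            log_write(d[i].addr, d[i].len);
}

static int page_is_dirty(uint64_t page)
{
    return (dirty_bitmap[page / 8] >> (page % 8)) & 1;
}
```

For a request made of a device-readable header at 0x1000, a device-writable data buffer of 0x1000 bytes at 0x3800, and a device-writable status byte at 0x7000, calling `log_write_descs()` marks pages 3, 4 and 7 and leaves page 1 clean. Skipping that call for the data buffer corresponds to the swap-in scenario in steps 1 to 5 of the quoted message: the page content changes but the page is never re-sent to the target host.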