From patchwork Tue Feb 19 08:53:00 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 10819497
From: Yan Zhao
To: alex.williamson@redhat.com, qemu-devel@nongnu.org
Date: Tue, 19 Feb 2019 16:53:00 +0800
Message-Id: <1550566380-3788-1-git-send-email-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
Subject: [Qemu-devel] [PATCH 5/5] vfio/migration: support device memory
 capability
Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru,
 Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com,
 kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com,
 eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com,
 pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com,
 Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao, dgilbert@redhat.com,
 intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com,
 cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com

If a device has device memory capability, save/load data from device memory
in pre-copy and stop-and-copy phases.

The LOGGING state is set on the device so that dirty pages in device memory
are logged: in the LOGGING state, a get of device memory returns a snapshot
of the whole device memory; outside the LOGGING state, it returns only the
data dirtied since the last get operation.

Device memory is usually very large, so QEMU splits it into chunks, each no
larger than the device memory region.

Signed-off-by: Yan Zhao
Signed-off-by: Kirti Wankhede
---
 hw/vfio/migration.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 hw/vfio/pci.h       |   1 +
 2 files changed, 231 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 16d6395..f1e9309 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -203,6 +203,201 @@ static int vfio_load_data_device_config(VFIOPCIDevice *vdev,
     return 0;
 }
 
+static int vfio_get_device_memory_size(VFIOPCIDevice *vdev)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    uint64_t len;
+    int sz;
+
+    sz = sizeof(len);
+    if (pread(vbasedev->fd, &len, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.size))
+            != sz) {
+        error_report("vfio: Failed to get length of device memory");
+        return -1;
+    }
+    vdev->migration->devmem_size = len;
+    return 0;
+}
+
+static int vfio_set_device_memory_size(VFIOPCIDevice *vdev, uint64_t size)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    int sz;
+
+    sz = sizeof(size);
+    if (pwrite(vbasedev->fd, &size, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.size))
+            != sz) {
+        error_report("vfio: Failed to set length of device memory");
+        return -1;
+    }
+    vdev->migration->devmem_size = size;
+    return 0;
+}
+
+static
+int vfio_save_data_device_memory_chunk(VFIOPCIDevice *vdev, QEMUFile *f,
+                                    uint64_t pos, uint64_t len)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_devmem =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY];
+    void *dest;
+    uint32_t sz;
+    uint8_t *buf = NULL;
+    uint32_t action = VFIO_DEVICE_DATA_ACTION_GET_BUFFER;
+
+    if (len > region_devmem->size) {
+        return -1;
+    }
+
+    sz = sizeof(pos);
+    if (pwrite(vbasedev->fd, &pos, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.pos))
+            != sz) {
+        error_report("vfio: Failed to set save buffer pos");
+        return -1;
+    }
+    sz = sizeof(action);
+    if (pwrite(vbasedev->fd, &action, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.action))
+            != sz) {
+        error_report("vfio: Failed to set save buffer action");
+        return -1;
+    }
+
+    if (!vfio_device_state_region_mmaped(region_devmem)) {
+        buf = g_malloc(len);
+        if (buf == NULL) {
+            error_report("vfio: Failed to allocate memory for migrate");
+            return -1;
+        }
+        if (pread(vbasedev->fd, buf, len, region_devmem->fd_offset) != len) {
+            error_report("vfio: Failed to read device memory buffer");
+            return -1;
+        }
+        qemu_put_be64(f, len);
+        qemu_put_be64(f, pos);
+        qemu_put_buffer(f, buf, len);
+        g_free(buf);
+    } else {
+        dest = region_devmem->mmaps[0].mmap;
+        qemu_put_be64(f, len);
+        qemu_put_be64(f, pos);
+        qemu_put_buffer(f, dest, len);
+    }
+    return 0;
+}
+
+static int vfio_save_data_device_memory(VFIOPCIDevice *vdev, QEMUFile *f)
+{
+    VFIORegion *region_devmem =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY];
+    uint64_t total_len = vdev->migration->devmem_size;
+    uint64_t pos, len;
+
+    qemu_put_be64(f, total_len);
+    for (pos = 0; pos < total_len; pos += len) {
+        len = region_devmem->size;
+
+        if (pos + len >= total_len) {
+            len = total_len - pos;
+        }
+        if (vfio_save_data_device_memory_chunk(vdev, f, pos, len)) {
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static
+int vfio_load_data_device_memory_chunk(VFIOPCIDevice *vdev, QEMUFile *f,
+                                uint64_t pos, uint64_t len)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_devmem =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY];
+
+    void *dest;
+    uint32_t sz;
+    uint8_t *buf = NULL;
+    uint32_t action = VFIO_DEVICE_DATA_ACTION_SET_BUFFER;
+
+    if (len > region_devmem->size) {
+        return -1;
+    }
+
+    sz = sizeof(pos);
+    if (pwrite(vbasedev->fd, &pos, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.pos))
+            != sz) {
+        error_report("vfio: Failed to set device memory buffer pos");
+        return -1;
+    }
+    if (!vfio_device_state_region_mmaped(region_devmem)) {
+        buf = g_malloc(len);
+        if (buf == NULL) {
+            error_report("vfio: Failed to allocate memory for migrate");
+            return -1;
+        }
+        qemu_get_buffer(f, buf, len);
+        if (pwrite(vbasedev->fd, buf, len,
+                    region_devmem->fd_offset) != len) {
+            error_report("vfio: Failed to load device memory buffer");
+            return -1;
+        }
+        g_free(buf);
+    } else {
+        dest = region_devmem->mmaps[0].mmap;
+        qemu_get_buffer(f, dest, len);
+    }
+
+    sz = sizeof(action);
+    if (pwrite(vbasedev->fd, &action, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.action))
+            != sz) {
+        error_report("vfio: Failed to set load device memory buffer action");
+        return -1;
+    }
+
+    return 0;
+
+}
+
+static int vfio_load_data_device_memory(VFIOPCIDevice *vdev,
+                        QEMUFile *f, uint64_t total_len)
+{
+    uint64_t pos = 0, len = 0;
+
+    vfio_set_device_memory_size(vdev, total_len);
+
+    while (pos + len < total_len) {
+        len = qemu_get_be64(f);
+        pos = qemu_get_be64(f);
+
+        vfio_load_data_device_memory_chunk(vdev, f, pos, len);
+    }
+
+    return 0;
+}
+
+
 static int vfio_set_dirty_page_bitmap_chunk(VFIOPCIDevice *vdev,
         uint64_t start_addr, uint64_t page_nr)
 {
@@ -377,6 +572,10 @@ static void vfio_save_live_pending(QEMUFile *f, void *opaque,
         return;
     }
 
+    /* get dirty data size of device memory */
+    vfio_get_device_memory_size(vdev);
+
+    *res_precopy_only += vdev->migration->devmem_size;
     return;
 }
 
@@ -388,7 +587,9 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
         return 0;
     }
 
-    return 0;
+    qemu_put_byte(f, VFIO_SAVE_FLAG_DEVMEMORY);
+    /* get dirty data of device memory */
+    return vfio_save_data_device_memory(vdev, f);
 }
 
 static void vfio_pci_load_config(VFIOPCIDevice *vdev, QEMUFile *f)
@@ -458,6 +659,10 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             len = qemu_get_be64(f);
             vfio_load_data_device_config(vdev, f, len);
             break;
+        case VFIO_SAVE_FLAG_DEVMEMORY:
+            len = qemu_get_be64(f);
+            vfio_load_data_device_memory(vdev, f, len);
+            break;
         default:
             ret = -EINVAL;
         }
@@ -503,6 +708,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
     VFIOPCIDevice *vdev = opaque;
     int rc = 0;
 
+    if (vfio_device_data_cap_device_memory(vdev)) {
+        qemu_put_byte(f, VFIO_SAVE_FLAG_DEVMEMORY | VFIO_SAVE_FLAG_CONTINUE);
+        /* get dirty data of device memory */
+        vfio_get_device_memory_size(vdev);
+        rc = vfio_save_data_device_memory(vdev, f);
+    }
+
     qemu_put_byte(f, VFIO_SAVE_FLAG_PCI | VFIO_SAVE_FLAG_CONTINUE);
     vfio_pci_save_config(vdev, f);
 
@@ -515,12 +727,22 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
 
 static int vfio_save_setup(QEMUFile *f, void *opaque)
 {
+    int rc = 0;
     VFIOPCIDevice *vdev = opaque;
-    qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP);
+
+    if (vfio_device_data_cap_device_memory(vdev)) {
+        qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP | VFIO_SAVE_FLAG_CONTINUE);
+        qemu_put_byte(f, VFIO_SAVE_FLAG_DEVMEMORY);
+        /* get whole snapshot of device memory */
+        vfio_get_device_memory_size(vdev);
+        rc = vfio_save_data_device_memory(vdev, f);
+    } else {
+        qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP);
+    }
 
     vfio_set_device_state(vdev, VFIO_DEVICE_STATE_RUNNING |
             VFIO_DEVICE_STATE_LOGGING);
-    return 0;
+    return rc;
 }
 
 static int vfio_load_setup(QEMUFile *f, void *opaque)
@@ -576,8 +798,11 @@ int vfio_migration_init(VFIOPCIDevice *vdev, Error **errp)
         goto error;
     }
 
-    if (vfio_device_data_cap_device_memory(vdev)) {
-        error_report("No suppport of data cap device memory Yet");
+    if (vfio_device_data_cap_device_memory(vdev) &&
+            vfio_device_state_region_setup(vdev,
+              &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY],
+              VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_MEMORY,
+              "device-state-data-device-memory")) {
         goto error;
     }
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 4b7b1bb..a2cc64b 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -69,6 +69,7 @@ typedef struct VFIOMigration {
     uint32_t data_caps;
     uint32_t device_state;
     uint64_t devconfig_size;
+    uint64_t devmem_size;
     VMChangeStateEntry *vm_state;
 } VFIOMigration;
 
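
The stream layout that vfio_save_data_device_memory() and vfio_load_data_device_memory() use can be sketched outside QEMU. This is only an illustration under stated assumptions, not the patch's code: Stream, put_be64, get_be64, save_devmem and load_devmem are hypothetical stand-ins for QEMUFile and the VFIO region accessors, and device memory is modelled as a plain byte array. It shows the layout (a 64-bit big-endian total length, then for each chunk a length, a position and the payload) and a save loop that advances pos by the length of each chunk it emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for QEMUFile. */
typedef struct {
    uint8_t data[4096];
    size_t wr;              /* next write offset */
    size_t rd;              /* next read offset */
} Stream;

/* Append a 64-bit value in big-endian order (caller checks capacity). */
static void put_be64(Stream *s, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        s->data[s->wr++] = (uint8_t)(v >> (8 * i));
    }
}

/* Read a 64-bit big-endian value (caller checks available bytes). */
static uint64_t get_be64(Stream *s)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | s->data[s->rd++];
    }
    return v;
}

/*
 * Write total_len, then (len, pos, payload) per chunk, with each chunk
 * at most region_size bytes. Returns the number of chunks, or -1.
 */
static int save_devmem(Stream *s, const uint8_t *devmem, uint64_t total_len,
                       uint64_t region_size)
{
    int chunks = 0;
    uint64_t pos = 0;

    if (region_size == 0 || sizeof(s->data) - s->wr < 8) {
        return -1;
    }
    put_be64(s, total_len);
    while (pos < total_len) {
        uint64_t len = region_size;

        if (len > total_len - pos) {
            len = total_len - pos;      /* last, shorter chunk */
        }
        if (sizeof(s->data) - s->wr < 16 ||
            sizeof(s->data) - s->wr - 16 < len) {
            return -1;                  /* stream buffer full */
        }
        put_be64(s, len);
        put_be64(s, pos);
        memcpy(s->data + s->wr, devmem + pos, len);
        s->wr += len;
        pos += len;                     /* advance, or the loop never ends */
        chunks++;
    }
    return chunks;
}

/* Read the layout back into devmem, validating every chunk header. */
static int load_devmem(Stream *s, uint8_t *devmem, uint64_t devmem_cap,
                       uint64_t region_size)
{
    uint64_t total_len, done = 0;

    if (s->wr - s->rd < 8) {
        return -1;
    }
    total_len = get_be64(s);
    if (total_len > devmem_cap) {
        return -1;
    }
    while (done < total_len) {
        uint64_t len, pos;

        if (s->wr - s->rd < 16) {
            return -1;
        }
        len = get_be64(s);
        pos = get_be64(s);
        if (len == 0 || len > region_size || pos > total_len ||
            len > total_len - pos || len > s->wr - s->rd) {
            return -1;                  /* malformed or truncated chunk */
        }
        memcpy(devmem + pos, s->data + s->rd, len);
        s->rd += len;
        done += len;
    }
    return 0;
}
```

With a 10-byte device memory and a 4-byte region, save_devmem() emits three chunks (4, 4 and 2 bytes), and load_devmem() on the same stream restores the original bytes. The loader in this sketch rejects zero-length or out-of-range chunks instead of trusting the stream, which is a design choice of the illustration rather than something the patch does.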