From patchwork Tue Feb 19 08:52:14 2019
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 10819499
From: Yan Zhao
To: alex.williamson@redhat.com, qemu-devel@nongnu.org
Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao, dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com
Date: Tue, 19 Feb 2019 16:52:14 +0800
Message-Id: <1550566334-3602-1-git-send-email-yan.y.zhao@intel.com>
In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
Subject: [Qemu-devel] [PATCH 1/5] vfio/migration: define kernel interfaces

- defined 4 device state regions: one control region and 3 data regions
- defined layout of control region in struct vfio_device_state_ctl
- defined 4 device states: running, stop, running&logging, stop&logging
- defined 3 device data categories: device config, device memory, system memory
- defined 2 device data capabilities: device memory and system memory
- defined device state interfaces' version and 12 device state interfaces

Signed-off-by: Yan Zhao
Signed-off-by: Kevin Tian
Signed-off-by: Yulei Zhang
---
 linux-headers/linux/vfio.h | 260 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 260 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index ceb6453..a124fc1 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -303,6 +303,56 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG (2)
 #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG (3)

+/* Device State region type and sub-type
+ *
+ * A VFIO device driver needs to register up to four device state regions in
+ * total: two mandatory and another two optional, if it plans to support device
+ * state management.
+ *
+ * 1. region CTL:
+ *    Mandatory.
+ *    This is a control region.
+ *    Its layout is defined in struct vfio_device_state_ctl.
+ *    Reading from this region can get version, capabilities and data
+ *    size of device state interfaces.
+ *    Writing to this region can set device state, data size and
+ *    choose which interface to use.
+ * 2. region DEVICE_CONFIG
+ *    Mandatory.
+ *    This is a data region that holds device config data.
+ *    Device config is data such as MMIOs, page tables...
+ *    Every device is supposed to possess device config data.
+ *    Usually the size of device config data is small (no bigger
+ *    than 10M), and it needs to be loaded in a certain strict
+ *    order.
+ *    Therefore no dirty data logging is enabled for device
+ *    config and it must be got/set as a whole.
+ *    Size of device config data is smaller than or equal to that of
+ *    device config region.
+ *    It can be mmapped into user space.
+ * 3. region DEVICE_MEMORY
+ *    Optional.
+ *    This is a data region that holds device memory data.
+ *    Device memory is device's internal memory, standalone and outside
+ *    system memory. It is usually very big.
+ *    Not all devices have device memory. E.g., IGD only uses system
+ *    memory and has no device memory.
+ *    Size of device memory is usually larger than that of device
+ *    memory region. qemu needs to save/load it in chunks of size of
+ *    device memory region.
+ *    It can be mmapped into user space.
+ * 4. region DIRTY_BITMAP
+ *    Optional.
+ *    This is a data region that holds bitmap of dirty pages in system
+ *    memory that a VFIO device produces.
+ *    It can be mmapped into user space.
+ */
+#define VFIO_REGION_TYPE_DEVICE_STATE (1 << 1)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL (1)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG (2)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_MEMORY (3)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP (4)
+
 /*
  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
  * which allows direct access to non-MSIX registers which happened to be within
@@ -816,6 +866,216 @@ struct vfio_iommu_spapr_tce_remove {
 };
 #define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 20)

+/* version number of the device state interface */
+#define VFIO_DEVICE_STATE_INTERFACE_VERSION 1
+
+/*
+ * For devices that have device memory, it is required to expose
+ * DEVICE_MEMORY capability.
+ *
+ * For devices producing dirty pages in system memory, it is required to
+ * expose cap SYSTEM_MEMORY in order to get dirty bitmap in certain range
+ * of system memory.
+ */
+#define VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY 1
+#define VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY 2
+
+/*
+ * DEVICE STATES
+ *
+ * Four states are defined for a VFIO device:
+ * RUNNING, RUNNING & LOGGING, STOP & LOGGING, STOP.
+ * They can be set by writing to device_state field of
+ * vfio_device_state_ctl region.
+ *
+ * RUNNING: In this state, a VFIO device is in active state ready to
+ * receive commands from device driver.
+ * It is the default state that a VFIO device enters initially.
+ *
+ * STOP: In this state, a VFIO device is deactivated and does not interact
+ * with the device driver.
+ *
+ * LOGGING state is a special state that CANNOT exist
+ * independently.
+ * It must be set alongside state RUNNING or STOP, i.e.,
+ * RUNNING & LOGGING, STOP & LOGGING.
+ * It is used for dirty data logging both for device memory
+ * and system memory.
+ *
+ * LOGGING only impacts device/system memory. In LOGGING state, get buffer
+ * of device memory returns dirty pages since last call; outside LOGGING
+ * state, get buffer of device memory returns whole snapshot of device
+ * memory. system memory's dirty page is only available in LOGGING state.
+ *
+ * Device config should be always accessible and return whole config snapshot
+ * regardless of LOGGING state.
+ * */
+#define VFIO_DEVICE_STATE_RUNNING 0
+#define VFIO_DEVICE_STATE_STOP 1
+#define VFIO_DEVICE_STATE_LOGGING 2
+
+/* action to get data from device memory or device config
+ * the action is written to device state's control region, and data is read
+ * from device memory region or device config region.
+ * Each time before reading device memory region or device config region,
+ * action VFIO_DEVICE_DATA_ACTION_GET_BUFFER must be written to the action
+ * field in control region. That is because device memory and device config
+ * regions are mmapped into user space. vendor driver has to be notified of
+ * the GET_BUFFER action in advance.
+ */
+#define VFIO_DEVICE_DATA_ACTION_GET_BUFFER 1
+
+/* action to set data to device memory or device config
+ * the action is written to device state's control region, and data is
+ * written to device memory region or device config region.
+ * Each time after writing to device memory region or device config region,
+ * action VFIO_DEVICE_DATA_ACTION_SET_BUFFER must be written to the action
+ * field in control region. That is because device memory and device config
+ * regions are mmapped into user space. vendor driver has to be notified of
+ * the SET_BUFFER action after data is written.
+ */
+#define VFIO_DEVICE_DATA_ACTION_SET_BUFFER 2
+
+/* layout of device state interfaces' control region
+ * By reading from/writing to control region and reading/writing data from
+ * device config region, device memory region, system memory regions, below
+ * interfaces can be implemented:
+ *
+ * 1. get version
+ *    (1) user space calls read system call on "version" field of control
+ *        region.
+ *    (2) vendor driver writes version number of device state interfaces
+ *        to the "version" field of control region.
+ *
+ * 2. get caps
+ *    (1) user space calls read system call on "caps" field of control region.
+ *    (2) if a VFIO device has huge device memory, vendor driver reports
+ *        VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY in "caps" field of control region.
+ *        if a VFIO device produces dirty pages in system memory, vendor driver
+ *        reports VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY in "caps" field of
+ *        control region.
+ *
+ * 3. set device state
+ *    (1) user space calls write system call on "device_state" field of
+ *        control region.
+ *    (2) device state transitions as:
+ *
+ *    RUNNING -- start dirty data logging --> RUNNING & LOGGING
+ *    RUNNING -- deactivate --> STOP
+ *    RUNNING -- deactivate & start dirty data logging --> STOP & LOGGING
+ *    RUNNING & LOGGING -- stop dirty data logging --> RUNNING
+ *    RUNNING & LOGGING -- deactivate --> STOP & LOGGING
+ *    RUNNING & LOGGING -- deactivate & stop dirty data logging --> STOP
+ *    STOP -- activate --> RUNNING
+ *    STOP -- start dirty data logging --> STOP & LOGGING
+ *    STOP -- activate & start dirty data logging --> RUNNING & LOGGING
+ *    STOP & LOGGING -- stop dirty data logging --> STOP
+ *    STOP & LOGGING -- activate --> RUNNING & LOGGING
+ *    STOP & LOGGING -- activate & stop dirty data logging --> RUNNING
+ *
+ * 4. get device config size
+ *    (1) user space calls read system call on "device_config.size" field of
+ *        control region for the total size of device config snapshot.
+ *    (2) vendor driver writes device config data's total size in
+ *        "device_config.size" field of control region.
+ *
+ * 5. set device config size
+ *    (1) user space calls write system call.
+ *        total size of device config snapshot --> "device_config.size" field
+ *        of control region.
+ *    (2) vendor driver reads device config data's total size from
+ *        "device_config.size" field of control region.
+ *
+ * 6. get device config buffer
+ *    (1) user space calls write system call.
+ *        "GET_BUFFER" --> "device_config.action" field of control region.
+ *    (2) vendor driver
+ *        a. gets whole snapshot for device config
+ *        b. writes whole device config snapshot to region
+ *           DEVICE_CONFIG.
+ *    (3) user space reads the whole of device config snapshot from region
+ *        DEVICE_CONFIG.
+ *
+ * 7. set device config buffer
+ *    (1) user space writes whole of device config data to region
+ *        DEVICE_CONFIG.
+ *    (2) user space calls write system call.
+ *        "SET_BUFFER" --> "device_config.action" field of control region.
+ *    (3) vendor driver loads whole of device config from region DEVICE_CONFIG.
+ *
+ * 8. get device memory size
+ *    (1) user space calls read system call on "device_memory.size" field of
+ *        control region for device memory size.
+ *    (2) vendor driver
+ *        a. gets device memory snapshot (in state RUNNING or STOP), or
+ *           gets device memory dirty data (in state RUNNING & LOGGING or
+ *           state STOP & LOGGING)
+ *        b. writes size in "device_memory.size" field of control region
+ *
+ * 9. set device memory size
+ *    (1) user space calls write system call on "device_memory.size" field of
+ *        control region to set total size of device memory snapshot.
+ *    (2) vendor driver reads device memory's size from "device_memory.size"
+ *        field of control region.
+ *
+ *
+ * 10. get device memory buffer
+ *    (1) user space calls write system call.
+ *        pos --> "device_memory.pos" field of control region,
+ *        "GET_BUFFER" --> "device_memory.action" field of control region.
+ *        (pos must be 0 or multiples of length of region DEVICE_MEMORY).
+ *    (2) vendor driver writes N'th chunk of device memory snapshot/dirty data
+ *        to region DEVICE_MEMORY.
+ *        (N equals pos/(region length of DEVICE_MEMORY))
+ *    (3) user space reads the N'th chunk of device memory snapshot/dirty data
+ *        from region DEVICE_MEMORY.
+ *
+ * 11. set device memory buffer
+ *    (1) user space writes N'th chunk of device memory snapshot/dirty data to
+ *        region DEVICE_MEMORY.
+ *        (N equals pos/(region length of DEVICE_MEMORY))
+ *    (2) user space writes pos to "device_memory.pos" field and writes
+ *        "SET_BUFFER" to "device_memory.action" field of control region.
+ *    (3) vendor driver loads N'th chunk of device memory snapshot/dirty data
+ *        from region DEVICE_MEMORY.
+ *
+ * 12. get system memory dirty bitmap
+ *    (1) user space calls write system call to specify a range of system
+ *        memory in which to query dirty pages.
+ *        system memory's start address --> "system_memory.start_addr" field
+ *        of control region,
+ *        system memory's page count --> "system_memory.page_nr" field of
+ *        control region.
+ *    (2) if device state is not RUNNING & LOGGING or STOP & LOGGING,
+ *        vendor driver returns empty bitmap; otherwise,
+ *        vendor driver checks the page_nr,
+ *        if it's larger than the size that region DIRTY_BITMAP can support,
+ *        an error is returned; if not,
+ *        vendor driver returns a bitmap specifying the dirty pages that the
+ *        device has produced since last query in this range of system memory.
+ *    (3) user space reads back the dirty bitmap from region DIRTY_BITMAP.
+ *
+ */
+
+struct vfio_device_state_ctl {
+	__u32 version; /* ro, version of device state interfaces */
+	__u32 device_state; /* VFIO device state, wo */
+	__u32 caps; /* ro */
+	struct {
+		__u32 action; /* wo, GET_BUFFER or SET_BUFFER */
+		__u64 size; /* rw, total size of device config */
+	} device_config;
+	struct {
+		__u32 action; /* wo, GET_BUFFER or SET_BUFFER */
+		__u64 size; /* rw, total size of device memory */
+		__u64 pos; /* chunk offset in total buffer of device memory */
+	} device_memory;
+	struct {
+		__u64 start_addr; /* wo */
+		__u64 page_nr; /* wo */
+	} system_memory;
+}__attribute__((packed));
+
 /* ***************************************************************** */

 #endif /* VFIO_H */

From patchwork Tue Feb 19 08:52:27 2019
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 10819501
From: Yan Zhao
To: alex.williamson@redhat.com, qemu-devel@nongnu.org
Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao, dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com
Date: Tue, 19 Feb 2019 16:52:27 +0800
Message-Id: <1550566347-3648-1-git-send-email-yan.y.zhao@intel.com>
In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
Subject: [Qemu-devel] [PATCH 2/5] vfio/migration: support device of device config capability

Device config is the default data that every device should have, so the
device config capability is on by default and does not need to be set.

- Currently two types of resources are saved/loaded for a device with the
  device config capability: general PCI config data and device config data.
  They are copied as a whole when precopy is stopped.

Migration setup flow:
- Set up device state regions, check the device state version and
  capabilities. Mmap the Device Config Region and Dirty Bitmap Region, if
  available.
- If the device state regions fail to set up, a migration blocker is
  registered instead.
- Added SaveVMHandlers to register device state save/load handlers.
- Register VM state change handler to set device's running/stop states.
- On migration startup on source machine, set device's state to
  VFIO_DEVICE_STATE_LOGGING

Signed-off-by: Yan Zhao
Signed-off-by: Yulei Zhang
---
 hw/vfio/Makefile.objs         |   2 +-
 hw/vfio/migration.c           | 633 ++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c                 |   1 -
 hw/vfio/pci.h                 |  25 +-
 include/hw/vfio/vfio-common.h |   1 +
 5 files changed, 659 insertions(+), 3 deletions(-)
 create mode 100644 hw/vfio/migration.c

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 8b3f664..f32ff19 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,6 +1,6 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
-obj-$(CONFIG_PCI) += pci.o pci-quirks.o display.o
+obj-$(CONFIG_PCI) += pci.o pci-quirks.o display.o migration.o
 obj-$(CONFIG_VFIO_CCW) += ccw.o
 obj-$(CONFIG_SOFTMMU) += platform.o
 obj-$(CONFIG_VFIO_XGMAC) += calxeda-xgmac.o
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
new file mode 100644
index 0000000..16d6395
--- /dev/null
+++ b/hw/vfio/migration.c
@@ -0,0 +1,633 @@
+#include "qemu/osdep.h"
+
+#include "hw/vfio/vfio-common.h"
+#include "migration/blocker.h"
+#include "migration/register.h"
+#include "qapi/error.h"
+#include "pci.h"
+#include "sysemu/kvm.h"
+#include "exec/ram_addr.h"
+
+#define VFIO_SAVE_FLAG_SETUP 0
+#define VFIO_SAVE_FLAG_PCI 1
+#define VFIO_SAVE_FLAG_DEVCONFIG 2
+#define VFIO_SAVE_FLAG_DEVMEMORY 4
+#define VFIO_SAVE_FLAG_CONTINUE 8
+
+static int vfio_device_state_region_setup(VFIOPCIDevice *vdev,
+        VFIORegion *region, uint32_t subtype, const char *name)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    struct vfio_region_info *info;
+    int ret;
+
+    ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_DEVICE_STATE,
+            subtype, &info);
+    if (ret) {
+        error_report("Failed to get info of region %s", name);
+        return ret;
+    }
+
+    if (vfio_region_setup(OBJECT(vdev), vbasedev,
+            region, info->index, name)) {
+        error_report("Failed to setup migration region %s", name);
+        return -1;
+    }
+
+    if (vfio_region_mmap(region)) {
+        error_report("Failed to mmap migration region %s", name);
+    }
+
+    return 0;
+}
+
+bool vfio_device_data_cap_system_memory(VFIOPCIDevice *vdev)
+{
+    return !!(vdev->migration->data_caps & VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY);
+}
+
+bool vfio_device_data_cap_device_memory(VFIOPCIDevice *vdev)
+{
+    return !!(vdev->migration->data_caps & VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY);
+}
+
+static bool vfio_device_state_region_mmaped(VFIORegion *region)
+{
+    bool mmaped = true;
+    if (region->nr_mmaps != 1 || region->mmaps[0].offset ||
+            (region->size != region->mmaps[0].size) ||
+            (region->mmaps[0].mmap == NULL)) {
+        mmaped = false;
+    }
+
+    return mmaped;
+}
+
+static int vfio_get_device_config_size(VFIOPCIDevice *vdev)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_config =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG];
+    uint64_t len;
+    int sz;
+
+    sz = sizeof(len);
+    if (pread(vbasedev->fd, &len, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_config.size))
+            != sz) {
+        error_report("vfio: Failed to get length of device config");
+        return -1;
+    }
+    if (len > region_config->size) {
+        error_report("vfio: Error device config length");
+        return -1;
+    }
+    vdev->migration->devconfig_size = len;
+
+    return 0;
+}
+
+static int vfio_set_device_config_size(VFIOPCIDevice *vdev, uint64_t size)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_config =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG];
+    int sz;
+
+    if (size > region_config->size) {
+        return -1;
+    }
+
+    sz = sizeof(size);
+    if (pwrite(vbasedev->fd, &size, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_config.size))
+            != sz) {
+        error_report("vfio: Failed to set length of device config");
+        return -1;
+    }
+    vdev->migration->devconfig_size = size;
+    return 0;
+}
+
+static int vfio_save_data_device_config(VFIOPCIDevice *vdev, QEMUFile *f)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_config =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG];
+    void *dest;
+    uint32_t sz;
+    uint8_t *buf = NULL;
+    uint32_t action = VFIO_DEVICE_DATA_ACTION_GET_BUFFER;
+    uint64_t len = vdev->migration->devconfig_size;
+
+    qemu_put_be64(f, len);
+
+    sz = sizeof(action);
+    if (pwrite(vbasedev->fd, &action, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_config.action))
+            != sz) {
+        error_report("vfio: action failure for device config get buffer");
+        return -1;
+    }
+
+    if (!vfio_device_state_region_mmaped(region_config)) {
+        buf = g_malloc(len);
+        if (buf == NULL) {
+            error_report("vfio: Failed to allocate memory for migrate");
+            return -1;
+        }
+        if (pread(vbasedev->fd, buf, len, region_config->fd_offset) != len) {
+            error_report("vfio: Failed to read device config buffer");
+            return -1;
+        }
+        qemu_put_buffer(f, buf, len);
+        g_free(buf);
+    } else {
+        dest = region_config->mmaps[0].mmap;
+        qemu_put_buffer(f, dest, len);
+    }
+    return 0;
+}
+
+static int vfio_load_data_device_config(VFIOPCIDevice *vdev,
+        QEMUFile *f, uint64_t len)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_config =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG];
+    void *dest;
+    uint32_t sz;
+    uint8_t *buf = NULL;
+    uint32_t action = VFIO_DEVICE_DATA_ACTION_SET_BUFFER;
+
+    vfio_set_device_config_size(vdev, len);
+
+    if (!vfio_device_state_region_mmaped(region_config)) {
+        buf = g_malloc(len);
+        if (buf == NULL) {
+            error_report("vfio: Failed to allocate memory for migrate");
+            return -1;
+        }
+        qemu_get_buffer(f, buf, len);
+        if (pwrite(vbasedev->fd, buf, len,
+                    region_config->fd_offset) != len) {
+            error_report("vfio: Failed to write device config buffer");
+            return -1;
+        }
+        g_free(buf);
+    } else {
+        dest = region_config->mmaps[0].mmap;
+        qemu_get_buffer(f, dest, len);
+    }
+
+    sz = sizeof(action);
+    if (pwrite(vbasedev->fd, &action, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_config.action))
+            != sz) {
+        error_report("vfio: action failure for device config set buffer");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int vfio_set_dirty_page_bitmap_chunk(VFIOPCIDevice *vdev,
+        uint64_t start_addr, uint64_t page_nr)
+{
+
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_bitmap =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_BITMAP];
+    unsigned long bitmap_size =
+        BITS_TO_LONGS(page_nr) * sizeof(unsigned long);
+    uint32_t sz;
+
+    struct {
+        __u64 start_addr;
+        __u64 page_nr;
+    } system_memory;
+    system_memory.start_addr = start_addr;
+    system_memory.page_nr = page_nr;
+    sz = sizeof(system_memory);
+    if (pwrite(vbasedev->fd, &system_memory, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, system_memory))
+            != sz) {
+        error_report("vfio: Failed to set system memory range for dirty pages");
+        return -1;
+    }
+
+    if (!vfio_device_state_region_mmaped(region_bitmap)) {
+        void *bitmap = g_malloc0(bitmap_size);
+
+        if (pread(vbasedev->fd, bitmap, bitmap_size,
+                    region_bitmap->fd_offset) != bitmap_size) {
+            error_report("vfio: Failed to read dirty bitmap data");
+            return -1;
+        }
+
+        cpu_physical_memory_set_dirty_lebitmap(bitmap, start_addr, page_nr);
+
+        g_free(bitmap);
+    } else {
+        cpu_physical_memory_set_dirty_lebitmap(
+                region_bitmap->mmaps[0].mmap,
+                start_addr, page_nr);
+    }
+    return 0;
+}
+
+int vfio_set_dirty_page_bitmap(VFIOPCIDevice *vdev,
+        uint64_t start_addr, uint64_t page_nr)
+{
+    VFIORegion *region_bitmap =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_BITMAP];
+    unsigned long chunk_size = region_bitmap->size;
+    uint64_t chunk_pg_nr = (chunk_size / sizeof(unsigned long)) *
+        BITS_PER_LONG;
+
+    uint64_t cnt_left;
+    int rc = 0;
+
+    cnt_left = page_nr;
+
+    while (cnt_left >= chunk_pg_nr) {
+        rc = vfio_set_dirty_page_bitmap_chunk(vdev, start_addr, chunk_pg_nr);
+        if (rc) {
+            goto exit;
+        }
+        cnt_left -= chunk_pg_nr;
+        start_addr += start_addr;
+    }
+    rc = vfio_set_dirty_page_bitmap_chunk(vdev, start_addr, cnt_left);
+
+exit:
+    return rc;
+}
+
+static int vfio_set_device_state(VFIOPCIDevice *vdev,
+        uint32_t dev_state)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    uint32_t sz = sizeof(dev_state);
+
+    if (!vdev->migration) {
+        return -1;
+    }
+
+    if (pwrite(vbasedev->fd, &dev_state, sz,
+                region->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_state))
+            != sz) {
+        error_report("vfio: Failed to set device state %d", dev_state);
+        return -1;
+    }
+    vdev->migration->device_state = dev_state;
+    return 0;
+}
+
+static int vfio_get_device_data_caps(VFIOPCIDevice *vdev)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+
+    uint32_t caps;
+    uint32_t size = sizeof(caps);
+
+    if (pread(vbasedev->fd, &caps, size,
+                region->fd_offset +
+                offsetof(struct vfio_device_state_ctl, caps))
+            != size) {
+        error_report("%s Failed to read data caps of device states",
+                vbasedev->name);
+        return -1;
+    }
+    vdev->migration->data_caps = caps;
+    return 0;
+}
+
+
+static int vfio_check_devstate_version(VFIOPCIDevice *vdev)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+
+    uint32_t version;
+    uint32_t size = sizeof(version);
+
+    if (pread(vbasedev->fd, &version, size,
+                region->fd_offset +
+                offsetof(struct vfio_device_state_ctl, version))
+            != size) {
+        error_report("%s Failed to read version of device state interfaces",
+                vbasedev->name);
+        return -1;
+    }
+
+    if (version != VFIO_DEVICE_STATE_INTERFACE_VERSION) {
+        error_report("%s migration version mismatch, right version is %d",
+                vbasedev->name, VFIO_DEVICE_STATE_INTERFACE_VERSION);
+        return -1;
+    }
+
+    return 0;
+}
+
+static void vfio_vm_change_state_handler(void *pv, int running, RunState state)
+{
+    VFIOPCIDevice *vdev = pv;
+    uint32_t dev_state = vdev->migration->device_state;
+
+    if (!running) {
+        dev_state |= VFIO_DEVICE_STATE_STOP;
+    } else {
+        dev_state &= ~VFIO_DEVICE_STATE_STOP;
+    }
+
+    vfio_set_device_state(vdev, dev_state);
+}
+
+static void vfio_save_live_pending(QEMUFile *f, void *opaque,
+                                   uint64_t max_size,
+                                   uint64_t *res_precopy_only,
+                                   uint64_t *res_compatible,
+                                   uint64_t *res_post_copy_only)
+{
+    VFIOPCIDevice *vdev = opaque;
+
+    if (!vfio_device_data_cap_device_memory(vdev)) {
+        return;
+    }
+
+    return;
+}
+
+static int vfio_save_iterate(QEMUFile *f, void *opaque)
+{
+    VFIOPCIDevice *vdev = opaque;
+
+    if (!vfio_device_data_cap_device_memory(vdev)) {
+        return 0;
+    }
+
+    return 0;
+}
+
+static void vfio_pci_load_config(VFIOPCIDevice *vdev, QEMUFile *f)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    uint32_t ctl, msi_lo, msi_hi, msi_data, bar_cfg, i;
+    bool msi_64bit;
+
+    /* restore pci bar configuration */
+    ctl = pci_default_read_config(pdev, PCI_COMMAND, 2);
+    vfio_pci_write_config(pdev, PCI_COMMAND,
+            ctl & (~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY)), 2);
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        bar_cfg = qemu_get_be32(f);
+        vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar_cfg, 4);
+    }
+    vfio_pci_write_config(pdev, PCI_COMMAND,
+            ctl | PCI_COMMAND_IO | PCI_COMMAND_MEMORY, 2);
+
+    /* restore msi configuration */
+    ctl = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
+    msi_64bit = !!(ctl & PCI_MSI_FLAGS_64BIT);
+
+    vfio_pci_write_config(&vdev->pdev,
+            pdev->msi_cap + PCI_MSI_FLAGS,
+            ctl & (~PCI_MSI_FLAGS_ENABLE), 2);
+
+    msi_lo = qemu_get_be32(f);
+    vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO, msi_lo, 4);
+
+    if (msi_64bit) {
+        msi_hi = qemu_get_be32(f);
+        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI,
+                msi_hi, 4);
+    }
+    msi_data = qemu_get_be32(f);
+    vfio_pci_write_config(pdev,
+            pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
+            msi_data, 2);
+
+    vfio_pci_write_config(&vdev->pdev, pdev->msi_cap + PCI_MSI_FLAGS,
+            ctl | PCI_MSI_FLAGS_ENABLE, 2);
+
+}
+
+static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
+{
+    VFIOPCIDevice *vdev = opaque;
+    int flag;
+    uint64_t len;
+    int ret = 0;
+
+    if (version_id != VFIO_DEVICE_STATE_INTERFACE_VERSION) {
+        return -EINVAL;
+    }
+
+    do {
+        flag = qemu_get_byte(f);
+
+        switch (flag & ~VFIO_SAVE_FLAG_CONTINUE) {
+        case VFIO_SAVE_FLAG_SETUP:
+            break;
+        case VFIO_SAVE_FLAG_PCI:
+            vfio_pci_load_config(vdev, f);
+            break;
+        case VFIO_SAVE_FLAG_DEVCONFIG:
+            len = qemu_get_be64(f);
+            vfio_load_data_device_config(vdev, f, len);
+            break;
+        default:
+            ret = -EINVAL;
+        }
+    } while (flag & VFIO_SAVE_FLAG_CONTINUE);
+
+    return ret;
+}
+
+static void vfio_pci_save_config(VFIOPCIDevice *vdev, QEMUFile *f)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    uint32_t msi_cfg, msi_lo, msi_hi, msi_data, bar_cfg, i;
+    bool msi_64bit;
+
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        bar_cfg = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
+        qemu_put_be32(f, bar_cfg);
+    }
+
+    msi_cfg = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
+    msi_64bit = !!(msi_cfg & PCI_MSI_FLAGS_64BIT);
+
+    msi_lo = pci_default_read_config(pdev,
+            pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
+    qemu_put_be32(f, msi_lo);
+
+    if (msi_64bit) {
+        msi_hi = pci_default_read_config(pdev,
+                pdev->msi_cap + PCI_MSI_ADDRESS_HI,
+                4);
+        qemu_put_be32(f, msi_hi);
+    }
+
+    msi_data = pci_default_read_config(pdev,
+            pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
+            2);
+    qemu_put_be32(f, msi_data);
+
+}
+
+static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
+{
+    VFIOPCIDevice *vdev = opaque;
+    int rc = 0;
+
+    qemu_put_byte(f, VFIO_SAVE_FLAG_PCI | VFIO_SAVE_FLAG_CONTINUE);
+    vfio_pci_save_config(vdev, f);
+
+    qemu_put_byte(f, VFIO_SAVE_FLAG_DEVCONFIG);
+    rc += vfio_get_device_config_size(vdev);
+    rc += vfio_save_data_device_config(vdev, f);
+
+    return rc;
+}
+
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+    VFIOPCIDevice *vdev = opaque;
+    qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP);
+
+    vfio_set_device_state(vdev, VFIO_DEVICE_STATE_RUNNING |
+            VFIO_DEVICE_STATE_LOGGING);
+    return 0;
+}
+
+static int vfio_load_setup(QEMUFile *f, void *opaque)
+{
+    return 0;
+}
+
+static void vfio_save_cleanup(void *opaque)
+{
+    VFIOPCIDevice *vdev = opaque;
+    uint32_t dev_state = vdev->migration->device_state;
+
+    dev_state &= ~VFIO_DEVICE_STATE_LOGGING;
+
+    vfio_set_device_state(vdev, dev_state);
+}
+
+static SaveVMHandlers savevm_vfio_handlers = {
+    .save_setup = vfio_save_setup,
+    .save_live_pending = vfio_save_live_pending,
+    .save_live_iterate = vfio_save_iterate,
+    .save_live_complete_precopy = vfio_save_complete_precopy,
+    .save_cleanup = vfio_save_cleanup,
+    .load_setup = vfio_load_setup,
+    .load_state = vfio_load_state,
+};
+
+int vfio_migration_init(VFIOPCIDevice *vdev, Error **errp)
+{
+    int ret;
+    Error *local_err = NULL;
+    vdev->migration = g_new0(VFIOMigration, 1);
+
+    if (vfio_device_state_region_setup(vdev,
+              &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL],
+              VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL,
+              "device-state-ctl")) {
+        goto error;
+    }
+
+    if (vfio_check_devstate_version(vdev)) {
+        goto error;
+    }
+
+    if (vfio_get_device_data_caps(vdev)) {
+        goto error;
+    }
+
+    if (vfio_device_state_region_setup(vdev,
+              &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG],
+              VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG,
+              "device-state-data-device-config")) {
+        goto error;
+    }
+
+    if (vfio_device_data_cap_device_memory(vdev)) {
+        error_report("No suppport of data cap device memory Yet");
+        goto error;
+    }
+
+    if (vfio_device_data_cap_system_memory(vdev) &&
+            vfio_device_state_region_setup(vdev,
+              &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_BITMAP],
+              VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP,
+              "device-state-data-dirtybitmap")) {
+        goto error;
+    }
+
+    vdev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+
+    register_savevm_live(NULL, TYPE_VFIO_PCI, -1,
+            VFIO_DEVICE_STATE_INTERFACE_VERSION,
+            &savevm_vfio_handlers,
+            vdev);
+
+    vdev->migration->vm_state =
+        qemu_add_vm_change_state_handler(vfio_vm_change_state_handler, vdev);
+
+    return 0;
+error:
+    error_setg(&vdev->migration_blocker,
+            "VFIO device doesn't support migration");
+    ret = migrate_add_blocker(vdev->migration_blocker, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        error_free(vdev->migration_blocker);
+    }
+
+    g_free(vdev->migration);
+    vdev->migration = NULL;
+
+    return ret;
+}
+
+void vfio_migration_finalize(VFIOPCIDevice *vdev)
+{
+    if (vdev->migration) {
+        int i;
+        qemu_del_vm_change_state_handler(vdev->migration->vm_state);
+        unregister_savevm(NULL, TYPE_VFIO_PCI, vdev);
+        for (i = 0; i < VFIO_DEVSTATE_REGION_NUM; i++) {
+            vfio_region_finalize(&vdev->migration->region[i]);
+        }
+        g_free(vdev->migration);
+        vdev->migration = NULL;
+    } else if (vdev->migration_blocker) {
+        migrate_del_blocker(vdev->migration_blocker);
+        error_free(vdev->migration_blocker);
+    }
+}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c0cb1ec..b8e006b 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -37,7 +37,6 @@
 
 #define MSIX_CAP_LENGTH 12
 
-#define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj) OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index b1ae4c0..4b7b1bb 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -19,6 +19,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
 #include "qemu/timer.h"
+#include "sysemu/sysemu.h"
 
 #define PCI_ANY_ID (~0)
 
@@ -56,6 +57,21 @@ typedef struct VFIOBAR {
     QLIST_HEAD(, VFIOQuirk) quirks;
 } VFIOBAR;
 
+enum {
+    VFIO_DEVSTATE_REGION_CTL = 0,
+    VFIO_DEVSTATE_REGION_DATA_CONFIG,
+    VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY,
+    VFIO_DEVSTATE_REGION_DATA_BITMAP,
+    VFIO_DEVSTATE_REGION_NUM,
+};
+typedef struct VFIOMigration {
+    VFIORegion region[VFIO_DEVSTATE_REGION_NUM];
+    uint32_t data_caps;
+    uint32_t device_state;
+    uint64_t devconfig_size;
+    VMChangeStateEntry *vm_state;
+} VFIOMigration;
+
 typedef struct VFIOVGARegion {
     MemoryRegion mem;
     off_t offset;
@@ -132,6 +148,8 @@ typedef struct VFIOPCIDevice {
     VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */
     VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */
     void *igd_opregion;
+    VFIOMigration *migration;
+    Error *migration_blocker;
     PCIHostDeviceAddress host;
     EventNotifier err_notifier;
     EventNotifier req_notifier;
@@ -198,5 +216,10 @@ int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
 void vfio_display_reset(VFIOPCIDevice *vdev);
 int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp);
 void vfio_display_finalize(VFIOPCIDevice *vdev);
-
+bool vfio_device_data_cap_system_memory(VFIOPCIDevice *vdev);
+bool vfio_device_data_cap_device_memory(VFIOPCIDevice *vdev);
+int vfio_set_dirty_page_bitmap(VFIOPCIDevice *vdev,
+        uint64_t start_addr, uint64_t page_nr);
+int vfio_migration_init(VFIOPCIDevice *vdev, Error **errp);
+void vfio_migration_finalize(VFIOPCIDevice *vdev);
 #endif /* HW_VFIO_VFIO_PCI_H */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1b434d0..ed43613 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -32,6 +32,7 @@
 #endif
 
 #define VFIO_MSG_PREFIX "vfio %s: "
+#define TYPE_VFIO_PCI "vfio-pci"
 
 enum {
     VFIO_DEVICE_TYPE_PCI = 0,

From patchwork Tue Feb 19 08:52:41 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 10819493
From: Yan Zhao
To: alex.williamson@redhat.com, qemu-devel@nongnu.org
Date: Tue, 19 Feb 2019 16:52:41 +0800
Message-Id: <1550566361-3697-1-git-send-email-yan.y.zhao@intel.com>
In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
Subject: [Qemu-devel] [PATCH 3/5] vfio/migration: tracking of dirty page in system memory
Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao, dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com

Register the log_sync interface to hook into ram's live migration callbacks.

ram_save_pending
    |->migration_bitmap_sync
        |->memory_global_dirty_log_sync
            |->memory_region_sync_dirty_bitmap
                |->listener->log_sync(listener, &mrs);

So the dirty pages produced by the vfio device in system memory will be
saved/loaded iteratively by ram's live migration code.
The bitmap of the device's dirty pages in system memory is retrieved from
the Dirty Bitmap Region.

Signed-off-by: Yan Zhao
Signed-off-by: Yulei Zhang
---
 hw/vfio/common.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7c185e5a..719e750 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -27,6 +27,7 @@
 
 #include "hw/vfio/vfio-common.h"
 #include "hw/vfio/vfio.h"
+#include "hw/vfio/pci.h"
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
 #include "hw/hw.h"
@@ -698,9 +699,34 @@ static void vfio_listener_region_del(MemoryListener *listener,
     }
 }
 
+static void vfio_log_sync(MemoryListener *listener,
+                          MemoryRegionSection *section)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    VFIOGroup *group = QLIST_FIRST(&container->group_list);
+    VFIODevice *vbasedev;
+    VFIOPCIDevice *vdev;
+
+    ram_addr_t size = int128_get64(section->size);
+    uint64_t page_nr = size >> TARGET_PAGE_BITS;
+    uint64_t start_addr = section->offset_within_address_space;
+
+    QLIST_FOREACH(vbasedev, &group->device_list, next) {
+        vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+        if (!vdev->migration ||
+            !vfio_device_data_cap_system_memory(vdev) ||
+            !(vdev->migration->device_state & VFIO_DEVICE_STATE_LOGGING)) {
+            continue;
+        }
+
+        vfio_set_dirty_page_bitmap(vdev, start_addr, page_nr);
+    }
+}
+
 static const MemoryListener vfio_memory_listener = {
     .region_add = vfio_listener_region_add,
     .region_del = vfio_listener_region_del,
+    .log_sync = vfio_log_sync,
 };
 
 static void vfio_listener_release(VFIOContainer *container)

From patchwork Tue Feb 19 08:52:51 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 10819503
From: Yan Zhao
To: alex.williamson@redhat.com, qemu-devel@nongnu.org
Date: Tue, 19 Feb 2019 16:52:51 +0800
Message-Id: <1550566371-3743-1-git-send-email-yan.y.zhao@intel.com>
In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
Subject: [Qemu-devel] [PATCH 4/5] vfio/migration: turn on migration
Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao, dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com

Init vfio migration in vfio_realize() and register a migration blocker if
initialization fails.
Finalize all migration resources in vfio_instance_finalize().
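
For readers unfamiliar with the pattern, here is a minimal standalone sketch of "init, or fall back to a blocker, then finalize whichever exists". It is not QEMU code: `sketch_dev`, `sketch_migration_init` and `sketch_migration_finalize` are hypothetical names, and a plain string stands in for the `Error *` blocker object.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device: it owns either migration state
 * or a blocker, never both. */
struct sketch_dev {
    void *migration;        /* allocated only when migration is supported */
    const char *blocker;    /* set only when init failed */
};

/* Returns 0 when migration state was set up, -1 when a blocker was
 * registered instead. */
static int sketch_migration_init(struct sketch_dev *d, bool supported)
{
    if (supported) {
        d->migration = calloc(1, 64);
        if (d->migration) {
            return 0;
        }
    }
    d->blocker = "device doesn't support migration";
    return -1;
}

/* Releases whichever of the two resources init left behind. */
static void sketch_migration_finalize(struct sketch_dev *d)
{
    if (d->migration) {
        free(d->migration);
        d->migration = NULL;
    } else if (d->blocker) {
        d->blocker = NULL;
    }
}
```

The point of the sketch is only the invariant: after init exactly one of the two fields is set, so finalize can pick the matching cleanup path.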

Signed-off-by: Yan Zhao
Signed-off-by: Yulei Zhang
---
 hw/vfio/pci.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b8e006b..8bf625e 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3068,6 +3068,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         goto out_teardown;
     }
 
+    vfio_migration_init(vdev, errp);
+
     vfio_register_err_notifier(vdev);
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
@@ -3089,6 +3091,7 @@ static void vfio_instance_finalize(Object *obj)
 
     vfio_display_finalize(vdev);
     vfio_bars_finalize(vdev);
+    vfio_migration_finalize(vdev);
     g_free(vdev->emulated_config_bits);
     g_free(vdev->rom);
     /*
@@ -3221,11 +3224,6 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static const VMStateDescription vfio_pci_vmstate = {
-    .name = "vfio-pci",
-    .unmigratable = 1,
-};
-
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3233,7 +3231,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 
     dc->reset = vfio_pci_reset;
     dc->props = vfio_pci_dev_properties;
-    dc->vmsd = &vfio_pci_vmstate;
     dc->desc = "VFIO-based PCI device assignment";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     pdc->realize = vfio_realize;

From patchwork Tue Feb 19 08:53:00 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yan Zhao
X-Patchwork-Id: 10819497
From: Yan Zhao
To: alex.williamson@redhat.com, qemu-devel@nongnu.org
Date: Tue, 19 Feb 2019 16:53:00 +0800
Message-Id: <1550566380-3788-1-git-send-email-yan.y.zhao@intel.com>
In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
Subject: [Qemu-devel] [PATCH 5/5] vfio/migration: support device memory capability
Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao, dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com

If a device has the device memory capability, save/load data from device
memory in the pre-copy and stop-and-copy phases.

The LOGGING state is set for device memory for dirty page logging:
in LOGGING state, get device memory returns a whole device memory snapshot;
outside LOGGING state, get device memory returns dirty data since the last
get operation.

Usually, device memory is very big, so qemu needs to chunk it into several
pieces, each at most the size of the device memory region.
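
The chunking described above can be sketched in isolation as follows. This is a minimal illustration, not the code in this patch: `for_each_chunk`, `chunk_fn` and `sum_chunk` are hypothetical names, and the callback stands in for the per-chunk read from (or write to) the device memory region.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-chunk callback: handles bytes [pos, pos + len).
 * Returns 0 on success, nonzero on failure. */
typedef int (*chunk_fn)(uint64_t pos, uint64_t len, void *opaque);

/*
 * Walk total_len bytes in pieces of at most region_size bytes.
 * Returns the number of chunks processed, or -1 on error.
 */
static int for_each_chunk(uint64_t total_len, uint64_t region_size,
                          chunk_fn fn, void *opaque)
{
    uint64_t pos = 0;
    int chunks = 0;

    if (region_size == 0) {
        /* a zero-sized window can only carry an empty payload */
        return total_len == 0 ? 0 : -1;
    }

    while (pos < total_len) {
        uint64_t len = total_len - pos;

        if (len > region_size) {
            len = region_size;
        }
        if (fn(pos, len, opaque)) {
            return -1;
        }
        pos += len;     /* advance, otherwise the loop never terminates */
        chunks++;
    }

    return chunks;
}

/* Example callback: accumulates the chunk lengths it is handed. */
static int sum_chunk(uint64_t pos, uint64_t len, void *opaque)
{
    (void)pos;
    *(uint64_t *)opaque += len;
    return 0;
}
```

With a 10-byte payload and a 4-byte window this visits chunks of 4, 4 and 2 bytes; the last chunk is simply whatever remains.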

Signed-off-by: Yan Zhao
Signed-off-by: Kirti Wankhede
---
 hw/vfio/migration.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 hw/vfio/pci.h       |   1 +
 2 files changed, 231 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 16d6395..f1e9309 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -203,6 +203,201 @@ static int vfio_load_data_device_config(VFIOPCIDevice *vdev,
     return 0;
 }
 
+static int vfio_get_device_memory_size(VFIOPCIDevice *vdev)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    uint64_t len;
+    int sz;
+
+    sz = sizeof(len);
+    if (pread(vbasedev->fd, &len, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.size))
+            != sz) {
+        error_report("vfio: Failed to get length of device memory");
+        return -1;
+    }
+    vdev->migration->devmem_size = len;
+    return 0;
+}
+
+static int vfio_set_device_memory_size(VFIOPCIDevice *vdev, uint64_t size)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    int sz;
+
+    sz = sizeof(size);
+    if (pwrite(vbasedev->fd, &size, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.size))
+            != sz) {
+        error_report("vfio: Failed to set length of device memory");
+        return -1;
+    }
+    vdev->migration->devmem_size = size;
+    return 0;
+}
+
+static
+int vfio_save_data_device_memory_chunk(VFIOPCIDevice *vdev, QEMUFile *f,
+                                    uint64_t pos, uint64_t len)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_devmem =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY];
+    void *dest;
+    uint32_t sz;
+    uint8_t *buf = NULL;
+    uint32_t action = VFIO_DEVICE_DATA_ACTION_GET_BUFFER;
+
+    if (len > region_devmem->size) {
+        return -1;
+    }
+
+    sz = sizeof(pos);
+    if (pwrite(vbasedev->fd, &pos, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.pos))
+            != sz) {
+        error_report("vfio: Failed to set save buffer pos");
+        return -1;
+    }
+    sz = sizeof(action);
+    if (pwrite(vbasedev->fd, &action, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.action))
+            != sz) {
+        error_report("vfio: Failed to set save buffer action");
+        return -1;
+    }
+
+    if (!vfio_device_state_region_mmaped(region_devmem)) {
+        /* g_try_malloc() can fail, unlike g_malloc(), which aborts */
+        buf = g_try_malloc(len);
+        if (buf == NULL) {
+            error_report("vfio: Failed to allocate memory for migration");
+            return -1;
+        }
+        if (pread(vbasedev->fd, buf, len, region_devmem->fd_offset) != len) {
+            error_report("vfio: Failed to read device memory buffer");
+            g_free(buf);
+            return -1;
+        }
+        qemu_put_be64(f, len);
+        qemu_put_be64(f, pos);
+        qemu_put_buffer(f, buf, len);
+        g_free(buf);
+    } else {
+        dest = region_devmem->mmaps[0].mmap;
+        qemu_put_be64(f, len);
+        qemu_put_be64(f, pos);
+        qemu_put_buffer(f, dest, len);
+    }
+    return 0;
+}
+
+static int vfio_save_data_device_memory(VFIOPCIDevice *vdev, QEMUFile *f)
+{
+    VFIORegion *region_devmem =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY];
+    uint64_t total_len = vdev->migration->devmem_size;
+    uint64_t pos = 0;
+
+    qemu_put_be64(f, total_len);
+    while (pos < total_len) {
+        uint64_t len = region_devmem->size;
+        if (pos + len >= total_len) {
+            len = total_len - pos;
+        }
+        if (vfio_save_data_device_memory_chunk(vdev, f, pos, len)) {
+            return -1;
+        }
+        pos += len;
+    }
+
+    return 0;
+}
+
+static
+int vfio_load_data_device_memory_chunk(VFIOPCIDevice *vdev, QEMUFile *f,
+                                uint64_t pos, uint64_t len)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_devmem =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY];
+    void *dest;
+    uint32_t sz;
+    uint8_t *buf = NULL;
+    uint32_t action = VFIO_DEVICE_DATA_ACTION_SET_BUFFER;
+
+    if (len > region_devmem->size) {
+        return -1;
+    }
+
+    sz = sizeof(pos);
+    if (pwrite(vbasedev->fd, &pos, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.pos))
+            != sz) {
+        error_report("vfio: Failed to set device memory buffer pos");
+        return -1;
+    }
+    if (!vfio_device_state_region_mmaped(region_devmem)) {
+        /* g_try_malloc() can fail, unlike g_malloc(), which aborts */
+        buf = g_try_malloc(len);
+        if (buf == NULL) {
+            error_report("vfio: Failed to allocate memory for migration");
+            return -1;
+        }
+        qemu_get_buffer(f, buf, len);
+        if (pwrite(vbasedev->fd, buf, len,
+                    region_devmem->fd_offset) != len) {
+            error_report("vfio: Failed to load device memory buffer");
+            g_free(buf);
+            return -1;
+        }
+        g_free(buf);
+    } else {
+        dest = region_devmem->mmaps[0].mmap;
+        qemu_get_buffer(f, dest, len);
+    }
+
+    sz = sizeof(action);
+    if (pwrite(vbasedev->fd, &action, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_memory.action))
+            != sz) {
+        error_report("vfio: Failed to set load device memory buffer action");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int vfio_load_data_device_memory(VFIOPCIDevice *vdev,
+                        QEMUFile *f, uint64_t total_len)
+{
+    uint64_t pos = 0, len = 0;
+
+    vfio_set_device_memory_size(vdev, total_len);
+
+    while (pos + len < total_len) {
+        len = qemu_get_be64(f);
+        pos = qemu_get_be64(f);
+
+        vfio_load_data_device_memory_chunk(vdev, f, pos, len);
+    }
+
+    return 0;
+}
+
+
 static int vfio_set_dirty_page_bitmap_chunk(VFIOPCIDevice *vdev,
         uint64_t start_addr, uint64_t page_nr)
 {
@@ -377,6 +572,10 @@ static void vfio_save_live_pending(QEMUFile *f, void *opaque,
         return;
     }
 
+    /* get dirty data size of device memory */
+    vfio_get_device_memory_size(vdev);
+
+    *res_precopy_only += vdev->migration->devmem_size;
     return;
 }
 
@@ -388,7 +587,9 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque)
         return 0;
     }
 
-    return 0;
+    qemu_put_byte(f, VFIO_SAVE_FLAG_DEVMEMORY);
+    /* get dirty data of device memory */
+    return vfio_save_data_device_memory(vdev, f);
 }
 
 static void vfio_pci_load_config(VFIOPCIDevice *vdev, QEMUFile *f)
@@ -458,6 +659,10 @@ static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
             len = qemu_get_be64(f);
             vfio_load_data_device_config(vdev, f, len);
             break;
+        case VFIO_SAVE_FLAG_DEVMEMORY:
+            len = qemu_get_be64(f);
+            vfio_load_data_device_memory(vdev, f, len);
+            break;
         default:
             ret = -EINVAL;
         }
@@ -503,6 +708,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
     VFIOPCIDevice *vdev = opaque;
     int rc = 0;
 
+    if (vfio_device_data_cap_device_memory(vdev)) {
+        qemu_put_byte(f, VFIO_SAVE_FLAG_DEVMEMORY | VFIO_SAVE_FLAG_CONTINUE);
+        /* get dirty data of device memory */
+        vfio_get_device_memory_size(vdev);
+        rc = vfio_save_data_device_memory(vdev, f);
+    }
+
     qemu_put_byte(f, VFIO_SAVE_FLAG_PCI | VFIO_SAVE_FLAG_CONTINUE);
     vfio_pci_save_config(vdev, f);
 
@@ -515,12 +727,22 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
 
 static int vfio_save_setup(QEMUFile *f, void *opaque)
 {
+    int rc = 0;
     VFIOPCIDevice *vdev = opaque;
-    qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP);
+
+    if (vfio_device_data_cap_device_memory(vdev)) {
+        qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP | VFIO_SAVE_FLAG_CONTINUE);
+        qemu_put_byte(f, VFIO_SAVE_FLAG_DEVMEMORY);
+        /* get whole snapshot of device memory */
+        vfio_get_device_memory_size(vdev);
+        rc = vfio_save_data_device_memory(vdev, f);
+    } else {
+        qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP);
+    }
 
     vfio_set_device_state(vdev, VFIO_DEVICE_STATE_RUNNING |
             VFIO_DEVICE_STATE_LOGGING);
-    return 0;
+    return rc;
 }
 
 static int vfio_load_setup(QEMUFile *f, void *opaque)
@@ -576,8 +798,11 @@ int vfio_migration_init(VFIOPCIDevice *vdev, Error **errp)
         goto error;
     }
 
-    if (vfio_device_data_cap_device_memory(vdev)) {
-        error_report("No suppport of data cap device memory Yet");
+    if (vfio_device_data_cap_device_memory(vdev) &&
+            vfio_device_state_region_setup(vdev,
+              &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY],
+              VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_MEMORY,
+              "device-state-data-device-memory")) {
         goto error;
     }
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 4b7b1bb..a2cc64b 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -69,6 +69,7 @@ typedef struct VFIOMigration {
     uint32_t data_caps;
     uint32_t device_state;
     uint64_t devconfig_size;
+    uint64_t devmem_size;
     VMChangeStateEntry *vm_state;
 } VFIOMigration;
 
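
The save and load handlers in this series frame the stream as a sequence of sections, each introduced by a flag byte whose CONTINUE bit says whether another section follows. Below is a minimal standalone sketch of that framing over a byte buffer. The `SKETCH_FLAG_*` values and `count_sections` are hypothetical (the real VFIO_SAVE_FLAG_* values are defined elsewhere in the series), and the sections here carry no payload.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
enum {
    SKETCH_FLAG_SETUP    = 0x01,
    SKETCH_FLAG_PCI      = 0x02,
    SKETCH_FLAG_CONTINUE = 0x80,   /* another section follows this one */
};

/*
 * Count the sections in buf. Returns -1 on an unknown section type or
 * when the stream ends while the CONTINUE bit promised more data.
 */
static int count_sections(const uint8_t *buf, size_t n)
{
    size_t i = 0;
    int sections = 0;
    uint8_t flag;

    do {
        if (i >= n) {
            return -1;      /* truncated stream */
        }
        flag = buf[i++];

        /* mask off the CONTINUE bit to get the section type */
        switch (flag & ~SKETCH_FLAG_CONTINUE) {
        case SKETCH_FLAG_SETUP:
        case SKETCH_FLAG_PCI:
            sections++;
            break;
        default:
            return -1;      /* unknown section type */
        }
    } while (flag & SKETCH_FLAG_CONTINUE);

    return sections;
}
```

Unlike this sketch, vfio_load_state() above records -EINVAL for an unknown flag and keeps looping while the CONTINUE bit is set; bailing out early is a design choice made here for brevity.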