From patchwork Mon Dec 16 20:21:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kirti Wankhede X-Patchwork-Id: 11295305 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6EEEA109A for ; Mon, 16 Dec 2019 20:58:12 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 32A3921582 for ; Mon, 16 Dec 2019 20:58:12 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="oOuDhHab" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 32A3921582 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=nvidia.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:59962 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxRT-0004f0-Bx for patchwork-qemu-devel@patchwork.kernel.org; Mon, 16 Dec 2019 15:58:11 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:36863) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxKB-0002Jz-1B for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1igxK8-0007Fs-W3 for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:38 -0500 Received: from hqnvemgate25.nvidia.com ([216.228.121.64]:15791) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1igxK8-0007FU-Pi for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:36 -0500 Received: from hqpgpgate101.nvidia.com 
(Not Verified[216.228.121.13]) by hqnvemgate25.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Mon, 16 Dec 2019 12:50:26 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Mon, 16 Dec 2019 12:50:35 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Mon, 16 Dec 2019 12:50:35 -0800 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Mon, 16 Dec 2019 20:50:34 +0000 Received: from kwankhede-dev.nvidia.com (10.124.1.5) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Mon, 16 Dec 2019 20:50:27 +0000 From: Kirti Wankhede To: , Subject: [PATCH v10 Kernel 1/5] vfio: KABI for migration interface for device state Date: Tue, 17 Dec 2019 01:51:36 +0530 Message-ID: <1576527700-21805-2-git-send-email-kwankhede@nvidia.com> X-Mailer: git-send-email 2.7.0 In-Reply-To: <1576527700-21805-1-git-send-email-kwankhede@nvidia.com> References: <1576527700-21805-1-git-send-email-kwankhede@nvidia.com> X-NVConfidentiality: public MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1576529426; bh=IxVUQFInkIkXlBC++lY6n5LPiVRy+YaYSpOpnkQ5ZkQ=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:X-NVConfidentiality:MIME-Version: Content-Type; b=oOuDhHabqDlh3SmZBSKrqKnD5Mko69ynU2KZ5S6hGekQnmWSQT0l2KvaWRW0oJ8sA rinsout+G1kN5mS7TcRr/1XYFhftD6N88OSx/0yAn2EApf4HQdb47T4j4Fu/WF+L54 MraZbLtePTYyM7i1LuQqHBlPJf49ahpOVyJlSyFn3ltoHaKbS54DWFf2119B0ynhCS AgIL0/z65ljSOUr4q4p0Dn7jHBhSGQ/SKvAVkOmgFVdQGG1XwQI9IDPPNYOe6J/ylu rPTGEM1Evdp+D0GT6fgc3HPnGIfqPqaSPYEpt6FTSVB7brV8SGcrHyYIKSjkiJ71gm toBshCNdx7wsg== X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy] X-Received-From: 216.228.121.64 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: ,
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com, shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede , eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com, changpeng.liu@intel.com, Ken.Xue@amd.com
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

- Defined MIGRATION region type and sub-type.
- Defined vfio_device_migration_info structure which will be placed at 0th
  offset of migration region to get/set VFIO device related information.
  Defined members of structure and usage on read/write access.
- Defined device states and added state transition details in the comment.
- Added sequence to be followed while saving and resuming VFIO device state.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 include/uapi/linux/vfio.h | 180 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..a0817ba267c1 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK	(0xffff)
 #define VFIO_REGION_TYPE_GFX                    (1)
 #define VFIO_REGION_TYPE_CCW			(2)
+#define VFIO_REGION_TYPE_MIGRATION              (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,185 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD	(1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION           (1)
+
+/*
+ * Structure vfio_device_migration_info is placed at 0th offset of
+ * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related migration
+ * information. Field accesses from this structure are only supported at their
+ * native width and alignment, otherwise the result is undefined and vendor
+ * drivers should return an error.
+ *
+ * device_state: (read/write)
+ *      To indicate vendor driver the state VFIO device should be transitioned
+ *      to. If device state transition fails, write on this field returns error.
+ *      It consists of 3 bits:
+ *      - If bit 0 set, indicates _RUNNING state. When clear, that indicates
+ *        _STOP state. When device is changed to _STOP, driver should stop
+ *        device before write() returns.
+ *      - If bit 1 set, indicates _SAVING state. When set, that indicates driver
+ *        should start gathering device state information which will be provided
+ *        to VFIO user space application to save device's state.
+ *      - If bit 2 set, indicates _RESUMING state. When set, that indicates
+ *        prepare to resume device, data provided through migration region
+ *        should be used to resume device.
+ *      Bits 3 - 31 are reserved for future use. User should perform
+ *      read-modify-write operation on this field.
+ *
+ *  +------- _RESUMING
+ *  |+------ _SAVING
+ *  ||+----- _RUNNING
+ *  |||
+ *  000b => Device Stopped, not saving or resuming
+ *  001b => Device running state, default state
+ *  010b => Stop Device & save device state, stop-and-copy state
+ *  011b => Device running and save device state, pre-copy state
+ *  100b => Device stopped and device state is resuming
+ *  101b => Invalid state
+ *  110b => Invalid state
+ *  111b => Invalid state
+ *
+ * State transitions:
+ *
+ *              _RESUMING  _RUNNING    Pre-copy    Stop-and-copy   _STOP
+ *                (100b)     (001b)     (011b)        (010b)       (000b)
+ * 0. Running or Default state
+ *                             |
+ *
+ * 1. Normal Shutdown
+ *                             |------------------------------------->|
+ *
+ * 2. Save state or Suspend
+ *                             |------------------------->|---------->|
+ *
+ * 3. Save state during live migration
+ *                             |----------->|------------>|---------->|
+ *
+ * 4. Resuming
+ *                  |<---------|
+ *
+ * 5. Resumed
+ *                  |--------->|
+ *
+ * 0. Default state of VFIO device is _RUNNING when VFIO application starts.
+ * 1. During normal VFIO application shutdown, vfio device state changes
+ *    from _RUNNING to _STOP.
+ * 2. When VFIO application saves state or suspends application, VFIO device
+ *    state transition is from _RUNNING to stop-and-copy state and then to
+ *    _STOP.
+ *    On state transition from _RUNNING to stop-and-copy, driver must
+ *    stop device, save device state and send it to application through
+ *    migration region.
+ *    On _RUNNING to stop-and-copy state transition failure, application should
+ *    set VFIO device state to _RUNNING.
+ * 3. In VFIO application live migration, state transition is from _RUNNING
+ *    to pre-copy to stop-and-copy to _STOP.
+ *    On state transition from _RUNNING to pre-copy, driver should start
+ *    gathering device state while application is still running and send device
+ *    state data to application through migration region.
+ *    On state transition from pre-copy to stop-and-copy, driver must stop
+ *    device, save device state and send it to application through migration
+ *    region.
+ *    On any failure during any of these state transitions, VFIO device state
+ *    should be set to _RUNNING.
+ * 4. To start resuming phase, VFIO device state should be transitioned from
+ *    _RUNNING to _RESUMING state.
+ *    In _RESUMING state, driver should use received device state data through
+ *    migration region to resume device.
+ *    On failure during this state transition, application should set _RUNNING
+ *    state.
+ * 5. On providing saved device data to driver, application should change state
+ *    from _RESUMING to _RUNNING.
+ *    On failure to transition to _RUNNING state, VFIO application should reset
+ *    the device and set _RUNNING state so that device doesn't remain in unknown
+ *    or bad state. On reset, driver must reset device and device should be
+ *    available in default usable state.
+ *
+ * pending_bytes: (read only)
+ *      Number of pending bytes yet to be migrated from vendor driver
+ *
+ * data_offset: (read only)
+ *      User application should read data_offset in migration region from where
+ *      user application should read device data during _SAVING state or write
+ *      device data during _RESUMING state. See below for detail of sequence to
+ *      be followed.
+ *
+ * data_size: (read/write)
+ *      User application should read data_size to get size of data copied in
+ *      bytes in migration region during _SAVING state and write size of data
+ *      copied in bytes in migration region during _RESUMING state.
+ *
+ * Migration region looks like:
+ *  ------------------------------------------------------------------
+ * |vfio_device_migration_info|    data section                      |
+ * |                          |     ///////////////////////////////  |
+ *  ------------------------------------------------------------------
+ *   ^                              ^
+ *  offset 0-trapped part        data_offset
+ *
+ * Structure vfio_device_migration_info is always followed by data section in
+ * the region, so data_offset will always be non-0. Offset from where data is
+ * copied is decided by kernel driver, data section can be trapped or mapped
+ * or partitioned, depending on how kernel driver defines data section.
+ * Data section partition can be defined as mapped by sparse mmap capability.
+ * If mmapped, then data_offset should be page aligned, whereas initial section
+ * which contains vfio_device_migration_info structure might not end at offset
+ * which is page aligned. The user is not required to access via mmap regardless
+ * of the region mmap capabilities.
+ * Vendor driver should decide whether to partition data section and how to
+ * partition the data section. Vendor driver should return data_offset
+ * accordingly.
+ *
+ * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy phase
+ * and for _SAVING device state or stop-and-copy phase:
+ * a. read pending_bytes, indicates start of new iteration to get device data.
+ *    If there was previous iteration, then this read operation indicates
+ *    previous iteration is done. If pending_bytes > 0, go through below steps.
+ * b. read data_offset, indicates kernel driver to make data available through
+ *    data section. Kernel driver should return this read operation only after
+ *    data is available from (region + data_offset) to (region + data_offset +
+ *    data_size).
+ * c. read data_size, amount of data in bytes available through migration
+ *    region.
+ * d. read data of data_size bytes from (region + data_offset) from migration
+ *    region.
+ * e. process data.
+ * f. Loop through a to e.
+ *
+ * Sequence to be followed while _RESUMING device state:
+ * While data for this device is available, repeat below steps:
+ * a. read data_offset from where user application should write data.
+ * b. write data of data_size to migration region from data_offset.
+ * c. write data_size which indicates vendor driver that data is written in
+ *    staging buffer. Vendor driver should read this data from migration
+ *    region and resume device's state.
+ *
+ * For user application, data is opaque. User should write data in the same
+ * order as received.
+ */
+
+struct vfio_device_migration_info {
+	__u32 device_state;         /* VFIO device state */
+#define VFIO_DEVICE_STATE_STOP      (0)
+#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
+#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)
+#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
+				     VFIO_DEVICE_STATE_SAVING |  \
+				     VFIO_DEVICE_STATE_RESUMING)
+
+#define VFIO_DEVICE_STATE_INVALID_CASE1    (VFIO_DEVICE_STATE_SAVING | \
+					    VFIO_DEVICE_STATE_RESUMING)
+
+#define VFIO_DEVICE_STATE_INVALID_CASE2    (VFIO_DEVICE_STATE_RUNNING | \
+					    VFIO_DEVICE_STATE_RESUMING)
+	__u32 reserved;
+	__u64 pending_bytes;
+	__u64 data_offset;
+	__u64 data_size;
+} __attribute__((packed));
+
 /*
  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
  * which allows direct access to non-MSIX registers which happened to be within

From patchwork Mon Dec 16 20:21:37 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11295309
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5790B109A for ; Mon, 16 Dec 2019 20:59:27 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2C1B521582 for ; Mon, 16 Dec 2019 20:59:27 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="CrqJtDjs"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2C1B521582
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=nvidia.com
Authentication-Results: mail.kernel.org; spf=pass
smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:59986 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxSg-0006MH-7E for patchwork-qemu-devel@patchwork.kernel.org; Mon, 16 Dec 2019 15:59:26 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:36891) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxKH-0002UR-3g for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:47 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1igxKG-0007KP-1E for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:44 -0500 Received: from hqnvemgate24.nvidia.com ([216.228.121.143]:5840) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1igxKF-0007JK-SH for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:43 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Mon, 16 Dec 2019 12:50:15 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Mon, 16 Dec 2019 12:50:42 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Mon, 16 Dec 2019 12:50:42 -0800 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Mon, 16 Dec 2019 20:50:41 +0000 Received: from kwankhede-dev.nvidia.com (10.124.1.5) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Mon, 16 Dec 2019 20:50:35 +0000 From: Kirti Wankhede To: , Subject: [PATCH v10 Kernel 2/5] vfio iommu: Adds flag to indicate dirty pages tracking capability support Date: Tue, 17 Dec 2019 01:51:37 +0530 Message-ID: <1576527700-21805-3-git-send-email-kwankhede@nvidia.com> X-Mailer: git-send-email 2.7.0 In-Reply-To: 
<1576527700-21805-1-git-send-email-kwankhede@nvidia.com>
References: <1576527700-21805-1-git-send-email-kwankhede@nvidia.com>
X-NVConfidentiality: public
MIME-Version: 1.0
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1576529415; bh=UPYFmaFN47EhcHp79NnQF+QfSPH8G2xOgEgl05wOr30=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:X-NVConfidentiality:MIME-Version: Content-Type; b=CrqJtDjsHuk0X/9bY4qyk9X5shvfKasBWgI4KMZZQKJl18ZOZqc6IXdaoHQ80/OrI oCsPRvnik5oqjdzbUAsoHxWwCzGaELpVLVvyvK/fG8siBeqO4bVakyRVTdtP+DmSRx 3dbK5OoRjhzbDH8AhdBCim1tjOAKQtumQoAxZQLqLdcG4JsIa38gv8LxxVGkO6bEE4 5s7N+7NdLh5n6CDUff/SIS43mmir+Xrqq/x9/c0t1D3eH/ppa9fEZhtTAcxADTI/hR BPdgn3xJdOFDlFxWnQttmgPX7l5IpDj9U3D2hXcN2h40IuHMlkuxdpIHVjJgYwKtoP DdLNNMkZNjDvA==
X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy]
X-Received-From: 216.228.121.143
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com, shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede , eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com, changpeng.liu@intel.com, Ken.Xue@amd.com
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

Flag VFIO_IOMMU_INFO_DIRTY_PGS in VFIO_IOMMU_GET_INFO indicates that the
driver supports dirty pages tracking.
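
For illustration only, and not part of this patch: a minimal userspace-side sketch of testing the new bit in the flags word that VFIO_IOMMU_GET_INFO fills in. It assumes the bit value proposed here, (1 << 2). The struct iommu_info_sketch type and the iommu_tracks_dirty_pages() helper are hypothetical names; the struct only stands in for the leading argsz/flags fields of struct vfio_iommu_type1_info so that the fragment builds without the patched header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: mirrors the bit proposed by this patch. Installed kernel
 * headers may not define it, so it is defined locally when missing.
 */
#ifndef VFIO_IOMMU_INFO_DIRTY_PGS
#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2)
#endif

/*
 * Hypothetical stand-in for the first two fields of
 * struct vfio_iommu_type1_info (argsz, flags).
 */
struct iommu_info_sketch {
	uint32_t argsz;
	uint32_t flags;
};

/*
 * Hypothetical helper: returns true when the flags word reported for the
 * container has the dirty page tracking bit set.
 */
static bool iommu_tracks_dirty_pages(const struct iommu_info_sketch *info)
{
	assert(info != NULL);
	return (info->flags & VFIO_IOMMU_INFO_DIRTY_PGS) != 0;
}
```

A real caller would presumably fill struct vfio_iommu_type1_info through the VFIO_IOMMU_GET_INFO ioctl on the container file descriptor first, then apply the same mask test to its flags field.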

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 drivers/vfio/vfio_iommu_type1.c | 3 ++-
 include/uapi/linux/vfio.h       | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ada8e6cdb88..3f6b04f2334f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2234,7 +2234,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 			info.cap_offset = 0; /* output, no-recopy necessary */
 		}
 
-		info.flags = VFIO_IOMMU_INFO_PGSIZES;
+		info.flags = VFIO_IOMMU_INFO_PGSIZES |
+			     VFIO_IOMMU_INFO_DIRTY_PGS;
 
 		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index a0817ba267c1..81847ed54eb7 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -900,8 +900,9 @@ struct vfio_device_ioeventfd {
 struct vfio_iommu_type1_info {
 	__u32	argsz;
 	__u32	flags;
-#define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
-#define VFIO_IOMMU_INFO_CAPS	(1 << 1)	/* Info supports caps */
+#define VFIO_IOMMU_INFO_PGSIZES   (1 << 0) /* supported page sizes info */
+#define VFIO_IOMMU_INFO_CAPS      (1 << 1) /* Info supports caps */
+#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2) /* supports dirty page tracking */
 	__u64	iova_pgsizes;	/* Bitmap of supported page sizes */
 	__u32   cap_offset;	/* Offset within info struct of first cap */
 };

From patchwork Mon Dec 16 20:21:38 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11295311
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C9B46112B for ; Mon, 16 Dec 2019 21:00:25 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate
requested) by mail.kernel.org (Postfix) with ESMTPS id 9DC8821582 for ; Mon, 16 Dec 2019 21:00:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="qI27Qpt3" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9DC8821582 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=nvidia.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:60004 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxTc-0007HH-RV for patchwork-qemu-devel@patchwork.kernel.org; Mon, 16 Dec 2019 16:00:24 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:36925) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxKO-0002fU-9m for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:53 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1igxKN-0007NN-1F for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:52 -0500 Received: from hqnvemgate26.nvidia.com ([216.228.121.65]:6553) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1igxKM-0007NC-S4 for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:50 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate26.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Mon, 16 Dec 2019 12:50:40 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Mon, 16 Dec 2019 12:50:49 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Mon, 16 Dec 2019 12:50:49 -0800 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Mon, 16 Dec 2019 20:50:48 +0000 Received: from kwankhede-dev.nvidia.com 
(10.124.1.5) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Mon, 16 Dec 2019 20:50:42 +0000 From: Kirti Wankhede To: , Subject: [PATCH v10 Kernel 3/5] vfio iommu: Add ioctl defination for dirty pages tracking. Date: Tue, 17 Dec 2019 01:51:38 +0530 Message-ID: <1576527700-21805-4-git-send-email-kwankhede@nvidia.com> X-Mailer: git-send-email 2.7.0 In-Reply-To: <1576527700-21805-1-git-send-email-kwankhede@nvidia.com> References: <1576527700-21805-1-git-send-email-kwankhede@nvidia.com> X-NVConfidentiality: public MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1576529440; bh=9C0wkz7uJE+YfhvUbgMBGA/2c9koZ9iDMNMc32hwsoY=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:X-NVConfidentiality:MIME-Version: Content-Type; b=qI27Qpt3JBLOI2H89/9kaubuoatwhL+/O4x4/DPVQMypeIUcg3c7oxQ+Z/EDTabTy 3tUcgQLb7rtsZEYKsUVIawZe0mn61y3Uq+Jt1ndX3tdbo3bnd1hWSnkb0hzmwUA1zV CPO3tGxVHPRAJUf2oq+53GOrM8N3yG/g4vXqFW9BX2kwoSBoOncwVoTdDvt9BCXhQ5 db54xxK7je6CBeRTrmwT9XxJX0bUh30yKOrnlmh2xQrBjdYwIMKc0JZvrKyet/7dXp vLMR2ktqs/55O+ihVrAaGDAcvAfoVq1ViIQCLaIWCS8KHaPQ9RS8kDVRrjz+xz9v+a nR+LD0Iy5teMw== X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy] X-Received-From: 216.228.121.65 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com, shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede , eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com, changpeng.liu@intel.com, Ken.Xue@amd.com Errors-To: 
qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
All pages pinned by vendor driver through this API should be considered as
dirty during migration. When container consists of IOMMU capable device and
all pages are pinned and mapped, then all pages are marked dirty.
Added support to start/stop unpinned pages tracking and to get bitmap of
all dirtied pages for requested IO virtual address range.
Unpinned page tracking is cleared either when bitmap is read by user
application or when unpinned page tracking is stopped.

Signed-off-by: Kirti Wankhede
Reviewed-by: Neo Jia
---
 include/uapi/linux/vfio.h | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 81847ed54eb7..4ad54fbb4698 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -975,6 +975,49 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ *                                     struct vfio_iommu_type1_dirty_bitmap)
+ * IOCTL is used for dirty pages tracking. Caller sets argsz, which is size of
+ * struct vfio_iommu_type1_dirty_bitmap. Caller sets flag depending on which
+ * operation to perform, details as below:
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
+ * migration is active and IOMMU module should track pages which are being
+ * unpinned. Unpinned pages are tracked until bitmap for that range is queried
+ * or tracking is stopped by user application by setting
+ * VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
+ * IOMMU should stop tracking unpinned pages and also free previously tracked
+ * unpinned pages data.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
+ * IOCTL returns dirty pages bitmap for IOMMU container during migration for
+ * given IOVA range. User must allocate memory to get bitmap, zero the bitmap
+ * memory and set size of allocated memory in bitmap_size field. One bit is
+ * used to represent one page consecutively starting from iova offset. User
+ * should provide page size in 'pgsize'. Bit set in bitmap indicates page at
+ * that offset from iova is dirty.
+ *
+ * Only one flag should be set at a time.
+ *
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+	__u32        argsz;
+	__u32        flags;
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START	(1 << 0)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP	(1 << 1)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP	(1 << 2)
+	__u64        iova;		/* IO virtual address */
+	__u64        size;		/* Size of iova range */
+	__u64        pgsize;		/* page size for bitmap */
+	__u64        bitmap_size;	/* in bytes */
+	void __user *bitmap;		/* one bit per page */
+};
+
+#define VFIO_IOMMU_DIRTY_PAGES             _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */
 
 /*

From patchwork Mon Dec 16 20:21:39 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kirti Wankhede
X-Patchwork-Id: 11295307
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6154D109A for ; Mon, 16 Dec 2019 20:58:32 +0000 (UTC)
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 3687121582 for ; Mon, 16 Dec 2019 20:58:32 +0000 (UTC)
Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="Sk+Q7NuB"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3687121582 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=nvidia.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:59966 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxRn-00056p-Bz for patchwork-qemu-devel@patchwork.kernel.org; Mon, 16 Dec 2019 15:58:31 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:36945) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxKW-0002t7-7K for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:51:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1igxKU-0007OI-IJ for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:51:00 -0500 Received: from hqnvemgate24.nvidia.com ([216.228.121.143]:5858) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1igxKU-0007OA-Ch for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:50:58 -0500 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Mon, 16 Dec 2019 12:50:29 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Mon, 16 Dec 2019 12:50:56 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Mon, 16 Dec 2019 12:50:56 -0800 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL101.nvidia.com (172.20.187.10) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Mon, 16 Dec 2019 20:50:56 +0000 Received: from kwankhede-dev.nvidia.com (10.124.1.5) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Mon, 16 Dec 2019 20:50:49 +0000 From: Kirti Wankhede To: , Subject: [PATCH v10 Kernel 4/5] vfio iommu: Implementation of ioctl to for dirty pages tracking. 
Date: Tue, 17 Dec 2019 01:51:39 +0530
Message-ID: <1576527700-21805-5-git-send-email-kwankhede@nvidia.com>
X-Mailer: git-send-email 2.7.0
In-Reply-To: <1576527700-21805-1-git-send-email-kwankhede@nvidia.com>
References: <1576527700-21805-1-git-send-email-kwankhede@nvidia.com>
X-NVConfidentiality: public
MIME-Version: 1.0
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1576529430; bh=dFKZl1eFsFUEtP+oN/+QLnayBa+NmsRRReuVPXuwu8U=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:X-NVConfidentiality:MIME-Version: Content-Type; b=Sk+Q7NuBZagNC8xNORYRYYZrdQDn7MiPOCr7T9le7m12zdoGRlnpdATPJYVk0DvRp mqTcNQl0caSwCm/q9EYL2BUmJRP/nBVRnLJ4BvLqCfnJQAH3XHIsIPzo0PpWxa7OxT wMUB9H2R75WGMLFln+/WWumAtfKQw5CguX96Ab5ES3q35bZTe5o6tkQ+NBQXFHkXPm GZS5blL1iB30hT70Pt8BRnHyHQ11J//rsLPTnxIMIpY9TLkxFhuD5lTSw4azVm6KvC B1Ru7wR9DXX1lcUyRfRwLN/oMWMRGIi1kHEGAeitdG6QVATJt/Zx9X1nKShfgENpSw MuBUriKxUYUjA==
X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy]
X-Received-From: 216.228.121.143
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com, shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede , eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com, changpeng.liu@intel.com, Ken.Xue@amd.com
Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
Sender: "Qemu-devel"

VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start unpinned pages dirty pages tracking while migration is active and
  device is running, i.e. during pre-copy phase.
- Stop unpinned pages dirty pages tracking.
This is required to stop unpinned dirty pages tracking if migration failed or was cancelled during pre-copy phase. Unpinned pages tracking information is cleared. - Get dirty pages bitmap. Stop unpinned dirty pages tracking and clear unpinned pages information on bitmap read. This ioctl returns a bitmap of dirty pages; it is the user space application's responsibility to copy the content of dirty pages from source to destination during migration. Signed-off-by: Kirti Wankhede Reviewed-by: Neo Jia --- drivers/vfio/vfio_iommu_type1.c | 210 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 203 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 3f6b04f2334f..264449654d3f 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -70,6 +70,7 @@ struct vfio_iommu { unsigned int dma_avail; bool v2; bool nesting; + bool dirty_page_tracking; }; struct vfio_domain { @@ -112,6 +113,7 @@ struct vfio_pfn { dma_addr_t iova; /* Device address */ unsigned long pfn; /* Host pfn */ atomic_t ref_count; + bool unpinned; }; struct vfio_regions { @@ -244,6 +246,32 @@ static void vfio_remove_from_pfn_list(struct vfio_dma *dma, kfree(vpfn); } +static void vfio_remove_unpinned_from_pfn_list(struct vfio_dma *dma, bool warn) +{ + struct rb_node *n = rb_first(&dma->pfn_list); + + for (; n; n = rb_next(n)) { + struct vfio_pfn *vpfn = rb_entry(n, struct vfio_pfn, node); + + if (warn) + WARN_ON_ONCE(vpfn->unpinned); + + if (vpfn->unpinned) + vfio_remove_from_pfn_list(dma, vpfn); + } +} + +static void vfio_remove_unpinned_from_dma_list(struct vfio_iommu *iommu) +{ + struct rb_node *n = rb_first(&iommu->dma_list); + + for (; n; n = rb_next(n)) { + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node); + + vfio_remove_unpinned_from_pfn_list(dma, false); + } +} + static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct vfio_dma *dma, unsigned long iova) { @@ -254,13 +282,17 @@ static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct
vfio_dma *dma, return vpfn; } -static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn) +static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn, + bool dirty_tracking) { int ret = 0; if (atomic_dec_and_test(&vpfn->ref_count)) { ret = put_pfn(vpfn->pfn, dma->prot); - vfio_remove_from_pfn_list(dma, vpfn); + if (dirty_tracking) + vpfn->unpinned = true; + else + vfio_remove_from_pfn_list(dma, vpfn); } return ret; } @@ -504,7 +536,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr, } static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova, - bool do_accounting) + bool do_accounting, bool dirty_tracking) { int unlocked; struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova); @@ -512,7 +544,10 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova, if (!vpfn) return 0; - unlocked = vfio_iova_put_vfio_pfn(dma, vpfn); + if (vpfn->unpinned) + return 0; + + unlocked = vfio_iova_put_vfio_pfn(dma, vpfn, dirty_tracking); if (do_accounting) vfio_lock_acct(dma, -unlocked, true); @@ -583,7 +618,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, ret = vfio_add_to_pfn_list(dma, iova, phys_pfn[i]); if (ret) { - vfio_unpin_page_external(dma, iova, do_accounting); + vfio_unpin_page_external(dma, iova, do_accounting, + false); goto pin_unwind; } } @@ -598,7 +634,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data, iova = user_pfn[j] << PAGE_SHIFT; dma = vfio_find_dma(iommu, iova, PAGE_SIZE); - vfio_unpin_page_external(dma, iova, do_accounting); + vfio_unpin_page_external(dma, iova, do_accounting, false); phys_pfn[j] = 0; } pin_done: @@ -632,7 +668,8 @@ static int vfio_iommu_type1_unpin_pages(void *iommu_data, dma = vfio_find_dma(iommu, iova, PAGE_SIZE); if (!dma) goto unpin_exit; - vfio_unpin_page_external(dma, iova, do_accounting); + vfio_unpin_page_external(dma, iova, do_accounting, + iommu->dirty_page_tracking); } unpin_exit: @@ -850,6 +887,88 @@ static 
unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu) return bitmap; } +/* + * start_iova is the reference from where bitmaping started. This is called + * from DMA_UNMAP where start_iova can be different than iova + */ + +static void vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova, + size_t size, uint64_t pgsize, + dma_addr_t start_iova, unsigned long *bitmap) +{ + struct vfio_dma *dma; + dma_addr_t i = iova; + unsigned long pgshift = __ffs(pgsize); + + while ((dma = vfio_find_dma(iommu, i, pgsize))) { + /* mark all pages dirty if all pages are pinned and mapped. */ + if (dma->iommu_mapped) { + dma_addr_t iova_limit; + + iova_limit = (dma->iova + dma->size) < (iova + size) ? + (dma->iova + dma->size) : (iova + size); + + for (; i < iova_limit; i += pgsize) { + unsigned int start; + + start = (i - start_iova) >> pgshift; + + __bitmap_set(bitmap, start, 1); + } + if (i >= iova + size) + return; + } else { + struct rb_node *n = rb_first(&dma->pfn_list); + bool found = false; + + for (; n; n = rb_next(n)) { + struct vfio_pfn *vpfn = rb_entry(n, + struct vfio_pfn, node); + if (vpfn->iova >= i) { + found = true; + break; + } + } + + if (!found) { + i += dma->size; + continue; + } + + for (; n; n = rb_next(n)) { + unsigned int start; + struct vfio_pfn *vpfn = rb_entry(n, + struct vfio_pfn, node); + + if (vpfn->iova >= iova + size) + return; + + start = (vpfn->iova - start_iova) >> pgshift; + + __bitmap_set(bitmap, start, 1); + + i = vpfn->iova + pgsize; + } + } + vfio_remove_unpinned_from_pfn_list(dma, false); + } +} + +static long verify_bitmap_size(unsigned long npages, unsigned long bitmap_size) +{ + long bsize; + + if (!bitmap_size || bitmap_size > SIZE_MAX) + return -EINVAL; + + bsize = ALIGN(npages, BITS_PER_LONG) / sizeof(unsigned long); + + if (bitmap_size < bsize) + return -EINVAL; + + return bsize; +} + static int vfio_dma_do_unmap(struct vfio_iommu *iommu, struct vfio_iommu_type1_dma_unmap *unmap) { @@ -2298,6 +2417,83 @@ static long 
vfio_iommu_type1_ioctl(void *iommu_data, return copy_to_user((void __user *)arg, &unmap, minsz) ? -EFAULT : 0; + } else if (cmd == VFIO_IOMMU_DIRTY_PAGES) { + struct vfio_iommu_type1_dirty_bitmap range; + uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START | + VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP | + VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP; + int ret; + + if (!iommu->v2) + return -EACCES; + + minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap, + bitmap); + + if (copy_from_user(&range, (void __user *)arg, minsz)) + return -EFAULT; + + if (range.argsz < minsz || range.flags & ~mask) + return -EINVAL; + + if (range.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_START) { + iommu->dirty_page_tracking = true; + return 0; + } else if (range.flags & VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP) { + iommu->dirty_page_tracking = false; + + mutex_lock(&iommu->lock); + vfio_remove_unpinned_from_dma_list(iommu); + mutex_unlock(&iommu->lock); + return 0; + + } else if (range.flags & + VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP) { + uint64_t iommu_pgmask; + unsigned long pgshift = __ffs(range.pgsize); + unsigned long *bitmap; + long bsize; + + iommu_pgmask = + ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1; + + if (((range.pgsize - 1) & iommu_pgmask) != + (range.pgsize - 1)) + return -EINVAL; + + if (range.iova & iommu_pgmask) + return -EINVAL; + if (!range.size || range.size > SIZE_MAX) + return -EINVAL; + if (range.iova + range.size < range.iova) + return -EINVAL; + + bsize = verify_bitmap_size(range.size >> pgshift, + range.bitmap_size); + if (bsize < 0) + return bsize; + + bitmap = kmalloc(bsize, GFP_KERNEL); + if (!bitmap) + return -ENOMEM; + + ret = copy_from_user(bitmap, + (void __user *)range.bitmap, bsize) ? 
-EFAULT : 0; + if (ret) + goto bitmap_exit; + + iommu->dirty_page_tracking = false; + mutex_lock(&iommu->lock); + vfio_iova_dirty_bitmap(iommu, range.iova, range.size, + range.pgsize, range.iova, bitmap); + mutex_unlock(&iommu->lock); + + ret = copy_to_user((void __user *)range.bitmap, bitmap, + bsize) ? -EFAULT : 0; +bitmap_exit: + kfree(bitmap); + return ret; + } } return -ENOTTY; From patchwork Mon Dec 16 20:21:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kirti Wankhede X-Patchwork-Id: 11295303 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9AB9A930 for ; Mon, 16 Dec 2019 20:57:07 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6FEA621582 for ; Mon, 16 Dec 2019 20:57:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="fKkhG0Rp" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6FEA621582 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=nvidia.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Received: from localhost ([::1]:59934 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxQQ-00039w-90 for patchwork-qemu-devel@patchwork.kernel.org; Mon, 16 Dec 2019 15:57:06 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:37002) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1igxKc-00033u-RZ for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:51:11 -0500 Received: from Debian-exim by eggs.gnu.org with 
spam-scanned (Exim 4.71) (envelope-from ) id 1igxKb-0007Q5-C2 for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:51:06 -0500 Received: from hqnvemgate24.nvidia.com ([216.228.121.143]:5869) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1igxKb-0007Ps-3h for qemu-devel@nongnu.org; Mon, 16 Dec 2019 15:51:05 -0500 Received: from hqpgpgate101.nvidia.com (Not Verified[216.228.121.13]) by hqnvemgate24.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Mon, 16 Dec 2019 12:50:36 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate101.nvidia.com (PGP Universal service); Mon, 16 Dec 2019 12:51:03 -0800 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Mon, 16 Dec 2019 12:51:03 -0800 Received: from HQMAIL105.nvidia.com (172.20.187.12) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Mon, 16 Dec 2019 20:51:03 +0000 Received: from kwankhede-dev.nvidia.com (10.124.1.5) by HQMAIL105.nvidia.com (172.20.187.12) with Microsoft SMTP Server (TLS) id 15.0.1473.3 via Frontend Transport; Mon, 16 Dec 2019 20:50:56 +0000 From: Kirti Wankhede To: , Subject: [PATCH v10 Kernel 5/5] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap Date: Tue, 17 Dec 2019 01:51:40 +0530 Message-ID: <1576527700-21805-6-git-send-email-kwankhede@nvidia.com> X-Mailer: git-send-email 2.7.0 In-Reply-To: <1576527700-21805-1-git-send-email-kwankhede@nvidia.com> References: <1576527700-21805-1-git-send-email-kwankhede@nvidia.com> X-NVConfidentiality: public MIME-Version: 1.0 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1576529437; bh=NNfqr2vUhNMQwsoU1oTVwnAsTm+8VgO1yNM6gGnpjxc=; h=X-PGP-Universal:From:To:CC:Subject:Date:Message-ID:X-Mailer: In-Reply-To:References:X-NVConfidentiality:MIME-Version: Content-Type; b=fKkhG0Rpo2uQO3Ql7u1tPsMGHIrYdi6htp/An+VH/Rvc607coS/qlLMRR2QUvsDC1 7dBFjQZnMzH8nLwiVjV7INjbuglDcYU9M63kFCdHRjqRURv2qOOpmEKFyBjLzitEzZ 
MpAd3lAe1qFlWZPCv7Rva8/WRBhoenFNPyqLK3RvnRRvBHcPkxa3w9b9VcW/8mzi+N ftVxx2IXJH8fwB0t2GBDUyGP5ycpL8fgEVRicp2I9FX+Tiwd5lyC+IcHOmgVCeqZ5s m//kTO1RkQm3OnALZAXCquXHX6fwIaSISKpvcbowUCqSJP8X9fK0lntKTS5XoD1LLl a0rGHPE6//LAg== X-detected-operating-system: by eggs.gnu.org: Windows 7 or 8 [fuzzy] X-Received-From: 216.228.121.143 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Zhengxiao.zx@Alibaba-inc.com, kevin.tian@intel.com, yi.l.liu@intel.com, yan.y.zhao@intel.com, kvm@vger.kernel.org, eskultet@redhat.com, ziye.yang@intel.com, qemu-devel@nongnu.org, cohuck@redhat.com, shuangtai.tst@alibaba-inc.com, dgilbert@redhat.com, zhi.a.wang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, aik@ozlabs.ru, Kirti Wankhede , eauger@redhat.com, felipe@nutanix.com, jonathan.davies@nutanix.com, changpeng.liu@intel.com, Ken.Xue@amd.com Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" Pages, pinned by external interface for requested IO virtual address range, might get unpinned and unmapped while migration is active and device is still running, that is, in pre-copy phase while guest driver still could access those pages. Host device can write to these pages while those were mapped. Such pages should be marked dirty so that after migration guest driver should still be able to complete the operation. To get bitmap during unmap, user should set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, bitmap memory should be allocated and zeroed by user space application. Bitmap size and page size should be set by user application. 
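As a rough illustration of the user-space preparation described above, the sketch below sizes and zero-allocates a bitmap for an unmap request. It is not part of this series: `dirty_bitmap_bytes()` and `alloc_dirty_bitmap()` are hypothetical helper names, and rounding up to whole 64-bit words assumes a 64-bit kernel, where the size check works in units of `unsigned long`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): number of bytes user space
 * should allocate for a dirty bitmap covering `size` bytes of IOVA space
 * at `pgsize` granularity. One bit per page, rounded up to whole 64-bit
 * words (assumes the kernel's unsigned long is 64 bits wide).
 */
static uint64_t dirty_bitmap_bytes(uint64_t size, uint64_t pgsize)
{
	uint64_t npages;

	/* pgsize must be a non-zero power of two */
	assert(pgsize && (pgsize & (pgsize - 1)) == 0);

	npages = size / pgsize;
	return ((npages + 63) / 64) * 8;
}

/*
 * Hypothetical helper: allocate a zeroed bitmap for the range and report
 * its size in bytes through `bytes_out`. The caller frees the result.
 */
static uint64_t *alloc_dirty_bitmap(uint64_t size, uint64_t pgsize,
				    uint64_t *bytes_out)
{
	uint64_t bytes = dirty_bitmap_bytes(size, pgsize);

	if (!bytes)
		return NULL;

	*bytes_out = bytes;
	return calloc(1, bytes);
}
```

The returned pointer and byte count would then go into the `bitmap` and `bitmap_size` fields of `struct vfio_iommu_type1_dma_unmap`, with `bitmap_pgsize` set to the same page size and `VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP` set in `flags`.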
Signed-off-by: Kirti Wankhede Reviewed-by: Neo Jia --- drivers/vfio/vfio_iommu_type1.c | 63 ++++++++++++++++++++++++++++++++++++----- include/uapi/linux/vfio.h | 12 ++++++++ 2 files changed, 68 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 264449654d3f..6bd02a13903b 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -970,7 +970,8 @@ static long verify_bitmap_size(unsigned long npages, unsigned long bitmap_size) } static int vfio_dma_do_unmap(struct vfio_iommu *iommu, - struct vfio_iommu_type1_dma_unmap *unmap) + struct vfio_iommu_type1_dma_unmap *unmap, + unsigned long *bitmap) { uint64_t mask; struct vfio_dma *dma, *dma_last = NULL; @@ -1045,6 +1046,15 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, if (dma->task->mm != current->mm) break; + if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) && + (dma_last != dma)) + vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size, + unmap->bitmap_pgsize, unmap->iova, + bitmap); + else + vfio_remove_unpinned_from_pfn_list(dma, true); + + if (!RB_EMPTY_ROOT(&dma->pfn_list)) { struct vfio_iommu_type1_dma_unmap nb_unmap; @@ -1070,6 +1080,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, &nb_unmap); goto again; } + unmapped += dma->size; vfio_remove_dma(iommu, dma); } @@ -2401,22 +2412,60 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, } else if (cmd == VFIO_IOMMU_UNMAP_DMA) { struct vfio_iommu_type1_dma_unmap unmap; - long ret; + unsigned long *bitmap = NULL; + long ret, bsize; minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size); - if (copy_from_user(&unmap, (void __user *)arg, minsz)) + if (copy_from_user(&unmap, (void __user *)arg, sizeof(unmap))) return -EFAULT; - if (unmap.argsz < minsz || unmap.flags) + if (unmap.argsz < minsz || + unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) return -EINVAL; - ret = vfio_dma_do_unmap(iommu, &unmap); + if (unmap.flags & 
VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) { + unsigned long pgshift = __ffs(unmap.bitmap_pgsize); + uint64_t iommu_pgmask = + ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1; + + if (((unmap.bitmap_pgsize - 1) & iommu_pgmask) != + (unmap.bitmap_pgsize - 1)) + return -EINVAL; + + bsize = verify_bitmap_size(unmap.size >> pgshift, + unmap.bitmap_size); + if (bsize < 0) + return bsize; + + bitmap = kmalloc(bsize, GFP_KERNEL); + if (!bitmap) + return -ENOMEM; + + if (copy_from_user(bitmap, (void __user *)unmap.bitmap, + bsize)) { + ret = -EFAULT; + goto unmap_exit; + } + } + + ret = vfio_dma_do_unmap(iommu, &unmap, bitmap); if (ret) - return ret; + goto unmap_exit; - return copy_to_user((void __user *)arg, &unmap, minsz) ? + if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) { + if (copy_to_user((void __user *)unmap.bitmap, bitmap, + bsize)) { + ret = -EFAULT; + goto unmap_exit; + } + } + + ret = copy_to_user((void __user *)arg, &unmap, minsz) ? -EFAULT : 0; +unmap_exit: + kfree(bitmap); + return ret; } else if (cmd == VFIO_IOMMU_DIRTY_PAGES) { struct vfio_iommu_type1_dirty_bitmap range; uint32_t mask = VFIO_IOMMU_DIRTY_PAGES_FLAG_START | diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 4ad54fbb4698..7705aea7bdaf 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -958,12 +958,24 @@ struct vfio_iommu_type1_dma_map { * field. No guarantee is made to the user that arbitrary unmaps of iova * or size different from those used in the original mapping call will * succeed. + * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get dirty bitmap + * before unmapping IO virtual addresses. When this flag is set, user should + * allocate memory to get bitmap, clear the bitmap memory by setting zero and + * should set size of allocated memory in bitmap_size field. Each bit in bitmap + * represents one page of the user provided page size 'bitmap_pgsize', + * consecutively starting from iova offset. 
Bit set indicates page at that + * offset from iova is dirty. Bitmap of pages in the range of unmapped size is + * returned in bitmap. */ struct vfio_iommu_type1_dma_unmap { __u32 argsz; __u32 flags; +#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0) __u64 iova; /* IO virtual address */ __u64 size; /* Size of mapping (bytes) */ + __u64 bitmap_pgsize; /* page size for bitmap */ + __u64 bitmap_size; /* in bytes */ + void __user *bitmap; /* one bit per page */ }; #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
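The bitmap layout documented in the comment above (one bit per page, counted consecutively from `iova`) can be decoded on the user-space side roughly as follows. This is a sketch only, not code from the series: `collect_dirty_iovas()` is a hypothetical name, and treating the bitmap as an array of 64-bit words assumes a 64-bit kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): walk a dirty bitmap returned by
 * the kernel and store the IOVA of every dirty page in `out`.
 * Bit n describes the page at start_iova + n * pgsize. Assumes the bitmap
 * is an array of 64-bit words, i.e. a 64-bit kernel.
 * Returns the number of dirty pages found; at most `out_cap` are stored.
 */
static size_t collect_dirty_iovas(const uint64_t *bitmap, uint64_t npages,
				  uint64_t start_iova, uint64_t pgsize,
				  uint64_t *out, size_t out_cap)
{
	size_t found = 0;
	uint64_t n;

	/* pgsize must be a non-zero power of two */
	assert(pgsize && (pgsize & (pgsize - 1)) == 0);

	for (n = 0; n < npages; n++) {
		/* skip pages whose bit is clear */
		if (!(bitmap[n / 64] & ((uint64_t)1 << (n % 64))))
			continue;
		if (found < out_cap)
			out[found] = start_iova + n * pgsize;
		found++;
	}

	return found;
}
```

A migration tool would typically call this after the unmap ioctl returns and then copy the contents of each reported page to the destination before releasing the buffer.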