From patchwork Sun Nov 6 17:46:18 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13033516
From: Yishai Hadas
Subject: [PATCH vfio 01/13] vfio: Add an option to get migration data size
Date: Sun, 6 Nov 2022 19:46:18 +0200
Message-ID: <20221106174630.25909-2-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
References: <20221106174630.25909-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

Add an option to get the migration data size by introducing a new
migration feature named VFIO_DEVICE_FEATURE_MIG_DATA_SIZE.

Upon VFIO_DEVICE_FEATURE_GET, the estimated data length that will be
required to complete STOP_COPY is returned.

This can help user space decide, before moving to STOP_COPY, whether it
can meet its downtime SLA based on the returned data.

The patch also implements this new option for the mlx5 and hisi drivers,
making the feature complete for the existing drivers in this area.

Signed-off-by: Yishai Hadas
Reviewed-by: Longfang Liu
Reviewed-by: Jason Gunthorpe
---
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    |  9 ++++++
 drivers/vfio/pci/mlx5/main.c                  | 18 +++++++++++
 drivers/vfio/pci/vfio_pci_core.c              |  3 +-
 drivers/vfio/vfio_main.c                      | 32 +++++++++++++++++++
 include/linux/vfio.h                          |  5 +++
 include/uapi/linux/vfio.h                     | 13 ++++++++
 6 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 39eeca18a0f7..0c0c0c7f0521 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -957,6 +957,14 @@ hisi_acc_vfio_pci_set_device_state(struct vfio_device *vdev,
 	return res;
 }
 
+static int
+hisi_acc_vfio_pci_get_data_size(struct vfio_device *vdev,
+				unsigned long *stop_copy_length)
+{
+	*stop_copy_length = sizeof(struct acc_vf_data);
+	return 0;
+}
+
 static int
 hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
 				   enum vfio_device_mig_state *curr_state)
@@ -1213,6 +1221,7 @@ static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev)
 static const struct vfio_migration_ops hisi_acc_vfio_pci_migrn_state_ops = {
 	.migration_set_state = hisi_acc_vfio_pci_set_device_state,
 	.migration_get_state = hisi_acc_vfio_pci_get_device_state,
+	.migration_get_data_size = hisi_acc_vfio_pci_get_data_size,
 };
 
 static int hisi_acc_vfio_pci_migrn_init_dev(struct vfio_device *core_vdev)
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index fd6ccb8454a2..4c7a39ffd247 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -512,6 +512,23 @@ mlx5vf_pci_set_device_state(struct vfio_device *vdev,
 	return res;
 }
 
+static int mlx5vf_pci_get_data_size(struct vfio_device *vdev,
+				    unsigned long *stop_copy_length)
+{
+	struct mlx5vf_pci_core_device *mvdev = container_of(
+		vdev, struct mlx5vf_pci_core_device, core_device.vdev);
+	size_t state_size;
+	int ret;
+
+	mutex_lock(&mvdev->state_mutex);
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
+						    &state_size);
+	if (!ret)
+		*stop_copy_length = state_size;
+	mlx5vf_state_mutex_unlock(mvdev);
+	return ret;
+}
+
 static int mlx5vf_pci_get_device_state(struct vfio_device *vdev,
 				       enum vfio_device_mig_state *curr_state)
 {
@@ -577,6 +594,7 @@ static void mlx5vf_pci_close_device(struct vfio_device *core_vdev)
 static const struct vfio_migration_ops mlx5vf_pci_mig_ops = {
 	.migration_set_state = mlx5vf_pci_set_device_state,
 	.migration_get_state = mlx5vf_pci_get_device_state,
+	.migration_get_data_size = mlx5vf_pci_get_data_size,
 };
 
 static const struct vfio_log_ops mlx5vf_pci_log_ops = {
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index badc9d828cac..4d97ca66ba6c 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -2128,7 +2128,8 @@ int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev)
 
 	if (vdev->vdev.mig_ops) {
 		if (!(vdev->vdev.mig_ops->migration_get_state &&
-		      vdev->vdev.mig_ops->migration_set_state) ||
+		      vdev->vdev.mig_ops->migration_set_state &&
+		      vdev->vdev.mig_ops->migration_get_data_size) ||
 		    !(vdev->vdev.migration_flags & VFIO_MIGRATION_STOP_COPY))
 			return -EINVAL;
 	}
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2d168793d4e1..b118e7b1bc59 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1256,6 +1256,34 @@ vfio_ioctl_device_feature_mig_device_state(struct vfio_device *device,
 	return 0;
 }
 
+static int
+vfio_ioctl_device_feature_migration_data_size(struct vfio_device *device,
+					      u32 flags, void __user *arg,
+					      size_t argsz)
+{
+	struct vfio_device_feature_mig_data_size data_size = {};
+	unsigned long stop_copy_length;
+	int ret;
+
+	if (!device->mig_ops)
+		return -ENOTTY;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
+				 sizeof(data_size));
+	if (ret != 1)
+		return ret;
+
+	ret = device->mig_ops->migration_get_data_size(device, &stop_copy_length);
+	if (ret)
+		return ret;
+
+	data_size.stop_copy_length = stop_copy_length;
+	if (copy_to_user(arg, &data_size, sizeof(data_size)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
 						u32 flags, void __user *arg,
 						size_t argsz)
@@ -1483,6 +1511,10 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
 		return vfio_ioctl_device_feature_logging_report(
 			device, feature.flags, arg->data,
 			feature.argsz - minsz);
+	case VFIO_DEVICE_FEATURE_MIG_DATA_SIZE:
+		return vfio_ioctl_device_feature_migration_data_size(
+			device, feature.flags, arg->data,
+			feature.argsz - minsz);
 	default:
 		if (unlikely(!device->ops->device_feature))
 			return -EINVAL;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e7cebeb875dd..5509451ae709 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -107,6 +107,9 @@ struct vfio_device_ops {
  * @migration_get_state: Optional callback to get the migration state for
  *         devices that support migration. It's mandatory for
  *         VFIO_DEVICE_FEATURE_MIGRATION migration support.
+ * @migration_get_data_size: Optional callback to get the estimated data
+ *          length that will be required to complete stop copy. It's mandatory for
+ *          VFIO_DEVICE_FEATURE_MIGRATION migration support.
  */
 struct vfio_migration_ops {
 	struct file *(*migration_set_state)(
@@ -114,6 +117,8 @@ struct vfio_migration_ops {
 		enum vfio_device_mig_state new_state);
 	int (*migration_get_state)(struct vfio_device *device,
 				   enum vfio_device_mig_state *curr_state);
+	int (*migration_get_data_size)(struct vfio_device *device,
+				       unsigned long *stop_copy_length);
 };
 
 /**
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d7d8e0922376..3e45dbaf190e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1128,6 +1128,19 @@ struct vfio_device_feature_dma_logging_report {
 
 #define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 8
 
+/*
+ * Upon VFIO_DEVICE_FEATURE_GET read back the estimated data length that will
+ * be required to complete stop copy.
+ *
+ * Note: Can be called on each device state.
+ */
+
+struct vfio_device_feature_mig_data_size {
+	__aligned_u64 stop_copy_length;
+};
+
+#define VFIO_DEVICE_FEATURE_MIG_DATA_SIZE 9
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**

From patchwork Sun Nov 6 17:46:19 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13033517
From: Yishai Hadas
Subject: [PATCH vfio 02/13] vfio/mlx5: Fix a typo in mlx5vf_cmd_load_vhca_state()
Date: Sun, 6 Nov 2022 19:46:19 +0200
Message-ID: <20221106174630.25909-3-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
References: <20221106174630.25909-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

Fix a typo in mlx5vf_cmd_load_vhca_state() so that it uses the 'load'
memory layout.

As the in/out sizes are equal for the save and load commands, there was
no functional issue.
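
Why the typo was harmless can be checked directly. In the mlx5 headers, MLX5_ST_SZ_DW(typ) evaluates to sizeof(struct mlx5_ifc_typ_bits) / 32, because ifc structs spend one u8 per hardware bit. Each command buffer is therefore sized purely by the total width of its layout. The sketch below re-creates that idiom using hypothetical names (DEMO_ST_SZ_DW, struct demo_ifc_save_in_bits, struct demo_ifc_load_in_bits, demo_in_buffer_bytes). Its field lists are illustrative and are not copied from mlx5_ifc.h. As long as two layouts have the same width, declaring the buffer with either name yields the same array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/*
 * Same idea as MLX5_ST_SZ_DW(): in an ifc layout each u8 element stands for
 * one bit, so sizeof() of the struct is its width in bits.
 */
#define DEMO_ST_SZ_BITS(typ) (sizeof(struct demo_ifc_##typ##_bits))
#define DEMO_ST_SZ_DW(typ)   (DEMO_ST_SZ_BITS(typ) / 32)

/* Hypothetical "save" input layout, 0x100 bits wide (illustrative fields). */
struct demo_ifc_save_in_bits {
	u8 opcode[0x10];
	u8 uid[0x10];
	u8 reserved_at_20[0x10];
	u8 op_mod[0x10];
	u8 reserved_at_40[0x10];
	u8 vhca_id[0x10];
	u8 reserved_at_60[0xa0];
};

/* Hypothetical "load" input layout: different fields, same 0x100-bit width. */
struct demo_ifc_load_in_bits {
	u8 opcode[0x10];
	u8 uid[0x10];
	u8 reserved_at_20[0x10];
	u8 op_mod[0x10];
	u8 reserved_at_40[0x10];
	u8 vhca_id[0x10];
	u8 reserved_at_60[0x20];
	u8 va[0x40];
	u8 size[0x40];
};

/* Fails to compile if the two hypothetical layouts ever diverge in size. */
_Static_assert(DEMO_ST_SZ_DW(save_in) == DEMO_ST_SZ_DW(load_in),
	       "save/load input layouts must have the same dword count");

/*
 * Declares the command buffer the "typo" way (save layout) and the fixed way
 * (load layout), and returns the byte size of the selected one.
 */
static size_t demo_in_buffer_bytes(int use_load_layout)
{
	uint32_t in_save[DEMO_ST_SZ_DW(save_in)] = {0};
	uint32_t in_load[DEMO_ST_SZ_DW(load_in)] = {0};

	return use_load_layout ? sizeof(in_load) : sizeof(in_save);
}
```

With these layouts both buffers are 8 dwords (32 bytes). This is why the fix is a correctness cleanup rather than a behavior change. If the real load layout ever grew, only the fixed declaration would track it.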

Fixes: f1d98f346ee3 ("vfio/mlx5: Expose migration commands over mlx5 device")
Signed-off-by: Yishai Hadas
Reviewed-by: Jason Gunthorpe
---
 drivers/vfio/pci/mlx5/cmd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index c604b70437a5..0848bc905d3e 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -378,8 +378,8 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf)
 {
 	struct mlx5_core_dev *mdev;
-	u32 out[MLX5_ST_SZ_DW(save_vhca_state_out)] = {};
-	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
+	u32 out[MLX5_ST_SZ_DW(load_vhca_state_out)] = {};
+	u32 in[MLX5_ST_SZ_DW(load_vhca_state_in)] = {};
 	u32 pdn, mkey;
 	int err;
 

From patchwork Sun Nov 6 17:46:20 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13033518
From: Yishai Hadas
Subject: [PATCH vfio 03/13] net/mlx5: Introduce ifc bits for pre_copy
Date: Sun, 6 Nov 2022 19:46:20 +0200
Message-ID: <20221106174630.25909-4-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
References: <20221106174630.25909-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

Introduce the ifc bits required to enable PRE_COPY of a VF during
migration.

Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 include/linux/mlx5/mlx5_ifc.h | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 06eab92b9fb3..9e35b657866c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1831,7 +1831,12 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   max_reformat_remove_size[0x8];
 	u8	   max_reformat_remove_offset[0x8];
 
-	u8	   reserved_at_c0[0x160];
+	u8	   reserved_at_c0[0x8];
+	u8	   migration_multi_load[0x1];
+	u8	   migration_tracking_state[0x1];
+	u8	   reserved_at_ca[0x16];
+
+	u8	   reserved_at_e0[0x140];
 
 	u8	   reserved_at_220[0x1];
 	u8	   sw_vhca_id_valid[0x1];
@@ -11744,7 +11749,8 @@ struct mlx5_ifc_query_vhca_migration_state_in_bits {
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
 
-	u8         reserved_at_40[0x10];
+	u8         incremental[0x1];
+	u8         reserved_at_41[0xf];
 	u8         vhca_id[0x10];
 
 	u8         reserved_at_60[0x20];
@@ -11770,7 +11776,9 @@ struct mlx5_ifc_save_vhca_state_in_bits {
 	u8         reserved_at_20[0x10];
 	u8         op_mod[0x10];
 
-	u8         reserved_at_40[0x10];
+	u8         incremental[0x1];
+	u8         set_track[0x1];
+	u8         reserved_at_42[0xe];
 	u8         vhca_id[0x10];
 
 	u8         reserved_at_60[0x20];

From patchwork Sun Nov 6 17:46:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 13033519 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76BC0C4332F for ; Sun, 6 Nov 2022 17:47:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230162AbiKFRr0 (ORCPT ); Sun, 6 Nov 2022 12:47:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44576 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230398AbiKFRrY (ORCPT ); Sun, 6 Nov 2022 12:47:24 -0500 Received: from NAM12-DM6-obe.outbound.protection.outlook.com (mail-dm6nam12on2083.outbound.protection.outlook.com [40.107.243.83]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4F6536441 for ; Sun, 6 Nov 2022 09:47:23 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=iO1cqfEPvT4OVmbQuFvdxmx15ZvUmlKEEVGdTVAHkrL7a0HtwsDGY8ozMXtkZqUDO3iFfaNJb9ZCCU+PR0bZdhFYkdiP3ytCu+rcbHG2nmiwfi7n9BiUqEPtJg+CaaoNhLm1US81QDTJA3mzOMTEcq3JHtn7yLP0hXnhWZX6FJOYQXGb5TruWLbcyC3plx+pqxP0TvvoCU9kqGwwaAuAKFF0vZyicRL4oDIOgzqNUIjmB7HgUKwc9Tihqtx8zJV8eJ+S3U1WI+8qwNyHXHTIkZkPsgZSPXqOz88Mseo5Srd5B8xqtwCHe4xPYs2Y1O/ktyLbHeN3uyUIFKlbEWzOsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=wn+CAXVEh+49LP4O9QRcHF/PKj2/uSGJRMg/oed3FYI=; 
b=VOoJDlXAkK6D17g56DPc4yn2iSNsyWowZMNyzNnUgkNFvWJ0pp4hOPub1BRjkED/euwEGttnOnQFBt9v5LEoRO+fv9c0Q7KYCbgonIT1NGm4kCaIuC48nRt9frgU2oGc2vL9RhUdgDEdVhoh8Xg9pXzFgVXKSxqtj9lnoTHahN4Cp9pyfTaxBgUq3yahFliQRnttproJrFZpFSj61neMrjhbFIpHsZjeUuyeLaI9pkLmBAiUxzERtym10uXWqrjQ2CSk8wUhZbSbxHKyazp/MotK+DzD+mUlRQxLLmKXXGLcZzsYNfrIB3AJm1kvHg3iiIFye95V57Xa5iwwhbXzBg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.232) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=wn+CAXVEh+49LP4O9QRcHF/PKj2/uSGJRMg/oed3FYI=; b=jrvdgFf2wNiJCs7fhJSfRLDaiNMO8PXi+ER6CoVKeujAOmgS5VsdOPRtO45PwPnoU66RtNqx6h9C/LZF5S7mm6cRe/pFqeK36vA3E9CkNh+MwDr+YTn3A3wkWfIS9FZAG9lWt1KdLRaLA8nROnF4cZPRn3+R/IR9WhBXY1dat4w/4/PRV/J/viHjUBNXrfU6K5gGOQD75r+5jDorUZkA7gdEWFT1W+rLtMYyUfmTUhkhsgUaWhHUjO9Bo2+EAphvPmfZrSUAcWNsia+lRoXBbwB0wIS97i9PWQcMzYWvfYlk77AAF2DLCCdjw9trogVUOjVaexEY7zeNgioTDNCkMA== Received: from BN0PR03CA0007.namprd03.prod.outlook.com (2603:10b6:408:e6::12) by DS7PR12MB5816.namprd12.prod.outlook.com (2603:10b6:8:78::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5791.25; Sun, 6 Nov 2022 17:47:20 +0000 Received: from BN8NAM11FT057.eop-nam11.prod.protection.outlook.com (2603:10b6:408:e6:cafe::79) by BN0PR03CA0007.outlook.office365.com (2603:10b6:408:e6::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5791.25 via Frontend Transport; Sun, 6 Nov 2022 17:47:20 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.232) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass 
(protection.outlook.com: domain of nvidia.com designates 216.228.118.232 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.232; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.118.232) by BN8NAM11FT057.mail.protection.outlook.com (10.13.177.49) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5791.20 via Frontend Transport; Sun, 6 Nov 2022 17:47:20 +0000 Received: from drhqmail201.nvidia.com (10.126.190.180) by mail.nvidia.com (10.127.129.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.26; Sun, 6 Nov 2022 09:47:19 -0800 Received: from drhqmail202.nvidia.com (10.126.190.181) by drhqmail201.nvidia.com (10.126.190.180) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.29; Sun, 6 Nov 2022 09:47:18 -0800 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.126.190.181) with Microsoft SMTP Server id 15.2.986.29 via Frontend Transport; Sun, 6 Nov 2022 09:47:16 -0800 From: Yishai Hadas To: , CC: , , , , , , , , Subject: [PATCH vfio 04/13] vfio: Extend the device migration protocol with PRE_COPY Date: Sun, 6 Nov 2022 19:46:21 +0200 Message-ID: <20221106174630.25909-5-yishaih@nvidia.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com> References: <20221106174630.25909-1-yishaih@nvidia.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT057:EE_|DS7PR12MB5816:EE_ X-MS-Office365-Filtering-Correlation-Id: 0d114f7b-5a6a-4173-567e-08dac01ef41b X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 
SnsZKAZ6Q8Mn7nNh5/0taEoeSQUPQNQu4UqBaqGSzl/euhK0gXPKgPN/DVZznmTrOQ0igZLL9jC4xeBqFx47bnbTCQjlU33D8Z/T9rqYP8/6bb6/DGxgAVo2tU5KyVrinrVKOPJlcLqm66TpYadS6P168aFl41F8bL3vidsuEZVmB+8OJJLulqdShcSkbRzefQugCbwYO49IkK4nP4rgFZWjWzAIKeq4Og4fCBoC8iZmCYq5xdhbp1tBaY9SEU8PjIw5N4520H8vsWAh46Q5vqiHVKFWfGOFgDTpL10jCWRXaCTO7nW7U2zTh2MbxIrjH6c+1GQWXXZynxROMDjG617PxSfmySETGcF0u0q/FG3UXeanrp5NQTcIGAQuyD2poOgyQjs6iI+fjCLGjMbu/a28dATa39Ru0AX6QgGy/2s9ditPekKBTGj88TiXf7Vxkf/dtp9s+1N9Xm17jQQRujBvz7XI+/pMTf36Qwp3TiIB9fuIT3dqerWpWCE+5UlAOS3B+U4Jg6dmctjIMwe9JTUsY6USr1iCwtUNQHmZoXDgGHUumwPHChQqklzVByG/XnuwGui17gEe7kgWoAQvp7O23RHKlommx/ieNhlesPV08+0g74qh7g+JX7MTtZ5ctR5tZynQutR5zUAYsW8QRWGdzRrqYJB7+nBcYyeJL0jMQiWaxJvFYHWcfcKep8zZJNKOwlt8knEsO7j6YbXketg+3fc04+T0yZZWADp948BCe2yvi7WS+H5VKAeGy7UYIYyz4UxtU27H2wIS2WCPKg== X-Forefront-Antispam-Report: CIP:216.228.118.232;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:mail.nvidia.com;PTR:dc7edge1.nvidia.com;CAT:NONE;SFS:(13230022)(4636009)(376002)(136003)(39860400002)(396003)(346002)(451199015)(36840700001)(40470700004)(46966006)(40480700001)(478600001)(30864003)(8936002)(5660300002)(36756003)(2906002)(316002)(6636002)(54906003)(110136005)(41300700001)(70586007)(70206006)(8676002)(4326008)(7636003)(83380400001)(356005)(36860700001)(82740400003)(7696005)(6666004)(86362001)(186003)(40460700003)(336012)(47076005)(426003)(1076003)(26005)(2616005)(82310400005);DIR:OUT;SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Nov 2022 17:47:20.0987 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 0d114f7b-5a6a-4173-567e-08dac01ef41b X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a;Ip=[216.228.118.232];Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT057.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: 
HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS7PR12MB5816 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Jason Gunthorpe

The optional PRE_COPY states open the saving data transfer FD before reaching STOP_COPY and allow the device to dirty-track internal state changes, with the general goal of reducing the volume of data transferred in the STOP_COPY stage.

While in PRE_COPY the device remains RUNNING, but the saving FD is open.

Only if the device also supports RUNNING_P2P can it support PRE_COPY_P2P, which halts P2P transfers while continuing the saving FD.

PRE_COPY, with P2P support, requires the driver to implement 7 new arcs and exists as an optional FSM branch between RUNNING and STOP_COPY:

    RUNNING -> PRE_COPY -> PRE_COPY_P2P -> STOP_COPY

A new ioctl, VFIO_MIG_GET_PRECOPY_INFO, allows userspace to query the progress of the precopy operation in the driver, so that userspace can decide when to move to STOP_COPY: not before the initial data set has been transferred, and possibly only after the dirty size has shrunk appropriately.

This ioctl is valid only in PRE_COPY states, and the kernel driver should return -EINVAL from any other migration state.

Compared to the v1 clarification, STOP_COPY -> PRE_COPY is blocked and left to be defined in the future. We also split the pending_bytes report into the initial and sustaining values, i.e. initial_bytes and dirty_bytes:

    initial_bytes: amount of initial mandatory precopy data.
    dirty_bytes: device state changes relative to data previously retrieved.

These fields are not required to have any bearing on the STOP_COPY phase.
Signed-off-by: Jason Gunthorpe Signed-off-by: Shay Drory Signed-off-by: Yishai Hadas --- drivers/vfio/vfio_main.c | 74 ++++++++++++++++++++++- include/uapi/linux/vfio.h | 122 ++++++++++++++++++++++++++++++++++++-- 2 files changed, 190 insertions(+), 6 deletions(-) diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index b118e7b1bc59..c58a795510f9 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -1056,7 +1056,7 @@ int vfio_mig_get_next_state(struct vfio_device *device, enum vfio_device_mig_state new_fsm, enum vfio_device_mig_state *next_fsm) { - enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 }; + enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_PRE_COPY_P2P + 1 }; /* * The coding in this table requires the driver to implement the * following FSM arcs: @@ -1071,30 +1071,65 @@ int vfio_mig_get_next_state(struct vfio_device *device, * RUNNING_P2P -> RUNNING * RUNNING_P2P -> STOP * STOP -> RUNNING_P2P - * Without P2P the driver must implement: + * + * If precopy is supported then the driver must support these additional + * FSM arcs: + * RUNNING -> PRE_COPY + * PRE_COPY -> RUNNING + * PRE_COPY -> STOP_COPY + * However, if precopy and P2P are supported together then the driver + * must support these additional arcs beyond the P2P arcs above: + * PRE_COPY -> RUNNING + * PRE_COPY -> PRE_COPY_P2P + * PRE_COPY_P2P -> PRE_COPY + * PRE_COPY_P2P -> RUNNING_P2P + * PRE_COPY_P2P -> STOP_COPY + * RUNNING -> PRE_COPY + * RUNNING_P2P -> PRE_COPY_P2P + * + * Without P2P and precopy the driver must implement: * RUNNING -> STOP * STOP -> RUNNING * * The coding will step through multiple states for some combination * transitions; if all optional features are supported, this means the * following ones: + * PRE_COPY -> PRE_COPY_P2P -> STOP_COPY + * PRE_COPY -> RUNNING -> RUNNING_P2P + * PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP + * PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP -> RESUMING + * PRE_COPY_P2P -> RUNNING_P2P -> RUNNING + * 
PRE_COPY_P2P -> RUNNING_P2P -> STOP + * PRE_COPY_P2P -> RUNNING_P2P -> STOP -> RESUMING * RESUMING -> STOP -> RUNNING_P2P + * RESUMING -> STOP -> RUNNING_P2P -> PRE_COPY_P2P * RESUMING -> STOP -> RUNNING_P2P -> RUNNING + * RESUMING -> STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY * RESUMING -> STOP -> STOP_COPY + * RUNNING -> RUNNING_P2P -> PRE_COPY_P2P * RUNNING -> RUNNING_P2P -> STOP * RUNNING -> RUNNING_P2P -> STOP -> RESUMING * RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY + * RUNNING_P2P -> RUNNING -> PRE_COPY * RUNNING_P2P -> STOP -> RESUMING * RUNNING_P2P -> STOP -> STOP_COPY + * STOP -> RUNNING_P2P -> PRE_COPY_P2P * STOP -> RUNNING_P2P -> RUNNING + * STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY * STOP_COPY -> STOP -> RESUMING * STOP_COPY -> STOP -> RUNNING_P2P * STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING + * + * The following transitions are blocked: + * STOP_COPY -> PRE_COPY + * STOP_COPY -> PRE_COPY_P2P */ static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = { [VFIO_DEVICE_STATE_STOP] = { [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP, [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P, + [VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P, [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY, [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING, [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P, @@ -1103,14 +1138,38 @@ int vfio_mig_get_next_state(struct vfio_device *device, [VFIO_DEVICE_STATE_RUNNING] = { [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P, [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING, + [VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P, [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P, [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P, [VFIO_DEVICE_STATE_RUNNING_P2P] = 
VFIO_DEVICE_STATE_RUNNING_P2P, [VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR, }, + [VFIO_DEVICE_STATE_PRE_COPY] = { + [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING, + [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING, + [VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P, + [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_PRE_COPY_P2P, + [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING, + [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING, + [VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR, + }, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = { + [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P, + [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P, + [VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P, + [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY, + [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P, + [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P, + [VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR, + }, [VFIO_DEVICE_STATE_STOP_COPY] = { [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP, [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP, + [VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR, [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY, [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP, [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP, @@ -1119,6 +1178,8 @@ int vfio_mig_get_next_state(struct vfio_device *device, [VFIO_DEVICE_STATE_RESUMING] = { [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP, [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP, + [VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_STOP, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_STOP, [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP, 
[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING, [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP, @@ -1127,6 +1188,8 @@ int vfio_mig_get_next_state(struct vfio_device *device, [VFIO_DEVICE_STATE_RUNNING_P2P] = { [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP, [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING, + [VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P, [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP, [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP, [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P, @@ -1135,6 +1198,8 @@ int vfio_mig_get_next_state(struct vfio_device *device, [VFIO_DEVICE_STATE_ERROR] = { [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_ERROR, [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR, + [VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR, [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR, [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR, [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR, @@ -1145,6 +1210,11 @@ int vfio_mig_get_next_state(struct vfio_device *device, static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = { [VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY, [VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY, + [VFIO_DEVICE_STATE_PRE_COPY] = + VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY, + [VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_MIGRATION_STOP_COPY | + VFIO_MIGRATION_P2P | + VFIO_MIGRATION_PRE_COPY, [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY, [VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY, [VFIO_DEVICE_STATE_RUNNING_P2P] = diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 3e45dbaf190e..fca8e1b7e619 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -819,12 +819,20 @@ struct vfio_device_feature { * 
VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P * is supported in addition to the STOP_COPY states. * + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY means that + * PRE_COPY is supported in addition to the STOP_COPY states. + * + * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY + * means that RUNNING_P2P, PRE_COPY and PRE_COPY_P2P are supported + * in addition to the STOP_COPY states. + * * Other combinations of flags have behavior to be defined in the future. */ struct vfio_device_feature_migration { __aligned_u64 flags; #define VFIO_MIGRATION_STOP_COPY (1 << 0) #define VFIO_MIGRATION_P2P (1 << 1) +#define VFIO_MIGRATION_PRE_COPY (1 << 2) }; #define VFIO_DEVICE_FEATURE_MIGRATION 1 @@ -875,8 +883,13 @@ struct vfio_device_feature_mig_state { * RESUMING - The device is stopped and is loading a new internal state * ERROR - The device has failed and must be reset * - * And 1 optional state to support VFIO_MIGRATION_P2P: + * And optional states to support VFIO_MIGRATION_P2P: * RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA + * And VFIO_MIGRATION_PRE_COPY: + * PRE_COPY - The device is running normally but tracking internal state + * changes + * And VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY: + * PRE_COPY_P2P - PRE_COPY, except the device cannot do peer to peer DMA * * The FSM takes actions on the arcs between FSM states. The driver implements * the following behavior for the FSM arcs: @@ -908,20 +921,48 @@ struct vfio_device_feature_mig_state { * * To abort a RESUMING session the device must be reset. * + * PRE_COPY -> RUNNING * RUNNING_P2P -> RUNNING * While in RUNNING the device is fully operational, the device may generate * interrupts, DMA, respond to MMIO, all vfio device regions are functional, * and the device may advance its internal state. * + * The PRE_COPY arc will terminate a data transfer session. 
+ * + * PRE_COPY_P2P -> RUNNING_P2P * RUNNING -> RUNNING_P2P * STOP -> RUNNING_P2P * While in RUNNING_P2P the device is partially running in the P2P quiescent * state defined below. * + * The PRE_COPY_P2P arc will terminate a data transfer session. + * + * RUNNING -> PRE_COPY + * RUNNING_P2P -> PRE_COPY_P2P * STOP -> STOP_COPY - * This arc begin the process of saving the device state and will return a - * new data_fd. + * PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states + * which share a data transfer session. Moving between these states alters + * what is streamed in the session, but does not terminate or otherwise affect + * the associated fd. + * + * These arcs begin the process of saving the device state and will return a + * new data_fd. The migration driver may perform actions such as enabling + * dirty logging of device state when entering PRE_COPY or PRE_COPY_P2P. * + * These arcs do not change the device operation; the device remains + * RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below + * in PRE_COPY_P2P -> STOP_COPY. + * + * PRE_COPY -> PRE_COPY_P2P + * Entering PRE_COPY_P2P continues all the behaviors of PRE_COPY above. + * However, while in the PRE_COPY_P2P state, the device is partially running + * in the P2P quiescent state defined below, like RUNNING_P2P. + * + * PRE_COPY_P2P -> PRE_COPY + * This arc allows returning the device to a full RUNNING behavior while + * continuing all the behaviors of PRE_COPY. + * + * PRE_COPY_P2P -> STOP_COPY * While in the STOP_COPY state the device has the same behavior as STOP * with the addition that the data transfers session continues to stream the * migration state. End of stream on the FD indicates the entire device @@ -939,6 +980,13 @@ struct vfio_device_feature_mig_state { * device state for this arc if required to prepare the device to receive the * migration data.
* + * STOP_COPY -> PRE_COPY + * STOP_COPY -> PRE_COPY_P2P + * These arcs are not permitted and return an error if requested. Future + * revisions of this API may define behaviors for these arcs, in which case + * support will be discoverable by a new flag in + * VFIO_DEVICE_FEATURE_MIGRATION. + * * any -> ERROR * ERROR cannot be specified as a device state, however any transition request * can be failed with an errno return and may then move the device_state into @@ -950,7 +998,7 @@ struct vfio_device_feature_mig_state { * The optional peer to peer (P2P) quiescent state is intended to be a quiescent * state for the device for the purposes of managing multiple devices within a * user context where peer-to-peer DMA between devices may be active. The - * RUNNING_P2P states must prevent the device from initiating + * RUNNING_P2P and PRE_COPY_P2P states must prevent the device from initiating * any new P2P DMA transactions. If the device can identify P2P transactions * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration * driver must complete any such outstanding operations prior to completing the @@ -963,6 +1011,8 @@ struct vfio_device_feature_mig_state { * above FSM arcs. As there are multiple paths through the FSM arcs the path * should be selected based on the following rules: * - Select the shortest path. + * - The path cannot have saving group states as interior arcs, only + * starting/end states. * Refer to vfio_mig_get_next_state() for the result of the algorithm. * * The automatic transit through the FSM arcs that make up the combination @@ -976,6 +1026,9 @@ struct vfio_device_feature_mig_state { * support them. The user can discover if these states are supported by using * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can * avoid knowing about these optional states if the kernel driver supports them. + * + * Arcs touching PRE_COPY and PRE_COPY_P2P are removed if support for PRE_COPY + * is not present.
*/ enum vfio_device_mig_state { VFIO_DEVICE_STATE_ERROR = 0, @@ -984,8 +1037,69 @@ enum vfio_device_mig_state { VFIO_DEVICE_STATE_STOP_COPY = 3, VFIO_DEVICE_STATE_RESUMING = 4, VFIO_DEVICE_STATE_RUNNING_P2P = 5, + VFIO_DEVICE_STATE_PRE_COPY = 6, + VFIO_DEVICE_STATE_PRE_COPY_P2P = 7, +}; + +/** + * VFIO_MIG_GET_PRECOPY_INFO - _IO(VFIO_TYPE, VFIO_BASE + 21) + * + * This ioctl is used on the migration data FD in the precopy phase of the + * migration data transfer. It returns an estimate of the current data sizes + * remaining to be transferred. It allows the user to judge when it is + * appropriate to leave PRE_COPY for STOP_COPY. + * + * This ioctl is valid only in PRE_COPY states and the kernel driver should + * return -EINVAL from any other migration state. + * + * The vfio_precopy_info data structure returned by this ioctl provides + * estimates of data available from the device during the PRE_COPY states. + * This estimate is split into two categories, initial_bytes and + * dirty_bytes. + * + * The initial_bytes field indicates the amount of initial mandatory precopy + * data available from the device. This field should have a non-zero initial + * value and decrease as migration data is read from the device. + * Userspace must leave PRE_COPY for STOP_COPY only after this field reaches + * zero. + * + * The dirty_bytes field tracks device state changes relative to data + * previously retrieved. This field starts at zero and may increase as + * the internal device state is modified or decrease as that modified + * state is read from the device. + * + * Userspace may use the combination of these fields to estimate the + * potential data size available during the PRE_COPY phases, as well as + * trends relative to the rate the device is dirtying its internal + * state, but these fields are not required to have any bearing relative + * to the data size available during the STOP_COPY phase.
+ * + * Drivers have a lot of flexibility in when and what they transfer during the + * PRE_COPY phase, and how they report this from VFIO_MIG_GET_PRECOPY_INFO. + * + * During pre-copy the migration data FD has a temporary "end of stream" that is + * reached when both initial_bytes and dirty_bytes are zero. For instance, this + * may indicate that the device is idle and not currently dirtying any internal + * state. When read() reaches this temporary end of stream, the kernel driver + * should return ENOMSG from read(). Userspace can wait for more data (which may + * never come) by using poll. + * + * Once in STOP_COPY the migration data FD has a permanent end of stream + * signaled in the usual way by read() always returning 0 and poll always + * returning readable. ENOMSG must not be returned in STOP_COPY. Support + * for this ioctl is optional. + * + * Return: 0 on success, -1 and errno set on failure. + */ +struct vfio_precopy_info { + __u32 argsz; + __u32 flags; + __aligned_u64 initial_bytes; + __aligned_u64 dirty_bytes; }; +#define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21) + /* * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power * state with the platform-based power management.
Device use of lower power From patchwork Sun Nov 6 17:46:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yishai Hadas X-Patchwork-Id: 13033520 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0B04C433FE for ; Sun, 6 Nov 2022 17:47:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230451AbiKFRrd (ORCPT ); Sun, 6 Nov 2022 12:47:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44882 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230388AbiKFRra (ORCPT ); Sun, 6 Nov 2022 12:47:30 -0500 Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2041.outbound.protection.outlook.com [40.107.236.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 332B364CB for ; Sun, 6 Nov 2022 09:47:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=NLZWO/Gb6TZcGewhMGee0mfbGVRV9KLtcVoiv5F2+Yh3roiCeuGKRwbM3Uy8odtBMny/euG/ilGumRLKkPCwCwIyscrjTFfPB+rAXkTdYGGRReyTWcp3HGHNoaSCNYDAtjjADAgwohjM7f2IbaDDPacyPVMiOGZS+i72wVB6oOMTwsVShOEK9m8tj47Ab5OZPvBNt3aoIwEtfNQwWmDMsP1lScywoUuvDCrIKUyPIvOctaPF8xMDjugx2nrWoWTZid6ciffUkNkGLY7cEXKM/611HkYV2/MJ5cYyAlFnVwnCyau9n3wPfmnnwVV8E697cePfcK51zm9BI9d2rlJy3g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=EyMf95wUkBQAiYj8yP+CguzBhX8VxgQsNU1Lx+uLNqg=; 
b=Rh/5Er4714+FaE4Lyc/GxaXCf9PYY2oiOJ3GyK26gpBcmgzccN2GSAdDVQI1h5qnTlZ6b4H+LTTwnjiOKq5KHw4r7MpFOO5bRO0zY+GqXVPeCC6EE1OmbxjKn9Zk6uEeDfBszpD8Ov/mtqUTmTSn2EWQHUkKz+3yeD2wm0AD8XaZ1YjtBj/gjXHmcnh0F+Ld0pvhzJY4WXZAEN/59WocEOF3nYxlgnbIyhEn60BfHZoU7wm7PdFRsggJaQXInVW22hDiM2ql1OukvDU8zj23cSuWN8Kbo19AF9Xu1mwVu+g0gE0mc3HOtK+cIKag9kBcrYlw2sdmNcREFfp/slUcww== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.232) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=EyMf95wUkBQAiYj8yP+CguzBhX8VxgQsNU1Lx+uLNqg=; b=d17ffT9nCmpkntg+dPYRLF4lGp+6AIqlz0lbwoLQz0Rc2Us+TKbJAKhMY5T441FonVAAvz6ecdeuhunDlLXchxhk0MqTygg6mbWyYTDxMrQTcD36fg2yWpJA7Neu3RSNTaqU+6+GPodal+zyYbSzw7qkU02DFmRtbLkPz2AQAL5eon6h48IKf15iCvPF9o7bDYZQgfYhMu6lF9GrEAPQWBAUzBOBmarzSmx9HZ1Hrc7K4M1BRZWu+hqJWNPz1WEKPnYRaSvICk2ZszXAPCDBs0wu7cyM2D4AWvR39y1XoDGIEEESLXwOFxTW087ArP61QcxU1Je3LahNICRGRrWODw== Received: from BN9PR03CA0655.namprd03.prod.outlook.com (2603:10b6:408:13b::30) by PH0PR12MB5418.namprd12.prod.outlook.com (2603:10b6:510:e5::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5791.24; Sun, 6 Nov 2022 17:47:24 +0000 Received: from BN8NAM11FT085.eop-nam11.prod.protection.outlook.com (2603:10b6:408:13b:cafe::9a) by BN9PR03CA0655.outlook.office365.com (2603:10b6:408:13b::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5791.25 via Frontend Transport; Sun, 6 Nov 2022 17:47:24 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.232) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass 
From: Yishai Hadas
Subject: [PATCH vfio 05/13] vfio/mlx5: Enforce a single SAVE command at a time
Date: Sun, 6 Nov 2022 19:46:22 +0200
Message-ID: <20221106174630.25909-6-yishaih@nvidia.com>
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

Enforce a single SAVE command at a time.

As the SAVE command is asynchronous, we must enforce that only a single
command runs at a time. This preserves ordering between multiple calls
and protects the migration file data structure from races.

This is required by the next patches in the series, where, as part of
PRE_COPY, multiple images may be saved and multiple SAVE commands may be
issued from different flows.

Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 5 +++++
 drivers/vfio/pci/mlx5/cmd.h  | 2 ++
 drivers/vfio/pci/mlx5/main.c | 1 +
 3 files changed, 8 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 0848bc905d3e..b9ed2f1c8689 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -281,6 +281,8 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
 	mlx5_core_dealloc_pd(mdev, async_data->pdn);
 	kvfree(async_data->out);
+	migf->save_cb_active = false;
+	wake_up(&migf->save_wait);
 	fput(migf->filp);
 }
 
@@ -321,6 +323,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		return -ENOTCONN;
 
 	mdev = mvdev->mdev;
+	wait_event(migf->save_wait, !migf->save_cb_active);
 	err = mlx5_core_alloc_pd(mdev, &pdn);
 	if (err)
 		return err;
@@ -353,6 +356,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	get_file(migf->filp);
 	async_data->mkey = mkey;
 	async_data->pdn = pdn;
+	migf->save_cb_active = true;
 	err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
 			       async_data->out,
 			       out_size, mlx5vf_save_callback,
@@ -371,6 +375,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
 err_dma_map:
 	mlx5_core_dealloc_pd(mdev, pdn);
+	migf->save_cb_active = false;
 	return err;
 }
 
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 921d5720a1e5..b1c5dd2ff144 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -26,6 +26,7 @@ struct mlx5_vf_migration_file {
 	struct mutex lock;
 	u8 disabled:1;
 	u8 is_err:1;
+	u8 save_cb_active:1;
 
 	struct sg_append_table table;
 	size_t total_length;
@@ -37,6 +38,7 @@ struct mlx5_vf_migration_file {
 	unsigned long last_offset;
 	struct mlx5vf_pci_core_device *mvdev;
 	wait_queue_head_t poll_wait;
+	wait_queue_head_t save_wait;
 	struct mlx5_async_ctx async_ctx;
 	struct mlx5vf_async_data async_data;
 };
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 4c7a39ffd247..5da278f3c31c 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -245,6 +245,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	init_waitqueue_head(&migf->poll_wait);
+	init_waitqueue_head(&migf->save_wait);
 	mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx);
 	INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb);
 	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
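The wait/wake pairing in this patch can be illustrated outside the kernel. What follows is a minimal userspace sketch, not driver code. It assumes POSIX threads, and the names `struct save_gate`, `save_gate_begin()`, `save_gate_end()` and `run_concurrent_saves()` are hypothetical. A condition variable stands in for the `migf->save_wait` wait queue and a boolean stands in for `migf->save_cb_active`. The sketch performs the check and the set under one mutex, so it does not depend on its callers being serialized by some other means.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for migf->save_cb_active plus the
 * migf->save_wait wait queue: at most one "save" may be in flight.
 */
struct save_gate {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when the active save finishes */
	bool active;		/* true while a save is in flight */
};

/* Block until no save is active, then claim the gate. */
static void save_gate_begin(struct save_gate *g)
{
	pthread_mutex_lock(&g->lock);
	while (g->active)
		pthread_cond_wait(&g->idle, &g->lock);
	g->active = true;
	pthread_mutex_unlock(&g->lock);
}

/* Release the gate (completion or error path) and wake all waiters. */
static void save_gate_end(struct save_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->active = false;
	pthread_cond_broadcast(&g->idle);
	pthread_mutex_unlock(&g->lock);
}

/* Demo state: count how many saves overlap. */
static struct save_gate demo_gate = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false
};
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_flight, max_in_flight, completed;

static void *issue_save(void *arg)
{
	(void)arg;
	save_gate_begin(&demo_gate);

	pthread_mutex_lock(&stats_lock);
	if (++in_flight > max_in_flight)
		max_in_flight = in_flight;
	pthread_mutex_unlock(&stats_lock);

	/* ... the asynchronous SAVE command would run here ... */

	pthread_mutex_lock(&stats_lock);
	in_flight--;
	completed++;
	pthread_mutex_unlock(&stats_lock);

	save_gate_end(&demo_gate);
	return NULL;
}

/* Start nthreads concurrent "saves"; return the peak overlap, or -1 on error. */
static int run_concurrent_saves(int nthreads)
{
	pthread_t tids[64];
	int started = 0;

	if (nthreads <= 0 || nthreads > 64)
		return -1;
	for (; started < nthreads; started++)
		if (pthread_create(&tids[started], NULL, issue_save, NULL) != 0)
			break;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return started == nthreads ? max_in_flight : -1;
}
```

Calling `run_concurrent_saves(8)` should report a peak of one save in flight. In the patch, two places clear the flag and release the waiters: `mlx5vf_mig_file_cleanup_cb()` after the asynchronous command completes, and the error path of `mlx5vf_cmd_save_vhca_state()`. In the sketch, `save_gate_end()` plays both roles.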
From: Yishai Hadas
Subject: [PATCH vfio 06/13] vfio/mlx5: Refactor total_length name and usage
Date: Sun, 6 Nov 2022 19:46:23 +0200
Message-ID: <20221106174630.25909-7-yishaih@nvidia.com>
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>

From: Shay Drory

On the source side, change total_length so that it holds only the size
of the image as returned by the FW, and rename it to image_length. This
eliminates the use of total_length inside the SAVE command.

This is a preparation step for the next patches in the series, where the
migration file may manage more than one image.
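The split between the buffer size and the image size can be modeled in plain C. This is a simplified userspace sketch, not the driver's code. `struct mig_file_sketch`, `sketch_alloc()`, `sketch_save_done()` and `sketch_read_len()` are hypothetical names, and a 4 KiB page size is assumed. In the model, `allocated_length` sizes the buffer handed to the SAVE command. `image_length` records only the `actual_image_size` reported by the FW, and that value alone bounds what a non-blocking `read()` may return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical model of the two lengths kept by the migration file. */
struct mig_file_sketch {
	size_t allocated_length;	/* buffer size, always whole pages */
	size_t image_length;		/* bytes the FW actually produced */
};

/* Size the buffer from the queried state size; no data exists yet. */
static void sketch_alloc(struct mig_file_sketch *f, size_t queried_size)
{
	f->allocated_length =
		SKETCH_DIV_ROUND_UP(queried_size, SKETCH_PAGE_SIZE) *
		SKETCH_PAGE_SIZE;
	f->image_length = 0;
}

/* SAVE completion: record only the FW-reported actual_image_size. */
static void sketch_save_done(struct mig_file_sketch *f,
			     size_t actual_image_size)
{
	f->image_length = actual_image_size;
}

/*
 * Bytes a non-blocking read() may return at offset pos, following the
 * length checks in mlx5vf_save_read(): -EAGAIN before the image exists,
 * -EINVAL past its end, otherwise bounded by image_length.
 */
static ssize_t sketch_read_len(const struct mig_file_sketch *f, size_t pos,
			       size_t len)
{
	size_t left;

	if (!f->image_length)
		return -EAGAIN;
	if (pos > f->image_length)
		return -EINVAL;
	left = f->image_length - pos;
	return (ssize_t)(left < len ? left : len);
}
```

With a queried size of 10000 bytes, the sketch allocates 12288 bytes (three pages). After a SAVE that reports 9000 bytes, reads are capped at 9000 bytes rather than at the allocated size.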
Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 24 +++++++++++-------------
 drivers/vfio/pci/mlx5/cmd.h  |  2 +-
 drivers/vfio/pci/mlx5/main.c | 20 ++++++++++----------
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index b9ed2f1c8689..24c6d2e4c2be 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -212,9 +212,9 @@ static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_dev *mdev, u16 function_id,
 static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
 			struct mlx5_vf_migration_file *migf,
 			struct mlx5_vhca_recv_buf *recv_buf,
-			u32 *mkey)
+			u32 *mkey, size_t length)
 {
-	size_t npages = migf ? DIV_ROUND_UP(migf->total_length, PAGE_SIZE) :
+	size_t npages = migf ? DIV_ROUND_UP(length, PAGE_SIZE) :
 				recv_buf->npages;
 	int err = 0, inlen;
 	__be64 *mtt;
@@ -255,8 +255,7 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
 	MLX5_SET(mkc, mkc, translations_octword_size, DIV_ROUND_UP(npages, 2));
-	MLX5_SET64(mkc, mkc, len,
-		   migf ? migf->total_length : (npages * PAGE_SIZE));
+	MLX5_SET64(mkc, mkc, len, migf ? length : (npages * PAGE_SIZE));
 	err = mlx5_core_create_mkey(mdev, mkey, in, inlen);
 	kvfree(in);
 	return err;
@@ -294,7 +293,7 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 			struct mlx5_vf_migration_file, async_data);
 
 	if (!status) {
-		WRITE_ONCE(migf->total_length,
+		WRITE_ONCE(migf->image_length,
 			   MLX5_GET(save_vhca_state_out, async_data->out,
 				    actual_image_size));
 		wake_up_interruptible(&migf->poll_wait);
@@ -333,7 +332,8 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (err)
 		goto err_dma_map;
 
-	err = _create_mkey(mdev, pdn, migf, NULL, &mkey);
+	err = _create_mkey(mdev, pdn, migf, NULL,
+			   &mkey, migf->allocated_length);
 	if (err)
 		goto err_create_mkey;
 
@@ -342,7 +342,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	MLX5_SET(save_vhca_state_in, in, op_mod, 0);
 	MLX5_SET(save_vhca_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(save_vhca_state_in, in, mkey, mkey);
-	MLX5_SET(save_vhca_state_in, in, size, migf->total_length);
+	MLX5_SET(save_vhca_state_in, in, size, migf->allocated_length);
 
 	async_data = &migf->async_data;
 	async_data->out = kvzalloc(out_size, GFP_KERNEL);
@@ -351,8 +351,6 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		goto err_out;
 	}
 
-	/* no data exists till the callback comes back */
-	migf->total_length = 0;
 	get_file(migf->filp);
 	async_data->mkey = mkey;
 	async_data->pdn = pdn;
@@ -393,7 +391,7 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		return -ENOTCONN;
 
 	mutex_lock(&migf->lock);
-	if (!migf->total_length) {
+	if (!migf->image_length) {
 		err = -EINVAL;
 		goto end;
 	}
@@ -407,7 +405,7 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (err)
 		goto err_reg;
 
-	err = _create_mkey(mdev, pdn, migf, NULL, &mkey);
+	err = _create_mkey(mdev, pdn, migf, NULL, &mkey, migf->image_length);
 	if (err)
 		goto err_mkey;
 
@@ -416,7 +414,7 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	MLX5_SET(load_vhca_state_in, in, op_mod, 0);
 	MLX5_SET(load_vhca_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(load_vhca_state_in, in, mkey, mkey);
-	MLX5_SET(load_vhca_state_in, in, size, migf->total_length);
+	MLX5_SET(load_vhca_state_in, in, size, migf->image_length);
 
 	err = mlx5_cmd_exec_inout(mdev, load_vhca_state, in, out);
 
@@ -1047,7 +1045,7 @@ static int mlx5vf_alloc_qp_recv_resources(struct mlx5_core_dev *mdev,
 	if (err)
 		goto end;
 
-	err = _create_mkey(mdev, pdn, NULL, recv_buf, &recv_buf->mkey);
+	err = _create_mkey(mdev, pdn, NULL, recv_buf, &recv_buf->mkey, 0);
 	if (err)
 		goto err_create_mkey;
 
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index b1c5dd2ff144..b1fa1a0418a5 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -29,7 +29,7 @@ struct mlx5_vf_migration_file {
 	u8 save_cb_active:1;
 
 	struct sg_append_table table;
-	size_t total_length;
+	size_t image_length;
 	size_t allocated_length;
 
 	/* Optimize mlx5vf_get_migration_page() for sequential access */
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 5da278f3c31c..624b1a99dc21 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -116,7 +116,7 @@ static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf)
 		__free_page(sg_page_iter_page(&sg_iter));
 	sg_free_append_table(&migf->table);
 	migf->disabled = true;
-	migf->total_length = 0;
+	migf->image_length = 0;
 	migf->allocated_length = 0;
 	migf->filp->f_pos = 0;
 	mutex_unlock(&migf->lock);
@@ -144,16 +144,16 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 
 	if (!(filp->f_flags & O_NONBLOCK)) {
 		if (wait_event_interruptible(migf->poll_wait,
-			     READ_ONCE(migf->total_length) || migf->is_err))
+			     READ_ONCE(migf->image_length) || migf->is_err))
 			return -ERESTARTSYS;
 	}
 
 	mutex_lock(&migf->lock);
-	if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(migf->total_length)) {
+	if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(migf->image_length)) {
 		done = -EAGAIN;
 		goto out_unlock;
 	}
-	if (*pos > migf->total_length) {
+	if (*pos > migf->image_length) {
 		done = -EINVAL;
 		goto out_unlock;
 	}
@@ -162,7 +162,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 		goto out_unlock;
 	}
 
-	len = min_t(size_t, migf->total_length - *pos, len);
+	len = min_t(size_t, migf->image_length - *pos, len);
 	while (len) {
 		size_t page_offset;
 		struct page *page;
@@ -208,7 +208,7 @@ static __poll_t mlx5vf_save_poll(struct file *filp,
 	mutex_lock(&migf->lock);
 	if (migf->disabled || migf->is_err)
 		pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
-	else if (READ_ONCE(migf->total_length))
+	else if (READ_ONCE(migf->image_length))
 		pollflags = EPOLLIN | EPOLLRDNORM;
 	mutex_unlock(&migf->lock);
 
@@ -227,6 +227,7 @@ static struct mlx5_vf_migration_file *
 mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 {
 	struct mlx5_vf_migration_file *migf;
+	size_t length;
 	int ret;
 
 	migf = kzalloc(sizeof(*migf), GFP_KERNEL);
@@ -248,13 +249,12 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	init_waitqueue_head(&migf->save_wait);
 	mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx);
 	INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb);
-	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
-						    &migf->total_length);
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length);
 	if (ret)
 		goto out_free;
 
 	ret = mlx5vf_add_migration_pages(
-		migf, DIV_ROUND_UP_ULL(migf->total_length, PAGE_SIZE));
+		migf, DIV_ROUND_UP_ULL(length, PAGE_SIZE));
 	if (ret)
 		goto out_free;
 
@@ -328,7 +328,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 		len -= page_len;
 		done += page_len;
 		buf += page_len;
-		migf->total_length += page_len;
+		migf->image_length += page_len;
 	}
 out_unlock:
 	mutex_unlock(&migf->lock);
From: Yishai Hadas
Subject: [PATCH vfio 07/13] vfio/mlx5: Introduce device transitions of PRE_COPY
Date: Sun, 6 Nov 2022 19:46:24 +0200
Message-ID: <20221106174630.25909-8-yishaih@nvidia.com>
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>

From: Shay Drory

To support PRE_COPY, the mlx5 driver transfers multiple states (images)
of the device. For example, the source VF can save and transfer multiple
states, and the target VF loads them in the same order.

The device saves three kinds of states:
1) Initial state - when the device moves to the PRE_COPY state.
2) Middle state - during the PRE_COPY phase, via
   VFIO_MIG_GET_PRECOPY_INFO. There can be multiple states of this type.
3) Final state - when the device moves to the STOP_COPY state.

After moving to the PRE_COPY state, the user holds the saving migf FD
and can use it; for example, the user can start transferring data via
the read() callback. The user can also switch from PRE_COPY to STOP_COPY
whenever it sees fit, which triggers saving of the final state.

This means that the mlx5 VFIO device can be switched to STOP_COPY
without any data having been transferred in the PRE_COPY state.
Therefore, when the device moves to STOP_COPY, mlx5 stores the final
state in a dedicated data structure.
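The buffer selection in this patch reduces to a small decision on the `inc` and `track` arguments of `mlx5vf_cmd_save_vhca_state()`. The sketch below models only that decision and the matching length update in `mlx5vf_save_callback()`. It is a hedged illustration, not driver code. The enum, struct and function names are hypothetical. The mapping of the three states to flag combinations, given in the comment, is an assumption, because the callers are not part of this excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which buffer a SAVE writes into. */
enum save_target { SAVE_TO_TABLE, SAVE_TO_FINAL_TABLE };

/* Hypothetical model of the two lengths tracked after this patch. */
struct mig_lengths_sketch {
	size_t image_length;	/* valid bytes in migf->table */
	size_t final_length;	/* valid bytes in migf->final_table */
};

/*
 * Mirrors "(!track && inc) ? &migf->final_table.sgt : &migf->table.sgt".
 * Assumed mapping of the states (callers are not shown in this excerpt):
 *   initial: inc=false, track=true  -> table
 *   middle:  inc=true,  track=true  -> table
 *   final:   inc=true,  track=false -> final_table
 */
static enum save_target pick_target(bool inc, bool track)
{
	return (!track && inc) ? SAVE_TO_FINAL_TABLE : SAVE_TO_TABLE;
}

/* Mirrors mlx5vf_save_callback(): update the length of the buffer used. */
static void on_save_done(struct mig_lengths_sketch *m, enum save_target t,
			 size_t actual_image_size)
{
	if (t == SAVE_TO_FINAL_TABLE)
		m->final_length = actual_image_size;
	else
		m->image_length = actual_image_size;
}
```

Because the final image goes to a separate buffer, an earlier image that user space has not yet finished reading is left intact when the device moves to STOP_COPY.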
Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 56 +++++++++++++++-------
 drivers/vfio/pci/mlx5/cmd.h  | 16 ++++++-
 drivers/vfio/pci/mlx5/main.c | 93 +++++++++++++++++++++++++++++++-----
 3 files changed, 134 insertions(+), 31 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 24c6d2e4c2be..eb684455c2b2 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -14,6 +14,7 @@ _mlx5vf_free_page_tracker_resources(struct mlx5vf_pci_core_device *mvdev);
 
 int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
 {
+	struct mlx5_vf_migration_file *migf = mvdev->saving_migf;
 	u32 out[MLX5_ST_SZ_DW(suspend_vhca_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(suspend_vhca_in)] = {};
 
@@ -21,6 +22,14 @@ int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
 	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
+	/*
+	 * In case PRE_COPY is used, saving_migf is exposed while the device is
+	 * running. Make sure to run only once there is no active save command.
+	 * Running both in parallel, might end-up with a failure in the save
+	 * command once it will try to turn on 'tracking' on a suspended device.
+	 */
+	if (migf)
+		wait_event(migf->save_wait, !migf->save_cb_active);
 	MLX5_SET(suspend_vhca_in, in, opcode, MLX5_CMD_OP_SUSPEND_VHCA);
 	MLX5_SET(suspend_vhca_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(suspend_vhca_in, in, op_mod, op_mod);
@@ -45,7 +54,7 @@ int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
 }
 
 int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
-					  size_t *state_size)
+					  size_t *state_size, u8 query_flags)
 {
 	u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
@@ -59,6 +68,8 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 		 MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE);
 	MLX5_SET(query_vhca_migration_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(query_vhca_migration_state_in, in, op_mod, 0);
+	MLX5_SET(query_vhca_migration_state_in, in, incremental,
+		 query_flags & MLX5VF_QUERY_INC);
 
 	ret = mlx5_cmd_exec_inout(mvdev->mdev, query_vhca_migration_state, in,
 				  out);
@@ -210,11 +221,11 @@ static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_dev *mdev, u16 function_id,
 }
 
 static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
-			struct mlx5_vf_migration_file *migf,
+			struct sg_table *sgt,
 			struct mlx5_vhca_recv_buf *recv_buf,
 			u32 *mkey, size_t length)
 {
-	size_t npages = migf ? DIV_ROUND_UP(length, PAGE_SIZE) :
+	size_t npages = sgt ? DIV_ROUND_UP(length, PAGE_SIZE) :
 				recv_buf->npages;
 	int err = 0, inlen;
 	__be64 *mtt;
@@ -232,10 +243,10 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
 		 DIV_ROUND_UP(npages, 2));
 	mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
 
-	if (migf) {
+	if (sgt) {
 		struct sg_dma_page_iter dma_iter;
 
-		for_each_sgtable_dma_page(&migf->table.sgt, &dma_iter, 0)
+		for_each_sgtable_dma_page(sgt, &dma_iter, 0)
 			*mtt++ = cpu_to_be64(sg_page_iter_dma_address(&dma_iter));
 	} else {
 		int i;
@@ -255,7 +266,7 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
 	MLX5_SET(mkc, mkc, translations_octword_size, DIV_ROUND_UP(npages, 2));
-	MLX5_SET64(mkc, mkc, len, migf ? length : (npages * PAGE_SIZE));
+	MLX5_SET64(mkc, mkc, len, sgt ? length : (npages * PAGE_SIZE));
 	err = mlx5_core_create_mkey(mdev, mkey, in, inlen);
 	kvfree(in);
 	return err;
@@ -277,7 +288,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 	mutex_unlock(&migf->lock);
 
 	mlx5_core_destroy_mkey(mdev, async_data->mkey);
-	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
+	dma_unmap_sgtable(mdev->device, async_data->sgt, DMA_FROM_DEVICE, 0);
 	mlx5_core_dealloc_pd(mdev, async_data->pdn);
 	kvfree(async_data->out);
 	migf->save_cb_active = false;
@@ -293,9 +304,14 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 			struct mlx5_vf_migration_file, async_data);
 
 	if (!status) {
-		WRITE_ONCE(migf->image_length,
-			   MLX5_GET(save_vhca_state_out, async_data->out,
-				    actual_image_size));
+		size_t len = MLX5_GET(save_vhca_state_out, async_data->out,
+				      actual_image_size);
+
+		if (async_data->sgt == &migf->final_table.sgt)
+			WRITE_ONCE(migf->final_length, len);
+		else
+			WRITE_ONCE(migf->image_length, len);
+
 		wake_up_interruptible(&migf->poll_wait);
 	}
 
@@ -308,7 +324,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 }
 
 int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
-			       struct mlx5_vf_migration_file *migf)
+			       struct mlx5_vf_migration_file *migf, bool inc,
+			       bool track)
 {
 	u32 out_size = MLX5_ST_SZ_BYTES(save_vhca_state_out);
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
@@ -327,12 +344,15 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (err)
 		return err;
 
-	err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE,
-			      0);
+	async_data = &migf->async_data;
+	async_data->sgt = (!track && inc) ? &migf->final_table.sgt :
+					    &migf->table.sgt;
+	err = dma_map_sgtable(mdev->device, async_data->sgt,
+			      DMA_FROM_DEVICE, 0);
 	if (err)
 		goto err_dma_map;
 
-	err = _create_mkey(mdev, pdn, migf, NULL,
+	err = _create_mkey(mdev, pdn, async_data->sgt, NULL,
 			   &mkey, migf->allocated_length);
 	if (err)
 		goto err_create_mkey;
@@ -343,8 +363,9 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	MLX5_SET(save_vhca_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(save_vhca_state_in, in, mkey, mkey);
 	MLX5_SET(save_vhca_state_in, in, size, migf->allocated_length);
+	MLX5_SET(save_vhca_state_in, in, incremental, inc);
+	MLX5_SET(save_vhca_state_in, in, set_track, track);
 
-	async_data = &migf->async_data;
 	async_data->out = kvzalloc(out_size, GFP_KERNEL);
 	if (!async_data->out) {
 		err = -ENOMEM;
@@ -370,7 +391,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 err_out:
 	mlx5_core_destroy_mkey(mdev, mkey);
 err_create_mkey:
-	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
+	dma_unmap_sgtable(mdev->device, async_data->sgt, DMA_FROM_DEVICE, 0);
 err_dma_map:
 	mlx5_core_dealloc_pd(mdev, pdn);
 	migf->save_cb_active = false;
@@ -405,7 +426,8 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (err)
 		goto err_reg;
 
-	err = _create_mkey(mdev, pdn, migf, NULL, &mkey, migf->image_length);
+	err = _create_mkey(mdev, pdn, &migf->table.sgt, NULL, &mkey,
+			   migf->image_length);
 	if (err)
 		goto err_mkey;
 
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index b1fa1a0418a5..c12fa81ba53f 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -15,6 +15,7 @@
 struct mlx5vf_async_data {
 	struct mlx5_async_work cb_work;
 	struct work_struct work;
+	struct sg_table *sgt;
 	int status;
 	u32 pdn;
 	u32 mkey;
@@ -31,6 +32,12 @@ struct mlx5_vf_migration_file {
 	struct sg_append_table table;
 	size_t image_length;
 	size_t allocated_length;
+	/*
+	 * The device can be moved to stop_copy before the previous state was
+	 * fully read. Another set of variables is needed to maintain it.
+	 */
+	size_t final_length;
+	struct sg_append_table final_table;
 
 	/* Optimize mlx5vf_get_migration_page() for sequential access */
 	struct scatterlist *last_offset_sg;
@@ -115,17 +122,22 @@ struct mlx5vf_pci_core_device {
 	struct mlx5_core_dev *mdev;
 };
 
+enum {
+	MLX5VF_QUERY_INC = (1UL << 0),
+};
+
 int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
 int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
 int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
-					  size_t *state_size);
+					  size_t *state_size, u8 query_flags);
 void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 			       const struct vfio_migration_ops *mig_ops,
 			       const struct vfio_log_ops *log_ops);
 void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_cmd_close_migratable(struct mlx5vf_pci_core_device *mvdev);
 int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
-			       struct mlx5_vf_migration_file *migf);
+			       struct mlx5_vf_migration_file *migf, bool inc,
+			       bool track);
 int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 624b1a99dc21..10e073c32ab1 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -64,7 +64,8 @@ mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
 }
 
 static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf,
-				      unsigned int npages)
+				      unsigned int npages,
+				      struct sg_append_table *table)
 {
 	unsigned int to_alloc = npages;
 	struct page **page_list;
@@ -85,7 +86,7 @@ static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf,
 		}
 		to_alloc -= filled;
 		ret = sg_alloc_append_table_from_pages(
-			&migf->table, page_list, filled, 0,
+			table, page_list, filled, 0,
 			filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC,
 			GFP_KERNEL);
 
@@ -118,7 +119,11 @@ static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf)
 	migf->disabled = true;
 	migf->image_length = 0;
 	migf->allocated_length = 0;
+	migf->final_length = 0;
 	migf->filp->f_pos = 0;
+	for_each_sgtable_page(&migf->final_table.sgt, &sg_iter, 0)
+		__free_page(sg_page_iter_page(&sg_iter));
+	sg_free_append_table(&migf->final_table);
 	mutex_unlock(&migf->lock);
 }
 
@@ -215,6 +220,16 @@ static __poll_t mlx5vf_save_poll(struct file *filp,
 	return pollflags;
 }
 
+/*
+ * FD is exposed and user can use it after receiving an error.
+ * Mark migf in error, and wake the user.
+ */ +static void mlx5vf_mark_err(struct mlx5_vf_migration_file *migf) +{ + migf->is_err = true; + wake_up_interruptible(&migf->poll_wait); +} + static const struct file_operations mlx5vf_save_fops = { .owner = THIS_MODULE, .read = mlx5vf_save_read, @@ -223,8 +238,35 @@ static const struct file_operations mlx5vf_save_fops = { .llseek = no_llseek, }; +static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev) +{ + struct mlx5_vf_migration_file *migf = mvdev->saving_migf; + size_t length; + int ret; + + ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, + MLX5VF_QUERY_INC); + if (ret) + return ret; + + if (migf->is_err) + return -ENODEV; + + ret = mlx5vf_add_migration_pages( + migf, DIV_ROUND_UP_ULL(length, PAGE_SIZE), &migf->final_table); + if (ret) { + mlx5vf_mark_err(migf); + return ret; + } + + ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, true, false); + if (ret) + mlx5vf_mark_err(migf); + return ret; +} + static struct mlx5_vf_migration_file * -mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) +mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track) { struct mlx5_vf_migration_file *migf; size_t length; @@ -249,17 +291,17 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) init_waitqueue_head(&migf->save_wait); mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx); INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb); - ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length); + ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, 0); if (ret) goto out_free; ret = mlx5vf_add_migration_pages( - migf, DIV_ROUND_UP_ULL(length, PAGE_SIZE)); + migf, DIV_ROUND_UP_ULL(length, PAGE_SIZE), &migf->table); if (ret) goto out_free; migf->mvdev = mvdev; - ret = mlx5vf_cmd_save_vhca_state(mvdev, migf); + ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, false, track); if (ret) goto out_free; return migf; @@ -296,7 +338,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, 
const char __user *buf, done = mlx5vf_add_migration_pages( migf, DIV_ROUND_UP(requested_length - migf->allocated_length, - PAGE_SIZE)); + PAGE_SIZE), &migf->table); if (done) goto out_unlock; } @@ -403,7 +445,8 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev, return NULL; } - if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) { + if ((cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) || + (cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_PRE_COPY_P2P)) { ret = mlx5vf_cmd_suspend_vhca(mvdev, MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_INITIATOR); if (ret) @@ -411,7 +454,8 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev, return NULL; } - if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) { + if ((cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) || + (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P && new == VFIO_DEVICE_STATE_PRE_COPY)) { ret = mlx5vf_cmd_resume_vhca(mvdev, MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_INITIATOR); if (ret) @@ -422,7 +466,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev, if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_STOP_COPY) { struct mlx5_vf_migration_file *migf; - migf = mlx5vf_pci_save_device_data(mvdev); + migf = mlx5vf_pci_save_device_data(mvdev, false); if (IS_ERR(migf)) return ERR_CAST(migf); get_file(migf->filp); @@ -430,7 +474,10 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev, return migf->filp; } - if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP)) { + if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP) || + (cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_RUNNING) || + (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P && + new == VFIO_DEVICE_STATE_RUNNING_P2P)) { mlx5vf_disable_fds(mvdev); return NULL; } @@ -455,6 +502,28 @@ 
mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev, return NULL; } + if ((cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_PRE_COPY) || + (cur == VFIO_DEVICE_STATE_RUNNING_P2P && + new == VFIO_DEVICE_STATE_PRE_COPY_P2P)) { + struct mlx5_vf_migration_file *migf; + + migf = mlx5vf_pci_save_device_data(mvdev, true); + if (IS_ERR(migf)) + return ERR_CAST(migf); + get_file(migf->filp); + mvdev->saving_migf = migf; + return migf->filp; + } + + if (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P && new == VFIO_DEVICE_STATE_STOP_COPY) { + ret = mlx5vf_cmd_suspend_vhca(mvdev, + MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_RESPONDER); + if (ret) + return ERR_PTR(ret); + ret = mlx5vf_pci_save_device_inc_data(mvdev); + return ret ? ERR_PTR(ret) : NULL; + } + /* * vfio_mig_get_next_state() does not use arcs other than the above */ @@ -523,7 +592,7 @@ static int mlx5vf_pci_get_data_size(struct vfio_device *vdev, mutex_lock(&mvdev->state_mutex); ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, - &state_size); + &state_size, 0); if (!ret) *stop_copy_length = state_size; mlx5vf_state_mutex_unlock(mvdev);

From patchwork Sun Nov 6 17:46:25 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13033523
From: Yishai Hadas
To: ,
CC: , , , , , , , ,
Subject: [PATCH vfio 08/13] vfio/mlx5: Introduce vfio precopy ioctl implementation
Date: Sun, 6 Nov 2022 19:46:25 +0200
Message-ID: <20221106174630.25909-9-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
References: <20221106174630.25909-1-yishaih@nvidia.com>
MIME-Version: 1.0
X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

The vfio precopy ioctl returns an estimate of the data available for transfer from the device. Whenever a user calls VFIO_MIG_GET_PRECOPY_INFO, track the current state of the device and, if needed, append the dirty data to the data exposed through the transfer FD. This is done by saving a middle state.

As mlx5 runs the SAVE command asynchronously, make sure to query for incremental data only once there is no active save command. Running both in parallel might end up with a failure of the incremental query command on an un-tracked vhca.
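The ordering rule above (issue the incremental query only once no asynchronous SAVE is in flight) can be modelled in userspace with a mutex and a condition variable. This is a hedged illustration, not driver code: only the names `save_cb_active` and `save_wait` come from the patch, while `struct save_tracker` and the helper functions are hypothetical. `wait_no_active_save()` plays the role of the `wait_event(save_wait, !save_cb_active)` call that this patch adds to `mlx5vf_cmd_query_vhca_migration_state()`, and `finish_save_later()` stands in for the asynchronous completion path.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical userspace model of the "no active SAVE" gate. Only the
 * names save_cb_active and save_wait mirror the patch; struct
 * save_tracker and the helpers below are illustrative.
 */
struct save_tracker {
	pthread_mutex_t lock;
	pthread_cond_t save_wait;	/* stands in for migf->save_wait */
	bool save_cb_active;		/* true while an async SAVE is in flight */
};

static void tracker_init(struct save_tracker *t)
{
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->save_wait, NULL);
	t->save_cb_active = false;
}

/* Issue side: mark a SAVE as in flight (only one at a time). */
static void tracker_start_save(struct save_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	assert(!t->save_cb_active);
	t->save_cb_active = true;
	pthread_mutex_unlock(&t->lock);
}

/* Completion side: clear the flag and wake every waiter. */
static void tracker_finish_save(struct save_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	t->save_cb_active = false;
	pthread_cond_broadcast(&t->save_wait);
	pthread_mutex_unlock(&t->lock);
}

/*
 * Models wait_event(save_wait, !save_cb_active): block until no SAVE is
 * in flight, after which the incremental query may be issued.
 */
static void wait_no_active_save(struct save_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	while (t->save_cb_active)	/* re-check after every wakeup */
		pthread_cond_wait(&t->save_wait, &t->lock);
	pthread_mutex_unlock(&t->lock);
}

/* Simulated asynchronous completion running on another thread. */
static void *finish_save_later(void *arg)
{
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	nanosleep(&delay, NULL);	/* pretend the device needs ~10 ms */
	tracker_finish_save(arg);
	return NULL;
}
```

Looping on `pthread_cond_wait()` instead of testing the flag once protects against spurious wakeups; the kernel's `wait_event()` likewise re-evaluates its condition after each wakeup.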
Also, a middle state is saved only after the previous state has finished its SAVE command and has been fully transferred; this allows the resources to be reused.

To map between the FD position and the newly saved state data, store the current FD position.

Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c | 9 +++
 drivers/vfio/pci/mlx5/cmd.h | 1 +
 drivers/vfio/pci/mlx5/main.c | 131 +++++++++++++++++++++++++++++++++++
 3 files changed, 141 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index eb684455c2b2..2d2171191218 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -64,6 +64,15 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev, if (mvdev->mdev_detach) return -ENOTCONN; + /* + * In case PRE_COPY is used, saving_migf is exposed while device is + * running. Make sure to run only once there is no active save command. + * Running both in parallel might end up with a failure in the
+ */ + if (query_flags & MLX5VF_QUERY_INC) + wait_event(mvdev->saving_migf->save_wait, + !mvdev->saving_migf->save_cb_active); MLX5_SET(query_vhca_migration_state_in, in, opcode, MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE); MLX5_SET(query_vhca_migration_state_in, in, vhca_id, mvdev->vhca_id); diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index c12fa81ba53f..07a2fc54c9d8 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -30,6 +30,7 @@ struct mlx5_vf_migration_file { u8 save_cb_active:1; struct sg_append_table table; + size_t table_start_pos; size_t image_length; size_t allocated_length; /* diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 10e073c32ab1..266626066fed 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -107,6 +107,22 @@ static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf, return ret; } +static void mlx5vf_prep_next_table(struct mlx5_vf_migration_file *migf) +{ + struct sg_page_iter sg_iter; + + lockdep_assert_held(&migf->lock); + migf->table_start_pos += migf->image_length; + /* clear sgtable, all data has been transferred */ + for_each_sgtable_page(&migf->table.sgt, &sg_iter, 0) + __free_page(sg_page_iter_page(&sg_iter)); + sg_free_append_table(&migf->table); + memset(&migf->table, 0, sizeof(migf->table)); + migf->image_length = 0; + migf->allocated_length = 0; + migf->last_offset_sg = NULL; +} + static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf) { struct sg_page_iter sg_iter; @@ -120,6 +136,7 @@ static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf) migf->image_length = 0; migf->allocated_length = 0; migf->final_length = 0; + migf->table_start_pos = 0; migf->filp->f_pos = 0; for_each_sgtable_page(&migf->final_table.sgt, &sg_iter, 0) __free_page(sg_page_iter_page(&sg_iter)); @@ -137,6 +154,13 @@ static int mlx5vf_release_file(struct inode *inode, struct file *filp) return 0; } +#define 
MIGF_TOTAL_DATA(migf) \ + (migf->table_start_pos + migf->image_length + migf->final_length) + +#define VFIO_MIG_STATE_PRE_COPY(mvdev) \ + (mvdev->mig_state == VFIO_DEVICE_STATE_PRE_COPY || \ + mvdev->mig_state == VFIO_DEVICE_STATE_PRE_COPY_P2P) + static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, loff_t *pos) { @@ -230,10 +254,117 @@ static void mlx5vf_mark_err(struct mlx5_vf_migration_file *migf) wake_up_interruptible(&migf->poll_wait); } +static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd, + unsigned long arg) +{ + struct mlx5_vf_migration_file *migf = filp->private_data; + struct mlx5vf_pci_core_device *mvdev = migf->mvdev; + bool first_state, state_finish_transfer; + struct vfio_precopy_info info; + loff_t *pos = &filp->f_pos; + unsigned long minsz; + size_t inc_length = 0; + int ret; + + if (cmd != VFIO_MIG_GET_PRECOPY_INFO) + return -ENOTTY; + + minsz = offsetofend(struct vfio_precopy_info, dirty_bytes); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + mutex_lock(&mvdev->state_mutex); + if (!VFIO_MIG_STATE_PRE_COPY(migf->mvdev)) { + ret = -EINVAL; + goto err_state_unlock; + } + + /* + * We can't issue a SAVE command when the device is suspended, so in + * VFIO_DEVICE_STATE_PRE_COPY_P2P there is no reason to query for extra + * bytes that can't be read. + */ + if (mvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY_P2P) { + /* + * Once the query returns it's guaranteed that there is no + * active SAVE command. + * As such, the other code below is safe with the proper locks.
+ */ + ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &inc_length, + MLX5VF_QUERY_INC); + if (ret) + goto err_state_unlock; + } + + mutex_lock(&migf->lock); + if (*pos > MIGF_TOTAL_DATA(migf)) { + ret = -EINVAL; + goto err_migf_unlock; + } + + if (migf->disabled || migf->is_err) { + ret = -ENODEV; + goto err_migf_unlock; + } + + first_state = migf->table_start_pos == 0; + if (first_state) { + info.initial_bytes = MIGF_TOTAL_DATA(migf) - *pos; + info.dirty_bytes = 0; + } else { + info.initial_bytes = 0; + info.dirty_bytes = MIGF_TOTAL_DATA(migf) - *pos; + } + state_finish_transfer = *pos == MIGF_TOTAL_DATA(migf); + if (!(state_finish_transfer && inc_length && + mvdev->mig_state == VFIO_DEVICE_STATE_PRE_COPY)) { + mutex_unlock(&migf->lock); + goto done; + } + + /* + * We finished transferring the current state and the device has a + * dirty state, so save a new state that will be ready to be read. + */ + mlx5vf_prep_next_table(migf); + ret = mlx5vf_add_migration_pages(migf, + DIV_ROUND_UP_ULL(inc_length, PAGE_SIZE), + &migf->table); + mutex_unlock(&migf->lock); + if (ret) { + mlx5vf_mark_err(migf); + goto err_state_unlock; + } + + ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, true, true); + if (ret) { + mlx5vf_mark_err(migf); + goto err_state_unlock; + } + + info.dirty_bytes += inc_length; + +done: + mlx5vf_state_mutex_unlock(mvdev); + return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0; + +err_migf_unlock: + mutex_unlock(&migf->lock); +err_state_unlock: + mlx5vf_state_mutex_unlock(mvdev); + return ret; +} + static const struct file_operations mlx5vf_save_fops = { .owner = THIS_MODULE, .read = mlx5vf_save_read, .poll = mlx5vf_save_poll, + .unlocked_ioctl = mlx5vf_precopy_ioctl, + .compat_ioctl = compat_ptr_ioctl, .release = mlx5vf_release_file, .llseek = no_llseek, };

From patchwork Sun Nov 6 17:46:26 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13033524
From: Yishai Hadas
To: ,
CC: , , , , , , , ,
Subject: [PATCH vfio 09/13] vfio/mlx5: Manage read() of multiple state saves
Date: Sun, 6 Nov 2022 19:46:26 +0200
Message-ID: <20221106174630.25909-10-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
References: <20221106174630.25909-1-yishaih@nvidia.com>
MIME-Version: 1.0
X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

All the states mentioned in the previous patches are transferred over the same FD, while mlx5 keeps one data structure per state. Hence, mlx5 needs to manage the delta between the FD position and the current state (data structure) being transferred.

Also, as mentioned in the previous patch, the user can switch the VFIO device to STOP_COPY without transferring any data in the PRE_COPY state. Hence, the delta management of the final state has a dedicated data structure.

Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/main.c | 115 ++++++++++++++++++++++++++++++-----
 1 file changed, 100 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 266626066fed..8a5714158e43 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -34,7 +34,7 @@ static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev) static struct page * mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf, - unsigned long offset) + unsigned long offset, struct sg_append_table *table) { unsigned long cur_offset = 0; struct scatterlist *sg; @@ -43,14 +43,14 @@ mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf, /* All accesses are sequential */ if (offset < migf->last_offset || !migf->last_offset_sg) { migf->last_offset = 0; - migf->last_offset_sg = migf->table.sgt.sgl; + migf->last_offset_sg = table->sgt.sgl; migf->sg_last_entry = 0; } cur_offset = migf->last_offset; for_each_sg(migf->last_offset_sg, sg, - migf->table.sgt.orig_nents - migf->sg_last_entry, i) { + table->sgt.orig_nents - migf->sg_last_entry, i) { if (offset < sg->length + cur_offset) { migf->last_offset_sg = sg; migf->sg_last_entry += i; @@ -161,10 +161,45 @@ static int mlx5vf_release_file(struct inode *inode, struct file *filp) (mvdev->mig_state ==
VFIO_DEVICE_STATE_PRE_COPY || \ mvdev->mig_state == VFIO_DEVICE_STATE_PRE_COPY_P2P) +#define VFIO_PRE_COPY_SUPP(mvdev) \ + (mvdev->core_device.vdev.migration_flags & VFIO_MIGRATION_PRE_COPY) + +#define MIGF_HAS_DATA(migf) \ + (READ_ONCE(migf->image_length) || READ_ONCE(migf->final_length)) + +static size_t +mlx5vf_final_table_start_pos(struct mlx5_vf_migration_file *migf) +{ + return MIGF_TOTAL_DATA(migf) - migf->final_length; +} + +static size_t mlx5vf_get_table_start_pos(struct mlx5_vf_migration_file *migf) +{ + return migf->table_start_pos; +} + +static size_t mlx5vf_get_table_end_pos(struct mlx5_vf_migration_file *migf, + struct sg_append_table *table) +{ + if (table == &migf->final_table) + return MIGF_TOTAL_DATA(migf); + return migf->table_start_pos + migf->image_length; +} + +static struct sg_append_table * +mlx5vf_get_table(struct mlx5_vf_migration_file *migf, loff_t *pos) +{ + if (migf->final_length && + *pos >= mlx5vf_final_table_start_pos(migf)) + return &migf->final_table; + return &migf->table; +} + static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, loff_t *pos) { struct mlx5_vf_migration_file *migf = filp->private_data; + struct sg_append_table *table; ssize_t done = 0; if (pos) @@ -173,16 +208,16 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, if (!(filp->f_flags & O_NONBLOCK)) { if (wait_event_interruptible(migf->poll_wait, - READ_ONCE(migf->image_length) || migf->is_err)) + (MIGF_HAS_DATA(migf) || migf->is_err))) return -ERESTARTSYS; } mutex_lock(&migf->lock); - if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(migf->image_length)) { + if ((filp->f_flags & O_NONBLOCK) && !MIGF_HAS_DATA(migf)) { done = -EAGAIN; goto out_unlock; } - if (*pos > migf->image_length) { + if (*pos > MIGF_TOTAL_DATA(migf)) { done = -EINVAL; goto out_unlock; } @@ -191,16 +226,28 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, goto out_unlock; } - len = min_t(size_t, 
migf->image_length - *pos, len); + /* If we reach the end of the PRE_COPY size */ + if (MIGF_TOTAL_DATA(migf) == *pos && + VFIO_MIG_STATE_PRE_COPY(migf->mvdev)) { + done = -ENOMSG; + goto out_unlock; + } + + len = min_t(size_t, MIGF_TOTAL_DATA(migf) - *pos, len); + table = mlx5vf_get_table(migf, pos); while (len) { + struct sg_append_table *tmp = table; + unsigned long offset; size_t page_offset; struct page *page; size_t page_len; u8 *from_buff; int ret; - page_offset = (*pos) % PAGE_SIZE; - page = mlx5vf_get_migration_page(migf, *pos - page_offset); + offset = *pos - mlx5vf_get_table_start_pos(migf); + page_offset = offset % PAGE_SIZE; + offset -= page_offset; + page = mlx5vf_get_migration_page(migf, offset, table); if (!page) { if (done == 0) done = -EINVAL; @@ -208,6 +255,12 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, } page_len = min_t(size_t, len, PAGE_SIZE - page_offset); + /* + * In case an image is ended in the middle of the page, read + * until the end of the image and manage it. + */ + page_len = min_t(size_t, page_len, + mlx5vf_get_table_end_pos(migf, table) - *pos); from_buff = kmap_local_page(page); ret = copy_to_user(buf, from_buff + page_offset, page_len); kunmap_local(from_buff); @@ -219,6 +272,23 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, len -= page_len; done += page_len; buf += page_len; + /* + * In case we moved from PRE_COPY to STOP_COPY we need to prepare + * migf for final state when current state was fully transferred. + * Otherwise we might miss the final table and caller may get EOF + * by next read(). + */ + if (migf->final_table.sgt.sgl && + *pos == mlx5vf_final_table_start_pos(migf)) { + mlx5vf_prep_next_table(migf); + table = mlx5vf_get_table(migf, pos); + /* + * Check whether the SAVE command has finished and we + * have some extra data. 
+			 */
+			if (tmp == table)
+				break;
+		}
 	}
 out_unlock:
@@ -237,7 +307,7 @@ static __poll_t mlx5vf_save_poll(struct file *filp,
 	mutex_lock(&migf->lock);
 	if (migf->disabled || migf->is_err)
 		pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
-	else if (READ_ONCE(migf->image_length))
+	else if (MIGF_HAS_DATA(migf))
 		pollflags = EPOLLIN | EPOLLRDNORM;
 	mutex_unlock(&migf->lock);
@@ -380,20 +450,34 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
 	if (ret)
 		return ret;
-	if (migf->is_err)
-		return -ENODEV;
-
+	mutex_lock(&migf->lock);
+	if (migf->is_err) {
+		ret = -ENODEV;
+		goto err;
+	}
+	/*
+	 * We finished transferring the current state, prepare migf for final
+	 * table. Otherwise we might miss the final table and caller may get
+	 * EOF by next read().
+	 */
+	if (migf->filp->f_pos == MIGF_TOTAL_DATA(migf))
+		mlx5vf_prep_next_table(migf);
 	ret = mlx5vf_add_migration_pages(
 		migf, DIV_ROUND_UP_ULL(length, PAGE_SIZE), &migf->final_table);
 	if (ret) {
 		mlx5vf_mark_err(migf);
-		return ret;
+		goto err;
 	}
+	mutex_unlock(&migf->lock);
 	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, true, false);
 	if (ret)
 		mlx5vf_mark_err(migf);
 	return ret;
+
+err:
+	mutex_unlock(&migf->lock);
+	return ret;
 }
 static struct mlx5_vf_migration_file *
@@ -482,7 +566,8 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 		int ret;
 		page_offset = (*pos) % PAGE_SIZE;
-		page = mlx5vf_get_migration_page(migf, *pos - page_offset);
+		page = mlx5vf_get_migration_page(migf, *pos - page_offset,
+						 &migf->table);
 		if (!page) {
 			if (done == 0)
 				done = -EINVAL;

From patchwork Sun Nov 6 17:46:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13033525
From: Yishai Hadas
Subject: [PATCH vfio 10/13] vfio/mlx5: Introduce SW headers for migration states
Date: Sun, 6 Nov 2022 19:46:27 +0200
Message-ID: <20221106174630.25909-11-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
References: <20221106174630.25909-1-yishaih@nvidia.com>
MIME-Version: 1.0
X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

As mentioned in the previous patches, mlx5 transfers multiple states
when the PRE_COPY protocol is used. This mechanism requires the target
VM to know the size of each state in order to execute multiple loads.

Therefore, add a SW header, carrying the needed information, for each
saved state that the source VM transfers to the target VM.

This patch implements the source VM handling of the headers; the
following patch will implement the target VM handling of the headers.

Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.h  |  7 +++++
 drivers/vfio/pci/mlx5/main.c | 50 +++++++++++++++++++++++++++++++++---
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 07a2fc54c9d8..3b0411e4a74e 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -22,17 +22,24 @@ struct mlx5vf_async_data {
 	void *out;
 };
+struct mlx5_vf_migration_header {
+	u32 image_size;
+	u32 reserved;
+};
+
 struct mlx5_vf_migration_file {
 	struct file *filp;
 	struct mutex lock;
 	u8 disabled:1;
 	u8 is_err:1;
 	u8 save_cb_active:1;
+	u8 header_read:1;
 	struct sg_append_table table;
 	size_t table_start_pos;
 	size_t image_length;
 	size_t allocated_length;
+	size_t sw_headers_bytes_sent;
 	/*
 	 * The device can be moved to stop_copy before the previous state was
 	 * fully read. Another set of variables is needed to maintain it.
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 8a5714158e43..c0ee121bd5ea 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -121,6 +121,7 @@ static void mlx5vf_prep_next_table(struct mlx5_vf_migration_file *migf)
 	migf->image_length = 0;
 	migf->allocated_length = 0;
 	migf->last_offset_sg = NULL;
+	migf->header_read = false;
 }
 static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf)
@@ -155,7 +156,8 @@ static int mlx5vf_release_file(struct inode *inode, struct file *filp)
 }
 #define MIGF_TOTAL_DATA(migf) \
-	(migf->table_start_pos + migf->image_length + migf->final_length)
+	(migf->table_start_pos + migf->image_length + migf->final_length + \
+	 migf->sw_headers_bytes_sent)
 #define VFIO_MIG_STATE_PRE_COPY(mvdev) \
 	(mvdev->mig_state == VFIO_DEVICE_STATE_PRE_COPY || \
@@ -175,7 +177,7 @@ mlx5vf_final_table_start_pos(struct mlx5_vf_migration_file *migf)
 static size_t mlx5vf_get_table_start_pos(struct mlx5_vf_migration_file *migf)
 {
-	return migf->table_start_pos;
+	return migf->table_start_pos + migf->sw_headers_bytes_sent;
 }
 static size_t mlx5vf_get_table_end_pos(struct mlx5_vf_migration_file *migf,
@@ -183,7 +185,40 @@ static size_t mlx5vf_get_table_end_pos(struct mlx5_vf_migration_file *migf,
 {
 	if (table == &migf->final_table)
 		return MIGF_TOTAL_DATA(migf);
-	return migf->table_start_pos + migf->image_length;
+	return migf->table_start_pos + migf->image_length +
+		migf->sw_headers_bytes_sent;
+}
+
+static void mlx5vf_send_sw_header(struct mlx5_vf_migration_file *migf,
+				  loff_t *pos, char __user **buf, size_t *len,
+				  ssize_t *done)
+{
+	struct mlx5_vf_migration_header header = {};
+	size_t header_size = sizeof(header);
+	void *header_buf = &header;
+	size_t size_to_transfer;
+
+	if (*pos >= mlx5vf_final_table_start_pos(migf))
+		header.image_size = migf->final_length;
+	else
+		header.image_size = migf->image_length;
+
+	size_to_transfer = header_size -
+			   (migf->sw_headers_bytes_sent % header_size);
+	size_to_transfer = min_t(size_t, size_to_transfer, *len);
+	header_buf += header_size - size_to_transfer;
+	if (copy_to_user(*buf, header_buf, size_to_transfer)) {
+		*done = -EFAULT;
+		return;
+	}
+
+	migf->sw_headers_bytes_sent += size_to_transfer;
+	migf->header_read = !(migf->sw_headers_bytes_sent % header_size);
+
+	*pos += size_to_transfer;
+	*len -= size_to_transfer;
+	*done += size_to_transfer;
+	*buf += size_to_transfer;
 }
 static struct sg_append_table *
@@ -233,6 +268,12 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 		goto out_unlock;
 	}
+	if (VFIO_PRE_COPY_SUPP(migf->mvdev) && !migf->header_read) {
+		mlx5vf_send_sw_header(migf, pos, &buf, &len, &done);
+		if (done < 0)
+			goto out_unlock;
+	}
+
 	len = min_t(size_t, MIGF_TOTAL_DATA(migf) - *pos, len);
 	table = mlx5vf_get_table(migf, pos);
 	while (len) {
@@ -288,6 +329,9 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 			 */
 			if (tmp == table)
 				break;
+			mlx5vf_send_sw_header(migf, pos, &buf, &len, &done);
+			if (done < 0)
+				goto out_unlock;
 		}
 	}

From patchwork Sun Nov 6 17:46:28 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13033526
From: Yishai Hadas
Subject: [PATCH vfio 11/13] vfio/mlx5: Introduce multiple loads
Date: Sun, 6 Nov 2022 19:46:28 +0200
Message-ID: <20221106174630.25909-12-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
References: <20221106174630.25909-1-yishaih@nvidia.com>
MIME-Version: 1.0
X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

To support PRE_COPY, the mlx5 driver transfers multiple states (images)
of the device. For example, the source VF can save and transfer multiple
states, and the target VF loads them in the same order.

This patch implements the target VF changes: decomposing the header of
each state, and writing and loading multiple states.
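The stream format described above can be illustrated with a small user-space model. Each state is sent as a SW header carrying image_size, followed by exactly that many image bytes. This is a hedged sketch, not the driver code. The names struct mig_header, struct mig_receiver, append_state(), receiver_write() and load_image() are hypothetical. The header layout simply mirrors struct mlx5_vf_migration_header from the previous patch, in host byte order, and load_image() only counts bytes where the driver would issue the device load command. The sketch shows the key point on the target side: write() chunks may split a header or an image anywhere. The receiver therefore tracks how much of each it has already seen and triggers a load only once the announced number of image bytes has arrived.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the per-state SW header (image_size + reserved). */
struct mig_header {
	uint32_t image_size;
	uint32_t reserved;
};

#define MAX_IMAGE 256 /* arbitrary cap chosen for this sketch */

/* Hypothetical target-side state; fields loosely echo the patch's
 * header_read / expected_length / image_length bookkeeping. */
struct mig_receiver {
	struct mig_header header;
	size_t header_bytes;            /* bytes of the current header seen */
	size_t image_bytes;             /* bytes of the current image seen */
	unsigned char image[MAX_IMAGE];
	int loads;                      /* completed images */
	size_t loaded_total;            /* sum of loaded image sizes */
};

/* Source side: append one state as [header][image]; returns new offset.
 * The caller must provide a large enough output buffer. */
static size_t append_state(unsigned char *out, size_t off,
			   const void *img, uint32_t size)
{
	struct mig_header h = { .image_size = size, .reserved = 0 };

	memcpy(out + off, &h, sizeof(h));
	memcpy(out + off + sizeof(h), img, size);
	return off + sizeof(h) + size;
}

/* Stand-in for the device load command (hypothetical). */
static void load_image(struct mig_receiver *r)
{
	r->loads++;
	r->loaded_total += r->image_bytes;
}

/* Target side: consume one write() chunk of any size. Returns the number
 * of bytes consumed, or -1 if a header announces an empty or oversized
 * image. */
static long receiver_write(struct mig_receiver *r,
			   const unsigned char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t n;

		if (r->header_bytes < sizeof(r->header)) {
			/* Resume the header at the first byte not yet seen. */
			n = sizeof(r->header) - r->header_bytes;
			if (n > len - done)
				n = len - done;
			memcpy((unsigned char *)&r->header + r->header_bytes,
			       buf + done, n);
			r->header_bytes += n;
			done += n;
			if (r->header_bytes < sizeof(r->header))
				break; /* header incomplete: wait for next write */
			if (r->header.image_size == 0 ||
			    r->header.image_size > MAX_IMAGE)
				return -1;
			r->image_bytes = 0;
			continue;
		}

		/* Never copy past the end of the announced image. */
		n = r->header.image_size - r->image_bytes;
		if (n > len - done)
			n = len - done;
		memcpy(r->image + r->image_bytes, buf + done, n);
		r->image_bytes += n;
		done += n;

		if (r->image_bytes == r->header.image_size) {
			load_image(r);
			r->header_bytes = 0; /* next bytes start a new header */
		}
	}
	return (long)done;
}
```

Feeding the same stream one byte at a time or in a single write() produces the same two loads. Because the header copy resumes at header_bytes, the number of header bytes already received, a header split across several short writes is reassembled in order.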
Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 12 ++---
 drivers/vfio/pci/mlx5/cmd.h  |  2 +
 drivers/vfio/pci/mlx5/main.c | 98 ++++++++++++++++++++++++++++++------
 3 files changed, 89 insertions(+), 23 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 2d2171191218..a1b17cd688b9 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -420,16 +420,14 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (mvdev->mdev_detach)
 		return -ENOTCONN;
-	mutex_lock(&migf->lock);
-	if (!migf->image_length) {
-		err = -EINVAL;
-		goto end;
-	}
+	lockdep_assert_held(&migf->lock);
+	if (!migf->image_length)
+		return -EINVAL;
 	mdev = mvdev->mdev;
 	err = mlx5_core_alloc_pd(mdev, &pdn);
 	if (err)
-		goto end;
+		return err;
 	err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0);
 	if (err)
@@ -454,8 +452,6 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0);
 err_reg:
 	mlx5_core_dealloc_pd(mdev, pdn);
-end:
-	mutex_unlock(&migf->lock);
 	return err;
 }
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 3b0411e4a74e..03f3b5e99879 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -39,6 +39,8 @@ struct mlx5_vf_migration_file {
 	size_t table_start_pos;
 	size_t image_length;
 	size_t allocated_length;
+	size_t expected_length;
+	struct mlx5_vf_migration_header header;
 	size_t sw_headers_bytes_sent;
 	/*
 	 * The device can be moved to stop_copy before the previous state was
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index c0ee121bd5ea..6cdd4fc93818 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -569,12 +569,45 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 	return ERR_PTR(ret);
 }
+static void mlx5vf_recv_sw_header(struct mlx5_vf_migration_file *migf,
+				  loff_t *pos, const char __user **buf,
+				  size_t *len, ssize_t *done)
+{
+	ssize_t header_size = sizeof(migf->header);
+	void *header_buf = &migf->header;
+	size_t size_to_recv;
+
+	size_to_recv = header_size - (migf->sw_headers_bytes_sent % header_size);
+	size_to_recv = min_t(size_t, size_to_recv, *len);
+	header_buf += header_size - size_to_recv;
+	if (copy_from_user(header_buf, *buf, size_to_recv)) {
+		*done = -EFAULT;
+		return;
+	}
+
+	*pos += size_to_recv;
+	*len -= size_to_recv;
+	*done += size_to_recv;
+	*buf += size_to_recv;
+	migf->sw_headers_bytes_sent += size_to_recv;
+	migf->header_read = !(migf->sw_headers_bytes_sent % header_size);
+
+	if (migf->sw_headers_bytes_sent % header_size)
+		return;
+	migf->expected_length = migf->header.image_size;
+}
+
+#define EXPECTED_TABLE_END_POSITION(migf) \
+	(migf->table_start_pos + migf->expected_length + \
+	 migf->sw_headers_bytes_sent)
+
 static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 				   size_t len, loff_t *pos)
 {
 	struct mlx5_vf_migration_file *migf = filp->private_data;
 	loff_t requested_length;
 	ssize_t done = 0;
+	int ret = 0;
 	if (pos)
 		return -ESPIPE;
@@ -584,33 +617,47 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 	    check_add_overflow((loff_t)len, *pos, &requested_length))
 		return -EINVAL;
-	if (requested_length > MAX_MIGRATION_SIZE)
-		return -ENOMEM;
-
+	mutex_lock(&migf->mvdev->state_mutex);
 	mutex_lock(&migf->lock);
+	requested_length -= migf->table_start_pos;
+	if (requested_length > MAX_MIGRATION_SIZE) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
+
 	if (migf->disabled) {
-		done = -ENODEV;
+		ret = -ENODEV;
 		goto out_unlock;
 	}
+start_over:
 	if (migf->allocated_length < requested_length) {
-		done = mlx5vf_add_migration_pages(
+		ret = mlx5vf_add_migration_pages(
 			migf,
 			DIV_ROUND_UP(requested_length - migf->allocated_length,
 				     PAGE_SIZE), &migf->table);
-		if (done)
+		if (ret)
+			goto out_unlock;
+	}
+
+	if (VFIO_PRE_COPY_SUPP(migf->mvdev)) {
+		if (!migf->header_read)
+			mlx5vf_recv_sw_header(migf, pos, &buf, &len, &done);
+		if (done < 0)
 			goto out_unlock;
 	}
 	while (len) {
+		unsigned long offset;
 		size_t page_offset;
 		struct page *page;
 		size_t page_len;
 		u8 *to_buff;
-		int ret;
-		page_offset = (*pos) % PAGE_SIZE;
-		page = mlx5vf_get_migration_page(migf, *pos - page_offset,
+		offset = *pos - mlx5vf_get_table_start_pos(migf);
+		page_offset = offset % PAGE_SIZE;
+		offset -= page_offset;
+		page = mlx5vf_get_migration_page(migf, offset,
 						 &migf->table);
 		if (!page) {
 			if (done == 0)
@@ -619,11 +666,15 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 		}
 		page_len = min_t(size_t, len, PAGE_SIZE - page_offset);
+		if (VFIO_PRE_COPY_SUPP(migf->mvdev))
+			page_len = min_t(size_t, page_len,
+					 EXPECTED_TABLE_END_POSITION(migf) - *pos);
+
 		to_buff = kmap_local_page(page);
 		ret = copy_from_user(to_buff + page_offset, buf, page_len);
 		kunmap_local(to_buff);
 		if (ret) {
-			done = -EFAULT;
+			ret = -EFAULT;
 			goto out_unlock;
 		}
 		*pos += page_len;
@@ -631,10 +682,22 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 		done += page_len;
 		buf += page_len;
 		migf->image_length += page_len;
+
+		if (*pos == EXPECTED_TABLE_END_POSITION(migf)) {
+			ret = mlx5vf_cmd_load_vhca_state(migf->mvdev, migf);
+			if (ret)
+				goto out_unlock;
+			mlx5vf_prep_next_table(migf);
+			if (len) {
+				requested_length -= migf->expected_length;
+				goto start_over;
+			}
+		}
 	}
 out_unlock:
 	mutex_unlock(&migf->lock);
-	return done;
+	mlx5vf_state_mutex_unlock(migf->mvdev);
+	return ret ? ret : done;
 }
 static const struct file_operations mlx5vf_resume_fops = {
@@ -663,6 +726,7 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	}
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
+	migf->mvdev = mvdev;
 	return migf;
 }
@@ -754,10 +818,14 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	}
 	if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
-		ret = mlx5vf_cmd_load_vhca_state(mvdev,
-						 mvdev->resuming_migf);
-		if (ret)
-			return ERR_PTR(ret);
+		if (!VFIO_PRE_COPY_SUPP(mvdev)) {
+			mutex_lock(&mvdev->resuming_migf->lock);
+			ret = mlx5vf_cmd_load_vhca_state(mvdev,
+							 mvdev->resuming_migf);
+			mutex_unlock(&mvdev->resuming_migf->lock);
+			if (ret)
+				return ERR_PTR(ret);
+		}
 		mlx5vf_disable_fds(mvdev);
 		return NULL;
 	}

From patchwork Sun Nov 6 17:46:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13033527
From: Yishai Hadas
Subject: [PATCH vfio 12/13] vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error
Date: Sun, 6 Nov 2022 19:46:29 +0200
Message-ID: <20221106174630.25909-13-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
References: <20221106174630.25909-1-yishaih@nvidia.com>

From: Shay Drory

Before a SAVE command is issued, a QUERY command is issued to learn the
size of the device data. When PRE_COPY is used, both commands are issued
while the device is running, so the device state may change significantly
between the QUERY and the SAVE, causing the SAVE to fail.

Currently, if a SAVE command fails, the driver fails the migration.

In this case (a SAVE that fails with MLX5_CMD_STAT_BAD_RES_STATE_ERR),
don't fail the migration; instead, don't allow new SAVEs to be executed
while the device is in the RUNNING state. Once the device moves to
STOP_COPY, SAVE can be executed again and the full device state will be
read.

Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 30 ++++++++++++++++++++++++++++--
 drivers/vfio/pci/mlx5/cmd.h  |  2 ++
 drivers/vfio/pci/mlx5/main.c |  7 ++++---
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index a1b17cd688b9..ef1c141dc5e0 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -70,9 +70,18 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 	 * Running both in parallel, might end-up with a failure in the
 	 * incremental query command on un-tracked vhca.
 	 */
-	if (query_flags & MLX5VF_QUERY_INC)
+	if (query_flags & MLX5VF_QUERY_INC) {
 		wait_event(mvdev->saving_migf->save_wait,
 			   !mvdev->saving_migf->save_cb_active);
+		if (mvdev->saving_migf->precopy_err) {
+			/* In case we had a PRE_COPY error, only query full image for final image */
+			if (!(query_flags & MLX5VF_QUERY_FINAL)) {
+				*state_size = 0;
+				return 0;
+			}
+			query_flags &= ~MLX5VF_QUERY_INC;
+		}
+	}
 	MLX5_SET(query_vhca_migration_state_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE);
 	MLX5_SET(query_vhca_migration_state_in, in, vhca_id, mvdev->vhca_id);
@@ -291,7 +300,10 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 
 	mutex_lock(&migf->lock);
 	if (async_data->status) {
-		migf->is_err = true;
+		if (async_data->status == MLX5_CMD_STAT_BAD_RES_STATE_ERR)
+			migf->precopy_err = true;
+		else
+			migf->is_err = true;
 		wake_up_interruptible(&migf->poll_wait);
 	}
 	mutex_unlock(&migf->lock);
@@ -328,6 +340,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 	 * The error and the cleanup flows can't run from an
 	 * interrupt context
 	 */
+	if (status == -EREMOTEIO)
+		status = MLX5_GET(save_vhca_state_out, async_data->out, status);
 	async_data->status = status;
 	queue_work(migf->mvdev->cb_wq, &async_data->work);
 }
@@ -356,6 +370,18 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	async_data = &migf->async_data;
 	async_data->sgt = (!track && inc) ? &migf->final_table.sgt :
 			  &migf->table.sgt;
+	if (migf->precopy_err) {
+		/*
+		 * In case we had a PRE_COPY error, SAVE is triggered only for
+		 * the final image, read device full image.
+		 */
+		inc = false;
+		/*
+		 * Turn off precopy_err to let reader proceed only once this
+		 * SAVE call is completed, otherwise final state might be lost.
+		 */
+		migf->precopy_err = false;
+	}
 	err = dma_map_sgtable(mdev->device, async_data->sgt,
 			      DMA_FROM_DEVICE, 0);
 	if (err)
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 03f3b5e99879..784670848a7c 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -32,6 +32,7 @@ struct mlx5_vf_migration_file {
 	struct mutex lock;
 	u8 disabled:1;
 	u8 is_err:1;
+	u8 precopy_err:1;
 	u8 save_cb_active:1;
 	u8 header_read:1;
 
@@ -134,6 +135,7 @@ struct mlx5vf_pci_core_device {
 
 enum {
 	MLX5VF_QUERY_INC = (1UL << 0),
+	MLX5VF_QUERY_FINAL = (1UL << 1),
 };
 
 int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 6cdd4fc93818..db2d0166a0f5 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -243,7 +243,8 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 
 	if (!(filp->f_flags & O_NONBLOCK)) {
 		if (wait_event_interruptible(migf->poll_wait,
-			     (MIGF_HAS_DATA(migf) || migf->is_err)))
+			     (MIGF_HAS_DATA(migf) || migf->is_err ||
+			      migf->precopy_err)))
 			return -ERESTARTSYS;
 	}
 
@@ -351,7 +352,7 @@ static __poll_t mlx5vf_save_poll(struct file *filp,
 	mutex_lock(&migf->lock);
 	if (migf->disabled || migf->is_err)
 		pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
-	else if (MIGF_HAS_DATA(migf))
+	else if (MIGF_HAS_DATA(migf) || migf->precopy_err)
 		pollflags = EPOLLIN | EPOLLRDNORM;
 	mutex_unlock(&migf->lock);
 
@@ -490,7 +491,7 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
 	int ret;
 
 	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length,
-						    MLX5VF_QUERY_INC);
+						    MLX5VF_QUERY_INC | MLX5VF_QUERY_FINAL);
 	if (ret)
 		return ret;
 

From patchwork Sun Nov 6 17:46:30 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13033528
From: Yishai Hadas
Subject: [PATCH vfio 13/13] vfio/mlx5: Enable MIGRATION_PRE_COPY flag
Date: Sun, 6 Nov 2022 19:46:30 +0200
Message-ID: <20221106174630.25909-14-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221106174630.25909-1-yishaih@nvidia.com>
References: <20221106174630.25909-1-yishaih@nvidia.com>

From: Shay Drory

Now that everything is in place for MIGRATION_PRE_COPY, enable it on
devices that report both the migration_multi_load and
migration_tracking_state capabilities.

Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index ef1c141dc5e0..8c3bb706f630 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -202,6 +202,11 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 	if (MLX5_CAP_GEN(mvdev->mdev, adv_virtualization))
 		mvdev->core_device.vdev.log_ops = log_ops;
 
+	if (MLX5_CAP_GEN_2(mvdev->mdev, migration_multi_load) &&
+	    MLX5_CAP_GEN_2(mvdev->mdev, migration_tracking_state))
+		mvdev->core_device.vdev.migration_flags |=
+			VFIO_MIGRATION_PRE_COPY;
+
 end:
 	mlx5_vf_put_core_dev(mvdev->mdev);
 }