From patchwork Mon Dec 5 14:48:25 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064575
From: Yishai Hadas
Subject: [PATCH V3 vfio 01/14] net/mlx5: Introduce ifc bits for pre_copy
Date: Mon, 5 Dec 2022 16:48:25 +0200
Message-ID: <20221205144838.245287-2-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
MIME-Version: 1.0
X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

Introduce the ifc bits needed to enable PRE_COPY of a VF during migration.

Signed-off-by: Shay Drory
Signed-off-by: Yishai Hadas
Acked-by: Leon Romanovsky
---
 include/linux/mlx5/mlx5_ifc.h | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 5a4e914e2a6f..230a96626a5f 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1882,7 +1882,12 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   max_reformat_remove_size[0x8];
 	u8	   max_reformat_remove_offset[0x8];
 
-	u8	   reserved_at_c0[0xe0];
+	u8	   reserved_at_c0[0x8];
+	u8	   migration_multi_load[0x1];
+	u8	   migration_tracking_state[0x1];
+	u8	   reserved_at_ca[0x16];
+
+	u8	   reserved_at_e0[0xc0];
 
 	u8	   reserved_at_1a0[0xb];
 	u8	   log_min_mkey_entity_size[0x5];
@@ -11918,7 +11923,8 @@ struct mlx5_ifc_query_vhca_migration_state_in_bits {
 	u8	   reserved_at_20[0x10];
 	u8	   op_mod[0x10];
 
-	u8	   reserved_at_40[0x10];
+	u8	   incremental[0x1];
+	u8	   reserved_at_41[0xf];
 	u8	   vhca_id[0x10];
 
 	u8	   reserved_at_60[0x20];
@@ -11944,7 +11950,9 @@ struct mlx5_ifc_save_vhca_state_in_bits {
 	u8	   reserved_at_20[0x10];
 	u8	   op_mod[0x10];
 
-	u8	   reserved_at_40[0x10];
+	u8	   incremental[0x1];
+	u8	   set_track[0x1];
+	u8	   reserved_at_42[0xe];
 	u8	   vhca_id[0x10];
 
 	u8	   reserved_at_60[0x20];

From patchwork Mon Dec 5 14:48:26 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064576
From: Yishai Hadas
Subject: [PATCH V3 vfio 02/14] vfio: Extend the device migration protocol with PRE_COPY
Date: Mon, 5 Dec 2022 16:48:26 +0200
Message-ID: <20221205144838.245287-3-yishaih@nvidia.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
MIME-Version: 1.0
X-Mailing-List: kvm@vger.kernel.org

From: Jason Gunthorpe

The optional PRE_COPY states open the saving data transfer FD before
reaching STOP_COPY and allow the device to dirty-track internal state
changes, with the general idea of reducing the volume of data transferred
in the STOP_COPY stage.

While in PRE_COPY the device remains RUNNING, but the saving FD is open.

Only if the device also supports RUNNING_P2P can it support PRE_COPY_P2P,
which halts P2P transfers while continuing the saving FD.
PRE_COPY, with P2P support, requires the driver to implement 7 new arcs and
exists as an optional FSM branch between RUNNING and STOP_COPY:
    RUNNING -> PRE_COPY -> PRE_COPY_P2P -> STOP_COPY

A new ioctl, VFIO_MIG_GET_PRECOPY_INFO, is provided to allow userspace to
query the progress of the precopy operation in the driver, with the idea
that it will judge when to move to STOP_COPY: at least once the initial data
set is transferred, and possibly after the dirty size has shrunk
appropriately. This ioctl is valid only in the PRE_COPY states and the
kernel driver should return -EINVAL from any other migration state.

Compared to the v1 clarification, STOP_COPY -> PRE_COPY is blocked and is to
be defined in the future.

We also split the pending_bytes report into initial and sustaining values,
i.e. initial_bytes and dirty_bytes:

 initial_bytes: Amount of initial precopy data.
 dirty_bytes: Device state changes relative to data previously retrieved.

These fields are not required to have any bearing on the STOP_COPY phase.
It is recommended to leave PRE_COPY for STOP_COPY only after the
initial_bytes field reaches zero. Leaving PRE_COPY earlier might make things
slower.
Signed-off-by: Jason Gunthorpe
Signed-off-by: Shay Drory
Reviewed-by: Kevin Tian
Reviewed-by: Shameer Kolothum
Signed-off-by: Yishai Hadas
---
 drivers/vfio/vfio_main.c  |  74 ++++++++++++++++++++++-
 include/uapi/linux/vfio.h | 123 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 191 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 662e267a3e13..9c4a752dad4e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1042,7 +1042,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 			    enum vfio_device_mig_state new_fsm,
 			    enum vfio_device_mig_state *next_fsm)
 {
-	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
+	enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_PRE_COPY_P2P + 1 };
 	/*
 	 * The coding in this table requires the driver to implement the
 	 * following FSM arcs:
@@ -1057,30 +1057,65 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 	 * RUNNING_P2P -> RUNNING
 	 * RUNNING_P2P -> STOP
 	 * STOP -> RUNNING_P2P
-	 * Without P2P the driver must implement:
+	 *
+	 * If precopy is supported then the driver must support these additional
+	 * FSM arcs:
+	 * RUNNING -> PRE_COPY
+	 * PRE_COPY -> RUNNING
+	 * PRE_COPY -> STOP_COPY
+	 * However, if precopy and P2P are supported together then the driver
+	 * must support these additional arcs beyond the P2P arcs above:
+	 * PRE_COPY -> RUNNING
+	 * PRE_COPY -> PRE_COPY_P2P
+	 * PRE_COPY_P2P -> PRE_COPY
+	 * PRE_COPY_P2P -> RUNNING_P2P
+	 * PRE_COPY_P2P -> STOP_COPY
+	 * RUNNING -> PRE_COPY
+	 * RUNNING_P2P -> PRE_COPY_P2P
+	 *
+	 * Without P2P and precopy the driver must implement:
 	 * RUNNING -> STOP
 	 * STOP -> RUNNING
 	 *
 	 * The coding will step through multiple states for some combination
 	 * transitions; if all optional features are supported, this means the
 	 * following ones:
+	 * PRE_COPY -> PRE_COPY_P2P -> STOP_COPY
+	 * PRE_COPY -> RUNNING -> RUNNING_P2P
+	 * PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP
+	 * PRE_COPY -> RUNNING -> RUNNING_P2P -> STOP -> RESUMING
+	 * PRE_COPY_P2P -> RUNNING_P2P -> RUNNING
+	 * PRE_COPY_P2P -> RUNNING_P2P -> STOP
+	 * PRE_COPY_P2P -> RUNNING_P2P -> STOP -> RESUMING
 	 * RESUMING -> STOP -> RUNNING_P2P
+	 * RESUMING -> STOP -> RUNNING_P2P -> PRE_COPY_P2P
 	 * RESUMING -> STOP -> RUNNING_P2P -> RUNNING
+	 * RESUMING -> STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY
 	 * RESUMING -> STOP -> STOP_COPY
+	 * RUNNING -> RUNNING_P2P -> PRE_COPY_P2P
 	 * RUNNING -> RUNNING_P2P -> STOP
 	 * RUNNING -> RUNNING_P2P -> STOP -> RESUMING
 	 * RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
+	 * RUNNING_P2P -> RUNNING -> PRE_COPY
 	 * RUNNING_P2P -> STOP -> RESUMING
 	 * RUNNING_P2P -> STOP -> STOP_COPY
+	 * STOP -> RUNNING_P2P -> PRE_COPY_P2P
 	 * STOP -> RUNNING_P2P -> RUNNING
+	 * STOP -> RUNNING_P2P -> RUNNING -> PRE_COPY
 	 * STOP_COPY -> STOP -> RESUMING
 	 * STOP_COPY -> STOP -> RUNNING_P2P
 	 * STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
+	 *
+	 * The following transitions are blocked:
+	 * STOP_COPY -> PRE_COPY
+	 * STOP_COPY -> PRE_COPY_P2P
 	 */
 	static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
@@ -1089,14 +1124,38 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_RUNNING] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
 			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
 		},
+		[VFIO_DEVICE_STATE_PRE_COPY] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
+		[VFIO_DEVICE_STATE_PRE_COPY_P2P] = {
+			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_PRE_COPY,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
+			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
+			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
+			[VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+		},
 		[VFIO_DEVICE_STATE_STOP_COPY] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
@@ -1105,6 +1164,8 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_RESUMING] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_STOP,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
@@ -1113,6 +1174,8 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_RUNNING_P2P] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_RUNNING,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_PRE_COPY_P2P,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
@@ -1121,6 +1184,8 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 		[VFIO_DEVICE_STATE_ERROR] = {
 			[VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY] = VFIO_DEVICE_STATE_ERROR,
+			[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
 			[VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR,
@@ -1131,6 +1196,11 @@ int vfio_mig_get_next_state(struct vfio_device *device,
 	static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = {
 		[VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY,
+		[VFIO_DEVICE_STATE_PRE_COPY] =
+			VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY,
+		[VFIO_DEVICE_STATE_PRE_COPY_P2P] = VFIO_MIGRATION_STOP_COPY |
+						   VFIO_MIGRATION_P2P |
+						   VFIO_MIGRATION_PRE_COPY,
 		[VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY,
 		[VFIO_DEVICE_STATE_RUNNING_P2P] =
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 3e45dbaf190e..23105eb036fa 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -819,12 +819,20 @@ struct vfio_device_feature {
  * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
  * is supported in addition to the STOP_COPY states.
  *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_PRE_COPY means that
+ * PRE_COPY is supported in addition to the STOP_COPY states.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY
+ * means that RUNNING_P2P, PRE_COPY and PRE_COPY_P2P are supported
+ * in addition to the STOP_COPY states.
+ *
  * Other combinations of flags have behavior to be defined in the future.
  */
 struct vfio_device_feature_migration {
 	__aligned_u64 flags;
 #define VFIO_MIGRATION_STOP_COPY	(1 << 0)
 #define VFIO_MIGRATION_P2P		(1 << 1)
+#define VFIO_MIGRATION_PRE_COPY	(1 << 2)
 };
 #define VFIO_DEVICE_FEATURE_MIGRATION 1
 
@@ -875,8 +883,13 @@ struct vfio_device_feature_mig_state {
  * RESUMING - The device is stopped and is loading a new internal state
  * ERROR - The device has failed and must be reset
  *
- * And 1 optional state to support VFIO_MIGRATION_P2P:
+ * And optional states to support VFIO_MIGRATION_P2P:
  * RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ * And VFIO_MIGRATION_PRE_COPY:
+ * PRE_COPY - The device is running normally but tracking internal state
+ *            changes
+ * And VFIO_MIGRATION_P2P | VFIO_MIGRATION_PRE_COPY:
+ * PRE_COPY_P2P - PRE_COPY, except the device cannot do peer to peer DMA
 *
 * The FSM takes actions on the arcs between FSM states. The driver implements
 * the following behavior for the FSM arcs:
@@ -908,20 +921,48 @@ struct vfio_device_feature_mig_state {
 *
 * To abort a RESUMING session the device must be reset.
 *
+ * PRE_COPY -> RUNNING
 * RUNNING_P2P -> RUNNING
 * While in RUNNING the device is fully operational, the device may generate
 * interrupts, DMA, respond to MMIO, all vfio device regions are functional,
 * and the device may advance its internal state.
 *
+ * The PRE_COPY arc will terminate a data transfer session.
+ *
+ * PRE_COPY_P2P -> RUNNING_P2P
 * RUNNING -> RUNNING_P2P
 * STOP -> RUNNING_P2P
 * While in RUNNING_P2P the device is partially running in the P2P quiescent
 * state defined below.
 *
+ * The PRE_COPY_P2P arc will terminate a data transfer session.
+ *
+ * RUNNING -> PRE_COPY
+ * RUNNING_P2P -> PRE_COPY_P2P
 * STOP -> STOP_COPY
- * This arc begin the process of saving the device state and will return a
- * new data_fd.
+ * PRE_COPY, PRE_COPY_P2P and STOP_COPY form the "saving group" of states
+ * which share a data transfer session. Moving between these states alters
+ * what is streamed in session, but does not terminate or otherwise affect
+ * the associated fd.
+ *
+ * These arcs begin the process of saving the device state and will return a
+ * new data_fd. The migration driver may perform actions such as enabling
+ * dirty logging of device state when entering PRE_COPY or PRE_COPY_P2P.
 *
+ * Each arc does not change the device operation, the device remains
+ * RUNNING, P2P quiesced or in STOP. The STOP_COPY state is described below
+ * in PRE_COPY_P2P -> STOP_COPY.
+ *
+ * PRE_COPY -> PRE_COPY_P2P
+ * Entering PRE_COPY_P2P continues all the behaviors of PRE_COPY above.
+ * However, while in the PRE_COPY_P2P state, the device is partially running
+ * in the P2P quiescent state defined below, like RUNNING_P2P.
+ *
+ * PRE_COPY_P2P -> PRE_COPY
+ * This arc allows returning the device to a full RUNNING behavior while
+ * continuing all the behaviors of PRE_COPY.
+ *
+ * PRE_COPY_P2P -> STOP_COPY
 * While in the STOP_COPY state the device has the same behavior as STOP
 * with the addition that the data transfers session continues to stream the
 * migration state. End of stream on the FD indicates the entire device
@@ -939,6 +980,13 @@ struct vfio_device_feature_mig_state {
 * device state for this arc if required to prepare the device to receive the
 * migration data.
 *
+ * STOP_COPY -> PRE_COPY
+ * STOP_COPY -> PRE_COPY_P2P
+ * These arcs are not permitted and return error if requested. Future
+ * revisions of this API may define behaviors for these arcs, in this case
+ * support will be discoverable by a new flag in
+ * VFIO_DEVICE_FEATURE_MIGRATION.
+ *
 * any -> ERROR
 * ERROR cannot be specified as a device state, however any transition request
 * can be failed with an errno return and may then move the device_state into
@@ -950,7 +998,7 @@ struct vfio_device_feature_mig_state {
 * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
 * state for the device for the purposes of managing multiple devices within a
 * user context where peer-to-peer DMA between devices may be active. The
- * RUNNING_P2P states must prevent the device from initiating
+ * RUNNING_P2P and PRE_COPY_P2P states must prevent the device from initiating
 * any new P2P DMA transactions. If the device can identify P2P transactions
 * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
 * driver must complete any such outstanding operations prior to completing the
@@ -963,6 +1011,8 @@ struct vfio_device_feature_mig_state {
 * above FSM arcs. As there are multiple paths through the FSM arcs the path
 * should be selected based on the following rules:
 *  - Select the shortest path.
+ *  - The path cannot have saving group states as interior arcs, only
+ *    starting/end states.
 * Refer to vfio_mig_get_next_state() for the result of the algorithm.
 *
 * The automatic transit through the FSM arcs that make up the combination
@@ -976,6 +1026,9 @@ struct vfio_device_feature_mig_state {
 * support them. The user can discover if these states are supported by using
 * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
 * avoid knowing about these optional states if the kernel driver supports them.
+ *
+ * Arcs touching PRE_COPY and PRE_COPY_P2P are removed if support for PRE_COPY
+ * is not present.
 */
 enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_ERROR = 0,
@@ -984,8 +1037,70 @@ enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_STOP_COPY = 3,
 	VFIO_DEVICE_STATE_RESUMING = 4,
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
+	VFIO_DEVICE_STATE_PRE_COPY = 6,
+	VFIO_DEVICE_STATE_PRE_COPY_P2P = 7,
+};
+
+/**
+ * VFIO_MIG_GET_PRECOPY_INFO - _IO(VFIO_TYPE, VFIO_BASE + 21)
+ *
+ * This ioctl is used on the migration data FD in the precopy phase of the
+ * migration data transfer. It returns an estimate of the current data sizes
+ * remaining to be transferred. It allows the user to judge when it is
+ * appropriate to leave PRE_COPY for STOP_COPY.
+ *
+ * This ioctl is valid only in the PRE_COPY states and the kernel driver should
+ * return -EINVAL from any other migration state.
+ *
+ * The vfio_precopy_info data structure returned by this ioctl provides
+ * estimates of data available from the device during the PRE_COPY states.
+ * This estimate is split into two categories, initial_bytes and
+ * dirty_bytes.
+ *
+ * The initial_bytes field indicates the amount of initial precopy
+ * data available from the device. This field should have a non-zero initial
+ * value and decrease as migration data is read from the device.
+ * It is recommended to leave PRE_COPY for STOP_COPY only after this field
+ * reaches zero. Leaving PRE_COPY earlier might make things slower.
+ *
+ * The dirty_bytes field tracks device state changes relative to data
+ * previously retrieved. This field starts at zero and may increase as
+ * the internal device state is modified or decrease as that modified
+ * state is read from the device.
+ *
+ * Userspace may use the combination of these fields to estimate the
+ * potential data size available during the PRE_COPY phases, as well as
+ * trends relative to the rate the device is dirtying its internal
+ * state, but these fields are not required to have any bearing relative
+ * to the data size available during the STOP_COPY phase.
+ *
+ * Drivers have a lot of flexibility in when and what they transfer during the
+ * PRE_COPY phase, and how they report this from VFIO_MIG_GET_PRECOPY_INFO.
+ *
+ * During pre-copy the migration data FD has a temporary "end of stream" that is
+ * reached when both initial_bytes and dirty_bytes are zero. For instance, this
+ * may indicate that the device is idle and not currently dirtying any internal
+ * state. When read() is done on this temporary end of stream the kernel driver
+ * should return ENOMSG from read(). Userspace can wait for more data (which may
+ * never come) by using poll.
+ *
+ * Once in STOP_COPY the migration data FD has a permanent end of stream
+ * signaled in the usual way by read() always returning 0 and poll always
+ * returning readable. ENOMSG may not be returned in STOP_COPY.
+ * Support for this ioctl is mandatory if a driver claims to support
+ * VFIO_MIGRATION_PRE_COPY.
+ *
+ * Return: 0 on success, -1 and errno set on failure.
+ */
+struct vfio_precopy_info {
+	__u32 argsz;
+	__u32 flags;
+	__aligned_u64 initial_bytes;
+	__aligned_u64 dirty_bytes;
 };
 
+#define VFIO_MIG_GET_PRECOPY_INFO _IO(VFIO_TYPE, VFIO_BASE + 21)
+
 /*
 * Upon VFIO_DEVICE_FEATURE_SET, allow the device to be moved into a low power
 * state with the platform-based power management. Device use of lower power

From patchwork Mon Dec 5 14:48:27 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064577
From: Yishai Hadas
Subject: [PATCH V3 vfio 03/14] vfio/mlx5: Enforce a single SAVE command at a time
Date: Mon, 5 Dec 2022 16:48:27 +0200
Message-ID: <20221205144838.245287-4-yishaih@nvidia.com>
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

Enforce a single SAVE command at a time.

As the SAVE command is an asynchronous one, we must enforce running only
a single command at a time. This will preserve ordering between multiple
calls and protect from races on the migration file data structure.

This is a must for the next patches from the series where as part of
PRE_COPY we may have multiple images to be saved and multiple SAVE
commands may be issued from different flows.

Reviewed-by: Jason Gunthorpe
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 6 ++++++
 drivers/vfio/pci/mlx5/cmd.h  | 1 +
 drivers/vfio/pci/mlx5/main.c | 7 +++++++
 3 files changed, 14 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 0848bc905d3e..55ee8036f59c 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -281,6 +281,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
 	mlx5_core_dealloc_pd(mdev, async_data->pdn);
 	kvfree(async_data->out);
+	complete(&migf->save_comp);
 	fput(migf->filp);
 }
 
@@ -321,6 +322,10 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		return -ENOTCONN;
 
 	mdev = mvdev->mdev;
+	err = wait_for_completion_interruptible(&migf->save_comp);
+	if (err)
+		return err;
+
 	err = mlx5_core_alloc_pd(mdev, &pdn);
 	if (err)
 		return err;
@@ -371,6 +376,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
 err_dma_map:
 	mlx5_core_dealloc_pd(mdev, pdn);
+	complete(&migf->save_comp);
 	return err;
 }
 
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 921d5720a1e5..8ffa7699872c 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -37,6 +37,7 @@ struct mlx5_vf_migration_file {
 	unsigned long last_offset;
 	struct mlx5vf_pci_core_device *mvdev;
 	wait_queue_head_t poll_wait;
+	struct completion save_comp;
 	struct mlx5_async_ctx async_ctx;
 	struct mlx5vf_async_data async_data;
 };
 
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 6e9cf2aacc52..0d71ebb2a972 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -245,6 +245,13 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	init_waitqueue_head(&migf->poll_wait);
+	init_completion(&migf->save_comp);
+	/*
+	 * save_comp is being used as a binary semaphore built from
+	 * a completion. A normal mutex cannot be used because the lock is
+	 * passed between kernel threads and lockdep can't model this.
+	 */
+	complete(&migf->save_comp);
 	mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx);
 	INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb);
 	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,

From patchwork Mon Dec 5 14:48:28 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064579
From: Yishai Hadas
Subject: [PATCH V3 vfio 04/14] vfio/mlx5: Refactor PD usage
Date: Mon, 5 Dec 2022 16:48:28 +0200
Message-ID: <20221205144838.245287-5-yishaih@nvidia.com>
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

Refactor PD usage so that its life cycle matches that of the migration
file, instead of allocating/destroying it upon each SAVE/LOAD command.

This is a preparation step towards the PRE_COPY series, where multiple
images will be SAVED/LOADED and a single PD can simply be reused.

Reviewed-by: Jason Gunthorpe
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 53 ++++++++++++++++++++++++------------
 drivers/vfio/pci/mlx5/cmd.h  |  5 +++-
 drivers/vfio/pci/mlx5/main.c | 44 ++++++++++++++++++++++--------
 3 files changed, 71 insertions(+), 31 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 55ee8036f59c..a97eac49e3d6 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -279,7 +279,6 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 	mlx5_core_destroy_mkey(mdev, async_data->mkey);
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
-	mlx5_core_dealloc_pd(mdev, async_data->pdn);
 	kvfree(async_data->out);
 	complete(&migf->save_comp);
 	fput(migf->filp);
@@ -314,7 +313,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
 	struct mlx5vf_async_data *async_data;
 	struct mlx5_core_dev *mdev;
-	u32 pdn, mkey;
+	u32 mkey;
 	int err;
 
 	lockdep_assert_held(&mvdev->state_mutex);
@@ -326,16 +325,12 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (err)
 		return err;
 
-	err = mlx5_core_alloc_pd(mdev, &pdn);
-	if (err)
-		return err;
-
 	err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
 	if (err)
 		goto err_dma_map;
 
-	err = _create_mkey(mdev, pdn, migf, NULL, &mkey);
+	err = _create_mkey(mdev, migf->pdn, migf, NULL, &mkey);
 	if (err)
 		goto err_create_mkey;
 
@@ -357,7 +352,6 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	migf->total_length = 0;
 	get_file(migf->filp);
 	async_data->mkey = mkey;
-	async_data->pdn = pdn;
 	err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
 			       async_data->out, out_size, mlx5vf_save_callback,
@@ -375,7 +369,6 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 err_create_mkey:
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
 err_dma_map:
-	mlx5_core_dealloc_pd(mdev, pdn);
 	complete(&migf->save_comp);
 	return err;
 }
@@ -386,7 +379,7 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	struct mlx5_core_dev *mdev;
 	u32 out[MLX5_ST_SZ_DW(load_vhca_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(load_vhca_state_in)] = {};
-	u32 pdn, mkey;
+	u32 mkey;
 	int err;
 
 	lockdep_assert_held(&mvdev->state_mutex);
@@ -400,15 +393,11 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	mdev = mvdev->mdev;
-	err = mlx5_core_alloc_pd(mdev, &pdn);
-	if (err)
-		goto end;
-
 	err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0);
 	if (err)
-		goto err_reg;
+		goto end;
 
-	err = _create_mkey(mdev, pdn, migf, NULL, &mkey);
+	err = _create_mkey(mdev, migf->pdn, migf, NULL, &mkey);
 	if (err)
 		goto err_mkey;
 
@@ -424,13 +413,41 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	mlx5_core_destroy_mkey(mdev, mkey);
 err_mkey:
 	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0);
-err_reg:
-	mlx5_core_dealloc_pd(mdev, pdn);
 end:
 	mutex_unlock(&migf->lock);
 	return err;
 }
 
+int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file *migf)
+{
+	int err;
+
+	lockdep_assert_held(&migf->mvdev->state_mutex);
+	if (migf->mvdev->mdev_detach)
+		return -ENOTCONN;
+
+	err = mlx5_core_alloc_pd(migf->mvdev->mdev, &migf->pdn);
+	return err;
+}
+
+void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf)
+{
+	lockdep_assert_held(&migf->mvdev->state_mutex);
+	if (migf->mvdev->mdev_detach)
+		return;
+
+	mlx5_core_dealloc_pd(migf->mvdev->mdev, migf->pdn);
+}
+
+void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf)
+{
+	lockdep_assert_held(&migf->mvdev->state_mutex);
+
+	WARN_ON(migf->mvdev->mdev_detach);
+
+	mlx5vf_cmd_dealloc_pd(migf);
+}
+
 static void combine_ranges(struct rb_root_cached *root, u32 cur_nodes,
 			   u32 req_nodes)
 {
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 8ffa7699872c..ba760f956d53 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -16,7 +16,6 @@ struct mlx5vf_async_data {
 	struct mlx5_async_work cb_work;
 	struct work_struct work;
 	int status;
-	u32 pdn;
 	u32 mkey;
 	void *out;
 };
@@ -27,6 +26,7 @@ struct mlx5_vf_migration_file {
 	u8 disabled:1;
 	u8 is_err:1;
 
+	u32 pdn;
 	struct sg_append_table table;
 	size_t total_length;
 	size_t allocated_length;
@@ -127,6 +127,9 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf);
 int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf);
+int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file *migf);
+void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf);
+void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 0d71ebb2a972..1916f7c1468c 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -236,12 +236,15 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	migf->filp = anon_inode_getfile("mlx5vf_mig", &mlx5vf_save_fops, migf,
 					O_RDONLY);
 	if (IS_ERR(migf->filp)) {
-		int err = PTR_ERR(migf->filp);
-
-		kfree(migf);
-		return ERR_PTR(err);
+		ret = PTR_ERR(migf->filp);
+		goto end;
 	}
 
+	migf->mvdev = mvdev;
+	ret = mlx5vf_cmd_alloc_pd(migf);
+	if (ret)
+		goto out_free;
+
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	init_waitqueue_head(&migf->poll_wait);
@@ -257,20 +260,25 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
 						    &migf->total_length);
 	if (ret)
-		goto out_free;
+		goto out_pd;
 
 	ret = mlx5vf_add_migration_pages(
 		migf, DIV_ROUND_UP_ULL(migf->total_length, PAGE_SIZE));
 	if (ret)
-		goto out_free;
+		goto out_pd;
 
-	migf->mvdev = mvdev;
 	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf);
 	if (ret)
-		goto out_free;
+		goto out_save;
 	return migf;
+out_save:
+	mlx5vf_disable_fd(migf);
+out_pd:
+	mlx5vf_cmd_dealloc_pd(migf);
 out_free:
 	fput(migf->filp);
+end:
+	kfree(migf);
 	return ERR_PTR(ret);
 }
 
@@ -352,6 +360,7 @@ static struct mlx5_vf_migration_file *
 mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 {
 	struct mlx5_vf_migration_file *migf;
+	int ret;
 
 	migf = kzalloc(sizeof(*migf), GFP_KERNEL);
 	if (!migf)
@@ -360,20 +369,30 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	migf->filp = anon_inode_getfile("mlx5vf_mig", &mlx5vf_resume_fops, migf,
 					O_WRONLY);
 	if (IS_ERR(migf->filp)) {
-		int err = PTR_ERR(migf->filp);
-
-		kfree(migf);
-		return ERR_PTR(err);
+		ret = PTR_ERR(migf->filp);
+		goto end;
 	}
+
+	migf->mvdev = mvdev;
+	ret = mlx5vf_cmd_alloc_pd(migf);
+	if (ret)
+		goto out_free;
+
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	return migf;
+out_free:
+	fput(migf->filp);
+end:
+	kfree(migf);
+	return ERR_PTR(ret);
 }
 
 void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev)
 {
 	if (mvdev->resuming_migf) {
 		mlx5vf_disable_fd(mvdev->resuming_migf);
+		mlx5fv_cmd_clean_migf_resources(mvdev->resuming_migf);
 		fput(mvdev->resuming_migf->filp);
 		mvdev->resuming_migf = NULL;
 	}
@@ -381,6 +400,7 @@ void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev)
 		mlx5_cmd_cleanup_async_ctx(&mvdev->saving_migf->async_ctx);
 		cancel_work_sync(&mvdev->saving_migf->async_data.work);
 		mlx5vf_disable_fd(mvdev->saving_migf);
+		mlx5fv_cmd_clean_migf_resources(mvdev->saving_migf);
 		fput(mvdev->saving_migf->filp);
 		mvdev->saving_migf = NULL;
 	}

From patchwork Mon Dec 5 14:48:29 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064578
From: Yishai Hadas
Subject: [PATCH V3 vfio 05/14] vfio/mlx5: Refactor MKEY usage
Date: Mon, 5 Dec 2022 16:48:29 +0200
Message-ID: <20221205144838.245287-6-yishaih@nvidia.com>
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

Refactor MKEY usage so that its life cycle follows the migration file,
instead of allocating and destroying it upon each SAVE/LOAD command. This
is a preparation step towards the PRE_COPY series, where multiple images
will be saved/loaded.

This is achieved by introducing a new struct, mlx5_vhca_data_buffer, which
holds the MKEY together with its related state, such as the
sg_append_table, allocated_length, etc. These fields were moved out of the
migration file's main struct into the dedicated mlx5_vhca_data_buffer
struct, with the proper helpers in place.

For now there is a single mlx5_vhca_data_buffer per migration file;
upcoming patches will use multiple buffers to support multiple images.
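The life cycle described above can be modeled in a few lines of plain userspace C. This is only an illustrative sketch with hypothetical names (`data_buffer`, `buffer_alloc`, `save_command`, `buffer_free`) standing in for the driver's mlx5vf_alloc_data_buffer()/mlx5vf_free_data_buffer() helpers and the kernel DMA/MKEY APIs; the point is that the MKEY is created once with the buffer and merely reused by each command:

```c
/*
 * Userspace sketch of the refactored life cycle (illustrative names, not
 * the kernel API): the MKEY is created once when the data buffer is set
 * up, reused by every save/load command, and destroyed only when the
 * buffer itself is freed.
 */
#include <stdlib.h>

struct data_buffer {
	size_t allocated_length;	/* capacity backing the image */
	size_t length;			/* bytes of image data present */
	unsigned int mkey;		/* created with the buffer */
	int mkey_alive;
};

static unsigned int next_mkey = 1;

static struct data_buffer *buffer_alloc(size_t length)
{
	struct data_buffer *buf = calloc(1, sizeof(*buf));

	if (!buf)
		return NULL;
	if (length) {
		buf->allocated_length = length;
		buf->mkey = next_mkey++;	/* one MKEY per buffer... */
		buf->mkey_alive = 1;		/* ...not one per command */
	}
	return buf;
}

static int save_command(struct data_buffer *buf)
{
	if (!buf->mkey_alive)	/* a command only reuses the MKEY */
		return -1;
	buf->length = buf->allocated_length;
	return 0;
}

static void buffer_free(struct data_buffer *buf)
{
	buf->mkey_alive = 0;	/* MKEY destroyed with the buffer */
	free(buf);
}
```

Running two back-to-back "save" commands against the same buffer observes the same MKEY both times, which is exactly the behavior the refactor enables for the coming PRE_COPY support.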
Reviewed-by: Jason Gunthorpe
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 162 ++++++++++++++++++++++-------------
 drivers/vfio/pci/mlx5/cmd.h  |  37 +++++---
 drivers/vfio/pci/mlx5/main.c |  92 +++++++++++---------
 3 files changed, 178 insertions(+), 113 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index a97eac49e3d6..ed4c472d2eae 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -210,11 +210,11 @@ static int mlx5vf_cmd_get_vhca_id(struct mlx5_core_dev *mdev, u16 function_id,
 }
 
 static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
-			struct mlx5_vf_migration_file *migf,
+			struct mlx5_vhca_data_buffer *buf,
 			struct mlx5_vhca_recv_buf *recv_buf,
 			u32 *mkey)
 {
-	size_t npages = migf ? DIV_ROUND_UP(migf->total_length, PAGE_SIZE) :
+	size_t npages = buf ? DIV_ROUND_UP(buf->allocated_length, PAGE_SIZE) :
 				recv_buf->npages;
 	int err = 0, inlen;
 	__be64 *mtt;
@@ -232,10 +232,10 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
 		 DIV_ROUND_UP(npages, 2));
 	mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
 
-	if (migf) {
+	if (buf) {
 		struct sg_dma_page_iter dma_iter;
 
-		for_each_sgtable_dma_page(&migf->table.sgt, &dma_iter, 0)
+		for_each_sgtable_dma_page(&buf->table.sgt, &dma_iter, 0)
 			*mtt++ = cpu_to_be64(sg_page_iter_dma_address(&dma_iter));
 	} else {
 		int i;
@@ -255,20 +255,99 @@ static int _create_mkey(struct mlx5_core_dev *mdev, u32 pdn,
 	MLX5_SET(mkc, mkc, qpn, 0xffffff);
 	MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
 	MLX5_SET(mkc, mkc, translations_octword_size, DIV_ROUND_UP(npages, 2));
-	MLX5_SET64(mkc, mkc, len,
-		   migf ? migf->total_length : (npages * PAGE_SIZE));
+	MLX5_SET64(mkc, mkc, len, npages * PAGE_SIZE);
 	err = mlx5_core_create_mkey(mdev, mkey, in, inlen);
 	kvfree(in);
 	return err;
 }
 
+static int mlx5vf_dma_data_buffer(struct mlx5_vhca_data_buffer *buf)
+{
+	struct mlx5vf_pci_core_device *mvdev = buf->migf->mvdev;
+	struct mlx5_core_dev *mdev = mvdev->mdev;
+	int ret;
+
+	lockdep_assert_held(&mvdev->state_mutex);
+	if (mvdev->mdev_detach)
+		return -ENOTCONN;
+
+	if (buf->dmaed || !buf->allocated_length)
+		return -EINVAL;
+
+	ret = dma_map_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0);
+	if (ret)
+		return ret;
+
+	ret = _create_mkey(mdev, buf->migf->pdn, buf, NULL, &buf->mkey);
+	if (ret)
+		goto err;
+
+	buf->dmaed = true;
+
+	return 0;
+err:
+	dma_unmap_sgtable(mdev->device, &buf->table.sgt, buf->dma_dir, 0);
+	return ret;
+}
+
+void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf)
+{
+	struct mlx5_vf_migration_file *migf = buf->migf;
+	struct sg_page_iter sg_iter;
+
+	lockdep_assert_held(&migf->mvdev->state_mutex);
+	WARN_ON(migf->mvdev->mdev_detach);
+
+	if (buf->dmaed) {
+		mlx5_core_destroy_mkey(migf->mvdev->mdev, buf->mkey);
+		dma_unmap_sgtable(migf->mvdev->mdev->device, &buf->table.sgt,
+				  buf->dma_dir, 0);
+	}
+
+	/* Undo alloc_pages_bulk_array() */
+	for_each_sgtable_page(&buf->table.sgt, &sg_iter, 0)
+		__free_page(sg_page_iter_page(&sg_iter));
+	sg_free_append_table(&buf->table);
+	kfree(buf);
+}
+
+struct mlx5_vhca_data_buffer *
+mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
+			 size_t length,
+			 enum dma_data_direction dma_dir)
+{
+	struct mlx5_vhca_data_buffer *buf;
+	int ret;
+
+	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
+	if (!buf)
+		return ERR_PTR(-ENOMEM);
+
+	buf->dma_dir = dma_dir;
+	buf->migf = migf;
+	if (length) {
+		ret = mlx5vf_add_migration_pages(buf,
+				DIV_ROUND_UP_ULL(length, PAGE_SIZE));
+		if (ret)
+			goto end;
+
+		ret = mlx5vf_dma_data_buffer(buf);
+		if (ret)
+			goto end;
+	}
+
+	return buf;
+end:
+	mlx5vf_free_data_buffer(buf);
+	return ERR_PTR(ret);
+}
+
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 {
 	struct mlx5vf_async_data *async_data = container_of(_work,
 		struct mlx5vf_async_data, work);
 	struct mlx5_vf_migration_file *migf = container_of(async_data,
 		struct mlx5_vf_migration_file, async_data);
-	struct mlx5_core_dev *mdev = migf->mvdev->mdev;
 
 	mutex_lock(&migf->lock);
 	if (async_data->status) {
@@ -276,9 +355,6 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 		wake_up_interruptible(&migf->poll_wait);
 	}
 	mutex_unlock(&migf->lock);
-
-	mlx5_core_destroy_mkey(mdev, async_data->mkey);
-	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
 	kvfree(async_data->out);
 	complete(&migf->save_comp);
 	fput(migf->filp);
@@ -292,7 +368,7 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		struct mlx5_vf_migration_file, async_data);
 
 	if (!status) {
-		WRITE_ONCE(migf->total_length,
+		WRITE_ONCE(migf->buf->length,
 			   MLX5_GET(save_vhca_state_out, async_data->out,
 				    actual_image_size));
 		wake_up_interruptible(&migf->poll_wait);
@@ -307,39 +383,28 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 }
 
 int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
-			       struct mlx5_vf_migration_file *migf)
+			       struct mlx5_vf_migration_file *migf,
+			       struct mlx5_vhca_data_buffer *buf)
 {
 	u32 out_size = MLX5_ST_SZ_BYTES(save_vhca_state_out);
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
 	struct mlx5vf_async_data *async_data;
-	struct mlx5_core_dev *mdev;
-	u32 mkey;
 	int err;
 
 	lockdep_assert_held(&mvdev->state_mutex);
 	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
-	mdev = mvdev->mdev;
 	err = wait_for_completion_interruptible(&migf->save_comp);
 	if (err)
 		return err;
 
-	err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE,
-			      0);
-	if (err)
-		goto err_dma_map;
-
-	err = _create_mkey(mdev, migf->pdn, migf, NULL, &mkey);
-	if (err)
-		goto err_create_mkey;
-
 	MLX5_SET(save_vhca_state_in, in, opcode, MLX5_CMD_OP_SAVE_VHCA_STATE);
 	MLX5_SET(save_vhca_state_in, in, op_mod, 0);
 	MLX5_SET(save_vhca_state_in, in, vhca_id, mvdev->vhca_id);
-	MLX5_SET(save_vhca_state_in, in, mkey, mkey);
-	MLX5_SET(save_vhca_state_in, in, size, migf->total_length);
+	MLX5_SET(save_vhca_state_in, in, mkey, buf->mkey);
+	MLX5_SET(save_vhca_state_in, in, size, buf->allocated_length);
 
 	async_data = &migf->async_data;
 	async_data->out = kvzalloc(out_size, GFP_KERNEL);
@@ -348,10 +413,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		goto err_out;
 	}
 
-	/* no data exists till the callback comes back */
-	migf->total_length = 0;
 	get_file(migf->filp);
-	async_data->mkey = mkey;
 	err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
 			       async_data->out,
 			       out_size, mlx5vf_save_callback,
@@ -365,57 +427,33 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	fput(migf->filp);
 	kvfree(async_data->out);
 err_out:
-	mlx5_core_destroy_mkey(mdev, mkey);
-err_create_mkey:
-	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
-err_dma_map:
 	complete(&migf->save_comp);
 	return err;
 }
 
 int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
-			       struct mlx5_vf_migration_file *migf)
+			       struct mlx5_vf_migration_file *migf,
+			       struct mlx5_vhca_data_buffer *buf)
 {
-	struct mlx5_core_dev *mdev;
 	u32 out[MLX5_ST_SZ_DW(load_vhca_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(load_vhca_state_in)] = {};
-	u32 mkey;
 	int err;
 
 	lockdep_assert_held(&mvdev->state_mutex);
 	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
-	mutex_lock(&migf->lock);
-	if (!migf->total_length) {
-		err = -EINVAL;
-		goto end;
-	}
-
-	mdev = mvdev->mdev;
-	err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0);
+	err = mlx5vf_dma_data_buffer(buf);
 	if (err)
-		goto end;
-
-	err = _create_mkey(mdev, migf->pdn, migf, NULL, &mkey);
-	if (err)
-		goto err_mkey;
+		return err;
 
 	MLX5_SET(load_vhca_state_in, in, opcode, MLX5_CMD_OP_LOAD_VHCA_STATE);
 	MLX5_SET(load_vhca_state_in, in, op_mod, 0);
 	MLX5_SET(load_vhca_state_in, in, vhca_id, mvdev->vhca_id);
-	MLX5_SET(load_vhca_state_in, in, mkey, mkey);
-	MLX5_SET(load_vhca_state_in, in, size, migf->total_length);
-
-	err = mlx5_cmd_exec_inout(mdev, load_vhca_state, in, out);
-
-	mlx5_core_destroy_mkey(mdev, mkey);
-err_mkey:
-	dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0);
-end:
-	mutex_unlock(&migf->lock);
-	return err;
+	MLX5_SET(load_vhca_state_in, in, mkey, buf->mkey);
+	MLX5_SET(load_vhca_state_in, in, size, buf->length);
+	return mlx5_cmd_exec_inout(mvdev->mdev, load_vhca_state, in, out);
 }
 
 int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file *migf)
@@ -445,6 +483,10 @@ void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf)
 
 	WARN_ON(migf->mvdev->mdev_detach);
 
+	if (migf->buf) {
+		mlx5vf_free_data_buffer(migf->buf);
+		migf->buf = NULL;
+	}
 	mlx5vf_cmd_dealloc_pd(migf);
 }
 
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index ba760f956d53..b0f08dfc8120 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -12,11 +12,25 @@
 #include
 #include
 
+struct mlx5_vhca_data_buffer {
+	struct sg_append_table table;
+	loff_t start_pos;
+	u64 length;
+	u64 allocated_length;
+	u32 mkey;
+	enum dma_data_direction dma_dir;
+	u8 dmaed:1;
+	struct mlx5_vf_migration_file *migf;
+	/* Optimize mlx5vf_get_migration_page() for sequential access */
+	struct scatterlist *last_offset_sg;
+	unsigned int sg_last_entry;
+	unsigned long last_offset;
+};
+
 struct mlx5vf_async_data {
 	struct mlx5_async_work cb_work;
 	struct work_struct work;
 	int status;
-	u32 mkey;
 	void *out;
 };
 
@@ -27,14 +41,7 @@ struct mlx5_vf_migration_file {
 	u8 is_err:1;
 	u32 pdn;
 
-	struct sg_append_table table;
-	size_t total_length;
-	size_t allocated_length;
-
-	/* Optimize mlx5vf_get_migration_page() for sequential access */
-	struct scatterlist *last_offset_sg;
-	unsigned int sg_last_entry;
-	unsigned long last_offset;
+	struct mlx5_vhca_data_buffer *buf;
 
 	struct mlx5vf_pci_core_device *mvdev;
 	wait_queue_head_t poll_wait;
 	struct completion save_comp;
@@ -124,12 +131,20 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_cmd_close_migratable(struct mlx5vf_pci_core_device *mvdev);
 int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
-			       struct mlx5_vf_migration_file *migf);
+			       struct mlx5_vf_migration_file *migf,
+			       struct mlx5_vhca_data_buffer *buf);
 int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
-			       struct mlx5_vf_migration_file *migf);
+			       struct mlx5_vf_migration_file *migf,
+			       struct mlx5_vhca_data_buffer *buf);
 int mlx5vf_cmd_alloc_pd(struct mlx5_vf_migration_file *migf);
 void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf);
 void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf);
+struct mlx5_vhca_data_buffer *
+mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
+			 size_t length, enum dma_data_direction dma_dir);
+void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf);
+int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
+			       unsigned int npages);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 1916f7c1468c..5f694fce854c 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -33,7 +33,7 @@ static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev)
 }
 
 static struct page *
-mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
+mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf,
 			  unsigned long offset)
 {
 	unsigned long cur_offset = 0;
@@ -41,20 +41,20 @@ mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
 	unsigned int i;
 
 	/* All accesses are sequential */
-	if (offset < migf->last_offset || !migf->last_offset_sg) {
-		migf->last_offset = 0;
-		migf->last_offset_sg = migf->table.sgt.sgl;
-		migf->sg_last_entry = 0;
+	if (offset < buf->last_offset || !buf->last_offset_sg) {
+		buf->last_offset = 0;
+		buf->last_offset_sg = buf->table.sgt.sgl;
+		buf->sg_last_entry = 0;
 	}
 
-	cur_offset = migf->last_offset;
+	cur_offset = buf->last_offset;
 
-	for_each_sg(migf->last_offset_sg, sg,
-		    migf->table.sgt.orig_nents - migf->sg_last_entry, i) {
+	for_each_sg(buf->last_offset_sg, sg,
+		    buf->table.sgt.orig_nents - buf->sg_last_entry, i) {
 		if (offset < sg->length + cur_offset) {
-			migf->last_offset_sg = sg;
-			migf->sg_last_entry += i;
-			migf->last_offset = cur_offset;
+			buf->last_offset_sg = sg;
+			buf->sg_last_entry += i;
+			buf->last_offset = cur_offset;
 			return nth_page(sg_page(sg),
 					(offset - cur_offset) / PAGE_SIZE);
 		}
@@ -63,8 +63,8 @@ mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
 	return NULL;
 }
 
-static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf,
-				      unsigned int npages)
+int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
+			       unsigned int npages)
 {
 	unsigned int to_alloc = npages;
 	struct page **page_list;
@@ -85,13 +85,13 @@ static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf,
 		}
 		to_alloc -= filled;
 		ret = sg_alloc_append_table_from_pages(
-			&migf->table, page_list, filled, 0,
+			&buf->table, page_list, filled, 0,
 			filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC,
 			GFP_KERNEL);
 
 		if (ret)
 			goto err;
-		migf->allocated_length += filled * PAGE_SIZE;
+		buf->allocated_length += filled * PAGE_SIZE;
 		/* clean input for another bulk allocation */
 		memset(page_list, 0, filled * sizeof(*page_list));
 		to_fill = min_t(unsigned int, to_alloc,
@@ -108,16 +108,8 @@ static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf,
 
 static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf)
 {
-	struct sg_page_iter sg_iter;
-
 	mutex_lock(&migf->lock);
-	/* Undo alloc_pages_bulk_array() */
-	for_each_sgtable_page(&migf->table.sgt, &sg_iter, 0)
-		__free_page(sg_page_iter_page(&sg_iter));
-	sg_free_append_table(&migf->table);
 	migf->disabled = true;
-	migf->total_length = 0;
-	migf->allocated_length = 0;
 	migf->filp->f_pos = 0;
 	mutex_unlock(&migf->lock);
 }
@@ -136,6 +128,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 			       loff_t *pos)
 {
 	struct mlx5_vf_migration_file *migf = filp->private_data;
+	struct mlx5_vhca_data_buffer *vhca_buf = migf->buf;
 	ssize_t done = 0;
 
 	if (pos)
@@ -144,16 +137,16 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 
 	if (!(filp->f_flags & O_NONBLOCK)) {
 		if (wait_event_interruptible(migf->poll_wait,
-				READ_ONCE(migf->total_length) || migf->is_err))
+				READ_ONCE(vhca_buf->length) || migf->is_err))
 			return -ERESTARTSYS;
 	}
 
 	mutex_lock(&migf->lock);
-	if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(migf->total_length)) {
+	if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(vhca_buf->length)) {
 		done = -EAGAIN;
 		goto out_unlock;
 	}
-	if (*pos > migf->total_length) {
+	if (*pos > vhca_buf->length) {
 		done = -EINVAL;
 		goto out_unlock;
 	}
@@ -162,7 +155,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 		goto out_unlock;
 	}
 
-	len = min_t(size_t, migf->total_length - *pos, len);
+	len = min_t(size_t, vhca_buf->length - *pos, len);
 	while (len) {
 		size_t page_offset;
 		struct page *page;
@@ -171,7 +164,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 		int ret;
 
 		page_offset = (*pos) % PAGE_SIZE;
-		page = mlx5vf_get_migration_page(migf, *pos - page_offset);
+		page = mlx5vf_get_migration_page(vhca_buf, *pos - page_offset);
 		if (!page) {
 			if (done == 0)
 				done = -EINVAL;
@@ -208,7 +201,7 @@ static __poll_t mlx5vf_save_poll(struct file *filp,
 	mutex_lock(&migf->lock);
 	if (migf->disabled || migf->is_err)
 		pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP;
-	else if (READ_ONCE(migf->total_length))
+	else if (READ_ONCE(migf->buf->length))
 		pollflags = EPOLLIN | EPOLLRDNORM;
 	mutex_unlock(&migf->lock);
 
@@ -227,6 +220,8 @@ static struct mlx5_vf_migration_file *
 mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 {
 	struct mlx5_vf_migration_file *migf;
+	struct mlx5_vhca_data_buffer *buf;
+	size_t length;
 	int ret;
 
 	migf = kzalloc(sizeof(*migf), GFP_KERNEL);
@@ -257,22 +252,23 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	complete(&migf->save_comp);
 	mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx);
 	INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb);
-	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
-						    &migf->total_length);
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length);
 	if (ret)
 		goto out_pd;
 
-	ret = mlx5vf_add_migration_pages(
-		migf, DIV_ROUND_UP_ULL(migf->total_length, PAGE_SIZE));
-	if (ret)
+	buf = mlx5vf_alloc_data_buffer(migf, length, DMA_FROM_DEVICE);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
 		goto out_pd;
+	}
 
-	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf);
+	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf);
 	if (ret)
 		goto out_save;
+	migf->buf = buf;
 	return migf;
 out_save:
-	mlx5vf_disable_fd(migf);
+	mlx5vf_free_data_buffer(buf);
 out_pd:
 	mlx5vf_cmd_dealloc_pd(migf);
 out_free:
@@ -286,6 +282,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 				   size_t len, loff_t *pos)
 {
 	struct mlx5_vf_migration_file *migf = filp->private_data;
+	struct mlx5_vhca_data_buffer *vhca_buf = migf->buf;
 	loff_t requested_length;
 	ssize_t done = 0;
 
@@ -306,10 +303,10 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 		goto out_unlock;
 	}
 
-	if (migf->allocated_length < requested_length) {
+	if (vhca_buf->allocated_length < requested_length) {
 		done = mlx5vf_add_migration_pages(
-			migf,
-			DIV_ROUND_UP(requested_length - migf->allocated_length,
+			vhca_buf,
+			DIV_ROUND_UP(requested_length - vhca_buf->allocated_length,
 				     PAGE_SIZE));
 		if (done)
 			goto out_unlock;
@@ -323,7 +320,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 		int ret;
 
 		page_offset = (*pos) % PAGE_SIZE;
-		page = mlx5vf_get_migration_page(migf, *pos - page_offset);
+		page = mlx5vf_get_migration_page(vhca_buf, *pos - page_offset);
 		if (!page) {
 			if (done == 0)
 				done = -EINVAL;
@@ -342,7 +339,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 		len -= page_len;
 		done += page_len;
 		buf += page_len;
-		migf->total_length += page_len;
+		vhca_buf->length += page_len;
 	}
 out_unlock:
 	mutex_unlock(&migf->lock);
@@ -360,6 +357,7 @@ static struct mlx5_vf_migration_file *
 mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 {
 	struct mlx5_vf_migration_file *migf;
+	struct mlx5_vhca_data_buffer *buf;
 	int ret;
 
 	migf = kzalloc(sizeof(*migf), GFP_KERNEL);
@@ -378,9 +376,18 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	if (ret)
 		goto out_free;
 
+	buf = mlx5vf_alloc_data_buffer(migf, 0, DMA_TO_DEVICE);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		goto out_pd;
+	}
+
+	migf->buf = buf;
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	return migf;
+out_pd:
+	mlx5vf_cmd_dealloc_pd(migf);
 out_free:
 	fput(migf->filp);
 end:
@@ -474,7 +481,8 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 
 	if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
 		ret = mlx5vf_cmd_load_vhca_state(mvdev,
-						 mvdev->resuming_migf);
+						 mvdev->resuming_migf,
+						 mvdev->resuming_migf->buf);
 		if (ret)
 			return ERR_PTR(ret);
 		mlx5vf_disable_fds(mvdev);

From patchwork Mon Dec 5 14:48:30 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064581
From: Yishai Hadas
Subject: [PATCH V3 vfio 06/14] vfio/mlx5: Refactor migration file state
Date: Mon, 5 Dec 2022 16:48:30 +0200
Message-ID: <20221205144838.245287-7-yishaih@nvidia.com>
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

Refactor the migration file state to be an enum whose values are mutually
exclusive. As part of that, drop the 'disabled' state, as 'error' is
equivalent from a functional point of view.

Subsequent patches in the series will extend this enum with other relevant
states.
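The flag-to-enum change can be illustrated with a minimal userspace sketch (names here are illustrative, not the kernel's): with two independent bitfields, 'disabled' and 'is_err' could in principle both be set at once, whereas a single enum field admits exactly one state.

```c
/*
 * Userspace sketch of the state refactor (illustrative names): the two
 * independent flag bits are replaced by one mutually exclusive enum.
 */
enum migf_state {
	MIGF_STATE_INITIAL = 0,
	MIGF_STATE_ERROR = 1,
	/* later patches in the series add further states here */
};

struct migration_file {
	enum migf_state state;	/* replaces u8 disabled:1 + u8 is_err:1 */
};

/* Both the old 'disabled' and 'is_err' paths now funnel into one state. */
static void migf_mark_unusable(struct migration_file *migf)
{
	migf->state = MIGF_STATE_ERROR;
}
```

Collapsing both flags into MIGF_STATE_ERROR mirrors the patch's observation that 'disabled' and 'error' behave identically to readers of the file.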
Reviewed-by: Jason Gunthorpe Signed-off-by: Yishai Hadas --- drivers/vfio/pci/mlx5/cmd.c | 2 +- drivers/vfio/pci/mlx5/cmd.h | 7 +++++-- drivers/vfio/pci/mlx5/main.c | 11 ++++++----- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index ed4c472d2eae..fcba12326185 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -351,7 +351,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work) mutex_lock(&migf->lock); if (async_data->status) { - migf->is_err = true; + migf->state = MLX5_MIGF_STATE_ERROR; wake_up_interruptible(&migf->poll_wait); } mutex_unlock(&migf->lock); diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index b0f08dfc8120..14403e654e4e 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -12,6 +12,10 @@ #include #include +enum mlx5_vf_migf_state { + MLX5_MIGF_STATE_ERROR = 1, +}; + struct mlx5_vhca_data_buffer { struct sg_append_table table; loff_t start_pos; @@ -37,8 +41,7 @@ struct mlx5vf_async_data { struct mlx5_vf_migration_file { struct file *filp; struct mutex lock; - u8 disabled:1; - u8 is_err:1; + enum mlx5_vf_migf_state state; u32 pdn; struct mlx5_vhca_data_buffer *buf; diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index 5f694fce854c..d95646c2f010 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -109,7 +109,7 @@ int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf, static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf) { mutex_lock(&migf->lock); - migf->disabled = true; + migf->state = MLX5_MIGF_STATE_ERROR; migf->filp->f_pos = 0; mutex_unlock(&migf->lock); } @@ -137,7 +137,8 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, if (!(filp->f_flags & O_NONBLOCK)) { if (wait_event_interruptible(migf->poll_wait, - READ_ONCE(vhca_buf->length) || migf->is_err)) + READ_ONCE(vhca_buf->length) || + 
migf->state == MLX5_MIGF_STATE_ERROR)) return -ERESTARTSYS; } @@ -150,7 +151,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, done = -EINVAL; goto out_unlock; } - if (migf->disabled || migf->is_err) { + if (migf->state == MLX5_MIGF_STATE_ERROR) { done = -ENODEV; goto out_unlock; } @@ -199,7 +200,7 @@ static __poll_t mlx5vf_save_poll(struct file *filp, poll_wait(filp, &migf->poll_wait, wait); mutex_lock(&migf->lock); - if (migf->disabled || migf->is_err) + if (migf->state == MLX5_MIGF_STATE_ERROR) pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; else if (READ_ONCE(migf->buf->length)) pollflags = EPOLLIN | EPOLLRDNORM; @@ -298,7 +299,7 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf, return -ENOMEM; mutex_lock(&migf->lock); - if (migf->disabled) { + if (migf->state == MLX5_MIGF_STATE_ERROR) { done = -ENODEV; goto out_unlock; }

From patchwork Mon Dec 5 14:48:31 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064580
From: Yishai Hadas
To: , CC: , , , , , , , , ,
Subject: [PATCH V3 vfio 07/14] vfio/mlx5: Refactor to use queue based data chunks
Date: Mon, 5 Dec 2022 16:48:31 +0200
Message-ID: <20221205144838.245287-8-yishaih@nvidia.com>
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
List-ID: X-Mailing-List: kvm@vger.kernel.org

Refactor to use queue-based data chunks on the migration file. The SAVE command adds a chunk to the tail of the queue, while the read() API finds the required chunk and returns its data. When the queue is empty but the migration file state is MLX5_MIGF_STATE_COMPLETE, read() does not block and returns 0 to indicate end of file. This is a step towards maintaining multiple images and their metadata (i.e. headers) on the migration file, as part of subsequent patches in the series. Note: at this point a single chunk is still used on the migration file, but the code is now ready to support multiple chunks.
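The queue discipline described above can be sketched as self-contained userspace C. This is an illustrative model only: it uses a plain singly linked list where the kernel code uses list_head plus a spinlock, and the names here are simplified stand-ins:

```c
#include <assert.h>
#include <stddef.h>

enum migf_state { MIGF_STATE_RUNNING, MIGF_STATE_ERROR, MIGF_STATE_COMPLETE };

struct chunk {
	struct chunk *next;
	long start_pos;	/* stream offset where this chunk's data begins */
	long length;
};

struct migf {
	enum migf_state state;
	struct chunk *head;	/* reader consumes from the head */
	struct chunk *tail;	/* SAVE completion appends at the tail */
};

static void migf_push_chunk(struct migf *migf, struct chunk *c)
{
	c->next = NULL;
	if (migf->tail)
		migf->tail->next = c;
	else
		migf->head = c;
	migf->tail = c;
}

/*
 * Returns the chunk covering 'pos', or NULL. *end_of_data is set when
 * the queue is drained; the caller then returns 0 (EOF) if the state
 * is COMPLETE, otherwise waits (or returns -EAGAIN for O_NONBLOCK).
 * Since the FD is a stream, data is expected on the first chunk.
 */
static struct chunk *migf_chunk_for_pos(struct migf *migf, long pos,
					int *end_of_data)
{
	struct chunk *c = migf->head;

	*end_of_data = (c == NULL);
	if (c && pos >= c->start_pos && pos < c->start_pos + c->length)
		return c;
	return NULL;
}
```

A fully consumed head chunk would then be unlinked (the patch uses list_del_init under the list lock) so the next read finds the following image at the new head.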
Reviewed-by: Jason Gunthorpe Signed-off-by: Yishai Hadas --- drivers/vfio/pci/mlx5/cmd.c | 24 +++++- drivers/vfio/pci/mlx5/cmd.h | 5 ++ drivers/vfio/pci/mlx5/main.c | 145 +++++++++++++++++++++++++++-------- 3 files changed, 136 insertions(+), 38 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index fcba12326185..0e36b4c8c816 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -351,6 +351,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work) mutex_lock(&migf->lock); if (async_data->status) { + migf->buf = async_data->buf; migf->state = MLX5_MIGF_STATE_ERROR; wake_up_interruptible(&migf->poll_wait); } @@ -368,9 +369,15 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context) struct mlx5_vf_migration_file, async_data); if (!status) { - WRITE_ONCE(migf->buf->length, - MLX5_GET(save_vhca_state_out, async_data->out, - actual_image_size)); + unsigned long flags; + + async_data->buf->length = + MLX5_GET(save_vhca_state_out, async_data->out, + actual_image_size); + spin_lock_irqsave(&migf->list_lock, flags); + list_add_tail(&async_data->buf->buf_elm, &migf->buf_list); + spin_unlock_irqrestore(&migf->list_lock, flags); + migf->state = MLX5_MIGF_STATE_COMPLETE; wake_up_interruptible(&migf->poll_wait); } @@ -407,6 +414,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev, MLX5_SET(save_vhca_state_in, in, size, buf->allocated_length); async_data = &migf->async_data; + async_data->buf = buf; async_data->out = kvzalloc(out_size, GFP_KERNEL); if (!async_data->out) { err = -ENOMEM; @@ -479,14 +487,22 @@ void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf) void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf) { - lockdep_assert_held(&migf->mvdev->state_mutex); + struct mlx5_vhca_data_buffer *entry; + lockdep_assert_held(&migf->mvdev->state_mutex); WARN_ON(migf->mvdev->mdev_detach); if (migf->buf) { mlx5vf_free_data_buffer(migf->buf); 
migf->buf = NULL; } + + while ((entry = list_first_entry_or_null(&migf->buf_list, + struct mlx5_vhca_data_buffer, buf_elm))) { + list_del(&entry->buf_elm); + mlx5vf_free_data_buffer(entry); + } + mlx5vf_cmd_dealloc_pd(migf); } diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h index 14403e654e4e..6e594689566e 100644 --- a/drivers/vfio/pci/mlx5/cmd.h +++ b/drivers/vfio/pci/mlx5/cmd.h @@ -14,6 +14,7 @@ enum mlx5_vf_migf_state { MLX5_MIGF_STATE_ERROR = 1, + MLX5_MIGF_STATE_COMPLETE, }; struct mlx5_vhca_data_buffer { @@ -24,6 +25,7 @@ struct mlx5_vhca_data_buffer { u32 mkey; enum dma_data_direction dma_dir; u8 dmaed:1; + struct list_head buf_elm; struct mlx5_vf_migration_file *migf; /* Optimize mlx5vf_get_migration_page() for sequential access */ struct scatterlist *last_offset_sg; @@ -34,6 +36,7 @@ struct mlx5_vhca_data_buffer { struct mlx5vf_async_data { struct mlx5_async_work cb_work; struct work_struct work; + struct mlx5_vhca_data_buffer *buf; int status; void *out; }; @@ -45,6 +48,8 @@ struct mlx5_vf_migration_file { u32 pdn; struct mlx5_vhca_data_buffer *buf; + spinlock_t list_lock; + struct list_head buf_list; struct mlx5vf_pci_core_device *mvdev; wait_queue_head_t poll_wait; struct completion save_comp; diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c index d95646c2f010..ca16425811c4 100644 --- a/drivers/vfio/pci/mlx5/main.c +++ b/drivers/vfio/pci/mlx5/main.c @@ -124,11 +124,90 @@ static int mlx5vf_release_file(struct inode *inode, struct file *filp) return 0; } +static struct mlx5_vhca_data_buffer * +mlx5vf_get_data_buff_from_pos(struct mlx5_vf_migration_file *migf, loff_t pos, + bool *end_of_data) +{ + struct mlx5_vhca_data_buffer *buf; + bool found = false; + + *end_of_data = false; + spin_lock_irq(&migf->list_lock); + if (list_empty(&migf->buf_list)) { + *end_of_data = true; + goto end; + } + + buf = list_first_entry(&migf->buf_list, struct mlx5_vhca_data_buffer, + buf_elm); + if (pos >= buf->start_pos && + pos < 
buf->start_pos + buf->length) { + found = true; + goto end; + } + + /* + * As we use a stream based FD we may expect having the data always + * on first chunk + */ + migf->state = MLX5_MIGF_STATE_ERROR; + +end: + spin_unlock_irq(&migf->list_lock); + return found ? buf : NULL; +} + +static ssize_t mlx5vf_buf_read(struct mlx5_vhca_data_buffer *vhca_buf, + char __user **buf, size_t *len, loff_t *pos) +{ + unsigned long offset; + ssize_t done = 0; + size_t copy_len; + + copy_len = min_t(size_t, + vhca_buf->start_pos + vhca_buf->length - *pos, *len); + while (copy_len) { + size_t page_offset; + struct page *page; + size_t page_len; + u8 *from_buff; + int ret; + + offset = *pos - vhca_buf->start_pos; + page_offset = offset % PAGE_SIZE; + offset -= page_offset; + page = mlx5vf_get_migration_page(vhca_buf, offset); + if (!page) + return -EINVAL; + page_len = min_t(size_t, copy_len, PAGE_SIZE - page_offset); + from_buff = kmap_local_page(page); + ret = copy_to_user(*buf, from_buff + page_offset, page_len); + kunmap_local(from_buff); + if (ret) + return -EFAULT; + *pos += page_len; + *len -= page_len; + *buf += page_len; + done += page_len; + copy_len -= page_len; + } + + if (*pos >= vhca_buf->start_pos + vhca_buf->length) { + spin_lock_irq(&vhca_buf->migf->list_lock); + list_del_init(&vhca_buf->buf_elm); + spin_unlock_irq(&vhca_buf->migf->list_lock); + } + + return done; +} + static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, loff_t *pos) { struct mlx5_vf_migration_file *migf = filp->private_data; - struct mlx5_vhca_data_buffer *vhca_buf = migf->buf; + struct mlx5_vhca_data_buffer *vhca_buf; + bool first_loop_call = true; + bool end_of_data; ssize_t done = 0; if (pos) @@ -137,53 +216,47 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len, if (!(filp->f_flags & O_NONBLOCK)) { if (wait_event_interruptible(migf->poll_wait, - READ_ONCE(vhca_buf->length) || - migf->state == MLX5_MIGF_STATE_ERROR)) + 
!list_empty(&migf->buf_list) || + migf->state == MLX5_MIGF_STATE_ERROR || + migf->state == MLX5_MIGF_STATE_COMPLETE)) return -ERESTARTSYS; } mutex_lock(&migf->lock); - if ((filp->f_flags & O_NONBLOCK) && !READ_ONCE(vhca_buf->length)) { - done = -EAGAIN; - goto out_unlock; - } - if (*pos > vhca_buf->length) { - done = -EINVAL; - goto out_unlock; - } if (migf->state == MLX5_MIGF_STATE_ERROR) { done = -ENODEV; goto out_unlock; } - len = min_t(size_t, vhca_buf->length - *pos, len); while (len) { - size_t page_offset; - struct page *page; - size_t page_len; - u8 *from_buff; - int ret; + ssize_t count; + + vhca_buf = mlx5vf_get_data_buff_from_pos(migf, *pos, + &end_of_data); + if (first_loop_call) { + first_loop_call = false; + if (end_of_data && migf->state != MLX5_MIGF_STATE_COMPLETE) { + if (filp->f_flags & O_NONBLOCK) { + done = -EAGAIN; + goto out_unlock; + } + } + } - page_offset = (*pos) % PAGE_SIZE; - page = mlx5vf_get_migration_page(vhca_buf, *pos - page_offset); - if (!page) { - if (done == 0) - done = -EINVAL; + if (end_of_data) + goto out_unlock; + + if (!vhca_buf) { + done = -EINVAL; goto out_unlock; } - page_len = min_t(size_t, len, PAGE_SIZE - page_offset); - from_buff = kmap_local_page(page); - ret = copy_to_user(buf, from_buff + page_offset, page_len); - kunmap_local(from_buff); - if (ret) { - done = -EFAULT; + count = mlx5vf_buf_read(vhca_buf, &buf, &len, pos); + if (count < 0) { + done = count; goto out_unlock; } - *pos += page_len; - len -= page_len; - done += page_len; - buf += page_len; + done += count; } out_unlock: @@ -202,7 +275,8 @@ static __poll_t mlx5vf_save_poll(struct file *filp, mutex_lock(&migf->lock); if (migf->state == MLX5_MIGF_STATE_ERROR) pollflags = EPOLLIN | EPOLLRDNORM | EPOLLRDHUP; - else if (READ_ONCE(migf->buf->length)) + else if (!list_empty(&migf->buf_list) || + migf->state == MLX5_MIGF_STATE_COMPLETE) pollflags = EPOLLIN | EPOLLRDNORM; mutex_unlock(&migf->lock); @@ -253,6 +327,8 @@ mlx5vf_pci_save_device_data(struct 
mlx5vf_pci_core_device *mvdev) complete(&migf->save_comp); mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx); INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb); + INIT_LIST_HEAD(&migf->buf_list); + spin_lock_init(&migf->list_lock); ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length); if (ret) goto out_pd; @@ -266,7 +342,6 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf); if (ret) goto out_save; - migf->buf = buf; return migf; out_save: mlx5vf_free_data_buffer(buf); @@ -386,6 +461,8 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev) migf->buf = buf; stream_open(migf->filp->f_inode, migf->filp); mutex_init(&migf->lock); + INIT_LIST_HEAD(&migf->buf_list); + spin_lock_init(&migf->list_lock); return migf; out_pd: mlx5vf_cmd_dealloc_pd(migf);

From patchwork Mon Dec 5 14:48:32 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064582
From: Yishai Hadas
To: , CC: , , , , , , , , ,
Subject: [PATCH V3 vfio 08/14] vfio/mlx5: Introduce device transitions of PRE_COPY
Date: Mon, 5 Dec 2022 16:48:32 +0200
Message-ID: <20221205144838.245287-9-yishaih@nvidia.com>
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
List-ID: X-Mailing-List: kvm@vger.kernel.org

In order to support PRE_COPY, the mlx5 driver transfers multiple states (images) of the device; e.g. the source VF can save and transfer multiple states, and the target VF will load them in that order. The device saves three kinds of states:
1) Initial state - when the device moves to the PRE_COPY state.
2) Middle state - during the PRE_COPY phase via VFIO_MIG_GET_PRECOPY_INFO. There can be multiple states of this kind.
3) Final state - when the device moves to the STOP_COPY state.
After moving to the PRE_COPY state, the user holds the saving migf FD and can use it; for example, the user can start transferring data via the read() callback. The user can also switch from PRE_COPY to STOP_COPY whenever it sees fit, which invokes saving of the final state. This means the mlx5 VFIO device can be switched to STOP_COPY without transferring any data in the PRE_COPY state. Therefore, when the device moves to STOP_COPY, mlx5 stores the final state in a dedicated queue entry on the list.
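The completion rule implied by the description above can be modeled in a few lines of userspace C. This is my reading of the patch (the diff sets 'last_chunk = !track'), expressed with simplified stand-in names, not the kernel structs:

```c
#include <assert.h>

enum migf_state { MIGF_STATE_RUNNING, MIGF_STATE_COMPLETE };

struct migf {
	enum migf_state state;
	int queued_images;	/* images appended to the buf_list queue */
};

/*
 * 'track' mirrors the SAVE command's 'set_track' flag: true for the
 * initial and middle PRE_COPY images, false for the final image saved
 * at the STOP_COPY transition. Only the final (non-tracking) SAVE
 * marks the migration file COMPLETE.
 */
static void save_done(struct migf *migf, int track)
{
	migf->queued_images++;
	if (!track)	/* last_chunk = !track in the patch */
		migf->state = MIGF_STATE_COMPLETE;
}
```

So a migration that skips PRE_COPY entirely degenerates to a single save_done() call with track == false, which is exactly the pre-existing STOP_COPY-only flow.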
Co-developed-by: Shay Drory Signed-off-by: Shay Drory Reviewed-by: Jason Gunthorpe Signed-off-by: Yishai Hadas --- drivers/vfio/pci/mlx5/cmd.c | 96 +++++++++++++++++++++++++++++++++--- drivers/vfio/pci/mlx5/cmd.h | 16 +++++- drivers/vfio/pci/mlx5/main.c | 90 ++++++++++++++++++++++++++++++--- 3 files changed, 184 insertions(+), 18 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index 0e36b4c8c816..5fcece201d4c 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -14,18 +14,36 @@ _mlx5vf_free_page_tracker_resources(struct mlx5vf_pci_core_device *mvdev); int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod) { + struct mlx5_vf_migration_file *migf = mvdev->saving_migf; u32 out[MLX5_ST_SZ_DW(suspend_vhca_out)] = {}; u32 in[MLX5_ST_SZ_DW(suspend_vhca_in)] = {}; + int err; lockdep_assert_held(&mvdev->state_mutex); if (mvdev->mdev_detach) return -ENOTCONN; + /* + * In case PRE_COPY is used, saving_migf is exposed while the device is + * running. Make sure to run only once there is no active save command. + * Running both in parallel, might end-up with a failure in the save + * command once it will try to turn on 'tracking' on a suspended device. 
+	 */
+	if (migf) {
+		err = wait_for_completion_interruptible(&migf->save_comp);
+		if (err)
+			return err;
+	}
+
 	MLX5_SET(suspend_vhca_in, in, opcode, MLX5_CMD_OP_SUSPEND_VHCA);
 	MLX5_SET(suspend_vhca_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(suspend_vhca_in, in, op_mod, op_mod);
-	return mlx5_cmd_exec_inout(mvdev->mdev, suspend_vhca, in, out);
+	err = mlx5_cmd_exec_inout(mvdev->mdev, suspend_vhca, in, out);
+	if (migf)
+		complete(&migf->save_comp);
+
+	return err;
 }
 
 int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
@@ -45,7 +63,7 @@ int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
 }
 
 int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
-					  size_t *state_size)
+					  size_t *state_size, u8 query_flags)
 {
 	u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
@@ -59,6 +77,8 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 		 MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE);
 	MLX5_SET(query_vhca_migration_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(query_vhca_migration_state_in, in, op_mod, 0);
+	MLX5_SET(query_vhca_migration_state_in, in, incremental,
+		 query_flags & MLX5VF_QUERY_INC);
 
 	ret = mlx5_cmd_exec_inout(mvdev->mdev, query_vhca_migration_state, in,
 				  out);
@@ -342,6 +362,56 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
 	return ERR_PTR(ret);
 }
 
+void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf)
+{
+	spin_lock_irq(&buf->migf->list_lock);
+	list_add_tail(&buf->buf_elm, &buf->migf->avail_list);
+	spin_unlock_irq(&buf->migf->list_lock);
+}
+
+struct mlx5_vhca_data_buffer *
+mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf,
+		       size_t length, enum dma_data_direction dma_dir)
+{
+	struct mlx5_vhca_data_buffer *buf, *temp_buf;
+	struct list_head free_list;
+
+	lockdep_assert_held(&migf->mvdev->state_mutex);
+	if (migf->mvdev->mdev_detach)
+		return ERR_PTR(-ENOTCONN);
+
+	INIT_LIST_HEAD(&free_list);
+
+	spin_lock_irq(&migf->list_lock);
+	list_for_each_entry_safe(buf, temp_buf, &migf->avail_list, buf_elm) {
+		if (buf->dma_dir == dma_dir) {
+			list_del_init(&buf->buf_elm);
+			if (buf->allocated_length >= length) {
+				spin_unlock_irq(&migf->list_lock);
+				goto found;
+			}
+			/*
+			 * Prevent holding redundant buffers. Put in a free
+			 * list and call at the end not under the spin lock
+			 * (&migf->list_lock) to mlx5vf_free_data_buffer which
+			 * might sleep.
+			 */
+			list_add(&buf->buf_elm, &free_list);
+		}
+	}
+	spin_unlock_irq(&migf->list_lock);
+	buf = mlx5vf_alloc_data_buffer(migf, length, dma_dir);
+
+found:
+	while ((temp_buf = list_first_entry_or_null(&free_list,
+				struct mlx5_vhca_data_buffer, buf_elm))) {
+		list_del(&temp_buf->buf_elm);
+		mlx5vf_free_data_buffer(temp_buf);
+	}
+
+	return buf;
+}
+
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 {
 	struct mlx5vf_async_data *async_data = container_of(_work,
@@ -351,7 +421,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 
 	mutex_lock(&migf->lock);
 	if (async_data->status) {
-		migf->buf = async_data->buf;
+		mlx5vf_put_data_buffer(async_data->buf);
 		migf->state = MLX5_MIGF_STATE_ERROR;
 		wake_up_interruptible(&migf->poll_wait);
 	}
@@ -369,15 +439,19 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		struct mlx5_vf_migration_file, async_data);
 
 	if (!status) {
+		size_t image_size;
 		unsigned long flags;
 
-		async_data->buf->length =
-			MLX5_GET(save_vhca_state_out, async_data->out,
-				 actual_image_size);
+		image_size = MLX5_GET(save_vhca_state_out, async_data->out,
+				      actual_image_size);
+		async_data->buf->length = image_size;
+		async_data->buf->start_pos = migf->max_pos;
+		migf->max_pos += async_data->buf->length;
 		spin_lock_irqsave(&migf->list_lock, flags);
 		list_add_tail(&async_data->buf->buf_elm, &migf->buf_list);
 		spin_unlock_irqrestore(&migf->list_lock, flags);
-		migf->state = MLX5_MIGF_STATE_COMPLETE;
+		if (async_data->last_chunk)
+			migf->state = MLX5_MIGF_STATE_COMPLETE;
 		wake_up_interruptible(&migf->poll_wait);
 	}
 
@@ -391,7 +465,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 
 int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf,
-			       struct mlx5_vhca_data_buffer *buf)
+			       struct mlx5_vhca_data_buffer *buf, bool inc,
+			       bool track)
 {
 	u32 out_size = MLX5_ST_SZ_BYTES(save_vhca_state_out);
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
@@ -412,9 +487,12 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	MLX5_SET(save_vhca_state_in, in, vhca_id, mvdev->vhca_id);
 	MLX5_SET(save_vhca_state_in, in, mkey, buf->mkey);
 	MLX5_SET(save_vhca_state_in, in, size, buf->allocated_length);
+	MLX5_SET(save_vhca_state_in, in, incremental, inc);
+	MLX5_SET(save_vhca_state_in, in, set_track, track);
 
 	async_data = &migf->async_data;
 	async_data->buf = buf;
+	async_data->last_chunk = !track;
 	async_data->out = kvzalloc(out_size, GFP_KERNEL);
 	if (!async_data->out) {
 		err = -ENOMEM;
@@ -497,6 +575,8 @@ void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf)
 		migf->buf = NULL;
 	}
 
+	list_splice(&migf->avail_list, &migf->buf_list);
+
 	while ((entry = list_first_entry_or_null(&migf->buf_list,
 				struct mlx5_vhca_data_buffer, buf_elm))) {
 		list_del(&entry->buf_elm);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 6e594689566e..34e61c7aa23d 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -38,6 +38,7 @@ struct mlx5vf_async_data {
 	struct work_struct work;
 	struct mlx5_vhca_data_buffer *buf;
 	int status;
+	u8 last_chunk:1;
 	void *out;
 };
 
@@ -47,9 +48,11 @@ struct mlx5_vf_migration_file {
 	enum mlx5_vf_migf_state state;
 
 	u32 pdn;
+	loff_t max_pos;
 	struct mlx5_vhca_data_buffer *buf;
 	spinlock_t list_lock;
 	struct list_head buf_list;
+	struct list_head avail_list;
 	struct mlx5vf_pci_core_device *mvdev;
 	wait_queue_head_t poll_wait;
 	struct completion save_comp;
@@ -129,10 +132,14 @@ struct mlx5vf_pci_core_device {
 	struct mlx5_core_dev *mdev;
 };
 
+enum {
+	MLX5VF_QUERY_INC = (1UL << 0),
+};
+
 int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
 int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
 int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
-					  size_t *state_size);
+					  size_t *state_size, u8 query_flags);
 void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 			       const struct vfio_migration_ops *mig_ops,
 			       const struct vfio_log_ops *log_ops);
@@ -140,7 +147,8 @@ void mlx5vf_cmd_remove_migratable(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_cmd_close_migratable(struct mlx5vf_pci_core_device *mvdev);
 int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf,
-			       struct mlx5_vhca_data_buffer *buf);
+			       struct mlx5_vhca_data_buffer *buf, bool inc,
+			       bool track);
 int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 			       struct mlx5_vf_migration_file *migf,
 			       struct mlx5_vhca_data_buffer *buf);
@@ -151,6 +159,10 @@ struct mlx5_vhca_data_buffer *
 mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
 			 size_t length, enum dma_data_direction dma_dir);
 void mlx5vf_free_data_buffer(struct mlx5_vhca_data_buffer *buf);
+struct mlx5_vhca_data_buffer *
+mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf,
+		       size_t length, enum dma_data_direction dma_dir);
+void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf);
 int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
 			       unsigned int npages);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index ca16425811c4..9cabba456044 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -195,6 +195,7 @@ static ssize_t mlx5vf_buf_read(struct mlx5_vhca_data_buffer *vhca_buf,
 	if (*pos >= vhca_buf->start_pos + vhca_buf->length) {
 		spin_lock_irq(&vhca_buf->migf->list_lock);
 		list_del_init(&vhca_buf->buf_elm);
+		list_add_tail(&vhca_buf->buf_elm, &vhca_buf->migf->avail_list);
 		spin_unlock_irq(&vhca_buf->migf->list_lock);
 	}
 
@@ -283,6 +284,16 @@ static __poll_t mlx5vf_save_poll(struct file *filp,
 	return pollflags;
 }
 
+/*
+ * FD is exposed and user can use it after receiving an error.
+ * Mark migf in error, and wake the user.
+ */
+static void mlx5vf_mark_err(struct mlx5_vf_migration_file *migf)
+{
+	migf->state = MLX5_MIGF_STATE_ERROR;
+	wake_up_interruptible(&migf->poll_wait);
+}
+
 static const struct file_operations mlx5vf_save_fops = {
 	.owner = THIS_MODULE,
 	.read = mlx5vf_save_read,
@@ -291,8 +302,42 @@ static const struct file_operations mlx5vf_save_fops = {
 	.llseek = no_llseek,
 };
 
+static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
+{
+	struct mlx5_vf_migration_file *migf = mvdev->saving_migf;
+	struct mlx5_vhca_data_buffer *buf;
+	size_t length;
+	int ret;
+
+	if (migf->state == MLX5_MIGF_STATE_ERROR)
+		return -ENODEV;
+
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length,
+						    MLX5VF_QUERY_INC);
+	if (ret)
+		goto err;
+
+	buf = mlx5vf_get_data_buffer(migf, length, DMA_FROM_DEVICE);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		goto err;
+	}
+
+	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, false);
+	if (ret)
+		goto err_save;
+
+	return 0;
+
+err_save:
+	mlx5vf_put_data_buffer(buf);
+err:
+	mlx5vf_mark_err(migf);
+	return ret;
+}
+
 static struct mlx5_vf_migration_file *
-mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
+mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 {
 	struct mlx5_vf_migration_file *migf;
 	struct mlx5_vhca_data_buffer *buf;
@@ -328,8 +373,9 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 	mlx5_cmd_init_async_ctx(mvdev->mdev, &migf->async_ctx);
 	INIT_WORK(&migf->async_data.work, mlx5vf_mig_file_cleanup_cb);
 	INIT_LIST_HEAD(&migf->buf_list);
+	INIT_LIST_HEAD(&migf->avail_list);
 	spin_lock_init(&migf->list_lock);
-	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length);
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, 0);
 	if (ret)
 		goto out_pd;
 
@@ -339,7 +385,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
 		goto out_pd;
 	}
 
-	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf);
+	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, false, track);
 	if (ret)
 		goto out_save;
 	return migf;
@@ -462,6 +508,7 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	INIT_LIST_HEAD(&migf->buf_list);
+	INIT_LIST_HEAD(&migf->avail_list);
 	spin_lock_init(&migf->list_lock);
 	return migf;
 out_pd:
@@ -514,7 +561,8 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 		return NULL;
 	}
 
-	if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) {
+	if ((cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) ||
+	    (cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_PRE_COPY_P2P)) {
 		ret = mlx5vf_cmd_suspend_vhca(mvdev,
 			MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_INITIATOR);
 		if (ret)
@@ -522,7 +570,8 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 		return NULL;
 	}
 
-	if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) {
+	if ((cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) ||
+	    (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P && new == VFIO_DEVICE_STATE_PRE_COPY)) {
 		ret = mlx5vf_cmd_resume_vhca(mvdev,
 			MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_INITIATOR);
 		if (ret)
@@ -533,7 +582,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_STOP_COPY) {
 		struct mlx5_vf_migration_file *migf;
 
-		migf = mlx5vf_pci_save_device_data(mvdev);
+		migf = mlx5vf_pci_save_device_data(mvdev, false);
 		if (IS_ERR(migf))
 			return ERR_CAST(migf);
 		get_file(migf->filp);
@@ -541,7 +590,10 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 		return migf->filp;
 	}
 
-	if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP)) {
+	if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP) ||
+	    (cur == VFIO_DEVICE_STATE_PRE_COPY && new == VFIO_DEVICE_STATE_RUNNING) ||
+	    (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P &&
+	     new == VFIO_DEVICE_STATE_RUNNING_P2P)) {
 		mlx5vf_disable_fds(mvdev);
 		return NULL;
 	}
@@ -567,6 +619,28 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 		return NULL;
 	}
 
+	if ((cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_PRE_COPY) ||
+	    (cur == VFIO_DEVICE_STATE_RUNNING_P2P &&
+	     new == VFIO_DEVICE_STATE_PRE_COPY_P2P)) {
+		struct mlx5_vf_migration_file *migf;
+
+		migf = mlx5vf_pci_save_device_data(mvdev, true);
+		if (IS_ERR(migf))
+			return ERR_CAST(migf);
+		get_file(migf->filp);
+		mvdev->saving_migf = migf;
+		return migf->filp;
+	}
+
+	if (cur == VFIO_DEVICE_STATE_PRE_COPY_P2P && new == VFIO_DEVICE_STATE_STOP_COPY) {
+		ret = mlx5vf_cmd_suspend_vhca(mvdev,
+			MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_RESPONDER);
+		if (ret)
+			return ERR_PTR(ret);
+		ret = mlx5vf_pci_save_device_inc_data(mvdev);
+		return ret ? ERR_PTR(ret) : NULL;
+	}
+
 	/*
 	 * vfio_mig_get_next_state() does not use arcs other than the above
 	 */
@@ -635,7 +709,7 @@ static int mlx5vf_pci_get_data_size(struct vfio_device *vdev,
 
 	mutex_lock(&mvdev->state_mutex);
 	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
-						    &state_size);
+						    &state_size, 0);
 	if (!ret)
 		*stop_copy_length = state_size;
 	mlx5vf_state_mutex_unlock(mvdev);

From patchwork Mon Dec 5 14:48:33 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064583
From: Yishai Hadas
Subject: [PATCH V3 vfio 09/14] vfio/mlx5: Introduce SW headers for migration states
Date: Mon, 5 Dec 2022 16:48:33 +0200
Message-ID: <20221205144838.245287-10-yishaih@nvidia.com>
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

As mentioned in the previous patches, mlx5 transfers multiple states when
the PRE_COPY protocol is used. This multiple-state mechanism requires the
target VM to know each state's size in order to execute multiple loads.
Therefore, add a SW header, with the needed information, to each saved
state that the source VM transfers to the target VM.

This patch implements the source VM handling of the headers; a following
patch will implement the target VM handling of the headers.

Reviewed-by: Jason Gunthorpe
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 56 ++++++++++++++++++++++++++++++++++--
 drivers/vfio/pci/mlx5/cmd.h  | 13 +++++++++
 drivers/vfio/pci/mlx5/main.c |  2 +-
 3 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 5fcece201d4c..160fa38fc78d 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -351,9 +351,11 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
 		if (ret)
 			goto end;
 
-		ret = mlx5vf_dma_data_buffer(buf);
-		if (ret)
-			goto end;
+		if (dma_dir != DMA_NONE) {
+			ret = mlx5vf_dma_data_buffer(buf);
+			if (ret)
+				goto end;
+		}
 	}
 
 	return buf;
@@ -422,6 +424,8 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 	mutex_lock(&migf->lock);
 	if (async_data->status) {
 		mlx5vf_put_data_buffer(async_data->buf);
+		if (async_data->header_buf)
+			mlx5vf_put_data_buffer(async_data->header_buf);
 		migf->state = MLX5_MIGF_STATE_ERROR;
 		wake_up_interruptible(&migf->poll_wait);
 	}
@@ -431,6 +435,32 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 	fput(migf->filp);
 }
 
+static int add_buf_header(struct mlx5_vhca_data_buffer *header_buf,
+			  size_t image_size)
+{
+	struct mlx5_vf_migration_file *migf = header_buf->migf;
+	struct mlx5_vf_migration_header header = {};
+	unsigned long flags;
+	struct page *page;
+	u8 *to_buff;
+
+	header.image_size = cpu_to_le64(image_size);
+	page = mlx5vf_get_migration_page(header_buf, 0);
+	if (!page)
+		return -EINVAL;
+	to_buff = kmap_local_page(page);
+	memcpy(to_buff, &header, sizeof(header));
+	kunmap_local(to_buff);
+	header_buf->length = sizeof(header);
+	header_buf->header_image_size = image_size;
+	header_buf->start_pos = header_buf->migf->max_pos;
+	migf->max_pos += header_buf->length;
+	spin_lock_irqsave(&migf->list_lock, flags);
+	list_add_tail(&header_buf->buf_elm, &migf->buf_list);
+	spin_unlock_irqrestore(&migf->list_lock, flags);
+	return 0;
+}
+
 static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 {
 	struct mlx5vf_async_data *async_data = container_of(context,
@@ -444,6 +474,11 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 
 		image_size = MLX5_GET(save_vhca_state_out, async_data->out,
 				      actual_image_size);
+		if (async_data->header_buf) {
+			status = add_buf_header(async_data->header_buf, image_size);
+			if (status)
+				goto err;
+		}
 		async_data->buf->length = image_size;
 		async_data->buf->start_pos = migf->max_pos;
 		migf->max_pos += async_data->buf->length;
@@ -455,6 +490,7 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		wake_up_interruptible(&migf->poll_wait);
 	}
 
+err:
 	/*
 	 * The error and the cleanup flows can't run from an
 	 * interrupt context
@@ -470,6 +506,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 {
 	u32 out_size = MLX5_ST_SZ_BYTES(save_vhca_state_out);
 	u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
+	struct mlx5_vhca_data_buffer *header_buf = NULL;
 	struct mlx5vf_async_data *async_data;
 	int err;
 
@@ -499,6 +536,16 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		goto err_out;
 	}
 
+	if (MLX5VF_PRE_COPY_SUPP(mvdev)) {
+		header_buf = mlx5vf_get_data_buffer(migf,
+			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
+		if (IS_ERR(header_buf)) {
+			err = PTR_ERR(header_buf);
+			goto err_free;
+		}
+	}
+
+	async_data->header_buf = header_buf;
 	get_file(migf->filp);
 	err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
 			       async_data->out,
@@ -510,7 +557,10 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	return 0;
 
 err_exec:
+	if (header_buf)
+		mlx5vf_put_data_buffer(header_buf);
 	fput(migf->filp);
+err_free:
 	kvfree(async_data->out);
 err_out:
 	complete(&migf->save_comp);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 34e61c7aa23d..3e36ccca820a 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -12,16 +12,26 @@
 #include
 #include
 
+#define MLX5VF_PRE_COPY_SUPP(mvdev) \
+	((mvdev)->core_device.vdev.migration_flags & VFIO_MIGRATION_PRE_COPY)
+
 enum mlx5_vf_migf_state {
 	MLX5_MIGF_STATE_ERROR = 1,
 	MLX5_MIGF_STATE_COMPLETE,
 };
 
+struct mlx5_vf_migration_header {
+	__le64 image_size;
+	/* For future use in case we may need to change the kernel protocol */
+	__le64 flags;
+};
+
 struct mlx5_vhca_data_buffer {
 	struct sg_append_table table;
 	loff_t start_pos;
 	u64 length;
 	u64 allocated_length;
+	u64 header_image_size;
 	u32 mkey;
 	enum dma_data_direction dma_dir;
 	u8 dmaed:1;
@@ -37,6 +47,7 @@ struct mlx5vf_async_data {
 	struct mlx5_async_work cb_work;
 	struct work_struct work;
 	struct mlx5_vhca_data_buffer *buf;
+	struct mlx5_vhca_data_buffer *header_buf;
 	int status;
 	u8 last_chunk:1;
 	void *out;
@@ -165,6 +176,8 @@ mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf,
 void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf);
 int mlx5vf_add_migration_pages(struct mlx5_vhca_data_buffer *buf,
 			       unsigned int npages);
+struct page *mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf,
+				       unsigned long offset);
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 9cabba456044..9a36e36ec33b 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -32,7 +32,7 @@ static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev)
 			    core_device);
 }
 
-static struct page *
+struct page *
 mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf,
 			  unsigned long offset)
 {

From patchwork Mon Dec 5 14:48:34 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064584
From: Yishai Hadas
Subject: [PATCH V3 vfio 10/14] vfio/mlx5: Introduce vfio precopy ioctl implementation
Date: Mon, 5 Dec 2022 16:48:34 +0200
Message-ID: <20221205144838.245287-11-yishaih@nvidia.com>
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
List-ID: X-Mailing-List: kvm@vger.kernel.org

The vfio precopy ioctl returns an estimate of the data available for
transfer from the device. Whenever the user issues
VFIO_MIG_GET_PRECOPY_INFO, track the current state of the device and,
if needed, append the dirty data to the transfer FD data. This is done
by saving a middle state.

As mlx5 runs the SAVE command asynchronously, make sure to query for
incremental data only once there is no active SAVE command; running
both in parallel might end up failing the incremental query command on
an un-tracked vhca.

Also, a middle state is saved only after the previous state has
finished its SAVE command and has been fully transferred; this prevents
endless use of resources.

Co-developed-by: Shay Drory
Signed-off-by: Shay Drory
Reviewed-by: Jason Gunthorpe
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  |  16 +++++
 drivers/vfio/pci/mlx5/main.c | 111 +++++++++++++++++++++++++++++++++++
 2 files changed, 127 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 160fa38fc78d..12e74ecebe64 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -67,12 +67,25 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 {
 	u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
+	bool inc = query_flags & MLX5VF_QUERY_INC;
 	int ret;
 
 	lockdep_assert_held(&mvdev->state_mutex);
 	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
+	/*
+	 * In case PRE_COPY is used, saving_migf is exposed while device is
+	 * running. Make sure to run only once there is no active save command.
+	 * Running both in parallel, might end-up with a failure in the
+	 * incremental query command on un-tracked vhca.
+	 */
+	if (inc) {
+		ret = wait_for_completion_interruptible(&mvdev->saving_migf->save_comp);
+		if (ret)
+			return ret;
+	}
+
 	MLX5_SET(query_vhca_migration_state_in, in, opcode,
 		 MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE);
 	MLX5_SET(query_vhca_migration_state_in, in, vhca_id, mvdev->vhca_id);
@@ -82,6 +95,9 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 	ret = mlx5_cmd_exec_inout(mvdev->mdev, query_vhca_migration_state, in,
 				  out);
+	if (inc)
+		complete(&mvdev->saving_migf->save_comp);
+
 	if (ret)
 		return ret;
 
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 9a36e36ec33b..08c7d96e92b7 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -294,10 +294,121 @@ static void mlx5vf_mark_err(struct mlx5_vf_migration_file *migf)
 	wake_up_interruptible(&migf->poll_wait);
 }
 
+static ssize_t mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
+				    unsigned long arg)
+{
+	struct mlx5_vf_migration_file *migf = filp->private_data;
+	struct mlx5vf_pci_core_device *mvdev = migf->mvdev;
+	struct mlx5_vhca_data_buffer *buf;
+	struct vfio_precopy_info info = {};
+	loff_t *pos = &filp->f_pos;
+	unsigned long minsz;
+	size_t inc_length = 0;
+	bool end_of_data;
+	int ret;
+
+	if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
+		return -ENOTTY;
+
+	minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
+
+	if (copy_from_user(&info, (void __user *)arg, minsz))
+		return -EFAULT;
+
+	if (info.argsz < minsz)
+		return -EINVAL;
+
+	mutex_lock(&mvdev->state_mutex);
+	if (mvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
+	    mvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY_P2P) {
+		ret = -EINVAL;
+		goto err_state_unlock;
+	}
+
+	/*
+	 * We can't issue a SAVE command when the device is suspended, so as
+	 * part of VFIO_DEVICE_STATE_PRE_COPY_P2P no reason to query for extra
+	 * bytes that can't be read.
+	 */
+	if (mvdev->mig_state == VFIO_DEVICE_STATE_PRE_COPY) {
+		/*
+		 * Once the query returns it's guaranteed that there is no
+		 * active SAVE command.
+		 * As so, the other code below is safe with the proper locks.
+		 */
+		ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &inc_length,
+							    MLX5VF_QUERY_INC);
+		if (ret)
+			goto err_state_unlock;
+	}
+
+	mutex_lock(&migf->lock);
+	if (migf->state == MLX5_MIGF_STATE_ERROR) {
+		ret = -ENODEV;
+		goto err_migf_unlock;
+	}
+
+	buf = mlx5vf_get_data_buff_from_pos(migf, *pos, &end_of_data);
+	if (buf) {
+		if (buf->start_pos == 0) {
+			info.initial_bytes = buf->header_image_size - *pos;
+		} else if (buf->start_pos ==
+				sizeof(struct mlx5_vf_migration_header)) {
+			/* First data buffer following the header */
+			info.initial_bytes = buf->start_pos +
+						buf->length - *pos;
+		} else {
+			info.dirty_bytes = buf->start_pos + buf->length - *pos;
+		}
+	} else {
+		if (!end_of_data) {
+			ret = -EINVAL;
+			goto err_migf_unlock;
+		}
+
+		info.dirty_bytes = inc_length;
+	}
+
+	if (!end_of_data || !inc_length) {
+		mutex_unlock(&migf->lock);
+		goto done;
+	}
+
+	mutex_unlock(&migf->lock);
+	/*
+	 * We finished transferring the current state and the device has a
+	 * dirty state, save a new state to be ready for.
+	 */
+	buf = mlx5vf_get_data_buffer(migf, inc_length, DMA_FROM_DEVICE);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		mlx5vf_mark_err(migf);
+		goto err_state_unlock;
+	}
+
+	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, true);
+	if (ret) {
+		mlx5vf_mark_err(migf);
+		mlx5vf_put_data_buffer(buf);
+		goto err_state_unlock;
+	}
+
+done:
+	mlx5vf_state_mutex_unlock(mvdev);
+	return copy_to_user((void __user *)arg, &info, minsz);
+err_migf_unlock:
+	mutex_unlock(&migf->lock);
+err_state_unlock:
+	mlx5vf_state_mutex_unlock(mvdev);
+	return ret;
+}
+
 static const struct file_operations mlx5vf_save_fops = {
 	.owner = THIS_MODULE,
 	.read = mlx5vf_save_read,
 	.poll = mlx5vf_save_poll,
+	.unlocked_ioctl = mlx5vf_precopy_ioctl,
+	.compat_ioctl = compat_ptr_ioctl,
 	.release = mlx5vf_release_file,
 	.llseek = no_llseek,
 };
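From userspace, the estimate returned by this ioctl is typically polled to decide when the remaining dirty data is small enough to justify moving to STOP_COPY. A hedged sketch of such a policy check; the struct mirrors the `initial_bytes`/`dirty_bytes` fields of `struct vfio_precopy_info`, but the struct name, helper name, and threshold policy below are illustrative assumptions, not part of the kernel uAPI:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the two payload fields of struct vfio_precopy_info (VFIO uAPI). */
struct precopy_estimate {
	uint64_t initial_bytes; /* initial device state still to be read */
	uint64_t dirty_bytes;   /* estimate of data dirtied since last read */
};

/*
 * Heuristic only: once the initial state has been drained and the dirty
 * estimate falls below a caller-chosen threshold, the downtime cost of
 * STOP_COPY should be acceptable. The threshold is an assumption made by
 * this sketch; the kernel ABI defines no such policy.
 */
static int ready_for_stop_copy(const struct precopy_estimate *info,
			       uint64_t dirty_threshold)
{
	return info->initial_bytes == 0 && info->dirty_bytes <= dirty_threshold;
}
```

A migration loop would call VFIO_MIG_GET_PRECOPY_INFO on the saving FD, feed the result into a check like this, and keep draining the FD until it returns true.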
From: Yishai Hadas
Subject: [PATCH V3 vfio 11/14] vfio/mlx5: Consider temporary end of stream as part of PRE_COPY
Date: Mon, 5 Dec 2022 16:48:35 +0200
Message-ID: <20221205144838.245287-12-yishaih@nvidia.com>

During PRE_COPY the migration data FD may reach a temporary "end of
stream" once the initial_bytes have been read and no other dirty data
exists yet. For instance, this may indicate that the device is idle and
is not currently dirtying any internal state. When read() hits this
temporary end of stream, the kernel driver should return ENOMSG;
userspace can then wait for more data or move to STOP_COPY.

To avoid blocking the user upon read() and let it get ENOMSG, add a new
state named MLX5_MIGF_STATE_PRE_COPY on the migration file. In addition,
add the MLX5_MIGF_STATE_SAVE_LAST state to block read() once we issue
the last SAVE upon moving to STOP_COPY. Any further error is marked with
MLX5_MIGF_STATE_ERROR, and the user won't be blocked.
Reviewed-by: Jason Gunthorpe
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 7 +++++--
 drivers/vfio/pci/mlx5/cmd.h  | 2 ++
 drivers/vfio/pci/mlx5/main.c | 7 +++++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 12e74ecebe64..f6293da033cc 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -501,8 +501,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		spin_lock_irqsave(&migf->list_lock, flags);
 		list_add_tail(&async_data->buf->buf_elm, &migf->buf_list);
 		spin_unlock_irqrestore(&migf->list_lock, flags);
-		if (async_data->last_chunk)
-			migf->state = MLX5_MIGF_STATE_COMPLETE;
+		migf->state = async_data->last_chunk ?
+			MLX5_MIGF_STATE_COMPLETE : MLX5_MIGF_STATE_PRE_COPY;
 		wake_up_interruptible(&migf->poll_wait);
 	}
 
@@ -561,6 +561,9 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		}
 	}
 
+	if (async_data->last_chunk)
+		migf->state = MLX5_MIGF_STATE_SAVE_LAST;
+
 	async_data->header_buf = header_buf;
 	get_file(migf->filp);
 	err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 3e36ccca820a..d048f23977dd 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -17,6 +17,8 @@
 enum mlx5_vf_migf_state {
 	MLX5_MIGF_STATE_ERROR = 1,
+	MLX5_MIGF_STATE_PRE_COPY,
+	MLX5_MIGF_STATE_SAVE_LAST,
 	MLX5_MIGF_STATE_COMPLETE,
 };
 
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 08c7d96e92b7..d70d7a85d11c 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -219,6 +219,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 		if (wait_event_interruptible(migf->poll_wait,
 					     !list_empty(&migf->buf_list) ||
 					     migf->state == MLX5_MIGF_STATE_ERROR ||
+					     migf->state == MLX5_MIGF_STATE_PRE_COPY ||
 					     migf->state == MLX5_MIGF_STATE_COMPLETE))
 			return -ERESTARTSYS;
 	}
@@ -236,6 +237,12 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 						 &end_of_data);
 		if (first_loop_call) {
 			first_loop_call = false;
+			/* Temporary end of file as part of PRE_COPY */
+			if (end_of_data && migf->state == MLX5_MIGF_STATE_PRE_COPY) {
+				done = -ENOMSG;
+				goto out_unlock;
+			}
+
 			if (end_of_data && migf->state != MLX5_MIGF_STATE_COMPLETE) {
 				if (filp->f_flags & O_NONBLOCK) {
 					done = -EAGAIN;
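A userspace consumer of the migration FD has to treat this ENOMSG specially rather than as a hard failure. A minimal sketch of that classification; the enum and helper are hypothetical, and only the ENOMSG/EAGAIN semantics come from the patch above:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

enum precopy_read_action {
	PRECOPY_CONSUME, /* got data, forward it to the target */
	PRECOPY_WAIT,    /* temporary end of stream, poll/retry later */
	PRECOPY_FAIL,    /* hard error, abort the migration */
};

/*
 * Classify the result of a read() on the migration FD while in PRE_COPY.
 * nread/err mirror read()'s return value and errno. ENOMSG is the
 * temporary end of stream described above; treating it as fatal would
 * abort an otherwise healthy migration. EAGAIN (O_NONBLOCK) and EINTR
 * are likewise retryable.
 */
static enum precopy_read_action classify_precopy_read(ssize_t nread, int err)
{
	if (nread >= 0)
		return PRECOPY_CONSUME;
	if (err == ENOMSG || err == EAGAIN || err == EINTR)
		return PRECOPY_WAIT;
	return PRECOPY_FAIL;
}
```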
From: Yishai Hadas
Subject: [PATCH V3 vfio 12/14] vfio/mlx5: Introduce multiple loads
Date: Mon, 5 Dec 2022 16:48:36 +0200
Message-ID: <20221205144838.245287-13-yishaih@nvidia.com>

In order to support PRE_COPY, the mlx5 driver transfers multiple states
(images) of the device: the source VF can save and transfer multiple
states, and the target VF loads them in that order. This patch
implements the changes on the target VF side to decompose the header of
each state and to write and load multiple states.

Reviewed-by: Jason Gunthorpe
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  |  13 +-
 drivers/vfio/pci/mlx5/cmd.h  |  10 ++
 drivers/vfio/pci/mlx5/main.c | 279 +++++++++++++++++++++++++++++------
 3 files changed, 257 insertions(+), 45 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index f6293da033cc..993749818d90 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -598,9 +598,11 @@ int mlx5vf_cmd_load_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (mvdev->mdev_detach)
 		return -ENOTCONN;
 
-	err = mlx5vf_dma_data_buffer(buf);
-	if (err)
-		return err;
+	if (!buf->dmaed) {
+		err = mlx5vf_dma_data_buffer(buf);
+		if (err)
+			return err;
+	}
 
 	MLX5_SET(load_vhca_state_in, in, opcode,
 		 MLX5_CMD_OP_LOAD_VHCA_STATE);
@@ -644,6 +646,11 @@ void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf)
 		migf->buf = NULL;
 	}
 
+	if (migf->buf_header) {
+		mlx5vf_free_data_buffer(migf->buf_header);
+		migf->buf_header = NULL;
+	}
+
 	list_splice(&migf->avail_list, &migf->buf_list);
 
 	while ((entry = list_first_entry_or_null(&migf->buf_list,
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index d048f23977dd..7729eac8c78c 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -22,6 +22,14 @@ enum mlx5_vf_migf_state {
 	MLX5_MIGF_STATE_COMPLETE,
 };
 
+enum mlx5_vf_load_state {
+	MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER,
+	MLX5_VF_LOAD_STATE_READ_HEADER,
+	MLX5_VF_LOAD_STATE_PREP_IMAGE,
+	MLX5_VF_LOAD_STATE_READ_IMAGE,
+	MLX5_VF_LOAD_STATE_LOAD_IMAGE,
+};
+
 struct mlx5_vf_migration_header {
 	__le64 image_size;
 	/* For future use in case we may need to change the kernel protocol */
@@ -60,9 +68,11 @@ struct mlx5_vf_migration_file {
 	struct mutex lock;
 	enum mlx5_vf_migf_state state;
+	enum mlx5_vf_load_state load_state;
 	u32 pdn;
 	loff_t max_pos;
 	struct mlx5_vhca_data_buffer *buf;
+	struct mlx5_vhca_data_buffer *buf_header;
 	spinlock_t list_lock;
 	struct list_head buf_list;
 	struct list_head avail_list;
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index d70d7a85d11c..30365b52e530 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -518,13 +518,162 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 	return ERR_PTR(ret);
 }
 
+static int
+mlx5vf_append_page_to_mig_buf(struct mlx5_vhca_data_buffer *vhca_buf,
+			      const char __user **buf, size_t *len,
+			      loff_t *pos, ssize_t *done)
+{
+	unsigned long offset;
+	size_t page_offset;
+	struct page *page;
+	size_t page_len;
+	u8 *to_buff;
+	int ret;
+
+	offset = *pos - vhca_buf->start_pos;
+	page_offset = offset % PAGE_SIZE;
+
+	page = mlx5vf_get_migration_page(vhca_buf, offset - page_offset);
+	if (!page)
+		return -EINVAL;
+	page_len = min_t(size_t, *len, PAGE_SIZE - page_offset);
+	to_buff = kmap_local_page(page);
+	ret = copy_from_user(to_buff + page_offset, *buf, page_len);
+	kunmap_local(to_buff);
+	if (ret)
+		return -EFAULT;
+
+	*pos += page_len;
+	*done += page_len;
+	*buf += page_len;
+	*len -= page_len;
+	vhca_buf->length += page_len;
+	return 0;
+}
+
+static int
+mlx5vf_resume_read_image_no_header(struct mlx5_vhca_data_buffer *vhca_buf,
+				   loff_t requested_length,
+				   const char __user **buf, size_t *len,
+				   loff_t *pos, ssize_t *done)
+{
+	int ret;
+
+	if (requested_length > MAX_MIGRATION_SIZE)
+		return -ENOMEM;
+
+	if (vhca_buf->allocated_length < requested_length) {
+		ret = mlx5vf_add_migration_pages(
+			vhca_buf,
+			DIV_ROUND_UP(requested_length - vhca_buf->allocated_length,
+				     PAGE_SIZE));
+		if (ret)
+			return ret;
+	}
+
+	while (*len) {
+		ret = mlx5vf_append_page_to_mig_buf(vhca_buf, buf, len, pos,
+						    done);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static ssize_t
+mlx5vf_resume_read_image(struct mlx5_vf_migration_file *migf,
+			 struct mlx5_vhca_data_buffer *vhca_buf,
+			 size_t image_size, const char __user **buf,
+			 size_t *len, loff_t *pos, ssize_t *done,
+			 bool *has_work)
+{
+	size_t copy_len, to_copy;
+	int ret;
+
+	to_copy = min_t(size_t, *len, image_size - vhca_buf->length);
+	copy_len = to_copy;
+	while (to_copy) {
+		ret = mlx5vf_append_page_to_mig_buf(vhca_buf, buf, &to_copy, pos,
+						    done);
+		if (ret)
+			return ret;
+	}
+
+	*len -= copy_len;
+	if (vhca_buf->length == image_size) {
+		migf->load_state = MLX5_VF_LOAD_STATE_LOAD_IMAGE;
+		migf->max_pos += image_size;
+		*has_work = true;
+	}
+
+	return 0;
+}
+
+static int
+mlx5vf_resume_read_header(struct mlx5_vf_migration_file *migf,
+			  struct mlx5_vhca_data_buffer *vhca_buf,
+			  const char __user **buf,
+			  size_t *len, loff_t *pos,
+			  ssize_t *done, bool *has_work)
+{
+	struct page *page;
+	size_t copy_len;
+	u8 *to_buff;
+	int ret;
+
+	copy_len = min_t(size_t, *len,
+		sizeof(struct mlx5_vf_migration_header) - vhca_buf->length);
+	page = mlx5vf_get_migration_page(vhca_buf, 0);
+	if (!page)
+		return -EINVAL;
+	to_buff = kmap_local_page(page);
+	ret = copy_from_user(to_buff + vhca_buf->length, *buf, copy_len);
+	if (ret) {
+		ret = -EFAULT;
+		goto end;
+	}
+
+	*buf += copy_len;
+	*pos += copy_len;
+	*done += copy_len;
+	*len -= copy_len;
+	vhca_buf->length += copy_len;
+	if (vhca_buf->length == sizeof(struct mlx5_vf_migration_header)) {
+		u64 flags;
+
+		vhca_buf->header_image_size = le64_to_cpup((__le64 *)to_buff);
+		if (vhca_buf->header_image_size > MAX_MIGRATION_SIZE) {
+			ret = -ENOMEM;
+			goto end;
+		}
+
+		flags = le64_to_cpup((__le64 *)(to_buff +
+			    offsetof(struct mlx5_vf_migration_header, flags)));
+		if (flags) {
+			ret = -EOPNOTSUPP;
+			goto end;
+		}
+
+		migf->load_state = MLX5_VF_LOAD_STATE_PREP_IMAGE;
+		migf->max_pos += vhca_buf->length;
+		*has_work = true;
+	}
+end:
+	kunmap_local(to_buff);
+	return ret;
+}
+
 static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 				   size_t len, loff_t *pos)
 {
 	struct mlx5_vf_migration_file *migf = filp->private_data;
 	struct mlx5_vhca_data_buffer *vhca_buf = migf->buf;
+	struct mlx5_vhca_data_buffer *vhca_buf_header = migf->buf_header;
 	loff_t requested_length;
+	bool has_work = false;
 	ssize_t done = 0;
+	int ret = 0;
 
 	if (pos)
 		return -ESPIPE;
@@ -534,56 +683,83 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 	    check_add_overflow((loff_t)len, *pos, &requested_length))
 		return -EINVAL;
 
-	if (requested_length > MAX_MIGRATION_SIZE)
-		return -ENOMEM;
-
+	mutex_lock(&migf->mvdev->state_mutex);
 	mutex_lock(&migf->lock);
 	if (migf->state == MLX5_MIGF_STATE_ERROR) {
-		done = -ENODEV;
+		ret = -ENODEV;
 		goto out_unlock;
 	}
 
-	if (vhca_buf->allocated_length < requested_length) {
-		done = mlx5vf_add_migration_pages(
-			vhca_buf,
-			DIV_ROUND_UP(requested_length - vhca_buf->allocated_length,
-				     PAGE_SIZE));
-		if (done)
-			goto out_unlock;
-	}
+	while (len || has_work) {
+		has_work = false;
+		switch (migf->load_state) {
+		case MLX5_VF_LOAD_STATE_READ_HEADER:
+			ret = mlx5vf_resume_read_header(migf, vhca_buf_header,
+							&buf, &len, pos,
+							&done, &has_work);
+			if (ret)
+				goto out_unlock;
+			break;
+		case MLX5_VF_LOAD_STATE_PREP_IMAGE:
+		{
+			u64 size = vhca_buf_header->header_image_size;
+
+			if (vhca_buf->allocated_length < size) {
+				mlx5vf_free_data_buffer(vhca_buf);
+
+				migf->buf = mlx5vf_alloc_data_buffer(migf,
+							size, DMA_TO_DEVICE);
+				if (IS_ERR(migf->buf)) {
+					ret = PTR_ERR(migf->buf);
+					migf->buf = NULL;
+					goto out_unlock;
+				}
 
-	while (len) {
-		size_t page_offset;
-		struct page *page;
-		size_t page_len;
-		u8 *to_buff;
-		int ret;
+				vhca_buf = migf->buf;
+			}
 
-		page_offset = (*pos) % PAGE_SIZE;
-		page = mlx5vf_get_migration_page(vhca_buf, *pos - page_offset);
-		if (!page) {
-			if (done == 0)
-				done = -EINVAL;
-			goto out_unlock;
+			vhca_buf->start_pos = migf->max_pos;
+			migf->load_state = MLX5_VF_LOAD_STATE_READ_IMAGE;
+			break;
 		}
+		case MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER:
+			ret = mlx5vf_resume_read_image_no_header(vhca_buf,
+						requested_length,
+						&buf, &len, pos, &done);
+			if (ret)
+				goto out_unlock;
+			break;
+		case MLX5_VF_LOAD_STATE_READ_IMAGE:
+			ret = mlx5vf_resume_read_image(migf, vhca_buf,
+						vhca_buf_header->header_image_size,
+						&buf, &len, pos, &done, &has_work);
+			if (ret)
+				goto out_unlock;
+			break;
+		case MLX5_VF_LOAD_STATE_LOAD_IMAGE:
+			ret = mlx5vf_cmd_load_vhca_state(migf->mvdev, migf, vhca_buf);
+			if (ret)
+				goto out_unlock;
+			migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER;
 
-		page_len = min_t(size_t, len, PAGE_SIZE - page_offset);
-		to_buff = kmap_local_page(page);
-		ret = copy_from_user(to_buff + page_offset, buf, page_len);
-		kunmap_local(to_buff);
-		if (ret) {
-			done = -EFAULT;
-			goto out_unlock;
+			/* prep header buf for next image */
+			vhca_buf_header->length = 0;
+			vhca_buf_header->header_image_size = 0;
+			/* prep data buf for next image */
+			vhca_buf->length = 0;
+
+			break;
+		default:
+			break;
 		}
-		*pos += page_len;
-		len -= page_len;
-		done += page_len;
-		buf += page_len;
-		vhca_buf->length += page_len;
 	}
+
 out_unlock:
+	if (ret)
+		migf->state = MLX5_MIGF_STATE_ERROR;
 	mutex_unlock(&migf->lock);
-	return done;
+	mlx5vf_state_mutex_unlock(migf->mvdev);
+	return ret ? ret : done;
 }
 
 static const struct file_operations mlx5vf_resume_fops = {
@@ -623,12 +799,29 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	}
 
 	migf->buf = buf;
+	if (MLX5VF_PRE_COPY_SUPP(mvdev)) {
+		buf = mlx5vf_alloc_data_buffer(migf,
+			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
+		if (IS_ERR(buf)) {
+			ret = PTR_ERR(buf);
+			goto out_buf;
+		}
+
+		migf->buf_header = buf;
+		migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER;
+	} else {
+		/* Initial state will be to read the image */
+		migf->load_state = MLX5_VF_LOAD_STATE_READ_IMAGE_NO_HEADER;
+	}
+
 	stream_open(migf->filp->f_inode, migf->filp);
 	mutex_init(&migf->lock);
 	INIT_LIST_HEAD(&migf->buf_list);
 	INIT_LIST_HEAD(&migf->avail_list);
 	spin_lock_init(&migf->list_lock);
 	return migf;
+out_buf:
+	mlx5vf_free_data_buffer(buf);
 out_pd:
 	mlx5vf_cmd_dealloc_pd(migf);
 out_free:
@@ -728,11 +921,13 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
-		ret = mlx5vf_cmd_load_vhca_state(mvdev,
-						 mvdev->resuming_migf,
-						 mvdev->resuming_migf->buf);
-		if (ret)
-			return ERR_PTR(ret);
+		if (!MLX5VF_PRE_COPY_SUPP(mvdev)) {
+			ret = mlx5vf_cmd_load_vhca_state(mvdev,
+							 mvdev->resuming_migf,
+							 mvdev->resuming_migf->buf);
+			if (ret)
+				return ERR_PTR(ret);
+		}
 		mlx5vf_disable_fds(mvdev);
 		return NULL;
 	}
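The stream the resuming side parses above is a sequence of (header, image) records, where the header carries image_size and flags as little-endian 64-bit words and any non-zero flags are rejected with -EOPNOTSUPP. A sketch of how a tool might serialize such a header; the helper names and macro are made up, and only the 16-byte field layout follows struct mlx5_vf_migration_header:

```c
#include <assert.h>
#include <stdint.h>

#define MLX5_MIG_HDR_LEN 16 /* two __le64 fields: image_size, then flags */

/* Store a 64-bit value in little-endian byte order, host-endian agnostic. */
static void put_le64(uint8_t *p, uint64_t v)
{
	for (int i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Serialize one record header as the resuming side expects it:
 * image_size first, flags second, both little-endian 64-bit.
 * The loader above rejects any non-zero flags with -EOPNOTSUPP.
 */
static void pack_mig_header(uint8_t out[MLX5_MIG_HDR_LEN],
			    uint64_t image_size, uint64_t flags)
{
	put_le64(out, image_size);
	put_le64(out + 8, flags);
}
```

A loader would reverse this: read 16 bytes, decode image_size, then consume exactly that many image bytes before expecting the next header.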
lindbergh.monkeyblade.net ([23.128.96.19]:54374 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231909AbiLEOu2 (ORCPT ); Mon, 5 Dec 2022 09:50:28 -0500 Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2049.outbound.protection.outlook.com [40.107.223.49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BFCB61C421 for ; Mon, 5 Dec 2022 06:50:11 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=XJoFAh6Phml/nmTHKB8KJ0x5VnX4loP7qATlJprvTivKM6uB3PLlbTXCIiJQ+xkCE1yzf9yXADuu5k385xwu1/ZsizMPso4xstwNk4nvQHnHLEl9qsXc9/z6gDdwnOa+GuFUbG489KBAz96kIsCUe7w04um39BB/75fW/WA2DsfXsbRLKRc7OzD/sywh3PW0snf+87CCgqzZyjPgZqNslLNS/8KogH90eJA+fzlwoyF+RvjkF5Fvnpy9uCDLuVCMCcXF7QaH9pU0Wy2zfY0uo12bR2ZsgpntGnLVihMsXk+toEfPoOSNb2GptjXxWJxPQG/iatqAzUty4zi2EeADbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=fOTvfA4xju10qzMIaregSF5Vk0iYMrpnX+eQOxYRLQ0=; b=RDfr8aGU+0NruIY7ANGVvAYzHX++bgq3NoHWkDWWdsfAAITDZ3iaYDw9GJ4kEcZEYQkaDtqxeIw2ERUTnu/fLTWjiUxQsgXcYbMoOnh9SlW68zzktW+dLIa04yryObunCiiXHxXBwawALcpLg90Ik+oMu5IU46H9Slcm8No7WEUBPqL0bNWJQY9FdPejL3bZdv+ymUmRDIAHWGxh468cEH1G7TKPF+oCuJZAsv34ysiUEQvmEE8ymB8NNuslEh3U3VHoYqLr5h+5yK8eKg7esaXWROFEx3qOfgyVGNTRuAlV8LBElPpOfn4F7p/TAMz5ufgMu8k32pUkyI4GxOi3EA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.160) smtp.rcpttodomain=redhat.com smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; 
bh=fOTvfA4xju10qzMIaregSF5Vk0iYMrpnX+eQOxYRLQ0=; b=S4vKZyTP4s/msZ9IemjfsvYPdvx62dfrlPSqkmsTed7FmF8+moTM1x4uDItlsWC1F1cIZrLIXhC8PfeELzvafbq+DDgJ1ogpZRvuhfojSsuw9hRgcewkITkqdkvLVQVQQ/ZiGoxMbHRpZ75D1IhipW/wBEe4LlRjrhIVHBfo6IPetNAIpCyJguL8xbu2+l9Ii6dU5PcGXkbXyrF5/iePYyQ6z0CzhjJjOB+dIsv5M93zgYGG8anlkGKxp3tqHFYIis/mfujoeVxoDmAVhDfagstUTS2PLM4TZK7Cf1xiLN+NtzzdH5Ot7X4FgHS5mZZDJJScNmxO6vDTusx6i8HlkA== Received: from BN9PR03CA0498.namprd03.prod.outlook.com (2603:10b6:408:130::23) by PH8PR12MB7327.namprd12.prod.outlook.com (2603:10b6:510:215::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.13; Mon, 5 Dec 2022 14:50:09 +0000 Received: from BL02EPF00010207.namprd05.prod.outlook.com (2603:10b6:408:130:cafe::3e) by BN9PR03CA0498.outlook.office365.com (2603:10b6:408:130::23) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.14 via Frontend Transport; Mon, 5 Dec 2022 14:50:09 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.160) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.160 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.160) by BL02EPF00010207.mail.protection.outlook.com (10.167.241.197) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.8 via Frontend Transport; Mon, 5 Dec 2022 14:50:09 +0000 Received: from rnnvmail202.nvidia.com (10.129.68.7) by mail.nvidia.com (10.129.200.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Mon, 5 Dec 2022 06:49:53 -0800 Received: from rnnvmail205.nvidia.com (10.129.68.10) by rnnvmail202.nvidia.com (10.129.68.7) with Microsoft SMTP 
Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Mon, 5 Dec 2022 06:49:52 -0800 Received: from vdi.nvidia.com (10.127.8.10) by mail.nvidia.com (10.129.68.10) with Microsoft SMTP Server id 15.2.986.36 via Frontend Transport; Mon, 5 Dec 2022 06:49:49 -0800 From: Yishai Hadas To: , CC: , , , , , , , , , Subject: [PATCH V3 vfio 13/14] vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error Date: Mon, 5 Dec 2022 16:48:37 +0200 Message-ID: <20221205144838.245287-14-yishaih@nvidia.com> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com> References: <20221205144838.245287-1-yishaih@nvidia.com> MIME-Version: 1.0 X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BL02EPF00010207:EE_|PH8PR12MB7327:EE_ X-MS-Office365-Filtering-Correlation-Id: 1a313639-8632-4d12-52b4-08dad6d001b4 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: todedplQchktTFsFAO5WYDdZTLweZBwViHHYf6vX8rHcywZ8b/zSRsJhBKIytvCLPF+cLwI/MvblskBesN0Qm71kLuts688wx+I9VsDtmOs6zpOmictr26CvGWpl3JR9GSqSpyt8VTvAuMsa5ysHvXidzttF6cLKSzolqxyTZMlaX+zfrNE7mAh1Q0+vhyAhpGnTYfLYTBv1VsJeVfXPgvrB5KIvL4hU1ZzEW/XqQlJlNLYYHnRp0tFnBiYXuAZrIY2MCEiyhbE/Q6H+VFNqM/eBxp2s6cWTDupgbJTngjTdpWYHLCsPp1nJOzE1RohPcFWvyhs4ODVep9hM8unMKesZQw4Eh+cwhMuvb8Bq1oIMDpDGLDk4pfxlVgpKnrYco217zBLZlQFUA8O1DGYOmXOse1B1OJuLbC9b3/54G9b4I6Y1UOk5D92iuTVWy78uZ/+yk1KjEgDBGQrsgyxAdzwPU4qNLNVrtOfF/vQBxIh+qkYMevRxzsicFdD556NCHUbpekdvvJSwu8Kg3DGHKfof9Dw+7EeKAmja+DmWuNWl9YzgrNwpyOmJqOwqffyj02NvrmCWq+GB1GOVKejQmaUvBAzAO6CXRwY/usAhJzFIejxoMtFRzRG5t0p+CqnUt/EWgOSwk8odkzLL/2pBm//Z2pgZJF2iMY3hzbr31xEwIXEfOQ+ea+g9vs1T08IDktCz5wyPF4b0VInFL65reg== X-Forefront-Antispam-Report: 
X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

Before a SAVE command is issued, a QUERY command is issued in order to
learn the device data size. When PRE_COPY is used, both commands are
issued while the device is running, so the device state may change
significantly between the QUERY and the SAVE, causing the SAVE to fail.

Currently, if a SAVE command fails, the driver fails the migration. In
the above case, don't fail the migration; instead, don't allow new SAVEs
to be executed while the device is in the RUNNING state. Once the device
is moved to STOP_COPY, SAVE can be executed again and the full device
state will be read.
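The fallback policy described above can be sketched as a small self-contained userspace simulation. All names here (`migf_state`, `query_state_size`, the flag macros) are illustrative stand-ins for the driver's internals, not the real kernel API:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's migration-file states and
 * query flags (the real ones live in drivers/vfio/pci/mlx5/cmd.h). */
enum migf_state { MIGF_OK, MIGF_PRE_COPY_ERROR, MIGF_ERROR };

#define QUERY_INC   (1u << 0)
#define QUERY_FINAL (1u << 1)

/*
 * Sketch of the fallback: after a PRE_COPY error, a non-final QUERY
 * reports zero bytes so no new SAVE is started while the device is
 * RUNNING; the final QUERY (once the device reaches STOP_COPY) drops
 * the incremental flag so the full image is read instead of a delta.
 */
static int query_state_size(enum migf_state st, unsigned int *query_flags,
			    size_t dev_size, size_t *state_size)
{
	if (st == MIGF_ERROR)
		return -1;
	if (st == MIGF_PRE_COPY_ERROR) {
		if (!(*query_flags & QUERY_FINAL)) {
			*state_size = 0;	/* defer SAVE until STOP_COPY */
			return 0;
		}
		*query_flags &= ~QUERY_INC;	/* read full image, not a delta */
	}
	*state_size = dev_size;
	return 0;
}
```

This mirrors the shape of the `mlx5vf_cmd_query_vhca_migration_state()` change below, where a non-final query in the PRE_COPY error state short-circuits with a zero size.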
Signed-off-by: Shay Drory
Reviewed-by: Jason Gunthorpe
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c  | 27 ++++++++++++++++++++++++++-
 drivers/vfio/pci/mlx5/cmd.h  |  2 ++
 drivers/vfio/pci/mlx5/main.c |  6 ++++--
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 993749818d90..01ef695e9441 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -84,6 +84,19 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 		ret = wait_for_completion_interruptible(&mvdev->saving_migf->save_comp);
 		if (ret)
 			return ret;
+		if (mvdev->saving_migf->state ==
+			    MLX5_MIGF_STATE_PRE_COPY_ERROR) {
+			/*
+			 * In case we had a PRE_COPY error, only query full
+			 * image for final image
+			 */
+			if (!(query_flags & MLX5VF_QUERY_FINAL)) {
+				*state_size = 0;
+				complete(&mvdev->saving_migf->save_comp);
+				return 0;
+			}
+			query_flags &= ~MLX5VF_QUERY_INC;
+		}
 	}

 	MLX5_SET(query_vhca_migration_state_in, in, opcode,
@@ -442,7 +455,10 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 			mlx5vf_put_data_buffer(async_data->buf);
 		if (async_data->header_buf)
 			mlx5vf_put_data_buffer(async_data->header_buf);
-		migf->state = MLX5_MIGF_STATE_ERROR;
+		if (async_data->status == MLX5_CMD_STAT_BAD_RES_STATE_ERR)
+			migf->state = MLX5_MIGF_STATE_PRE_COPY_ERROR;
+		else
+			migf->state = MLX5_MIGF_STATE_ERROR;
 		wake_up_interruptible(&migf->poll_wait);
 	}
 	mutex_unlock(&migf->lock);
@@ -511,6 +527,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		 * The error and the cleanup flows can't run from an
 		 * interrupt context
 		 */
+		if (status == -EREMOTEIO)
+			status = MLX5_GET(save_vhca_state_out, async_data->out, status);
 		async_data->status = status;
 		queue_work(migf->mvdev->cb_wq, &async_data->work);
 	}
@@ -534,6 +552,13 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	if (err)
 		return err;

+	if (migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR)
+		/*
+		 * In case we had a PRE_COPY error, SAVE is triggered only for
+		 * the final image, read device full image.
+		 */
+		inc = false;
+
 	MLX5_SET(save_vhca_state_in, in, opcode,
 		 MLX5_CMD_OP_SAVE_VHCA_STATE);
 	MLX5_SET(save_vhca_state_in, in, op_mod, 0);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 7729eac8c78c..5483171d57ad 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -17,6 +17,7 @@
 enum mlx5_vf_migf_state {
 	MLX5_MIGF_STATE_ERROR = 1,
+	MLX5_MIGF_STATE_PRE_COPY_ERROR,
 	MLX5_MIGF_STATE_PRE_COPY,
 	MLX5_MIGF_STATE_SAVE_LAST,
 	MLX5_MIGF_STATE_COMPLETE,
@@ -157,6 +158,7 @@ struct mlx5vf_pci_core_device {
 enum {
 	MLX5VF_QUERY_INC = (1UL << 0),
+	MLX5VF_QUERY_FINAL = (1UL << 1),
 };

 int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 30365b52e530..88f36793f841 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -219,6 +219,7 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 		if (wait_event_interruptible(migf->poll_wait,
 					     !list_empty(&migf->buf_list) ||
 					     migf->state == MLX5_MIGF_STATE_ERROR ||
+					     migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR ||
 					     migf->state == MLX5_MIGF_STATE_PRE_COPY ||
 					     migf->state == MLX5_MIGF_STATE_COMPLETE))
 			return -ERESTARTSYS;
@@ -238,7 +239,8 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
 		if (first_loop_call) {
 			first_loop_call = false;
 			/* Temporary end of file as part of PRE_COPY */
-			if (end_of_data && migf->state == MLX5_MIGF_STATE_PRE_COPY) {
+			if (end_of_data && (migf->state == MLX5_MIGF_STATE_PRE_COPY ||
+				migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR)) {
 				done = -ENOMSG;
 				goto out_unlock;
 			}
@@ -431,7 +433,7 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
 		return -ENODEV;

 	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length,
-						    MLX5VF_QUERY_INC);
+						    MLX5VF_QUERY_INC | MLX5VF_QUERY_FINAL);
 	if (ret)
 		goto err;

From patchwork Mon Dec 5 14:48:38 2022
X-Patchwork-Submitter: Yishai Hadas
X-Patchwork-Id: 13064588
From: Yishai Hadas
Subject: [PATCH V3 vfio 14/14] vfio/mlx5: Enable MIGRATION_PRE_COPY flag
Date: Mon, 5 Dec 2022 16:48:38 +0200
Message-ID: <20221205144838.245287-15-yishaih@nvidia.com>
In-Reply-To: <20221205144838.245287-1-yishaih@nvidia.com>
References: <20221205144838.245287-1-yishaih@nvidia.com>
X-Mailing-List: kvm@vger.kernel.org

From: Shay Drory

Now that everything has been set up for MIGRATION_PRE_COPY, enable it.

Signed-off-by: Shay Drory
Reviewed-by: Jason Gunthorpe
Signed-off-by: Yishai Hadas
---
 drivers/vfio/pci/mlx5/cmd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 01ef695e9441..64e68d13cb98 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -222,6 +222,11 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 	if (MLX5_CAP_GEN(mvdev->mdev, adv_virtualization))
 		mvdev->core_device.vdev.log_ops = log_ops;

+	if (MLX5_CAP_GEN_2(mvdev->mdev, migration_multi_load) &&
+	    MLX5_CAP_GEN_2(mvdev->mdev, migration_tracking_state))
+		mvdev->core_device.vdev.migration_flags |=
+			VFIO_MIGRATION_PRE_COPY;
+
 end:
 	mlx5_vf_put_core_dev(mvdev->mdev);
 }
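The capability gating in the hunk above can be mirrored by a short self-contained sketch. `struct hca_caps` and `migration_flags()` are illustrative names standing in for the `MLX5_CAP_GEN_2()` firmware queries; the flag values follow the VFIO uAPI bit layout:

```c
#include <stdbool.h>

/* Flag values as in the VFIO uAPI (include/uapi/linux/vfio.h);
 * redefined here so the sketch is self-contained. */
#define VFIO_MIGRATION_STOP_COPY (1u << 0)
#define VFIO_MIGRATION_P2P       (1u << 1)
#define VFIO_MIGRATION_PRE_COPY  (1u << 2)

/* Hypothetical capability struct standing in for the firmware
 * capability bits the patch checks. */
struct hca_caps {
	bool migration_multi_load;
	bool migration_tracking_state;
};

/* PRE_COPY is advertised only when both firmware caps are present;
 * otherwise only the baseline STOP_COPY flow is reported. */
static unsigned int migration_flags(const struct hca_caps *caps)
{
	unsigned int flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P;

	if (caps->migration_multi_load && caps->migration_tracking_state)
		flags |= VFIO_MIGRATION_PRE_COPY;
	return flags;
}
```

Gating the uAPI flag on both capabilities means userspace (e.g. QEMU) only sees PRE_COPY support when the device can actually honor multi-image loads and tracking-state queries.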