From patchwork Wed Jul 19 22:35:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brett Creeley X-Patchwork-Id: 13319602 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 20F361FB3E for ; Wed, 19 Jul 2023 22:35:50 +0000 (UTC) Received: from NAM12-MW2-obe.outbound.protection.outlook.com (mail-mw2nam12on20615.outbound.protection.outlook.com [IPv6:2a01:111:f400:fe5a::615]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8A481FE6; Wed, 19 Jul 2023 15:35:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=OzKzSs36rxzn0c7QLHHByKvuMXmk6UyCUhMyqJ7CDxEkYC5KzTAP73l/cs5V5utzk45mJK6MP8rd/jPKWY9z716pgXd2o7pmw4net5HzRLOR3DGSb5eBHMc70wPLUV3sQrI0lfoBdqLTWKT2thunB8MjWF8SZS468hTfZ0NoDiDCXI3CofWfmum4aQcI+n1kTLbRBkp+5s9t6NYahBRWs8eqjc+jLe5kf7EbZsffu2xPmjRn2JbNUMj0wDdtfEWZ7ZEFlT8Vd+aiJ7H4r7AR3dvIu5kTOqL5WmnGd+m0q3my4UeeySWfpcGCgYNh236O0F7v/DdsUhoAHQMHixruzA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=ySOn0UHJNptpXXPBGLj3BDqJ81Vo9WUO8xq2SoXTbbk=; b=GeqjCoCE8Wuhw/Ki+OtxcYPpcb/D/0McdKBMALoPoO6bzaa1YB7RAYMBe/N6G1lrWb+b0MTcdIaY+i6kPTReOgjv0+LI5cbMfB4hJB6oCQa0L9ZAIZ3Y+VPvpFbGFGuZWczN1mURQGtDxYa2wQsSgmYoCNAIGdD6e4maXn61qVgBBN82KlmEaKvJj8fcrK7YwDRi9F89Oy1vZPeHrcwb/k+UwVtSmx8PROWILlYxKOAUnVJWx5nG5FJ9qCylkgHRT3M+cFagH80hY/XcQD7HKDXC43egucRZygjMwiti4BwdPRZJ7BxJzd7jk33ZyCPjKyXWJcf7qXW4nJJ4UGaGFA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=ySOn0UHJNptpXXPBGLj3BDqJ81Vo9WUO8xq2SoXTbbk=; b=vlaGXTnWNZ0PNzlZi4yMo1OxTmPGzhfdq/KBbnHEBb5cLqsMjUslKI0vtbWzJ1ZEeShvnQ+qDajVWsh+xgM+KLoM0dD9VfMOjhLPwVhq/uk0uUjyNuMPOTwxCO5UQgSK3j5+doov/B3Kwtu2akLhd4SRkhzGILDL+LFCsOQ9wN8= Received: from BN8PR07CA0013.namprd07.prod.outlook.com (2603:10b6:408:ac::26) by PH7PR12MB8793.namprd12.prod.outlook.com (2603:10b6:510:27a::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24; Wed, 19 Jul 2023 22:35:43 +0000 Received: from BN8NAM11FT072.eop-nam11.prod.protection.outlook.com (2603:10b6:408:ac:cafe::ab) by BN8PR07CA0013.outlook.office365.com (2603:10b6:408:ac::26) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24 via Frontend Transport; Wed, 19 Jul 2023 22:35:42 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT072.mail.protection.outlook.com (10.13.176.165) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6588.34 via Frontend Transport; Wed, 19 Jul 2023 22:35:42 +0000 Received: from driver-dev1.pensando.io (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 19 Jul 2023 17:35:41 -0500 From: Brett Creeley To: , , , , , , CC: , Subject: [PATCH v12 vfio 1/7] vfio: Commonize combine_ranges for use in other VFIO drivers Date: Wed, 19 Jul 2023 15:35:21 -0700 Message-ID: <20230719223527.12795-2-brett.creeley@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230719223527.12795-1-brett.creeley@amd.com> References: <20230719223527.12795-1-brett.creeley@amd.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT072:EE_|PH7PR12MB8793:EE_ X-MS-Office365-Filtering-Correlation-Id: 239fed58-7562-4a30-cb20-08db88a87c8d X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Bg6/sufR41Mpc/co+8ESVpD6uhYMY/gYewxVOApNqTvL/LlapgD16Zc4Y6rKLwT54aeMd3UCFyBPeCHAOG2Iy0PlFVbHu3eF9gS59dYjzm4sD6I0tGqXmcY533MhC2MI7xdDs01KsWLoQToOPwgnM3qlUSRFh0AisSwE7AwPgoinDexKRjAiaREOxuR+ocboScrnCgbHiomagMInt0/FcXNcMNxmwl8bR7qfM8p9ej45hWHl/EQZ05D9jb6U/XEqEQhqAaMZmd+MsA/t0d6LEnsxf42MSZ6EiH2vuOgxGuB+qnrZMGEwGbntp0G3ZYEA05p2BOgm3Cu5PWVXKQdgehhSrjMxd9/U6MlASxyHtHeQJ4KwhNKmo+krtZten2rNkhD4TzGthrs40xOYFYMXwHgXirOHhXQg+/tEbXNjZ/jWBq8h4fDzZJptCBCly6lI2CCRHGzzSs6t+B8yuuRxStmuOf6jpbxaH5Dy+szH9w7iZ7UjAWk8+9rHTQrV6d1r5LJeMT7gE8RCi1wArdkYAbVcftRUmKv02zRo0E0vDgyg4FPIhtnl8FwI77Bvny3eLIdzcS8NLiFy9PkYDdbdKza9bwrLfSMCKF95llcf64+ZT2trl8TIummwuh4ACGILia7TeUomt39DVFlGFnkKmp378KsrXOIJRjovs2MsncTwFJ3n3PvnBk0GLJDukphzydwu2XxxdcQP/ZViTWa8SN3d2a07FkKjfd3KQ+BOdWwklZBpwVrR5JcUVWg2bBlTP5pPhJsXj5UDKsAhihYwSA== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230028)(4636009)(376002)(346002)(39860400002)(396003)(136003)(82310400008)(451199021)(46966006)(36840700001)(40470700004)(1076003)(26005)(186003)(336012)(478600001)(70206006)(70586007)(316002)(4326008)(16526019)(41300700001)(86362001)(6666004)(8676002)(8936002)(44832011)(54906003)(110136005)(5660300002)(36756003)(2616005)(426003)(2906002)(36860700001)(40460700003)(83380400001)(47076005)(66899021)(40480700001)(82740400003)(81166007)(356005)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Jul 2023 22:35:42.8139 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 239fed58-7562-4a30-cb20-08db88a87c8d X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT072.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH7PR12MB8793 X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FORGED_SPF_HELO, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_NONE,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Currently only Mellanox uses the combine_ranges function. The new pds_vfio driver also needs this function. So, move it to a common location for other vendor drivers to use. Also, Simon Harmon noticed that RCT ordering was not followed for vfio_combin_iova_ranges(), so fix that. Cc: Yishai Hadas Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson Reviewed-by: Simon Horman --- drivers/vfio/pci/mlx5/cmd.c | 48 +------------------------------------ drivers/vfio/vfio_main.c | 47 ++++++++++++++++++++++++++++++++++++ include/linux/vfio.h | 3 +++ 3 files changed, 51 insertions(+), 47 deletions(-) diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c index deed156e6165..7f6c51992a15 100644 --- a/drivers/vfio/pci/mlx5/cmd.c +++ b/drivers/vfio/pci/mlx5/cmd.c @@ -732,52 +732,6 @@ void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf) mlx5vf_cmd_dealloc_pd(migf); } -static void combine_ranges(struct rb_root_cached *root, u32 cur_nodes, - u32 req_nodes) -{ - struct interval_tree_node *prev, *curr, *comb_start, *comb_end; - unsigned long min_gap; - unsigned long curr_gap; - - /* Special shortcut when a single range is required */ - if (req_nodes == 1) { - unsigned long last; - - curr = comb_start = interval_tree_iter_first(root, 0, ULONG_MAX); - while (curr) { - last = curr->last; - prev = curr; - curr = interval_tree_iter_next(curr, 0, ULONG_MAX); - if (prev != comb_start) - interval_tree_remove(prev, root); - } - comb_start->last = last; - return; - } - - /* Combine ranges which have the smallest gap */ - while (cur_nodes > req_nodes) { - prev = NULL; - min_gap = ULONG_MAX; - curr = interval_tree_iter_first(root, 0, ULONG_MAX); - while (curr) { - if (prev) { - curr_gap = curr->start - prev->last; - if (curr_gap < min_gap) { - min_gap = curr_gap; - comb_start = prev; - comb_end = curr; - } - } - prev = curr; - curr = interval_tree_iter_next(curr, 0, ULONG_MAX); - } - comb_start->last = comb_end->last; - interval_tree_remove(comb_end, root); - cur_nodes--; - } -} - static int mlx5vf_create_tracker(struct mlx5_core_dev *mdev, struct mlx5vf_pci_core_device *mvdev, struct rb_root_cached *ranges, u32 nnodes) @@ -800,7 +754,7 @@ static int mlx5vf_create_tracker(struct mlx5_core_dev *mdev, int i; if (num_ranges > max_num_range) { - combine_ranges(ranges, nnodes, max_num_range); + vfio_combine_iova_ranges(ranges, nnodes, max_num_range); num_ranges = max_num_range; } diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index f0ca33b2e1df..3bde62f7e08b 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -865,6 +865,53 @@ static int vfio_ioctl_device_feature_migration(struct vfio_device *device, return 0; } +void vfio_combine_iova_ranges(struct rb_root_cached *root, u32 cur_nodes, + u32 req_nodes) +{ + struct interval_tree_node *prev, *curr, *comb_start, *comb_end; + unsigned long min_gap, curr_gap; + + /* Special shortcut when a single range is required */ + if (req_nodes == 1) { + unsigned long last; + + comb_start = interval_tree_iter_first(root, 0, ULONG_MAX); + curr = comb_start; + while (curr) { + last = curr->last; + prev = curr; + curr = interval_tree_iter_next(curr, 0, ULONG_MAX); + if (prev != comb_start) + interval_tree_remove(prev, root); + } + comb_start->last = last; + return; + } + + /* Combine ranges which have the smallest gap */ + while (cur_nodes > req_nodes) { + prev = NULL; + min_gap = ULONG_MAX; + curr = interval_tree_iter_first(root, 0, ULONG_MAX); + while (curr) { + if (prev) { + curr_gap = curr->start - prev->last; + if (curr_gap < min_gap) { + min_gap = curr_gap; + comb_start = prev; + comb_end = curr; + } + } + prev = curr; + curr = interval_tree_iter_next(curr, 0, ULONG_MAX); + } + comb_start->last = comb_end->last; + interval_tree_remove(comb_end, root); + cur_nodes--; + } +} +EXPORT_SYMBOL_GPL(vfio_combine_iova_ranges); + /* Ranges should fit into a single kernel page */ #define LOG_MAX_RANGES \ (PAGE_SIZE / sizeof(struct vfio_device_feature_dma_logging_range)) diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 2c137ea94a3e..f49933b63ac3 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -245,6 +245,9 @@ int vfio_mig_get_next_state(struct vfio_device *device, enum vfio_device_mig_state new_fsm, enum vfio_device_mig_state *next_fsm); +void vfio_combine_iova_ranges(struct rb_root_cached *root, u32 cur_nodes, + u32 req_nodes); + /* * External user API */ From patchwork Wed Jul 19 22:35:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brett Creeley X-Patchwork-Id: 13319601 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1EF3D22F1A for ; Wed, 19 Jul 2023 22:35:49 +0000 (UTC) Received: from NAM12-MW2-obe.outbound.protection.outlook.com (mail-mw2nam12on2053.outbound.protection.outlook.com [40.107.244.53]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FD30B3; Wed, 19 Jul 2023 15:35:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=PXjDHM+PMMs4dT6T079OCK/OwpVN3XCdV09REoFob934z5l2xCqKqrAkBjEiPNwP3kAkWarZfI62ouDnAu2wacv4vtttCAMziMH0QHsmM9LvDCiE2ha2bouczU8uLaaNlurED8BUOLAzqSl1w0XAqx/UaiUkyrzaXDru7grh0k+59PbBmegxlxoJy8Y7Xtycey67O8pSKKmuRpwUmhpHKNSjoUTzrhDcsP8oy8Y71TDJ4LZsH/tnNuDxyqkP7tJEx4yTBw1J791jXwKASy9MjtMPa8+yUNWa6rYQtTXKT7QSekTCPFqbpkKL53JxTWEpRZel1sEGfAs+5WF1yhbntA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=0OOGZxpnlBGxUtdbItvgTGYmpVYI4NJnsPwixahBXNw=; b=V4nO491PTmZpAAxc5WxUvtjbdNBoo2xT6M1gHH6ERLpHhES294U/S8WNxVXfI4bgrecZ8A3jMEXWCZ4L76B8WbXNnEP0H8GNzGgHTwalaheIISDgEYQJ32sqGoOngTxZY1X+F7ItsJVJPTKknhwpLF8VH1bzz1WMC1Zrx/BMu8jIf9S3kZDX2czRaLnzFBX1QesjzHq6MIOdS1YJt1wC2ORdtyvoHVuh5pOAESLxrLjvHTkV6PVdrOw0v0Uu9qA7T42BsrlzkHYhYgWknZuCmdG9WX0WwpouZi/RqVuXDZFgHHYWQOIylybIbhd7WIz7/gXZsnaMKk05k28Fk+qWSA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=0OOGZxpnlBGxUtdbItvgTGYmpVYI4NJnsPwixahBXNw=; b=XHAq+I2mO05fgDLp21a/025tJTv1upfSc8GvZJXL8YFg+Kz6u1NaCatK90I9NT68t0fbN+mefT4/vce/KxhkKrE19oUQT4wiSmCAH/Zy5sdod0NuGkJKbYlCwmdLzmhxlQ8Uitji+vTxsxzj7vEkhGB1d//T+4oLO5LaFMDweqM= Received: from BN9PR03CA0976.namprd03.prod.outlook.com (2603:10b6:408:109::21) by SA1PR12MB6726.namprd12.prod.outlook.com (2603:10b6:806:255::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24; Wed, 19 Jul 2023 22:35:45 +0000 Received: from BN8NAM11FT023.eop-nam11.prod.protection.outlook.com (2603:10b6:408:109:cafe::15) by BN9PR03CA0976.outlook.office365.com (2603:10b6:408:109::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24 via Frontend Transport; Wed, 19 Jul 2023 22:35:45 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT023.mail.protection.outlook.com (10.13.177.103) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6588.34 via Frontend Transport; Wed, 19 Jul 2023 22:35:44 +0000 Received: from driver-dev1.pensando.io (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 19 Jul 2023 17:35:43 -0500 From: Brett Creeley To: , , , , , , CC: , Subject: [PATCH v12 vfio 2/7] vfio/pds: Initial support for pds VFIO driver Date: Wed, 19 Jul 2023 15:35:22 -0700 Message-ID: <20230719223527.12795-3-brett.creeley@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230719223527.12795-1-brett.creeley@amd.com> References: <20230719223527.12795-1-brett.creeley@amd.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT023:EE_|SA1PR12MB6726:EE_ X-MS-Office365-Filtering-Correlation-Id: 162cc801-b34e-4d60-85be-08db88a87dde X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 0TvD0oqUt47LSUMxQCxuyMG6eHsndvI6ojbFFBY0BB2THh+lYOEYcd8odl/8enmWeBzJ0KfxBnFhShAgX2/lutZnoSjX7H4IIcalgqUCK/lXpPxSack0fCQPJSq+foifSk5qgCAefFqbRrcXQVSNiyQM0wTtjVLFnBwWTVJMY1PNYIZ6oRjNp49EuVOkwbsA+qTsUCYW5+Rpp29FlNrKSnWjs09GenvxPlzLffedtxhtxKg63xKY1CBPuGwFs9Hz8Xh0Ep56DKHiqXfEd5dfrw7/6WcaEU/gn92L+vjax5+ZJjVjITraU9MfTPi/LAL0sID8cu6sSnAnXmA4lyQVlEBtzkSE2ocdHsNUc5J8X+cOtyiqlDy1juoHyOREvNNA+SHs3pggbLw44BuBpxc4teju9xqOkCPsQLdJiKE7Kxyqg3YHP6ds7gbnOnEj0yWSJL1Gt70azUC1SWmRnLEt08P5OesyfnQxRjILSV4JgmAQjXyw13eO3ABHTO5OIbG1eVjq5eWw1gaGIQDwMD9XYWqA+VWVE2nlmXN5lIz1NdG/cEj7ReDmCWiVCMjraqgQWVniUGUbzkohUC6IslytAik7I/wwoxCvnuFE5qr7H3NfuHq9eVq+eTTpg9JZ0jjUZk4iSjZOkqGDVQPDMZsHlrDme6E3Y6j3QsAuoazlKPrldgKkZNCYa8H8m74sYPKsxxLnZqAuAvdaVdWLwRZmjGs6hV32eHfoqcwgllb4Lk4qwZiFDCN3JIsUJ/nr2o52hFqECiGoHMQHJRUMUzRpZA== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230028)(4636009)(396003)(376002)(346002)(136003)(39860400002)(451199021)(82310400008)(36840700001)(46966006)(40470700004)(86362001)(478600001)(83380400001)(1076003)(336012)(40480700001)(186003)(41300700001)(26005)(110136005)(54906003)(8676002)(8936002)(316002)(356005)(2906002)(4326008)(40460700003)(44832011)(36860700001)(70206006)(6666004)(81166007)(426003)(2616005)(16526019)(70586007)(36756003)(5660300002)(47076005)(82740400003)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Jul 2023 22:35:44.9894 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 162cc801-b34e-4d60-85be-08db88a87dde X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT023.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA1PR12MB6726 X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FORGED_SPF_HELO, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE, T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net This is the initial framework for the new pds-vfio-pci device driver. This does the very basics of registering the PDS PCI device and configuring it as a VFIO PCI device. With this change, the VF device can be bound to the pds-vfio-pci driver on the host and presented to the VM as an ethernet VF. Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson --- drivers/vfio/pci/Makefile | 2 + drivers/vfio/pci/pds/Makefile | 8 ++++ drivers/vfio/pci/pds/pci_drv.c | 68 ++++++++++++++++++++++++++++++ drivers/vfio/pci/pds/vfio_dev.c | 75 +++++++++++++++++++++++++++++++++ drivers/vfio/pci/pds/vfio_dev.h | 19 +++++++++ 5 files changed, 172 insertions(+) create mode 100644 drivers/vfio/pci/pds/Makefile create mode 100644 drivers/vfio/pci/pds/pci_drv.c create mode 100644 drivers/vfio/pci/pds/vfio_dev.c create mode 100644 drivers/vfio/pci/pds/vfio_dev.h diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index 24c524224da5..45167be462d8 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -11,3 +11,5 @@ obj-$(CONFIG_VFIO_PCI) += vfio-pci.o obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/ obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/ + +obj-$(CONFIG_PDS_VFIO_PCI) += pds/ diff --git a/drivers/vfio/pci/pds/Makefile b/drivers/vfio/pci/pds/Makefile new file mode 100644 index 000000000000..e5e53a6d86d1 --- /dev/null +++ b/drivers/vfio/pci/pds/Makefile @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2023 Advanced Micro Devices, Inc. + +obj-$(CONFIG_PDS_VFIO_PCI) += pds-vfio-pci.o + +pds-vfio-pci-y := \ + pci_drv.o \ + vfio_dev.o diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c new file mode 100644 index 000000000000..4670ddda603a --- /dev/null +++ b/drivers/vfio/pci/pds/pci_drv.c @@ -0,0 +1,68 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include + +#include + +#include "vfio_dev.h" + +#define PDS_VFIO_DRV_DESCRIPTION "AMD/Pensando VFIO Device Driver" +#define PCI_VENDOR_ID_PENSANDO 0x1dd8 + +static int pds_vfio_pci_probe(struct pci_dev *pdev, + const struct pci_device_id *id) +{ + struct pds_vfio_pci_device *pds_vfio; + int err; + + pds_vfio = vfio_alloc_device(pds_vfio_pci_device, vfio_coredev.vdev, + &pdev->dev, pds_vfio_ops_info()); + if (IS_ERR(pds_vfio)) + return PTR_ERR(pds_vfio); + + dev_set_drvdata(&pdev->dev, &pds_vfio->vfio_coredev); + + err = vfio_pci_core_register_device(&pds_vfio->vfio_coredev); + if (err) + goto out_put_vdev; + + return 0; + +out_put_vdev: + vfio_put_device(&pds_vfio->vfio_coredev.vdev); + return err; +} + +static void pds_vfio_pci_remove(struct pci_dev *pdev) +{ + struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev); + + vfio_pci_core_unregister_device(&pds_vfio->vfio_coredev); + vfio_put_device(&pds_vfio->vfio_coredev.vdev); +} + +static const struct pci_device_id pds_vfio_pci_table[] = { + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_PENSANDO, 0x1003) }, /* Ethernet VF */ + { 0, } +}; +MODULE_DEVICE_TABLE(pci, pds_vfio_pci_table); + +static struct pci_driver pds_vfio_pci_driver = { + .name = KBUILD_MODNAME, + .id_table = pds_vfio_pci_table, + .probe = pds_vfio_pci_probe, + .remove = pds_vfio_pci_remove, + .driver_managed_dma = true, +}; + +module_pci_driver(pds_vfio_pci_driver); + +MODULE_DESCRIPTION(PDS_VFIO_DRV_DESCRIPTION); +MODULE_AUTHOR("Brett Creeley "); +MODULE_LICENSE("GPL"); diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c new file mode 100644 index 000000000000..7c125643f5d9 --- /dev/null +++ b/drivers/vfio/pci/pds/vfio_dev.c @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#include +#include + +#include "vfio_dev.h" + +struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev) +{ + struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev); + + return container_of(core_device, struct pds_vfio_pci_device, + vfio_coredev); +} + +static int pds_vfio_init_device(struct vfio_device *vdev) +{ + struct pds_vfio_pci_device *pds_vfio = + container_of(vdev, struct pds_vfio_pci_device, + vfio_coredev.vdev); + struct pci_dev *pdev = to_pci_dev(vdev->dev); + int err, vf_id; + + err = vfio_pci_core_init_dev(vdev); + if (err) + return err; + + vf_id = pci_iov_vf_id(pdev); + if (vf_id < 0) + return vf_id; + + pds_vfio->vf_id = vf_id; + + return 0; +} + +static int pds_vfio_open_device(struct vfio_device *vdev) +{ + struct pds_vfio_pci_device *pds_vfio = + container_of(vdev, struct pds_vfio_pci_device, + vfio_coredev.vdev); + int err; + + err = vfio_pci_core_enable(&pds_vfio->vfio_coredev); + if (err) + return err; + + vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev); + + return 0; +} + +static const struct vfio_device_ops pds_vfio_ops = { + .name = "pds-vfio", + .init = pds_vfio_init_device, + .release = vfio_pci_core_release_dev, + .open_device = pds_vfio_open_device, + .close_device = vfio_pci_core_close_device, + .ioctl = vfio_pci_core_ioctl, + .device_feature = vfio_pci_core_ioctl_feature, + .read = vfio_pci_core_read, + .write = vfio_pci_core_write, + .mmap = vfio_pci_core_mmap, + .request = vfio_pci_core_request, + .match = vfio_pci_core_match, + .bind_iommufd = vfio_iommufd_physical_bind, + .unbind_iommufd = vfio_iommufd_physical_unbind, + .attach_ioas = vfio_iommufd_physical_attach_ioas, +}; + +const struct vfio_device_ops *pds_vfio_ops_info(void) +{ + return &pds_vfio_ops; +} diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h new file mode 100644 index 000000000000..a4d4b65778d1 --- /dev/null +++ b/drivers/vfio/pci/pds/vfio_dev.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#ifndef _VFIO_DEV_H_ +#define _VFIO_DEV_H_ + +#include +#include + +struct pds_vfio_pci_device { + struct vfio_pci_core_device vfio_coredev; + + int vf_id; +}; + +const struct vfio_device_ops *pds_vfio_ops_info(void); +struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev); + +#endif /* _VFIO_DEV_H_ */ From patchwork Wed Jul 19 22:35:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brett Creeley X-Patchwork-Id: 13319603 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 495711FB3E for ; Wed, 19 Jul 2023 22:35:55 +0000 (UTC) Received: from NAM12-BN8-obe.outbound.protection.outlook.com (mail-bn8nam12on20626.outbound.protection.outlook.com [IPv6:2a01:111:f400:fe5b::626]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D91701FDC; Wed, 19 Jul 2023 15:35:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=G4AumaItiwQw/XfrtyJfdwP/nh/bvelxJek8Cms2T2hXEvz9E4ig57oSmfPT52brNYUs5/ARKNdzLyIMNRrgDUs8aieQ8C+ZoXIv7wmt3vo4OAYMCz4IGlw8ZIQLPhhcFieFVYVYYvtdSvWJ+caAl5a+Ub+cDXswdjc66cinXxne0iBnWvA6/0HBuw4FSfiKdOlYxpDyRwSJ2yAJa7uvbR6TPOJYzDF1AW5wILjUqOpkTyVI0GSqbkc2epYmL4vjre8jfg0dh2pBrqGvHoUxx0GpHONBDa9TvjYEgPdG5jHLn3LrCHQy6x5HgkKF2aK2X3tkqrMNUzbnf/KUAs6LuA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Ad9eew3UxzXlhFCK0o+Gt1Yk0iZEZrBshErmLHK32Y4=; b=BvNd3ht9mcitLucI9/6KR/5B0gbqomPv8/PJjvTn3JOlIwRYfZjBaV+1g+I2paFYC3ILe3cmUXbobjz896XRje1wM00MSgTvjw68mV+GKz3vkIJJfs63Gr0/qxp9/D1kaOmqoPplCXqS1szQ8avLx1EypcEszoVoxjZ6uCtzPetwUoWyC7FPRdwz4pUjwDl2IkZZuOAvOQBNayF5bpPDGj+u0j91Uh8qzn0kgZFFTtlCOBtsKX4S/c/awF4hYA7N/ylbjHYDw/E24P8PK3p0dOIBwt18iDrHYZKVIHsYqxGVnRYOvFXJKTLiD0ZvZ6y8bFNd+4Vy6e08PxeRb4cUhA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Ad9eew3UxzXlhFCK0o+Gt1Yk0iZEZrBshErmLHK32Y4=; b=V6cb59fDoebtmQb6AZGG4kcjgOU+OYkEep4oSHeOOli4EOdIcICe8OOgSk67K2LiICkS6MqGcDJULYo1HMlbK4DgNTvArbFgxnNvTPoELZb3pZjNe9KcxsqwMBj3lucdsl0OSbRpMDFgTL+RpHYLXLrfxdGlXlPD8fxq/jX8+a8= Received: from BN8PR15CA0057.namprd15.prod.outlook.com (2603:10b6:408:80::34) by CO6PR12MB5412.namprd12.prod.outlook.com (2603:10b6:5:35e::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24; Wed, 19 Jul 2023 22:35:46 +0000 Received: from BN8NAM11FT096.eop-nam11.prod.protection.outlook.com (2603:10b6:408:80:cafe::1b) by BN8PR15CA0057.outlook.office365.com (2603:10b6:408:80::34) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24 via Frontend Transport; Wed, 19 Jul 2023 22:35:46 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT096.mail.protection.outlook.com (10.13.177.195) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6588.35 via Frontend Transport; Wed, 19 Jul 2023 22:35:46 +0000 Received: from driver-dev1.pensando.io (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 19 Jul 2023 17:35:45 -0500 From: Brett Creeley To: , , , , , , CC: , Subject: [PATCH v12 vfio 3/7] vfio/pds: register with the pds_core PF Date: Wed, 19 Jul 2023 15:35:23 -0700 Message-ID: <20230719223527.12795-4-brett.creeley@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230719223527.12795-1-brett.creeley@amd.com> References: <20230719223527.12795-1-brett.creeley@amd.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT096:EE_|CO6PR12MB5412:EE_ X-MS-Office365-Filtering-Correlation-Id: e910132f-f4df-4593-618f-08db88a87ed7 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: jjOsUyy3VqW3iegptotcDluW4v4SXP3QFXXi2UphOozSbMTeXu6z+Al64PXILc2ut7PGHyzMHw1+1XJ3DtjXVYVphk5pmmq8sjBxXuKeBgcHt21hi6nM63xkSQ/xfqcSL17AO5cvro07xQWUse/RJ5Syip42G7Fo3Ia9rktNarcdU8gnpEuiGWy7xQeGmuWAXYJT0AzZ3LXS8egM2uV3BwBf8oDVgRtQgOV8EwySTqPaJOGjfjiaP4WFjzX0+VWRaRDjKL7K69WB4t5MofywzmSG5Mv8XjZsugp8P9xQQ5PAABYAvdIzG3hEvq/8swDfdYEy5mnCr74fD05gCRE5qcPO+a2lfhIuAiZTPKXNHQqJgKBS6E49YmSloziTXhp0SKZWFFeE/EHP3whwi3F9C8m5NcXWsK1PJAcUYFT42HPZlwi8e0vsgxCeUve4z0vODiB4zyuQh51gzC5X8AxnaFDLpUopRV5tQOLKq49ApA8CHYay7zNsjtiFr6OljO6xM649a+QZbbqlsuzo16emQgw2CQ1KQdvsJkSJmBkM107ZgjTck/HFX0i2vvx11HRTy0VdmK/oPxOEoip9uUeY+igMXVnRqgUx8y7/noMLc74nL9WlTt60E+QDGCwegnbouK+4z0+dh7oTCAceMGLlb6Xr2yfvD+OPRYTjx1qPpqv2hr/lnBQBTv89McZuWzjiSOIgonhhPjqfIqj/lo1oGnYB/IrCdCEh9liXRkzdCBAPpbEdOYwUR9MepjKnDP7XUSRn4Aw4+8bdE68qwzmtgA== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230028)(4636009)(136003)(346002)(396003)(39860400002)(376002)(451199021)(82310400008)(46966006)(36840700001)(40470700004)(54906003)(110136005)(478600001)(6666004)(40480700001)(36860700001)(36756003)(86362001)(83380400001)(40460700003)(2906002)(2616005)(426003)(26005)(1076003)(336012)(186003)(16526019)(82740400003)(81166007)(316002)(356005)(70586007)(44832011)(70206006)(4326008)(8676002)(47076005)(41300700001)(5660300002)(8936002)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Jul 2023 22:35:46.6516 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: e910132f-f4df-4593-618f-08db88a87ed7 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT096.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CO6PR12MB5412 X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FORGED_SPF_HELO, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_NONE,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net The pds_core driver will supply adminq services, so find the PF and register with the DSC services. Use the following commands to enable a VF: echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson --- drivers/vfio/pci/pds/Makefile | 1 + drivers/vfio/pci/pds/cmds.c | 44 +++++++++++++++++++++++++++++++++ drivers/vfio/pci/pds/cmds.h | 10 ++++++++ drivers/vfio/pci/pds/pci_drv.c | 19 ++++++++++++++ drivers/vfio/pci/pds/pci_drv.h | 9 +++++++ drivers/vfio/pci/pds/vfio_dev.c | 13 +++++++++- drivers/vfio/pci/pds/vfio_dev.h | 6 +++++ include/linux/pds/pds_common.h | 3 ++- 8 files changed, 103 insertions(+), 2 deletions(-) create mode 100644 drivers/vfio/pci/pds/cmds.c create mode 100644 drivers/vfio/pci/pds/cmds.h create mode 100644 drivers/vfio/pci/pds/pci_drv.h diff --git a/drivers/vfio/pci/pds/Makefile b/drivers/vfio/pci/pds/Makefile index e5e53a6d86d1..91587c7fe8f9 100644 --- a/drivers/vfio/pci/pds/Makefile +++ b/drivers/vfio/pci/pds/Makefile @@ -4,5 +4,6 @@ obj-$(CONFIG_PDS_VFIO_PCI) += pds-vfio-pci.o pds-vfio-pci-y := \ + cmds.o \ pci_drv.o \ vfio_dev.o diff --git a/drivers/vfio/pci/pds/cmds.c b/drivers/vfio/pci/pds/cmds.c new file mode 100644 index 000000000000..d6925bceed26 --- /dev/null +++ b/drivers/vfio/pci/pds/cmds.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#include +#include + +#include +#include +#include + +#include "vfio_dev.h" +#include "cmds.h" + +int pds_vfio_register_client_cmd(struct pds_vfio_pci_device *pds_vfio) +{ + struct pci_dev *pdev = pds_vfio_to_pci_dev(pds_vfio); + char devname[PDS_DEVNAME_LEN]; + int ci; + + snprintf(devname, sizeof(devname), "%s.%d-%u", PDS_LM_DEV_NAME, + pci_domain_nr(pdev->bus), + PCI_DEVID(pdev->bus->number, pdev->devfn)); + + ci = pds_client_register(pci_physfn(pdev), devname); + if (ci < 0) + return ci; + + pds_vfio->client_id = ci; + + return 0; +} + +void pds_vfio_unregister_client_cmd(struct pds_vfio_pci_device *pds_vfio) +{ + struct pci_dev *pdev = pds_vfio_to_pci_dev(pds_vfio); + int err; + + err = pds_client_unregister(pci_physfn(pdev), pds_vfio->client_id); + if (err) + dev_err(&pdev->dev, "unregister from DSC failed: %pe\n", + ERR_PTR(err)); + + pds_vfio->client_id = 0; +} diff --git a/drivers/vfio/pci/pds/cmds.h b/drivers/vfio/pci/pds/cmds.h new file mode 100644 index 000000000000..4c592afccf89 --- /dev/null +++ b/drivers/vfio/pci/pds/cmds.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#ifndef _CMDS_H_ +#define _CMDS_H_ + +int pds_vfio_register_client_cmd(struct pds_vfio_pci_device *pds_vfio); +void pds_vfio_unregister_client_cmd(struct pds_vfio_pci_device *pds_vfio); + +#endif /* _CMDS_H_ */ diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c index 4670ddda603a..928903a84f27 100644 --- a/drivers/vfio/pci/pds/pci_drv.c +++ b/drivers/vfio/pci/pds/pci_drv.c @@ -8,9 +8,13 @@ #include #include +#include #include +#include #include "vfio_dev.h" +#include "pci_drv.h" +#include "cmds.h" #define PDS_VFIO_DRV_DESCRIPTION "AMD/Pensando VFIO Device Driver" #define PCI_VENDOR_ID_PENSANDO 0x1dd8 @@ -27,13 +31,27 @@ static int pds_vfio_pci_probe(struct pci_dev *pdev, return PTR_ERR(pds_vfio); dev_set_drvdata(&pdev->dev, &pds_vfio->vfio_coredev); + pds_vfio->pdsc = pdsc_get_pf_struct(pdev); + if (IS_ERR_OR_NULL(pds_vfio->pdsc)) { + err = PTR_ERR(pds_vfio->pdsc) ?: -ENODEV; + goto out_put_vdev; + } err = vfio_pci_core_register_device(&pds_vfio->vfio_coredev); if (err) goto out_put_vdev; + err = pds_vfio_register_client_cmd(pds_vfio); + if (err) { + dev_err(&pdev->dev, "failed to register as client: %pe\n", + ERR_PTR(err)); + goto out_unregister_coredev; + } + return 0; +out_unregister_coredev: + vfio_pci_core_unregister_device(&pds_vfio->vfio_coredev); out_put_vdev: vfio_put_device(&pds_vfio->vfio_coredev.vdev); return err; @@ -43,6 +61,7 @@ static void pds_vfio_pci_remove(struct pci_dev *pdev) { struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev); + pds_vfio_unregister_client_cmd(pds_vfio); vfio_pci_core_unregister_device(&pds_vfio->vfio_coredev); vfio_put_device(&pds_vfio->vfio_coredev.vdev); } diff --git a/drivers/vfio/pci/pds/pci_drv.h b/drivers/vfio/pci/pds/pci_drv.h new file mode 100644 index 000000000000..e79bed12ed14 --- /dev/null +++ b/drivers/vfio/pci/pds/pci_drv.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#ifndef _PCI_DRV_H +#define _PCI_DRV_H + +#include + +#endif /* _PCI_DRV_H */ diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c index 7c125643f5d9..5299cfb262d5 100644 --- a/drivers/vfio/pci/pds/vfio_dev.c +++ b/drivers/vfio/pci/pds/vfio_dev.c @@ -6,6 +6,11 @@ #include "vfio_dev.h" +struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio) +{ + return pds_vfio->vfio_coredev.pdev; +} + struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev) { struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev); @@ -20,7 +25,7 @@ static int pds_vfio_init_device(struct vfio_device *vdev) container_of(vdev, struct pds_vfio_pci_device, vfio_coredev.vdev); struct pci_dev *pdev = to_pci_dev(vdev->dev); - int err, vf_id; + int err, vf_id, pci_id; err = vfio_pci_core_init_dev(vdev); if (err) @@ -32,6 +37,12 @@ static int pds_vfio_init_device(struct vfio_device *vdev) pds_vfio->vf_id = vf_id; + pci_id = PCI_DEVID(pdev->bus->number, pdev->devfn); + dev_dbg(&pdev->dev, + "%s: PF %#04x VF %#04x vf_id %d domain %d pds_vfio %p\n", + __func__, pci_dev_id(pdev->physfn), pci_id, vf_id, + pci_domain_nr(pdev->bus), pds_vfio); + return 0; } diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h index a4d4b65778d1..824832aa1513 100644 --- a/drivers/vfio/pci/pds/vfio_dev.h +++ b/drivers/vfio/pci/pds/vfio_dev.h @@ -7,13 +7,19 @@ #include #include +struct pdsc; + struct pds_vfio_pci_device { struct vfio_pci_core_device vfio_coredev; + struct pdsc *pdsc; int vf_id; + u16 client_id; }; const struct vfio_device_ops *pds_vfio_ops_info(void); struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev); +struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio); + #endif /* _VFIO_DEV_H_ */ diff --git a/include/linux/pds/pds_common.h b/include/linux/pds/pds_common.h index 435c8e8161c2..255c7c186bf5 100644 --- a/include/linux/pds/pds_common.h +++ b/include/linux/pds/pds_common.h @@ -34,12 +34,13 @@ enum pds_core_vif_types { #define PDS_DEV_TYPE_CORE_STR "Core" #define PDS_DEV_TYPE_VDPA_STR "vDPA" -#define PDS_DEV_TYPE_VFIO_STR "VFio" +#define PDS_DEV_TYPE_VFIO_STR "vfio" #define PDS_DEV_TYPE_ETH_STR "Eth" #define PDS_DEV_TYPE_RDMA_STR "RDMA" #define PDS_DEV_TYPE_LM_STR "LM" #define PDS_VDPA_DEV_NAME PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_VDPA_STR +#define PDS_LM_DEV_NAME PDS_CORE_DRV_NAME "." PDS_DEV_TYPE_LM_STR "." PDS_DEV_TYPE_VFIO_STR int pdsc_register_notify(struct notifier_block *nb); void pdsc_unregister_notify(struct notifier_block *nb); From patchwork Wed Jul 19 22:35:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brett Creeley X-Patchwork-Id: 13319604 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3C21C1FB3E for ; Wed, 19 Jul 2023 22:35:58 +0000 (UTC) Received: from NAM04-DM6-obe.outbound.protection.outlook.com (mail-dm6nam04on2045.outbound.protection.outlook.com [40.107.102.45]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 959D01FF9; Wed, 19 Jul 2023 15:35:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=bCVlGcjADjesduCXHkfE1YtzwKbmjb/p5hwhXlZ+cfpUnzWf2OHneNxed7RXXv9dEct8LDkkCTDDdWHYkbliQSW926FWx7hsGkKf1i/IzQ5ZRVWWirUW/pOG31rbemoxi+PYuBHei8UPd5DOUFnbtj4nDAHgGSVubmJyMNji+1PwAabLMBd6mn1J9hVVEi6istFjJx6Xz6CA11WLifp3sJlRmjD/w5PY190fcZrXO8jTf/Wd4/G1qh3xufdsxjoyzTGUzm1aOw/grqhqrIpD83gmnYpVFcZCffDZpApCh+aC/FiVRpZ08N8krOzImbwCvXEumb7O3GuH5sxzYEzeKw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=G/nS+wkD+Sl6Nru/gCzqE+24Amh6k5Jrzo3q910jliM=; b=PHbZtWPxvrMbtY8CoVO9ETggAklsJhjYcBiJUT0AJtQ+ucPY2aUdymgFbL8tpsZE16cBdXc4qE/uDJdRWxwx+KGlrMZG6YcImOufliFBfEg8HbKjI0rRW+vS7VaNu/0hmWes+PR8M7vliY0bOisYXmHI4Kcnq8M4NHJrAhk8HvxtY+WmMmKDsmMQ+h4w352st4q2eWoaHAybn6vU/9caxss9CjqWuxt8+D1Erga6i+g5oWeUFudsiTZMUpSN1daIRjN9iNwH5ZAbO3KGD9AcMV1dRa1fJGHU6eIyudQnb2TTnqi+LLsWCbW78LKhhRuFoJI7rvr2RKDo9GoI2m7tqg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=G/nS+wkD+Sl6Nru/gCzqE+24Amh6k5Jrzo3q910jliM=; b=v6WJ3ymFIcZ1DGrnpjOX84EVBVHmttHCJdfw9GTAjl7+mq8TxgshAsykxictlJ0EPJa+FHwtp7I6EjjSA1JgOlFhwIWp3nLAaD0ZBla9YP4szpGIijdWSlO4bvZxednmsf6UnstMnUiUv3Yk2DP+qeIdDw6037gEIrrO8rmxAT0= Received: from BN8PR12CA0026.namprd12.prod.outlook.com (2603:10b6:408:60::39) by DS0PR12MB9059.namprd12.prod.outlook.com (2603:10b6:8:c5::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24; Wed, 19 Jul 2023 22:35:48 +0000 Received: from BN8NAM11FT012.eop-nam11.prod.protection.outlook.com (2603:10b6:408:60:cafe::8d) by BN8PR12CA0026.outlook.office365.com (2603:10b6:408:60::39) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24 via Frontend Transport; Wed, 19 Jul 2023 22:35:48 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT012.mail.protection.outlook.com (10.13.177.55) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6588.34 via Frontend Transport; Wed, 19 Jul 2023 22:35:48 +0000 Received: from driver-dev1.pensando.io (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 19 Jul 2023 17:35:47 -0500 From: Brett Creeley To: , , , , , , CC: , Subject: [PATCH v12 vfio 4/7] vfio/pds: Add VFIO live migration support Date: Wed, 19 Jul 2023 15:35:24 -0700 Message-ID: <20230719223527.12795-5-brett.creeley@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230719223527.12795-1-brett.creeley@amd.com> References: <20230719223527.12795-1-brett.creeley@amd.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT012:EE_|DS0PR12MB9059:EE_ X-MS-Office365-Filtering-Correlation-Id: 0a70d76e-7461-4792-a293-08db88a88019 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: RNvsvgTELFlSCxnvVOe6z/lqqkHVsUTcZ7Sv94E5DdSOMR2HqRetsmaBQG/niO8TsA3lObxXgbLpgEE4XAtno6Gwpji2nLkoCcuSCgfG4cQwc4M/rmmPCqpdatvPDC7feTIv/bso4ssTFLJuev3sKGE9saR2D+L53vx5Eg3BK463T0k3R2+oJ23Di9TwCT7K7VYtiv26FCjbeTAPRwswpOx/vQPD7srdcKLOWR0abb4A6k9Jn4z+xkQwyhOLOTP/txqmGvWBhcEbkgB0CLxSo/JcMvnLgmOrxfxSFgSIz1xvV4ktvuE0dYa1mjtz3zgNx6VlgZK+MMa9CH6Bm6KrqkmTON7nfmf/y2BEAbBb7hHrez+24pFZ2855vN2P7OOQ3yawCHb5GnQlO3m6ipXL6Q6OA96J0RGz/GggC/0g+YvRntdlwIZpXPXycCM4mhGTPaVbc8u/oV3/iXYHPv8s0nlGhwd4CkUfhP8QMoS/qubiBYdXwvCUQq2zxAdgQJeQP5cVWaIeOqjbkvDRtGhiQgf8Mss3Ag5wVlHvu5P9nbrPYdwVKih/svK0nCdVvthopUKkmVyEyw6JsM0Vq3NZoI3hRlBB9BactXTc1KDd56oBjWB+8by3nl5c66eLdG/c29wbg4I9S1iaGubOgdliU5H9fEtTwt1TK9DY9jLdS4xwF95dh7yJT4PGjuZLK5ha3iDTPXcv+e6Q6jF1XfaN+q1op7UykuCLa7dsMz/FQxSAUND6IXTtIEpcHpaU21oJN6aKPNbOKp5xTSuh7Gu3hA== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230028)(4636009)(376002)(346002)(39860400002)(396003)(136003)(82310400008)(451199021)(46966006)(36840700001)(40470700004)(70206006)(70586007)(40460700003)(316002)(4326008)(6666004)(356005)(81166007)(82740400003)(47076005)(26005)(2616005)(16526019)(1076003)(186003)(336012)(426003)(36860700001)(83380400001)(478600001)(30864003)(2906002)(36756003)(86362001)(54906003)(110136005)(40480700001)(44832011)(41300700001)(5660300002)(8936002)(8676002)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Jul 2023 22:35:48.7635 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 0a70d76e-7461-4792-a293-08db88a88019 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT012.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS0PR12MB9059 X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FORGED_SPF_HELO, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE, T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Add live migration support via the VFIO subsystem. The migration implementation aligns with the definition from uapi/vfio.h and uses the pds_core PF's adminq for device configuration. The ability to suspend, resume, and transfer VF device state data is included along with the required admin queue command structures and implementations. PDS_LM_CMD_SUSPEND and PDS_LM_CMD_SUSPEND_STATUS are added to support the VF device suspend operation. PDS_LM_CMD_RESUME is added to support the VF device resume operation. PDS_LM_CMD_STATUS is added to determine the exact size of the VF device state data. PDS_LM_CMD_SAVE is added to get the VF device state data. PDS_LM_CMD_RESTORE is added to restore the VF device with the previously saved data from PDS_LM_CMD_SAVE. PDS_LM_CMD_HOST_VF_STATUS is added to notify the DSC/firmware when a migration is in/not-in progress from the host's perspective. The DSC/firmware can use this to clear/setup any necessary state related to a migration. Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson --- drivers/vfio/pci/pds/Makefile | 1 + drivers/vfio/pci/pds/cmds.c | 324 ++++++++++++++++++++++++ drivers/vfio/pci/pds/cmds.h | 8 +- drivers/vfio/pci/pds/lm.c | 434 ++++++++++++++++++++++++++++++++ drivers/vfio/pci/pds/lm.h | 41 +++ drivers/vfio/pci/pds/pci_drv.c | 13 + drivers/vfio/pci/pds/vfio_dev.c | 120 ++++++++- drivers/vfio/pci/pds/vfio_dev.h | 11 + include/linux/pds/pds_adminq.h | 197 +++++++++++++++ 9 files changed, 1147 insertions(+), 2 deletions(-) create mode 100644 drivers/vfio/pci/pds/lm.c create mode 100644 drivers/vfio/pci/pds/lm.h diff --git a/drivers/vfio/pci/pds/Makefile b/drivers/vfio/pci/pds/Makefile index 91587c7fe8f9..47750bb31ea2 100644 --- a/drivers/vfio/pci/pds/Makefile +++ b/drivers/vfio/pci/pds/Makefile @@ -5,5 +5,6 @@ obj-$(CONFIG_PDS_VFIO_PCI) += pds-vfio-pci.o pds-vfio-pci-y := \ cmds.o \ + lm.o \ pci_drv.o \ vfio_dev.o diff --git a/drivers/vfio/pci/pds/cmds.c b/drivers/vfio/pci/pds/cmds.c index d6925bceed26..be034cc4043f 100644 --- a/drivers/vfio/pci/pds/cmds.c +++ b/drivers/vfio/pci/pds/cmds.c @@ -3,6 +3,7 @@ #include #include +#include #include #include @@ -11,6 +12,31 @@ #include "vfio_dev.h" #include "cmds.h" +#define SUSPEND_TIMEOUT_S 5 +#define SUSPEND_CHECK_INTERVAL_MS 1 + +static int pds_vfio_client_adminq_cmd(struct pds_vfio_pci_device *pds_vfio, + union pds_core_adminq_cmd *req, + union pds_core_adminq_comp *resp, + bool fast_poll) +{ + union pds_core_adminq_cmd cmd = {}; + int err; + + /* Wrap the client request */ + cmd.client_request.opcode = PDS_AQ_CMD_CLIENT_CMD; + cmd.client_request.client_id = cpu_to_le16(pds_vfio->client_id); + memcpy(cmd.client_request.client_cmd, req, + sizeof(cmd.client_request.client_cmd)); + + err = pdsc_adminq_post(pds_vfio->pdsc, &cmd, resp, fast_poll); + if (err && err != -EAGAIN) + dev_info(pds_vfio_to_dev(pds_vfio), + "client admin cmd failed: %pe\n", ERR_PTR(err)); + + return err; +} + int pds_vfio_register_client_cmd(struct pds_vfio_pci_device *pds_vfio) { struct pci_dev *pdev = pds_vfio_to_pci_dev(pds_vfio); @@ -42,3 +68,301 @@ void pds_vfio_unregister_client_cmd(struct pds_vfio_pci_device *pds_vfio) pds_vfio->client_id = 0; } + +static int +pds_vfio_suspend_wait_device_cmd(struct pds_vfio_pci_device *pds_vfio) +{ + union pds_core_adminq_cmd cmd = { + .lm_suspend_status = { + .opcode = PDS_LM_CMD_SUSPEND_STATUS, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + }, + }; + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_adminq_comp comp = {}; + unsigned long time_limit; + unsigned long time_start; + unsigned long time_done; + int err; + + time_start = jiffies; + time_limit = time_start + HZ * SUSPEND_TIMEOUT_S; + do { + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, true); + if (err != -EAGAIN) + break; + + msleep(SUSPEND_CHECK_INTERVAL_MS); + } while (time_before(jiffies, time_limit)); + + time_done = jiffies; + dev_dbg(dev, "%s: vf%u: Suspend comp received in %d msecs\n", __func__, + pds_vfio->vf_id, jiffies_to_msecs(time_done - time_start)); + + /* Check the results */ + if (time_after_eq(time_done, time_limit)) { + dev_err(dev, "%s: vf%u: Suspend comp timeout\n", __func__, + pds_vfio->vf_id); + err = -ETIMEDOUT; + } + + return err; +} + +int pds_vfio_suspend_device_cmd(struct pds_vfio_pci_device *pds_vfio, u8 type) +{ + union pds_core_adminq_cmd cmd = { + .lm_suspend = { + .opcode = PDS_LM_CMD_SUSPEND, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + .type = type, + }, + }; + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_adminq_comp comp = {}; + int err; + + dev_dbg(dev, "vf%u: Suspend device\n", pds_vfio->vf_id); + + /* + * The initial suspend request to the firmware starts the device suspend + * operation and the firmware returns success if it's started + * successfully. + */ + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, true); + if (err) { + dev_err(dev, "vf%u: Suspend failed: %pe\n", pds_vfio->vf_id, + ERR_PTR(err)); + return err; + } + + /* + * The subsequent suspend status request(s) check if the firmware has + * completed the device suspend process. + */ + return pds_vfio_suspend_wait_device_cmd(pds_vfio); +} + +int pds_vfio_resume_device_cmd(struct pds_vfio_pci_device *pds_vfio, u8 type) +{ + union pds_core_adminq_cmd cmd = { + .lm_resume = { + .opcode = PDS_LM_CMD_RESUME, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + .type = type, + }, + }; + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_adminq_comp comp = {}; + + dev_dbg(dev, "vf%u: Resume device\n", pds_vfio->vf_id); + + return pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, true); +} + +int pds_vfio_get_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio, u64 *size) +{ + union pds_core_adminq_cmd cmd = { + .lm_status = { + .opcode = PDS_LM_CMD_STATUS, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + }, + }; + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_adminq_comp comp = {}; + int err; + + dev_dbg(dev, "vf%u: Get migration status\n", pds_vfio->vf_id); + + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, false); + if (err) + return err; + + *size = le64_to_cpu(comp.lm_status.size); + return 0; +} + +static int pds_vfio_dma_map_lm_file(struct device *dev, + enum dma_data_direction dir, + struct pds_vfio_lm_file *lm_file) +{ + struct pds_lm_sg_elem *sgl, *sge; + struct scatterlist *sg; + dma_addr_t sgl_addr; + size_t sgl_size; + int err; + int i; + + if (!lm_file) + return -EINVAL; + + /* dma map file pages */ + err = dma_map_sgtable(dev, &lm_file->sg_table, dir, 0); + if (err) + return err; + + lm_file->num_sge = lm_file->sg_table.nents; + + /* alloc sgl */ + sgl_size = lm_file->num_sge * sizeof(struct pds_lm_sg_elem); + sgl = kzalloc(sgl_size, GFP_KERNEL); + if (!sgl) { + err = -ENOMEM; + goto out_unmap_sgtable; + } + + /* fill sgl */ + sge = sgl; + for_each_sgtable_dma_sg(&lm_file->sg_table, sg, i) { + sge->addr = cpu_to_le64(sg_dma_address(sg)); + sge->len = cpu_to_le32(sg_dma_len(sg)); + dev_dbg(dev, "addr = %llx, len = %u\n", sge->addr, sge->len); + sge++; + } + + sgl_addr = dma_map_single(dev, sgl, sgl_size, DMA_TO_DEVICE); + if (dma_mapping_error(dev, sgl_addr)) { + err = -EIO; + goto out_free_sgl; + } + + lm_file->sgl = sgl; + lm_file->sgl_addr = sgl_addr; + + return 0; + +out_free_sgl: + kfree(sgl); +out_unmap_sgtable: + lm_file->num_sge = 0; + dma_unmap_sgtable(dev, &lm_file->sg_table, dir, 0); + return err; +} + +static void pds_vfio_dma_unmap_lm_file(struct device *dev, + enum dma_data_direction dir, + struct pds_vfio_lm_file *lm_file) +{ + if (!lm_file) + return; + + /* free sgl */ + if (lm_file->sgl) { + dma_unmap_single(dev, lm_file->sgl_addr, + lm_file->num_sge * sizeof(*lm_file->sgl), + DMA_TO_DEVICE); + kfree(lm_file->sgl); + lm_file->sgl = NULL; + lm_file->sgl_addr = DMA_MAPPING_ERROR; + lm_file->num_sge = 0; + } + + /* dma unmap file pages */ + dma_unmap_sgtable(dev, &lm_file->sg_table, dir, 0); +} + +int pds_vfio_get_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio) +{ + union pds_core_adminq_cmd cmd = { + .lm_save = { + .opcode = PDS_LM_CMD_SAVE, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + }, + }; + struct pci_dev *pdev = pds_vfio_to_pci_dev(pds_vfio); + struct device *pdsc_dev = &pci_physfn(pdev)->dev; + union pds_core_adminq_comp comp = {}; + struct pds_vfio_lm_file *lm_file; + int err; + + dev_dbg(&pdev->dev, "vf%u: Get migration state\n", pds_vfio->vf_id); + + lm_file = pds_vfio->save_file; + + err = pds_vfio_dma_map_lm_file(pdsc_dev, DMA_FROM_DEVICE, lm_file); + if (err) { + dev_err(&pdev->dev, "failed to map save migration file: %pe\n", + ERR_PTR(err)); + return err; + } + + cmd.lm_save.sgl_addr = cpu_to_le64(lm_file->sgl_addr); + cmd.lm_save.num_sge = cpu_to_le32(lm_file->num_sge); + + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, false); + if (err) + dev_err(&pdev->dev, "failed to get migration state: %pe\n", + ERR_PTR(err)); + + pds_vfio_dma_unmap_lm_file(pdsc_dev, DMA_FROM_DEVICE, lm_file); + + return err; +} + +int pds_vfio_set_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio) +{ + union pds_core_adminq_cmd cmd = { + .lm_restore = { + .opcode = PDS_LM_CMD_RESTORE, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + }, + }; + struct pci_dev *pdev = pds_vfio_to_pci_dev(pds_vfio); + struct device *pdsc_dev = &pci_physfn(pdev)->dev; + union pds_core_adminq_comp comp = {}; + struct pds_vfio_lm_file *lm_file; + int err; + + dev_dbg(&pdev->dev, "vf%u: Set migration state\n", pds_vfio->vf_id); + + lm_file = pds_vfio->restore_file; + + err = pds_vfio_dma_map_lm_file(pdsc_dev, DMA_TO_DEVICE, lm_file); + if (err) { + dev_err(&pdev->dev, + "failed to map restore migration file: %pe\n", + ERR_PTR(err)); + return err; + } + + cmd.lm_restore.sgl_addr = cpu_to_le64(lm_file->sgl_addr); + cmd.lm_restore.num_sge = cpu_to_le32(lm_file->num_sge); + + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, false); + if (err) + dev_err(&pdev->dev, "failed to set migration state: %pe\n", + ERR_PTR(err)); + + pds_vfio_dma_unmap_lm_file(pdsc_dev, DMA_TO_DEVICE, lm_file); + + return err; +} + +void pds_vfio_send_host_vf_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio, + enum pds_lm_host_vf_status vf_status) +{ + union pds_core_adminq_cmd cmd = { + .lm_host_vf_status = { + .opcode = PDS_LM_CMD_HOST_VF_STATUS, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + .status = vf_status, + }, + }; + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_adminq_comp comp = {}; + int err; + + dev_dbg(dev, "vf%u: Set host VF LM status: %u", pds_vfio->vf_id, + vf_status); + if (vf_status != PDS_LM_STA_IN_PROGRESS && + vf_status != PDS_LM_STA_NONE) { + dev_warn(dev, "Invalid host VF migration status, %d\n", + vf_status); + return; + } + + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, false); + if (err) + dev_warn(dev, "failed to send host VF migration status: %pe\n", + ERR_PTR(err)); +} diff --git a/drivers/vfio/pci/pds/cmds.h b/drivers/vfio/pci/pds/cmds.h index 4c592afccf89..1097e96d3c11 100644 --- a/drivers/vfio/pci/pds/cmds.h +++ b/drivers/vfio/pci/pds/cmds.h @@ -6,5 +6,11 @@ int pds_vfio_register_client_cmd(struct pds_vfio_pci_device *pds_vfio); void pds_vfio_unregister_client_cmd(struct pds_vfio_pci_device *pds_vfio); - +int pds_vfio_suspend_device_cmd(struct pds_vfio_pci_device *pds_vfio, u8 type); +int pds_vfio_resume_device_cmd(struct pds_vfio_pci_device *pds_vfio, u8 type); +int pds_vfio_get_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio, u64 *size); +int pds_vfio_get_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio); +int pds_vfio_set_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio); +void pds_vfio_send_host_vf_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio, + enum pds_lm_host_vf_status vf_status); #endif /* _CMDS_H_ */ diff --git a/drivers/vfio/pci/pds/lm.c b/drivers/vfio/pci/pds/lm.c new file mode 100644 index 000000000000..10c051d46851 --- /dev/null +++ b/drivers/vfio/pci/pds/lm.c @@ -0,0 +1,434 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#include +#include +#include +#include +#include +#include + +#include "vfio_dev.h" +#include "cmds.h" + +static struct pds_vfio_lm_file * +pds_vfio_get_lm_file(const struct file_operations *fops, int flags, u64 size) +{ + struct pds_vfio_lm_file *lm_file = NULL; + unsigned long long npages; + struct page **pages; + void *page_mem; + const void *p; + + if (!size) + return NULL; + + /* Alloc file structure */ + lm_file = kzalloc(sizeof(*lm_file), GFP_KERNEL); + if (!lm_file) + return NULL; + + /* Create file */ + lm_file->filep = + anon_inode_getfile("pds_vfio_lm", fops, lm_file, flags); + if (!lm_file->filep) + goto out_free_file; + + stream_open(lm_file->filep->f_inode, lm_file->filep); + mutex_init(&lm_file->lock); + + /* prevent file from being released before we are done with it */ + get_file(lm_file->filep); + + /* Allocate memory for file pages */ + npages = DIV_ROUND_UP_ULL(size, PAGE_SIZE); + pages = kmalloc_array(npages, sizeof(*pages), GFP_KERNEL); + if (!pages) + goto out_put_file; + + page_mem = kvzalloc(ALIGN(size, PAGE_SIZE), GFP_KERNEL); + if (!page_mem) + goto out_free_pages_array; + + p = page_mem - offset_in_page(page_mem); + for (unsigned long long i = 0; i < npages; i++) { + if (is_vmalloc_addr(p)) + pages[i] = vmalloc_to_page(p); + else + pages[i] = kmap_to_page((void *)p); + if (!pages[i]) + goto out_free_page_mem; + + p += PAGE_SIZE; + } + + /* Create scatterlist of file pages to use for DMA mapping later */ + if (sg_alloc_table_from_pages(&lm_file->sg_table, pages, npages, 0, + size, GFP_KERNEL)) + goto out_free_page_mem; + + lm_file->size = size; + lm_file->pages = pages; + lm_file->npages = npages; + lm_file->page_mem = page_mem; + lm_file->alloc_size = npages * PAGE_SIZE; + + return lm_file; + +out_free_page_mem: + kvfree(page_mem); +out_free_pages_array: + kfree(pages); +out_put_file: + fput(lm_file->filep); + mutex_destroy(&lm_file->lock); +out_free_file: + kfree(lm_file); + + return NULL; +} + +static void pds_vfio_put_lm_file(struct pds_vfio_lm_file *lm_file) +{ + mutex_lock(&lm_file->lock); + + lm_file->size = 0; + lm_file->alloc_size = 0; + + /* Free scatter list of file pages */ + sg_free_table(&lm_file->sg_table); + + kvfree(lm_file->page_mem); + lm_file->page_mem = NULL; + kfree(lm_file->pages); + lm_file->pages = NULL; + + mutex_unlock(&lm_file->lock); + + /* allow file to be released since we are done with it */ + fput(lm_file->filep); +} + +void pds_vfio_put_save_file(struct pds_vfio_pci_device *pds_vfio) +{ + if (!pds_vfio->save_file) + return; + + pds_vfio_put_lm_file(pds_vfio->save_file); + pds_vfio->save_file = NULL; +} + +void pds_vfio_put_restore_file(struct pds_vfio_pci_device *pds_vfio) +{ + if (!pds_vfio->restore_file) + return; + + pds_vfio_put_lm_file(pds_vfio->restore_file); + pds_vfio->restore_file = NULL; +} + +static struct page *pds_vfio_get_file_page(struct pds_vfio_lm_file *lm_file, + unsigned long offset) +{ + unsigned long cur_offset = 0; + struct scatterlist *sg; + unsigned int i; + + /* All accesses are sequential */ + if (offset < lm_file->last_offset || !lm_file->last_offset_sg) { + lm_file->last_offset = 0; + lm_file->last_offset_sg = lm_file->sg_table.sgl; + lm_file->sg_last_entry = 0; + } + + cur_offset = lm_file->last_offset; + + for_each_sg(lm_file->last_offset_sg, sg, + lm_file->sg_table.orig_nents - lm_file->sg_last_entry, i) { + if (offset < sg->length + cur_offset) { + lm_file->last_offset_sg = sg; + lm_file->sg_last_entry += i; + lm_file->last_offset = cur_offset; + return nth_page(sg_page(sg), + (offset - cur_offset) / PAGE_SIZE); + } + cur_offset += sg->length; + } + + return NULL; +} + +static int pds_vfio_release_file(struct inode *inode, struct file *filp) +{ + struct pds_vfio_lm_file *lm_file = filp->private_data; + + mutex_lock(&lm_file->lock); + lm_file->filep->f_pos = 0; + lm_file->size = 0; + mutex_unlock(&lm_file->lock); + mutex_destroy(&lm_file->lock); + kfree(lm_file); + + return 0; +} + +static ssize_t pds_vfio_save_read(struct file *filp, char __user *buf, + size_t len, loff_t *pos) +{ + struct pds_vfio_lm_file *lm_file = filp->private_data; + ssize_t done = 0; + + if (pos) + return -ESPIPE; + pos = &filp->f_pos; + + mutex_lock(&lm_file->lock); + if (*pos > lm_file->size) { + done = -EINVAL; + goto out_unlock; + } + + len = min_t(size_t, lm_file->size - *pos, len); + while (len) { + size_t page_offset; + struct page *page; + size_t page_len; + u8 *from_buff; + int err; + + page_offset = (*pos) % PAGE_SIZE; + page = pds_vfio_get_file_page(lm_file, *pos - page_offset); + if (!page) { + if (done == 0) + done = -EINVAL; + goto out_unlock; + } + + page_len = min_t(size_t, len, PAGE_SIZE - page_offset); + from_buff = kmap_local_page(page); + err = copy_to_user(buf, from_buff + page_offset, page_len); + kunmap_local(from_buff); + if (err) { + done = -EFAULT; + goto out_unlock; + } + *pos += page_len; + len -= page_len; + done += page_len; + buf += page_len; + } + +out_unlock: + mutex_unlock(&lm_file->lock); + return done; +} + +static const struct file_operations pds_vfio_save_fops = { + .owner = THIS_MODULE, + .read = pds_vfio_save_read, + .release = pds_vfio_release_file, + .llseek = no_llseek, +}; + +static int pds_vfio_get_save_file(struct pds_vfio_pci_device *pds_vfio) +{ + struct device *dev = &pds_vfio->vfio_coredev.pdev->dev; + struct pds_vfio_lm_file *lm_file; + int err; + u64 size; + + /* Get live migration state size in this state */ + err = pds_vfio_get_lm_status_cmd(pds_vfio, &size); + if (err) { + dev_err(dev, "failed to get save status: %pe\n", ERR_PTR(err)); + return err; + } + + dev_dbg(dev, "save status, size = %lld\n", size); + + if (!size) { + dev_err(dev, "invalid state size\n"); + return -EIO; + } + + lm_file = pds_vfio_get_lm_file(&pds_vfio_save_fops, O_RDONLY, size); + if (!lm_file) { + dev_err(dev, "failed to create save file\n"); + return -ENOENT; + } + + dev_dbg(dev, "size = %lld, alloc_size = %lld, npages = %lld\n", + lm_file->size, lm_file->alloc_size, lm_file->npages); + + pds_vfio->save_file = lm_file; + + return 0; +} + +static ssize_t pds_vfio_restore_write(struct file *filp, const char __user *buf, + size_t len, loff_t *pos) +{ + struct pds_vfio_lm_file *lm_file = filp->private_data; + loff_t requested_length; + ssize_t done = 0; + + if (pos) + return -ESPIPE; + + pos = &filp->f_pos; + + if (*pos < 0 || + check_add_overflow((loff_t)len, *pos, &requested_length)) + return -EINVAL; + + mutex_lock(&lm_file->lock); + + while (len) { + size_t page_offset; + struct page *page; + size_t page_len; + u8 *to_buff; + int err; + + page_offset = (*pos) % PAGE_SIZE; + page = pds_vfio_get_file_page(lm_file, *pos - page_offset); + if (!page) { + if (done == 0) + done = -EINVAL; + goto out_unlock; + } + + page_len = min_t(size_t, len, PAGE_SIZE - page_offset); + to_buff = kmap_local_page(page); + err = copy_from_user(to_buff + page_offset, buf, page_len); + kunmap_local(to_buff); + if (err) { + done = -EFAULT; + goto out_unlock; + } + *pos += page_len; + len -= page_len; + done += page_len; + buf += page_len; + lm_file->size += page_len; + } +out_unlock: + mutex_unlock(&lm_file->lock); + return done; +} + +static const struct file_operations pds_vfio_restore_fops = { + .owner = THIS_MODULE, + .write = pds_vfio_restore_write, + .release = pds_vfio_release_file, + .llseek = no_llseek, +}; + +static int pds_vfio_get_restore_file(struct pds_vfio_pci_device *pds_vfio) +{ + struct device *dev = &pds_vfio->vfio_coredev.pdev->dev; + struct pds_vfio_lm_file *lm_file; + u64 size; + + size = sizeof(union pds_lm_dev_state); + dev_dbg(dev, "restore status, size = %lld\n", size); + + if (!size) { + dev_err(dev, "invalid state size"); + return -EIO; + } + + lm_file = pds_vfio_get_lm_file(&pds_vfio_restore_fops, O_WRONLY, size); + if (!lm_file) { + dev_err(dev, "failed to create restore file"); + return -ENOENT; + } + pds_vfio->restore_file = lm_file; + + return 0; +} + +struct file * +pds_vfio_step_device_state_locked(struct pds_vfio_pci_device *pds_vfio, + enum vfio_device_mig_state next) +{ + enum vfio_device_mig_state cur = pds_vfio->state; + int err; + + if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_STOP_COPY) { + err = pds_vfio_get_save_file(pds_vfio); + if (err) + return ERR_PTR(err); + + err = pds_vfio_get_lm_state_cmd(pds_vfio); + if (err) { + pds_vfio_put_save_file(pds_vfio); + return ERR_PTR(err); + } + + return pds_vfio->save_file->filep; + } + + if (cur == VFIO_DEVICE_STATE_STOP_COPY && next == VFIO_DEVICE_STATE_STOP) { + pds_vfio_put_save_file(pds_vfio); + pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, PDS_LM_STA_NONE); + return NULL; + } + + if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_RESUMING) { + err = pds_vfio_get_restore_file(pds_vfio); + if (err) + return ERR_PTR(err); + + return pds_vfio->restore_file->filep; + } + + if (cur == VFIO_DEVICE_STATE_RESUMING && next == VFIO_DEVICE_STATE_STOP) { + err = pds_vfio_set_lm_state_cmd(pds_vfio); + if (err) + return ERR_PTR(err); + + pds_vfio_put_restore_file(pds_vfio); + return NULL; + } + + if (cur == VFIO_DEVICE_STATE_RUNNING && next == VFIO_DEVICE_STATE_RUNNING_P2P) { + pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, + PDS_LM_STA_IN_PROGRESS); + err = pds_vfio_suspend_device_cmd(pds_vfio, + PDS_LM_SUSPEND_RESUME_TYPE_P2P); + if (err) + return ERR_PTR(err); + + return NULL; + } + + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && next == VFIO_DEVICE_STATE_RUNNING) { + err = pds_vfio_resume_device_cmd(pds_vfio, + PDS_LM_SUSPEND_RESUME_TYPE_FULL); + if (err) + return ERR_PTR(err); + + pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, PDS_LM_STA_NONE); + return NULL; + } + + if (cur == VFIO_DEVICE_STATE_STOP && next == VFIO_DEVICE_STATE_RUNNING_P2P) { + err = pds_vfio_resume_device_cmd(pds_vfio, + PDS_LM_SUSPEND_RESUME_TYPE_P2P); + if (err) + return ERR_PTR(err); + + return NULL; + } + + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && next == VFIO_DEVICE_STATE_STOP) { + err = pds_vfio_suspend_device_cmd(pds_vfio, + PDS_LM_SUSPEND_RESUME_TYPE_FULL); + if (err) + return ERR_PTR(err); + return NULL; + } + + return ERR_PTR(-EINVAL); +} diff --git a/drivers/vfio/pci/pds/lm.h b/drivers/vfio/pci/pds/lm.h new file mode 100644 index 000000000000..13be893198b7 --- /dev/null +++ b/drivers/vfio/pci/pds/lm.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#ifndef _LM_H_ +#define _LM_H_ + +#include +#include +#include +#include + +#include +#include + +struct pds_vfio_lm_file { + struct file *filep; + struct mutex lock; /* protect live migration data file */ + u64 size; /* Size with valid data */ + u64 alloc_size; /* Total allocated size. Always >= len */ + void *page_mem; /* memory allocated for pages */ + struct page **pages; /* Backing pages for file */ + unsigned long long npages; + struct sg_table sg_table; /* SG table for backing pages */ + struct pds_lm_sg_elem *sgl; /* DMA mapping */ + dma_addr_t sgl_addr; + u16 num_sge; + struct scatterlist *last_offset_sg; /* Iterator */ + unsigned int sg_last_entry; + unsigned long last_offset; +}; + +struct pds_vfio_pci_device; + +struct file * +pds_vfio_step_device_state_locked(struct pds_vfio_pci_device *pds_vfio, + enum vfio_device_mig_state next); + +void pds_vfio_put_save_file(struct pds_vfio_pci_device *pds_vfio); +void pds_vfio_put_restore_file(struct pds_vfio_pci_device *pds_vfio); + +#endif /* _LM_H_ */ diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c index 928903a84f27..c1739edd261a 100644 --- a/drivers/vfio/pci/pds/pci_drv.c +++ b/drivers/vfio/pci/pds/pci_drv.c @@ -72,11 +72,24 @@ static const struct pci_device_id pds_vfio_pci_table[] = { }; MODULE_DEVICE_TABLE(pci, pds_vfio_pci_table); +static void pds_vfio_pci_aer_reset_done(struct pci_dev *pdev) +{ + struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev); + + pds_vfio_reset(pds_vfio); +} + +static const struct pci_error_handlers pds_vfio_pci_err_handlers = { + .reset_done = pds_vfio_pci_aer_reset_done, + .error_detected = vfio_pci_core_aer_err_detected, +}; + static struct pci_driver pds_vfio_pci_driver = { .name = KBUILD_MODNAME, .id_table = pds_vfio_pci_table, .probe = pds_vfio_pci_probe, .remove = pds_vfio_pci_remove, + .err_handler = &pds_vfio_pci_err_handlers, .driver_managed_dma = true, }; diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c index 5299cfb262d5..f4dfd5d5ba09 100644 --- a/drivers/vfio/pci/pds/vfio_dev.c +++ b/drivers/vfio/pci/pds/vfio_dev.c @@ -4,6 +4,7 @@ #include #include +#include "lm.h" #include "vfio_dev.h" struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio) @@ -11,6 +12,11 @@ struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio) return pds_vfio->vfio_coredev.pdev; } +struct device *pds_vfio_to_dev(struct pds_vfio_pci_device *pds_vfio) +{ + return &pds_vfio_to_pci_dev(pds_vfio)->dev; +} + struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev) { struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev); @@ -19,6 +25,98 @@ struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev) vfio_coredev); } +static void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio) +{ +again: + spin_lock(&pds_vfio->reset_lock); + if (pds_vfio->deferred_reset) { + pds_vfio->deferred_reset = false; + if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR) { + pds_vfio->state = VFIO_DEVICE_STATE_RUNNING; + pds_vfio_put_restore_file(pds_vfio); + pds_vfio_put_save_file(pds_vfio); + } + spin_unlock(&pds_vfio->reset_lock); + goto again; + } + mutex_unlock(&pds_vfio->state_mutex); + spin_unlock(&pds_vfio->reset_lock); +} + +void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio) +{ + spin_lock(&pds_vfio->reset_lock); + pds_vfio->deferred_reset = true; + if (!mutex_trylock(&pds_vfio->state_mutex)) { + spin_unlock(&pds_vfio->reset_lock); + return; + } + spin_unlock(&pds_vfio->reset_lock); + pds_vfio_state_mutex_unlock(pds_vfio); +} + +static struct file * +pds_vfio_set_device_state(struct vfio_device *vdev, + enum vfio_device_mig_state new_state) +{ + struct pds_vfio_pci_device *pds_vfio = + container_of(vdev, struct pds_vfio_pci_device, + vfio_coredev.vdev); + struct file *res = NULL; + + mutex_lock(&pds_vfio->state_mutex); + while (new_state != pds_vfio->state) { + enum vfio_device_mig_state next_state; + + int err = vfio_mig_get_next_state(vdev, pds_vfio->state, + new_state, &next_state); + if (err) { + res = ERR_PTR(err); + break; + } + + res = pds_vfio_step_device_state_locked(pds_vfio, next_state); + if (IS_ERR(res)) + break; + + pds_vfio->state = next_state; + + if (WARN_ON(res && new_state != pds_vfio->state)) { + res = ERR_PTR(-EINVAL); + break; + } + } + pds_vfio_state_mutex_unlock(pds_vfio); + + return res; +} + +static int pds_vfio_get_device_state(struct vfio_device *vdev, + enum vfio_device_mig_state *current_state) +{ + struct pds_vfio_pci_device *pds_vfio = + container_of(vdev, struct pds_vfio_pci_device, + vfio_coredev.vdev); + + mutex_lock(&pds_vfio->state_mutex); + *current_state = pds_vfio->state; + pds_vfio_state_mutex_unlock(pds_vfio); + return 0; +} + +static int pds_vfio_get_device_state_size(struct vfio_device *vdev, + unsigned long *stop_copy_length) +{ + *stop_copy_length = PDS_LM_DEVICE_STATE_LENGTH; + return 0; +} + +static const struct vfio_migration_ops pds_vfio_lm_ops = { + .migration_set_state = pds_vfio_set_device_state, + .migration_get_state = pds_vfio_get_device_state, + .migration_get_data_size = pds_vfio_get_device_state_size +}; + static int pds_vfio_init_device(struct vfio_device *vdev) { struct pds_vfio_pci_device *pds_vfio = @@ -37,6 +135,9 @@ static int pds_vfio_init_device(struct vfio_device *vdev) pds_vfio->vf_id = vf_id; + vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P; + vdev->mig_ops = &pds_vfio_lm_ops; + pci_id = PCI_DEVID(pdev->bus->number, pdev->devfn); dev_dbg(&pdev->dev, "%s: PF %#04x VF %#04x vf_id %d domain %d pds_vfio %p\n", @@ -57,17 +158,34 @@ static int pds_vfio_open_device(struct vfio_device *vdev) if (err) return err; + mutex_init(&pds_vfio->state_mutex); + pds_vfio->state = VFIO_DEVICE_STATE_RUNNING; + vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev); return 0; } +static void pds_vfio_close_device(struct vfio_device *vdev) +{ + struct pds_vfio_pci_device *pds_vfio = + container_of(vdev, struct pds_vfio_pci_device, + vfio_coredev.vdev); + + mutex_lock(&pds_vfio->state_mutex); + pds_vfio_put_restore_file(pds_vfio); + pds_vfio_put_save_file(pds_vfio); + mutex_unlock(&pds_vfio->state_mutex); + mutex_destroy(&pds_vfio->state_mutex); + vfio_pci_core_close_device(vdev); +} + static const struct vfio_device_ops pds_vfio_ops = { .name = "pds-vfio", .init = pds_vfio_init_device, .release = vfio_pci_core_release_dev, .open_device = pds_vfio_open_device, - .close_device = vfio_pci_core_close_device, + .close_device = pds_vfio_close_device, .ioctl = vfio_pci_core_ioctl, .device_feature = vfio_pci_core_ioctl_feature, .read = vfio_pci_core_read, diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h index 824832aa1513..31bd14de0c91 100644 --- a/drivers/vfio/pci/pds/vfio_dev.h +++ b/drivers/vfio/pci/pds/vfio_dev.h @@ -7,19 +7,30 @@ #include #include +#include "lm.h" + struct pdsc; struct pds_vfio_pci_device { struct vfio_pci_core_device vfio_coredev; struct pdsc *pdsc; + struct pds_vfio_lm_file *save_file; + struct pds_vfio_lm_file *restore_file; + struct mutex state_mutex; /* protect migration state */ + enum vfio_device_mig_state state; + spinlock_t reset_lock; /* protect reset_done flow */ + u8 deferred_reset; + int vf_id; u16 client_id; }; const struct vfio_device_ops *pds_vfio_ops_info(void); struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev); +void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio); struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio); +struct device *pds_vfio_to_dev(struct pds_vfio_pci_device *pds_vfio); #endif /* _VFIO_DEV_H_ */ diff --git a/include/linux/pds/pds_adminq.h b/include/linux/pds/pds_adminq.h index bcba7fda3cc9..29ac6514e421 100644 --- a/include/linux/pds/pds_adminq.h +++ b/include/linux/pds/pds_adminq.h @@ -818,6 +818,194 @@ struct pds_vdpa_set_features_cmd { __le64 features; }; +#define PDS_LM_DEVICE_STATE_LENGTH 65536 +#define PDS_LM_CHECK_DEVICE_STATE_LENGTH(X) \ + PDS_CORE_SIZE_CHECK(union, PDS_LM_DEVICE_STATE_LENGTH, X) + +/* + * enum pds_lm_cmd_opcode - Live Migration Device commands + */ +enum pds_lm_cmd_opcode { + PDS_LM_CMD_HOST_VF_STATUS = 1, + + /* Device state commands */ + PDS_LM_CMD_STATUS = 16, + PDS_LM_CMD_SUSPEND = 18, + PDS_LM_CMD_SUSPEND_STATUS = 19, + PDS_LM_CMD_RESUME = 20, + PDS_LM_CMD_SAVE = 21, + PDS_LM_CMD_RESTORE = 22, +}; + +/** + * struct pds_lm_cmd - generic command + * @opcode: Opcode + * @rsvd: Word boundary padding + * @vf_id: VF id + * @rsvd2: Structure padding to 60 Bytes + */ +struct pds_lm_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 rsvd2[56]; +}; + +/** + * struct pds_lm_status_cmd - STATUS command + * @opcode: Opcode + * @rsvd: Word boundary padding + * @vf_id: VF id + */ +struct pds_lm_status_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; +}; + +/** + * struct pds_lm_status_comp - STATUS command completion + * @status: Status of the command (enum pds_core_status_code) + * @rsvd: Word boundary padding + * @comp_index: Index in the desc ring for which this is the completion + * @size: Size of the device state + * @rsvd2: Word boundary padding + * @color: Color bit + */ +struct pds_lm_status_comp { + u8 status; + u8 rsvd; + __le16 comp_index; + union { + __le64 size; + u8 rsvd2[11]; + } __packed; + u8 color; +}; + +enum pds_lm_suspend_resume_type { + PDS_LM_SUSPEND_RESUME_TYPE_FULL = 0, + PDS_LM_SUSPEND_RESUME_TYPE_P2P = 1, +}; + +/** + * struct pds_lm_suspend_cmd - SUSPEND command + * @opcode: Opcode PDS_LM_CMD_SUSPEND + * @rsvd: Word boundary padding + * @vf_id: VF id + * @type: Type of suspend (enum pds_lm_suspend_resume_type) + */ +struct pds_lm_suspend_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 type; +}; + +/** + * struct pds_lm_suspend_status_cmd - SUSPEND status command + * @opcode: Opcode PDS_AQ_CMD_LM_SUSPEND_STATUS + * @rsvd: Word boundary padding + * @vf_id: VF id + * @type: Type of suspend (enum pds_lm_suspend_resume_type) + */ +struct pds_lm_suspend_status_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 type; +}; + +/** + * struct pds_lm_resume_cmd - RESUME command + * @opcode: Opcode PDS_LM_CMD_RESUME + * @rsvd: Word boundary padding + * @vf_id: VF id + * @type: Type of resume (enum pds_lm_suspend_resume_type) + */ +struct pds_lm_resume_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 type; +}; + +/** + * struct pds_lm_sg_elem - Transmit scatter-gather (SG) descriptor element + * @addr: DMA address of SG element data buffer + * @len: Length of SG element data buffer, in bytes + * @rsvd: Word boundary padding + */ +struct pds_lm_sg_elem { + __le64 addr; + __le32 len; + __le16 rsvd[2]; +}; + +/** + * struct pds_lm_save_cmd - SAVE command + * @opcode: Opcode PDS_LM_CMD_SAVE + * @rsvd: Word boundary padding + * @vf_id: VF id + * @rsvd2: Word boundary padding + * @sgl_addr: IOVA address of the SGL to dma the device state + * @num_sge: Total number of SG elements + */ +struct pds_lm_save_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 rsvd2[4]; + __le64 sgl_addr; + __le32 num_sge; +} __packed; + +/** + * struct pds_lm_restore_cmd - RESTORE command + * @opcode: Opcode PDS_LM_CMD_RESTORE + * @rsvd: Word boundary padding + * @vf_id: VF id + * @rsvd2: Word boundary padding + * @sgl_addr: IOVA address of the SGL to dma the device state + * @num_sge: Total number of SG elements + */ +struct pds_lm_restore_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 rsvd2[4]; + __le64 sgl_addr; + __le32 num_sge; +} __packed; + +/** + * union pds_lm_dev_state - device state information + * @words: Device state words + */ +union pds_lm_dev_state { + __le32 words[PDS_LM_DEVICE_STATE_LENGTH / sizeof(__le32)]; +}; + +enum pds_lm_host_vf_status { + PDS_LM_STA_NONE = 0, + PDS_LM_STA_IN_PROGRESS, + PDS_LM_STA_MAX, +}; + +/** + * struct pds_lm_host_vf_status_cmd - HOST_VF_STATUS command + * @opcode: Opcode PDS_LM_CMD_HOST_VF_STATUS + * @rsvd: Word boundary padding + * @vf_id: VF id + * @status: Current LM status of host VF driver (enum pds_lm_host_status) + */ +struct pds_lm_host_vf_status_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 status; +}; + union pds_core_adminq_cmd { u8 opcode; u8 bytes[64]; @@ -844,6 +1032,13 @@ union pds_core_adminq_cmd { struct pds_vdpa_vq_init_cmd vdpa_vq_init; struct pds_vdpa_vq_reset_cmd vdpa_vq_reset; + struct pds_lm_suspend_cmd lm_suspend; + struct pds_lm_suspend_status_cmd lm_suspend_status; + struct pds_lm_resume_cmd lm_resume; + struct pds_lm_status_cmd lm_status; + struct pds_lm_save_cmd lm_save; + struct pds_lm_restore_cmd lm_restore; + struct pds_lm_host_vf_status_cmd lm_host_vf_status; }; union pds_core_adminq_comp { @@ -868,6 +1063,8 @@ union pds_core_adminq_comp { struct pds_vdpa_vq_init_comp vdpa_vq_init; struct pds_vdpa_vq_reset_comp vdpa_vq_reset; + + struct pds_lm_status_comp lm_status; }; #ifndef __CHECKER__ From patchwork Wed Jul 19 22:35:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brett Creeley X-Patchwork-Id: 13319605 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EB1B622EED for ; Wed, 19 Jul 2023 22:35:59 +0000 (UTC) Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on2062a.outbound.protection.outlook.com [IPv6:2a01:111:f400:7e89::62a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E99EF2101; Wed, 19 Jul 2023 15:35:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Oxo8C4b39BksxshqNuUPgKybTzy/wFXQxcDIDif61D9pvC54C6SVd/YIFeztuae9N4TtobbU+Nqsiqm5Oh+6GTd6A5wQsD0mayGdCSSZieiKrb9/LU4ON3hnmhQL9gfQ/y5q+IuNng+sf3PSPSqRUijhPKLdkmb1mNZuIyxhionJKaViqRlE4BfRAsRZ8oOAEQSnOJaRb9Aiea1SQi6v8P4Y2c2dxffQkQg7cZAcQqSje8OPBSgtnQR9QhFgv0mh07yqUaCUZrO+s3T3OkpmxMtjtUdrtXGDZzMw8Nd8fE0KJinI0Kb/AlnNRnlIv0n9IKSMzhO2NKtVmzQ6fom3fg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=VRrYpU0IGZaF9uULMB0s+w1LhNbBNcK2AetTZPrK+IQ=; b=I9BReCQymPb5MsVC4Wjir4G32pe13J/sp3NTn42NpbtU7H7fMF91J/O1eshO1kQ6jkBmhc0B/U6gmva8CgFkJEUo2gM8bAzYZhxYXyKjKeoWnxR8RSVWFALc9+QzfougGphfAamZgGcfRSOzgvLBx6BpNF9k1RhIvb42ckV5818hg9ZFlIEYd0yOVCLEFza+KWT7yCVr6PNXOuosDp9PZGxKwaCpNDJ1ccfvZr34yAf1AI4Pi+oa+WI9nmqH79f2HHjzi6b6L0EQ5h1uF5yhmFjnHG8ZQRKFkVCPXsajqUjDa4DwecwxgiukUsi6vqZ+qpSArcoJMUgHXQUECuQkCQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=VRrYpU0IGZaF9uULMB0s+w1LhNbBNcK2AetTZPrK+IQ=; b=cQ/dHQbrAgtvQnUOgTGZ5UeCbXLOso/Cj5+kMUV/7i4uFybXWvL7uQtD00lSAaIvjYKA9p5253qgt2BJl3Q2sV5gKCgCjq7edLJodb8HKZ7bOJDIhj8lyrPj/oSrAbcQfKYjtHSi/dmyifN5Gs9FYHWd/e5xc8t35BXBZwTlvPI= Received: from BN0PR08CA0013.namprd08.prod.outlook.com (2603:10b6:408:142::21) by MW4PR12MB7032.namprd12.prod.outlook.com (2603:10b6:303:1e9::12) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.23; Wed, 19 Jul 2023 22:35:51 +0000 Received: from BN8NAM11FT107.eop-nam11.prod.protection.outlook.com (2603:10b6:408:142:cafe::b6) by BN0PR08CA0013.outlook.office365.com (2603:10b6:408:142::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24 via Frontend Transport; Wed, 19 Jul 2023 22:35:50 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT107.mail.protection.outlook.com (10.13.176.149) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6609.25 via Frontend Transport; Wed, 19 Jul 2023 22:35:50 +0000 Received: from driver-dev1.pensando.io (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 19 Jul 2023 17:35:49 -0500 From: Brett Creeley To: , , , , , , CC: , Subject: [PATCH v12 vfio 5/7] vfio/pds: Add support for dirty page tracking Date: Wed, 19 Jul 2023 15:35:25 -0700 Message-ID: <20230719223527.12795-6-brett.creeley@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230719223527.12795-1-brett.creeley@amd.com> References: <20230719223527.12795-1-brett.creeley@amd.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT107:EE_|MW4PR12MB7032:EE_ X-MS-Office365-Filtering-Correlation-Id: dc3a207c-6317-4a10-18f4-08db88a88138 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 8eXLVlS9hKQtdWhyBrY0iXyXGF2kJmsPYv5EQSE1YUI3BCZjpn9TWx2jB2RnC8DNRnxAg/8UojH9qiNXqbS/Lx9F/MkLFUfUVS5USldEVgLZMrZ4KOu1PaWLkU/3IQ3soNCKFxROeB9tXGqPhjiE65JBKkuyWIEBwOvaNObxDUat7kT4orU6f3oUjcIeXM3RmwD46P+sBfcEEXPrizrGFSVJxBc1wurbLbII+mllCfRf+4hgB++BMZAAKFLtc7WAo8FGdlIFPqzb0x3P2ZGm69tB7mYWogBsee70E3+t25ZmKr9YpBuj7cXRyPCr21zpb6CwdNNlDB5jP+Q30DkufYCTrAnYdeB8dHnhbpmWiHuTFMyivxaC+TXXNdnnmKs/AWZCbqPDq/OEq0NbA7ps/LSkrtyKs7eVLeN6KTUj1Ahhh1TLQL7RXa+Q7wEhEIxk7JmWnZLDh6kyOccKG4IFtdfBs0HpsZ7Pq/LrQ/aUaLAWLd1Xm6q2xxDKN+V0PPotqczyHWQFCMvRiVdzZtL2GGuyk0y1FDK3Bh8BUmcGnDLuAglnXsM7Rvm7IcGWIYgihy/NcEpsl2VdJm03jeOD5BMBMJ7BL2RrJAAZ3u+Gs83iExepgo7oz7oyUZzeFA3ATSJOO2vXlXhXRbf/6kWGiA8X/hIaSXpFz9I4xuuGcnaN4ubRY5Lvcxl4JDMtF1DDs0OWmW6lVScl57iaLURqiuVeTAwmbIxlH2NBkT3TJZz7Kkx9T4ThzTfOf5PC3zYleJukKV4rFunfspRrNdUxpZMVwMM0WSEETAwfThpMksc= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230028)(4636009)(346002)(376002)(396003)(136003)(39860400002)(82310400008)(451199021)(46966006)(36840700001)(40470700004)(316002)(41300700001)(4326008)(5660300002)(8676002)(8936002)(426003)(36860700001)(2616005)(83380400001)(47076005)(86362001)(81166007)(40480700001)(82740400003)(356005)(70206006)(70586007)(336012)(36756003)(186003)(26005)(40460700003)(16526019)(110136005)(54906003)(6666004)(1076003)(478600001)(44832011)(2906002)(30864003)(14143004)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Jul 2023 22:35:50.6408 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: dc3a207c-6317-4a10-18f4-08db88a88138 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT107.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW4PR12MB7032 X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FORGED_SPF_HELO, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_NONE,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net In order to support dirty page tracking, the driver has to implement the VFIO subsystem's vfio_log_ops. This includes log_start, log_stop, and log_read_and_clear. All of the tracker resources are allocated and dirty tracking on the device is started during log_start. The resources are cleaned up and dirty tracking on the device is stopped during log_stop. The dirty pages are determined and reported during log_read_and_clear. In order to support these callbacks admin queue commands are used. All of the adminq queue command structures and implementations are included as part of this patch. PDS_LM_CMD_DIRTY_STATUS is added to query the current status of dirty tracking on the device. This includes if it's enabled (i.e. number of regions being tracked from the device's perspective) and the maximum number of regions supported from the device's perspective. PDS_LM_CMD_DIRTY_ENABLE is added to enable dirty tracking on the specified number of regions and their iova ranges. PDS_LM_CMD_DIRTY_DISABLE is added to disable dirty tracking for all regions on the device. PDS_LM_CMD_READ_SEQ and PDS_LM_CMD_DIRTY_WRITE_ACK are added to support reading and acknowledging the currently dirtied pages. Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson --- drivers/vfio/pci/pds/Makefile | 1 + drivers/vfio/pci/pds/cmds.c | 125 +++++++ drivers/vfio/pci/pds/cmds.h | 9 + drivers/vfio/pci/pds/dirty.c | 576 ++++++++++++++++++++++++++++++++ drivers/vfio/pci/pds/dirty.h | 39 +++ drivers/vfio/pci/pds/lm.c | 2 +- drivers/vfio/pci/pds/vfio_dev.c | 12 +- drivers/vfio/pci/pds/vfio_dev.h | 4 + include/linux/pds/pds_adminq.h | 178 ++++++++++ 9 files changed, 944 insertions(+), 2 deletions(-) create mode 100644 drivers/vfio/pci/pds/dirty.c create mode 100644 drivers/vfio/pci/pds/dirty.h diff --git a/drivers/vfio/pci/pds/Makefile b/drivers/vfio/pci/pds/Makefile index 47750bb31ea2..d5a06d81634f 100644 --- a/drivers/vfio/pci/pds/Makefile +++ b/drivers/vfio/pci/pds/Makefile @@ -5,6 +5,7 @@ obj-$(CONFIG_PDS_VFIO_PCI) += pds-vfio-pci.o pds-vfio-pci-y := \ cmds.o \ + dirty.o \ lm.o \ pci_drv.o \ vfio_dev.o diff --git a/drivers/vfio/pci/pds/cmds.c b/drivers/vfio/pci/pds/cmds.c index be034cc4043f..eeacdee3f2b5 100644 --- a/drivers/vfio/pci/pds/cmds.c +++ b/drivers/vfio/pci/pds/cmds.c @@ -366,3 +366,128 @@ void pds_vfio_send_host_vf_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio, dev_warn(dev, "failed to send host VF migration status: %pe\n", ERR_PTR(err)); } + +int pds_vfio_dirty_status_cmd(struct pds_vfio_pci_device *pds_vfio, + u64 regions_dma, u8 *max_regions, u8 *num_regions) +{ + union pds_core_adminq_cmd cmd = { + .lm_dirty_status = { + .opcode = PDS_LM_CMD_DIRTY_STATUS, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + }, + }; + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_adminq_comp comp = {}; + int err; + + dev_dbg(dev, "vf%u: Dirty status\n", pds_vfio->vf_id); + + cmd.lm_dirty_status.regions_dma = cpu_to_le64(regions_dma); + cmd.lm_dirty_status.max_regions = *max_regions; + + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, false); + if (err) { + dev_err(dev, "failed to get dirty status: %pe\n", ERR_PTR(err)); + return err; + } + + /* only support seq_ack approach for now */ + if (!(le32_to_cpu(comp.lm_dirty_status.bmp_type_mask) & + BIT(PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK))) { + dev_err(dev, "Dirty bitmap tracking SEQ_ACK not supported\n"); + return -EOPNOTSUPP; + } + + *num_regions = comp.lm_dirty_status.num_regions; + *max_regions = comp.lm_dirty_status.max_regions; + + dev_dbg(dev, + "Page Tracking Status command successful, max_regions: %d, num_regions: %d, bmp_type: %s\n", + *max_regions, *num_regions, "PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK"); + + return 0; +} + +int pds_vfio_dirty_enable_cmd(struct pds_vfio_pci_device *pds_vfio, + u64 regions_dma, u8 num_regions) +{ + union pds_core_adminq_cmd cmd = { + .lm_dirty_enable = { + .opcode = PDS_LM_CMD_DIRTY_ENABLE, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + .regions_dma = cpu_to_le64(regions_dma), + .bmp_type = PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK, + .num_regions = num_regions, + }, + }; + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_adminq_comp comp = {}; + int err; + + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, false); + if (err) { + dev_err(dev, "failed dirty tracking enable: %pe\n", + ERR_PTR(err)); + return err; + } + + return 0; +} + +int pds_vfio_dirty_disable_cmd(struct pds_vfio_pci_device *pds_vfio) +{ + union pds_core_adminq_cmd cmd = { + .lm_dirty_disable = { + .opcode = PDS_LM_CMD_DIRTY_DISABLE, + .vf_id = cpu_to_le16(pds_vfio->vf_id), + }, + }; + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_adminq_comp comp = {}; + int err; + + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, false); + if (err || comp.lm_dirty_status.num_regions != 0) { + /* in case num_regions is still non-zero after disable */ + err = err ? err : -EIO; + dev_err(dev, + "failed dirty tracking disable: %pe, num_regions %d\n", + ERR_PTR(err), comp.lm_dirty_status.num_regions); + return err; + } + + return 0; +} + +int pds_vfio_dirty_seq_ack_cmd(struct pds_vfio_pci_device *pds_vfio, + u64 sgl_dma, u16 num_sge, u32 offset, + u32 total_len, bool read_seq) +{ + const char *cmd_type_str = read_seq ? "read_seq" : "write_ack"; + union pds_core_adminq_cmd cmd = { + .lm_dirty_seq_ack = { + .vf_id = cpu_to_le16(pds_vfio->vf_id), + .len_bytes = cpu_to_le32(total_len), + .off_bytes = cpu_to_le32(offset), + .sgl_addr = cpu_to_le64(sgl_dma), + .num_sge = cpu_to_le16(num_sge), + }, + }; + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_adminq_comp comp = {}; + int err; + + if (read_seq) + cmd.lm_dirty_seq_ack.opcode = PDS_LM_CMD_DIRTY_READ_SEQ; + else + cmd.lm_dirty_seq_ack.opcode = PDS_LM_CMD_DIRTY_WRITE_ACK; + + err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, false); + if (err) { + dev_err(dev, "failed cmd Page Tracking %s: %pe\n", cmd_type_str, + ERR_PTR(err)); + return err; + } + + return 0; +} diff --git a/drivers/vfio/pci/pds/cmds.h b/drivers/vfio/pci/pds/cmds.h index 1097e96d3c11..8d20118fb4a6 100644 --- a/drivers/vfio/pci/pds/cmds.h +++ b/drivers/vfio/pci/pds/cmds.h @@ -13,4 +13,13 @@ int pds_vfio_get_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio); int pds_vfio_set_lm_state_cmd(struct pds_vfio_pci_device *pds_vfio); void pds_vfio_send_host_vf_lm_status_cmd(struct pds_vfio_pci_device *pds_vfio, enum pds_lm_host_vf_status vf_status); +int pds_vfio_dirty_status_cmd(struct pds_vfio_pci_device *pds_vfio, + u64 regions_dma, u8 *max_regions, + u8 *num_regions); +int pds_vfio_dirty_enable_cmd(struct pds_vfio_pci_device *pds_vfio, + u64 regions_dma, u8 num_regions); +int pds_vfio_dirty_disable_cmd(struct pds_vfio_pci_device *pds_vfio); +int pds_vfio_dirty_seq_ack_cmd(struct pds_vfio_pci_device *pds_vfio, + u64 sgl_dma, u16 num_sge, u32 offset, + u32 total_len, bool read_seq); #endif /* _CMDS_H_ */ diff --git a/drivers/vfio/pci/pds/dirty.c b/drivers/vfio/pci/pds/dirty.c new file mode 100644 index 000000000000..7ec8c9f16d3b --- /dev/null +++ b/drivers/vfio/pci/pds/dirty.c @@ -0,0 +1,576 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#include +#include + +#include +#include +#include + +#include "vfio_dev.h" +#include "cmds.h" +#include "dirty.h" + +#define READ_SEQ true +#define WRITE_ACK false + +bool pds_vfio_dirty_is_enabled(struct pds_vfio_pci_device *pds_vfio) +{ + return pds_vfio->dirty.is_enabled; +} + +void pds_vfio_dirty_set_enabled(struct pds_vfio_pci_device *pds_vfio) +{ + pds_vfio->dirty.is_enabled = true; +} + +void pds_vfio_dirty_set_disabled(struct pds_vfio_pci_device *pds_vfio) +{ + pds_vfio->dirty.is_enabled = false; +} + +static void +pds_vfio_print_guest_region_info(struct pds_vfio_pci_device *pds_vfio, + u8 max_regions) +{ + int len = max_regions * sizeof(struct pds_lm_dirty_region_info); + struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev; + struct device *pdsc_dev = &pci_physfn(pdev)->dev; + struct pds_lm_dirty_region_info *region_info; + dma_addr_t regions_dma; + u8 num_regions; + int err; + + region_info = kcalloc(max_regions, + sizeof(struct pds_lm_dirty_region_info), + GFP_KERNEL); + if (!region_info) + return; + + regions_dma = + dma_map_single(pdsc_dev, region_info, len, DMA_FROM_DEVICE); + if (dma_mapping_error(pdsc_dev, regions_dma)) + goto out_free_region_info; + + err = pds_vfio_dirty_status_cmd(pds_vfio, regions_dma, &max_regions, + &num_regions); + dma_unmap_single(pdsc_dev, regions_dma, len, DMA_FROM_DEVICE); + if (err) + goto out_free_region_info; + + for (unsigned int i = 0; i < num_regions; i++) + dev_dbg(&pdev->dev, + "region_info[%d]: dma_base 0x%llx page_count %u page_size_log2 %u\n", + i, le64_to_cpu(region_info[i].dma_base), + le32_to_cpu(region_info[i].page_count), + region_info[i].page_size_log2); + +out_free_region_info: + kfree(region_info); +} + +static int pds_vfio_dirty_alloc_bitmaps(struct pds_vfio_dirty *dirty, + unsigned long bytes) +{ + unsigned long *host_seq_bmp, *host_ack_bmp; + + host_seq_bmp = vzalloc(bytes); + if (!host_seq_bmp) + return -ENOMEM; + + host_ack_bmp = vzalloc(bytes); + if (!host_ack_bmp) { + bitmap_free(host_seq_bmp); + return -ENOMEM; + } + + dirty->host_seq.bmp = host_seq_bmp; + dirty->host_ack.bmp = host_ack_bmp; + + return 0; +} + +static void pds_vfio_dirty_free_bitmaps(struct pds_vfio_dirty *dirty) +{ + if (dirty->host_seq.bmp) + vfree(dirty->host_seq.bmp); + if (dirty->host_ack.bmp) + vfree(dirty->host_ack.bmp); + + dirty->host_seq.bmp = NULL; + dirty->host_ack.bmp = NULL; +} + +static void __pds_vfio_dirty_free_sgl(struct pds_vfio_pci_device *pds_vfio, + struct pds_vfio_bmp_info *bmp_info) +{ + struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev; + struct device *pdsc_dev = &pci_physfn(pdev)->dev; + + dma_unmap_single(pdsc_dev, bmp_info->sgl_addr, + bmp_info->num_sge * sizeof(struct pds_lm_sg_elem), + DMA_BIDIRECTIONAL); + kfree(bmp_info->sgl); + + bmp_info->num_sge = 0; + bmp_info->sgl = NULL; + bmp_info->sgl_addr = 0; +} + +static void pds_vfio_dirty_free_sgl(struct pds_vfio_pci_device *pds_vfio) +{ + if (pds_vfio->dirty.host_seq.sgl) + __pds_vfio_dirty_free_sgl(pds_vfio, &pds_vfio->dirty.host_seq); + if (pds_vfio->dirty.host_ack.sgl) + __pds_vfio_dirty_free_sgl(pds_vfio, &pds_vfio->dirty.host_ack); +} + +static int __pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio, + struct pds_vfio_bmp_info *bmp_info, + u32 page_count) +{ + struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev; + struct device *pdsc_dev = &pci_physfn(pdev)->dev; + struct pds_lm_sg_elem *sgl; + dma_addr_t sgl_addr; + size_t sgl_size; + u32 max_sge; + + max_sge = DIV_ROUND_UP(page_count, PAGE_SIZE * 8); + sgl_size = max_sge * sizeof(struct pds_lm_sg_elem); + + sgl = kzalloc(sgl_size, GFP_KERNEL); + if (!sgl) + return -ENOMEM; + + sgl_addr = dma_map_single(pdsc_dev, sgl, sgl_size, DMA_BIDIRECTIONAL); + if (dma_mapping_error(pdsc_dev, sgl_addr)) { + kfree(sgl); + return -EIO; + } + + bmp_info->sgl = sgl; + bmp_info->num_sge = max_sge; + bmp_info->sgl_addr = sgl_addr; + + return 0; +} + +static int pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio, + u32 page_count) +{ + struct pds_vfio_dirty *dirty = &pds_vfio->dirty; + int err; + + err = __pds_vfio_dirty_alloc_sgl(pds_vfio, &dirty->host_seq, + page_count); + if (err) + return err; + + err = __pds_vfio_dirty_alloc_sgl(pds_vfio, &dirty->host_ack, + page_count); + if (err) { + __pds_vfio_dirty_free_sgl(pds_vfio, &dirty->host_seq); + return err; + } + + return 0; +} + +static int pds_vfio_dirty_enable(struct pds_vfio_pci_device *pds_vfio, + struct rb_root_cached *ranges, u32 nnodes, + u64 *page_size) +{ + struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev; + struct device *pdsc_dev = &pci_physfn(pdev)->dev; + struct pds_vfio_dirty *dirty = &pds_vfio->dirty; + u64 region_start, region_size, region_page_size; + struct pds_lm_dirty_region_info *region_info; + struct interval_tree_node *node = NULL; + u8 max_regions = 0, num_regions; + dma_addr_t regions_dma = 0; + u32 num_ranges = nnodes; + u32 page_count; + u16 len; + int err; + + dev_dbg(&pdev->dev, "vf%u: Start dirty page tracking\n", + pds_vfio->vf_id); + + if (pds_vfio_dirty_is_enabled(pds_vfio)) + return -EINVAL; + + pds_vfio_dirty_set_enabled(pds_vfio); + + /* find if dirty tracking is disabled, i.e. num_regions == 0 */ + err = pds_vfio_dirty_status_cmd(pds_vfio, 0, &max_regions, + &num_regions); + if (err < 0) { + dev_err(&pdev->dev, "Failed to get dirty status, err %pe\n", + ERR_PTR(err)); + goto out_set_disabled; + } else if (num_regions) { + dev_err(&pdev->dev, + "Dirty tracking already enabled for %d regions\n", + num_regions); + err = -EEXIST; + goto out_set_disabled; + } else if (!max_regions) { + dev_err(&pdev->dev, + "Device doesn't support dirty tracking, max_regions %d\n", + max_regions); + err = -EOPNOTSUPP; + goto out_set_disabled; + } + + /* + * Only support 1 region for now. If there are any large gaps in the + * VM's address regions, then this would be a waste of memory as we are + * generating 2 bitmaps (ack/seq) from the min address to the max + * address of the VM's address regions. In the future, if we support + * more than one region in the device/driver we can split the bitmaps + * on the largest address region gaps. We can do this split up to the + * max_regions times returned from the dirty_status command. + */ + max_regions = 1; + if (num_ranges > max_regions) { + vfio_combine_iova_ranges(ranges, nnodes, max_regions); + num_ranges = max_regions; + } + + node = interval_tree_iter_first(ranges, 0, ULONG_MAX); + if (!node) { + err = -EINVAL; + goto out_set_disabled; + } + + region_size = node->last - node->start + 1; + region_start = node->start; + region_page_size = *page_size; + + len = sizeof(*region_info); + region_info = kzalloc(len, GFP_KERNEL); + if (!region_info) { + err = -ENOMEM; + goto out_set_disabled; + } + + page_count = DIV_ROUND_UP(region_size, region_page_size); + + region_info->dma_base = cpu_to_le64(region_start); + region_info->page_count = cpu_to_le32(page_count); + region_info->page_size_log2 = ilog2(region_page_size); + + regions_dma = dma_map_single(pdsc_dev, (void *)region_info, len, + DMA_BIDIRECTIONAL); + if (dma_mapping_error(pdsc_dev, regions_dma)) { + err = -ENOMEM; + goto out_free_region_info; + } + + err = pds_vfio_dirty_enable_cmd(pds_vfio, regions_dma, max_regions); + dma_unmap_single(pdsc_dev, regions_dma, len, DMA_BIDIRECTIONAL); + if (err) + goto out_free_region_info; + + /* + * page_count might be adjusted by the device, + * update it before freeing region_info DMA + */ + page_count = le32_to_cpu(region_info->page_count); + + dev_dbg(&pdev->dev, + "region_info: regions_dma 0x%llx dma_base 0x%llx page_count %u page_size_log2 %u\n", + regions_dma, region_start, page_count, + (u8)ilog2(region_page_size)); + + err = pds_vfio_dirty_alloc_bitmaps(dirty, page_count / BITS_PER_BYTE); + if (err) { + dev_err(&pdev->dev, "Failed to alloc dirty bitmaps: %pe\n", + ERR_PTR(err)); + goto out_free_region_info; + } + + err = pds_vfio_dirty_alloc_sgl(pds_vfio, page_count); + if (err) { + dev_err(&pdev->dev, "Failed to alloc dirty sg lists: %pe\n", + ERR_PTR(err)); + goto out_free_bitmaps; + } + + dirty->region_start = region_start; + dirty->region_size = region_size; + dirty->region_page_size = region_page_size; + + pds_vfio_print_guest_region_info(pds_vfio, max_regions); + + kfree(region_info); + + return 0; + +out_free_bitmaps: + pds_vfio_dirty_free_bitmaps(dirty); +out_free_region_info: + kfree(region_info); +out_set_disabled: + pds_vfio_dirty_set_disabled(pds_vfio); + return err; +} + +void pds_vfio_dirty_disable(struct pds_vfio_pci_device *pds_vfio, bool send_cmd) +{ + if (pds_vfio_dirty_is_enabled(pds_vfio)) { + pds_vfio_dirty_set_disabled(pds_vfio); + if (send_cmd) + pds_vfio_dirty_disable_cmd(pds_vfio); + pds_vfio_dirty_free_sgl(pds_vfio); + pds_vfio_dirty_free_bitmaps(&pds_vfio->dirty); + } + + if (send_cmd) + pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, PDS_LM_STA_NONE); +} + +static int pds_vfio_dirty_seq_ack(struct pds_vfio_pci_device *pds_vfio, + struct pds_vfio_bmp_info *bmp_info, + u32 offset, u32 bmp_bytes, bool read_seq) +{ + const char *bmp_type_str = read_seq ? "read_seq" : "write_ack"; + u8 dma_dir = read_seq ? DMA_FROM_DEVICE : DMA_TO_DEVICE; + struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev; + struct device *pdsc_dev = &pci_physfn(pdev)->dev; + unsigned long long npages; + struct sg_table sg_table; + struct scatterlist *sg; + struct page **pages; + u32 page_offset; + const void *bmp; + size_t size; + u16 num_sge; + int err; + int i; + + bmp = (void *)((u64)bmp_info->bmp + offset); + page_offset = offset_in_page(bmp); + bmp -= page_offset; + + /* + * Start and end of bitmap section to seq/ack might not be page + * aligned, so use the page_offset to account for that so there + * will be enough pages to represent the bmp_bytes + */ + npages = DIV_ROUND_UP_ULL(bmp_bytes + page_offset, PAGE_SIZE); + pages = kmalloc_array(npages, sizeof(*pages), GFP_KERNEL); + if (!pages) + return -ENOMEM; + + for (unsigned long long i = 0; i < npages; i++) { + struct page *page = vmalloc_to_page(bmp); + + if (!page) { + err = -EFAULT; + goto out_free_pages; + } + + pages[i] = page; + bmp += PAGE_SIZE; + } + + err = sg_alloc_table_from_pages(&sg_table, pages, npages, page_offset, + bmp_bytes, GFP_KERNEL); + if (err) + goto out_free_pages; + + err = dma_map_sgtable(pdsc_dev, &sg_table, dma_dir, 0); + if (err) + goto out_free_sg_table; + + for_each_sgtable_dma_sg(&sg_table, sg, i) { + struct pds_lm_sg_elem *sg_elem = &bmp_info->sgl[i]; + + sg_elem->addr = cpu_to_le64(sg_dma_address(sg)); + sg_elem->len = cpu_to_le32(sg_dma_len(sg)); + } + + num_sge = sg_table.nents; + size = num_sge * sizeof(struct pds_lm_sg_elem); + dma_sync_single_for_device(pdsc_dev, bmp_info->sgl_addr, size, dma_dir); + err = pds_vfio_dirty_seq_ack_cmd(pds_vfio, bmp_info->sgl_addr, num_sge, + offset, bmp_bytes, read_seq); + if (err) + dev_err(&pdev->dev, + "Dirty bitmap %s failed offset %u bmp_bytes %u num_sge %u DMA 0x%llx: %pe\n", + bmp_type_str, offset, bmp_bytes, + num_sge, bmp_info->sgl_addr, ERR_PTR(err)); + dma_sync_single_for_cpu(pdsc_dev, bmp_info->sgl_addr, size, dma_dir); + + dma_unmap_sgtable(pdsc_dev, &sg_table, dma_dir, 0); +out_free_sg_table: + sg_free_table(&sg_table); +out_free_pages: + kfree(pages); + + return err; +} + +static int pds_vfio_dirty_write_ack(struct pds_vfio_pci_device *pds_vfio, + u32 offset, u32 len) +{ + return pds_vfio_dirty_seq_ack(pds_vfio, &pds_vfio->dirty.host_ack, + offset, len, WRITE_ACK); +} + +static int pds_vfio_dirty_read_seq(struct pds_vfio_pci_device *pds_vfio, + u32 offset, u32 len) +{ + return pds_vfio_dirty_seq_ack(pds_vfio, &pds_vfio->dirty.host_seq, + offset, len, READ_SEQ); +} + +static int pds_vfio_dirty_process_bitmaps(struct pds_vfio_pci_device *pds_vfio, + struct iova_bitmap *dirty_bitmap, + u32 bmp_offset, u32 len_bytes) +{ + u64 page_size = pds_vfio->dirty.region_page_size; + u64 region_start = pds_vfio->dirty.region_start; + u32 bmp_offset_bit; + __le64 *seq, *ack; + int dword_count; + + dword_count = len_bytes / sizeof(u64); + seq = (__le64 *)((u64)pds_vfio->dirty.host_seq.bmp + bmp_offset); + ack = (__le64 *)((u64)pds_vfio->dirty.host_ack.bmp + bmp_offset); + bmp_offset_bit = bmp_offset * 8; + + for (int i = 0; i < dword_count; i++) { + u64 xor = le64_to_cpu(seq[i]) ^ le64_to_cpu(ack[i]); + + /* prepare for next write_ack call */ + ack[i] = seq[i]; + + for (u8 bit_i = 0; bit_i < BITS_PER_TYPE(u64); ++bit_i) { + if (xor & BIT(bit_i)) { + u64 abs_bit_i = bmp_offset_bit + + i * BITS_PER_TYPE(u64) + bit_i; + u64 addr = abs_bit_i * page_size + region_start; + + iova_bitmap_set(dirty_bitmap, addr, page_size); + } + } + } + + return 0; +} + +static int pds_vfio_dirty_sync(struct pds_vfio_pci_device *pds_vfio, + struct iova_bitmap *dirty_bitmap, + unsigned long iova, unsigned long length) +{ + struct device *dev = &pds_vfio->vfio_coredev.pdev->dev; + struct pds_vfio_dirty *dirty = &pds_vfio->dirty; + u64 bmp_offset, bmp_bytes; + u64 bitmap_size, pages; + int err; + + dev_dbg(dev, "vf%u: Get dirty page bitmap\n", pds_vfio->vf_id); + + if (!pds_vfio_dirty_is_enabled(pds_vfio)) { + dev_err(dev, "vf%u: Sync failed, dirty tracking is disabled\n", + pds_vfio->vf_id); + return -EINVAL; + } + + pages = DIV_ROUND_UP(length, pds_vfio->dirty.region_page_size); + bitmap_size = + round_up(pages, sizeof(u64) * BITS_PER_BYTE) / BITS_PER_BYTE; + + dev_dbg(dev, + "vf%u: iova 0x%lx length %lu page_size %llu pages %llu bitmap_size %llu\n", + pds_vfio->vf_id, iova, length, pds_vfio->dirty.region_page_size, + pages, bitmap_size); + + if (!length || ((dirty->region_start + iova + length) > + (dirty->region_start + dirty->region_size))) { + dev_err(dev, "Invalid iova 0x%lx and/or length 0x%lx to sync\n", + iova, length); + return -EINVAL; + } + + /* bitmap is modified in 64 bit chunks */ + bmp_bytes = ALIGN(DIV_ROUND_UP(length / dirty->region_page_size, + sizeof(u64)), + sizeof(u64)); + if (bmp_bytes != bitmap_size) { + dev_err(dev, + "Calculated bitmap bytes %llu not equal to bitmap size %llu\n", + bmp_bytes, bitmap_size); + return -EINVAL; + } + + bmp_offset = DIV_ROUND_UP(iova / dirty->region_page_size, sizeof(u64)); + + dev_dbg(dev, + "Syncing dirty bitmap, iova 0x%lx length 0x%lx, bmp_offset %llu bmp_bytes %llu\n", + iova, length, bmp_offset, bmp_bytes); + + err = pds_vfio_dirty_read_seq(pds_vfio, bmp_offset, bmp_bytes); + if (err) + return err; + + err = pds_vfio_dirty_process_bitmaps(pds_vfio, dirty_bitmap, bmp_offset, + bmp_bytes); + if (err) + return err; + + err = pds_vfio_dirty_write_ack(pds_vfio, bmp_offset, bmp_bytes); + if (err) + return err; + + return 0; +} + +int pds_vfio_dma_logging_report(struct vfio_device *vdev, unsigned long iova, + unsigned long length, struct iova_bitmap *dirty) +{ + struct pds_vfio_pci_device *pds_vfio = + container_of(vdev, struct pds_vfio_pci_device, + vfio_coredev.vdev); + int err; + + mutex_lock(&pds_vfio->state_mutex); + err = pds_vfio_dirty_sync(pds_vfio, dirty, iova, length); + pds_vfio_state_mutex_unlock(pds_vfio); + + return err; +} + +int pds_vfio_dma_logging_start(struct vfio_device *vdev, + struct rb_root_cached *ranges, u32 nnodes, + u64 *page_size) +{ + struct pds_vfio_pci_device *pds_vfio = + container_of(vdev, struct pds_vfio_pci_device, + vfio_coredev.vdev); + int err; + + mutex_lock(&pds_vfio->state_mutex); + pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, PDS_LM_STA_IN_PROGRESS); + err = pds_vfio_dirty_enable(pds_vfio, ranges, nnodes, page_size); + pds_vfio_state_mutex_unlock(pds_vfio); + + return err; +} + +int pds_vfio_dma_logging_stop(struct vfio_device *vdev) +{ + struct pds_vfio_pci_device *pds_vfio = + container_of(vdev, struct pds_vfio_pci_device, + vfio_coredev.vdev); + + mutex_lock(&pds_vfio->state_mutex); + pds_vfio_dirty_disable(pds_vfio, true); + pds_vfio_state_mutex_unlock(pds_vfio); + + return 0; +} diff --git a/drivers/vfio/pci/pds/dirty.h b/drivers/vfio/pci/pds/dirty.h new file mode 100644 index 000000000000..f78da25d75ca --- /dev/null +++ b/drivers/vfio/pci/pds/dirty.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2023 Advanced Micro Devices, Inc. */ + +#ifndef _DIRTY_H_ +#define _DIRTY_H_ + +struct pds_vfio_bmp_info { + unsigned long *bmp; + u32 bmp_bytes; + struct pds_lm_sg_elem *sgl; + dma_addr_t sgl_addr; + u16 num_sge; +}; + +struct pds_vfio_dirty { + struct pds_vfio_bmp_info host_seq; + struct pds_vfio_bmp_info host_ack; + u64 region_size; + u64 region_start; + u64 region_page_size; + bool is_enabled; +}; + +struct pds_vfio_pci_device; + +bool pds_vfio_dirty_is_enabled(struct pds_vfio_pci_device *pds_vfio); +void pds_vfio_dirty_set_enabled(struct pds_vfio_pci_device *pds_vfio); +void pds_vfio_dirty_set_disabled(struct pds_vfio_pci_device *pds_vfio); +void pds_vfio_dirty_disable(struct pds_vfio_pci_device *pds_vfio, + bool send_cmd); + +int pds_vfio_dma_logging_report(struct vfio_device *vdev, unsigned long iova, + unsigned long length, + struct iova_bitmap *dirty); +int pds_vfio_dma_logging_start(struct vfio_device *vdev, + struct rb_root_cached *ranges, u32 nnodes, + u64 *page_size); +int pds_vfio_dma_logging_stop(struct vfio_device *vdev); +#endif /* _DIRTY_H_ */ diff --git a/drivers/vfio/pci/pds/lm.c b/drivers/vfio/pci/pds/lm.c index 10c051d46851..a8745877a2fd 100644 --- a/drivers/vfio/pci/pds/lm.c +++ b/drivers/vfio/pci/pds/lm.c @@ -371,7 +371,7 @@ pds_vfio_step_device_state_locked(struct pds_vfio_pci_device *pds_vfio, if (cur == VFIO_DEVICE_STATE_STOP_COPY && next == VFIO_DEVICE_STATE_STOP) { pds_vfio_put_save_file(pds_vfio); - pds_vfio_send_host_vf_lm_status_cmd(pds_vfio, PDS_LM_STA_NONE); + pds_vfio_dirty_disable(pds_vfio, true); return NULL; } diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c index f4dfd5d5ba09..eb832af39545 100644 --- a/drivers/vfio/pci/pds/vfio_dev.c +++ b/drivers/vfio/pci/pds/vfio_dev.c @@ -5,6 +5,7 @@ #include #include "lm.h" +#include "dirty.h" #include "vfio_dev.h" struct pci_dev *pds_vfio_to_pci_dev(struct pds_vfio_pci_device *pds_vfio) @@ -25,7 +26,7 @@ struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev) vfio_coredev); } -static void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio) +void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio) { again: spin_lock(&pds_vfio->reset_lock); @@ -35,6 +36,7 @@ static void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio) pds_vfio->state = VFIO_DEVICE_STATE_RUNNING; pds_vfio_put_restore_file(pds_vfio); pds_vfio_put_save_file(pds_vfio); + pds_vfio_dirty_disable(pds_vfio, false); } spin_unlock(&pds_vfio->reset_lock); goto again; @@ -117,6 +119,12 @@ static const struct vfio_migration_ops pds_vfio_lm_ops = { .migration_get_data_size = pds_vfio_get_device_state_size }; +static const struct vfio_log_ops pds_vfio_log_ops = { + .log_start = pds_vfio_dma_logging_start, + .log_stop = pds_vfio_dma_logging_stop, + .log_read_and_clear = pds_vfio_dma_logging_report, +}; + static int pds_vfio_init_device(struct vfio_device *vdev) { struct pds_vfio_pci_device *pds_vfio = @@ -137,6 +145,7 @@ static int pds_vfio_init_device(struct vfio_device *vdev) vdev->migration_flags = VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P; vdev->mig_ops = &pds_vfio_lm_ops; + vdev->log_ops = &pds_vfio_log_ops; pci_id = PCI_DEVID(pdev->bus->number, pdev->devfn); dev_dbg(&pdev->dev, @@ -175,6 +184,7 @@ static void pds_vfio_close_device(struct vfio_device *vdev) mutex_lock(&pds_vfio->state_mutex); pds_vfio_put_restore_file(pds_vfio); pds_vfio_put_save_file(pds_vfio); + pds_vfio_dirty_disable(pds_vfio, true); mutex_unlock(&pds_vfio->state_mutex); mutex_destroy(&pds_vfio->state_mutex); vfio_pci_core_close_device(vdev); diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h index 31bd14de0c91..8109fe101694 100644 --- a/drivers/vfio/pci/pds/vfio_dev.h +++ b/drivers/vfio/pci/pds/vfio_dev.h @@ -7,6 +7,7 @@ #include #include +#include "dirty.h" #include "lm.h" struct pdsc; @@ -17,6 +18,7 @@ struct pds_vfio_pci_device { struct pds_vfio_lm_file *save_file; struct pds_vfio_lm_file *restore_file; + struct pds_vfio_dirty dirty; struct mutex state_mutex; /* protect migration state */ enum vfio_device_mig_state state; spinlock_t reset_lock; /* protect reset_done flow */ @@ -26,6 +28,8 @@ struct pds_vfio_pci_device { u16 client_id; }; +void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio); + const struct vfio_device_ops *pds_vfio_ops_info(void); struct pds_vfio_pci_device *pds_vfio_pci_drvdata(struct pci_dev *pdev); void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio); diff --git a/include/linux/pds/pds_adminq.h b/include/linux/pds/pds_adminq.h index 29ac6514e421..a4ced2bfc534 100644 --- a/include/linux/pds/pds_adminq.h +++ b/include/linux/pds/pds_adminq.h @@ -835,6 +835,13 @@ enum pds_lm_cmd_opcode { PDS_LM_CMD_RESUME = 20, PDS_LM_CMD_SAVE = 21, PDS_LM_CMD_RESTORE = 22, + + /* Dirty page tracking commands */ + PDS_LM_CMD_DIRTY_STATUS = 32, + PDS_LM_CMD_DIRTY_ENABLE = 33, + PDS_LM_CMD_DIRTY_DISABLE = 34, + PDS_LM_CMD_DIRTY_READ_SEQ = 35, + PDS_LM_CMD_DIRTY_WRITE_ACK = 36, }; /** @@ -992,6 +999,172 @@ enum pds_lm_host_vf_status { PDS_LM_STA_MAX, }; +/** + * struct pds_lm_dirty_region_info - Memory region info for STATUS and ENABLE + * @dma_base: Base address of the DMA-contiguous memory region + * @page_count: Number of pages in the memory region + * @page_size_log2: Log2 page size in the memory region + * @rsvd: Word boundary padding + */ +struct pds_lm_dirty_region_info { + __le64 dma_base; + __le32 page_count; + u8 page_size_log2; + u8 rsvd[3]; +}; + +/** + * struct pds_lm_dirty_status_cmd - DIRTY_STATUS command + * @opcode: Opcode PDS_LM_CMD_DIRTY_STATUS + * @rsvd: Word boundary padding + * @vf_id: VF id + * @max_regions: Capacity of the region info buffer + * @rsvd2: Word boundary padding + * @regions_dma: DMA address of the region info buffer + * + * The minimum of max_regions (from the command) and num_regions (from the + * completion) of struct pds_lm_dirty_region_info will be written to + * regions_dma. + * + * The max_regions may be zero, in which case regions_dma is ignored. In that + * case, the completion will only report the maximum number of regions + * supported by the device, and the number of regions currently enabled. + */ +struct pds_lm_dirty_status_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 max_regions; + u8 rsvd2[3]; + __le64 regions_dma; +} __packed; + +/** + * enum pds_lm_dirty_bmp_type - Type of dirty page bitmap + * @PDS_LM_DIRTY_BMP_TYPE_NONE: No bitmap / disabled + * @PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK: Seq/Ack bitmap representation + */ +enum pds_lm_dirty_bmp_type { + PDS_LM_DIRTY_BMP_TYPE_NONE = 0, + PDS_LM_DIRTY_BMP_TYPE_SEQ_ACK = 1, +}; + +/** + * struct pds_lm_dirty_status_comp - STATUS command completion + * @status: Status of the command (enum pds_core_status_code) + * @rsvd: Word boundary padding + * @comp_index: Index in the desc ring for which this is the completion + * @max_regions: Maximum number of regions supported by the device + * @num_regions: Number of regions currently enabled + * @bmp_type: Type of dirty bitmap representation + * @rsvd2: Word boundary padding + * @bmp_type_mask: Mask of supported bitmap types, bit index per type + * @rsvd3: Word boundary padding + * @color: Color bit + * + * This completion descriptor is used for STATUS, ENABLE, and DISABLE. + */ +struct pds_lm_dirty_status_comp { + u8 status; + u8 rsvd; + __le16 comp_index; + u8 max_regions; + u8 num_regions; + u8 bmp_type; + u8 rsvd2; + __le32 bmp_type_mask; + u8 rsvd3[3]; + u8 color; +}; + +/** + * struct pds_lm_dirty_enable_cmd - DIRTY_ENABLE command + * @opcode: Opcode PDS_LM_CMD_DIRTY_ENABLE + * @rsvd: Word boundary padding + * @vf_id: VF id + * @bmp_type: Type of dirty bitmap representation + * @num_regions: Number of entries in the region info buffer + * @rsvd2: Word boundary padding + * @regions_dma: DMA address of the region info buffer + * + * The num_regions must be nonzero, and less than or equal to the maximum + * number of regions supported by the device. + * + * The memory regions should not overlap. + * + * The information should be initialized by the driver. The device may modify + * the information on successful completion, such as by size-aligning the + * number of pages in a region. + * + * The modified number of pages will be greater than or equal to the page count + * given in the enable command, and at least as coarsly aligned as the given + * value. For example, the count might be aligned to a multiple of 64, but + * if the value is already a multiple of 128 or higher, it will not change. + * If the driver requires its own minimum alignment of the number of pages, the + * driver should account for that already in the region info of this command. + * + * This command uses struct pds_lm_dirty_status_comp for its completion. + */ +struct pds_lm_dirty_enable_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + u8 bmp_type; + u8 num_regions; + u8 rsvd2[2]; + __le64 regions_dma; +} __packed; + +/** + * struct pds_lm_dirty_disable_cmd - DIRTY_DISABLE command + * @opcode: Opcode PDS_LM_CMD_DIRTY_DISABLE + * @rsvd: Word boundary padding + * @vf_id: VF id + * + * Dirty page tracking will be disabled. This may be called in any state, as + * long as dirty page tracking is supported by the device, to ensure that dirty + * page tracking is disabled. + * + * This command uses struct pds_lm_dirty_status_comp for its completion. On + * success, num_regions will be zero. + */ +struct pds_lm_dirty_disable_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; +}; + +/** + * struct pds_lm_dirty_seq_ack_cmd - DIRTY_READ_SEQ or _WRITE_ACK command + * @opcode: Opcode PDS_LM_CMD_DIRTY_[READ_SEQ|WRITE_ACK] + * @rsvd: Word boundary padding + * @vf_id: VF id + * @off_bytes: Byte offset in the bitmap + * @len_bytes: Number of bytes to transfer + * @num_sge: Number of DMA scatter gather elements + * @rsvd2: Word boundary padding + * @sgl_addr: DMA address of scatter gather list + * + * Read bytes from the SEQ bitmap, or write bytes into the ACK bitmap. + * + * This command treats the entire bitmap as a byte buffer. It does not + * distinguish between guest memory regions. The driver should refer to the + * number of pages in each region, according to PDS_LM_CMD_DIRTY_STATUS, to + * determine the region boundaries in the bitmap. Each region will be + * represented by exactly the number of bits as the page count for that region, + * immediately following the last bit of the previous region. + */ +struct pds_lm_dirty_seq_ack_cmd { + u8 opcode; + u8 rsvd; + __le16 vf_id; + __le32 off_bytes; + __le32 len_bytes; + __le16 num_sge; + u8 rsvd2[2]; + __le64 sgl_addr; +} __packed; + /** * struct pds_lm_host_vf_status_cmd - HOST_VF_STATUS command * @opcode: Opcode PDS_LM_CMD_HOST_VF_STATUS @@ -1039,6 +1212,10 @@ union pds_core_adminq_cmd { struct pds_lm_save_cmd lm_save; struct pds_lm_restore_cmd lm_restore; struct pds_lm_host_vf_status_cmd lm_host_vf_status; + struct pds_lm_dirty_status_cmd lm_dirty_status; + struct pds_lm_dirty_enable_cmd lm_dirty_enable; + struct pds_lm_dirty_disable_cmd lm_dirty_disable; + struct pds_lm_dirty_seq_ack_cmd lm_dirty_seq_ack; }; union pds_core_adminq_comp { @@ -1065,6 +1242,7 @@ union pds_core_adminq_comp { struct pds_vdpa_vq_reset_comp vdpa_vq_reset; struct pds_lm_status_comp lm_status; + struct pds_lm_dirty_status_comp lm_dirty_status; }; #ifndef __CHECKER__ From patchwork Wed Jul 19 22:35:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brett Creeley X-Patchwork-Id: 13319607 Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 19E0A22EFB for ; Wed, 19 Jul 2023 22:36:04 +0000 (UTC) Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2052.outbound.protection.outlook.com [40.107.223.52]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 845F02115; Wed, 19 Jul 2023 15:35:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=ANc/04AUrVIHTFyjmZxT5tWR5lIZpIb8TVwU3vKVEer0WnLf9xVv7iPFk7p1YwB1kEFT7/smL4l02VNhUnDAuZbqXalI0qfAkBui/Mts7VuhuelUQHlWI9HHFM3hYyR5+d4EIl5QDCvhx5hC/3j5rq4IiMiMa47yX3UhCMjHilSgjVW8r64vHmrPzp1PvmkI3XvUwJD44d37AHHiuS6LrvXLRDDVnQdIzfpjy9X34jUaqqWCxGRfxJ97BCLd7EMunkj57IVJpZhM4FwEvmckWf0qjQ0bZkqf93P3Aen57Vx2mRMhnZnWdvky04TO8ji+Sic4C0WhuEIzPf928q5XJg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=uKEoZ7TG/GaAjkg6Y9oOKc+HMAq4NthSj7iXw+tARsE=; b=jD4FDBMPB1Am1nqyT6RPjRuN+z6F/HYP0DYFK14Ispv9XFpX7YAfk1MBENkI4tO0B+kKRdlF0Vjwx0HKwvVV6uNi8EPxKILSyUB4pwXGSL+Ugbku5rhVpm368SNVaUV3hpLWy70InwNHzBX4le7r6rmEroRvG6hhew9mFz4O62tvXKmmnI3FU+ty8pySyzakUITqwKOSnzU6k4X/vwdVQI+7GNAyXJ24SxoFsKZ4FF40DgE9hn5tczM/ynPwCLK24/vVktqrOJvYhCCiSF88TRTumGEi2A9FIYjb7+VLYayOSrTA6AB2fLLBO6i3ql8c8BXN93DnaA9Rr/G0na/n/g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=uKEoZ7TG/GaAjkg6Y9oOKc+HMAq4NthSj7iXw+tARsE=; b=az2ciWp8uG+ZEun/OcD00z6NsR1LB1aOzp4UdaON32wz5EX49RbtGHHNB6X8YJpYuA+WMn7+XQ6QS+a8/3bpnJjAYxUkNB/Fw1Ze50aaqgSIGBd/E+r8e7AuFrp3ZpdDETYdPcRxII+F6/gG2I05+76IfQQhpEyKypn4AF3PCng= Received: from BN0PR08CA0023.namprd08.prod.outlook.com (2603:10b6:408:142::17) by PH7PR12MB7113.namprd12.prod.outlook.com (2603:10b6:510:1ec::8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24; Wed, 19 Jul 2023 22:35:53 +0000 Received: from BN8NAM11FT046.eop-nam11.prod.protection.outlook.com (2603:10b6:408:142:cafe::e1) by BN0PR08CA0023.outlook.office365.com (2603:10b6:408:142::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6609.24 via Frontend Transport; Wed, 19 Jul 2023 22:35:52 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT046.mail.protection.outlook.com (10.13.177.127) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6588.35 via Frontend Transport; Wed, 19 Jul 2023 22:35:52 +0000 Received: from driver-dev1.pensando.io (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 19 Jul 2023 17:35:51 -0500 From: Brett Creeley To: , , , , , , CC: , Subject: [PATCH v12 vfio 6/7] vfio/pds: Add support for firmware recovery Date: Wed, 19 Jul 2023 15:35:26 -0700 Message-ID: <20230719223527.12795-7-brett.creeley@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230719223527.12795-1-brett.creeley@amd.com> References: <20230719223527.12795-1-brett.creeley@amd.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT046:EE_|PH7PR12MB7113:EE_ X-MS-Office365-Filtering-Correlation-Id: 135f52c6-8e20-48da-bb2f-08db88a8827d X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 41uG6pNMy8+pa0OsaHX1Ff9IavNr91VtYNH37u6xpWYYVNmKQwUlweEiDoVzn50zA5X1yC7Iu/nvdk44DC1GIHon1V1s/OH9l84JWJMzivGYa0wHt+99pXuGyq7CeRD1a4N6jj0qZ06aCczrKY7a1ZNM6cku04xzmwICLBQTUTzc+ck4kLHePrfziA8NSiliDraAPdwbkr5t2Re+Pth/6+4HpX+KKR7lhWirOeLPNdSPk9Yc9ICqTnv9ZUhYOwTnQdWmBQTHan+ECdLNbW+htSt7/umkSQGjuxvOIfuxBPtK5G/ibrIJxCLHfFGEaPVisQyrhL5KFGPJ+0SehqJHrTheIOsOmNdRA0sfJwOGEWgfjFC/A4DywClq3Y6DFovSJ7qzqzxqQ2iXSei/sile+QHCCB8WEVNe0rKfV9iuFUDMnwNhFTVLqq9SWZ58hTMUlVzwkY2Wm2AUNUJS0+0vaXnUAbrFF5/1ceha9Tu/6h0l1iWULAVVbsOf/rmeM0W7o1OVt/QupiRQu3Tycwf20KFb9AG2yPQA7CyEJOW2XagCFU+HUE2SniA+nRdFeN2MYpTIX6PoO3E/NoUJrGOOk+iEy2IQ4hfaD27G86vsFrWFHLroi6KYkaYrlKqkpJl4N3jPmQgBYy/86KtFD3JZPUDXfnGsVeHCWdXeqF7/CkcyAL/8h62w+J3UEV2SMje29CfgvQxKSY+e63BnaZKuz5wveGLJMuDD8OTAXfjUb0mlm5bM3U5eVouHIgF3jck1RH3xV+l1B1Js4ZvtwPofrw== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230028)(4636009)(136003)(396003)(39860400002)(376002)(346002)(451199021)(82310400008)(46966006)(36840700001)(40470700004)(426003)(2616005)(186003)(16526019)(1076003)(83380400001)(6666004)(47076005)(26005)(36860700001)(478600001)(316002)(54906003)(110136005)(336012)(4326008)(70586007)(70206006)(44832011)(5660300002)(41300700001)(8936002)(8676002)(2906002)(86362001)(36756003)(40460700003)(81166007)(356005)(82740400003)(40480700001)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Jul 2023 22:35:52.7749 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 135f52c6-8e20-48da-bb2f-08db88a8827d X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT046.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH7PR12MB7113 X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FORGED_SPF_HELO, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE, T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net It's possible that the device firmware crashes and is able to recover due to some configuration and/or other issue. If a live migration is in progress while the firmware crashes, the live migration will fail. However, the VF PCI device should still be functional post crash recovery and subsequent migrations should go through as expected. When the pds_core device notices that firmware crashes it sends an event to all its client drivers. When the pds_vfio driver receives this event while migration is in progress it will request a deferred reset on the next migration state transition. This state transition will report failure as well as any subsequent state transition requests from the VMM/VFIO. Based on uapi/vfio.h the only way out of VFIO_DEVICE_STATE_ERROR is by issuing VFIO_DEVICE_RESET. Once this reset is done, the migration state will be reset to VFIO_DEVICE_STATE_RUNNING and migration can be performed. If the event is received while no migration is in progress (i.e. the VM is in normal operating mode), then no actions are taken and the migration state remains VFIO_DEVICE_STATE_RUNNING. Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson --- drivers/vfio/pci/pds/pci_drv.c | 114 ++++++++++++++++++++++++++++++++ drivers/vfio/pci/pds/vfio_dev.c | 17 ++++- drivers/vfio/pci/pds/vfio_dev.h | 2 + 3 files changed, 131 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c index c1739edd261a..a03b7d5a3e60 100644 --- a/drivers/vfio/pci/pds/pci_drv.c +++ b/drivers/vfio/pci/pds/pci_drv.c @@ -19,6 +19,113 @@ #define PDS_VFIO_DRV_DESCRIPTION "AMD/Pensando VFIO Device Driver" #define PCI_VENDOR_ID_PENSANDO 0x1dd8 +static void pds_vfio_recovery(struct pds_vfio_pci_device *pds_vfio) +{ + bool deferred_reset_needed = false; + + /* + * Documentation states that the kernel migration driver must not + * generate asynchronous device state transitions outside of + * manipulation by the user or the VFIO_DEVICE_RESET ioctl. + * + * Since recovery is an asynchronous event received from the device, + * initiate a deferred reset. Issue a deferred reset in the following + * situations: + * 1. Migration is in progress, which will cause the next step of + * the migration to fail. + * 2. If the device is in a state that will be set to + * VFIO_DEVICE_STATE_RUNNING on the next action (i.e. VM is + * shutdown and device is in VFIO_DEVICE_STATE_STOP). + */ + mutex_lock(&pds_vfio->state_mutex); + if ((pds_vfio->state != VFIO_DEVICE_STATE_RUNNING && + pds_vfio->state != VFIO_DEVICE_STATE_ERROR) || + (pds_vfio->state == VFIO_DEVICE_STATE_RUNNING && + pds_vfio_dirty_is_enabled(pds_vfio))) + deferred_reset_needed = true; + mutex_unlock(&pds_vfio->state_mutex); + + /* + * On the next user initiated state transition, the device will + * transition to the VFIO_DEVICE_STATE_ERROR. At this point it's the user's + * responsibility to reset the device. + * + * If a VFIO_DEVICE_RESET is requested post recovery and before the next + * state transition, then the deferred reset state will be set to + * VFIO_DEVICE_STATE_RUNNING. + */ + if (deferred_reset_needed) { + spin_lock(&pds_vfio->reset_lock); + pds_vfio->deferred_reset = true; + pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_ERROR; + spin_unlock(&pds_vfio->reset_lock); + } +} + +static int pds_vfio_pci_notify_handler(struct notifier_block *nb, + unsigned long ecode, void *data) +{ + struct pds_vfio_pci_device *pds_vfio = + container_of(nb, struct pds_vfio_pci_device, nb); + struct device *dev = pds_vfio_to_dev(pds_vfio); + union pds_core_notifyq_comp *event = data; + + dev_dbg(dev, "%s: event code %lu\n", __func__, ecode); + + /* + * We don't need to do anything for RESET state==0 as there is no notify + * or feedback mechanism available, and it is possible that we won't + * even see a state==0 event since the pds_core recovery is pending. + * + * Any requests from VFIO while state==0 will fail, which will return + * error and may cause migration to fail. + */ + if (ecode == PDS_EVENT_RESET) { + dev_info(dev, "%s: PDS_EVENT_RESET event received, state==%d\n", + __func__, event->reset.state); + /* + * pds_core device finished recovery and sent us the + * notification (state == 1) to allow us to recover + */ + if (event->reset.state == 1) + pds_vfio_recovery(pds_vfio); + } + + return 0; +} + +static int +pds_vfio_pci_register_event_handler(struct pds_vfio_pci_device *pds_vfio) +{ + struct device *dev = pds_vfio_to_dev(pds_vfio); + struct notifier_block *nb = &pds_vfio->nb; + int err; + + if (!nb->notifier_call) { + nb->notifier_call = pds_vfio_pci_notify_handler; + err = pdsc_register_notify(nb); + if (err) { + nb->notifier_call = NULL; + dev_err(dev, + "failed to register pds event handler: %pe\n", + ERR_PTR(err)); + return -EINVAL; + } + dev_dbg(dev, "pds event handler registered\n"); + } + + return 0; +} + +static void +pds_vfio_pci_unregister_event_handler(struct pds_vfio_pci_device *pds_vfio) +{ + if (pds_vfio->nb.notifier_call) { + pdsc_unregister_notify(&pds_vfio->nb); + pds_vfio->nb.notifier_call = NULL; + } +} + static int pds_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) { @@ -48,8 +155,14 @@ static int pds_vfio_pci_probe(struct pci_dev *pdev, goto out_unregister_coredev; } + err = pds_vfio_pci_register_event_handler(pds_vfio); + if (err) + goto out_unregister_client; + return 0; +out_unregister_client: + pds_vfio_unregister_client_cmd(pds_vfio); out_unregister_coredev: vfio_pci_core_unregister_device(&pds_vfio->vfio_coredev); out_put_vdev: @@ -61,6 +174,7 @@ static void pds_vfio_pci_remove(struct pci_dev *pdev) { struct pds_vfio_pci_device *pds_vfio = pds_vfio_pci_drvdata(pdev); + pds_vfio_pci_unregister_event_handler(pds_vfio); pds_vfio_unregister_client_cmd(pds_vfio); vfio_pci_core_unregister_device(&pds_vfio->vfio_coredev); vfio_put_device(&pds_vfio->vfio_coredev.vdev); diff --git a/drivers/vfio/pci/pds/vfio_dev.c b/drivers/vfio/pci/pds/vfio_dev.c index eb832af39545..b38c66bf28a4 100644 --- a/drivers/vfio/pci/pds/vfio_dev.c +++ b/drivers/vfio/pci/pds/vfio_dev.c @@ -33,11 +33,12 @@ void pds_vfio_state_mutex_unlock(struct pds_vfio_pci_device *pds_vfio) if (pds_vfio->deferred_reset) { pds_vfio->deferred_reset = false; if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR) { - pds_vfio->state = VFIO_DEVICE_STATE_RUNNING; pds_vfio_put_restore_file(pds_vfio); pds_vfio_put_save_file(pds_vfio); pds_vfio_dirty_disable(pds_vfio, false); } + pds_vfio->state = pds_vfio->deferred_reset_state; + pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING; spin_unlock(&pds_vfio->reset_lock); goto again; } @@ -49,6 +50,7 @@ void pds_vfio_reset(struct pds_vfio_pci_device *pds_vfio) { spin_lock(&pds_vfio->reset_lock); pds_vfio->deferred_reset = true; + pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING; if (!mutex_trylock(&pds_vfio->state_mutex)) { spin_unlock(&pds_vfio->reset_lock); return; @@ -67,7 +69,14 @@ pds_vfio_set_device_state(struct vfio_device *vdev, struct file *res = NULL; mutex_lock(&pds_vfio->state_mutex); - while (new_state != pds_vfio->state) { + /* + * only way to transition out of VFIO_DEVICE_STATE_ERROR is via + * VFIO_DEVICE_RESET, so prevent the state machine from running since + * vfio_mig_get_next_state() will throw a WARN_ON() when transitioning + * from VFIO_DEVICE_STATE_ERROR to any other state + */ + while (pds_vfio->state != VFIO_DEVICE_STATE_ERROR && + new_state != pds_vfio->state) { enum vfio_device_mig_state next_state; int err = vfio_mig_get_next_state(vdev, pds_vfio->state, @@ -89,6 +98,9 @@ pds_vfio_set_device_state(struct vfio_device *vdev, } } pds_vfio_state_mutex_unlock(pds_vfio); + /* still waiting on a deferred_reset */ + if (pds_vfio->state == VFIO_DEVICE_STATE_ERROR) + res = ERR_PTR(-EIO); return res; } @@ -169,6 +181,7 @@ static int pds_vfio_open_device(struct vfio_device *vdev) mutex_init(&pds_vfio->state_mutex); pds_vfio->state = VFIO_DEVICE_STATE_RUNNING; + pds_vfio->deferred_reset_state = VFIO_DEVICE_STATE_RUNNING; vfio_pci_core_finish_enable(&pds_vfio->vfio_coredev); diff --git a/drivers/vfio/pci/pds/vfio_dev.h b/drivers/vfio/pci/pds/vfio_dev.h index 8109fe101694..30a7d22f48eb 100644 --- a/drivers/vfio/pci/pds/vfio_dev.h +++ b/drivers/vfio/pci/pds/vfio_dev.h @@ -23,6 +23,8 @@ struct pds_vfio_pci_device { enum vfio_device_mig_state state; spinlock_t reset_lock; /* protect reset_done flow */ u8 deferred_reset; + enum vfio_device_mig_state deferred_reset_state; + struct notifier_block nb; int vf_id; u16 client_id; From patchwork Wed Jul 19 22:35:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brett Creeley X-Patchwork-Id: 13319606 X-Patchwork-Delegate: kuba@kernel.org Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C05E8263C4 for ; Wed, 19 Jul 2023 22:36:01 +0000 (UTC) Received: from NAM12-BN8-obe.outbound.protection.outlook.com (mail-bn8nam12on2052.outbound.protection.outlook.com [40.107.237.52]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 901951FF3; Wed, 19 Jul 2023 15:35:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=PhEWyoxd6j76a+usbu2OOI/0Nca6hE6m/sQOZUB88Ja74S8ZP2pb2CoUO9R0I52EApe455bBIiE3vIIsd+r2g8TcLXetFskmHC/nmLFTrCyLsMkhhp8+fMvkvpvIWNGR/2bd9FWpZJ2vHc6TABsgLdyGo1dNLoQY253MdrN076LHv4eyO9bgv3aEH9KiPLXvL4Fo7PfzxI3gwr0V1AYyk46G03kJQd0jK02+m8FQed2kAZ1VY4IF9ce5/EV3gfowVEVWQJ/HwgoVa4FuOTiprs8lqpq7OV8HSNd0ZnUauH6/7+RPF376y9oqnGIVI89u8TxPFEuNBqfs8JQr/RwsQg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=7W3Xv+v/QshPyHRUjBk8bST9H/pxbU+hcgI2UZ96j1k=; b=VC10yaUjnfug8lGP3urG8JqwrkbqQlqGJAg2UjMfgjEMCCM0KAwCdawVfsKIieF9opyEhP1fQZy6x0ak+JEtlzzyFWRmadnoiLHie6Vgi3jmbJuOjtrZG5ZLhUof6kjA4oWiY3wW9CSo8Pdw/Wkx4DTGhop24J5TrFeVKAdo5ctqb48XyMUZVQjMceylE3jMoB+S12Xo/PsnUfxfFP8PrpUhQuWIxT0Y0bn40b9WuHcyM05XtxWMkO73rL+adEF3ly+pPwiqbYwnmIwglhuTSCJuCWTlhLmoFmc0MgwV4aSHgJWwYRkb8iBowqJKMva3oDrQri/mMhdRH5fkEar8YA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=vger.kernel.org smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=7W3Xv+v/QshPyHRUjBk8bST9H/pxbU+hcgI2UZ96j1k=; b=0iaObt4ckMg8rse/uoMEn2JTBHQa4pdH2QyqUqYRfJ1/CaLwW3vNRb7mbCsAXd4VLnYVy8bC2ijzweOvEM/czeC6rAm9LVzdYpkv4oHCXoK8wQZPCsUplAIaoq/bLufCJ3HwQLKFe+Eag8qizCpAo5oxT8w1lHBOUvBHKP/jagU= Received: from BN9PR03CA0954.namprd03.prod.outlook.com (2603:10b6:408:108::29) by SA3PR12MB9225.namprd12.prod.outlook.com (2603:10b6:806:39e::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6588.27; Wed, 19 Jul 2023 22:35:54 +0000 Received: from BN8NAM11FT026.eop-nam11.prod.protection.outlook.com (2603:10b6:408:108:cafe::46) by BN9PR03CA0954.outlook.office365.com (2603:10b6:408:108::29) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6588.33 via Frontend Transport; Wed, 19 Jul 2023 22:35:54 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by BN8NAM11FT026.mail.protection.outlook.com (10.13.177.51) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6588.34 via Frontend Transport; Wed, 19 Jul 2023 22:35:54 +0000 Received: from driver-dev1.pensando.io (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.23; Wed, 19 Jul 2023 17:35:52 -0500 From: Brett Creeley To: , , , , , , CC: , Subject: [PATCH v12 vfio 7/7] vfio/pds: Add Kconfig and documentation Date: Wed, 19 Jul 2023 15:35:27 -0700 Message-ID: <20230719223527.12795-8-brett.creeley@amd.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230719223527.12795-1-brett.creeley@amd.com> References: <20230719223527.12795-1-brett.creeley@amd.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: BN8NAM11FT026:EE_|SA3PR12MB9225:EE_ X-MS-Office365-Filtering-Correlation-Id: 13d82df7-a432-4298-d6ab-08db88a88352 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: mUxx04iWKhCHcocVfR3Vcefp3Alm+1eYXzcnECbjWOZi81Rf2gp3FJDOda4+jf8rosaJU3A1fn4QA3qqFIEXF3/zw2hksNgK3NwDmcm7Ki4b1Y/IZNl/P0Zr9h6oO8h3Jqa2HnYav85KLLPYnOCnH9MYyUtre8LdDTGk92TH6H9eTIzlzirEk5r1tUNyUkJoAhK95+7kczIds2rUfoTxQfmVT1DPOAvdlDLbs8UJWLMvGMHuJChTZFWuwfcM4Ed2i6kIpTc9Iux0MQZjA6wN5DvR15fQk+Fgicvh7Enp4hOMeyBdca6/fWP26gUv+f4RNfTFU4/lrEnypBU1DX+JgAV4UcyujhQIb+qaPkZxwqwtSFT17Y0rMqAnphHnRKGDuv9+GodSPt7SKoarVset22I9/KHHFuhMnr0lbwmUbFVPQJkXRwmHMRLaXmQf2yXuBwhYUgwlYfzJc6b6fzA47IdqwoJ/3egxN6eXvJ9a8wuSPcITgHFGpWsYoFgBVRbl+f26lCawGAHCvbnhai1o5WXeECQs+aUChG+elJCYdr/St4y82y8UjbWd0/EZ/oSxD/Ia06CHU6/8V1/NHisIqxYXkhx1WIy2tw7B5bBFUEirQNqUbvAJHnhAq//MnB0JCHMPc+G2/eGXUZI+9mTyzEumHMGAKmJybWlILL63X0+SvgVL9jfHf69X2Voa8HMnm8M/WPzWF3ewrXiSI0EjLYmWG+CMJkEeXTQACjx1g64eU0oAayVGAPfKtkJYNPF7wUX73gKAz3lIAD1MqqLWyw== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230028)(4636009)(39860400002)(376002)(136003)(346002)(396003)(451199021)(82310400008)(40470700004)(46966006)(36840700001)(1076003)(40460700003)(6666004)(36860700001)(47076005)(186003)(16526019)(2616005)(336012)(83380400001)(426003)(36756003)(356005)(82740400003)(86362001)(81166007)(26005)(40480700001)(4326008)(70586007)(41300700001)(2906002)(44832011)(316002)(70206006)(110136005)(8936002)(8676002)(54906003)(5660300002)(478600001)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Jul 2023 22:35:54.1715 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 13d82df7-a432-4298-d6ab-08db88a88352 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT026.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SA3PR12MB9225 X-Spam-Status: No, score=-1.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FORGED_SPF_HELO, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_NONE, T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net X-Patchwork-Delegate: kuba@kernel.org Add Kconfig entries and pds-vfio-pci.rst. Also, add an entry in the MAINTAINERS file for this new driver. It's not clear where documentation for vendor specific VFIO drivers should live, so just re-use the current amd ethernet location. Signed-off-by: Brett Creeley Signed-off-by: Shannon Nelson Reviewed-by: Simon Horman --- .../ethernet/amd/pds_vfio_pci.rst | 79 +++++++++++++++++++ .../device_drivers/ethernet/index.rst | 1 + MAINTAINERS | 7 ++ drivers/vfio/pci/Kconfig | 2 + drivers/vfio/pci/pds/Kconfig | 19 +++++ 5 files changed, 108 insertions(+) create mode 100644 Documentation/networking/device_drivers/ethernet/amd/pds_vfio_pci.rst create mode 100644 drivers/vfio/pci/pds/Kconfig diff --git a/Documentation/networking/device_drivers/ethernet/amd/pds_vfio_pci.rst b/Documentation/networking/device_drivers/ethernet/amd/pds_vfio_pci.rst new file mode 100644 index 000000000000..7a6bc848a2b2 --- /dev/null +++ b/Documentation/networking/device_drivers/ethernet/amd/pds_vfio_pci.rst @@ -0,0 +1,79 @@ +.. SPDX-License-Identifier: GPL-2.0+ +.. note: can be edited and viewed with /usr/bin/formiko-vim + +========================================================== +PCI VFIO driver for the AMD/Pensando(R) DSC adapter family +========================================================== + +AMD/Pensando Linux VFIO PCI Device Driver +Copyright(c) 2023 Advanced Micro Devices, Inc. + +Overview +======== + +The ``pds-vfio-pci`` module is a PCI driver that supports Live Migration +capable Virtual Function (VF) devices in the DSC hardware. + +Using the device +================ + +The pds-vfio-pci device is enabled via multiple configuration steps and +depends on the ``pds_core`` driver to create and enable SR-IOV Virtual +Function devices. + +Shown below are the steps to bind the driver to a VF and also to the +associated auxiliary device created by the ``pds_core`` driver. This +example assumes the pds_core and pds-vfio-pci modules are already +loaded. + +.. code-block:: bash + :name: example-setup-script + + #!/bin/bash + + PF_BUS="0000:60" + PF_BDF="0000:60:00.0" + VF_BDF="0000:60:00.1" + + # Prevent non-vfio VF driver from probing the VF device + echo 0 > /sys/class/pci_bus/$PF_BUS/device/$PF_BDF/sriov_drivers_autoprobe + + # Create single VF for Live Migration via pds_core + echo 1 > /sys/bus/pci/drivers/pds_core/$PF_BDF/sriov_numvfs + + # Allow the VF to be bound to the pds-vfio-pci driver + echo "pds-vfio-pci" > /sys/class/pci_bus/$PF_BUS/device/$VF_BDF/driver_override + + # Bind the VF to the pds-vfio-pci driver + echo "$VF_BDF" > /sys/bus/pci/drivers/pds-vfio-pci/bind + +After performing the steps above, a file in /dev/vfio/ +should have been created. + + +Enabling the driver +=================== + +The driver is enabled via the standard kernel configuration system, +using the make command:: + + make oldconfig/menuconfig/etc. + +The driver is located in the menu structure at: + + -> Device Drivers + -> VFIO Non-Privileged userspace driver framework + -> VFIO support for PDS PCI devices + +Support +======= + +For general Linux networking support, please use the netdev mailing +list, which is monitored by Pensando personnel:: + + netdev@vger.kernel.org + +For more specific support needs, please use the Pensando driver support +email:: + + drivers@pensando.io diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst index 94ecb67c0885..9827e816084b 100644 --- a/Documentation/networking/device_drivers/ethernet/index.rst +++ b/Documentation/networking/device_drivers/ethernet/index.rst @@ -16,6 +16,7 @@ Contents: altera/altera_tse amd/pds_core amd/pds_vdpa + amd/pds_vfio_pci aquantia/atlantic chelsio/cxgb cirrus/cs89x0 diff --git a/MAINTAINERS b/MAINTAINERS index d30f622ed7b9..d320d76aaf7b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -22423,6 +22423,13 @@ S: Maintained P: Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst F: drivers/vfio/pci/*/ +VFIO PDS PCI DRIVER +M: Brett Creeley +L: kvm@vger.kernel.org +S: Maintained +F: Documentation/networking/device_drivers/ethernet/amd/pds_vfio_pci.rst +F: drivers/vfio/pci/pds/ + VFIO PLATFORM DRIVER M: Eric Auger L: kvm@vger.kernel.org diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 86bb7835cf3c..8125e5f37832 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -63,4 +63,6 @@ source "drivers/vfio/pci/mlx5/Kconfig" source "drivers/vfio/pci/hisilicon/Kconfig" +source "drivers/vfio/pci/pds/Kconfig" + endmenu diff --git a/drivers/vfio/pci/pds/Kconfig b/drivers/vfio/pci/pds/Kconfig new file mode 100644 index 000000000000..407b3fd32733 --- /dev/null +++ b/drivers/vfio/pci/pds/Kconfig @@ -0,0 +1,19 @@ +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2023 Advanced Micro Devices, Inc. + +config PDS_VFIO_PCI + tristate "VFIO support for PDS PCI devices" + depends on PDS_CORE + select VFIO_PCI_CORE + help + This provides generic PCI support for PDS devices using the VFIO + framework. + + More specific information on this driver can be + found in + . + + To compile this driver as a module, choose M here. The module + will be called pds-vfio-pci. + + If you don't know what to do here, say N.