From patchwork Wed Sep 22 10:38:50 2021
From: Leon Romanovsky
To: Doug Ledford, Jason Gunthorpe
Cc: Alex Williamson, Bjorn Helgaas, "David S. Miller", Jakub Kicinski,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
    linux-rdma@vger.kernel.org, netdev@vger.kernel.org, Saeed Mahameed, Yishai Hadas
Subject: [PATCH mlx5-next 1/7] PCI/IOV: Provide internal VF index
Date: Wed, 22 Sep 2021 13:38:50 +0300
Message-Id: <8d5bba9a6a1067989c3291fa2929528578812334.1632305919.git.leonro@nvidia.com>

From: Jason Gunthorpe

The PCI core uses the VF index internally, often called the vf_id, during
the setup of the VF, e.g. in pci_iov_add_virtfn(). This index is needed by
device drivers that implement live migration for the internal operations
that configure/control their VFs. Specifically, the mlx5_vfio_pci driver
introduced in later patches of this series needs it, and not the
bus/device/function which is exposed today.

Add pci_iov_vf_id(), which computes the vf_id by reversing the math that
was used to create the bus/device/function.
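For illustration, a minimal usage sketch of the new helper (not part of this
patch; the probe function name and the dev_info() message are hypothetical).
It only shows how a VF driver could recover the SR-IOV index and relate it to
the PF via pci_physfn():

#include <linux/pci.h>

static int example_vf_probe(struct pci_dev *pdev,
			    const struct pci_device_id *id)
{
	int vf_id;

	if (!pdev->is_virtfn)
		return -EINVAL;

	/* Reverse of pci_iov_virtfn_bus()/pci_iov_virtfn_devfn() */
	vf_id = pci_iov_vf_id(pdev);
	if (vf_id < 0)
		return vf_id;

	/*
	 * A device-specific command channel (e.g. the mlx5 VF->PF channel
	 * used later in this series) is keyed by this index, not by the
	 * VF's own bus/device/function.
	 */
	dev_info(&pdev->dev, "probed VF index %d of PF %s\n",
		 vf_id, pci_name(pci_physfn(pdev)));
	return 0;
}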
Signed-off-by: Yishai Hadas Signed-off-by: Jason Gunthorpe Signed-off-by: Leon Romanovsky Acked-by: Bjorn Helgaas --- drivers/pci/iov.c | 14 ++++++++++++++ include/linux/pci.h | 7 ++++++- 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index dafdc652fcd0..e7751fa3fe0b 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -33,6 +33,20 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id) } EXPORT_SYMBOL_GPL(pci_iov_virtfn_devfn); +int pci_iov_vf_id(struct pci_dev *dev) +{ + struct pci_dev *pf; + + if (!dev->is_virtfn) + return -EINVAL; + + pf = pci_physfn(dev); + return (((dev->bus->number << 8) + dev->devfn) - + ((pf->bus->number << 8) + pf->devfn + pf->sriov->offset)) / + pf->sriov->stride; +} +EXPORT_SYMBOL_GPL(pci_iov_vf_id); + /* * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may * change when NumVFs changes. diff --git a/include/linux/pci.h b/include/linux/pci.h index cd8aa6fce204..4d6c73506e18 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2153,7 +2153,7 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar); #ifdef CONFIG_PCI_IOV int pci_iov_virtfn_bus(struct pci_dev *dev, int id); int pci_iov_virtfn_devfn(struct pci_dev *dev, int id); - +int pci_iov_vf_id(struct pci_dev *dev); int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn); void pci_disable_sriov(struct pci_dev *dev); @@ -2181,6 +2181,11 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id) { return -ENOSYS; } +static inline int pci_iov_vf_id(struct pci_dev *dev) +{ + return -ENOSYS; +} + static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn) { return -ENODEV; } From patchwork Wed Sep 22 10:38:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 12510189 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-20.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E1CAC433EF for ; Wed, 22 Sep 2021 10:39:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8566A61211 for ; Wed, 22 Sep 2021 10:39:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234936AbhIVKkf (ORCPT ); Wed, 22 Sep 2021 06:40:35 -0400 Received: from mail.kernel.org ([198.145.29.99]:48610 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234734AbhIVKke (ORCPT ); Wed, 22 Sep 2021 06:40:34 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 476D261242; Wed, 22 Sep 2021 10:39:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1632307145; bh=BcqOr4/BxckdG0JgeNfJQFTFky8MX/TUfOd14KVupMg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NO7+DDQKkUHe7MiF7RQT1bVBXOL2lYpqKp+0Ducw833CAOcSDD17RVIKO1Xlcmhha qAK7XZgWeoTr0t3Sw0vnaMW+1qy9SG1EalylI50ryXMVFLn48PfrJIzysTvtIXI3Yd edP8Z3l6ZdORIuDJ/c4Ae8J9Bxf7D9w8mZw8Hk3YUteyd76r6z7Q7MIedjHjGI1hZ/ KLNrcE8hCecIMtLEQYa5EAnj+NFB0Hl6Z7isQQT+2t6HppHdJmrb34STPrgi7Q2JpH 
VxId5zhQyJLVystRl10Avj5WSMlmNPPwywY1/Od0m2dasU3bldr3ownauJ4LQyUoor Vn/oZBE7cmBtQ== From: Leon Romanovsky To: Doug Ledford , Jason Gunthorpe Cc: Yishai Hadas , Alex Williamson , Bjorn Helgaas , "David S. Miller" , Jakub Kicinski , Kirti Wankhede , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, netdev@vger.kernel.org, Saeed Mahameed Subject: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity Date: Wed, 22 Sep 2021 13:38:51 +0300 Message-Id: X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org From: Yishai Hadas Add an API in the core layer to check migration state transition validity as part of a migration flow. The valid transitions follow the expected usage as described in uapi/vfio.h and triggered by QEMU. This ensures that all migration implementations follow a consistent migration state machine. Signed-off-by: Yishai Hadas Reviewed-by: Kirti Wankhede Signed-off-by: Jason Gunthorpe Signed-off-by: Leon Romanovsky --- drivers/vfio/vfio.c | 41 +++++++++++++++++++++++++++++++++++++++++ include/linux/vfio.h | 1 + 2 files changed, 42 insertions(+) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 3c034fe14ccb..c3ca33e513c8 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -1664,6 +1664,47 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep) return 0; } +/** + * vfio_change_migration_state_allowed - Checks whether a migration state + * transition is valid. + * @new_state: The new state to move to. + * @old_state: The old state. + * Return: true if the transition is valid. + */ +bool vfio_change_migration_state_allowed(u32 new_state, u32 old_state) +{ + enum { MAX_STATE = VFIO_DEVICE_STATE_RESUMING }; + static const u8 vfio_from_state_table[MAX_STATE + 1][MAX_STATE + 1] = { + [VFIO_DEVICE_STATE_STOP] = { + [VFIO_DEVICE_STATE_RUNNING] = 1, + [VFIO_DEVICE_STATE_RESUMING] = 1, + }, + [VFIO_DEVICE_STATE_RUNNING] = { + [VFIO_DEVICE_STATE_STOP] = 1, + [VFIO_DEVICE_STATE_SAVING] = 1, + [VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING] = 1, + }, + [VFIO_DEVICE_STATE_SAVING] = { + [VFIO_DEVICE_STATE_STOP] = 1, + [VFIO_DEVICE_STATE_RUNNING] = 1, + }, + [VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING] = { + [VFIO_DEVICE_STATE_RUNNING] = 1, + [VFIO_DEVICE_STATE_SAVING] = 1, + }, + [VFIO_DEVICE_STATE_RESUMING] = { + [VFIO_DEVICE_STATE_RUNNING] = 1, + [VFIO_DEVICE_STATE_STOP] = 1, + }, + }; + + if (new_state > MAX_STATE || old_state > MAX_STATE) + return false; + + return vfio_from_state_table[old_state][new_state]; +} +EXPORT_SYMBOL_GPL(vfio_change_migration_state_allowed); + static long vfio_device_fops_unl_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) { diff --git a/include/linux/vfio.h b/include/linux/vfio.h index b53a9557884a..e65137a708f1 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -83,6 +83,7 @@ extern struct vfio_device *vfio_device_get_from_dev(struct device *dev); extern void vfio_device_put(struct vfio_device *device); int vfio_assign_device_set(struct vfio_device *device, void *set_id); +bool vfio_change_migration_state_allowed(u32 new_state, u32 old_state); /* events for the backend driver notify callback */ enum vfio_iommu_notify_type { From patchwork Wed Sep 22 10:38:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 
12510191
From: Leon Romanovsky
To: Doug Ledford, Jason Gunthorpe
Cc: Yishai Hadas, Alex Williamson, Bjorn Helgaas, "David S. Miller",
    Jakub Kicinski, Kirti Wankhede, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
    Saeed Mahameed
Subject: [PATCH mlx5-next 3/7] vfio/pci_core: Make the region->release() function optional
Date: Wed, 22 Sep 2021 13:38:52 +0300

From: Yishai Hadas

Make the region->release() callback optional, as in some cases the driver
has nothing to do there. This is needed by a later patch in this series:
the mlx5_vfio_pci driver added there supports live migration but does not
need a release function for its migration region.
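As a concrete illustration of what this enables (not part of this patch; the
region ops and rw handler below are hypothetical, the real user is the
migration region added in patch 7), a driver can now register region ops that
provide only .rw and leave .release unset:

/*
 * Sketch only: a region whose ops provide .rw but deliberately leave
 * .release as NULL. With this patch, vfio_pci_core_disable() skips the
 * NULL callback instead of dereferencing it.
 */
static ssize_t example_region_rw(struct vfio_pci_core_device *vdev,
				 char __user *buf, size_t count,
				 loff_t *ppos, bool iswrite)
{
	/* device-specific handling of the region goes here */
	return -EINVAL;
}

static struct vfio_pci_regops example_region_ops = {
	.rw = example_region_rw,
	/* .release intentionally not set - nothing to tear down */
};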
Signed-off-by: Yishai Hadas
Signed-off-by: Leon Romanovsky
Reviewed-by: Max Gurtovoy
---
 drivers/vfio/pci/vfio_pci_core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 68198e0f2a63..3ddc3adb24de 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -341,7 +341,8 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
 	vdev->virq_disabled = false;
 
 	for (i = 0; i < vdev->num_regions; i++)
-		vdev->region[i].ops->release(vdev, &vdev->region[i]);
+		if (vdev->region[i].ops->release)
+			vdev->region[i].ops->release(vdev, &vdev->region[i]);
 
 	vdev->num_regions = 0;
 	kfree(vdev->region);

From patchwork Wed Sep 22 10:38:53 2021
From: Leon Romanovsky
To: Doug Ledford, Jason Gunthorpe
Cc: Yishai Hadas, Alex Williamson, Bjorn Helgaas, Jakub Kicinski, Kirti Wankhede,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
    linux-rdma@vger.kernel.org, netdev@vger.kernel.org, Saeed Mahameed
Subject: [PATCH mlx5-next 4/7] net/mlx5: Introduce migration bits and structures
Date: Wed, 22 Sep 2021 13:38:53 +0300

From: Yishai Hadas

Introduce the migration-related IFC capability bit, command opcodes and
command layouts needed to enable the migration commands.
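To make the new layouts concrete, here is a minimal sketch of how a caller
drives one of them with the standard IFC accessors; it mirrors the suspend
command added in patch 6 of this series, with an extra check of the new
'migration' capability bit, and is illustrative only (the function name is
hypothetical):

#include <linux/mlx5/driver.h>
#include <linux/mlx5/mlx5_ifc.h>

/* Sketch: issuing SUSPEND_VHCA with the opcode and layouts added here. */
static int example_suspend_vhca(struct mlx5_core_dev *mdev, u16 vhca_id,
				u16 op_mod)
{
	u32 out[MLX5_ST_SZ_DW(suspend_vhca_out)] = {};
	u32 in[MLX5_ST_SZ_DW(suspend_vhca_in)] = {};

	/* The new HCA capability bit gates all of the migration commands. */
	if (!MLX5_CAP_GEN(mdev, migration))
		return -EOPNOTSUPP;

	MLX5_SET(suspend_vhca_in, in, opcode, MLX5_CMD_OP_SUSPEND_VHCA);
	MLX5_SET(suspend_vhca_in, in, vhca_id, vhca_id);
	MLX5_SET(suspend_vhca_in, in, op_mod, op_mod);

	return mlx5_cmd_exec_inout(mdev, suspend_vhca, in, out);
}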
Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- include/linux/mlx5/mlx5_ifc.h | 145 +++++++++++++++++++++++++++++++++- 1 file changed, 144 insertions(+), 1 deletion(-) diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index d90a65b6824f..366c7b030eb7 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -126,6 +126,11 @@ enum { MLX5_CMD_OP_QUERY_SF_PARTITION = 0x111, MLX5_CMD_OP_ALLOC_SF = 0x113, MLX5_CMD_OP_DEALLOC_SF = 0x114, + MLX5_CMD_OP_SUSPEND_VHCA = 0x115, + MLX5_CMD_OP_RESUME_VHCA = 0x116, + MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE = 0x117, + MLX5_CMD_OP_SAVE_VHCA_STATE = 0x118, + MLX5_CMD_OP_LOAD_VHCA_STATE = 0x119, MLX5_CMD_OP_CREATE_MKEY = 0x200, MLX5_CMD_OP_QUERY_MKEY = 0x201, MLX5_CMD_OP_DESTROY_MKEY = 0x202, @@ -1719,7 +1724,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 reserved_at_682[0x1]; u8 log_max_sf[0x5]; u8 apu[0x1]; - u8 reserved_at_689[0x7]; + u8 reserved_at_689[0x4]; + u8 migration[0x1]; + u8 reserved_at_68d[0x2]; u8 log_min_sf_size[0x8]; u8 max_num_sf_partitions[0x8]; @@ -11146,4 +11153,140 @@ enum { MLX5_MTT_PERM_RW = MLX5_MTT_PERM_READ | MLX5_MTT_PERM_WRITE, }; +enum { + MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_MASTER = 0x0, + MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_SLAVE = 0x1, +}; + +struct mlx5_ifc_suspend_vhca_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; +}; + +struct mlx5_ifc_suspend_vhca_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x40]; +}; + +enum { + MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_SLAVE = 0x0, + MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_MASTER = 0x1, +}; + +struct mlx5_ifc_resume_vhca_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; +}; + +struct mlx5_ifc_resume_vhca_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x40]; +}; + +struct mlx5_ifc_query_vhca_migration_state_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; +}; + +struct mlx5_ifc_query_vhca_migration_state_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x40]; + + u8 required_umem_size[0x20]; + + u8 reserved_at_a0[0x160]; +}; + +struct mlx5_ifc_save_vhca_state_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; + + u8 va[0x40]; + + u8 mkey[0x20]; + + u8 size[0x20]; +}; + +struct mlx5_ifc_save_vhca_state_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x40]; +}; + +struct mlx5_ifc_load_vhca_state_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; + + u8 va[0x40]; + + u8 mkey[0x20]; + + u8 size[0x20]; +}; + +struct mlx5_ifc_load_vhca_state_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x40]; +}; + #endif /* MLX5_IFC_H */ From patchwork Wed Sep 22 10:38:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky 
X-Patchwork-Id: 12510195
From: Leon Romanovsky
To: Doug Ledford, Jason Gunthorpe
Cc: Yishai Hadas, Alex Williamson, Bjorn Helgaas, Jakub Kicinski, Kirti Wankhede,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
    linux-rdma@vger.kernel.org, netdev@vger.kernel.org, Saeed Mahameed
Subject: [PATCH mlx5-next 5/7] net/mlx5: Expose APIs to get/put the mlx5 core device
Date: Wed, 22 Sep 2021 13:38:54 +0300
Message-Id: <25d023ef23ef78fcf9b99bf428244149e1f78e95.1632305919.git.leonro@nvidia.com>

From: Yishai Hadas

Expose an API to get the mlx5 core device from a given PCI device when
mlx5_core is its driver. The get API returns with the intf_state_mutex
held, which guarantees that the device can't be unloaded until the caller
completes its job with the device; any flow that takes the lock is
expected to hold it only for a short period of time. The put API unlocks
the intf_state_mutex.

The use case for these APIs is the migration flow of a VF over VFIO PCI.
In that case the VF doesn't ride on mlx5_core, because *two* different
PCI devices are involved: the PF owned by mlx5_core and the VF owned by
the vfio driver. The mlx5_core of the PF is accessed only during the
narrow window of a VF ioctl that requires its services. This allows the
PF driver to be more independent of the VF driver, so long as it doesn't
reset the FW.
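A short sketch of the intended get/put pattern (illustrative only; the
function name is hypothetical and the real callers appear in the next patch):
the caller resolves the PF's mlx5_core_dev for the duration of a single
command and releases it immediately afterwards:

#include <linux/pci.h>
#include <linux/mlx5/driver.h>

static int example_query_over_pf(struct pci_dev *vf_pdev)
{
	struct mlx5_core_dev *mdev = mlx5_get_core_dev(pci_physfn(vf_pdev));
	int ret;

	if (!mdev)
		return -ENOTCONN;	/* PF is not bound to mlx5_core */

	/* ... issue one or a few mlx5 commands against mdev here ... */
	ret = 0;

	mlx5_put_core_dev(mdev);	/* drops intf_state_mutex */
	return ret;
}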
Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- .../net/ethernet/mellanox/mlx5/core/main.c | 43 +++++++++++++++++++ include/linux/mlx5/driver.h | 3 ++ 2 files changed, 46 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 79482824c64f..fcc8b7830421 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1795,6 +1795,49 @@ static struct pci_driver mlx5_core_driver = { .sriov_set_msix_vec_count = mlx5_core_sriov_set_msix_vec_count, }; +/** + * mlx5_get_core_dev - Get the mlx5 core device from a given PCI device if + * mlx5_core is its driver. + * @pdev: The associated PCI device. + * + * Upon return the interface state lock stay held to let caller uses it safely. + * Caller must ensure to use the returned mlx5 device for a narrow window + * and put it back with mlx5_put_core_dev() immediately once usage was over. + * + * Return: Pointer to the associated mlx5_core_dev or NULL. + */ +struct mlx5_core_dev *mlx5_get_core_dev(struct pci_dev *pdev) + __acquires(&mdev->intf_state_mutex) +{ + struct mlx5_core_dev *mdev; + + device_lock(&pdev->dev); + if (pdev->driver != &mlx5_core_driver) { + device_unlock(&pdev->dev); + return NULL; + } + + mdev = pci_get_drvdata(pdev); + mutex_lock(&mdev->intf_state_mutex); + device_unlock(&pdev->dev); + + return mdev; +} +EXPORT_SYMBOL(mlx5_get_core_dev); + +/** + * mlx5_put_core_dev - Put the mlx5 core device back. + * @mdev: The mlx5 core device. + * + * Upon return the interface state lock is unlocked and caller should not + * access the mdev any more. + */ +void mlx5_put_core_dev(struct mlx5_core_dev *mdev) +{ + mutex_unlock(&mdev->intf_state_mutex); +} +EXPORT_SYMBOL(mlx5_put_core_dev); + static void mlx5_core_verify_params(void) { if (prof_sel >= ARRAY_SIZE(profile)) { diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 1b8bae246b28..e9a96904d6f1 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1156,6 +1156,9 @@ int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type, int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type, u64 length, u16 uid, phys_addr_t addr, u32 obj_id); +struct mlx5_core_dev *mlx5_get_core_dev(struct pci_dev *pdev); +void mlx5_put_core_dev(struct mlx5_core_dev *mdev); + #ifdef CONFIG_MLX5_CORE_IPOIB struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev, struct ib_device *ibdev, From patchwork Wed Sep 22 10:38:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 12510199 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-20.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5881AC433F5 for ; Wed, 22 Sep 2021 10:39:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4310D6126A for ; Wed, 22 Sep 2021 10:39:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235332AbhIVKlJ 
From: Leon Romanovsky
To: Doug Ledford, Jason Gunthorpe
Cc: Yishai Hadas, Alex Williamson, Bjorn Helgaas, "David S. Miller", Jakub Kicinski,
    Kirti Wankhede, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
    Saeed Mahameed
Subject: [PATCH mlx5-next 6/7] mlx5_vfio_pci: Expose migration commands over mlx5 device
Date: Wed, 22 Sep 2021 13:38:55 +0300
Message-Id: <7fc0cf0d76bb6df4679b55974959f5494e48f54d.1632305919.git.leonro@nvidia.com>

From: Yishai Hadas

Expose the migration commands over the device; these include suspend,
resume, get vhca id, and query/save/load state. As part of this, add the
APIs and data structures that are needed to manage the migration data.

Signed-off-by: Yishai Hadas
Signed-off-by: Leon Romanovsky
---
 drivers/vfio/pci/mlx5_vfio_pci_cmd.c | 358 +++++++++++++++++++++++++++
 drivers/vfio/pci/mlx5_vfio_pci_cmd.h |  43 ++++
 2 files changed, 401 insertions(+)
 create mode 100644 drivers/vfio/pci/mlx5_vfio_pci_cmd.c
 create mode 100644 drivers/vfio/pci/mlx5_vfio_pci_cmd.h

diff --git a/drivers/vfio/pci/mlx5_vfio_pci_cmd.c b/drivers/vfio/pci/mlx5_vfio_pci_cmd.c
new file mode 100644
index 000000000000..7e4f83d196a8
--- /dev/null
+++ b/drivers/vfio/pci/mlx5_vfio_pci_cmd.c
@@ -0,0 +1,358 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES.
All rights reserved + */ + +#include "mlx5_vfio_pci_cmd.h" + +int mlx5vf_cmd_suspend_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod) +{ + struct mlx5_core_dev *mdev = mlx5_get_core_dev(pci_physfn(pdev)); + u32 out[MLX5_ST_SZ_DW(suspend_vhca_out)] = {}; + u32 in[MLX5_ST_SZ_DW(suspend_vhca_in)] = {}; + int ret; + + if (!mdev) + return -ENOTCONN; + + MLX5_SET(suspend_vhca_in, in, opcode, MLX5_CMD_OP_SUSPEND_VHCA); + MLX5_SET(suspend_vhca_in, in, vhca_id, vhca_id); + MLX5_SET(suspend_vhca_in, in, op_mod, op_mod); + + ret = mlx5_cmd_exec_inout(mdev, suspend_vhca, in, out); + mlx5_put_core_dev(mdev); + return ret; +} + +int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod) +{ + struct mlx5_core_dev *mdev = mlx5_get_core_dev(pci_physfn(pdev)); + u32 out[MLX5_ST_SZ_DW(resume_vhca_out)] = {}; + u32 in[MLX5_ST_SZ_DW(resume_vhca_in)] = {}; + int ret; + + if (!mdev) + return -ENOTCONN; + + MLX5_SET(resume_vhca_in, in, opcode, MLX5_CMD_OP_RESUME_VHCA); + MLX5_SET(resume_vhca_in, in, vhca_id, vhca_id); + MLX5_SET(resume_vhca_in, in, op_mod, op_mod); + + ret = mlx5_cmd_exec_inout(mdev, resume_vhca, in, out); + mlx5_put_core_dev(mdev); + return ret; +} + +int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id, + u32 *state_size) +{ + struct mlx5_core_dev *mdev = mlx5_get_core_dev(pci_physfn(pdev)); + u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {}; + u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {}; + int ret; + + if (!mdev) + return -ENOTCONN; + + MLX5_SET(query_vhca_migration_state_in, in, opcode, + MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE); + MLX5_SET(query_vhca_migration_state_in, in, vhca_id, vhca_id); + MLX5_SET(query_vhca_migration_state_in, in, op_mod, 0); + + ret = mlx5_cmd_exec_inout(mdev, query_vhca_migration_state, in, out); + if (ret) + goto end; + + *state_size = MLX5_GET(query_vhca_migration_state_out, out, + required_umem_size); + +end: + mlx5_put_core_dev(mdev); + return ret; +} + +int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id) +{ + struct mlx5_core_dev *mdev = mlx5_get_core_dev(pci_physfn(pdev)); + u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {}; + int out_size; + void *out; + int ret; + + if (!mdev) + return -ENOTCONN; + + out_size = MLX5_ST_SZ_BYTES(query_hca_cap_out); + out = kzalloc(out_size, GFP_KERNEL); + if (!out) { + ret = -ENOMEM; + goto end; + } + + MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP); + MLX5_SET(query_hca_cap_in, in, other_function, 1); + MLX5_SET(query_hca_cap_in, in, function_id, function_id); + MLX5_SET(query_hca_cap_in, in, op_mod, + MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE << 1 | + HCA_CAP_OPMOD_GET_CUR); + + ret = mlx5_cmd_exec_inout(mdev, query_hca_cap, in, out); + if (ret) + goto err_exec; + + *vhca_id = MLX5_GET(query_hca_cap_out, out, capability.cmd_hca_cap.vhca_id); + +err_exec: + kfree(out); +end: + mlx5_put_core_dev(mdev); + return ret; +} + +static int _create_state_mkey(struct mlx5_core_dev *mdev, u32 pdn, + struct mlx5_vhca_state_data *state, + struct mlx5_core_mkey *mkey) +{ + struct sg_dma_page_iter dma_iter; + int err = 0, inlen; + __be64 *mtt; + void *mkc; + u32 *in; + + inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + + sizeof(*mtt) * round_up(state->num_pages, 2); + + in = kvzalloc(inlen, GFP_KERNEL); + if (!in) + return -ENOMEM; + + MLX5_SET(create_mkey_in, in, translations_octword_actual_size, + DIV_ROUND_UP(state->num_pages, 2)); + mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt); + + 
for_each_sgtable_dma_page(&state->mig_data.table.sgt, &dma_iter, 0) + *mtt++ = cpu_to_be64(sg_page_iter_dma_address(&dma_iter)); + + mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry); + MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT); + MLX5_SET(mkc, mkc, lr, 1); + MLX5_SET(mkc, mkc, lw, 1); + MLX5_SET(mkc, mkc, pd, pdn); + MLX5_SET(mkc, mkc, bsf_octword_size, 0); + MLX5_SET(mkc, mkc, qpn, 0xffffff); + MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT); + MLX5_SET(mkc, mkc, translations_octword_size, + DIV_ROUND_UP(state->num_pages, 2)); + MLX5_SET64(mkc, mkc, start_addr, mkey->iova); + MLX5_SET64(mkc, mkc, len, state->num_pages * PAGE_SIZE); + err = mlx5_core_create_mkey(mdev, mkey, in, inlen); + + kvfree(in); + + return err; +} + +struct page *mlx5vf_get_migration_page(struct migration_data *data, + unsigned long offset) +{ + unsigned long cur_offset = 0; + struct scatterlist *sg; + unsigned int i; + + if (offset < data->last_offset || !data->last_offset_sg) { + data->last_offset = 0; + data->last_offset_sg = data->table.sgt.sgl; + data->sg_last_entry = 0; + } + + cur_offset = data->last_offset; + + for_each_sg(data->last_offset_sg, sg, + data->table.sgt.orig_nents - data->sg_last_entry, i) { + if (offset < sg->length + cur_offset) { + data->last_offset_sg = sg; + data->sg_last_entry += i; + data->last_offset = cur_offset; + return nth_page(sg_page(sg), + (offset - cur_offset) / PAGE_SIZE); + } + cur_offset += sg->length; + } + return NULL; +} + +void mlx5vf_reset_vhca_state(struct mlx5_vhca_state_data *state) +{ + struct migration_data *data = &state->mig_data; + struct sg_page_iter sg_iter; + + if (!data->table.prv) + goto end; + + /* Undo alloc_pages_bulk_array() */ + for_each_sgtable_page(&data->table.sgt, &sg_iter, 0) + __free_page(sg_page_iter_page(&sg_iter)); + sg_free_append_table(&data->table); +end: + memset(state, 0, sizeof(*state)); +} + +int mlx5vf_add_migration_pages(struct mlx5_vhca_state_data *state, + unsigned int npages) +{ + unsigned int to_alloc = npages; + struct page **page_list; + unsigned long filled; + unsigned int to_fill; + int ret = 0; + + to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*page_list)); + page_list = kvzalloc(to_fill * sizeof(*page_list), GFP_KERNEL); + if (!page_list) + return -ENOMEM; + + do { + filled = alloc_pages_bulk_array(GFP_KERNEL, to_fill, + page_list); + if (!filled) { + ret = -ENOMEM; + goto err; + } + to_alloc -= filled; + ret = sg_alloc_append_table_from_pages( + &state->mig_data.table, page_list, filled, 0, + filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC, + GFP_KERNEL); + + if (ret) + goto err; + /* clean input for another bulk allocation */ + memset(page_list, 0, filled * sizeof(*page_list)); + to_fill = min_t(unsigned int, to_alloc, + PAGE_SIZE / sizeof(*page_list)); + } while (to_alloc > 0); + + kvfree(page_list); + state->num_pages += npages; + + return 0; + +err: + kvfree(page_list); + return ret; +} + +int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id, + u64 state_size, + struct mlx5_vhca_state_data *state) +{ + struct mlx5_core_dev *mdev = mlx5_get_core_dev(pci_physfn(pdev)); + u32 out[MLX5_ST_SZ_DW(save_vhca_state_out)] = {}; + u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {}; + struct mlx5_core_mkey mkey = {}; + u32 pdn; + int err; + + if (!mdev) + return -ENOTCONN; + + err = mlx5_core_alloc_pd(mdev, &pdn); + if (err) + goto end; + + err = mlx5vf_add_migration_pages(state, + DIV_ROUND_UP_ULL(state_size, PAGE_SIZE)); + if (err < 0) + goto err_alloc_pages; + + err = 
dma_map_sgtable(mdev->device, &state->mig_data.table.sgt, + DMA_FROM_DEVICE, 0); + if (err) + goto err_reg_dma; + + err = _create_state_mkey(mdev, pdn, state, &mkey); + if (err) + goto err_create_mkey; + + MLX5_SET(save_vhca_state_in, in, opcode, + MLX5_CMD_OP_SAVE_VHCA_STATE); + MLX5_SET(save_vhca_state_in, in, op_mod, 0); + MLX5_SET(save_vhca_state_in, in, vhca_id, vhca_id); + + MLX5_SET64(save_vhca_state_in, in, va, mkey.iova); + MLX5_SET(save_vhca_state_in, in, mkey, mkey.key); + MLX5_SET(save_vhca_state_in, in, size, mkey.size); + + err = mlx5_cmd_exec_inout(mdev, save_vhca_state, in, out); + if (err) + goto err_exec; + + state->state_size = state_size; + + mlx5_core_destroy_mkey(mdev, &mkey); + mlx5_core_dealloc_pd(mdev, pdn); + dma_unmap_sgtable(mdev->device, &state->mig_data.table.sgt, + DMA_FROM_DEVICE, 0); + mlx5_put_core_dev(mdev); + + return 0; + +err_exec: + mlx5_core_destroy_mkey(mdev, &mkey); +err_create_mkey: + dma_unmap_sgtable(mdev->device, &state->mig_data.table.sgt, + DMA_FROM_DEVICE, 0); +err_reg_dma: + mlx5vf_reset_vhca_state(state); +err_alloc_pages: + mlx5_core_dealloc_pd(mdev, pdn); +end: + mlx5_put_core_dev(mdev); + return err; +} + +int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id, + struct mlx5_vhca_state_data *state) +{ + struct mlx5_core_dev *mdev = mlx5_get_core_dev(pci_physfn(pdev)); + u32 out[MLX5_ST_SZ_DW(save_vhca_state_out)] = {}; + u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {}; + struct mlx5_core_mkey mkey = {}; + u32 pdn; + int err; + + if (!mdev) + return -ENOTCONN; + + err = mlx5_core_alloc_pd(mdev, &pdn); + if (err) + goto end; + + err = dma_map_sgtable(mdev->device, &state->mig_data.table.sgt, + DMA_TO_DEVICE, 0); + if (err) + goto err_reg; + + err = _create_state_mkey(mdev, pdn, state, &mkey); + if (err) + goto err_mkey; + + MLX5_SET(load_vhca_state_in, in, opcode, + MLX5_CMD_OP_LOAD_VHCA_STATE); + MLX5_SET(load_vhca_state_in, in, op_mod, 0); + MLX5_SET(load_vhca_state_in, in, vhca_id, vhca_id); + MLX5_SET64(load_vhca_state_in, in, va, mkey.iova); + MLX5_SET(load_vhca_state_in, in, mkey, mkey.key); + MLX5_SET(load_vhca_state_in, in, size, state->state_size); + + err = mlx5_cmd_exec_inout(mdev, load_vhca_state, in, out); + mlx5_core_destroy_mkey(mdev, &mkey); +err_mkey: + dma_unmap_sgtable(mdev->device, &state->mig_data.table.sgt, + DMA_TO_DEVICE, 0); +err_reg: + mlx5_core_dealloc_pd(mdev, pdn); +end: + mlx5_put_core_dev(mdev); + return err; +} diff --git a/drivers/vfio/pci/mlx5_vfio_pci_cmd.h b/drivers/vfio/pci/mlx5_vfio_pci_cmd.h new file mode 100644 index 000000000000..66221df24b19 --- /dev/null +++ b/drivers/vfio/pci/mlx5_vfio_pci_cmd.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ +/* + * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+ */ + +#ifndef MLX5_VFIO_CMD_H +#define MLX5_VFIO_CMD_H + +#include +#include + +struct migration_data { + struct sg_append_table table; + + struct scatterlist *last_offset_sg; + unsigned int sg_last_entry; + unsigned long last_offset; +}; + +/* state data of vhca to be used as part of migration flow */ +struct mlx5_vhca_state_data { + u64 state_size; + u64 num_pages; + u32 win_start_offset; + struct migration_data mig_data; +}; + +int mlx5vf_cmd_suspend_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod); +int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod); +int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id, + uint32_t *state_size); +int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id); +int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id, + u64 state_size, + struct mlx5_vhca_state_data *state); +void mlx5vf_reset_vhca_state(struct mlx5_vhca_state_data *state); +int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id, + struct mlx5_vhca_state_data *state); +int mlx5vf_add_migration_pages(struct mlx5_vhca_state_data *state, + unsigned int npages); +struct page *mlx5vf_get_migration_page(struct migration_data *data, + unsigned long offset); +#endif /* MLX5_VFIO_CMD_H */ From patchwork Wed Sep 22 10:38:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leon Romanovsky X-Patchwork-Id: 12510201 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-20.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0265C433F5 for ; Wed, 22 Sep 2021 10:39:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C8D576126A for ; Wed, 22 Sep 2021 10:39:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235156AbhIVKlO (ORCPT ); Wed, 22 Sep 2021 06:41:14 -0400 Received: from mail.kernel.org ([198.145.29.99]:49256 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235237AbhIVKkz (ORCPT ); Wed, 22 Sep 2021 06:40:55 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id CE2AD6127B; Wed, 22 Sep 2021 10:39:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1632307165; bh=xOTZ47fmx+uIZvI/Fo0JFPpoeZXkq/9twGVoQpazB4U=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=OmY0naqBax2lSjpsiQSvd+dw10WvvYR0dCwpnDYR0RCMZa7w4Xed83rLxdR8UCz/u y8idDe0zzF64Yn+fxLnh1q1k5zaVD1lYHPXYWtPksGjp1NsSdvd946gEvtfAt9BKip 6bGV+nnxS0amkTfqIQkJMnvp1X/UUDGWHMjOHUJkCJmwUnZMERFZR2SlfIl72PGJNJ V8+nQWJi60qyKY7iVWkWsf9BEWW7MDHCAuisT0uC5NWRKpON7359RhsySmhfoty/Fz 5QRdNTdnesXy9s22qvanbgglPUGo9K1GV+JHDEwQC/ZP05ZuxFcQFNvIGra65VHNZS N82lKmr0lLgmQ== From: Leon Romanovsky To: Doug Ledford , Jason Gunthorpe Cc: Yishai Hadas , Alex Williamson , Bjorn Helgaas , "David S. 
Miller" , Jakub Kicinski , Kirti Wankhede , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org, netdev@vger.kernel.org, Saeed Mahameed Subject: [PATCH mlx5-next 7/7] mlx5_vfio_pci: Implement vfio_pci driver for mlx5 devices Date: Wed, 22 Sep 2021 13:38:56 +0300 Message-Id: <51624515f1fe2d4043e0d10056f65b69f523bdc3.1632305919.git.leonro@nvidia.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org From: Yishai Hadas This patch adds support for vfio_pci driver for mlx5 devices. It uses vfio_pci_core to register to the VFIO subsystem and then implements the mlx5 specific logic in the migration area. The migration implementation follows the definition from uapi/vfio.h and uses the mlx5 VF->PF command channel to achieve it. This patch implements the suspend/resume flows. Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- drivers/vfio/pci/Kconfig | 11 + drivers/vfio/pci/Makefile | 3 + drivers/vfio/pci/mlx5_vfio_pci.c | 736 +++++++++++++++++++++++++++++++ 3 files changed, 750 insertions(+) create mode 100644 drivers/vfio/pci/mlx5_vfio_pci.c diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 860424ccda1b..c10b53028309 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -43,4 +43,15 @@ config VFIO_PCI_IGD To enable Intel IGD assignment through vfio-pci, say Y. endif + +config MLX5_VFIO_PCI + tristate "VFIO support for MLX5 PCI devices" + depends on MLX5_CORE + select VFIO_PCI_CORE + help + This provides a PCI support for MLX5 devices using the VFIO + framework. The device specific driver supports suspend/resume + of the MLX5 device. + + If you don't know what to do here, say N. endif diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index 349d68d242b4..b9448bba0c83 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -7,3 +7,6 @@ obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o vfio-pci-y := vfio_pci.o vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o obj-$(CONFIG_VFIO_PCI) += vfio-pci.o + +mlx5-vfio-pci-y := mlx5_vfio_pci.o mlx5_vfio_pci_cmd.o +obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5-vfio-pci.o diff --git a/drivers/vfio/pci/mlx5_vfio_pci.c b/drivers/vfio/pci/mlx5_vfio_pci.c new file mode 100644 index 000000000000..710a3ff9cbcc --- /dev/null +++ b/drivers/vfio/pci/mlx5_vfio_pci.c @@ -0,0 +1,736 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* + * Copyright (c) 2021, NVIDIA CORPORATION & AFFILIATES. 
All rights reserved + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "mlx5_vfio_pci_cmd.h" + +enum { + MLX5VF_PCI_QUIESCED = 1 << 0, + MLX5VF_PCI_FREEZED = 1 << 1, +}; + +enum { + MLX5VF_REGION_PENDING_BYTES = 1 << 0, + MLX5VF_REGION_DATA_SIZE = 1 << 1, +}; + +#define MLX5VF_MIG_REGION_DATA_SIZE SZ_128K +/* Data section offset from migration region */ +#define MLX5VF_MIG_REGION_DATA_OFFSET \ + (sizeof(struct vfio_device_migration_info)) + +#define VFIO_DEVICE_MIGRATION_OFFSET(x) \ + (offsetof(struct vfio_device_migration_info, x)) + +struct mlx5vf_pci_migration_info { + u32 vfio_dev_state; /* VFIO_DEVICE_STATE_XXX */ + u32 dev_state; /* device migration state */ + u32 region_state; /* Use MLX5VF_REGION_XXX */ + u16 vhca_id; + struct mlx5_vhca_state_data vhca_state_data; +}; + +struct mlx5vf_pci_core_device { + struct vfio_pci_core_device core_device; + u8 migrate_cap:1; + struct mlx5vf_pci_migration_info vmig; +}; + +static int mlx5vf_pci_unquiesce_device(struct mlx5vf_pci_core_device *mvdev) +{ + int ret; + + if (!(mvdev->vmig.dev_state & MLX5VF_PCI_QUIESCED)) + return 0; + + ret = mlx5vf_cmd_resume_vhca(mvdev->core_device.pdev, + mvdev->vmig.vhca_id, + MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_MASTER); + if (ret) + return ret; + + mvdev->vmig.dev_state &= ~MLX5VF_PCI_QUIESCED; + return 0; +} + +static int mlx5vf_pci_quiesce_device(struct mlx5vf_pci_core_device *mvdev) +{ + int ret; + + if (mvdev->vmig.dev_state & MLX5VF_PCI_QUIESCED) + return 0; + + ret = mlx5vf_cmd_suspend_vhca( + mvdev->core_device.pdev, mvdev->vmig.vhca_id, + MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_MASTER); + if (ret) + return ret; + + mvdev->vmig.dev_state |= MLX5VF_PCI_QUIESCED; + return 0; +} + +static int mlx5vf_pci_unfreeze_device(struct mlx5vf_pci_core_device *mvdev) +{ + int ret; + + if (!(mvdev->vmig.dev_state & MLX5VF_PCI_FREEZED)) + return 0; + + ret = mlx5vf_cmd_resume_vhca(mvdev->core_device.pdev, + mvdev->vmig.vhca_id, + MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_SLAVE); + if (ret) + return ret; + + mvdev->vmig.dev_state &= ~MLX5VF_PCI_FREEZED; + return 0; +} + +static int mlx5vf_pci_freeze_device(struct mlx5vf_pci_core_device *mvdev) +{ + int ret; + + if (mvdev->vmig.dev_state & MLX5VF_PCI_FREEZED) + return 0; + + ret = mlx5vf_cmd_suspend_vhca( + mvdev->core_device.pdev, mvdev->vmig.vhca_id, + MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_SLAVE); + if (ret) + return ret; + + mvdev->vmig.dev_state |= MLX5VF_PCI_FREEZED; + return 0; +} + +static int mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev) +{ + u32 state_size = 0; + int ret; + + if (!(mvdev->vmig.vfio_dev_state & VFIO_DEVICE_STATE_SAVING)) + return -EFAULT; + + if (!(mvdev->vmig.dev_state & MLX5VF_PCI_FREEZED)) + return -EFAULT; + + /* If we already read state no reason to re-read */ + if (mvdev->vmig.vhca_state_data.state_size) + return 0; + + ret = mlx5vf_cmd_query_vhca_migration_state( + mvdev->core_device.pdev, mvdev->vmig.vhca_id, &state_size); + if (ret) + return ret; + + return mlx5vf_cmd_save_vhca_state(mvdev->core_device.pdev, + mvdev->vmig.vhca_id, state_size, + &mvdev->vmig.vhca_state_data); +} + +static int mlx5vf_pci_new_write_window(struct mlx5vf_pci_core_device *mvdev) +{ + struct mlx5_vhca_state_data *state_data = &mvdev->vmig.vhca_state_data; + u32 num_pages_needed; + u64 allocated_ready; + u32 bytes_needed; + + /* Check how many bytes are available from previous flows */ + WARN_ON(state_data->num_pages * PAGE_SIZE < + 
state_data->win_start_offset); + allocated_ready = (state_data->num_pages * PAGE_SIZE) - + state_data->win_start_offset; + WARN_ON(allocated_ready > MLX5VF_MIG_REGION_DATA_SIZE); + + bytes_needed = MLX5VF_MIG_REGION_DATA_SIZE - allocated_ready; + if (!bytes_needed) + return 0; + + num_pages_needed = DIV_ROUND_UP_ULL(bytes_needed, PAGE_SIZE); + return mlx5vf_add_migration_pages(state_data, num_pages_needed); +} + +static ssize_t +mlx5vf_pci_handle_migration_data_size(struct mlx5vf_pci_core_device *mvdev, + char __user *buf, bool iswrite) +{ + struct mlx5vf_pci_migration_info *vmig = &mvdev->vmig; + u64 data_size; + int ret; + + if (iswrite) { + /* data_size is writable only during resuming state */ + if (vmig->vfio_dev_state != VFIO_DEVICE_STATE_RESUMING) + return -EINVAL; + + ret = copy_from_user(&data_size, buf, sizeof(data_size)); + if (ret) + return -EFAULT; + + vmig->vhca_state_data.state_size += data_size; + vmig->vhca_state_data.win_start_offset += data_size; + ret = mlx5vf_pci_new_write_window(mvdev); + if (ret) + return ret; + + } else { + if (vmig->vfio_dev_state != VFIO_DEVICE_STATE_SAVING) + return -EINVAL; + + data_size = min_t(u64, MLX5VF_MIG_REGION_DATA_SIZE, + vmig->vhca_state_data.state_size - + vmig->vhca_state_data.win_start_offset); + ret = copy_to_user(buf, &data_size, sizeof(data_size)); + if (ret) + return -EFAULT; + } + + vmig->region_state |= MLX5VF_REGION_DATA_SIZE; + return sizeof(data_size); +} + +static ssize_t +mlx5vf_pci_handle_migration_data_offset(struct mlx5vf_pci_core_device *mvdev, + char __user *buf, bool iswrite) +{ + static const u64 data_offset = MLX5VF_MIG_REGION_DATA_OFFSET; + int ret; + + /* RO field */ + if (iswrite) + return -EFAULT; + + ret = copy_to_user(buf, &data_offset, sizeof(data_offset)); + if (ret) + return -EFAULT; + + return sizeof(data_offset); +} + +static ssize_t +mlx5vf_pci_handle_migration_pending_bytes(struct mlx5vf_pci_core_device *mvdev, + char __user *buf, bool iswrite) +{ + struct mlx5vf_pci_migration_info *vmig = &mvdev->vmig; + u64 pending_bytes; + int ret; + + /* RO field */ + if (iswrite) + return -EFAULT; + + if (vmig->vfio_dev_state == (VFIO_DEVICE_STATE_SAVING | + VFIO_DEVICE_STATE_RUNNING)) { + /* In pre-copy state we have no data to return for now, + * return 0 pending bytes + */ + pending_bytes = 0; + } else { + /* + * In case that the device is quiesced, we can freeze the device + * since it's guaranteed that all other DMA masters are quiesced + * as well. 
+ */ + if (vmig->dev_state & MLX5VF_PCI_QUIESCED) { + ret = mlx5vf_pci_freeze_device(mvdev); + if (ret) + return ret; + } + + ret = mlx5vf_pci_save_device_data(mvdev); + if (ret) + return ret; + + pending_bytes = vmig->vhca_state_data.state_size - + vmig->vhca_state_data.win_start_offset; + } + + ret = copy_to_user(buf, &pending_bytes, sizeof(pending_bytes)); + if (ret) + return -EFAULT; + + /* Window moves forward once data from previous iteration was read */ + if (vmig->region_state & MLX5VF_REGION_DATA_SIZE) + vmig->vhca_state_data.win_start_offset += + min_t(u64, MLX5VF_MIG_REGION_DATA_SIZE, pending_bytes); + + WARN_ON(vmig->vhca_state_data.win_start_offset > + vmig->vhca_state_data.state_size); + + /* New iteration started */ + vmig->region_state = MLX5VF_REGION_PENDING_BYTES; + return sizeof(pending_bytes); +} + +static int mlx5vf_load_state(struct mlx5vf_pci_core_device *mvdev) +{ + if (!mvdev->vmig.vhca_state_data.state_size) + return 0; + + return mlx5vf_cmd_load_vhca_state(mvdev->core_device.pdev, + mvdev->vmig.vhca_id, + &mvdev->vmig.vhca_state_data); +} + +static void mlx5vf_reset_mig_state(struct mlx5vf_pci_core_device *mvdev) +{ + struct mlx5vf_pci_migration_info *vmig = &mvdev->vmig; + + vmig->region_state = 0; + mlx5vf_reset_vhca_state(&vmig->vhca_state_data); +} + +static int mlx5vf_pci_set_device_state(struct mlx5vf_pci_core_device *mvdev, + u32 state) +{ + struct mlx5vf_pci_migration_info *vmig = &mvdev->vmig; + int ret; + + if (state == vmig->vfio_dev_state) + return 0; + + if (!vfio_change_migration_state_allowed(state, vmig->vfio_dev_state)) + return -EINVAL; + + switch (state) { + case VFIO_DEVICE_STATE_RUNNING: + /* + * (running) - When we move to _RUNNING state we must: + * 1. stop dirty track (in case we got here + * after recovering from error). + * 2. reset device migration info fields + * 3. make sure device is unfreezed + * 4. make sure device is unquiesced + */ + + /* When moving from resuming to running we may load state + * to the device if was previously set. + */ + if (vmig->vfio_dev_state == VFIO_DEVICE_STATE_RESUMING) { + ret = mlx5vf_load_state(mvdev); + if (ret) + return ret; + } + + /* Any previous migration state if exists should be reset */ + mlx5vf_reset_mig_state(mvdev); + + ret = mlx5vf_pci_unfreeze_device(mvdev); + if (ret) + return ret; + ret = mlx5vf_pci_unquiesce_device(mvdev); + break; + case VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING: + /* + * (pre-copy) - device should start logging data. + */ + ret = 0; + break; + case VFIO_DEVICE_STATE_SAVING: + /* + * (stop-and-copy) - Stop the device as DMA master. + * At this stage the device can't dirty more + * pages so we can stop logging for it. + */ + ret = mlx5vf_pci_quiesce_device(mvdev); + break; + case VFIO_DEVICE_STATE_STOP: + /* + * (stop) - device stopped, not saving or resuming data. + */ + ret = 0; + break; + case VFIO_DEVICE_STATE_RESUMING: + /* + * (resuming) - device stopped, should soon start resuming + * data. Device must be quiesced (not a DMA master) and + * freezed (not a DMA slave). Also migration info should + * reset. 
+ */ + ret = mlx5vf_pci_quiesce_device(mvdev); + if (ret) + break; + ret = mlx5vf_pci_freeze_device(mvdev); + if (ret) + break; + mlx5vf_reset_mig_state(mvdev); + ret = mlx5vf_pci_new_write_window(mvdev); + break; + default: + return -EFAULT; + } + if (ret) + return ret; + + vmig->vfio_dev_state = state; + return 0; +} + +static ssize_t +mlx5vf_pci_handle_migration_device_state(struct mlx5vf_pci_core_device *mvdev, + char __user *buf, bool iswrite) +{ + size_t count = sizeof(mvdev->vmig.vfio_dev_state); + int ret; + + if (iswrite) { + u32 device_state; + + ret = copy_from_user(&device_state, buf, count); + if (ret) + return -EFAULT; + + ret = mlx5vf_pci_set_device_state(mvdev, device_state); + if (ret) + return ret; + } else { + ret = copy_to_user(buf, &mvdev->vmig.vfio_dev_state, count); + if (ret) + return -EFAULT; + } + + return count; +} + +static ssize_t +mlx5vf_pci_copy_user_data_to_device_state(struct mlx5vf_pci_core_device *mvdev, + char __user *buf, size_t count, + u64 offset) +{ + struct mlx5_vhca_state_data *state_data = &mvdev->vmig.vhca_state_data; + u32 curr_offset; + char *from_buff = buf; + u32 win_page_offset; + u32 copy_count; + struct page *page; + char *to_buff; + int ret; + + curr_offset = state_data->win_start_offset + offset; + + do { + page = mlx5vf_get_migration_page(&state_data->mig_data, + curr_offset); + if (!page) + return -EINVAL; + + win_page_offset = curr_offset % PAGE_SIZE; + copy_count = min_t(u32, PAGE_SIZE - win_page_offset, count); + + to_buff = kmap_local_page(page); + ret = copy_from_user(to_buff + win_page_offset, from_buff, + copy_count); + kunmap_local(to_buff); + if (ret) + return -EFAULT; + + from_buff += copy_count; + curr_offset += copy_count; + count -= copy_count; + } while (count > 0); + + return 0; +} + +static ssize_t +mlx5vf_pci_copy_device_state_to_user(struct mlx5vf_pci_core_device *mvdev, + char __user *buf, u64 offset, size_t count) +{ + struct mlx5_vhca_state_data *state_data = &mvdev->vmig.vhca_state_data; + u32 win_available_bytes; + u32 win_page_offset; + char *to_buff = buf; + u32 copy_count; + u32 curr_offset; + char *from_buff; + struct page *page; + int ret; + + win_available_bytes = + min_t(u64, MLX5VF_MIG_REGION_DATA_SIZE, + mvdev->vmig.vhca_state_data.state_size - + mvdev->vmig.vhca_state_data.win_start_offset); + + if (count + offset > win_available_bytes) + return -EINVAL; + + curr_offset = state_data->win_start_offset + offset; + + do { + page = mlx5vf_get_migration_page(&state_data->mig_data, + curr_offset); + if (!page) + return -EINVAL; + + win_page_offset = curr_offset % PAGE_SIZE; + copy_count = min_t(u32, PAGE_SIZE - win_page_offset, count); + + from_buff = kmap_local_page(page); + ret = copy_to_user(buf, from_buff + win_page_offset, + copy_count); + kunmap_local(from_buff); + if (ret) + return -EFAULT; + + curr_offset += copy_count; + count -= copy_count; + to_buff += copy_count; + } while (count); + + return 0; +} + +static ssize_t +mlx5vf_pci_migration_data_rw(struct mlx5vf_pci_core_device *mvdev, + char __user *buf, size_t count, u64 offset, + bool iswrite) +{ + int ret; + + if (offset + count > MLX5VF_MIG_REGION_DATA_SIZE) + return -EINVAL; + + if (iswrite) + ret = mlx5vf_pci_copy_user_data_to_device_state(mvdev, buf, + count, offset); + else + ret = mlx5vf_pci_copy_device_state_to_user(mvdev, buf, offset, + count); + if (ret) + return ret; + return count; +} + +static ssize_t mlx5vf_pci_mig_rw(struct vfio_pci_core_device *vdev, + char __user *buf, size_t count, loff_t *ppos, + bool iswrite) +{ + struct 
mlx5vf_pci_core_device *mvdev = + container_of(vdev, struct mlx5vf_pci_core_device, core_device); + u64 pos = *ppos & VFIO_PCI_OFFSET_MASK; + int ret; + + /* Copy to/from the migration region data section */ + if (pos >= MLX5VF_MIG_REGION_DATA_OFFSET) + return mlx5vf_pci_migration_data_rw( + mvdev, buf, count, pos - MLX5VF_MIG_REGION_DATA_OFFSET, + iswrite); + + switch (pos) { + case VFIO_DEVICE_MIGRATION_OFFSET(device_state): + /* This is RW field. */ + if (count != sizeof(mvdev->vmig.vfio_dev_state)) { + ret = -EINVAL; + break; + } + ret = mlx5vf_pci_handle_migration_device_state(mvdev, buf, + iswrite); + break; + case VFIO_DEVICE_MIGRATION_OFFSET(pending_bytes): + /* + * The number of pending bytes still to be migrated from the + * vendor driver. This is RO field. + * Reading this field indicates on the start of a new iteration + * to get device data. + * + */ + ret = mlx5vf_pci_handle_migration_pending_bytes(mvdev, buf, + iswrite); + break; + case VFIO_DEVICE_MIGRATION_OFFSET(data_offset): + /* + * The user application should read data_offset field from the + * migration region. The user application should read the + * device data from this offset within the migration region + * during the _SAVING mode or write the device data during the + * _RESUMING mode. This is RO field. + */ + ret = mlx5vf_pci_handle_migration_data_offset(mvdev, buf, + iswrite); + break; + case VFIO_DEVICE_MIGRATION_OFFSET(data_size): + /* + * The user application should read data_size to get the size + * in bytes of the data copied to the migration region during + * the _SAVING state by the device. The user application should + * write the size in bytes of the data that was copied to + * the migration region during the _RESUMING state by the user. + * This is RW field. + */ + ret = mlx5vf_pci_handle_migration_data_size(mvdev, buf, + iswrite); + break; + default: + ret = -EFAULT; + break; + } + + return ret; +} + +static struct vfio_pci_regops migration_ops = { + .rw = mlx5vf_pci_mig_rw, +}; + +static int mlx5vf_pci_open_device(struct vfio_device *core_vdev) +{ + struct mlx5vf_pci_core_device *mvdev = container_of( + core_vdev, struct mlx5vf_pci_core_device, core_device.vdev); + struct vfio_pci_core_device *vdev = &mvdev->core_device; + int vf_id; + int ret; + + ret = vfio_pci_core_enable(vdev); + if (ret) + return ret; + + if (!mvdev->migrate_cap) { + vfio_pci_core_finish_enable(vdev); + return 0; + } + + vf_id = pci_iov_vf_id(vdev->pdev); + if (vf_id < 0) { + ret = vf_id; + goto out_disable; + } + + ret = mlx5vf_cmd_get_vhca_id(vdev->pdev, vf_id + 1, + &mvdev->vmig.vhca_id); + if (ret) + goto out_disable; + + ret = vfio_pci_register_dev_region(vdev, VFIO_REGION_TYPE_MIGRATION, + VFIO_REGION_SUBTYPE_MIGRATION, + &migration_ops, + MLX5VF_MIG_REGION_DATA_OFFSET + + MLX5VF_MIG_REGION_DATA_SIZE, + VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE, + NULL); + if (ret) + goto out_disable; + + vfio_pci_core_finish_enable(vdev); + return 0; +out_disable: + vfio_pci_core_disable(vdev); + return ret; +} + +static void mlx5vf_pci_close_device(struct vfio_device *core_vdev) +{ + struct mlx5vf_pci_core_device *mvdev = container_of( + core_vdev, struct mlx5vf_pci_core_device, core_device.vdev); + + vfio_pci_core_close_device(core_vdev); + mlx5vf_reset_mig_state(mvdev); +} + +static const struct vfio_device_ops mlx5vf_pci_ops = { + .name = "mlx5-vfio-pci", + .open_device = mlx5vf_pci_open_device, + .close_device = mlx5vf_pci_close_device, + .ioctl = vfio_pci_core_ioctl, + .read = vfio_pci_core_read, + .write = 
vfio_pci_core_write, + .mmap = vfio_pci_core_mmap, + .request = vfio_pci_core_request, + .match = vfio_pci_core_match, +}; + +static int mlx5vf_pci_probe(struct pci_dev *pdev, + const struct pci_device_id *id) +{ + struct mlx5vf_pci_core_device *mvdev; + int ret; + + mvdev = kzalloc(sizeof(*mvdev), GFP_KERNEL); + if (!mvdev) + return -ENOMEM; + vfio_pci_core_init_device(&mvdev->core_device, pdev, &mlx5vf_pci_ops); + + if (pdev->is_virtfn) { + struct mlx5_core_dev *mdev = + mlx5_get_core_dev(pci_physfn(pdev)); + + if (mdev) { + if (MLX5_CAP_GEN(mdev, migration)) + mvdev->migrate_cap = 1; + mlx5_put_core_dev(mdev); + } + } + + ret = vfio_pci_core_register_device(&mvdev->core_device); + if (ret) + goto out_free; + + dev_set_drvdata(&pdev->dev, mvdev); + return 0; + +out_free: + vfio_pci_core_uninit_device(&mvdev->core_device); + kfree(mvdev); + return ret; +} + +static void mlx5vf_pci_remove(struct pci_dev *pdev) +{ + struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev); + + vfio_pci_core_unregister_device(&mvdev->core_device); + vfio_pci_core_uninit_device(&mvdev->core_device); + kfree(mvdev); +} + +static const struct pci_device_id mlx5vf_pci_table[] = { + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0x101e) }, /* ConnectX Family mlx5Gen Virtual Function */ + {} +}; + +MODULE_DEVICE_TABLE(pci, mlx5vf_pci_table); + +static struct pci_driver mlx5vf_pci_driver = { + .name = KBUILD_MODNAME, + .id_table = mlx5vf_pci_table, + .probe = mlx5vf_pci_probe, + .remove = mlx5vf_pci_remove, + .err_handler = &vfio_pci_core_err_handlers, +}; + +static void __exit mlx5vf_pci_cleanup(void) +{ + pci_unregister_driver(&mlx5vf_pci_driver); +} + +static int __init mlx5vf_pci_init(void) +{ + return pci_register_driver(&mlx5vf_pci_driver); +} + +module_init(mlx5vf_pci_init); +module_exit(mlx5vf_pci_cleanup); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Max Gurtovoy "); +MODULE_AUTHOR("Yishai Hadas "); +MODULE_DESCRIPTION( + "MLX5 VFIO PCI - User Level meta-driver for MLX5 device family");
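To show how the migration region implemented above is consumed, here is a
rough userspace sketch of a single stop-and-copy read iteration against the
v1 migration uAPI (illustrative only, with error handling abbreviated;
device_fd, region_off and out_fd are assumed to have been obtained via
VFIO_DEVICE_GET_REGION_INFO and the surrounding migration tooling, and are
not defined by this series):

#include <linux/vfio.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

static int example_save_iteration(int device_fd, off_t region_off, int out_fd)
{
	uint64_t pending, data_offset, data_size;
	char *buf;

	/* Reading pending_bytes tells the driver a new iteration started. */
	if (pread(device_fd, &pending, sizeof(pending), region_off +
		  offsetof(struct vfio_device_migration_info, pending_bytes)) !=
	    sizeof(pending))
		return -1;
	if (!pending)
		return 0;	/* no more device state to transfer */

	pread(device_fd, &data_offset, sizeof(data_offset), region_off +
	      offsetof(struct vfio_device_migration_info, data_offset));
	pread(device_fd, &data_size, sizeof(data_size), region_off +
	      offsetof(struct vfio_device_migration_info, data_size));

	buf = malloc(data_size);
	if (!buf)
		return -1;
	/* Served by mlx5vf_pci_mig_rw() from the scatter-gather backed window. */
	pread(device_fd, buf, data_size, region_off + data_offset);
	write(out_fd, buf, data_size);
	free(buf);
	return 1;	/* caller keeps iterating until pending_bytes is 0 */
}

On the driver side, mlx5vf_pci_handle_migration_pending_bytes() advances the
window between iterations, so userspace simply loops until pending_bytes
reads back as zero.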