[RFC,02/29] nvkm/vgpu: attach to nvkm as a nvkm client

Message ID	20240922124951.1946072-3-zhiw@nvidia.com (mailing list archive)
State	New, archived
Headers
Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2071.outbound.protection.outlook.com [40.107.223.71]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4466B28FF for <kvm@vger.kernel.org>; Sun, 22 Sep 2024 12:50:35 +0000 (UTC)
Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.118.232 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.232; helo=mail.nvidia.com; pr=C
From: Zhi Wang <zhiw@nvidia.com>
To: <kvm@vger.kernel.org>, <nouveau@lists.freedesktop.org>
CC: <alex.williamson@redhat.com>, <kevin.tian@intel.com>, <jgg@nvidia.com>, <airlied@gmail.com>, <daniel@ffwll.ch>, <acurrid@nvidia.com>, <cjia@nvidia.com>, <smitra@nvidia.com>, <ankita@nvidia.com>, <aniketa@nvidia.com>, <kwankhede@nvidia.com>, <targupta@nvidia.com>, <zhiw@nvidia.com>, <zhiwang@kernel.org>
Subject: [RFC 02/29] nvkm/vgpu: attach to nvkm as a nvkm client
Date: Sun, 22 Sep 2024 05:49:24 -0700
Message-ID: <20240922124951.1946072-3-zhiw@nvidia.com>
In-Reply-To: <20240922124951.1946072-1-zhiw@nvidia.com>
References: <20240922124951.1946072-1-zhiw@nvidia.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Series	Introduce NVIDIA GPU Virtualization (vGPU) Support
[RFC,00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support
[RFC,01/29] nvkm/vgpu: introduce NVIDIA vGPU support prelude
[RFC,02/29] nvkm/vgpu: attach to nvkm as a nvkm client
[RFC,03/29] nvkm/vgpu: reserve a larger GSP heap when NVIDIA vGPU is enabled
[RFC,04/29] nvkm/vgpu: set the VF partition count when NVIDIA vGPU is enabled
[RFC,05/29] nvkm/vgpu: populate GSP_VF_INFO when NVIDIA vGPU is enabled
[RFC,06/29] nvkm/vgpu: set RMSetSriovMode when NVIDIA vGPU is enabled
[RFC,07/29] nvkm/gsp: add a notify handler for GSP event GPUACCT_PERFMON_UTIL_SAMPLES
[RFC,08/29] nvkm/vgpu: get the size VMMU segment from GSP firmware
[RFC,09/29] nvkm/vgpu: introduce the reserved channel allocator
[RFC,10/29] nvkm/vgpu: introduce interfaces for NVIDIA vGPU VFIO module
[RFC,11/29] nvkm/vgpu: introduce GSP RM client alloc and free for vGPU
[RFC,12/29] nvkm/vgpu: introduce GSP RM control interface for vGPU
[RFC,13/29] nvkm: move chid.h to nvkm/engine.
[RFC,14/29] nvkm/vgpu: introduce channel allocation for vGPU
[RFC,15/29] nvkm/vgpu: introduce FB memory allocation for vGPU
[RFC,16/29] nvkm/vgpu: introduce BAR1 map routines for vGPUs
[RFC,17/29] nvkm/vgpu: introduce engine bitmap for vGPU
[RFC,18/29] nvkm/vgpu: introduce pci_driver.sriov_configure() in nvkm
[RFC,19/29] vfio/vgpu_mgr: introdcue vGPU lifecycle management prelude
[RFC,20/29] vfio/vgpu_mgr: allocate GSP RM client for NVIDIA vGPU manager
[RFC,21/29] vfio/vgpu_mgr: introduce vGPU type uploading
[RFC,22/29] vfio/vgpu_mgr: allocate vGPU FB memory when creating vGPUs
[RFC,23/29] vfio/vgpu_mgr: allocate vGPU channels when creating vGPUs
[RFC,24/29] vfio/vgpu_mgr: allocate mgmt heap when creating vGPUs
[RFC,25/29] vfio/vgpu_mgr: map mgmt heap when creating a vGPU
[RFC,26/29] vfio/vgpu_mgr: allocate GSP RM client when creating vGPUs
[RFC,27/29] vfio/vgpu_mgr: bootload the new vGPU
[RFC,28/29] vfio/vgpu_mgr: introduce vGPU host RPC channel
[RFC,29/29] vfio/vgpu_mgr: introduce NVIDIA vGPU VFIO variant driver

Commit Message

Zhi Wang Sept. 22, 2024, 12:49 p.m. UTC

nvkm is a HW abstraction layer (HAL) that initializes the HW and allows
its clients to manipulate GPU functions regardless of the GPU HW
generation. At the top layer, it provides generic APIs for a client to
connect to nvkm, enumerate the GPU functions, and manipulate the GPU HW.

To reach nvkm, a client connects layer by layer: the driver layer, the
client layer, and finally the device layer, which provides all the access
routines to GPU functions. After a client attaches to nvkm, nvkm
initializes the HW and is able to serve its clients.

Attach to nvkm as a nvkm client.

Cc: Neo Jia <cjia@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  8 ++++
 .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 48 ++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)
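
The layer-by-layer attach described in the commit message can be modelled
in plain userspace C. This is only a minimal sketch under stated
assumptions: struct mgr, struct client_impl, struct device_impl,
driver_ctor(), mgr_attach(), mgr_detach() and the counters are hypothetical
stand-ins invented for illustration, not the nvkm/nvif API. It shows the
driver layer handing back a client, the client handing back a device, and
the teardown running in the reverse order.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the per-layer vtables; not the real nvif types. */
struct device_impl {
	void (*del)(void *priv);
};

struct client_impl {
	void (*del)(void *priv);
	int (*device_new)(void *cli_priv,
			  const struct device_impl **impl, void **priv);
};

/* Hypothetical manager state: one impl/priv pair per attached layer. */
struct mgr {
	const struct client_impl *cli_impl;
	void *cli_priv;
	const struct device_impl *dev_impl;
	void *dev_priv;
};

int live_clients, live_devices;	/* only used to observe setup and teardown */
int inject_device_failure;	/* set to 1 to simulate a device-layer error */

static void device_del(void *priv) { (void)priv; live_devices--; }
static const struct device_impl mock_device = { .del = device_del };

static void client_del(void *priv) { (void)priv; live_clients--; }

static int client_device_new(void *cli_priv,
			     const struct device_impl **impl, void **priv)
{
	if (inject_device_failure)
		return -1;
	*impl = &mock_device;
	*priv = cli_priv;
	live_devices++;
	return 0;
}

static const struct client_impl mock_client = {
	.del = client_del,
	.device_new = client_device_new,
};

/* Mock driver layer: returns a client vtable and an opaque private handle. */
static int driver_ctor(const struct client_impl **impl, void **priv)
{
	*impl = &mock_client;
	*priv = &live_clients;
	live_clients++;
	return 0;
}

/* Attach top-down: driver layer, then client layer, then device layer. */
int mgr_attach(struct mgr *m)
{
	int ret;

	ret = driver_ctor(&m->cli_impl, &m->cli_priv);
	if (ret)
		return ret;

	ret = m->cli_impl->device_new(m->cli_priv, &m->dev_impl, &m->dev_priv);
	if (ret)
		goto fail_device_new;

	return 0;

fail_device_new:
	/* Undo the client layer so a failed attach leaves nothing behind. */
	m->cli_impl->del(m->cli_priv);
	m->cli_impl = NULL;
	return ret;
}

/* Detach bottom-up; clearing each impl pointer makes a second call a no-op. */
void mgr_detach(struct mgr *m)
{
	if (m->dev_impl) {
		m->dev_impl->del(m->dev_priv);
		m->dev_impl = NULL;
	}
	if (m->cli_impl) {
		m->cli_impl->del(m->cli_priv);
		m->cli_impl = NULL;
	}
}
```

In this sketch a caller zero-initializes a struct mgr, calls mgr_attach(),
and later calls mgr_detach(). The detach path checks each impl pointer
before calling its del(), so it is safe to call after a failed or partial
attach.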

Comments

Greg KH Sept. 26, 2024, 9:21 a.m. UTC | #1

On Sun, Sep 22, 2024 at 05:49:24AM -0700, Zhi Wang wrote:
> nvkm is a HW abstraction layer(HAL) that initializes the HW and
> allows its clients to manipulate the GPU functions regardless of the
> generations of GPU HW. On the top layer, it provides generic APIs for a
> client to connect to NVKM, enumerate the GPU functions, and manipulate
> the GPU HW.
> 
> To reach nvkm, the client needs to connect to NVKM layer by layer: driver
> layer, client layer, and eventually, the device layer, which provides all
> the access routines to GPU functions. After a client attaches to NVKM,
> it initializes the HW and is able to serve the clients.
> 
> Attach to nvkm as a nvkm client.
> 
> Cc: Neo Jia <cjia@nvidia.com>
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> ---
>  .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  8 ++++
>  .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 48 ++++++++++++++++++-
>  2 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> index 3163fff1085b..9e10e18306b0 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> @@ -7,6 +7,14 @@
>  struct nvkm_vgpu_mgr {
>  	bool enabled;
>  	struct nvkm_device *nvkm_dev;
> +
> +	const struct nvif_driver *driver;

Meta-comment, why is this attempting to act like a "driver" and yet not
tying into the driver model code at all?  Please fix that up, it's not
ok to add more layers on top of a broken one like this.  We have
infrastructure for this type of thing, please don't route around it.

thanks,

greg k-h

Zhi Wang Oct. 14, 2024, 10:16 a.m. UTC | #2

On 26/09/2024 12.21, Greg KH wrote:
> On Sun, Sep 22, 2024 at 05:49:24AM -0700, Zhi Wang wrote:
>> nvkm is a HW abstraction layer(HAL) that initializes the HW and
>> allows its clients to manipulate the GPU functions regardless of the
>> generations of GPU HW. On the top layer, it provides generic APIs for a
>> client to connect to NVKM, enumerate the GPU functions, and manipulate
>> the GPU HW.
>>
>> To reach nvkm, the client needs to connect to NVKM layer by layer: driver
>> layer, client layer, and eventually, the device layer, which provides all
>> the access routines to GPU functions. After a client attaches to NVKM,
>> it initializes the HW and is able to serve the clients.
>>
>> Attach to nvkm as a nvkm client.
>>
>> Cc: Neo Jia <cjia@nvidia.com>
>> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
>> ---
>>   .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  8 ++++
>>   .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 48 ++++++++++++++++++-
>>   2 files changed, 55 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
>> index 3163fff1085b..9e10e18306b0 100644
>> --- a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
>> +++ b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
>> @@ -7,6 +7,14 @@
>>   struct nvkm_vgpu_mgr {
>>        bool enabled;
>>        struct nvkm_device *nvkm_dev;
>> +
>> +     const struct nvif_driver *driver;
> 
> Meta-comment, why is this attempting to act like a "driver" and yet not
> tying into the driver model code at all?  Please fix that up, it's not
> ok to add more layers on top of a broken one like this.  We have
> infrastructure for this type of thing, please don't route around it.
> 

Thanks for the guidance. I will try to work with folks and figure out a
solution.

Ben has been doing quite a few clean-ups of the nouveau driver [1], which
have been reviewed and merged by Danilo. The split-driver patchset he is
working on [2] also looks like a meaningful preliminary step toward fixing
this, as it refactors the interface between nvkm and nvif.

[1] https://lore.kernel.org/nouveau/CAPM=9tyW=YuDQrRwrYK_ayuvEnp+9irTuze=MP-zkowm3CFJ9A@mail.gmail.com/T/
[2] https://lore.kernel.org/dri-devel/20240613170211.88779-1-bskeggs@nvidia.com/T/

> thanks,
> 
> greg k-h

Greg KH Oct. 14, 2024, 11:33 a.m. UTC | #3

On Mon, Oct 14, 2024 at 10:16:21AM +0000, Zhi Wang wrote:
> On 26/09/2024 12.21, Greg KH wrote:
> > On Sun, Sep 22, 2024 at 05:49:24AM -0700, Zhi Wang wrote:
> >> nvkm is a HW abstraction layer(HAL) that initializes the HW and
> >> allows its clients to manipulate the GPU functions regardless of the
> >> generations of GPU HW. On the top layer, it provides generic APIs for a
> >> client to connect to NVKM, enumerate the GPU functions, and manipulate
> >> the GPU HW.
> >>
> >> To reach nvkm, the client needs to connect to NVKM layer by layer: driver
> >> layer, client layer, and eventually, the device layer, which provides all
> >> the access routines to GPU functions. After a client attaches to NVKM,
> >> it initializes the HW and is able to serve the clients.
> >>
> >> Attach to nvkm as a nvkm client.
> >>
> >> Cc: Neo Jia <cjia@nvidia.com>
> >> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> >> ---
> >>   .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  8 ++++
> >>   .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 48 ++++++++++++++++++-
> >>   2 files changed, 55 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> >> index 3163fff1085b..9e10e18306b0 100644
> >> --- a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> >> +++ b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> >> @@ -7,6 +7,14 @@
> >>   struct nvkm_vgpu_mgr {
> >>        bool enabled;
> >>        struct nvkm_device *nvkm_dev;
> >> +
> >> +     const struct nvif_driver *driver;
> > 
> > Meta-comment, why is this attempting to act like a "driver" and yet not
> > tying into the driver model code at all?  Please fix that up, it's not
> > ok to add more layers on top of a broken one like this.  We have
> > infrastructure for this type of thing, please don't route around it.
> > 
> 
> Thanks for the guidelines. Will try to work with folks and figure out a 
> solution.
> 
> Ben is doing quite some clean-ups of nouveau driver[1], they had been 
> reviewed and merged by Danilo. Also, the split driver patchset he is 
> working on seems a meaningful pre-step to fix this, as it also includes 
> the re-factor of the interface between the nvkm and the nvif stuff.

What we need are the auxbus code changes that are pointed to somewhere in
those long threads. They need to land first, and this series needs to be
rebased on top of them before it can be reviewed properly, as that is
going to change your device lifetime rules a lot.

Please do that and then move this "nvif_driver" to be the proper driver
core type to tie into the kernel correctly.

thanks,

greg k-h

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
index 3163fff1085b..9e10e18306b0 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
@@ -7,6 +7,14 @@ 
 struct nvkm_vgpu_mgr {
 	bool enabled;
 	struct nvkm_device *nvkm_dev;
+
+	const struct nvif_driver *driver;
+
+	const struct nvif_client_impl *cli_impl;
+	struct nvif_client_priv *cli_priv;
+
+	const struct nvif_device_impl *dev_impl;
+	struct nvif_device_priv *dev_priv;
 };
 
 bool nvkm_vgpu_mgr_is_supported(struct nvkm_device *device);
diff --git a/drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c b/drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c
index a506414e5ba2..0639596f8a96 100644
--- a/drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c
+++ b/drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c
@@ -1,5 +1,7 @@ 
 /* SPDX-License-Identifier: MIT */
 #include <core/device.h>
+#include <core/driver.h>
+#include <nvif/driverif.h>
 #include <core/pci.h>
 #include <vgpu_mgr/vgpu_mgr.h>
 
@@ -42,6 +44,44 @@  bool nvkm_vgpu_mgr_is_enabled(struct nvkm_device *device)
 	return device->vgpu_mgr.enabled;
 }
 
+static void detach_nvkm(struct nvkm_vgpu_mgr *vgpu_mgr)
+{
+	if (vgpu_mgr->dev_impl) {
+		vgpu_mgr->dev_impl->del(vgpu_mgr->dev_priv);
+		vgpu_mgr->dev_impl = NULL;
+	}
+
+	if (vgpu_mgr->cli_impl) {
+		vgpu_mgr->cli_impl->del(vgpu_mgr->cli_priv);
+		vgpu_mgr->cli_impl = NULL;
+	}
+}
+
+static int attach_nvkm(struct nvkm_vgpu_mgr *vgpu_mgr)
+{
+	struct nvkm_device *device = vgpu_mgr->nvkm_dev;
+	int ret;
+
+	ret = nvkm_driver_ctor(device, &vgpu_mgr->driver,
+			       &vgpu_mgr->cli_impl, &vgpu_mgr->cli_priv);
+	if (ret)
+		return ret;
+
+	ret = vgpu_mgr->cli_impl->device.new(vgpu_mgr->cli_priv,
+					     &vgpu_mgr->dev_impl,
+					     &vgpu_mgr->dev_priv);
+	if (ret)
+		goto fail_device_new;
+
+	return 0;
+
+fail_device_new:
+	vgpu_mgr->cli_impl->del(vgpu_mgr->cli_priv);
+	vgpu_mgr->cli_impl = NULL;
+
+	return ret;
+}
+
 /**
  * nvkm_vgpu_mgr_init - Initialize the vGPU manager support
  * @device: the nvkm_device pointer
@@ -51,13 +91,18 @@  bool nvkm_vgpu_mgr_is_enabled(struct nvkm_device *device)
 int nvkm_vgpu_mgr_init(struct nvkm_device *device)
 {
 	struct nvkm_vgpu_mgr *vgpu_mgr = &device->vgpu_mgr;
+	int ret;
 
 	if (!nvkm_vgpu_mgr_is_supported(device))
 		return -ENODEV;
 
 	vgpu_mgr->nvkm_dev = device;
-	vgpu_mgr->enabled = true;
 
+	ret = attach_nvkm(vgpu_mgr);
+	if (ret)
+		return ret;
+
+	vgpu_mgr->enabled = true;
 	pci_info(nvkm_to_pdev(device),
 		 "NVIDIA vGPU mananger support is enabled.\n");
 
@@ -72,5 +117,6 @@  void nvkm_vgpu_mgr_fini(struct nvkm_device *device)
 {
 	struct nvkm_vgpu_mgr *vgpu_mgr = &device->vgpu_mgr;
 
+	detach_nvkm(vgpu_mgr);
 	vgpu_mgr->enabled = false;
 }
