diff mbox series

[RFC,02/29] nvkm/vgpu: attach to nvkm as a nvkm client

Message ID 20240922124951.1946072-3-zhiw@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Introduce NVIDIA GPU Virtualization (vGPU) Support | expand

Commit Message

Zhi Wang Sept. 22, 2024, 12:49 p.m. UTC
nvkm is a HW abstraction layer(HAL) that initializes the HW and
allows its clients to manipulate the GPU functions regardless of the
generations of GPU HW. On the top layer, it provides generic APIs for a
client to connect to NVKM, enumerate the GPU functions, and manipulate
the GPU HW.

To reach nvkm, the client needs to connect to NVKM layer by layer: driver
layer, client layer, and eventually, the device layer, which provides all
the access routines to GPU functions. After a client attaches to NVKM,
it initializes the HW and is able to serve the clients.

Attach to nvkm as a nvkm client.

Cc: Neo Jia <cjia@nvidia.com>
Signed-off-by: Zhi Wang <zhiw@nvidia.com>
---
 .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  8 ++++
 .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 48 ++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletion(-)

Comments

Greg KH Sept. 26, 2024, 9:21 a.m. UTC | #1
On Sun, Sep 22, 2024 at 05:49:24AM -0700, Zhi Wang wrote:
> nvkm is a HW abstraction layer(HAL) that initializes the HW and
> allows its clients to manipulate the GPU functions regardless of the
> generations of GPU HW. On the top layer, it provides generic APIs for a
> client to connect to NVKM, enumerate the GPU functions, and manipulate
> the GPU HW.
> 
> To reach nvkm, the client needs to connect to NVKM layer by layer: driver
> layer, client layer, and eventually, the device layer, which provides all
> the access routines to GPU functions. After a client attaches to NVKM,
> it initializes the HW and is able to serve the clients.
> 
> Attach to nvkm as a nvkm client.
> 
> Cc: Neo Jia <cjia@nvidia.com>
> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> ---
>  .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  8 ++++
>  .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 48 ++++++++++++++++++-
>  2 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> index 3163fff1085b..9e10e18306b0 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> @@ -7,6 +7,14 @@
>  struct nvkm_vgpu_mgr {
>  	bool enabled;
>  	struct nvkm_device *nvkm_dev;
> +
> +	const struct nvif_driver *driver;

Meta-comment, why is this attempting to act like a "driver" and yet not
tieing into the driver model code at all?  Please fix that up, it's not
ok to add more layers on top of a broken one like this.  We have
infrastructure for this type of thing, please don't route around it.

thanks,

greg k-h
Zhi Wang Oct. 14, 2024, 10:16 a.m. UTC | #2
On 26/09/2024 12.21, Greg KH wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Sun, Sep 22, 2024 at 05:49:24AM -0700, Zhi Wang wrote:
>> nvkm is a HW abstraction layer(HAL) that initializes the HW and
>> allows its clients to manipulate the GPU functions regardless of the
>> generations of GPU HW. On the top layer, it provides generic APIs for a
>> client to connect to NVKM, enumerate the GPU functions, and manipulate
>> the GPU HW.
>>
>> To reach nvkm, the client needs to connect to NVKM layer by layer: driver
>> layer, client layer, and eventually, the device layer, which provides all
>> the access routines to GPU functions. After a client attaches to NVKM,
>> it initializes the HW and is able to serve the clients.
>>
>> Attach to nvkm as a nvkm client.
>>
>> Cc: Neo Jia <cjia@nvidia.com>
>> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
>> ---
>>   .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  8 ++++
>>   .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 48 ++++++++++++++++++-
>>   2 files changed, 55 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
>> index 3163fff1085b..9e10e18306b0 100644
>> --- a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
>> +++ b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
>> @@ -7,6 +7,14 @@
>>   struct nvkm_vgpu_mgr {
>>        bool enabled;
>>        struct nvkm_device *nvkm_dev;
>> +
>> +     const struct nvif_driver *driver;
> 
> Meta-comment, why is this attempting to act like a "driver" and yet not
> tieing into the driver model code at all?  Please fix that up, it's not
> ok to add more layers on top of a broken one like this.  We have
> infrastructure for this type of thing, please don't route around it.
> 

Thanks for the guidelines. Will try to work with folks and figure out a 
solution.

Ben is doing quite some clean-ups of nouveau driver[1], they had been 
reviewed and merged by Danilo. Also, the split driver patchset he is 
working on seems a meaningful pre-step to fix this, as it also includes 
the re-factor of the interface between the nvkm and the nvif stuff.

[1] 
https://lore.kernel.org/nouveau/CAPM=9tyW=YuDQrRwrYK_ayuvEnp+9irTuze=MP-zkowm3CFJ9A@mail.gmail.com/T/

[2] 
https://lore.kernel.org/dri-devel/20240613170211.88779-1-bskeggs@nvidia.com/T/

> thanks,
> 
> greg k-h
Greg KH Oct. 14, 2024, 11:33 a.m. UTC | #3
On Mon, Oct 14, 2024 at 10:16:21AM +0000, Zhi Wang wrote:
> On 26/09/2024 12.21, Greg KH wrote:
> > External email: Use caution opening links or attachments
> > 
> > 
> > On Sun, Sep 22, 2024 at 05:49:24AM -0700, Zhi Wang wrote:
> >> nvkm is a HW abstraction layer(HAL) that initializes the HW and
> >> allows its clients to manipulate the GPU functions regardless of the
> >> generations of GPU HW. On the top layer, it provides generic APIs for a
> >> client to connect to NVKM, enumerate the GPU functions, and manipulate
> >> the GPU HW.
> >>
> >> To reach nvkm, the client needs to connect to NVKM layer by layer: driver
> >> layer, client layer, and eventually, the device layer, which provides all
> >> the access routines to GPU functions. After a client attaches to NVKM,
> >> it initializes the HW and is able to serve the clients.
> >>
> >> Attach to nvkm as a nvkm client.
> >>
> >> Cc: Neo Jia <cjia@nvidia.com>
> >> Signed-off-by: Zhi Wang <zhiw@nvidia.com>
> >> ---
> >>   .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  8 ++++
> >>   .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 48 ++++++++++++++++++-
> >>   2 files changed, 55 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> >> index 3163fff1085b..9e10e18306b0 100644
> >> --- a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> >> +++ b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
> >> @@ -7,6 +7,14 @@
> >>   struct nvkm_vgpu_mgr {
> >>        bool enabled;
> >>        struct nvkm_device *nvkm_dev;
> >> +
> >> +     const struct nvif_driver *driver;
> > 
> > Meta-comment, why is this attempting to act like a "driver" and yet not
> > tieing into the driver model code at all?  Please fix that up, it's not
> > ok to add more layers on top of a broken one like this.  We have
> > infrastructure for this type of thing, please don't route around it.
> > 
> 
> Thanks for the guidelines. Will try to work with folks and figure out a 
> solution.
> 
> Ben is doing quite some clean-ups of nouveau driver[1], they had been 
> reviewed and merged by Danilo. Also, the split driver patchset he is 
> working on seems a meaningful pre-step to fix this, as it also includes 
> the re-factor of the interface between the nvkm and the nvif stuff.

What we need is the auxbus code changes that are pointed to somewhere in
those long threads, that needs to land first and this series rebased
before it can be reviewed properly as that is going to change your
device lifetime rules a lot.

Please do that and then move this "nvif_driver" to be the proper driver
core type to tie into the kernel correctly.

thanks,

greg k-h
diff mbox series

Patch

diff --git a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
index 3163fff1085b..9e10e18306b0 100644
--- a/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
+++ b/drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
@@ -7,6 +7,14 @@ 
 struct nvkm_vgpu_mgr {
 	bool enabled;
 	struct nvkm_device *nvkm_dev;
+
+	const struct nvif_driver *driver;
+
+	const struct nvif_client_impl *cli_impl;
+	struct nvif_client_priv *cli_priv;
+
+	const struct nvif_device_impl *dev_impl;
+	struct nvif_device_priv *dev_priv;
 };
 
 bool nvkm_vgpu_mgr_is_supported(struct nvkm_device *device);
diff --git a/drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c b/drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c
index a506414e5ba2..0639596f8a96 100644
--- a/drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c
+++ b/drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c
@@ -1,5 +1,7 @@ 
 /* SPDX-License-Identifier: MIT */
 #include <core/device.h>
+#include <core/driver.h>
+#include <nvif/driverif.h>
 #include <core/pci.h>
 #include <vgpu_mgr/vgpu_mgr.h>
 
@@ -42,6 +44,44 @@  bool nvkm_vgpu_mgr_is_enabled(struct nvkm_device *device)
 	return device->vgpu_mgr.enabled;
 }
 
+static void detach_nvkm(struct nvkm_vgpu_mgr *vgpu_mgr)
+{
+	if (vgpu_mgr->dev_impl) {
+		vgpu_mgr->dev_impl->del(vgpu_mgr->dev_priv);
+		vgpu_mgr->dev_impl = NULL;
+	}
+
+	if (vgpu_mgr->cli_impl) {
+		vgpu_mgr->cli_impl->del(vgpu_mgr->cli_priv);
+		vgpu_mgr->cli_impl = NULL;
+	}
+}
+
+static int attach_nvkm(struct nvkm_vgpu_mgr *vgpu_mgr)
+{
+	struct nvkm_device *device = vgpu_mgr->nvkm_dev;
+	int ret;
+
+	ret = nvkm_driver_ctor(device, &vgpu_mgr->driver,
+			       &vgpu_mgr->cli_impl, &vgpu_mgr->cli_priv);
+	if (ret)
+		return ret;
+
+	ret = vgpu_mgr->cli_impl->device.new(vgpu_mgr->cli_priv,
+					     &vgpu_mgr->dev_impl,
+					     &vgpu_mgr->dev_priv);
+	if (ret)
+		goto fail_device_new;
+
+	return 0;
+
+fail_device_new:
+	vgpu_mgr->cli_impl->del(vgpu_mgr->cli_priv);
+	vgpu_mgr->cli_impl = NULL;
+
+	return ret;
+}
+
 /**
  * nvkm_vgpu_mgr_init - Initialize the vGPU manager support
  * @device: the nvkm_device pointer
@@ -51,13 +91,18 @@  bool nvkm_vgpu_mgr_is_enabled(struct nvkm_device *device)
 int nvkm_vgpu_mgr_init(struct nvkm_device *device)
 {
 	struct nvkm_vgpu_mgr *vgpu_mgr = &device->vgpu_mgr;
+	int ret;
 
 	if (!nvkm_vgpu_mgr_is_supported(device))
 		return -ENODEV;
 
 	vgpu_mgr->nvkm_dev = device;
-	vgpu_mgr->enabled = true;
 
+	ret = attach_nvkm(vgpu_mgr);
+	if (ret)
+		return ret;
+
+	vgpu_mgr->enabled = true;
 	pci_info(nvkm_to_pdev(device),
 		 "NVIDIA vGPU mananger support is enabled.\n");
 
@@ -72,5 +117,6 @@  void nvkm_vgpu_mgr_fini(struct nvkm_device *device)
 {
 	struct nvkm_vgpu_mgr *vgpu_mgr = &device->vgpu_mgr;
 
+	detach_nvkm(vgpu_mgr);
 	vgpu_mgr->enabled = false;
 }