[RFC,00/29] Introduce NVIDIA GPU Virtualization (vGPU) Support

Message ID 20240922124951.1946072-1-zhiw@nvidia.com

Message

Zhi Wang Sept. 22, 2024, 12:49 p.m. UTC
1. Background
=============

NVIDIA vGPU[1] software enables powerful GPU performance for workloads
ranging from graphics-rich virtual workstations to data science and AI,
enabling IT to leverage the management and security benefits of
virtualization as well as the performance of NVIDIA GPUs required for
modern workloads. Installed on a physical GPU in a cloud or enterprise
data center server, NVIDIA vGPU software creates virtual GPUs that can
be shared across multiple virtual machines.

The vGPU architecture[2] can be illustrated as follows:

 +--------------------+    +--------------------+ +--------------------+ +--------------------+ 
 | Hypervisor         |    | Guest VM           | | Guest VM           | | Guest VM           | 
 |                    |    | +----------------+ | | +----------------+ | | +----------------+ | 
 | +----------------+ |    | |Applications... | | | |Applications... | | | |Applications... | | 
 | |  NVIDIA        | |    | +----------------+ | | +----------------+ | | +----------------+ | 
 | |  Virtual GPU   | |    | +----------------+ | | +----------------+ | | +----------------+ | 
 | |  Manager       | |    | |  Guest Driver  | | | |  Guest Driver  | | | |  Guest Driver  | | 
 | +------^---------+ |    | +----------------+ | | +----------------+ | | +----------------+ | 
 |        |           |    +---------^----------+ +----------^---------+ +----------^---------+ 
 |        |           |              |                       |                      |           
 |        |           +--------------+-----------------------+----------------------+---------+ 
 |        |                          |                       |                      |         | 
 |        |                          |                       |                      |         | 
 +--------+--------------------------+-----------------------+----------------------+---------+ 
+---------v--------------------------+-----------------------+----------------------+----------+
| NVIDIA                  +----------v---------+ +-----------v--------+ +-----------v--------+ |
| Physical GPU            |   Virtual GPU      | |   Virtual GPU      | |   Virtual GPU      | |
|                         +--------------------+ +--------------------+ +--------------------+ |
+----------------------------------------------------------------------------------------------+

Each NVIDIA vGPU is analogous to a conventional GPU, having a fixed amount
of GPU framebuffer, and one or more virtual display outputs or "heads".
The vGPU’s framebuffer is allocated out of the physical GPU’s framebuffer
at the time the vGPU is created, and the vGPU retains exclusive use of
that framebuffer until it is destroyed.

The number of physical GPUs that a board has depends on the board. Each
physical GPU can support several different types of virtual GPU (vGPU).
vGPU types have a fixed amount of frame buffer, number of supported
display heads, and maximum resolutions. They are grouped into different
series according to the different classes of workload for which they are
optimized. Each series is identified by the last letter of the vGPU type
name.

NVIDIA vGPU supports Windows and Linux guest VM operating systems. The
supported vGPU types depend on the guest VM OS.

2. Proposal for upstream
========================

2.1 Architecture
----------------

Moving to upstream, the proposed architecture can be illustrated as follows:

                            +--------------------+ +--------------------+ +--------------------+ 
                            | Linux VM           | | Windows VM         | | Guest VM           | 
                            | +----------------+ | | +----------------+ | | +----------------+ | 
                            | |Applications... | | | |Applications... | | | |Applications... | | 
                            | +----------------+ | | +----------------+ | | +----------------+ | ... 
                            | +----------------+ | | +----------------+ | | +----------------+ | 
                            | |  Guest Driver  | | | |  Guest Driver  | | | |  Guest Driver  | | 
                            | +----------------+ | | +----------------+ | | +----------------+ | 
                            +---------^----------+ +----------^---------+ +----------^---------+ 
                                      |                       |                      |           
                           +--------------------------------------------------------------------+
                           |+--------------------+ +--------------------+ +--------------------+|
                           ||       QEMU         | |       QEMU         | |       QEMU         ||
                           ||                    | |                    | |                    ||
                           |+--------------------+ +--------------------+ +--------------------+|
                           +--------------------------------------------------------------------+
                                      |                       |                      |
+-----------------------------------------------------------------------------------------------+
|                           +----------------------------------------------------------------+  |
|                           |                                VFIO                            |  |
|                           |                                                                |  |
| +-----------------------+ | +------------------------+  +---------------------------------+|  |
| |  Core Driver vGPU     | | |                        |  |                                 ||  |
| |       Support        <--->|                       <---->                                ||  |
| +-----------------------+ | | NVIDIA vGPU Manager    |  | NVIDIA vGPU VFIO Variant Driver ||  |
| |    NVIDIA GPU Core    | | |                        |  |                                 ||  |
| |        Driver         | | +------------------------+  +---------------------------------+|  |
| +--------^--------------+ +----------------------------------------------------------------+  |
|          |                          |                       |                      |          |
+-----------------------------------------------------------------------------------------------+
           |                          |                       |                      |           
+----------|--------------------------|-----------------------|----------------------|----------+
|          v               +----------v---------+ +-----------v--------+ +-----------v--------+ |
|  NVIDIA                  |       PCI VF       | |       PCI VF       | |       PCI VF       | |
|  Physical GPU            |                    | |                    | |                    | |
|                          |   (Virtual GPU)    | |   (Virtual GPU)    | |    (Virtual GPU)   | |
|                          +--------------------+ +--------------------+ +--------------------+ |
+-----------------------------------------------------------------------------------------------+

The supported GPU generation will be Ada, which comes with the supporting
GPU architecture. Each vGPU is backed by a PCI virtual function (VF).
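
As a rough sketch, a VF backing a vGPU could be brought up through the
standard Linux SR-IOV sysfs interface. The PF address and VF count below are
purely illustrative, and the exact flow in this RFC (e.g. via
pci_driver.sriov_configure() in nvkm) may differ:

  Query how many VFs the physical GPU can expose, then enable one:
  # cat /sys/bus/pci/devices/0000:65:00.0/sriov_totalvfs
  # echo 1 > /sys/bus/pci/devices/0000:65:00.0/sriov_numvfs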

The NVIDIA vGPU VFIO module, together with VFIO, sits on the VFs and
provides extended management and features, e.g. selecting the vGPU type,
and supports live migration and driver warm update.

As with other devices that VFIO supports, VFIO provides the standard
userspace APIs for device lifecycle management and advanced feature
support.
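
For illustration, the usual VFIO flow to hand a VF to a guest would look like
the following. The variant driver name and PCI addresses are placeholders,
not taken from this series:

  Bind the VF to the NVIDIA vGPU VFIO variant driver:
  # echo <variant-driver> > /sys/bus/pci/devices/0000:65:00.4/driver_override
  # echo 0000:65:00.4 > /sys/bus/pci/drivers_probe

  Assign the VF to a guest with QEMU's standard vfio-pci device:
  # qemu-system-x86_64 ... -device vfio-pci,host=0000:65:00.4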

The NVIDIA vGPU manager provides necessary support to the NVIDIA vGPU VFIO
variant driver to create/destroy vGPUs, query available vGPU types, select
the vGPU type, etc.

On the other side, the NVIDIA vGPU manager talks to the NVIDIA GPU core
driver, which provides the necessary support to reach the HW functions.

2.2 Requirements to the NVIDIA GPU core driver
----------------------------------------------

The primary use case for CSPs and enterprises is a standalone, minimal
driver consisting of the vGPU manager and other necessary components.

The NVIDIA vGPU manager talks to the NVIDIA GPU core driver, which provides
the necessary support to:

- Load the GSP firmware, boot the GSP, and provide a communication channel.
- Manage the shared/partitioned HW resources, e.g. reserving FB memory and
  channels for the vGPU manager to create vGPUs.
- Exception handling, e.g. delivering GSP events to the vGPU manager.
- Host event dispatch, e.g. suspend/resume.
- Enumeration of the HW configuration.

The NVIDIA GPU core driver, which sits on the PCI device interface of the
NVIDIA GPU, provides support to both the DRM driver and the vGPU manager.

In this RFC, the split nouveau GPU driver[3] is used as an example to
demonstrate the requirements the vGPU manager places on the core driver.
The nouveau driver is split into nouveau (the DRM driver) and nvkm (the
core driver).

3. Try the RFC patches
=======================

The RFC supports creating one VM to test a simple GPU workload.

- Host kernel: https://github.com/zhiwang-nvidia/linux/tree/zhi/vgpu-mgr-rfc
- Guest driver package: NVIDIA-Linux-x86_64-535.154.05.run [4]

  Install guest driver:
  # export GRID_BUILD=1
  # ./NVIDIA-Linux-x86_64-535.154.05.run

- Tested platforms: L40.
- Tested guest OS: Ubuntu 24.04 LTS.
- Supported experience: a rich Linux desktop experience with a simple 3D
  workload, e.g. glmark2.
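
  A quick way to verify the setup inside the guest (assuming a stock Ubuntu
  guest; the exact commands are illustrative and not part of the RFC):
  # nvidia-smi
  # apt install glmark2
  # glmark2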

4. Demo
=======

A demo video can be found at: https://youtu.be/YwgIvvk-V94

[1] https://www.nvidia.com/en-us/data-center/virtual-solutions/
[2] https://docs.nvidia.com/vgpu/17.0/grid-vgpu-user-guide/index.html#architecture-grid-vgpu
[3] https://lore.kernel.org/dri-devel/20240613170211.88779-1-bskeggs@nvidia.com/T/
[4] https://us.download.nvidia.com/XFree86/Linux-x86_64/535.154.05/NVIDIA-Linux-x86_64-535.154.05.run

Zhi Wang (29):
  nvkm/vgpu: introduce NVIDIA vGPU support prelude
  nvkm/vgpu: attach to nvkm as a nvkm client
  nvkm/vgpu: reserve a larger GSP heap when NVIDIA vGPU is enabled
  nvkm/vgpu: set the VF partition count when NVIDIA vGPU is enabled
  nvkm/vgpu: populate GSP_VF_INFO when NVIDIA vGPU is enabled
  nvkm/vgpu: set RMSetSriovMode when NVIDIA vGPU is enabled
  nvkm/gsp: add a notify handler for GSP event
    GPUACCT_PERFMON_UTIL_SAMPLES
  nvkm/vgpu: get the size of the VMMU segment from GSP firmware
  nvkm/vgpu: introduce the reserved channel allocator
  nvkm/vgpu: introduce interfaces for NVIDIA vGPU VFIO module
  nvkm/vgpu: introduce GSP RM client alloc and free for vGPU
  nvkm/vgpu: introduce GSP RM control interface for vGPU
  nvkm: move chid.h to nvkm/engine.
  nvkm/vgpu: introduce channel allocation for vGPU
  nvkm/vgpu: introduce FB memory allocation for vGPU
  nvkm/vgpu: introduce BAR1 map routines for vGPUs
  nvkm/vgpu: introduce engine bitmap for vGPU
  nvkm/vgpu: introduce pci_driver.sriov_configure() in nvkm
  vfio/vgpu_mgr: introduce vGPU lifecycle management prelude
  vfio/vgpu_mgr: allocate GSP RM client for NVIDIA vGPU manager
  vfio/vgpu_mgr: introduce vGPU type uploading
  vfio/vgpu_mgr: allocate vGPU FB memory when creating vGPUs
  vfio/vgpu_mgr: allocate vGPU channels when creating vGPUs
  vfio/vgpu_mgr: allocate mgmt heap when creating vGPUs
  vfio/vgpu_mgr: map mgmt heap when creating a vGPU
  vfio/vgpu_mgr: allocate GSP RM client when creating vGPUs
  vfio/vgpu_mgr: bootload the new vGPU
  vfio/vgpu_mgr: introduce vGPU host RPC channel
  vfio/vgpu_mgr: introduce NVIDIA vGPU VFIO variant driver

 .../drm/nouveau/include/nvkm/core/device.h    |   3 +
 .../drm/nouveau/include/nvkm/engine/chid.h    |  29 +
 .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |   1 +
 .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  45 ++
 .../nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h    |  12 +
 drivers/gpu/drm/nouveau/nvkm/Kbuild           |   1 +
 drivers/gpu/drm/nouveau/nvkm/device/pci.c     |  33 +-
 .../gpu/drm/nouveau/nvkm/engine/fifo/chid.c   |  49 +-
 .../gpu/drm/nouveau/nvkm/engine/fifo/chid.h   |  26 +-
 .../gpu/drm/nouveau/nvkm/engine/fifo/r535.c   |   3 +
 .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    |  14 +-
 drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/Kbuild  |   3 +
 drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vfio.c  | 302 +++++++++++
 .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 234 ++++++++
 drivers/vfio/pci/Kconfig                      |   2 +
 drivers/vfio/pci/Makefile                     |   2 +
 drivers/vfio/pci/nvidia-vgpu/Kconfig          |  13 +
 drivers/vfio/pci/nvidia-vgpu/Makefile         |   8 +
 drivers/vfio/pci/nvidia-vgpu/debug.h          |  18 +
 .../nvidia/inc/ctrl/ctrl0000/ctrl0000system.h |  30 +
 .../nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h    |  33 ++
 .../ctrl/ctrl2080/ctrl2080vgpumgrinternal.h   | 152 ++++++
 .../common/sdk/nvidia/inc/ctrl/ctrla081.h     | 109 ++++
 .../nvrm/common/sdk/nvidia/inc/dev_vgpu_gsp.h | 213 ++++++++
 .../common/sdk/nvidia/inc/nv_vgpu_types.h     |  51 ++
 .../common/sdk/vmioplugin/inc/vmioplugin.h    |  26 +
 .../pci/nvidia-vgpu/include/nvrm/nvtypes.h    |  24 +
 drivers/vfio/pci/nvidia-vgpu/nvkm.h           |  94 ++++
 drivers/vfio/pci/nvidia-vgpu/rpc.c            | 242 +++++++++
 drivers/vfio/pci/nvidia-vgpu/vfio.h           |  43 ++
 drivers/vfio/pci/nvidia-vgpu/vfio_access.c    | 297 ++++++++++
 drivers/vfio/pci/nvidia-vgpu/vfio_main.c      | 511 ++++++++++++++++++
 drivers/vfio/pci/nvidia-vgpu/vgpu.c           | 352 ++++++++++++
 drivers/vfio/pci/nvidia-vgpu/vgpu_mgr.c       | 144 +++++
 drivers/vfio/pci/nvidia-vgpu/vgpu_mgr.h       |  89 +++
 drivers/vfio/pci/nvidia-vgpu/vgpu_types.c     | 466 ++++++++++++++++
 include/drm/nvkm_vgpu_mgr_vfio.h              |  61 +++
 37 files changed, 3702 insertions(+), 33 deletions(-)
 create mode 100644 drivers/gpu/drm/nouveau/include/nvkm/engine/chid.h
 create mode 100644 drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/Kbuild
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vfio.c
 create mode 100644 drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/Kconfig
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/Makefile
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/debug.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/ctrl/ctrl0000/ctrl0000system.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080vgpumgrinternal.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/ctrl/ctrla081.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/dev_vgpu_gsp.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/nv_vgpu_types.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/vmioplugin/inc/vmioplugin.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/nvtypes.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/nvkm.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/rpc.c
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/vfio.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/vfio_access.c
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/vfio_main.c
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/vgpu.c
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/vgpu_mgr.c
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/vgpu_mgr.h
 create mode 100644 drivers/vfio/pci/nvidia-vgpu/vgpu_types.c
 create mode 100644 include/drm/nvkm_vgpu_mgr_vfio.h

Comments

Zhi Wang Sept. 22, 2024, 1:11 p.m. UTC | #1
On Sun, 22 Sep 2024 05:49:22 -0700
Zhi Wang <zhiw@nvidia.com> wrote:

+Ben.

Forgot to add you. My bad.
 

Tian, Kevin Sept. 23, 2024, 6:22 a.m. UTC | #2
> From: Zhi Wang <zhiw@nvidia.com>
> Sent: Sunday, September 22, 2024 8:49 PM
> 
[...]
> 
> The NVIDIA vGPU VFIO module, together with VFIO, sits on the VFs and
> provides extended management and features, e.g. selecting the vGPU type,
> and supports live migration and driver warm update.
> 
> As with other devices that VFIO supports, VFIO provides the standard
> userspace APIs for device lifecycle management and advanced feature
> support.
> 
> The NVIDIA vGPU manager provides necessary support to the NVIDIA vGPU VFIO
> variant driver to create/destroy vGPUs, query available vGPU types, select
> the vGPU type, etc.
> 
> On the other side, the NVIDIA vGPU manager talks to the NVIDIA GPU core
> driver, which provides the necessary support to reach the HW functions.
> 

I'm not sure VFIO is the right place to host the NVIDIA vGPU manager. 
It's very NVIDIA-specific and naturally fits in the PF driver.

The VFIO side should focus on what's necessary for managing userspace
access to the VF hw, i.e. patch 29.
Danilo Krummrich Sept. 23, 2024, 8:38 a.m. UTC | #3
On Sun, Sep 22, 2024 at 04:11:21PM +0300, Zhi Wang wrote:
> On Sun, 22 Sep 2024 05:49:22 -0700
> Zhi Wang <zhiw@nvidia.com> wrote:
> 
> +Ben.
> 
> Forgot to add you. My bad.

Please also add the driver maintainers!

I had to fetch the patchset from the KVM list, since they did not hit the
nouveau list (I'm trying to get @nvidia.com addresses whitelisted).

- Danilo

Danilo Krummrich Sept. 23, 2024, 8:49 a.m. UTC | #4
Hi Zhi,

Thanks for the very detailed cover letter.

On Sun, Sep 22, 2024 at 05:49:22AM -0700, Zhi Wang wrote:
> 1. Background
> =============
> 
> NVIDIA vGPU[1] software enables powerful GPU performance for workloads
> ranging from graphics-rich virtual workstations to data science and AI,
> enabling IT to leverage the management and security benefits of
> virtualization as well as the performance of NVIDIA GPUs required for
> modern workloads. Installed on a physical GPU in a cloud or enterprise
> data center server, NVIDIA vGPU software creates virtual GPUs that can
> be shared across multiple virtual machines.
> 
> The vGPU architecture[2] can be illustrated as follows:
> 
>  +--------------------+    +--------------------+ +--------------------+ +--------------------+ 
>  | Hypervisor         |    | Guest VM           | | Guest VM           | | Guest VM           | 
>  |                    |    | +----------------+ | | +----------------+ | | +----------------+ | 
>  | +----------------+ |    | |Applications... | | | |Applications... | | | |Applications... | | 
>  | |  NVIDIA        | |    | +----------------+ | | +----------------+ | | +----------------+ | 
>  | |  Virtual GPU   | |    | +----------------+ | | +----------------+ | | +----------------+ | 
>  | |  Manager       | |    | |  Guest Driver  | | | |  Guest Driver  | | | |  Guest Driver  | | 
>  | +------^---------+ |    | +----------------+ | | +----------------+ | | +----------------+ | 
>  |        |           |    +---------^----------+ +----------^---------+ +----------^---------+ 
>  |        |           |              |                       |                      |           
>  |        |           +--------------+-----------------------+----------------------+---------+ 
>  |        |                          |                       |                      |         | 
>  |        |                          |                       |                      |         | 
>  +--------+--------------------------+-----------------------+----------------------+---------+ 
> +---------v--------------------------+-----------------------+----------------------+----------+
> | NVIDIA                  +----------v---------+ +-----------v--------+ +-----------v--------+ |
> | Physical GPU            |   Virtual GPU      | |   Virtual GPU      | |   Virtual GPU      | |
> |                         +--------------------+ +--------------------+ +--------------------+ |
> +----------------------------------------------------------------------------------------------+
> 
> Each NVIDIA vGPU is analogous to a conventional GPU, having a fixed amount
> of GPU framebuffer, and one or more virtual display outputs or "heads".
> The vGPU’s framebuffer is allocated out of the physical GPU’s framebuffer
> at the time the vGPU is created, and the vGPU retains exclusive use of
> that framebuffer until it is destroyed.
> 
> The number of physical GPUs that a board has depends on the board. Each
> physical GPU can support several different types of virtual GPU (vGPU).
> vGPU types have a fixed amount of frame buffer, number of supported
> display heads, and maximum resolutions. They are grouped into different
> series according to the different classes of workload for which they are
> optimized. Each series is identified by the last letter of the vGPU type
> name.
> 
> NVIDIA vGPU supports Windows and Linux guest VM operating systems. The
> supported vGPU types depend on the guest VM OS.
> 
> 2. Proposal for upstream
> ========================

What is the strategy in the mid / long term with this?

As you know, we're trying to move to Nova and the blockers with the device /
driver infrastructure have been resolved and we're able to move forward. Besides
that, Dave made great progress on the firmware abstraction side of things.

Is this more of a proof of concept? Do you plan to work on Nova in general and
vGPU support for Nova?

> 
> 2.1 Architecture
> ----------------
> 
> Moving to the upstream, the proposed architecture can be illustrated as followings:
> 
>                             +--------------------+ +--------------------+ +--------------------+ 
>                             | Linux VM           | | Windows VM         | | Guest VM           | 
>                             | +----------------+ | | +----------------+ | | +----------------+ | 
>                             | |Applications... | | | |Applications... | | | |Applications... | | 
>                             | +----------------+ | | +----------------+ | | +----------------+ | ... 
>                             | +----------------+ | | +----------------+ | | +----------------+ | 
>                             | |  Guest Driver  | | | |  Guest Driver  | | | |  Guest Driver  | | 
>                             | +----------------+ | | +----------------+ | | +----------------+ | 
>                             +---------^----------+ +----------^---------+ +----------^---------+ 
>                                       |                       |                      |           
>                            +--------------------------------------------------------------------+
>                            |+--------------------+ +--------------------+ +--------------------+|
>                            ||       QEMU         | |       QEMU         | |       QEMU         ||
>                            ||                    | |                    | |                    ||
>                            |+--------------------+ +--------------------+ +--------------------+|
>                            +--------------------------------------------------------------------+
>                                       |                       |                      |
> +-----------------------------------------------------------------------------------------------+
> |                           +----------------------------------------------------------------+  |
> |                           |                                VFIO                            |  |
> |                           |                                                                |  |
> | +-----------------------+ | +------------------------+  +---------------------------------+|  |
> | |  Core Driver vGPU     | | |                        |  |                                 ||  |
> | |       Support        <--->|                       <---->                                ||  |
> | +-----------------------+ | | NVIDIA vGPU Manager    |  | NVIDIA vGPU VFIO Variant Driver ||  |
> | |    NVIDIA GPU Core    | | |                        |  |                                 ||  |
> | |        Driver         | | +------------------------+  +---------------------------------+|  |
> | +--------^--------------+ +----------------------------------------------------------------+  |
> |          |                          |                       |                      |          |
> +-----------------------------------------------------------------------------------------------+
>            |                          |                       |                      |           
> +----------|--------------------------|-----------------------|----------------------|----------+
> |          v               +----------v---------+ +-----------v--------+ +-----------v--------+ |
> |  NVIDIA                  |       PCI VF       | |       PCI VF       | |       PCI VF       | |
> |  Physical GPU            |                    | |                    | |                    | |
> |                          |   (Virtual GPU)    | |   (Virtual GPU)    | |    (Virtual GPU)   | |
> |                          +--------------------+ +--------------------+ +--------------------+ |
> +-----------------------------------------------------------------------------------------------+
> 
> The supported GPU generation will be Ada, which comes with the required GPU
> architecture support. Each vGPU is backed by a PCI virtual function (VF).
> 
> The NVIDIA vGPU VFIO module, together with VFIO, sits on the VFs and provides
> extended management and features, e.g. selecting the vGPU type, and support
> for live migration and driver warm update.
> 
> As for other devices that VFIO supports, VFIO provides the standard
> userspace APIs for device lifecycle management and advanced feature
> support.
> 
> The NVIDIA vGPU manager provides the necessary support to the NVIDIA vGPU VFIO
> variant driver to create/destroy vGPUs, query available vGPU types, select
> the vGPU type, etc.
> 
> On the other side, the NVIDIA vGPU manager talks to the NVIDIA GPU core
> driver, which provides the necessary support to reach the HW functions.
> 
> 2.2 Requirements to the NVIDIA GPU core driver
> ----------------------------------------------
> 
> The primary use case for CSPs and enterprises is a standalone, minimal set
> of drivers: the vGPU manager and other necessary components.
> 
> The NVIDIA vGPU manager talks to the NVIDIA GPU core driver, which provides
> the necessary support to:
> 
> - Load the GSP firmware, boot the GSP, and provide the communication channel.
> - Manage the shared/partitioned HW resources, e.g. reserving FB memory and
>   channels for the vGPU manager to create vGPUs.
> - Exception handling, e.g. delivering the GSP events to the vGPU manager.
> - Host event dispatch, e.g. suspend/resume.
> - Enumeration of the HW configuration.
> 
> The NVIDIA GPU core driver, which sits on the PCI device interface of the
> NVIDIA GPU, provides support to both the DRM driver and the vGPU manager.
> 
> In this RFC, the split nouveau GPU driver[3] is used as an example to
> demonstrate the requirements of the vGPU manager on the core driver. The
> nouveau driver is split into nouveau (the DRM driver) and nvkm (the core
> driver).
> 
> 3 Try the RFC patches
> ---------------------
> 
> The RFC supports creating one VM to test a simple GPU workload.
> 
> - Host kernel: https://github.com/zhiwang-nvidia/linux/tree/zhi/vgpu-mgr-rfc
> - Guest driver package: NVIDIA-Linux-x86_64-535.154.05.run [4]
> 
>   Install guest driver:
>   # export GRID_BUILD=1
>   # ./NVIDIA-Linux-x86_64-535.154.05.run
> 
> - Tested platforms: L40.
> - Tested guest OS: Ubuntu 24.04 LTS.
> - Supported experience: Linux rich desktop experience with simple 3D
>   workloads, e.g. glmark2.
> 
> 4 Demo
> ------
> 
> A demo video can be found at: https://youtu.be/YwgIvvk-V94
> 
> [1] https://www.nvidia.com/en-us/data-center/virtual-solutions/
> [2] https://docs.nvidia.com/vgpu/17.0/grid-vgpu-user-guide/index.html#architecture-grid-vgpu
> [3] https://lore.kernel.org/dri-devel/20240613170211.88779-1-bskeggs@nvidia.com/T/
> [4] https://us.download.nvidia.com/XFree86/Linux-x86_64/535.154.05/NVIDIA-Linux-x86_64-535.154.05.run
> 
> Zhi Wang (29):
>   nvkm/vgpu: introduce NVIDIA vGPU support prelude
>   nvkm/vgpu: attach to nvkm as a nvkm client
>   nvkm/vgpu: reserve a larger GSP heap when NVIDIA vGPU is enabled
>   nvkm/vgpu: set the VF partition count when NVIDIA vGPU is enabled
>   nvkm/vgpu: populate GSP_VF_INFO when NVIDIA vGPU is enabled
>   nvkm/vgpu: set RMSetSriovMode when NVIDIA vGPU is enabled
>   nvkm/gsp: add a notify handler for GSP event
>     GPUACCT_PERFMON_UTIL_SAMPLES
>   nvkm/vgpu: get the size VMMU segment from GSP firmware
>   nvkm/vgpu: introduce the reserved channel allocator
>   nvkm/vgpu: introduce interfaces for NVIDIA vGPU VFIO module
>   nvkm/vgpu: introduce GSP RM client alloc and free for vGPU
>   nvkm/vgpu: introduce GSP RM control interface for vGPU
>   nvkm: move chid.h to nvkm/engine.
>   nvkm/vgpu: introduce channel allocation for vGPU
>   nvkm/vgpu: introduce FB memory allocation for vGPU
>   nvkm/vgpu: introduce BAR1 map routines for vGPUs
>   nvkm/vgpu: introduce engine bitmap for vGPU
>   nvkm/vgpu: introduce pci_driver.sriov_configure() in nvkm
>   vfio/vgpu_mgr: introduce vGPU lifecycle management prelude
>   vfio/vgpu_mgr: allocate GSP RM client for NVIDIA vGPU manager
>   vfio/vgpu_mgr: introduce vGPU type uploading
>   vfio/vgpu_mgr: allocate vGPU FB memory when creating vGPUs
>   vfio/vgpu_mgr: allocate vGPU channels when creating vGPUs
>   vfio/vgpu_mgr: allocate mgmt heap when creating vGPUs
>   vfio/vgpu_mgr: map mgmt heap when creating a vGPU
>   vfio/vgpu_mgr: allocate GSP RM client when creating vGPUs
>   vfio/vgpu_mgr: bootload the new vGPU
>   vfio/vgpu_mgr: introduce vGPU host RPC channel
>   vfio/vgpu_mgr: introduce NVIDIA vGPU VFIO variant driver
> 
>  .../drm/nouveau/include/nvkm/core/device.h    |   3 +
>  .../drm/nouveau/include/nvkm/engine/chid.h    |  29 +
>  .../gpu/drm/nouveau/include/nvkm/subdev/gsp.h |   1 +
>  .../nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h  |  45 ++
>  .../nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h    |  12 +
>  drivers/gpu/drm/nouveau/nvkm/Kbuild           |   1 +
>  drivers/gpu/drm/nouveau/nvkm/device/pci.c     |  33 +-
>  .../gpu/drm/nouveau/nvkm/engine/fifo/chid.c   |  49 +-
>  .../gpu/drm/nouveau/nvkm/engine/fifo/chid.h   |  26 +-
>  .../gpu/drm/nouveau/nvkm/engine/fifo/r535.c   |   3 +
>  .../gpu/drm/nouveau/nvkm/subdev/gsp/r535.c    |  14 +-
>  drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/Kbuild  |   3 +
>  drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vfio.c  | 302 +++++++++++
>  .../gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c  | 234 ++++++++
>  drivers/vfio/pci/Kconfig                      |   2 +
>  drivers/vfio/pci/Makefile                     |   2 +
>  drivers/vfio/pci/nvidia-vgpu/Kconfig          |  13 +
>  drivers/vfio/pci/nvidia-vgpu/Makefile         |   8 +
>  drivers/vfio/pci/nvidia-vgpu/debug.h          |  18 +
>  .../nvidia/inc/ctrl/ctrl0000/ctrl0000system.h |  30 +
>  .../nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h    |  33 ++
>  .../ctrl/ctrl2080/ctrl2080vgpumgrinternal.h   | 152 ++++++
>  .../common/sdk/nvidia/inc/ctrl/ctrla081.h     | 109 ++++
>  .../nvrm/common/sdk/nvidia/inc/dev_vgpu_gsp.h | 213 ++++++++
>  .../common/sdk/nvidia/inc/nv_vgpu_types.h     |  51 ++
>  .../common/sdk/vmioplugin/inc/vmioplugin.h    |  26 +
>  .../pci/nvidia-vgpu/include/nvrm/nvtypes.h    |  24 +
>  drivers/vfio/pci/nvidia-vgpu/nvkm.h           |  94 ++++
>  drivers/vfio/pci/nvidia-vgpu/rpc.c            | 242 +++++++++
>  drivers/vfio/pci/nvidia-vgpu/vfio.h           |  43 ++
>  drivers/vfio/pci/nvidia-vgpu/vfio_access.c    | 297 ++++++++++
>  drivers/vfio/pci/nvidia-vgpu/vfio_main.c      | 511 ++++++++++++++++++
>  drivers/vfio/pci/nvidia-vgpu/vgpu.c           | 352 ++++++++++++
>  drivers/vfio/pci/nvidia-vgpu/vgpu_mgr.c       | 144 +++++
>  drivers/vfio/pci/nvidia-vgpu/vgpu_mgr.h       |  89 +++
>  drivers/vfio/pci/nvidia-vgpu/vgpu_types.c     | 466 ++++++++++++++++
>  include/drm/nvkm_vgpu_mgr_vfio.h              |  61 +++
>  37 files changed, 3702 insertions(+), 33 deletions(-)
>  create mode 100644 drivers/gpu/drm/nouveau/include/nvkm/engine/chid.h
>  create mode 100644 drivers/gpu/drm/nouveau/include/nvkm/vgpu_mgr/vgpu_mgr.h
>  create mode 100644 drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/Kbuild
>  create mode 100644 drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vfio.c
>  create mode 100644 drivers/gpu/drm/nouveau/nvkm/vgpu_mgr/vgpu_mgr.c
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/Kconfig
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/Makefile
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/debug.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/ctrl/ctrl0000/ctrl0000system.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gpu.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080vgpumgrinternal.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/ctrl/ctrla081.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/dev_vgpu_gsp.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/nvidia/inc/nv_vgpu_types.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/common/sdk/vmioplugin/inc/vmioplugin.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/include/nvrm/nvtypes.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/nvkm.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/rpc.c
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/vfio.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/vfio_access.c
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/vfio_main.c
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/vgpu.c
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/vgpu_mgr.c
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/vgpu_mgr.h
>  create mode 100644 drivers/vfio/pci/nvidia-vgpu/vgpu_types.c
>  create mode 100644 include/drm/nvkm_vgpu_mgr_vfio.h
> 
> -- 
> 2.34.1
>
Jason Gunthorpe Sept. 23, 2024, 3:01 p.m. UTC | #5
On Mon, Sep 23, 2024 at 10:49:07AM +0200, Danilo Krummrich wrote:
> > 2. Proposal for upstream
> > ========================
> 
> What is the strategy in the mid / long term with this?
> 
> As you know, we're trying to move to Nova and the blockers with the device /
> driver infrastructure have been resolved and we're able to move forward. Besides
> that, Dave made great progress on the firmware abstraction side of things.
> 
> Is this more of a proof of concept? Do you plan to work on Nova in general and
> vGPU support for Nova?

This is intended to be a real product that customers would use, it is
not a proof of concept. There is a lot of demand for this kind of
simplified virtualization infrastructure on the host side. The series
here is the first attempt at making thin host infrastructure and
Zhi/etc are doing it with an upstream-first approach.

From the VFIO side I would like to see something like this merged in
nearish future as it would bring a previously out of tree approach to
be fully intree using our modern infrastructure. This is a big win for
the VFIO world.

As a commercial product this will be backported extensively to many
old kernels and that is harder/impossible if it isn't exclusively in
C. So, I think nova needs to co-exist in some way.

Jason
Jason Gunthorpe Sept. 23, 2024, 3:02 p.m. UTC | #6
On Mon, Sep 23, 2024 at 06:22:33AM +0000, Tian, Kevin wrote:
> > From: Zhi Wang <zhiw@nvidia.com>
> > Sent: Sunday, September 22, 2024 8:49 PM
> > 
> [...]
> > 
> > The NVIDIA vGPU VFIO module, together with VFIO, sits on the VFs and provides
> > extended management and features, e.g. selecting the vGPU type, and support
> > for live migration and driver warm update.
> > 
> > As for other devices that VFIO supports, VFIO provides the standard
> > userspace APIs for device lifecycle management and advanced feature
> > support.
> > 
> > The NVIDIA vGPU manager provides the necessary support to the NVIDIA vGPU VFIO
> > variant driver to create/destroy vGPUs, query available vGPU types, select
> > the vGPU type, etc.
> > 
> > On the other side, the NVIDIA vGPU manager talks to the NVIDIA GPU core
> > driver, which provides the necessary support to reach the HW functions.
> > 
> 
> I'm not sure VFIO is the right place to host the NVIDIA vGPU manager. 
> It's very NVIDIA specific and naturally fits in the PF driver.

drm isn't a particularly logical place for that either :|

Jason
Danilo Krummrich Sept. 23, 2024, 10:50 p.m. UTC | #7
On Mon, Sep 23, 2024 at 12:01:40PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 23, 2024 at 10:49:07AM +0200, Danilo Krummrich wrote:
> > > 2. Proposal for upstream
> > > ========================
> > 
> > What is the strategy in the mid / long term with this?
> > 
> > As you know, we're trying to move to Nova and the blockers with the device /
> > driver infrastructure have been resolved and we're able to move forward. Besides
> > that, Dave made great progress on the firmware abstraction side of things.
> > 
> > Is this more of a proof of concept? Do you plan to work on Nova in general and
> > vGPU support for Nova?
> 
> This is intended to be a real product that customers would use, it is
> not a proof of concept. There is a lot of demand for this kind of
> simplified virtualization infrastructure on the host side.

I see...

> The series
> here is the first attempt at making thin host infrastructure and
> Zhi/etc are doing it with an upstream-first approach.

This is great!

> 
> From the VFIO side I would like to see something like this merged in
> nearish future as it would bring a previously out of tree approach to
> be fully intree using our modern infrastructure. This is a big win for
> the VFIO world.
> 
> As a commercial product this will be backported extensively to many
> old kernels and that is harder/impossible if it isn't exclusively in
> C. So, I think nova needs to co-exist in some way.

We'll surely not support two drivers for the same thing in the long term,
neither does it make sense, nor is it sustainable.

We have a lot of good reasons why we decided to move forward with Nova as a
successor of Nouveau for GSP-based GPUs in the long term -- I also just held a
talk about this at LPC.

For the short/mid term I think it may be reasonable to start with Nouveau, but
this must be based on some agreements, for instance:

- take responsibility, e.g. commitment to help with the maintenance of some of
  NVKM / NVIDIA GPU core (or whatever we want to call it) within Nouveau
- commitment to help with Nova in general and, once applicable, move the vGPU
  parts over to Nova

But I think the very last one naturally happens if we stop further support for
new HW in Nouveau at some point.

> 
> Jason
>
Jason Gunthorpe Sept. 24, 2024, 4:41 p.m. UTC | #8
On Tue, Sep 24, 2024 at 12:50:55AM +0200, Danilo Krummrich wrote:

> > From the VFIO side I would like to see something like this merged in
> > nearish future as it would bring a previously out of tree approach to
> > be fully intree using our modern infrastructure. This is a big win for
> > the VFIO world.
> > 
> > As a commercial product this will be backported extensively to many
> > old kernels and that is harder/impossible if it isn't exclusively in
> > C. So, I think nova needs to co-exist in some way.
> 
> We'll surely not support two drivers for the same thing in the long term,
> neither does it make sense, nor is it sustainable.

What is being done here is the normal correct kernel thing to
do. Refactor the shared core code into a module and stick higher level
stuff on top of it. Ideally Nova/Nouveau would exist as peers
implementing DRM subsystem on this shared core infrastructure. We've
done this sort of thing before in other places in the kernel. It has
been proven to work well.

So, I'm not sure why you think there should be two drivers in the long
term? Do you have some technical reason why Nova can't fit into this
modular architecture?

Regardless, assuming Nova will eventually propose merging duplicated
bootup code then I suggest it should be able to fully replace the C
code with a kconfig switch and provide C compatible interfaces for
VFIO. When Rust is sufficiently mature we can consider a deprecation
schedule for the C version.
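
As a rough illustration of what such a C-visible surface could look like,
here is a minimal sketch -- the config symbols and function names below
are hypothetical and invented purely for illustration, they are not taken
from this RFC:

  /*
   * Hypothetical header consumed by the VFIO variant driver. Kconfig
   * selects which module provides these symbols, e.g. CONFIG_GPU_CORE_C
   * (the existing C core driver) vs. CONFIG_GPU_CORE_RUST (a future Rust
   * core exporting a C-compatible ABI). VFIO compiles against this
   * header either way and never sees Rust directly.
   */
  #ifndef __GPU_CORE_IF_H__
  #define __GPU_CORE_IF_H__

  #include <linux/pci.h>
  #include <linux/types.h>

  struct gpu_core_device;

  struct gpu_core_device *gpu_core_get(struct pci_dev *pf);
  void gpu_core_put(struct gpu_core_device *core);
  int gpu_core_cmd_exec(struct gpu_core_device *core,
                        const void *cmd, size_t cmd_len,
                        void *resp, size_t resp_len);

  #endif /* __GPU_CORE_IF_H__ */

Either provider would have to export the same symbols, which is what
would make a later swap transparent to the VFIO side.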

I agree duplication doesn't make sense, but if it is essential to make
everyone happy then we should do it to accommodate the ongoing Rust
experiment.

> We have a lot of good reasons why we decided to move forward with Nova as a
> successor of Nouveau for GSP-based GPUs in the long term -- I also just held a
> talk about this at LPC.

I know, but this series is adding a VFIO driver to the kernel, and a
complete Nova driver doesn't even exist yet. It is fine to think about
future plans, but let's not get too far ahead of ourselves here..

> For the short/mid term I think it may be reasonable to start with
> Nouveau, but this must be based on some agreements, for instance:
> 
> - take responsibility, e.g. commitment to help with the maintenance of some of
>   NVKM / NVIDIA GPU core (or whatever we want to call it) within Nouveau

I fully expect NVIDIA teams to own this core driver and VFIO parts. I
see there are no changes to the MAINTAINERS file in this RFC, that
will need to be corrected.

> - commitment to help with Nova in general and, once applicable, move the vGPU
>   parts over to Nova

I think you will get help with Nova based on its own merit, but I
don't like where you are going with this. Linus has had negative
things to say about this sort of cross-linking and I agree with
him. We should not be trying to extract unrelated promises on Nova as
a condition for progressing a VFIO series. :\

> But I think the very last one naturally happens if we stop further support for
> new HW in Nouveau at some point.

I expect the core code would continue to support new HW going forward
to support the VFIO driver, even if nouveau doesn't use it, until Rust
reaches full ecosystem readiness for the server space.

There are going to be a lot of users of this code, let's not rush to
harm them please.

Fortunately there is no use case for DRM and VFIO to coexist in a
hypervisor, so this does not turn into such a technical problem like
most other dual-driver situations.

Jason
Zhi Wang Sept. 24, 2024, 7:49 p.m. UTC | #9
On 23/09/2024 11.38, Danilo Krummrich wrote:
> 
> On Sun, Sep 22, 2024 at 04:11:21PM +0300, Zhi Wang wrote:
>> On Sun, 22 Sep 2024 05:49:22 -0700
>> Zhi Wang <zhiw@nvidia.com> wrote:
>>
>> +Ben.
>>
>> Forgot to add you. My bad.
> 
> Please also add the driver maintainers!
> 
> I had to fetch the patchset from the KVM list, since they did not hit the
> nouveau list (I'm trying to get @nvidia.com addresses whitelisted).
> 
> - Danilo
> 

My bad. Will do in the next iteration. Weird... never thought this could
happen since all my previous emails landed in the mailing list. Did you
see any email discussion in the thread on the nouveau list? Feel free to
let me know if I should send them again to the nouveau list. Maybe it is
also easier if you pull the patches from my tree.

Note that I will be on vacation until Oct 11th. Email reply might be 
slow. But I will read the emails in the mailing list.

Thanks,
Zhi.
Danilo Krummrich Sept. 24, 2024, 7:56 p.m. UTC | #10
On Tue, Sep 24, 2024 at 01:41:51PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 24, 2024 at 12:50:55AM +0200, Danilo Krummrich wrote:
> 
> > > From the VFIO side I would like to see something like this merged in
> > > nearish future as it would bring a previously out of tree approach to
> > > be fully intree using our modern infrastructure. This is a big win for
> > > the VFIO world.
> > > 
> > > As a commercial product this will be backported extensively to many
> > > old kernels and that is harder/impossible if it isn't exclusively in
> > > C. So, I think nova needs to co-exist in some way.
> > 
> > We'll surely not support two drivers for the same thing in the long term,
> > neither does it make sense, nor is it sustainable.
> 
> What is being done here is the normal correct kernel thing to
> do. Refactor the shared core code into a module and stick higher level
> stuff on top of it. Ideally Nova/Nouveau would exist as peers
> implementing DRM subsystem on this shared core infrastructure. We've
> done this sort of thing before in other places in the kernel. It has
> been proven to work well.

So, that's where you have the wrong understanding of what we're working on: You
seem to think that Nova is just another DRM subsystem layer on top of the NVKM
parts (what you call the core driver) of Nouveau.

But the whole point of Nova is to replace the NVKM parts of Nouveau, since
that's where the problems we want to solve reside in.

> 
> So, I'm not sure why you think there should be two drivers in the long
> term? Do you have some technical reason why Nova can't fit into this
> modular architecture?

Like I said above, the whole point of Nova is to be the core driver, the DRM
parts on top are more like "the icing on the cake".

> 
> Regardless, assuming Nova will eventually propose merging duplicated
> bootup code then I suggest it should be able to fully replace the C
> code with a kconfig switch and provide C compatible interfaces for
> VFIO. When Rust is sufficiently mature we can consider a deprecation
> schedule for the C version.
> 
> I agree duplication doesn't make sense, but if it is essential to make
> everyone happy then we should do it to accommodate the ongoing Rust
> experiment.
> 
> > We have a lot of good reasons why we decided to move forward with Nova as a
> > successor of Nouveau for GSP-based GPUs in the long term -- I also just held a
> > talk about this at LPC.
> 
> I know, but this series is adding a VFIO driver to the kernel, and a

I have no concerns regarding the VFIO driver, this is about the new features
that you intend to add to Nouveau.

> complete Nova driver doesn't even exist yet. It is fine to think about
> future plans, but let's not get too far ahead of ourselves here..

Well, that's true, but we can't just add new features to something that has been
agreed to be replaced without having a strategy for this for the successor.

> 
> > For the short/mid term I think it may be reasonable to start with
> > Nouveau, but this must be based on some agreements, for instance:
> > 
> > - take responsibility, e.g. commitment to help with the maintenance of some of
> >   NVKM / NVIDIA GPU core (or whatever we want to call it) within Nouveau
> 
> I fully expect NVIDIA teams to own this core driver and VFIO parts. I
> see there are no changes to the MAINTAINERS file in this RFC, that
> will need to be corrected.

Well, I did not say to just take over the biggest part of Nouveau.

Currently - and please correct me if I'm wrong - you make it sound to me as if
you're not willing to respect the decisions that have been taken by Nouveau and
DRM maintainers.

> 
> > - commitment to help with Nova in general and, once applicable, move the vGPU
> >   parts over to Nova
> 
> I think you will get help with Nova based on its own merit, but I
> don't like where you are going with this. Linus has had negative
> things to say about this sort of cross-linking and I agree with
> him. We should not be trying to extract unrelated promises on Nova as
> a condition for progressing a VFIO series. :\

No cross-linking, no unrelated promises.

Again, we're working on a successor of Nouveau and if we keep adding features to
Nouveau in the meantime, we have to have a strategy for the transition,
otherwise we're effectively just ignoring this decision.

So, I really need you to respect the fact that there has been a decision for a
successor and that this *is* in fact relevant for all major changes to Nouveau
as well.

Once you do this, we get the chance to work things out for the short/mid term
and for the long term and make everyone benefit.

It's encouraging that NVIDIA wants to move things upstream and I'm absolutely willing
to collaborate and help with the use-cases and goals NVIDIA has. But it really
has to be a collaboration and this starts with acknowledging the goals of *each
other*.

> 
> > But I think the very last one naturally happens if we stop further support for
> > new HW in Nouveau at some point.
> 
> I expect the core code would continue to support new HW going forward
> to support the VFIO driver, even if nouveau doesn't use it, until Rust
> reaches full ecosystem readiness for the server space.

From an upstream perspective the kernel doesn't need to consider OOT drivers,
i.e. the guest driver.

This doesn't mean that we can't work something out for a seamless transition
though.

But again, this can only really work if we acknowledge the goals of each other.

> 
> There are going to be a lot of users of this code, let's not rush to
> harm them please.

Please abstain from such kind of unconstructive insinuations; it's ridiculous to
imply that upstream kernel developers and maintainers would harm the users of
NVIDIA GPUs.

> 
> Fortunately there is no use case for DRM and VFIO to coexist in a
> hypervisor, so this does not turn into such a technical problem like
> most other dual-driver situations.
> 
> Jason
>
Dave Airlie Sept. 24, 2024, 10:52 p.m. UTC | #11
On Wed, 25 Sept 2024 at 05:57, Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Tue, Sep 24, 2024 at 01:41:51PM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 24, 2024 at 12:50:55AM +0200, Danilo Krummrich wrote:
> >
> > > > From the VFIO side I would like to see something like this merged in
> > > > nearish future as it would bring a previously out of tree approach to
> > > > be fully intree using our modern infrastructure. This is a big win for
> > > > the VFIO world.
> > > >
> > > > As a commercial product this will be backported extensively to many
> > > > old kernels and that is harder/impossible if it isn't exclusively in
> > > > C. So, I think nova needs to co-exist in some way.
> > >
> > > We'll surely not support two drivers for the same thing in the long term,
> > > neither does it make sense, nor is it sustainable.
> >
> > What is being done here is the normal correct kernel thing to
> > do. Refactor the shared core code into a module and stick higher level
> > stuff on top of it. Ideally Nova/Nouveau would exist as peers
> > implementing DRM subsystem on this shared core infrastructure. We've
> > done this sort of thing before in other places in the kernel. It has
> > been proven to work well.
>
> So, that's where you have the wrong understanding of what we're working on: You
> seem to think that Nova is just another DRM subsystem layer on top of the NVKM
> parts (what you call the core driver) of Nouveau.
>
> But the whole point of Nova is to replace the NVKM parts of Nouveau, since
> that's where the problems we want to solve reside in.

Just to re-emphasise for Jason who might not be as across this stuff,

NVKM replacement with rust is the main reason for the nova project,
100% the driving force for nova is the unstable NVIDIA firmware API.
The ability to use rust proc-macros to hide the NVIDIA instability
instead of trying to do it in C by either generators or abusing C
macros (which I don't think are sufficient).

The lower level nvkm driver needs to start being in rust before we can
add support for newer stuff.

Now there is possibly some scope for evolving the rust pieces in it,
e.g. rust wrapped in C APIs to make things easier for backports or to
avoid some pitfalls, but that is a discussion that we need to have
here.

I think the idea of a nova drm and nova core driver architecture is
acceptable to most of us, but long term trying to maintain a nouveau based
nvkm is definitely not acceptable due to the unstable firmware APIs.

Dave.
Jason Gunthorpe Sept. 24, 2024, 11:47 p.m. UTC | #12
On Wed, Sep 25, 2024 at 08:52:32AM +1000, Dave Airlie wrote:
> On Wed, 25 Sept 2024 at 05:57, Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Tue, Sep 24, 2024 at 01:41:51PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Sep 24, 2024 at 12:50:55AM +0200, Danilo Krummrich wrote:
> > >
> > > > > From the VFIO side I would like to see something like this merged in
> > > > > nearish future as it would bring a previously out of tree approach to
> > > > > be fully intree using our modern infrastructure. This is a big win for
> > > > > the VFIO world.
> > > > >
> > > > > As a commercial product this will be backported extensively to many
> > > > > old kernels and that is harder/impossible if it isn't exclusively in
> > > > > C. So, I think nova needs to co-exist in some way.
> > > >
> > > > We'll surely not support two drivers for the same thing in the long term,
> > > > neither does it make sense, nor is it sustainable.
> > >
> > > What is being done here is the normal correct kernel thing to
> > > do. Refactor the shared core code into a module and stick higher level
> > > stuff on top of it. Ideally Nova/Nouveau would exist as peers
> > > implementing DRM subsystem on this shared core infrastructure. We've
> > > done this sort of thing before in other places in the kernel. It has
> > > been proven to work well.
> >
> > So, that's where you have the wrong understanding of what we're
> > working on: You seem to think that Nova is just another DRM
> > subsystem layer on top of the NVKM parts (what you call the core
> > driver) of Nouveau.

Well, no, I am calling a core driver to be the very minimal parts that
are actually shared between vfio and drm. It should definitely not
include key parts you want to work on in rust, like the command
marshaling. 

I expect there is more work to do in order to make this kind of split,
but this is what I'm thinking/expecting.

> > But the whole point of Nova is to replace the NVKM parts of Nouveau, since
> > that's where the problems we want to solve reside in.
> 
> Just to re-emphasise for Jason who might not be as across this stuff,
> 
> NVKM replacement with rust is the main reason for the nova project,
> 100% the driving force for nova is the unstable NVIDIA firmware API.
> The ability to use rust proc-macros to hide the NVIDIA instability
> instead of trying to do it in C by either generators or abusing C
> macros (which I don't think are sufficient).

I would not include any of this in the very core most driver. My
thinking is informed by what we've done in RDMA, particularly mlx5
which has a pretty thin PCI driver and each of the drivers stacked on
top form their own command buffers directly. The PCI driver primarily
just does some device bootup, command execution and interrupts because
those are all shared by the subsystem drivers.

We have a lot of experience now building these kinds of
multi-subsystem structures and this pattern works very well.

So, broadly, build your rust proc macros on the DRM Nova driver and
call a core function to submit a command buffer to the device and get
back a response.

VFIO will make its command buffers with C and call the same core
function.
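
As a rough sketch in C of what the VFIO-side caller could look like (the
gpu_core_cmd_exec() entry point and the command layout below are
hypothetical, invented for illustration -- they are not from this RFC or
from mlx5):

  /*
   * Hypothetical VFIO-side user of a shared core submission function.
   * The VFIO variant driver marshals its own command in plain C and only
   * relies on the core driver to deliver it to the device.
   */
  #include <linux/types.h>

  struct gpu_core_device;

  int gpu_core_cmd_exec(struct gpu_core_device *core,
                        const void *cmd, size_t cmd_len,
                        void *resp, size_t resp_len);

  struct vgpu_create_cmd {        /* made-up command layout */
          u32 vgpu_type;
          u32 gfid;
          u64 fb_size;
  };

  static int vgpu_create(struct gpu_core_device *core, u32 type,
                         u32 gfid, u64 fb_size)
  {
          struct vgpu_create_cmd cmd = {
                  .vgpu_type = type,
                  .gfid = gfid,
                  .fb_size = fb_size,
          };
          u32 status = 0;

          /* The same entry point a Rust-based DRM driver could call. */
          return gpu_core_cmd_exec(core, &cmd, sizeof(cmd),
                                   &status, sizeof(status));
  }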

> I think the idea of a nova drm and nova core driver architecture is
> acceptable to most of us, but long term trying to maintain a nouveau based
> nvkm is definitely not acceptable due to the unstable firmware APIs.

? nova core, meaning nova rust, meaning vfio depends on rust, doesn't
seem acceptable ? We need to keep rust isolated to DRM for the
foreseeable future. Just need to find a separation that can do that.

Jason
Dave Airlie Sept. 25, 2024, 12:18 a.m. UTC | #13
>
> Well, no, I am calling a core driver to be the very minimal parts that
> are actually shared between vfio and drm. It should definitely not
> include key parts you want to work on in rust, like the command
> marshaling.

Unfortunately not, the fw ABI is the unsolved problem, rust is our
best solution.

>
> I expect there is more work to do in order to make this kind of split,
> but this is what I'm thinking/expecting.
>
> > > But the whole point of Nova is to replace the NVKM parts of Nouveau, since
> > > that's where the problems we want to solve reside in.
> >
> > Just to re-emphasise for Jason who might not be as across this stuff,
> >
> > NVKM replacement with rust is the main reason for the nova project,
> > 100% the driving force for nova is the unstable NVIDIA firmware API.
> > The ability to use rust proc-macros to hide the NVIDIA instability
> > instead of trying to do it in C by either generators or abusing C
> > macros (which I don't think are sufficient).
>
> I would not include any of this in the very core most driver. My
> thinking is informed by what we've done in RDMA, particularly mlx5
> which has a pretty thin PCI driver and each of the drivers stacked on
> top form their own command buffers directly. The PCI driver primarily
> just does some device bootup, command execution and interrupts because
> those are all shared by the subsystem drivers.
>
> We have a lot of experience now building these kinds of
> multi-subsystem structures and this pattern works very well.
>
> So, broadly, build your rust proc macros on the DRM Nova driver and
> call a core function to submit a command buffer to the device and get
> back a response.
>
> VFIO will make its command buffers with C and call the same core
> function.
>
> > I think the idea of a nova drm and nova core driver architecture is
> > acceptable to most of us, but long term trying to maintain a nouveau based
> > nvkm is definitely not acceptable due to the unstable firmware APIs.
>
> ? nova core, meaning nova rust, meaning vfio depends on rust, doesn't
> seem acceptable ? We need to keep rust isolated to DRM for the
> foreseeable future. Just need to find a separation that can do that.

That isn't going to happen, if we start with that as the default
positioning it won't get us very far.

The core has to be rust, because NVIDIA has an unstable firmware API.
The unstable firmware API isn't some command marshalling, it's deep
down into the depths of it, like memory sizing requirements, base
message queue layout and encoding, firmware init procedures. These are
all changeable at any time with no regard for upstream development, so
upstream development needs to be insulated from these as much as
possible. Using rust provides that insulation layer. The unstable ABI
isn't a solvable problem in the short term, using rust is the
maintainable answer.
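
To illustrate the kind of churn being described, here is a contrived C
sketch -- the structures and version numbers are invented and are not
the real GSP interfaces:

  /*
   * Contrived example of the unstable-firmware-ABI problem: the same
   * logical message has a different layout in different firmware
   * releases, so C code ends up branching on the firmware version for
   * every access. None of this is the real GSP ABI.
   */
  #include <linux/types.h>

  struct fw_msg_hdr_v535 {
          u32 function;
          u32 length;
  };

  struct fw_msg_hdr_v550 {        /* same message, reordered/extended */
          u32 length;
          u32 function;
          u32 flags;
  };

  static size_t fw_msg_hdr_size(u32 fw_version)
  {
          /* every new firmware drop potentially adds another case */
          return fw_version >= 550 ? sizeof(struct fw_msg_hdr_v550)
                                   : sizeof(struct fw_msg_hdr_v535);
  }

  static void fw_msg_hdr_fill(void *hdr, u32 fw_version,
                              u32 function, u32 length)
  {
          if (fw_version >= 550) {
                  struct fw_msg_hdr_v550 *h = hdr;

                  h->length = length;
                  h->function = function;
                  h->flags = 0;
          } else {
                  struct fw_msg_hdr_v535 *h = hdr;

                  h->function = function;
                  h->length = length;
          }
  }

This is the kind of per-version branching that the proc-macro approach
is meant to generate and hide rather than hand-maintain.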

Now there are maybe some on/off ramps we can use here that might
provide some solutions to bridge the gap. Using rust in the kernel has
various levels, which we currently tie into one place, but if we
consider different longer term progressions it might be possible to
start with some rust that is easier to backport than other rust might
be etc.

Strategies for moving nvkm core from C to rust in steps, or along a
sliding scale of fws supported could be open for discussion.

The end result though is to have nova core and nova drm in rust, that
is the decision upstream made 6-12 months ago, I don't see any of the
initial reasons for using rust have been invalidated or removed that
warrant revisiting that decision.

Dave.
Jason Gunthorpe Sept. 25, 2024, 12:53 a.m. UTC | #14
On Tue, Sep 24, 2024 at 09:56:58PM +0200, Danilo Krummrich wrote:

> Currently - and please correct me if I'm wrong - you make it sound to me as if
> you're not willing to respect the decisions that have been taken by Nouveau and
> DRM maintainers.

I've never said anything about your work, go do Nova, have fun.

I'm just not agreeing to being forced into taking Rust dependencies in
VFIO because Nova is participating in the Rust Experiment.

I think the reasonable answer is to accept some code duplication, or
try to consolidate around a small C core. I understad this is
different than you may have planned so far for Nova, but all projects
are subject to community feedback, especially when faced with new
requirements.

I think this discussion is getting a little overheated, there is lots
of space here for everyone to do their things. Let's not get too
excited.

> It's encouraging that NVIDIA wants to move things upstream and I'm absolutely willing
> to collaborate and help with the use-cases and goals NVIDIA has. But it really
> has to be a collaboration and this starts with acknowledging the goals of *each
> other*.

I've always acknowledged Nova's goal - it is fine.

It is just quite incompatible with the VFIO side requirement of no
Rust in our stack until the ecosystem can consume it.

I believe there is no reason we can't find an agreeable compromise.

> > I expect the core code would continue to support new HW going forward
> > to support the VFIO driver, even if nouveau doesn't use it, until Rust
> > reaches full ecosystem readiness for the server space.
> 
> From an upstream perspective the kernel doesn't need to consider OOT drivers,
> i.e. the guest driver.

?? VFIO already took the decision that it is agnostic to what is
running in the VM. Run Windows-only VMs for all we care, it is still
supposed to be virtualized correctly.

> > There are going to be a lot of users of this code, let's not rush to
> > harm them please.
> 
> Please abstain from such kind of unconstructive insinuations; it's ridiculous to
> imply that upstream kernel developers and maintainers would harm the users of
> NVIDIA GPUs.

You literally just said you'd want to effectively block usable VFIO
support for new GPU HW when "we stop further support for new HW in
Nouveau at some point" and "move the vGPU parts over to Nova(& rust)".

I don't agree to that, it harms VFIO users, and is not acknowledging
that conflicting goals exist.

VFIO will decide when it starts to depend on rust, Nova should not
force that decision on VFIO. They are very different ecosystems with
different needs.

Jason
Dave Airlie Sept. 25, 2024, 1:08 a.m. UTC | #15
On Wed, 25 Sept 2024 at 10:53, Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Sep 24, 2024 at 09:56:58PM +0200, Danilo Krummrich wrote:
>
> > Currently - and please correct me if I'm wrong - you make it sound to me as if
> > you're not willing to respect the decisions that have been taken by Nouveau and
> > DRM maintainers.
>
> I've never said anything about your work, go do Nova, have fun.
>
> I'm just not agreeing to being forced into taking Rust dependencies in
> VFIO because Nova is participating in the Rust Experiment.
>
> I think the reasonable answer is to accept some code duplication, or
> try to consolidate around a small C core. I understad this is
> different than you may have planned so far for Nova, but all projects
> are subject to community feedback, especially when faced with new
> requirements.
>
> I think this discussion is getting a little overheated, there is lots
> of space here for everyone to do their things. Let's not get too
> excited.

How do you intend to solve the stable ABI problem caused by the GSP firmware?

If you haven't got an answer to that, that is reasonable, you can talk
about VFIO and DRM and who is in charge all you like, but it doesn't
matter.

Fundamentally the problem is that the unstable API exposure isn't something
you can build a castle on top of. The nova idea is to use rust to
solve a fundamental problem that the NVIDIA driver design process
forces on us (vfio included), and I'm not seeing anything constructive as
to why doing that in C would be worth the investment. Nothing has
changed from when we designed nova; this idea was on the table then,
and it has all sorts of problems leaking the unstable ABI that have to be
solved. When I see a solution for that in C that is maintainable
and doesn't leak like a sieve I might be interested, but you know, keep
thinking we are using rust so we can have fun, not because we are
using it to solve maintainability problems caused by an internal
NVIDIA design decision over which we have zero influence.

Dave.
Jason Gunthorpe Sept. 25, 2024, 1:29 a.m. UTC | #16
On Wed, Sep 25, 2024 at 10:18:44AM +1000, Dave Airlie wrote:

> > ? nova core, meaning nova rust, meaning vfio depends on rust, doesn't
> > seem acceptable ? We need to keep rust isolated to DRM for the
> > foreseeable future. Just need to find a separation that can do that.
> 
> That isn't going to happen, if we start with that as the default
> positioning it won't get us very far.

What do you want me to say to that? We can't have rust in VFIO right
now, we don't have that luxury. This is just a fact, I can't change
it.

If you say upstream has to be rust then there just won't be upstream
and this will all go OOT and stay as C code. That isn't a good
outcome. Having rust usage actively harm participation in the kernel
seems like the exact opposite of the consensus of the maintainer
summit.

> The core has to be rust, because NVIDIA has an unstable firmware API.
> The unstable firmware API isn't some command marshalling, it's deep
> down into the depths of it, like memory sizing requirements, base
> message queue layout and encoding, firmware init procedures.

I get the feeling the vast majority of the work, and primary rust
benefit, lies in the command marshalling.

If the init *procedures* change, for instance, you are going to have to
write branches no matter what language you use.

I don't know, it is just a suggestion to consider.

> Now there are maybe some on/off ramps we can use here that might
> provide some solutions to bridge the gap. Using rust in the kernel has
> various levels, which we currently tie into one place, but if we
> consider different longer term progressions it might be possible to
> start with some rust that is easier to backport than other rust might
> be etc.

That seems to be entirely unexplored territory. Certainly if the
backporting can be shown to be solved then I have much less objection
to having VFIO depend on rust.

This is part of why I suggested that a rust core driver could expose
the C APIs that VFIO needs with a kconfig switch. Then people can
experiment and give feedback on what backporting this rust stuff is
actually like. That would be valuable for everyone I think. Especially
if the feedback is that backporting is no problem.
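
As a minimal sketch of what such a switchable C surface could look like
(the config symbol and every nvcore_* name below are hypothetical,
invented purely for illustration, not an existing interface):

/*
 * Hypothetical header a Rust core driver could export for a C VFIO
 * variant driver; nothing below is an existing interface.
 */
#include <linux/errno.h>
#include <linux/types.h>

struct pci_dev;
struct nvcore_vgpu;		/* opaque handle owned by the core driver */

#if IS_ENABLED(CONFIG_NOVA_VFIO_C_API)
struct nvcore_vgpu *nvcore_vgpu_create(struct pci_dev *vf, u32 vgpu_type);
int nvcore_vgpu_bootload(struct nvcore_vgpu *vgpu);
void nvcore_vgpu_destroy(struct nvcore_vgpu *vgpu);
#else
static inline struct nvcore_vgpu *
nvcore_vgpu_create(struct pci_dev *vf, u32 vgpu_type)
{
	return NULL;
}
static inline int nvcore_vgpu_bootload(struct nvcore_vgpu *vgpu)
{
	return -ENODEV;
}
static inline void nvcore_vgpu_destroy(struct nvcore_vgpu *vgpu) {}
#endif

The VFIO variant driver would then only be buildable where that option
is enabled, which is exactly the setup that would generate real
backporting feedback.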

Yes we have duplication while that is ongoing, but I think that is
inevitable, and at least everyone could agree to the duplication and I
expect NVIDIA would sign up to maintain the C VFIO stack top to
bottom.

> The end result though is to have nova core and nova drm in rust, that
> is the decision upstream made 6-12 months ago, I don't see any of the
> initial reasons for using rust have been invalidated or removed that
> warrant revisiting that decision.

Never said they did, but your decision to use Rust in Nova does not
automatically mean a decision to use Rust in VFIO, and now we have a
new requirement to couple the two together. It still must be resolved
satisfactorily.

Jason
Danilo Krummrich Sept. 25, 2024, 10:55 a.m. UTC | #17
On Tue, Sep 24, 2024 at 09:53:19PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 24, 2024 at 09:56:58PM +0200, Danilo Krummrich wrote:
> 
> > Currently - and please correct me if I'm wrong - you make it sound to me as if
> > you're not willing to respect the decisions that have been taken by Nouveau and
> > DRM maintainers.
> 
> I've never said anything about your work, go do Nova, have fun.

See, that's the attitude that doesn't get us anywhere.

You act as if we were just toying around to have fun, position yourself as the
one who wants to do the "real deal", and just claim that our decisions would
harm users.

And at the same time you prove that you did not get up to speed on what the
reasons were to move in this direction and what problems we are trying to solve.

This just won't lead to a constructive discussion that addresses your concerns.

Try not to go at this like a bull at a gate. Instead, start by asking questions
to understand why we chose this direction, and then feel free to raise concerns.

I assure you, we will hear and recognize them! And I'm also sure that we'll find
solutions and compromises.

> 
> I'm just not agreeing to being forced into taking Rust dependencies in
> VFIO because Nova is participating in the Rust Experiment.
> 
> I think the reasonable answer is to accept some code duplication, or
> try to consolidate around a small C core. I understad this is
> different than you may have planned so far for Nova, but all projects
> are subject to community feedback, especially when faced with new
> requirements.

Fully agree, and I'm absolutely open to considering feedback and new requirements.

But again, consider what I said above -- you're creating counterproposals out of
thin air, without considering at all what we have planned so far.

So, I wonder what kind of reaction you expect approaching things this way?

> 
> I think this discussion is getting a little overheated, there is lots
> of space here for everyone to do their things. Let's not get too
> excited.
> 
> > I encourage that NVIDIA wants to move things upstream and I'm absolutely willing
> > to collaborate and help with the use-cases and goals NVIDIA has. But it really
> > has to be a collaboration and this starts with acknowledging the goals of *each
> > other*.
> 
> I've always acknowledged Nova's goal - it is fine.
> 
> It is just quite incompatible with the VFIO side requirement of no
> Rust in our stack until the ecosystem can consume it.
> 
> I belive there is no reason we can't find an agreeable compromise.

I'm pretty sure we can indeed find an agreeable compromise. But again, please
understand that the way you've chosen to approach this so far won't get us
there.

> 
> > > I expect the core code would continue to support new HW going forward
> > > to support the VFIO driver, even if nouveau doesn't use it, until Rust
> > > reaches some full ecosystem readyness for the server space.
> > 
> > From an upstream perspective the kernel doesn't need to consider OOT drivers,
> > i.e. the guest driver.
> 
> ?? VFIO already took the decision that it is agnostic to what is
> running in the VM. Run Windows-only VMs for all we care, it is still
> supposed to be virtualized correctly.
> 
> > > There are going to be a lot of users of this code, let's not rush to
> > > harm them please.
> > 
> > Please abstain from such kind of unconstructive insinuations; it's ridiculous to
> > imply that upstream kernel developers and maintainers would harm the users of
> > NVIDIA GPUs.
> 
> You literally just said you'd want to effectively block usable VFIO
> support for new GPU HW when "we stop further support for new HW in
> Nouveau at some point" and "move the vGPU parts over to Nova(& rust)".

Well, working on a successor means that once it's in place the support for the
replaced thing has to end at some point.

This doesn't mean that we can't work out ways to address your concerns.

You just make it a binary thing and claim that if we don't choose 1 we harm
users.

This effectively rules out looking for solutions to your concerns in the first
place. And again, this won't get us anywhere. It just creates the impression
that you're not interested in solutions, but in pushing through your agenda.

> 
> I don't agree to that, it harms VFIO users, and is not acknowledging
> that conflicting goals exist.
> 
> VFIO will decide when it starts to depend on rust, Nova should not
> force that decision on VFIO. They are very different ecosystems with
> different needs.
> 
> Jason
>
Jason Gunthorpe Sept. 25, 2024, 3:28 p.m. UTC | #18
On Wed, Sep 25, 2024 at 11:08:40AM +1000, Dave Airlie wrote:
> On Wed, 25 Sept 2024 at 10:53, Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Tue, Sep 24, 2024 at 09:56:58PM +0200, Danilo Krummrich wrote:
> >
> > > Currently - and please correct me if I'm wrong - you make it sound to me as if
> > > you're not willing to respect the decisions that have been taken by Nouveau and
> > > DRM maintainers.
> >
> > I've never said anything about your work, go do Nova, have fun.
> >
> > I'm just not agreeing to being forced into taking Rust dependencies in
> > VFIO because Nova is participating in the Rust Experiment.
> >
> > I think the reasonable answer is to accept some code duplication, or
> > try to consolidate around a small C core. I understad this is
> > different than you may have planned so far for Nova, but all projects
> > are subject to community feedback, especially when faced with new
> > requirements.
> >
> > I think this discussion is getting a little overheated, there is lots
> > of space here for everyone to do their things. Let's not get too
> > excited.
> 
> How do you intend to solve the stable ABI problem caused by the GSP firmware?
> 
> If you haven't got an answer to that, that is reasonable, you can talk
> about VFIO and DRM and who is in charge all you like, but it doesn't
> matter.

I suggest the same answer everyone else building HW in the kernel
operates under. You get to update your driver with your new HW once
per generation.

Not once per FW release, once per generation. That is a similar
maintenance burden to most drivers. It is not as good as what
Mellanox achieves (no SW change for a new HW generation), but it
is still good.

I would apply this logic to Nova as well; there is no reason to be supporting
random ABI changes coming out every month(s).

> Fundamentally the problem is the unstable API exposure isn't something
> you can build a castle on top of, the nova idea is to use rust to
> solve a fundamental problem with the NVIDIA driver design process
> forces on us (vfio included), 

I firmly believe you can't solve a stable ABI problem with language
features in an OS. The ABI is totally unstable, it will change
semantically, the order and nature of functions you need will
change. New HW will need new behaviors and semantics.

Language support can certainly handle the mindless churn that ideally
shouldn't even be happening in the first place.

The way you solve this is at the root, in the FW. Don't churn
everything. I'm a big believer and supporter of the Mellanox
super-stable approach that has really proven how valuable this concept
is to everyone.

So I agree with you, the extreme instability is not OK upstream;
it needs to slow down a lot to be acceptable. I don't necessarily
agree that the Mellanox-like gold standard is the bar, but it
certainly must be way better than it is now.

FWIW when I discussed the VFIO patches I was given some impression
there would not be high levels of ABI churn on the VFIO side, and that
there was awareness and understanding of this issue on Zhi's side.

Jason
Tian, Kevin Sept. 26, 2024, 6:43 a.m. UTC | #19
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Monday, September 23, 2024 11:02 PM
> 
> On Mon, Sep 23, 2024 at 06:22:33AM +0000, Tian, Kevin wrote:
> > > From: Zhi Wang <zhiw@nvidia.com>
> > > Sent: Sunday, September 22, 2024 8:49 PM
> > >
> > [...]
> > >
> > > The NVIDIA vGPU VFIO module together with VFIO sits on VFs, provides
> > > extended management and features, e.g. selecting the vGPU types,
> support
> > > live migration and driver warm update.
> > >
> > > Like other devices that VFIO supports, VFIO provides the standard
> > > userspace APIs for device lifecycle management and advance feature
> > > support.
> > >
> > > The NVIDIA vGPU manager provides necessary support to the NVIDIA
> vGPU VFIO
> > > variant driver to create/destroy vGPUs, query available vGPU types, select
> > > the vGPU type, etc.
> > >
> > > On the other side, NVIDIA vGPU manager talks to the NVIDIA GPU core
> driver,
> > > which provide necessary support to reach the HW functions.
> > >
> >
> > I'm not sure VFIO is the right place to host the NVIDIA vGPU manager.
> > It's very NVIDIA specific and naturally fit in the PF driver.
> 
> drm isn't a particularly logical place for that either :|
> 

This RFC doesn't expose any new uAPI in the vGPU manager, e.g. with
the vGPU type hard-coded to L40-24Q. In this way the boundary between
the code in VFIO and the code in the PF driver is probably more of a
vendor-specific choice.

However, according to the cover letter it's a reasonable future extension
to implement new uAPI for the admin to select the vGPU type and potentially
do more manual configuration before the target VF can be used.

Then there comes an open question whether VFIO is the right place to host
such a vendor-specific provisioning interface. The existing mdev-type-based
provisioning mechanism was already considered a bad fit.

IIRC the previous discussion came to suggest putting the provisioning
interface in the PF driver. There may be a chance to generalize it and
move it to VFIO, but there is no telling what it will look like until
multiple drivers have demonstrated their own implementations as a base
for discussion.

But now it seems you prefer vendors putting their own provisioning
interfaces in VFIO directly?

Thanks
Kevin
Greg Kroah-Hartman Sept. 26, 2024, 9:14 a.m. UTC | #20
On Mon, Sep 23, 2024 at 12:01:40PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 23, 2024 at 10:49:07AM +0200, Danilo Krummrich wrote:
> > > 2. Proposal for upstream
> > > ========================
> > 
> > What is the strategy in the mid / long term with this?
> > 
> > As you know, we're trying to move to Nova and the blockers with the device /
> > driver infrastructure have been resolved and we're able to move forward. Besides
> > that, Dave made great progress on the firmware abstraction side of things.
> > 
> > Is this more of a proof of concept? Do you plan to work on Nova in general and
> > vGPU support for Nova?
> 
> This is intended to be a real product that customers would use, it is
> not a proof of concept. There is alot of demand for this kind of
> simplified virtualization infrastructure in the host side. The series
> here is the first attempt at making thin host infrastructure and
> Zhi/etc are doing it with an upstream-first approach.
> 
> From the VFIO side I would like to see something like this merged in
> nearish future as it would bring a previously out of tree approach to
> be fully intree using our modern infrastructure. This is a big win for
> the VFIO world.
> 
> As a commercial product this will be backported extensively to many
> old kernels and that is harder/impossible if it isn't exclusively in
> C. So, I think nova needs to co-exist in some way.

Please never make design decisions based on ancient commercial
kernels; they do not have any relevance to upstream kernel development today.
If you care about those kernels, work with the companies that get paid
to support such things.  Otherwise development upstream would just
completely stall and never go forward, as you well know.

As it seems that future support for this hardware is going to be in
rust, just use those apis going forward and backport the small number of
missing infrastructure patches to the relevant ancient kernels as well;
it's not like that would even be noticed in the overall number of
patches they take for normal subsystem improvements :)

thanks,

greg k-h
Jason Gunthorpe Sept. 26, 2024, 12:42 p.m. UTC | #21
On Thu, Sep 26, 2024 at 11:14:27AM +0200, Greg KH wrote:
> On Mon, Sep 23, 2024 at 12:01:40PM -0300, Jason Gunthorpe wrote:
> > On Mon, Sep 23, 2024 at 10:49:07AM +0200, Danilo Krummrich wrote:
> > > > 2. Proposal for upstream
> > > > ========================
> > > 
> > > What is the strategy in the mid / long term with this?
> > > 
> > > As you know, we're trying to move to Nova and the blockers with the device /
> > > driver infrastructure have been resolved and we're able to move forward. Besides
> > > that, Dave made great progress on the firmware abstraction side of things.
> > > 
> > > Is this more of a proof of concept? Do you plan to work on Nova in general and
> > > vGPU support for Nova?
> > 
> > This is intended to be a real product that customers would use, it is
> > not a proof of concept. There is alot of demand for this kind of
> > simplified virtualization infrastructure in the host side. The series
> > here is the first attempt at making thin host infrastructure and
> > Zhi/etc are doing it with an upstream-first approach.
> > 
> > From the VFIO side I would like to see something like this merged in
> > nearish future as it would bring a previously out of tree approach to
> > be fully intree using our modern infrastructure. This is a big win for
> > the VFIO world.
> > 
> > As a commercial product this will be backported extensively to many
> > old kernels and that is harder/impossible if it isn't exclusively in
> > C. So, I think nova needs to co-exist in some way.
> 
> Please never make design decisions based on old ancient commercial
> kernels that have any relevance to upstream kernel development
> today.

Greg, you are being too extreme. Those "ancient commercial kernels"
have a huge relevance to a lot of our community because they are the
users that actually run the code we are building and pay for it to be
created. Yes we usually (but not always!) push back on accommodations
upstream, but taking hard dependencies on rust is currently a very
different thing.

> If you care about those kernels, work with the companies that get paid
> to support such things.  Otherwise development upstream would just
> completely stall and never go forward, as you well know.

They seem to be engaged, but upstream rust isn't even done yet. So
what exactly do you expect them to do? Throw out whole architectures
from their products?

I know how things work, I just don't think we are ready to elevate
Rust to the category of decisions where upstream can ignore the
downstream side readiness. In my view the community needs to agree to
remove the experimental status from Rust first.

> As it seems that future support for this hardware is going to be in
> rust, just use those apis going forward and backport the small number of

"those apis" don't even exist yet! There is a big multi-year gap
between when pure upstream would even be ready to put something like
VFIO on top of Nova and Rust and where we are now with this series.

This argument is *way too early*. I'm deeply hoping we never have to
actually have it, that by the time Nova gets merged Rust will be 100%
ready upstream and there will be no issue. Please? Can that happen?

Otherwise, let's slow down here. Nova is still years away from being
finished. Nouveau is the in-tree driver for this HW. This series
improves on Nouveau. We are definitely not at the point of refusing
new code because it is not written in Rust, RIGHT?

Jason
Greg Kroah-Hartman Sept. 26, 2024, 12:54 p.m. UTC | #22
On Thu, Sep 26, 2024 at 09:42:39AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 26, 2024 at 11:14:27AM +0200, Greg KH wrote:
> > On Mon, Sep 23, 2024 at 12:01:40PM -0300, Jason Gunthorpe wrote:
> > > On Mon, Sep 23, 2024 at 10:49:07AM +0200, Danilo Krummrich wrote:
> > > > > 2. Proposal for upstream
> > > > > ========================
> > > > 
> > > > What is the strategy in the mid / long term with this?
> > > > 
> > > > As you know, we're trying to move to Nova and the blockers with the device /
> > > > driver infrastructure have been resolved and we're able to move forward. Besides
> > > > that, Dave made great progress on the firmware abstraction side of things.
> > > > 
> > > > Is this more of a proof of concept? Do you plan to work on Nova in general and
> > > > vGPU support for Nova?
> > > 
> > > This is intended to be a real product that customers would use, it is
> > > not a proof of concept. There is alot of demand for this kind of
> > > simplified virtualization infrastructure in the host side. The series
> > > here is the first attempt at making thin host infrastructure and
> > > Zhi/etc are doing it with an upstream-first approach.
> > > 
> > > From the VFIO side I would like to see something like this merged in
> > > nearish future as it would bring a previously out of tree approach to
> > > be fully intree using our modern infrastructure. This is a big win for
> > > the VFIO world.
> > > 
> > > As a commercial product this will be backported extensively to many
> > > old kernels and that is harder/impossible if it isn't exclusively in
> > > C. So, I think nova needs to co-exist in some way.
> > 
> > Please never make design decisions based on old ancient commercial
> > kernels that have any relevance to upstream kernel development
> > today.
> 
> Greg, you are being too extreme. Those "ancient commercial kernels"
> have a huge relevance to alot of our community because they are the
> users that actually run the code we are building and pay for it to be
> created. Yes we usually (but not always!) push back on accommodations
> upstream, but taking hard dependencies on rust is currently a very
> different thing.

That's fine, but again, do NOT make design decisions based on what you
feel you can, and can not, slide by one of these companies to get it
into their old kernels.  That's what I object to here.

Also, please always remember that the % of overall Linux kernel
installs, even counting out Android and embedded, is VERY tiny for these
companies.  The huge % overall is doing the "right thing" by using
upstream kernels.  And with the laws in place now, that % is only going
to grow and those older kernels will rightfully fall away into an even
smaller %.

I know those companies pay for many developers; I'm not saying that
their contributions are any less or more important than others', they all
are equal.  You wouldn't want design decisions for a patch series to be
dictated by some really old Yocto kernel restrictions that are only in
autos, right?  We are a large community, that's what I'm saying.

> Otherwise, let's slow down here. Nova is still years away from being
> finished. Nouveau is the in-tree driver for this HW. This series
> improves on Nouveau. We are definitely not at the point of refusing
> new code because it is not writte in Rust, RIGHT?

No, I do object to "we are ignoring the driver being proposed by the
developers involved for this hardware by adding to the old one instead",
which is what seems to be happening here.

Anyway, let's focus on the code; there are already real issues with this
patch series, pointed out by me and others, that need to be addressed
before it can go anywhere.

thanks,

greg k-h
Jason Gunthorpe Sept. 26, 2024, 12:55 p.m. UTC | #23
On Thu, Sep 26, 2024 at 06:43:44AM +0000, Tian, Kevin wrote:

> Then there comes an open whether VFIO is a right place to host such
> vendor specific provisioning interface. The existing mdev type based
> provisioning mechanism was considered a bad fit already.

> IIRC the previous discussion came to suggest putting the provisioning
> interface in the PF driver. There may be chance to generalize and
> move to VFIO but no idea what it will be until multiple drivers already
> demonstrate their own implementations as the base for discussion.

I am looking at fwctl to do a lot of this in the SRIOV world.

You'd provision the VF prior to opening VFIO using the fwctl interface,
and VFIO would perceive a VF that has exactly the required
properties. At least for SRIOV, where the VM is talking directly to
device FW; mdev/paravirtualization would be different.

> But now seems you prefer to vendors putting their own provisioning
> interface in VFIO directly?

Maybe not, just that drm isn't the right place either. If we do the
fwctl stuff then the VF provisioning would be done through a fwctl
driver.

I'm not entirely sure yet what this whole 'mgr' component is actually
doing though.

Jason
Danilo Krummrich Sept. 26, 2024, 1:07 p.m. UTC | #24
On Thu, Sep 26, 2024 at 02:54:38PM +0200, Greg KH wrote:
> On Thu, Sep 26, 2024 at 09:42:39AM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 26, 2024 at 11:14:27AM +0200, Greg KH wrote:
> > > On Mon, Sep 23, 2024 at 12:01:40PM -0300, Jason Gunthorpe wrote:
> > > > On Mon, Sep 23, 2024 at 10:49:07AM +0200, Danilo Krummrich wrote:
> > > > > > 2. Proposal for upstream
> > > > > > ========================
> > > > > 
> > > > > What is the strategy in the mid / long term with this?
> > > > > 
> > > > > As you know, we're trying to move to Nova and the blockers with the device /
> > > > > driver infrastructure have been resolved and we're able to move forward. Besides
> > > > > that, Dave made great progress on the firmware abstraction side of things.
> > > > > 
> > > > > Is this more of a proof of concept? Do you plan to work on Nova in general and
> > > > > vGPU support for Nova?
> > > > 
> > > > This is intended to be a real product that customers would use, it is
> > > > not a proof of concept. There is alot of demand for this kind of
> > > > simplified virtualization infrastructure in the host side. The series
> > > > here is the first attempt at making thin host infrastructure and
> > > > Zhi/etc are doing it with an upstream-first approach.
> > > > 
> > > > From the VFIO side I would like to see something like this merged in
> > > > nearish future as it would bring a previously out of tree approach to
> > > > be fully intree using our modern infrastructure. This is a big win for
> > > > the VFIO world.
> > > > 
> > > > As a commercial product this will be backported extensively to many
> > > > old kernels and that is harder/impossible if it isn't exclusively in
> > > > C. So, I think nova needs to co-exist in some way.
> > > 
> > > Please never make design decisions based on old ancient commercial
> > > kernels that have any relevance to upstream kernel development
> > > today.
> > 
> > Greg, you are being too extreme. Those "ancient commercial kernels"
> > have a huge relevance to alot of our community because they are the
> > users that actually run the code we are building and pay for it to be
> > created. Yes we usually (but not always!) push back on accommodations
> > upstream, but taking hard dependencies on rust is currently a very
> > different thing.
> 
> That's fine, but again, do NOT make design decisions based on what you
> can, and can not, feel you can slide by one of these companies to get it
> into their old kernels.  That's what I take objection to here.
> 
> Also always remember please, that the % of overall Linux kernel
> installs, even counting out Android and embedded, is VERY tiny for these
> companies.  The huge % overall is doing the "right thing" by using
> upstream kernels.  And with the laws in place now that % is only going
> to grow and those older kernels will rightfully fall away into even
> smaller %.
> 
> I know those companies pay for many developers, I'm not saying that
> their contributions are any less or more important than others, they all
> are equal.  You wouldn't want design decisions for a patch series to be
> dictated by some really old Yocto kernel restrictions that are only in
> autos, right?  We are a large community, that's what I'm saying.
> 
> > Otherwise, let's slow down here. Nova is still years away from being
> > finished. Nouveau is the in-tree driver for this HW. This series
> > improves on Nouveau. We are definitely not at the point of refusing
> > new code because it is not writte in Rust, RIGHT?

Just a reminder of what I did and did not say. I never said we can't
support this in Nouveau for the short and mid term.

But we can't add new features and support new use-cases in Nouveau *without*
considering the way forward to the new driver.

> 
> No, I do object to "we are ignoring the driver being proposed by the
> developers involved for this hardware by adding to the old one instead"
> which it seems like is happening here.
> 
> Anyway, let's focus on the code, there's already real issues with this
> patch series as pointed out by me and others that need to be addressed
> before it can go anywhere.
> 
> thanks,
> 
> greg k-h
>
Jason Gunthorpe Sept. 26, 2024, 2:40 p.m. UTC | #25
On Thu, Sep 26, 2024 at 02:54:38PM +0200, Greg KH wrote:

> That's fine, but again, do NOT make design decisions based on what you
> can, and can not, feel you can slide by one of these companies to get it
> into their old kernels.  That's what I take objection to here.

It is not about sliding things by. It is a recognition that participating in the
community gives everyone value. If you excessively deny value to one
side, they will have no reason to participate.

In this case the value is that, with light enough work, the
kernel-fork community can deploy this code to their users. This has
been the accepted bargain for a long time now.

There is a great big question mark over Rust regarding what impact it
actually has on this dynamic. It is definitely not just backporting a few
hundred upstream patches. There is clearly new upstream development
work needed still - arch support being a very obvious one.

> Also always remember please, that the % of overall Linux kernel
> installs, even counting out Android and embedded, is VERY tiny for these
> companies.  The huge % overall is doing the "right thing" by using
> upstream kernels.  And with the laws in place now that % is only going
> to grow and those older kernels will rightfully fall away into even
> smaller %.

Who is "doing the right thing"? That is not what I see, we sell
server HW to *everyone*. There are a couple sites that are "near"
upstream, but that is not too common. Everyone is running some kind of
kernel fork.

I dislike this generalization you make with % of users. Almost 100% of
NVIDIA server HW is running forks. I would estimate around 10% is
above a 6.0 baseline. It is not tiny either; NVIDIA sold something like
$60B of server HW running Linux last year with this kind of demographic.
So did Intel, AMD, etc.

I would not describe this as "VERY tiny". Maybe you mean RHEL-alike
specifically, and yes, they are a diminishing install share. However,
the hyperscale companies more than make up for that with their
internal secret proprietary forks :(

> > Otherwise, let's slow down here. Nova is still years away from being
> > finished. Nouveau is the in-tree driver for this HW. This series
> > improves on Nouveau. We are definitely not at the point of refusing
> > new code because it is not writte in Rust, RIGHT?
> 
> No, I do object to "we are ignoring the driver being proposed by the
> developers involved for this hardware by adding to the old one instead"
> which it seems like is happening here.

That is too harsh. We've consistently taken a community position that
OOT stuff doesn't matter, and yes that includes OOT stuff that people
we trust and respect are working on. Until it is ready for submission,
and ideally merged, it is an unknown quantity. Good, well-meaning
people routinely drop their projects, good projects run into
unexpected roadblocks, and life happens.

Nova is not being ignored, there is dialog, and yes some disagreement.

Again, nobody here is talking about disrupting Nova. We just want to
keep going as-is until we can all agree together it is ready to make a
change.

Jason
Andy Ritger Sept. 26, 2024, 6:07 p.m. UTC | #26
I hope and expect the nova and vgpu_mgr efforts to ultimately converge.

First, for the fw ABI debacle: yes, it is unfortunate that we still don't
have a stable ABI from GSP.  We /are/ working on it, though there isn't
anything to show, yet.  FWIW, I expect the end result will be a much
simpler interface than what is there today, and a stable interface that
NVIDIA can guarantee.

But, for now, we have a timing problem like Jason described:

- We have customers eager for upstream vfio support in the near term,
  and that seems like something NVIDIA can develop/contribute/maintain in
  the near term, as an incremental step forward.

- Nova is still early in its development, relative to nouveau/nvkm.

- From NVIDIA's perspective, we're nervous about the backportability of
  rust-based components to enterprise kernels in the near term.

- The stable GSP ABI is not going to be ready in the near term.


I agree with what Dave said in one of the forks of this thread, in the context of
NV2080_CTRL_VGPU_MGR_INTERNAL_BOOTLOAD_GSP_VGPU_PLUGIN_TASK_PARAMS:

> The GSP firmware interfaces are not guaranteed stable. Exposing these
> interfaces outside the nvkm core is unacceptable, as otherwise we
> would have to adapt the whole kernel depending on the loaded firmware.
>
> You cannot use any nvidia sdk headers, these all have to be abstracted
> behind things that have no bearing on the API.

Agreed.  Though not infinitely scalable, and not
as clean as in rust, it seems possible to abstract
NV2080_CTRL_VGPU_MGR_INTERNAL_BOOTLOAD_GSP_VGPU_PLUGIN_TASK_PARAMS behind
a C-implemented abstraction layer in nvkm, at least for the short term.
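
As a rough sketch of the shape such a layer could take (the nvkm_vgpu_*
names and fields below are invented for illustration; the
firmware-version-specific layout would stay entirely inside nvkm):

/*
 * Sketch only: the stable, firmware-agnostic arguments that callers
 * above nvkm (vgpu_mgr, VFIO) would use.  The names and fields are
 * invented for illustration.
 */
#include <linux/types.h>

struct nvkm_device;

struct nvkm_vgpu_bootload_args {
	u32 gfid;		/* function id of the VF being started */
	u64 fb_offset;		/* vGPU framebuffer carve-out */
	u64 fb_length;
	u32 num_channels;
};

/*
 * Implemented inside nvkm: translate the stable args into whatever
 * layout the currently loaded GSP firmware expects (e.g. the
 * NV2080_CTRL_VGPU_MGR_INTERNAL_BOOTLOAD_GSP_VGPU_PLUGIN_TASK_PARAMS
 * variant for that firmware) and issue the RPC.  Only this translation
 * unit would include the SDK headers.
 */
int nvkm_gsp_vgpu_bootload(struct nvkm_device *device,
			   const struct nvkm_vgpu_bootload_args *args);

The cost is the one noted above: every firmware change to that layout
has to be absorbed inside nvkm, but nothing above it needs to change.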

Is there a potential compromise where vgpu_mgr starts its life with a
dependency on nvkm, and as things mature we migrate it to instead depend
on nova?


On Thu, Sep 26, 2024 at 11:40:57AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 26, 2024 at 02:54:38PM +0200, Greg KH wrote:
> 
> > That's fine, but again, do NOT make design decisions based on what you
> > can, and can not, feel you can slide by one of these companies to get it
> > into their old kernels.  That's what I take objection to here.
> 
> It is not slide by. It is a recognition that participating in the
> community gives everyone value. If you excessively deny value from one
> side they will have no reason to participate.
> 
> In this case the value is that, with enough light work, the
> kernel-fork community can deploy this code to their users. This has
> been the accepted bargin for a long time now.
> 
> There is a great big question mark over Rust regarding what impact it
> actually has on this dynamic. It is definitely not just backport a few
> hundred upstream patches. There is clearly new upstream development
> work needed still - arch support being a very obvious one.
> 
> > Also always remember please, that the % of overall Linux kernel
> > installs, even counting out Android and embedded, is VERY tiny for these
> > companies.  The huge % overall is doing the "right thing" by using
> > upstream kernels.  And with the laws in place now that % is only going
> > to grow and those older kernels will rightfully fall away into even
> > smaller %.
> 
> Who is "doing the right thing"? That is not what I see, we sell
> server HW to *everyone*. There are a couple sites that are "near"
> upstream, but that is not too common. Everyone is running some kind of
> kernel fork.
> 
> I dislike this generalization you do with % of users. Almost 100% of
> NVIDIA server HW are running forks. I would estimate around 10% is
> above a 6.0 baseline. It is not tiny either, NVIDIA sold like $60B of
> server HW running Linux last year with this kind of demographic. So
> did Intel, AMD, etc.
> 
> I would not describe this as "VERY tiny". Maybe you mean RHEL-alike
> specifically, and yes, they are a diminishing install share. However,
> the hyperscale companies more than make up for that with their
> internal secret proprietary forks :(
> 
> > > Otherwise, let's slow down here. Nova is still years away from being
> > > finished. Nouveau is the in-tree driver for this HW. This series
> > > improves on Nouveau. We are definitely not at the point of refusing
> > > new code because it is not writte in Rust, RIGHT?
> > 
> > No, I do object to "we are ignoring the driver being proposed by the
> > developers involved for this hardware by adding to the old one instead"
> > which it seems like is happening here.
> 
> That is too harsh. We've consistently taken a community position that
> OOT stuff doesn't matter, and yes that includes OOT stuff that people
> we trust and respect are working on. Until it is ready for submission,
> and ideally merged, it is an unknown quantity. Good well meaning
> people routinely drop their projects, good projects run into
> unexpected roadblocks, and life happens.
> 
> Nova is not being ignored, there is dialog, and yes some disagreement.
> 
> Again, nobody here is talking about disrupting Nova. We just want to
> keep going as-is until we can all agree together it is ready to make a
> change.
> 
> Jason
Danilo Krummrich Sept. 26, 2024, 10:23 p.m. UTC | #27
On Thu, Sep 26, 2024 at 11:07:56AM -0700, Andy Ritger wrote:
> 
> I hope and expect the nova and vgpu_mgr efforts to ultimately converge.
> 
> First, for the fw ABI debacle: yes, it is unfortunate that we still don't
> have a stable ABI from GSP.  We /are/ working on it, though there isn't
> anything to show, yet.  FWIW, I expect the end result will be a much
> simpler interface than what is there today, and a stable interface that
> NVIDIA can guarantee.
> 
> But, for now, we have a timing problem like Jason described:
> 
> - We have customers eager for upstream vfio support in the near term,
>   and that seems like something NVIDIA can develop/contribute/maintain in
>   the near term, as an incremental step forward.
> 
> - Nova is still early in its development, relative to nouveau/nvkm.
> 
> - From NVIDIA's perspective, we're nervous about the backportability of
>   rust-based components to enterprise kernels in the near term.
> 
> - The stable GSP ABI is not going to be ready in the near term.
> 
> 
> I agree with what Dave said in one of the forks of this thread, in the context of
> NV2080_CTRL_VGPU_MGR_INTERNAL_BOOTLOAD_GSP_VGPU_PLUGIN_TASK_PARAMS:
> 
> > The GSP firmware interfaces are not guaranteed stable. Exposing these
> > interfaces outside the nvkm core is unacceptable, as otherwise we
> > would have to adapt the whole kernel depending on the loaded firmware.
> >
> > You cannot use any nvidia sdk headers, these all have to be abstracted
> > behind things that have no bearing on the API.
> 
> Agreed.  Though not infinitely scalable, and not
> as clean as in rust, it seems possible to abstract
> NV2080_CTRL_VGPU_MGR_INTERNAL_BOOTLOAD_GSP_VGPU_PLUGIN_TASK_PARAMS behind
> a C-implemented abstraction layer in nvkm, at least for the short term.
> 
> Is there a potential compromise where vgpu_mgr starts its life with a
> dependency on nvkm, and as things mature we migrate it to instead depend
> on nova?
> 

Of course, I've always said that it's perfectly fine to go with Nouveau as long
as Nova is not ready yet.

But, and that's very central, the condition must be that we agree on the long
term goal and agree on working towards this goal *together*.

Having two competing upstream strategies is not acceptable.

The baseline for the long term goal that we have set so far is Nova. And this
must also be the baseline for a discussion.

Raising concerns about that is perfectly valid, we can discuss them and look for
solutions.
Danilo Krummrich Sept. 26, 2024, 10:42 p.m. UTC | #28
On Thu, Sep 26, 2024 at 11:40:57AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 26, 2024 at 02:54:38PM +0200, Greg KH wrote:
> > 
> > No, I do object to "we are ignoring the driver being proposed by the
> > developers involved for this hardware by adding to the old one instead"
> > which it seems like is happening here.
> 
> That is too harsh. We've consistently taken a community position that
> OOT stuff doesn't matter, and yes that includes OOT stuff that people
> we trust and respect are working on. Until it is ready for submission,
> and ideally merged, it is an unknown quantity. Good well meaning
> people routinely drop their projects, good projects run into
> unexpected roadblocks, and life happens.

That's not the point -- at least it never was my point.

Upstream has set a strategy, and it's totally fine to raise concerns, discuss
them, look for solutions, draw conclusions and do adjustments where needed.

But, we have to agree on a long term strategy and work towards the corresponding
goals *together*.

I don't want to end up in a situation where everyone just does their own thing.

So, when you say things like "go do Nova, have fun", it really just sounds
as if you just want to do your own thing and ignore the existing upstream
strategy instead of collaborating on and shaping it.
Jason Gunthorpe Sept. 26, 2024, 10:57 p.m. UTC | #29
On Thu, Sep 26, 2024 at 09:55:28AM -0300, Jason Gunthorpe wrote:

> I'm not entirely sure yet what this whole 'mgr' component is actually
> doing though.

Looking more closely I think some of it is certainly appropriate to be
in vfio. Like when something opens the VFIO device it should allocate
the PF device resources from FW, set up kernel structures and so on to
allow the about-to-be-opened VF to work. Those are good VFIO topics. IOW
if you don't open any VFIO devices there would be minimal overhead.

But that stuff shouldn't be shunted into some weird "mgr", it should
just be inside the struct vfio_device subclass inside the variant
driver.
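
Very roughly, and with placeholder names throughout (this is a sketch of
the shape, not the code from this series), that looks like a standard
vfio-pci variant driver where open/close own the per-VF resources:

/*
 * Sketch with placeholder names; not the code from this series.  The
 * per-VF state lives in the vfio_device subclass and the PF/firmware
 * resources are set up when the device is opened.
 */
#include <linux/vfio_pci_core.h>

struct nvidia_vgpu_device {
	struct vfio_pci_core_device core;	/* base vfio-pci variant device */
	void *fw_resources;			/* PF/FW state for this vGPU */
};

static int nvidia_vgpu_open_device(struct vfio_device *vdev)
{
	struct nvidia_vgpu_device *nvdev =
		container_of(vdev, struct nvidia_vgpu_device, core.vdev);
	int ret;

	ret = vfio_pci_core_enable(&nvdev->core);
	if (ret)
		return ret;

	/* allocate PF/firmware resources for this VF here (placeholder) */
	nvdev->fw_resources = NULL;

	vfio_pci_core_finish_enable(&nvdev->core);
	return 0;
}

static void nvidia_vgpu_close_device(struct vfio_device *vdev)
{
	struct nvidia_vgpu_device *nvdev =
		container_of(vdev, struct nvidia_vgpu_device, core.vdev);

	/* release the PF/firmware resources allocated at open (placeholder) */
	nvdev->fw_resources = NULL;

	vfio_pci_core_close_device(vdev);
}

This is the same general pattern the existing vfio-pci variant drivers
use.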

How to get the provisioning into the kernel prior to VFIO open, and
what kind of control object should exist for the hypervisor side of
the VF, I'm not sure. In mlx5 we used devlink and a netdev/rdma
"representor" for a lot of this complex control stuff.

Jason
Tian, Kevin Sept. 27, 2024, 12:13 a.m. UTC | #30
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, September 27, 2024 6:57 AM
> 
> On Thu, Sep 26, 2024 at 09:55:28AM -0300, Jason Gunthorpe wrote:
> 
> > I'm not entirely sure yet what this whole 'mgr' component is actually
> > doing though.
> 
> Looking more closely I think some of it is certainly appropriate to be
> in vfio. Like when something opens the VFIO device it should allocate
> the PF device resources from FW, setup kernel structures and so on to
> allow the about to be opened VF to work. That is good VFIO topics. IOW
> if you don't open any VFIO devices there would be a minimal overhead
> 
> But that stuff shouldn't be shunted into some weird "mgr", it should
> just be inside the struct vfio_device subclass inside the variant
> driver.

Yes. That's why I said earlier that the current way looks fine as long as
it doesn't expand to carry a vendor-specific provisioning interface. The
majority of the series is about allocating backend resources when the
device is opened; that's perfectly a VFIO topic.

Just the point of hardcoding a vGPU type now while stating that the mgr
will support selecting a vGPU type later implies something not
clearly designed.

> 
> How to get the provisioning into the kernel prior to VFIO open, and
> what kind of control object should exist for the hypervisor side of
> the VF, I'm not sure. In mlx5 we used devlink and a netdev/rdma
> "respresentor" for alot of this complex control stuff.
> 

The mlx5 approach is what I envisioned. Or the fwctl option is
also fine after it's merged.
Jason Gunthorpe Sept. 27, 2024, 12:51 p.m. UTC | #31
On Fri, Sep 27, 2024 at 12:42:56AM +0200, Danilo Krummrich wrote:
> On Thu, Sep 26, 2024 at 11:40:57AM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 26, 2024 at 02:54:38PM +0200, Greg KH wrote:
> > > 
> > > No, I do object to "we are ignoring the driver being proposed by the
> > > developers involved for this hardware by adding to the old one instead"
> > > which it seems like is happening here.
> > 
> > That is too harsh. We've consistently taken a community position that
> > OOT stuff doesn't matter, and yes that includes OOT stuff that people
> > we trust and respect are working on. Until it is ready for submission,
> > and ideally merged, it is an unknown quantity. Good well meaning
> > people routinely drop their projects, good projects run into
> > unexpected roadblocks, and life happens.
> 
> That's not the point -- at least it never was my point.
> 
> Upstream has set a strategy, and it's totally fine to raise concerns, discuss
> them, look for solutions, draw conclusions and do adjustments where needed.

We don't really do strategy in the kernel. This language is a bit
off-putting. Linux runs on community consensus, and if any strategy
exists it is reflected by the code actually merged.

When you say things like this it comes across as though you are
implying there are two tiers to the community. Ie those that set the
strategy and those that don't.

> But, we have to agree on a long term strategy and work towards the corresponding
> goals *together*.

I think we went over all the options already. IMHO the right one is
for nova and vfio to share some kind of core driver. The choice of
Rust for nova complicates planning this, but it doesn't mean anyone is
saying no to it.

My main point is when this switches from VFIO on nouveau to VFIO on
Nova is something that needs to be a mutual decision with the VFIO
side and user community as well.

> So, when you say things like "go do Nova, have fun", it really just sounds like
> as if you just want to do your own thing and ignore the existing upstream
> strategy instead of collaborate and shape it.

I am saying I have no interest in interfering with your
project. Really, I read your responses as though you feel Nova is
under attack and I'm trying hard to say that is not at all my
intention.

Jason
Danilo Krummrich Sept. 27, 2024, 2:22 p.m. UTC | #32
On Fri, Sep 27, 2024 at 09:51:15AM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 27, 2024 at 12:42:56AM +0200, Danilo Krummrich wrote:
> > On Thu, Sep 26, 2024 at 11:40:57AM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 26, 2024 at 02:54:38PM +0200, Greg KH wrote:
> > > > 
> > > > No, I do object to "we are ignoring the driver being proposed by the
> > > > developers involved for this hardware by adding to the old one instead"
> > > > which it seems like is happening here.
> > > 
> > > That is too harsh. We've consistently taken a community position that
> > > OOT stuff doesn't matter, and yes that includes OOT stuff that people
> > > we trust and respect are working on. Until it is ready for submission,
> > > and ideally merged, it is an unknown quantity. Good well meaning
> > > people routinely drop their projects, good projects run into
> > > unexpected roadblocks, and life happens.
> > 
> > That's not the point -- at least it never was my point.
> > 
> > Upstream has set a strategy, and it's totally fine to raise concerns, discuss
> > them, look for solutions, draw conclusions and do adjustments where needed.
> 
> We don't really do strategy in the kernel. This language is a bit
> off putting. Linux runs on community consensus and if any strategy
> exists it is reflected by the code actually merged.

We can also just call it "goals", but either way, of course maintainers set
goals for the components they maintain and hence have some sort of "strategy"
for how they want to evolve their components, to solve existing or foreseeable
problems.

However, I agree that those things may be reevaluated based on community
feedback and consensus. And I'm happy to do that.

See, you're twisting my words and implying that we wouldn't look for community
consensus, while I'm *explicitly* asking you to let us do exactly that. I want
to find consensus on the long term goals that we all work on *together*, because
I don't want to end up with competing projects.

And I think it's reasonable to first consider the goals that have been set
already. Again, feel free to raise concerns and we'll discuss them and look for
solutions, but please don't just ignore the existing goals.

> 
> When you say things like this it comes across as though you are
> implying there are two tiers to the community. Ie those that set the
> strategy and those that don't.

This isn't true, I just ask you to consider the goals that have been set
already, because we have been working on this already.

*We can discuss them*, but I indeed ask you to accept the current direction as a
baseline for discussion. I don't think this is unreasonable, is it?

> 
> > But, we have to agree on a long term strategy and work towards the corresponding
> > goals *together*.
> 
> I think we went over all the options already. IMHO the right one is
> for nova and vfio to share some kind of core driver. The choice of
> Rust for nova complicates planning this, but it doesn't mean anyone is
> saying no to it.

This is the problem, you're many steps ahead.

You should start with understanding why we want the core driver to be in Rust.
You then can raise your concerns about it and then we can discuss them and see
if we can find solutions / consensus.

But you're not even considering it, and instead start with a counter proposal.
This isn't acceptable to me.

> 
> My main point is when this switches from VFIO on nouveau to VFIO on
> Nova is something that needs to be a mutual decision with the VFIO
> side and user community as well.

To me it's important that we agree on the goals and work towards them together.
If we seriously do that, then the "when" should be trivial to agree on.

> 
> > So, when you say things like "go do Nova, have fun", it really just sounds like
> > as if you just want to do your own thing and ignore the existing upstream
> > strategy instead of collaborate and shape it.
> 
> I am saying I have no interest in interfering with your
> project. Really, I read your responses as though you feel Nova is
> under attack and I'm trying hard to say that is not at all my
> intention.

I don't read this as Nova "being under attack" at all. I read it as "I don't
care about the goal to have the core driver in Rust, nor do I care about the
reasons you have for this.".

> 
> Jason
>
Jason Gunthorpe Sept. 27, 2024, 3:27 p.m. UTC | #33
On Fri, Sep 27, 2024 at 04:22:32PM +0200, Danilo Krummrich wrote:
> > When you say things like this it comes across as though you are
> > implying there are two tiers to the community. Ie those that set the
> > strategy and those that don't.
> 
> This isn't true, I just ask you to consider the goals that have been set
> already, because we have been working on this already.

Why do you keep saying I haven't?

I have no intention of becoming involved in your project or
nouveau. My only interest here is to get an agreement that we can get
a VFIO driver (to improve the VFIO subsystem and community!) in the
near term on top of in-tree nouveau.

> > > But, we have to agree on a long term strategy and work towards the corresponding
> > > goals *together*.
> > 
> > I think we went over all the options already. IMHO the right one is
> > for nova and vfio to share some kind of core driver. The choice of
> > Rust for nova complicates planning this, but it doesn't mean anyone is
> > saying no to it.
> 
> This is the problem, you're many steps ahead.
> 
> You should start with understanding why we want the core driver to be in Rust.
> You then can raise your concerns about it and then we can discuss them and see
> if we can find solutions / consensus.

I don't want to debate with you about Nova. It is too far in the
future, and it doesn't intersect with anything I am doing.

> But you're not even considering it, and instead start with a counter proposal.
> This isn't acceptable to me.

I'm even agreeing to a transition into a core driver in Rust, someday,
when the full community can agree it is the right time.

What more do you want from me?

Jason
Danilo Krummrich Sept. 30, 2024, 3:59 p.m. UTC | #34
On Fri, Sep 27, 2024 at 12:27:24PM -0300, Jason Gunthorpe wrote:
> On Fri, Sep 27, 2024 at 04:22:32PM +0200, Danilo Krummrich wrote:
> > > When you say things like this it comes across as though you are
> > > implying there are two tiers to the community. Ie those that set the
> > > strategy and those that don't.
> > 
> > This isn't true, I just ask you to consider the goals that have been set
> > already, because we have been working on this already.
> 
> Why do keep saying I haven't?

Because I haven't seen you acknowledge that the current direction we're moving
in is that we're trying to move away from Nouveau and start over with a new
GSP-only solution.

Instead you propose a huge architectural rework of Nouveau, extract a core
driver from Nouveau and make this the long term solution.

> 
> I have no intention of becoming involved in your project or
> nouveau. My only interest here is to get an agreement that we can get
> a VFIO driver (to improve the VFIO subsystem and community!) in the
> near term on top of in-tree nouveau.

Two aspects about this.

First, Nova isn't a different project in this sense, it's the continuation of
Nouveau to overcome several problems we have with Nouveau.

Second, of course you have the intention of becoming involved in the Nouveau /
Nova project. You ask for huge architectural changes to Nouveau, including new
interfaces for a VFIO driver on top. If that's not becoming involved, what else
would it be?

> 
> > > > But, we have to agree on a long term strategy and work towards the corresponding
> > > > goals *together*.
> > > 
> > > I think we went over all the options already. IMHO the right one is
> > > for nova and vfio to share some kind of core driver. The choice of
> > > Rust for nova complicates planning this, but it doesn't mean anyone is
> > > saying no to it.
> > 
> > This is the problem, you're many steps ahead.
> > 
> > You should start with understanding why we want the core driver to be in Rust.
> > You then can raise your concerns about it and then we can discuss them and see
> > if we can find solutions / consensus.
> 
> I don't want to debate with you about Nova. It is too far in the
> future, and it doesn't intersect with anything I am doing.

Sure it does. Again, Nova is intended to be the continuation of Nouveau. So, if
you want to do a major rework in Nouveau (and hence become involved in the
project) we have to make sure that we progress things in the same direction.

How do you expect the project to be successful in the long term, if the involved
parties are not willing to agree on a direction and common goals for the
project?

Or is it that you are simply not interested in the long term? Do you have
reasons to think that the problems we have with Nouveau will just go away in
the long term? Do
you plan to solve them within Nouveau? If so, how do you plan to do that?

> 
> > But you're not even considering it, and instead start with a counter proposal.
> > This isn't acceptable to me.
> 
> I'm even agreeing to a transition into a core driver in Rust, someday,
> when the full community can agree it is the right time.
> 
> What more do you want from me?

I want the people involved in the project to seriously discuss and align on
the direction and long term goals for the project, and to work towards them
together.