
[drm-next,v5,00/14,RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

Message ID 20230620004217.4700-1-dakr@redhat.com (mailing list archive)

Danilo Krummrich June 20, 2023, 12:42 a.m. UTC
This patch series provides a new UAPI for the Nouveau driver in order to
support Vulkan features, such as sparse bindings and sparse residency.

Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
keep track of GPU virtual address (VA) mappings in a more generic way.

The DRM GPUVA manager is intended to help drivers implement userspace-manageable
GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
serves the following purposes in this context.

    1) Provide infrastructure to track GPU VA allocations and mappings,
       making use of the maple_tree.

    2) Generically connect GPU VA mappings to their backing buffers, in
       particular DRM GEM objects.

    3) Provide a common implementation to perform more complex mapping
       operations on the GPU VA space. In particular splitting and merging
       of GPU VA mappings, e.g. for intersecting mapping requests or partial
       unmap requests.

The new Nouveau VM_BIND UAPI builds on top of the DRM GPUVA manager and
provides the following new interfaces.

    1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
       for UMDs to specify the portion of VA space managed by the kernel and
       userspace, respectively.

    2) Allocate and free a VA space region as well as bind and unbind memory
       to the GPU's VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.

    3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.

Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC make use of the DRM
scheduler to queue jobs and support asynchronous processing, with DRM syncobjs
as the synchronization mechanism.

By default, DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing;
DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.

The new VM_BIND UAPI for Nouveau also makes use of drm_exec (execution context
for GEM buffers) by Christian König. Since the patch implementing drm_exec has
not yet been merged into drm-next, it is part of this series, as well as a
small fix for this patch, which was found while testing this series.

This patch series is also available at [1].

There is a Mesa NVK merge request by Dave Airlie [2] implementing the
corresponding userspace parts for this series.

The Vulkan CTS test suite passes the sparse binding and sparse residency test
cases for the new UAPI together with Dave's Mesa work.

There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
and hence the DRM GPU VA manager. However, most of them test the DRM GPU VA
manager's logic through Nouveau's new UAPI and should be considered just as
helpers for the implementation.

That said, I absolutely intend to convert those test cases to proper kunit test
cases for the DRM GPUVA manager, if and once we agree on its usefulness and
design.

[1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
    https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
[2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
[3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind

Changes in V2:
==============
  Nouveau:
    - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
      signalling critical sections. Updates to the VA space are split into three
      separate stages, where only the second stage executes in a fence
      signalling critical section:

        1. update the VA space, allocate new structures and page tables
        2. (un-)map the requested memory bindings
        3. free structures and page tables

    - Separated generic job scheduler code from specific job implementations.
    - Separated the EXEC and VM_BIND implementation of the UAPI.
    - Reworked the locking parts of the nvkm/vmm RAW interface, such that
      (un-)map operations can be executed in fence signalling critical sections.

  GPUVA Manager:
    - made drm_gpuva_regions optional for users of the GPUVA manager
    - allow NULL GEMs for drm_gpuva entries
    - switched from drm_mm to maple_tree to track drm_gpuva / drm_gpuva_region
      entries
    - provide callbacks for users to allocate custom drm_gpuva_op structures to
      allow inheritance
    - added user bits to drm_gpuva_flags
    - added a prefetch operation type in order to support generating prefetch
      operations in the same way other operations are generated
    - hand the responsibility for mutual exclusion for a GEM's
      drm_gpuva list to the user; simplified corresponding (un-)link functions

  Maple Tree:
    - I added two maple tree patches to the series, one to support custom tree
      walk macros and one to hand the locking responsibility to the user of the
      GPUVA manager without pre-defined lockdep checks.

Changes in V3:
==============
  Nouveau:
    - Reworked the Nouveau VM_BIND UAPI to do the job cleanup (including page
      table cleanup) within a workqueue rather than the job_free() callback of
      the scheduler itself. A job_free() callback can stall the execution (run()
      callback) of the next job in the queue. Since the page table cleanup
      requires taking the same locks as page table allocation, doing it directly
      in the job_free() callback would still violate the fence signalling
      critical path.
    - Separated Nouveau fence allocation and emit, such that we do not violate
      the fence signalling critical path in EXEC jobs.
    - Implement "regions" (for handling sparse mappings through PDEs and dual
      page tables) within Nouveau.
    - Drop the requirement for every mapping to be contained within a region.
    - Add necessary synchronization of VM_BIND job operation sequences in order
      to work around limitations in page table handling. This will be addressed
      in a future re-work of Nouveau's page table handling.
    - Fixed a couple of race conditions found through more testing. Thanks to
      Dave for consistently trying to break it. :-)

  GPUVA Manager:
    - Implement pre-allocation capabilities for tree modifications within fence
      signalling critical sections.
    - Implement accessors to apply tree modifications while walking the GPUVA
      tree in order to actually support processing of drm_gpuva_ops through
      callbacks in fence signalling critical sections rather than through
      pre-allocated operation lists.
    - Remove merging of GPUVAs; the kernel has little to no knowledge about
      the semantics of mapping sequences. Hence, merging is purely speculative.
      It seems that gaining a significant (or at least a measurable) performance
      increase through merging is way more likely to happen when userspace is
      responsible for merging mappings up to the next larger page size, if
      possible.
    - Since merging was removed, regions pretty much lose their right to exist.
      They might still be useful for handling dual page tables or similar
      mechanisms, but since Nouveau seems to be the only driver having a need
      for this for now, regions were removed from the GPUVA manager.
    - Fixed a couple of maple_tree related issues; thanks to Liam for helping me
      out.

Changes in V4:
==============
  Nouveau:
    - Refactored how specific VM_BIND and EXEC jobs are created and how their
      arguments are passed to the generic job implementation.
    - Fixed a UAF race condition where bind job ops could have been freed
      already while still waiting for a job cleanup to finish. This is because
      in certain cases we need to wait for mappings to actually be unmapped
      before creating sparse regions in the same area.
    - Re-based the code onto drm_exec v4 patch.

  GPUVA Manager:
    - Fixed a maple tree related bug when pre-allocating MA states.
      (Boris Brezillon)
    - Made struct drm_gpuva_fn_ops a const object in all occurrences.
      (Boris Brezillon)

Changes in V5:
==============
  Nouveau:
    - Link and unlink GPUVAs outside the fence signalling critical path in
      nouveau_uvmm_bind_job_submit() holding the dma-resv lock. Mutual exclusion
      of BO evicts causing mapping invalidation and regular mapping operations
      is ensured with dma-fences.

  GPUVA Manager:
    - Removed the separate GEM GPUVA list lock. Link and unlink, as well as
      iterating the GEM's GPUVA list, should be protected with the GEM's
      dma-resv lock instead.
    - Renamed the DRM_GPUVA_EVICTED flag to DRM_GPUVA_INVALIDATED. Mappings do
      not get evicted; they might get invalidated due to eviction.
    - The maple tree uses the 'unsigned long' type for node entries. While this
      works for GPU VA spaces larger than 32 bit on 64-bit kernels, the GPU VA
      space is limited to 32 bit on 32-bit kernels.
      As long as we do not have a 64-bit capable maple tree for 32-bit kernels,
      the GPU VA manager contains checks to throw warnings when GPU VA entries
      exceed the maple tree's storage capabilities.
    - Extended the Documentation and added example code as requested by Donald
      Robson.

Christian König (1):
  drm: execution context for GEM buffers v4

Danilo Krummrich (13):
  maple_tree: split up MA_STATE() macro
  drm: manager to keep track of GPUs VA mappings
  drm: debugfs: provide infrastructure to dump a DRM GPU VA space
  drm/nouveau: new VM_BIND uapi interfaces
  drm/nouveau: get vmm via nouveau_cli_vmm()
  drm/nouveau: bo: initialize GEM GPU VA interface
  drm/nouveau: move usercopy helpers to nouveau_drv.h
  drm/nouveau: fence: separate fence alloc and emit
  drm/nouveau: fence: fail to emit when fence context is killed
  drm/nouveau: chan: provide nouveau_channel_kill()
  drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
  drm/nouveau: implement new VM_BIND uAPI
  drm/nouveau: debugfs: implement DRM GPU VA debugfs

 Documentation/gpu/driver-uapi.rst             |   11 +
 Documentation/gpu/drm-mm.rst                  |   54 +
 drivers/gpu/drm/Kconfig                       |    6 +
 drivers/gpu/drm/Makefile                      |    3 +
 drivers/gpu/drm/drm_debugfs.c                 |   41 +
 drivers/gpu/drm/drm_exec.c                    |  278 +++
 drivers/gpu/drm/drm_gem.c                     |    3 +
 drivers/gpu/drm/drm_gpuva_mgr.c               | 1971 ++++++++++++++++
 drivers/gpu/drm/nouveau/Kbuild                |    3 +
 drivers/gpu/drm/nouveau/Kconfig               |    2 +
 drivers/gpu/drm/nouveau/dispnv04/crtc.c       |    9 +-
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   26 +-
 drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   19 +-
 .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   20 +-
 drivers/gpu/drm/nouveau/nouveau_abi16.c       |   24 +
 drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
 drivers/gpu/drm/nouveau/nouveau_bo.c          |  204 +-
 drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
 drivers/gpu/drm/nouveau/nouveau_chan.c        |   22 +-
 drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
 drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   39 +
 drivers/gpu/drm/nouveau/nouveau_dmem.c        |    9 +-
 drivers/gpu/drm/nouveau/nouveau_drm.c         |   27 +-
 drivers/gpu/drm/nouveau/nouveau_drv.h         |   94 +-
 drivers/gpu/drm/nouveau/nouveau_exec.c        |  418 ++++
 drivers/gpu/drm/nouveau/nouveau_exec.h        |   54 +
 drivers/gpu/drm/nouveau/nouveau_fence.c       |   23 +-
 drivers/gpu/drm/nouveau/nouveau_fence.h       |    5 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c         |   62 +-
 drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
 drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
 drivers/gpu/drm/nouveau/nouveau_sched.c       |  461 ++++
 drivers/gpu/drm/nouveau/nouveau_sched.h       |  123 +
 drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
 drivers/gpu/drm/nouveau/nouveau_uvmm.c        | 1979 +++++++++++++++++
 drivers/gpu/drm/nouveau/nouveau_uvmm.h        |  107 +
 drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
 drivers/gpu/drm/nouveau/nvif/vmm.c            |  100 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  213 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  197 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   25 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgf100.c    |   16 +-
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   16 +-
 .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c |   27 +-
 include/drm/drm_debugfs.h                     |   25 +
 include/drm/drm_drv.h                         |    6 +
 include/drm/drm_exec.h                        |  119 +
 include/drm/drm_gem.h                         |   52 +
 include/drm/drm_gpuva_mgr.h                   |  682 ++++++
 include/linux/maple_tree.h                    |    7 +-
 include/uapi/drm/nouveau_drm.h                |  209 ++
 51 files changed, 7566 insertions(+), 242 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_exec.c
 create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
 create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
 create mode 100644 include/drm/drm_exec.h
 create mode 100644 include/drm/drm_gpuva_mgr.h


base-commit: 2222dcb0775d36de28992f56455ab3967b30d380

Comments

Danilo Krummrich June 19, 2023, 11:06 p.m. UTC | #1
Hi Donald,

forgot to add your email address to the patch series - sorry about that.

This series (v5) contains the Documentation changes you requested.

- Danilo

Dave Airlie June 20, 2023, 4:05 a.m. UTC | #2
Since this feature is currently nouveau-only and doesn't disturb the current
nouveau code paths, I'd like to try to get this work in-tree so other drivers
can work from it.

If there are any major objections to this, I'm happy to pull it back
out again, but I'd like to get some acks/rb in the next couple of days
in order to land some of it.

Dave.


>
> forgot to add your email address to the patch series - sorry about that.
>
> This series (v5) contains the Documentation changes you requested.
>
> - Danilo
>
> On 6/20/23 02:42, Danilo Krummrich wrote:
> > This patch series provides a new UAPI for the Nouveau driver in order to
> > support Vulkan features, such as sparse bindings and sparse residency.
> >
> > Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
> > keep track of GPU virtual address (VA) mappings in a more generic way.
> >
> > The DRM GPUVA manager is indented to help drivers implement userspace-manageable
> > GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
> > serves the following purposes in this context.
> >
> >      1) Provide infrastructure to track GPU VA allocations and mappings,
> >         making use of the maple_tree.
> >
> >      2) Generically connect GPU VA mappings to their backing buffers, in
> >         particular DRM GEM objects.
> >
> >      3) Provide a common implementation to perform more complex mapping
> >         operations on the GPU VA space. In particular splitting and merging
> >         of GPU VA mappings, e.g. for intersecting mapping requests or partial
> >         unmap requests.
> >
> > The new VM_BIND Nouveau UAPI build on top of the DRM GPUVA manager, itself
> > providing the following new interfaces.
> >
> >      1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
> >         for UMDs to specify the portion of VA space managed by the kernel and
> >         userspace, respectively.
> >
> >      2) Allocate and free a VA space region as well as bind and unbind memory
> >         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
> >
> >      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
> >
> > Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, make use of the DRM
> > scheduler to queue jobs and support asynchronous processing with DRM syncobjs
> > as synchronization mechanism.
> >
> > By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
> > DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
> >
> > The new VM_BIND UAPI for Nouveau makes also use of drm_exec (execution context
> > for GEM buffers) by Christian König. Since the patch implementing drm_exec was
> > not yet merged into drm-next it is part of this series, as well as a small fix
> > for this patch, which was found while testing this series.
> >
> > This patch series is also available at [1].
> >
> > There is a Mesa NVK merge request by Dave Airlie [2] implementing the
> > corresponding userspace parts for this series.
> >
> > The Vulkan CTS test suite passes the sparse binding and sparse residency test
> > cases for the new UAPI together with Dave's Mesa work.
> >
> > There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
> > and hence the DRM GPU VA manager. However, most of them are testing the DRM GPU
> > VA manager's logic through Nouveau's new UAPI and should be considered just as
> > helper for implementation.
> >
> > However, I absolutely intend to change those test cases to proper kunit test
> > cases for the DRM GPUVA manager, once and if we agree on it's usefulness and
> > design.
> >
> > [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
> >      https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
> > [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
> > [3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
> >
> > Changes in V2:
> > ==============
> >    Nouveau:
> >      - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
> >        signalling critical sections. Updates to the VA space are split up into
> >        three separate stages, where only the second stage executes in a fence
> >        signalling critical section:
> >
> >          1. update the VA space, allocate new structures and page tables
> >          2. (un-)map the requested memory bindings
> >          3. free structures and page tables
> >
> >      - Separated generic job scheduler code from specific job implementations.
> >      - Separated the EXEC and VM_BIND implementation of the UAPI.
> >      - Reworked the locking parts of the nvkm/vmm RAW interface, such that
> >        (un-)map operations can be executed in fence signalling critical sections.
> >
> >    GPUVA Manager:
> >      - made drm_gpuva_regions optional for users of the GPUVA manager
> >      - allow NULL GEMs for drm_gpuva entries
> >      - switched from drm_mm to maple_tree to track drm_gpuva / drm_gpuva_region
> >        entries
> >      - provide callbacks for users to allocate custom drm_gpuva_op structures to
> >        allow inheritance
> >      - added user bits to drm_gpuva_flags
> >      - added a prefetch operation type in order to support generating prefetch
> >        operations in the same way other operations are generated
> >      - hand the responsibility for mutual exclusion for a GEM's
> >        drm_gpuva list to the user; simplified corresponding (un-)link functions
> >
> >    Maple Tree:
> >      - I added two maple tree patches to the series, one to support custom tree
> >        walk macros and one to hand the locking responsibility to the user of the
> >        GPUVA manager without pre-defined lockdep checks.
> >
> > Changes in V3:
> > ==============
> >    Nouveau:
> >      - Reworked the Nouveau VM_BIND UAPI to do the job cleanup (including page
> >        table cleanup) within a workqueue rather than the job_free() callback of
> >        the scheduler itself. A job_free() callback can stall the execution (run()
> >        callback) of the next job in the queue. Since the page table cleanup
> >        requires taking the same locks as page table allocation, doing it
> >        directly in the job_free() callback would still violate the fence
> >        signalling critical path.
> >      - Separated Nouveau fence allocation and emit, such that we do not violate
> >        the fence signalling critical path in EXEC jobs.
> >      - Implement "regions" (for handling sparse mappings through PDEs and dual
> >        page tables) within Nouveau.
> >      - Drop the requirement for every mapping to be contained within a region.
> >      - Add necessary synchronization of VM_BIND job operation sequences in order
> >        to work around limitations in page table handling. This will be addressed
> >        in a future re-work of Nouveau's page table handling.
> >      - Fixed a couple of race conditions found through more testing. Thanks to
> >        Dave for consistently trying to break it. :-)
> >
> >    GPUVA Manager:
> >      - Implement pre-allocation capabilities for tree modifications within fence
> >        signalling critical sections.
> >      - Implement accessors to apply tree modifications while walking the GPUVA
> >        tree in order to actually support processing of drm_gpuva_ops through
> >        callbacks in fence signalling critical sections rather than through
> >        pre-allocated operation lists.
> >      - Remove merging of GPUVAs; the kernel has little to no knowledge about
> >        the semantics of mapping sequences. Hence, merging is purely speculative.
> >        It seems that gaining a significant (or at least a measurable) performance
> >        increase through merging is way more likely to happen when userspace is
> >        responsible for merging mappings up to the next larger page size if
> >        possible.
> >      - Since merging was removed, regions pretty much lose their right to exist.
> >        They might still be useful for handling dual page tables or similar
> >        mechanisms, but since Nouveau seems to be the only driver having a need
> >        for this for now, regions were removed from the GPUVA manager.
> >      - Fixed a couple of maple_tree related issues; thanks to Liam for helping me
> >        out.
> >
> > Changes in V4:
> > ==============
> >    Nouveau:
> >      - Refactored how specific VM_BIND and EXEC jobs are created and how their
> >        arguments are passed to the generic job implementation.
> >      - Fixed a UAF race condition where bind job ops could have been freed
> >        already while still waiting for a job cleanup to finish. This is because
> >        in certain cases we need to wait for mappings to actually be unmapped
> >        before creating sparse regions in the same area.
> >      - Re-based the code onto drm_exec v4 patch.
> >
> >    GPUVA Manager:
> >      - Fixed a maple tree related bug when pre-allocating MA states.
> >        (Boris Brezillon)
> >      - Made struct drm_gpuva_fn_ops a const object in all occurrences.
> >        (Boris Brezillon)
> >
> > Changes in V5:
> > ==============
> >    Nouveau:
> >      - Link and unlink GPUVAs outside the fence signalling critical path in
> >        nouveau_uvmm_bind_job_submit() holding the dma-resv lock. Mutual exclusion
> >        of BO evicts causing mapping invalidation and regular mapping operations
> >        is ensured with dma-fences.
> >
> >    GPUVA Manager:
> >      - Removed the separate GEM's GPUVA list lock. Link and unlink as well as
> >        iterating the GEM's GPUVA list should be protected with the GEM's dma-resv
> >        lock instead.
> >      - Renamed the DRM_GPUVA_EVICTED flag to DRM_GPUVA_INVALIDATED. Mappings do
> >        not get evicted, they might get invalidated due to eviction.
> >      - The maple tree uses the 'unsigned long' type for node entries. While this
> >        supports GPU VA spaces larger than 32-bit on 64-bit kernels, it limits
> >        the GPU VA space to 32-bit on 32-bit kernels.
> >        As long as we do not have a 64-bit capable maple tree for 32-bit kernels,
> >        the GPU VA manager contains checks to throw warnings when GPU VA entries
> >        exceed the maple tree's storage capabilities.
> >      - Extended the Documentation and added example code as requested by Donald
> >        Robson.
> >
> > Christian König (1):
> >    drm: execution context for GEM buffers v4
> >
> > Danilo Krummrich (13):
> >    maple_tree: split up MA_STATE() macro
> >    drm: manager to keep track of GPUs VA mappings
> >    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
> >    drm/nouveau: new VM_BIND uapi interfaces
> >    drm/nouveau: get vmm via nouveau_cli_vmm()
> >    drm/nouveau: bo: initialize GEM GPU VA interface
> >    drm/nouveau: move usercopy helpers to nouveau_drv.h
> >    drm/nouveau: fence: separate fence alloc and emit
> >    drm/nouveau: fence: fail to emit when fence context is killed
> >    drm/nouveau: chan: provide nouveau_channel_kill()
> >    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
> >    drm/nouveau: implement new VM_BIND uAPI
> >    drm/nouveau: debugfs: implement DRM GPU VA debugfs
> >
> >   Documentation/gpu/driver-uapi.rst             |   11 +
> >   Documentation/gpu/drm-mm.rst                  |   54 +
> >   drivers/gpu/drm/Kconfig                       |    6 +
> >   drivers/gpu/drm/Makefile                      |    3 +
> >   drivers/gpu/drm/drm_debugfs.c                 |   41 +
> >   drivers/gpu/drm/drm_exec.c                    |  278 +++
> >   drivers/gpu/drm/drm_gem.c                     |    3 +
> >   drivers/gpu/drm/drm_gpuva_mgr.c               | 1971 ++++++++++++++++
> >   drivers/gpu/drm/nouveau/Kbuild                |    3 +
> >   drivers/gpu/drm/nouveau/Kconfig               |    2 +
> >   drivers/gpu/drm/nouveau/dispnv04/crtc.c       |    9 +-
> >   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   26 +-
> >   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   19 +-
> >   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   20 +-
> >   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   24 +
> >   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
> >   drivers/gpu/drm/nouveau/nouveau_bo.c          |  204 +-
> >   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
> >   drivers/gpu/drm/nouveau/nouveau_chan.c        |   22 +-
> >   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
> >   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   39 +
> >   drivers/gpu/drm/nouveau/nouveau_dmem.c        |    9 +-
> >   drivers/gpu/drm/nouveau/nouveau_drm.c         |   27 +-
> >   drivers/gpu/drm/nouveau/nouveau_drv.h         |   94 +-
> >   drivers/gpu/drm/nouveau/nouveau_exec.c        |  418 ++++
> >   drivers/gpu/drm/nouveau/nouveau_exec.h        |   54 +
> >   drivers/gpu/drm/nouveau/nouveau_fence.c       |   23 +-
> >   drivers/gpu/drm/nouveau/nouveau_fence.h       |    5 +-
> >   drivers/gpu/drm/nouveau/nouveau_gem.c         |   62 +-
> >   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
> >   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
> >   drivers/gpu/drm/nouveau/nouveau_sched.c       |  461 ++++
> >   drivers/gpu/drm/nouveau/nouveau_sched.h       |  123 +
> >   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
> >   drivers/gpu/drm/nouveau/nouveau_uvmm.c        | 1979 +++++++++++++++++
> >   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |  107 +
> >   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
> >   drivers/gpu/drm/nouveau/nvif/vmm.c            |  100 +-
> >   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  213 +-
> >   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  197 +-
> >   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   25 +
> >   .../drm/nouveau/nvkm/subdev/mmu/vmmgf100.c    |   16 +-
> >   .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   16 +-
> >   .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c |   27 +-
> >   include/drm/drm_debugfs.h                     |   25 +
> >   include/drm/drm_drv.h                         |    6 +
> >   include/drm/drm_exec.h                        |  119 +
> >   include/drm/drm_gem.h                         |   52 +
> >   include/drm/drm_gpuva_mgr.h                   |  682 ++++++
> >   include/linux/maple_tree.h                    |    7 +-
> >   include/uapi/drm/nouveau_drm.h                |  209 ++
> >   51 files changed, 7566 insertions(+), 242 deletions(-)
> >   create mode 100644 drivers/gpu/drm/drm_exec.c
> >   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
> >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
> >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
> >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
> >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
> >   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
> >   create mode 100644 include/drm/drm_exec.h
> >   create mode 100644 include/drm/drm_gpuva_mgr.h
> >
> >
> > base-commit: 2222dcb0775d36de28992f56455ab3967b30d380
>
Oded Gabbay June 20, 2023, 7:06 a.m. UTC | #3
On Tue, Jun 20, 2023 at 7:05 AM Dave Airlie <airlied@gmail.com> wrote:
>
> Since this feature is nouveau-only currently and doesn't disturb
> the current nouveau code paths, I'd like to try and get this work in
> tree so other drivers can work from it.
>
> If there are any major objections to this, I'm happy to pull it back
> out again, but I'd like to get some acks/rb in the next couple of days
> in order to land some of it.
>
> Dave.
>
>
> >
> > forgot to add your email address to the patch series - sorry about that.
> >
> > This series (v5) contains the Documentation changes you requested.
> >
> > - Danilo
> >
> > On 6/20/23 02:42, Danilo Krummrich wrote:
> > > This patch series provides a new UAPI for the Nouveau driver in order to
> > > support Vulkan features, such as sparse bindings and sparse residency.
> > >
> > > Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
> > > keep track of GPU virtual address (VA) mappings in a more generic way.
> > >
> > > The DRM GPUVA manager is intended to help drivers implement userspace-manageable
> > > GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
> > > serves the following purposes in this context.
> > >
> > >      1) Provide infrastructure to track GPU VA allocations and mappings,
> > >         making use of the maple_tree.
> > >
> > >      2) Generically connect GPU VA mappings to their backing buffers, in
> > >         particular DRM GEM objects.
Will this manager be able to connect GPU VA mappings to host memory
allocations (aka user pointers)?

I only skimmed over the uapi definitions, but from that quick glance I
saw you can only pass a (gem) handle to the vm bind uapi.

I think it is an important feature because you don't want to have two
GPU VA managers running in your driver (if that's even possible).
Maybe we should at least try to make sure the uapi is/will be
compatible with such an extension.

Oded

Dave Airlie June 20, 2023, 7:13 a.m. UTC | #4
On Tue, 20 Jun 2023 at 17:06, Oded Gabbay <ogabbay@kernel.org> wrote:
>
> On Tue, Jun 20, 2023 at 7:05 AM Dave Airlie <airlied@gmail.com> wrote:
> >
> > Since this feature is nouveau-only currently and doesn't disturb
> > the current nouveau code paths, I'd like to try and get this work in
> > tree so other drivers can work from it.
> >
> > If there are any major objections to this, I'm happy to pull it back
> > out again, but I'd like to get some acks/rb in the next couple of days
> > in order to land some of it.
> >
> > Dave.
> >
> >
> Will this manager be able to connect GPU VA mappings to host memory
> allocations (aka user pointers) ?
>
> I only skimmed over the uapi definitions, but from that quick glance I
> saw you can only pass a (gem) handle to the vm bind uapi.
>
> I think it is an important feature because you don't want to have two
> GPU VA managers running in your driver (if that's even possible).
> Maybe we should at least try to make sure the uapi is/will be
> compatible with such an extension.
>

I think that would have to be a new uAPI entry point anyways, since
managing user ptrs is extra, but the uAPI is nouveau specific and
nouveau has no hostptr support as of now.

The gpuva manager is kernel internal, I think adding host ptr tracking
is useful, but I don't think it's a blocker right now.

One of the reasons I'd like to get this in the tree is to add things
like that instead of overloading this initial patchset with feature
creep.

Dave.
Oded Gabbay June 20, 2023, 7:34 a.m. UTC | #5
On Tue, Jun 20, 2023 at 10:13 AM Dave Airlie <airlied@gmail.com> wrote:
>
> On Tue, 20 Jun 2023 at 17:06, Oded Gabbay <ogabbay@kernel.org> wrote:
> >
> > On Tue, Jun 20, 2023 at 7:05 AM Dave Airlie <airlied@gmail.com> wrote:
> > >
> > > Since this feature is nouveau-only currently and doesn't disturb
> > > the current nouveau code paths, I'd like to try and get this work in
> > > tree so other drivers can work from it.
> > >
> > > If there are any major objections to this, I'm happy to pull it back
> > > out again, but I'd like to get some acks/rb in the next couple of days
> > > in order to land some of it.
> > >
> > > Dave.
> > >
> > >
> > Will this manager be able to connect GPU VA mappings to host memory
> > allocations (aka user pointers) ?
> >
> > I only skimmed over the uapi definitions, but from that quick glance I
> > saw you can only pass a (gem) handle to the vm bind uapi.
> >
> > I think it is an important feature because you don't want to have two
> > GPU VA managers running in your driver (if that's even possible).
> > Maybe we should at least try to make sure the uapi is/will be
> > compatible with such an extension.
> >
>
> I think that would have to be a new uAPI entry point anyways, since
> managing user ptrs is extra, but the uAPI is nouveau specific and
> nouveau has no hostptr support as of now.
>
> The gpuva manager is kernel internal, I think adding host ptr tracking
> is useful, but I don't think it's a blocker right now.
>
> One of the reasons I'd like to get this in the tree is to add things
> like that instead of overloading this initial patchset with feature
> creep.
>
> Dave.
ok, that makes sense.
Thanks,
Oded
Boris Brezillon June 20, 2023, 9:25 a.m. UTC | #6
Hi Danilo,

On Tue, 20 Jun 2023 02:42:03 +0200
Danilo Krummrich <dakr@redhat.com> wrote:

> This patch series provides a new UAPI for the Nouveau driver in order to
> support Vulkan features, such as sparse bindings and sparse residency.
> 
> Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
> keep track of GPU virtual address (VA) mappings in a more generic way.
> 
> The DRM GPUVA manager is intended to help drivers implement userspace-manageable
> GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
> serves the following purposes in this context.
> 
>     1) Provide infrastructure to track GPU VA allocations and mappings,
>        making use of the maple_tree.
> 
>     2) Generically connect GPU VA mappings to their backing buffers, in
>        particular DRM GEM objects.
> 
>     3) Provide a common implementation to perform more complex mapping
>        operations on the GPU VA space. In particular splitting and merging
>        of GPU VA mappings, e.g. for intersecting mapping requests or partial
>        unmap requests.
> 
> The new VM_BIND Nouveau UAPI builds on top of the DRM GPUVA manager, itself
> providing the following new interfaces.
> 
>     1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>        for UMDs to specify the portion of VA space managed by the kernel and
>        userspace, respectively.
> 
>     2) Allocate and free a VA space region as well as bind and unbind memory
>        to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
> 
>     3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
> 
> Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC make use of the DRM
> scheduler to queue jobs and support asynchronous processing with DRM syncobjs
> as synchronization mechanism.
> 
> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
> 
> The new VM_BIND UAPI for Nouveau also makes use of drm_exec (execution context
> for GEM buffers) by Christian König. Since the patch implementing drm_exec was
> not yet merged into drm-next it is part of this series, as well as a small fix
> for this patch, which was found while testing this series.
> 
> This patch series is also available at [1].
> 
> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
> corresponding userspace parts for this series.
> 
> The Vulkan CTS test suite passes the sparse binding and sparse residency test
> cases for the new UAPI together with Dave's Mesa work.
> 
> There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
> and hence the DRM GPU VA manager. However, most of them are testing the DRM GPU
> VA manager's logic through Nouveau's new UAPI and should be considered just as
> helpers for the implementation.
> 
> However, I absolutely intend to change those test cases to proper kunit test
> cases for the DRM GPUVA manager, if and once we agree on its usefulness and
> design.
> 
> [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
>     https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
> [3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
> 
> Changes in V2:
> ==============
>   Nouveau:
>     - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
>       signalling critical sections. Updates to the VA space are split up in three
>       separate stages, where only the second stage executes in a fence signalling
>       critical section:
> 
>         1. update the VA space, allocate new structures and page tables
>         2. (un-)map the requested memory bindings
>         3. free structures and page tables
> 
>     - Separated generic job scheduler code from specific job implementations.
>     - Separated the EXEC and VM_BIND implementation of the UAPI.
>     - Reworked the locking parts of the nvkm/vmm RAW interface, such that
>       (un-)map operations can be executed in fence signalling critical sections.
> 
>   GPUVA Manager:
>     - made drm_gpuva_regions optional for users of the GPUVA manager
>     - allow NULL GEMs for drm_gpuva entries
>     - switched from drm_mm to maple_tree to track drm_gpuva / drm_gpuva_region
>       entries
>     - provide callbacks for users to allocate custom drm_gpuva_op structures to
>       allow inheritance
>     - added user bits to drm_gpuva_flags
>     - added a prefetch operation type in order to support generating prefetch
>       operations in the same way other operations are generated
>     - hand the responsibility for mutual exclusion for a GEM's
>       drm_gpuva list to the user; simplified corresponding (un-)link functions
> 
>   Maple Tree:
>     - I added two maple tree patches to the series, one to support custom tree
>       walk macros and one to hand the locking responsibility to the user of the
>       GPUVA manager without pre-defined lockdep checks.
> 
> Changes in V3:
> ==============
>   Nouveau:
>     - Reworked the Nouveau VM_BIND UAPI to do the job cleanup (including page
>       table cleanup) within a workqueue rather than the job_free() callback of
>       the scheduler itself. A job_free() callback can stall the execution (run()
>       callback) of the next job in the queue. Since the page table cleanup
>       requires taking the same locks as are needed for page table
>       allocation, doing it directly in the job_free() callback would still
>       violate the fence signalling critical path.
>     - Separated Nouveau fence allocation and emit, such that we do not violate
>       the fence signalling critical path in EXEC jobs.
>     - Implement "regions" (for handling sparse mappings through PDEs and dual
>       page tables) within Nouveau.
>     - Drop the requirement for every mapping to be contained within a region.
>     - Add necessary synchronization of VM_BIND job operation sequences in order
>       to work around limitations in page table handling. This will be addressed
>       in a future re-work of Nouveau's page table handling.
>     - Fixed a couple of race conditions found through more testing. Thanks to
>       Dave for consistently trying to break it. :-)
> 
>   GPUVA Manager:
>     - Implement pre-allocation capabilities for tree modifications within fence
>       signalling critical sections.
>     - Implement accessors to apply tree modifications while walking the GPUVA
>       tree in order to actually support processing of drm_gpuva_ops through
>       callbacks in fence signalling critical sections rather than through
>       pre-allocated operation lists.
>     - Remove merging of GPUVAs; the kernel has little to no knowledge about
>       the semantics of mapping sequences. Hence, merging is purely speculative.
>       It seems that gaining a significant (or at least a measurable) performance
>       increase through merging is way more likely to happen when userspace is
>       responsible for merging mappings up to the next larger page size if
>       possible.
>     - Since merging was removed, regions pretty much lose their right to exist.
>       They might still be useful for handling dual page tables or similar
>       mechanisms, but since Nouveau seems to be the only driver having a need
>       for this for now, regions were removed from the GPUVA manager.
>     - Fixed a couple of maple_tree related issues; thanks to Liam for helping me
>       out.
> 
> Changes in V4:
> ==============
>   Nouveau:
>     - Refactored how specific VM_BIND and EXEC jobs are created and how their
>       arguments are passed to the generic job implementation.
>     - Fixed a UAF race condition where bind job ops could have been freed
>       already while still waiting for a job cleanup to finish. This is because
>       in certain cases we need to wait for mappings to actually be unmapped
>       before creating sparse regions in the same area.
>     - Re-based the code onto drm_exec v4 patch.
> 
>   GPUVA Manager:
>     - Fixed a maple tree related bug when pre-allocating MA states.
>       (Boris Brezillon)
>     - Made struct drm_gpuva_fn_ops a const object in all occurrences.
>       (Boris Brezillon)
> 
> Changes in V5:
> ==============
>   Nouveau:
>     - Link and unlink GPUVAs outside the fence signalling critical path in
>       nouveau_uvmm_bind_job_submit() holding the dma-resv lock. Mutual exclusion
>       of BO evicts causing mapping invalidation and regular mapping operations
>       is ensured with dma-fences.
> 
>   GPUVA Manager:
>     - Removed the separate GEM's GPUVA list lock. Link and unlink as well as
>       iterating the GEM's GPUVA list should be protected with the GEM's dma-resv
>       lock instead.
>     - Renamed DRM_GPUVA_EVICTED flag to DRM_GPUVA_INVALIDATED. Mappings do not
>       get evicted; they might get invalidated due to eviction.
>     - Maple tree uses the 'unsigned long' type for node entries. While this
>       works for GPU VA spaces larger than 32-bit on 64-bit kernels, the GPU VA
>       space is limited to 32-bit on 32-bit kernels.
>       As long as we do not have a 64-bit capable maple tree for 32-bit kernels,
>       the GPU VA manager contains checks to throw warnings when GPU VA entries
>       exceed the maple tree's storage capabilities.
>     - Extended the Documentation and added example code as requested by Donald
>       Robson.
> 
> Christian König (1):
>   drm: execution context for GEM buffers v4
> 
> Danilo Krummrich (13):
>   maple_tree: split up MA_STATE() macro
>   drm: manager to keep track of GPUs VA mappings
>   drm: debugfs: provide infrastructure to dump a DRM GPU VA space

Core drm patches are 

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>

The only thing I'm worried about is the 'sync mapping requests have to
go through the async path and wait for all previous async requests to
be processed' problem I mentioned in one of your previous submissions,
but I'm happy to leave that for later.

>   drm/nouveau: new VM_BIND uapi interfaces
>   drm/nouveau: get vmm via nouveau_cli_vmm()
>   drm/nouveau: bo: initialize GEM GPU VA interface
>   drm/nouveau: move usercopy helpers to nouveau_drv.h
>   drm/nouveau: fence: separate fence alloc and emit
>   drm/nouveau: fence: fail to emit when fence context is killed
>   drm/nouveau: chan: provide nouveau_channel_kill()
>   drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>   drm/nouveau: implement new VM_BIND uAPI
>   drm/nouveau: debugfs: implement DRM GPU VA debugfs
> 
>  Documentation/gpu/driver-uapi.rst             |   11 +
>  Documentation/gpu/drm-mm.rst                  |   54 +
>  drivers/gpu/drm/Kconfig                       |    6 +
>  drivers/gpu/drm/Makefile                      |    3 +
>  drivers/gpu/drm/drm_debugfs.c                 |   41 +
>  drivers/gpu/drm/drm_exec.c                    |  278 +++
>  drivers/gpu/drm/drm_gem.c                     |    3 +
>  drivers/gpu/drm/drm_gpuva_mgr.c               | 1971 ++++++++++++++++
>  drivers/gpu/drm/nouveau/Kbuild                |    3 +
>  drivers/gpu/drm/nouveau/Kconfig               |    2 +
>  drivers/gpu/drm/nouveau/dispnv04/crtc.c       |    9 +-
>  drivers/gpu/drm/nouveau/include/nvif/if000c.h |   26 +-
>  drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   19 +-
>  .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   20 +-
>  drivers/gpu/drm/nouveau/nouveau_abi16.c       |   24 +
>  drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>  drivers/gpu/drm/nouveau/nouveau_bo.c          |  204 +-
>  drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>  drivers/gpu/drm/nouveau/nouveau_chan.c        |   22 +-
>  drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>  drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   39 +
>  drivers/gpu/drm/nouveau/nouveau_dmem.c        |    9 +-
>  drivers/gpu/drm/nouveau/nouveau_drm.c         |   27 +-
>  drivers/gpu/drm/nouveau/nouveau_drv.h         |   94 +-
>  drivers/gpu/drm/nouveau/nouveau_exec.c        |  418 ++++
>  drivers/gpu/drm/nouveau/nouveau_exec.h        |   54 +
>  drivers/gpu/drm/nouveau/nouveau_fence.c       |   23 +-
>  drivers/gpu/drm/nouveau/nouveau_fence.h       |    5 +-
>  drivers/gpu/drm/nouveau/nouveau_gem.c         |   62 +-
>  drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>  drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>  drivers/gpu/drm/nouveau/nouveau_sched.c       |  461 ++++
>  drivers/gpu/drm/nouveau/nouveau_sched.h       |  123 +
>  drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>  drivers/gpu/drm/nouveau/nouveau_uvmm.c        | 1979 +++++++++++++++++
>  drivers/gpu/drm/nouveau/nouveau_uvmm.h        |  107 +
>  drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>  drivers/gpu/drm/nouveau/nvif/vmm.c            |  100 +-
>  .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  213 +-
>  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  197 +-
>  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   25 +
>  .../drm/nouveau/nvkm/subdev/mmu/vmmgf100.c    |   16 +-
>  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   16 +-
>  .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c |   27 +-
>  include/drm/drm_debugfs.h                     |   25 +
>  include/drm/drm_drv.h                         |    6 +
>  include/drm/drm_exec.h                        |  119 +
>  include/drm/drm_gem.h                         |   52 +
>  include/drm/drm_gpuva_mgr.h                   |  682 ++++++
>  include/linux/maple_tree.h                    |    7 +-
>  include/uapi/drm/nouveau_drm.h                |  209 ++
>  51 files changed, 7566 insertions(+), 242 deletions(-)
>  create mode 100644 drivers/gpu/drm/drm_exec.c
>  create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>  create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>  create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>  create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>  create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>  create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>  create mode 100644 include/drm/drm_exec.h
>  create mode 100644 include/drm/drm_gpuva_mgr.h
> 
> 
> base-commit: 2222dcb0775d36de28992f56455ab3967b30d380
Danilo Krummrich June 20, 2023, 12:46 p.m. UTC | #7
Hi Boris,

On 6/20/23 11:25, Boris Brezillon wrote:
> Hi Danilo,
> 
> On Tue, 20 Jun 2023 02:42:03 +0200
> Danilo Krummrich <dakr@redhat.com> wrote:
> 
>> This patch series provides a new UAPI for the Nouveau driver in order to
>> support Vulkan features, such as sparse bindings and sparse residency.
>>
>> Furthermore, with the DRM GPUVA manager it provides a new DRM core feature to
>> keep track of GPU virtual address (VA) mappings in a more generic way.
>>
>> The DRM GPUVA manager is intended to help drivers implement userspace-manageable
>> GPU VA spaces in reference to the Vulkan API. In order to achieve this goal it
>> serves the following purposes in this context.
>>
>>      1) Provide infrastructure to track GPU VA allocations and mappings,
>>         making use of the maple_tree.
>>
>>      2) Generically connect GPU VA mappings to their backing buffers, in
>>         particular DRM GEM objects.
>>
>>      3) Provide a common implementation to perform more complex mapping
>>         operations on the GPU VA space. In particular splitting and merging
>>         of GPU VA mappings, e.g. for intersecting mapping requests or partial
>>         unmap requests.
>>
>> The new VM_BIND Nouveau UAPI builds on top of the DRM GPUVA manager, itself
>> providing the following new interfaces.
>>
>>      1) Initialize a GPU VA space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl
>>         for UMDs to specify the portion of VA space managed by the kernel and
>>         userspace, respectively.
>>
>>      2) Allocate and free a VA space region as well as bind and unbind memory
>>         to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl.
>>
>>      3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl.
>>
>> Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC make use of the DRM
>> scheduler to queue jobs and support asynchronous processing with DRM syncobjs
>> as synchronization mechanism.
>>
>> By default DRM_IOCTL_NOUVEAU_VM_BIND does synchronous processing,
>> DRM_IOCTL_NOUVEAU_EXEC supports asynchronous processing only.
>>
>> The new VM_BIND UAPI for Nouveau also makes use of drm_exec (execution context
>> for GEM buffers) by Christian König. Since the patch implementing drm_exec was
>> not yet merged into drm-next it is part of this series, as well as a small fix
>> for this patch, which was found while testing this series.
>>
>> This patch series is also available at [1].
>>
>> There is a Mesa NVK merge request by Dave Airlie [2] implementing the
>> corresponding userspace parts for this series.
>>
>> The Vulkan CTS test suite passes the sparse binding and sparse residency test
>> cases for the new UAPI together with Dave's Mesa work.
>>
>> There are also some test cases in the igt-gpu-tools project [3] for the new UAPI
>> and hence the DRM GPU VA manager. However, most of them are testing the DRM GPU
>> VA manager's logic through Nouveau's new UAPI and should be considered just as
>> helpers for the implementation.
>>
>> However, I absolutely intend to change those test cases to proper kunit test
>> cases for the DRM GPUVA manager, if and once we agree on its usefulness and
>> design.
>>
>> [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/tree/new-uapi-drm-next /
>>      https://gitlab.freedesktop.org/nouvelles/kernel/-/merge_requests/1
>> [2] https://gitlab.freedesktop.org/nouveau/mesa/-/merge_requests/150/
>> [3] https://gitlab.freedesktop.org/dakr/igt-gpu-tools/-/tree/wip_nouveau_vm_bind
>>
>> Changes in V2:
>> ==============
>>    Nouveau:
>>      - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
>>        signalling critical sections. Updates to the VA space are split up in three
>>        separate stages, where only the second stage executes in a fence signalling
>>        critical section:
>>
>>          1. update the VA space, allocate new structures and page tables
>>          2. (un-)map the requested memory bindings
>>          3. free structures and page tables
>>
>>      - Separated generic job scheduler code from specific job implementations.
>>      - Separated the EXEC and VM_BIND implementation of the UAPI.
>>      - Reworked the locking parts of the nvkm/vmm RAW interface, such that
>>        (un-)map operations can be executed in fence signalling critical sections.
>>
>>    GPUVA Manager:
>>      - made drm_gpuva_regions optional for users of the GPUVA manager
>>      - allow NULL GEMs for drm_gpuva entries
>>      - switched from drm_mm to maple_tree to track drm_gpuva / drm_gpuva_region
>>        entries
>>      - provide callbacks for users to allocate custom drm_gpuva_op structures to
>>        allow inheritance
>>      - added user bits to drm_gpuva_flags
>>      - added a prefetch operation type in order to support generating prefetch
>>        operations in the same way other operations are generated
>>      - hand the responsibility for mutual exclusion for a GEM's
>>        drm_gpuva list to the user; simplified corresponding (un-)link functions
>>
>>    Maple Tree:
>>      - I added two maple tree patches to the series, one to support custom tree
>>        walk macros and one to hand the locking responsibility to the user of the
>>        GPUVA manager without pre-defined lockdep checks.
>>
>> Changes in V3:
>> ==============
>>    Nouveau:
>>      - Reworked the Nouveau VM_BIND UAPI to do the job cleanup (including page
>>        table cleanup) within a workqueue rather than the job_free() callback of
>>        the scheduler itself. A job_free() callback can stall the execution (run()
>>        callback) of the next job in the queue. Since the page table cleanup
>>        requires taking the same locks as are needed for page table
>>        allocation, doing it directly in the job_free() callback would still
>>        violate the fence signalling critical path.
>>      - Separated Nouveau fence allocation and emit, such that we do not violate
>>        the fence signalling critical path in EXEC jobs.
>>      - Implement "regions" (for handling sparse mappings through PDEs and dual
>>        page tables) within Nouveau.
>>      - Drop the requirement for every mapping to be contained within a region.
>>      - Add necessary synchronization of VM_BIND job operation sequences in order
>>        to work around limitations in page table handling. This will be addressed
>>        in a future re-work of Nouveau's page table handling.
>>      - Fixed a couple of race conditions found through more testing. Thanks to
>>        Dave for consistently trying to break it. :-)
>>
>>    GPUVA Manager:
>>      - Implement pre-allocation capabilities for tree modifications within fence
>>        signalling critical sections.
>>      - Implement accessors to apply tree modifications while walking the GPUVA
>>        tree in order to actually support processing of drm_gpuva_ops through
>>        callbacks in fence signalling critical sections rather than through
>>        pre-allocated operation lists.
>>      - Remove merging of GPUVAs; the kernel has little to no knowledge about
>>        the semantics of mapping sequences. Hence, merging is purely speculative.
>>        It seems that gaining a significant (or at least a measurable) performance
>>        increase through merging is way more likely to happen when userspace is
>>        responsible for merging mappings up to the next larger page size if
>>        possible.
>>      - Since merging was removed, regions pretty much lose their right to exist.
>>        They might still be useful for handling dual page tables or similar
>>        mechanisms, but since Nouveau seems to be the only driver having a need
>>        for this for now, regions were removed from the GPUVA manager.
>>      - Fixed a couple of maple_tree related issues; thanks to Liam for helping me
>>        out.
>>
>> Changes in V4:
>> ==============
>>    Nouveau:
>>      - Refactored how specific VM_BIND and EXEC jobs are created and how their
>>        arguments are passed to the generic job implementation.
>>      - Fixed a UAF race condition where bind job ops could have been freed
>>        already while still waiting for a job cleanup to finish. This is because
>>        in certain cases we need to wait for mappings to actually be unmapped
>>        before creating sparse regions in the same area.
>>      - Re-based the code onto drm_exec v4 patch.
>>
>>    GPUVA Manager:
>>      - Fixed a maple tree related bug when pre-allocating MA states.
>>        (Boris Brezillon)
>>      - Made struct drm_gpuva_fn_ops a const object in all occurrences.
>>        (Boris Brezillon)
>>
>> Changes in V5:
>> ==============
>>    Nouveau:
>>      - Link and unlink GPUVAs outside the fence signalling critical path in
>>        nouveau_uvmm_bind_job_submit() holding the dma-resv lock. Mutual exclusion
>>        of BO evicts causing mapping invalidation and regular mapping operations
>>        is ensured with dma-fences.
>>
>>    GPUVA Manager:
>>      - Removed the separate GEM's GPUVA list lock. Link and unlink as well as
>>        iterating the GEM's GPUVA list should be protected with the GEM's dma-resv
>>        lock instead.
>>      - Renamed DRM_GPUVA_EVICTED flag to DRM_GPUVA_INVALIDATED. Mappings do not
>>        get evicted; they might get invalidated due to eviction.
>>      - Maple tree uses the 'unsigned long' type for node entries. While this
>>        works for GPU VA spaces larger than 32-bit on 64-bit kernels, the GPU VA
>>        space is limited to 32-bit on 32-bit kernels.
>>        As long as we do not have a 64-bit capable maple tree for 32-bit kernels,
>>        the GPU VA manager contains checks to throw warnings when GPU VA entries
>>        exceed the maple tree's storage capabilities.
>>      - Extended the Documentation and added example code as requested by Donald
>>        Robson.
>>
>> Christian König (1):
>>    drm: execution context for GEM buffers v4
>>
>> Danilo Krummrich (13):
>>    maple_tree: split up MA_STATE() macro
>>    drm: manager to keep track of GPUs VA mappings
>>    drm: debugfs: provide infrastructure to dump a DRM GPU VA space
> 
> Core drm patches are
> 
> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
> 
> The only thing I'm worried about is the 'sync mapping requests have to
> go through the async path and wait for all previous async requests to
> be processed' problem I mentioned in one of your previous submissions,
> but I'm happy to leave that for later.

Yes, I'm aware of this limitation.

Let me quickly try to explain where this limitation comes from and how I 
intend to address it.

In order to be able to allocate the required page tables for a mapping 
request and in order to free corresponding page tables once the (async) 
job finished I need to know the corresponding sequence of operations 
(drm_gpuva_ops) to fulfill the mapping request.

This requires me to update the GPUVA space in the ioctl() rather than in 
the async stage, because otherwise I would need to wait for previous 
jobs to finish before being able to submit subsequent jobs to the job 
queue, since I need an up to date view of the GPUVA space in order to 
calculate the sequence of operations to fulfill a mapping request.

As a consequence all jobs need to be processed in the order they were 
submitted, including synchronous jobs.

@Matt: I think you will have the same limitation with synchronous jobs 
as your implementation in XE should be similar?

In order to address it I want to switch to using callbacks rather than 
'pre-allocated' drm_gpuva_ops and update the GPUVA space within the 
asynchronous stage. This would allow me to 'fit' synchronous jobs 
between jobs waiting in the async job queue. However, to do this I have 
to re-work how the page table handling in Nouveau is implemented, since 
this would require me to be able to manage the page tables without 
knowing the exact sequence of operations to fulfill a mapping request.

- Danilo

> 
>>    drm/nouveau: new VM_BIND uapi interfaces
>>    drm/nouveau: get vmm via nouveau_cli_vmm()
>>    drm/nouveau: bo: initialize GEM GPU VA interface
>>    drm/nouveau: move usercopy helpers to nouveau_drv.h
>>    drm/nouveau: fence: separate fence alloc and emit
>>    drm/nouveau: fence: fail to emit when fence context is killed
>>    drm/nouveau: chan: provide nouveau_channel_kill()
>>    drm/nouveau: nvkm/vmm: implement raw ops to manage uvmm
>>    drm/nouveau: implement new VM_BIND uAPI
>>    drm/nouveau: debugfs: implement DRM GPU VA debugfs
>>
>>   Documentation/gpu/driver-uapi.rst             |   11 +
>>   Documentation/gpu/drm-mm.rst                  |   54 +
>>   drivers/gpu/drm/Kconfig                       |    6 +
>>   drivers/gpu/drm/Makefile                      |    3 +
>>   drivers/gpu/drm/drm_debugfs.c                 |   41 +
>>   drivers/gpu/drm/drm_exec.c                    |  278 +++
>>   drivers/gpu/drm/drm_gem.c                     |    3 +
>>   drivers/gpu/drm/drm_gpuva_mgr.c               | 1971 ++++++++++++++++
>>   drivers/gpu/drm/nouveau/Kbuild                |    3 +
>>   drivers/gpu/drm/nouveau/Kconfig               |    2 +
>>   drivers/gpu/drm/nouveau/dispnv04/crtc.c       |    9 +-
>>   drivers/gpu/drm/nouveau/include/nvif/if000c.h |   26 +-
>>   drivers/gpu/drm/nouveau/include/nvif/vmm.h    |   19 +-
>>   .../gpu/drm/nouveau/include/nvkm/subdev/mmu.h |   20 +-
>>   drivers/gpu/drm/nouveau/nouveau_abi16.c       |   24 +
>>   drivers/gpu/drm/nouveau/nouveau_abi16.h       |    1 +
>>   drivers/gpu/drm/nouveau/nouveau_bo.c          |  204 +-
>>   drivers/gpu/drm/nouveau/nouveau_bo.h          |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_chan.c        |   22 +-
>>   drivers/gpu/drm/nouveau/nouveau_chan.h        |    1 +
>>   drivers/gpu/drm/nouveau/nouveau_debugfs.c     |   39 +
>>   drivers/gpu/drm/nouveau/nouveau_dmem.c        |    9 +-
>>   drivers/gpu/drm/nouveau/nouveau_drm.c         |   27 +-
>>   drivers/gpu/drm/nouveau/nouveau_drv.h         |   94 +-
>>   drivers/gpu/drm/nouveau/nouveau_exec.c        |  418 ++++
>>   drivers/gpu/drm/nouveau/nouveau_exec.h        |   54 +
>>   drivers/gpu/drm/nouveau/nouveau_fence.c       |   23 +-
>>   drivers/gpu/drm/nouveau/nouveau_fence.h       |    5 +-
>>   drivers/gpu/drm/nouveau/nouveau_gem.c         |   62 +-
>>   drivers/gpu/drm/nouveau/nouveau_mem.h         |    5 +
>>   drivers/gpu/drm/nouveau/nouveau_prime.c       |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_sched.c       |  461 ++++
>>   drivers/gpu/drm/nouveau/nouveau_sched.h       |  123 +
>>   drivers/gpu/drm/nouveau/nouveau_svm.c         |    2 +-
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.c        | 1979 +++++++++++++++++
>>   drivers/gpu/drm/nouveau/nouveau_uvmm.h        |  107 +
>>   drivers/gpu/drm/nouveau/nouveau_vmm.c         |    4 +-
>>   drivers/gpu/drm/nouveau/nvif/vmm.c            |  100 +-
>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/uvmm.c    |  213 +-
>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.c |  197 +-
>>   drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   25 +
>>   .../drm/nouveau/nvkm/subdev/mmu/vmmgf100.c    |   16 +-
>>   .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   16 +-
>>   .../gpu/drm/nouveau/nvkm/subdev/mmu/vmmnv50.c |   27 +-
>>   include/drm/drm_debugfs.h                     |   25 +
>>   include/drm/drm_drv.h                         |    6 +
>>   include/drm/drm_exec.h                        |  119 +
>>   include/drm/drm_gem.h                         |   52 +
>>   include/drm/drm_gpuva_mgr.h                   |  682 ++++++
>>   include/linux/maple_tree.h                    |    7 +-
>>   include/uapi/drm/nouveau_drm.h                |  209 ++
>>   51 files changed, 7566 insertions(+), 242 deletions(-)
>>   create mode 100644 drivers/gpu/drm/drm_exec.c
>>   create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.c
>>   create mode 100644 drivers/gpu/drm/nouveau/nouveau_uvmm.h
>>   create mode 100644 include/drm/drm_exec.h
>>   create mode 100644 include/drm/drm_gpuva_mgr.h
>>
>>
>> base-commit: 2222dcb0775d36de28992f56455ab3967b30d380
>
Boris Brezillon June 22, 2023, 1:01 p.m. UTC | #8
Hi Danilo,

On Tue, 20 Jun 2023 14:46:07 +0200
Danilo Krummrich <dakr@redhat.com> wrote:

> > The only thing I'm worried about is the 'sync mapping requests have to
> > go through the async path and wait for all previous async requests to
> > be processed' problem I mentioned in one of your previous submission,
> > but I'm happy leave that for later.  
> 
> Yes, I'm aware of this limitation.
> 
> Let me quickly try to explain where this limitation comes from and how I 
> intend to address it.
> 
> In order to be able to allocate the required page tables for a mapping 
> request and in order to free corresponding page tables once the (async) 
> job finished I need to know the corresponding sequence of operations 
> (drm_gpuva_ops) to fulfill the mapping request.
> 
> This requires me to update the GPUVA space in the ioctl() rather than in 
> the async stage, because otherwise I would need to wait for previous 
> jobs to finish before being able to submit subsequent jobs to the job 
> queue, since I need an up to date view of the GPUVA space in order to 
> calculate the sequence of operations to fulfill a mapping request.
> 
> As a consequence all jobs need to be processed in the order they were 
> submitted, including synchronous jobs.
> 
> @Matt: I think you will have the same limitation with synchronous jobs 
> as your implementation in XE should be similar?
> 
> In order to address it I want to switch to using callbacks rather than 
> 'pre-allocated' drm_gpuva_ops and update the GPUVA space within the 
> asynchronous stage.
> This would allow me to 'fit' synchronous jobs 
> between jobs waiting in the async job queue. However, to do this I have 
> to re-work how the page table handling in Nouveau is implemented, since 
> this would require me to be able to manage the page tables without 
> knowing the exact sequence of operations to fulfill a mapping request.

Ok, so I think that's more or less what we're trying to do right
now in PowerVR.

- First, we make sure we reserve enough MMU page tables for a given map
  operation to succeed no matter the VM state in the VM_BIND job
  submission path (our VM_BIND ioctl). That means we're always
  over-provisioning, and we return the unused memory when the operation
  is done if we ended up using less than we reserved.
- We pre-allocate for the maple-tree insertions.
- Then we map using drm_gpuva_sm_map() and the callbacks we provided in
  the drm_sched::run_job() path. We guarantee that no memory is
  allocated in that path thanks to the pre-allocation/reservation we've
  done at VM_BIND job submission time.

The problems I see with this v5 are that:

1/ We now have a dma_resv_lock_held() in drm_gpuva_{link,unlink}(),
   which, in our case, is called in the async drm_sched::run_job() path,
   and we don't hold the lock in that path (it's been released just
   after the job submission).
2/ I'm worried that Liam's plan to only reserve what's actually needed
   based on the maple-tree state is going to play against us, because
   the maple tree is only modified at job exec time, and we might have
   several unmaps happening between the moment we created and queued the
   jobs and the moment they actually get executed, meaning the
   maple-tree reservation might no longer fit the bill.

For issue #1, it shouldn't be too problematic if we use a regular lock to
insert to/remove from the GEM gpuva list.

For issue #2, I can see a way out if, instead of freeing gpuva nodes,
we flag those as unused when we see that something happening later in
the queue is going to map a section being unmapped. All of this implies
keeping access to already queued VM_BIND jobs (using the spsc queue at
the entity level is not practical) and iterating over them every time
a new sync or async job is queued to flag what needs to be retained. It
would obviously be easier if we could tell the maple-tree API
'provision as if the tree was empty', so all we have to do is
over-provision for both the page tables and the maple-tree insertions,
and free the unused memory when the operation is done.

I don't know whether you've already thought about that and/or have
solutions for these issues.

Regards,

Boris
Danilo Krummrich June 22, 2023, 1:58 p.m. UTC | #9
Hi Boris,

On 6/22/23 15:01, Boris Brezillon wrote:
> Hi Danilo,
> 
> On Tue, 20 Jun 2023 14:46:07 +0200
> Danilo Krummrich <dakr@redhat.com> wrote:
> 
>>> The only thing I'm worried about is the 'sync mapping requests have to
>>> go through the async path and wait for all previous async requests to
>>> be processed' problem I mentioned in one of your previous submission,
>>> but I'm happy leave that for later.
>>
>> Yes, I'm aware of this limitation.
>>
>> Let me quickly try to explain where this limitation comes from and how I
>> intend to address it.
>>
>> In order to be able to allocate the required page tables for a mapping
>> request and in order to free corresponding page tables once the (async)
>> job finished I need to know the corresponding sequence of operations
>> (drm_gpuva_ops) to fulfill the mapping request.
>>
>> This requires me to update the GPUVA space in the ioctl() rather than in
>> the async stage, because otherwise I would need to wait for previous
>> jobs to finish before being able to submit subsequent jobs to the job
>> queue, since I need an up to date view of the GPUVA space in order to
>> calculate the sequence of operations to fulfill a mapping request.
>>
>> As a consequence all jobs need to be processed in the order they were
>> submitted, including synchronous jobs.
>>
>> @Matt: I think you will have the same limitation with synchronous jobs
>> as your implementation in XE should be similar?
>>
>> In order to address it I want to switch to using callbacks rather than
>> 'pre-allocated' drm_gpuva_ops and update the GPUVA space within the
>> asynchronous stage.
>> This would allow me to 'fit' synchronous jobs
>> between jobs waiting in the async job queue. However, to do this I have
>> to re-work how the page table handling in Nouveau is implemented, since
>> this would require me to be able to manage the page tables without
>> knowing the exact sequence of operations to fulfill a mapping request.
> 
> Ok, so I think that's more or less what we're trying to do right
> now in PowerVR.
> 
> - First, we make sure we reserve enough MMU page tables for a given map
>    operation to succeed no matter the VM state in the VM_BIND job
>    submission path (our VM_BIND ioctl). That means we're always
>    over-provisioning and returning unused memory back when the operation
>    is done if we end up using less memory.
> - We pre-allocate for the mapple-tree insertions.
> - Then we map using drm_gpuva_sm_map() and the callbacks we provided in
>    the drm_sched::run_job() path. We guarantee that no memory is
>    allocated in that path thanks to the pre-allocation/reservation we've
>    done at VM_BIND job submission time.
> 
> The problem I see with this v5 is that:
> 
> 1/ We now have a dma_resv_lock_held() in drm_gpuva_{link,unlink}(),
>     which, in our case, is called in the async drm_sched::run_job() path,
>     and we don't hold the lock in that path (it's been released just
>     after the job submission).

My solution to this, as of now, is, in the same way we pre-allocate, to
just pre-link and pre-unlink, and then fix things up in the cleanup
path.

However, depending on the driver, this might require you to set a flag
in the driver-specific structure (embedding struct drm_gpuva)
indicating whether the gpuva is actually mapped (as in, has active page
table entries). Maybe we could also just add such a flag to struct
drm_gpuva. But yeah, that doesn't sound too nice, to be honest...

> 2/ I'm worried that Liam's plan to only reserve what's actually needed
>     based on the mapple tree state is going to play against us, because
>     the mapple-tree is only modified at job exec time, and we might have
>     several unmaps happening between the moment we created and queued the
>     jobs, and the moment they actually get executed, meaning the
>     mapple-tree reservation might no longer fit the bill.

Yes, I'm aware and I explained to Liam in detail why we need the 
mas_preallocate_worst_case() way of doing it.

See this mail: 
https://lore.kernel.org/nouveau/68cd25de-e767-725e-2e7b-703217230bb0@redhat.com/T/#ma326e200b1de1e3c9df4e9fcb3bf243061fee8b5

He hasn't answered yet, but I hope we can just get (or rather, keep)
such a function (hopefully with a better name), since it shouldn't
interfere with anything else.

> 
> For issue #1, it shouldn't be to problematic if we use a regular lock to
> insert to/remove from the GEM gpuva list.

Yes, that's why I had a separate GEM gpuva list lock in the first place. 
However, this doesn't really work when generating ops rather than using 
the callback interface.

Have a look at drm_gpuva_gem_unmap_ops_create() requested by Matt for 
XE. This function generates drm_gpuva_ops to unmap all mappings of a 
given GEM. In order to do that the function must iterate the GEM's gpuva 
list and allocate operations for each mapping. As a consequence the 
gpuva list lock wouldn't be allowed to be taken in the fence signalling 
path (run_job()) any longer. Hence, we can just protect the list with 
the GEM's dma-resv lock.

However, I can understand that it might be inconvenient for the callback 
interface and admittedly my solution to that isn't that nice as well. 
Hence the following idea:

For drivers to be able to use their own lock for that, it would be enough
to get rid of the lockdep checks. We could just add a flag to the GPUVA
manager to let the driver indicate it wants to do its own locking for
the GPUVA list, and skip the lockdep checks for the dma-resv lock in that
case.

> 
> For issue #2, I can see a way out if, instead of freeing gpuva nodes,
> we flag those as unused when we see that something happening later in
> the queue is going to map a section being unmapped. All of this implies
> keeping access to already queued VM_BIND jobs (using the spsc queue at
> the entity level is not practical), and iterating over them every time
> a new sync or async job is queued to flag what needs to be retained. It
> would obviously be easier if we could tell the mapple-tree API
> 'provision as if the tree was empty', so all we have to do is just
> over-provision for both the page tables and mapple-tree insertion, and
> free the unused mem when the operation is done.
> 
> Don't know if you already thought about that and/or have solutions to
> solve these issues.

As already mentioned above, I'd just expect we can keep the 
over-provisioning way, as you say. I think it's a legitimate use case to 
not know the state of the maple tree at the time the pre-allocated nodes 
will be used, and keeping that should not interfere with Liam's plan to 
(hopefully separately) optimize for the pre-allocation use case they 
have within -mm.

But let's wait for his take on that.

- Danilo

> 
> Regards,
> 
> Boris
>
Boris Brezillon June 22, 2023, 3:19 p.m. UTC | #10
Hi Danilo,

On Thu, 22 Jun 2023 15:58:23 +0200
Danilo Krummrich <dakr@redhat.com> wrote:

> Hi Boris,
> 
> On 6/22/23 15:01, Boris Brezillon wrote:
> > Hi Danilo,
> > 
> > On Tue, 20 Jun 2023 14:46:07 +0200
> > Danilo Krummrich <dakr@redhat.com> wrote:
> >   
> >>> The only thing I'm worried about is the 'sync mapping requests have to
> >>> go through the async path and wait for all previous async requests to
> >>> be processed' problem I mentioned in one of your previous submission,
> >>> but I'm happy leave that for later.  
> >>
> >> Yes, I'm aware of this limitation.
> >>
> >> Let me quickly try to explain where this limitation comes from and how I
> >> intend to address it.
> >>
> >> In order to be able to allocate the required page tables for a mapping
> >> request and in order to free corresponding page tables once the (async)
> >> job finished I need to know the corresponding sequence of operations
> >> (drm_gpuva_ops) to fulfill the mapping request.
> >>
> >> This requires me to update the GPUVA space in the ioctl() rather than in
> >> the async stage, because otherwise I would need to wait for previous
> >> jobs to finish before being able to submit subsequent jobs to the job
> >> queue, since I need an up to date view of the GPUVA space in order to
> >> calculate the sequence of operations to fulfill a mapping request.
> >>
> >> As a consequence all jobs need to be processed in the order they were
> >> submitted, including synchronous jobs.
> >>
> >> @Matt: I think you will have the same limitation with synchronous jobs
> >> as your implementation in XE should be similar?
> >>
> >> In order to address it I want to switch to using callbacks rather than
> >> 'pre-allocated' drm_gpuva_ops and update the GPUVA space within the
> >> asynchronous stage.
> >> This would allow me to 'fit' synchronous jobs
> >> between jobs waiting in the async job queue. However, to do this I have
> >> to re-work how the page table handling in Nouveau is implemented, since
> >> this would require me to be able to manage the page tables without
> >> knowing the exact sequence of operations to fulfill a mapping request.  
> > 
> > Ok, so I think that's more or less what we're trying to do right
> > now in PowerVR.
> > 
> > - First, we make sure we reserve enough MMU page tables for a given map
> >    operation to succeed no matter the VM state in the VM_BIND job
> >    submission path (our VM_BIND ioctl). That means we're always
> >    over-provisioning and returning unused memory back when the operation
> >    is done if we end up using less memory.
> > - We pre-allocate for the mapple-tree insertions.
> > - Then we map using drm_gpuva_sm_map() and the callbacks we provided in
> >    the drm_sched::run_job() path. We guarantee that no memory is
> >    allocated in that path thanks to the pre-allocation/reservation we've
> >    done at VM_BIND job submission time.
> > 
> > The problem I see with this v5 is that:
> > 
> > 1/ We now have a dma_resv_lock_held() in drm_gpuva_{link,unlink}(),
> >     which, in our case, is called in the async drm_sched::run_job() path,
> >     and we don't hold the lock in that path (it's been released just
> >     after the job submission).  
> 
> My solution to this, as by now, is to - in the same way we pre-allocate 
> - to just pre-link and pre-unlink. And then fix things up in the cleanup 
> path.
> 
> However, depending on the driver, this might require you to set a flag 
> in the driver specific structure (embedding struct drm_gpuva) whether 
> the gpuva is actually mapped (as in has active page table entries). 
> Maybe we could also just add such a flag to struct drm_gpuva. But yeah, 
> doesn't sound too nice to be honest...
> 
> > 2/ I'm worried that Liam's plan to only reserve what's actually needed
> >     based on the mapple tree state is going to play against us, because
> >     the mapple-tree is only modified at job exec time, and we might have
> >     several unmaps happening between the moment we created and queued the
> >     jobs, and the moment they actually get executed, meaning the
> >     mapple-tree reservation might no longer fit the bill.  
> 
> Yes, I'm aware and I explained to Liam in detail why we need the 
> mas_preallocate_worst_case() way of doing it.
> 
> See this mail: 
> https://lore.kernel.org/nouveau/68cd25de-e767-725e-2e7b-703217230bb0@redhat.com/T/#ma326e200b1de1e3c9df4e9fcb3bf243061fee8b5
> 
> He hasn't answered yet, but I hope we can just get (or actually keep) 
> such a function (hopefully with better naming), since it shouldn't 
> interfere with anything else.

My bad, I started reading your reply and got interrupted. Never got
back to it, which I should definitely have done before posting my
questions. Anyway, glad to hear we're on the same page regarding the
mas_preallocate_worst_case() thing.

> 
> > 
> > For issue #1, it shouldn't be to problematic if we use a regular lock to
> > insert to/remove from the GEM gpuva list.  
> 
> Yes, that's why I had a separate GEM gpuva list lock in the first place. 
> However, this doesn't really work when generating ops rather than using 
> the callback interface.
> 
> Have a look at drm_gpuva_gem_unmap_ops_create() requested by Matt for 
> XE. This function generates drm_gpuva_ops to unmap all mappings of a 
> given GEM. In order to do that the function must iterate the GEM's gpuva 
> list and allocate operations for each mapping. As a consequence the 
> gpuva list lock wouldn't be allowed to be taken in the fence signalling 
> path (run_job()) any longer. Hence, we can just protect the list with 
> the GEM's dma-resv lock.

Yeah, I see why using dma_resv when pre-inserting the mapping is
useful; it just doesn't really work with late mapping insertion.

> 
> However, I can understand that it might be inconvenient for the callback 
> interface and admittedly my solution to that isn't that nice as well. 
> Hence the following idea:
> 
> For drivers to be able to use their own lock for that it would be enough 
> to get rid of the lockdep checks. We could just add a flag to the GPUVA 
> manager to let the driver indicate it wants to do it's own locking for 
> the GPUVA list and skip the lockdep checks for the dma-resv lock in that 
> case.

Sounds good to me.

> 
> > 
> > For issue #2, I can see a way out if, instead of freeing gpuva nodes,
> > we flag those as unused when we see that something happening later in
> > the queue is going to map a section being unmapped. All of this implies
> > keeping access to already queued VM_BIND jobs (using the spsc queue at
> > the entity level is not practical), and iterating over them every time
> > a new sync or async job is queued to flag what needs to be retained. It
> > would obviously be easier if we could tell the mapple-tree API
> > 'provision as if the tree was empty', so all we have to do is just
> > over-provision for both the page tables and mapple-tree insertion, and
> > free the unused mem when the operation is done.
> > 
> > Don't know if you already thought about that and/or have solutions to
> > solve these issues.  
> 
> As already mentioned above, I'd just expect we can keep it the 
> over-provision way, as you say. I think it's a legit use case to not 
> know the state of the maple tree at the time the pre-allocated nodes 
> will be used and keeping that should not interfere with Liams plan to 
> (hopefully separately) optimize for the pre-allocation use case they 
> have within -mm.
> 
> But let's wait for his take on that.

Sure. As I said, I'm fine getting this version merged, we can sort out
the changes needed for PowerVR later. Just thought I'd mention those
issues early, so you're not surprised when we come back with crazy
requests (which apparently are not that crazy ;-)).

Regards,

Boris
Danilo Krummrich June 22, 2023, 3:27 p.m. UTC | #11
On 6/22/23 17:19, Boris Brezillon wrote:
> Hi Danilo,
> 
> On Thu, 22 Jun 2023 15:58:23 +0200
> Danilo Krummrich <dakr@redhat.com> wrote:
> 
>> Hi Boris,
>>
>> On 6/22/23 15:01, Boris Brezillon wrote:
>>> Hi Danilo,
>>>
>>> On Tue, 20 Jun 2023 14:46:07 +0200
>>> Danilo Krummrich <dakr@redhat.com> wrote:
>>>    
>>>>> The only thing I'm worried about is the 'sync mapping requests have to
>>>>> go through the async path and wait for all previous async requests to
>>>>> be processed' problem I mentioned in one of your previous submission,
>>>>> but I'm happy leave that for later.
>>>>
>>>> Yes, I'm aware of this limitation.
>>>>
>>>> Let me quickly try to explain where this limitation comes from and how I
>>>> intend to address it.
>>>>
>>>> In order to be able to allocate the required page tables for a mapping
>>>> request and in order to free corresponding page tables once the (async)
>>>> job finished I need to know the corresponding sequence of operations
>>>> (drm_gpuva_ops) to fulfill the mapping request.
>>>>
>>>> This requires me to update the GPUVA space in the ioctl() rather than in
>>>> the async stage, because otherwise I would need to wait for previous
>>>> jobs to finish before being able to submit subsequent jobs to the job
>>>> queue, since I need an up to date view of the GPUVA space in order to
>>>> calculate the sequence of operations to fulfill a mapping request.
>>>>
>>>> As a consequence all jobs need to be processed in the order they were
>>>> submitted, including synchronous jobs.
>>>>
>>>> @Matt: I think you will have the same limitation with synchronous jobs
>>>> as your implementation in XE should be similar?
>>>>
>>>> In order to address it I want to switch to using callbacks rather than
>>>> 'pre-allocated' drm_gpuva_ops and update the GPUVA space within the
>>>> asynchronous stage.
>>>> This would allow me to 'fit' synchronous jobs
>>>> between jobs waiting in the async job queue. However, to do this I have
>>>> to re-work how the page table handling in Nouveau is implemented, since
>>>> this would require me to be able to manage the page tables without
>>>> knowing the exact sequence of operations to fulfill a mapping request.
>>>
>>> Ok, so I think that's more or less what we're trying to do right
>>> now in PowerVR.
>>>
>>> - First, we make sure we reserve enough MMU page tables for a given map
>>>     operation to succeed no matter the VM state in the VM_BIND job
>>>     submission path (our VM_BIND ioctl). That means we're always
>>>     over-provisioning and returning unused memory back when the operation
>>>     is done if we end up using less memory.
>>> - We pre-allocate for the mapple-tree insertions.
>>> - Then we map using drm_gpuva_sm_map() and the callbacks we provided in
>>>     the drm_sched::run_job() path. We guarantee that no memory is
>>>     allocated in that path thanks to the pre-allocation/reservation we've
>>>     done at VM_BIND job submission time.
>>>
>>> The problem I see with this v5 is that:
>>>
>>> 1/ We now have a dma_resv_lock_held() in drm_gpuva_{link,unlink}(),
>>>      which, in our case, is called in the async drm_sched::run_job() path,
>>>      and we don't hold the lock in that path (it's been released just
>>>      after the job submission).
>>
>> My solution to this, as by now, is to - in the same way we pre-allocate
>> - to just pre-link and pre-unlink. And then fix things up in the cleanup
>> path.
>>
>> However, depending on the driver, this might require you to set a flag
>> in the driver specific structure (embedding struct drm_gpuva) whether
>> the gpuva is actually mapped (as in has active page table entries).
>> Maybe we could also just add such a flag to struct drm_gpuva. But yeah,
>> doesn't sound too nice to be honest...
>>
>>> 2/ I'm worried that Liam's plan to only reserve what's actually needed
>>>      based on the mapple tree state is going to play against us, because
>>>      the mapple-tree is only modified at job exec time, and we might have
>>>      several unmaps happening between the moment we created and queued the
>>>      jobs, and the moment they actually get executed, meaning the
>>>      mapple-tree reservation might no longer fit the bill.
>>
>> Yes, I'm aware and I explained to Liam in detail why we need the
>> mas_preallocate_worst_case() way of doing it.
>>
>> See this mail:
>> https://lore.kernel.org/nouveau/68cd25de-e767-725e-2e7b-703217230bb0@redhat.com/T/#ma326e200b1de1e3c9df4e9fcb3bf243061fee8b5
>>
>> He hasn't answered yet, but I hope we can just get (or actually keep)
>> such a function (hopefully with better naming), since it shouldn't
>> interfere with anything else.
> 
> My bad, I started reading your reply and got interrupted. Never got
> back to it, which I should definitely have done before posting my
> questions. Anyway, glad to hear we're on the same page regarding the
> mas_preallocate_worst_case() thing.

No worries, I should probably also reply to Liam's patch introducing the 
change. I will do that in a minute.

> 
>>
>>>
>>> For issue #1, it shouldn't be to problematic if we use a regular lock to
>>> insert to/remove from the GEM gpuva list.
>>
>> Yes, that's why I had a separate GEM gpuva list lock in the first place.
>> However, this doesn't really work when generating ops rather than using
>> the callback interface.
>>
>> Have a look at drm_gpuva_gem_unmap_ops_create() requested by Matt for
>> XE. This function generates drm_gpuva_ops to unmap all mappings of a
>> given GEM. In order to do that the function must iterate the GEM's gpuva
>> list and allocate operations for each mapping. As a consequence the
>> gpuva list lock wouldn't be allowed to be taken in the fence signalling
>> path (run_job()) any longer. Hence, we can just protect the list with
>> the GEM's dma-resv lock.
> 
> Yeah, I see why using dma_resv when pre-inserting the mapping is
> useful, it just didn't really work with late mapping insertion.
> 
>>
>> However, I can understand that it might be inconvenient for the callback
>> interface and admittedly my solution to that isn't that nice as well.
>> Hence the following idea:
>>
>> For drivers to be able to use their own lock for that it would be enough
>> to get rid of the lockdep checks. We could just add a flag to the GPUVA
>> manager to let the driver indicate it wants to do it's own locking for
>> the GPUVA list and skip the lockdep checks for the dma-resv lock in that
>> case.
> 
> Sounds good to me.

I think it's way better than the pre-link / pre-unlink mess. I will add 
this to v6.

> 
>>
>>>
>>> For issue #2, I can see a way out if, instead of freeing gpuva nodes,
>>> we flag those as unused when we see that something happening later in
>>> the queue is going to map a section being unmapped. All of this implies
>>> keeping access to already queued VM_BIND jobs (using the spsc queue at
>>> the entity level is not practical), and iterating over them every time
>>> a new sync or async job is queued to flag what needs to be retained. It
>>> would obviously be easier if we could tell the mapple-tree API
>>> 'provision as if the tree was empty', so all we have to do is just
>>> over-provision for both the page tables and mapple-tree insertion, and
>>> free the unused mem when the operation is done.
>>>
>>> Don't know if you already thought about that and/or have solutions to
>>> solve these issues.
>>
>> As already mentioned above, I'd just expect we can keep it the
>> over-provision way, as you say. I think it's a legit use case to not
>> know the state of the maple tree at the time the pre-allocated nodes
>> will be used and keeping that should not interfere with Liams plan to
>> (hopefully separately) optimize for the pre-allocation use case they
>> have within -mm.
>>
>> But let's wait for his take on that.
> 
> Sure. As I said, I'm fine getting this version merged, we can sort out
> the changes needed for PowerVR later. Just thought I'd mention those
> issues early, so you're not surprised when we come back with crazy
> requests (which apparently are not that crazy ;-)).

They're not crazy at all, in fact they entirely represent what the 
callback interface was designed for. :-)

- Danilo

> 
> Regards,
> 
> Boris
>