
[RFC,00/29] UMD direct submission in Xe

Message ID 20241118233757.2374041-1-matthew.brost@intel.com

Matthew Brost Nov. 18, 2024, 11:37 p.m. UTC
This is an RFC, or possibly even a proof of concept, for UMD (User Mode
Driver) direct submission in Xe. It is similar to AMD's design [1] [2]
and ARM's design [3] in that it uses a uAPI to convert user-space syncs
(memory writes) into kernel-space syncs (DMA fences). It is built on top
of the existing Xe preemption fences for dynamic memory management, such
as userptr invalidation and buffer object (BO) eviction.

The series also enables mapping a PPGTT-bound submission ring in
non-privileged mode, and it exposes the indirect ring state (e.g., ring
head and tail) and the doorbell to user space, enabling UMD direct
submission.
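The sketch below shows roughly what direct submission looks like from user space. It assumes a power-of-two ring addressed by byte-offset head and tail, where user space advances the tail and then writes the doorbell. `struct umd_ring`, `umd_submit()`, `gpu_consume()` and the 64-byte ring size are all hypothetical. In the real design, the ring, indirect ring state and doorbell are separate mmapped objects whose layout is defined by the (preliminary) uAPI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64u /* bytes; hypothetical, must be a power of two */

/* Hypothetical view of a user-mapped ring plus its indirect ring state. */
struct umd_ring {
	uint32_t buf[RING_SIZE / 4]; /* submission ring contents (dwords) */
	uint32_t head;               /* byte offset consumed by the GPU */
	uint32_t tail;               /* byte offset produced by the UMD */
	uint32_t doorbell;           /* stand-in for writes to the doorbell page */
};

static void umd_ring_init(struct umd_ring *r)
{
	memset(r, 0, sizeof(*r));
}

/* Free space in bytes; one dword stays empty so full != empty. */
static uint32_t ring_space(const struct umd_ring *r)
{
	return (r->head - r->tail - 4) & (RING_SIZE - 1);
}

/* Copy commands into the ring, advance the tail, then ring the doorbell.
 * Returns 0 on success or -1 if the ring lacks space. On real hardware a
 * write barrier is needed before the tail/doorbell writes so the GPU
 * never observes the new tail before the commands themselves. */
static int umd_submit(struct umd_ring *r, const uint32_t *cmds, uint32_t ndw)
{
	assert(cmds != NULL || ndw == 0);
	if (ndw * 4 > ring_space(r))
		return -1;
	for (uint32_t i = 0; i < ndw; i++) {
		r->buf[r->tail / 4] = cmds[i];
		r->tail = (r->tail + 4) & (RING_SIZE - 1);
	}
	r->doorbell++; /* "ring" the doorbell */
	return 0;
}

/* Simulate the GPU consuming everything up to the current tail. */
static void gpu_consume(struct umd_ring *r)
{
	r->head = r->tail;
}
```

The key point of the model is that no IOCTL appears on the submission path. User space writes commands, publishes a new tail, and notifies the hardware directly.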

The target for this series is Mesa, with the goal of enabling UMD direct
submission and removing the submission thread that currently handles
future fences. I've discussed this with Sima and the Intel Mesa team,
and it seems like a reachable target. Most synchronization will be
handled in user space via memory writes and semaphore wait ring
instructions, with only legacy cross-process synchronization (e.g.,
compositors) requiring kernel synchronization (DMA fences).
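User-space synchronization through memory writes might look roughly like the model below. It assumes a monotonically increasing 64-bit timeline value: a signaler writes the value, and a waiter polls it with a greater-or-equal compare, similar to what a semaphore-wait ring instruction can do. `struct user_timeline`, `timeline_signal()` and `timeline_wait()` are hypothetical. The bounded poll loop only stands in for hardware polling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* A user-space timeline: one 64-bit value in GPU-visible memory that only
 * ever increases. Producers signal by writing a larger seqno. */
struct user_timeline {
	_Atomic uint64_t value;
};

/* Producer side: the "memory write" that signals seqno v. */
static void timeline_signal(struct user_timeline *t, uint64_t v)
{
	/* Signals must be monotonic, or satisfied waits could un-satisfy. */
	assert(v >= atomic_load_explicit(&t->value, memory_order_relaxed));
	atomic_store_explicit(&t->value, v, memory_order_release);
}

/* Consumer side: poll up to max_polls times for value >= v, the same
 * comparison a semaphore-wait instruction would evaluate. Returns true
 * once the wait is satisfied, false if the poll budget runs out. */
static bool timeline_wait(struct user_timeline *t, uint64_t v,
			  unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++)
		if (atomic_load_explicit(&t->value, memory_order_acquire) >= v)
			return true;
	return false;
}
```

Using ">=" rather than "==" lets a waiter for an older seqno pass even after later signals have overwritten the value. This is the usual reason timelines of this kind are kept monotonic.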

The series begins with some common dma-fence patches that implement
preemption fences and user fences. The idea of preemption
DMA-reservation slots [4] has been dropped in favor of attaching the
last exported DMA fence to the preemption fence, as suggested by AMD.

This is a public checkpoint on the KMD (Kernel Mode Driver) work, which
will be tabled until Intel's Mesa team has the bandwidth to begin the
UMD work. Note that the uAPI is very preliminary and likely to change.
One idea that was discussed is a common user fence interface based
around DRM syncobjs, which will likely be explored further as UMD
engagement begins. Some work to synchronize VM binds (a kernel
operation) with UMD direct submission will also likely be required.

Testing has been done with [5]. The main features (basic submission,
dynamic memory management, user-to-kernel sync conversion, and
protection against endless user fences) are working on BMG (Battlemage)
and LNL (Lunar Lake).

The GitLab branch [6] has also been pushed for reference.

Any early community feedback is always appreciated.

Matt

[1] https://patchwork.freedesktop.org/series/113675/
[2] https://patchwork.freedesktop.org/series/114385/
[3] https://patchwork.freedesktop.org/series/137924/
[4] https://patchwork.freedesktop.org/series/141129/
[5] https://patchwork.freedesktop.org/series/141518/
[6] https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-umd-submission-post/-/tree/post-11-18-24?ref_type=heads 

Matthew Brost (28):
  dma-fence: Add dma_fence_preempt base class
  dma-fence: Add dma_fence_user_fence
  drm/xe: Use dma_fence_preempt base class
  drm/xe: Allocate doorbells for UMD exec queues
  drm/xe: Add doorbell ID to snapshot capture
  drm/xe: Break submission ring out into its own BO
  drm/xe: Break indirect ring state out into its own BO
  drm/xe: Clear GGTT in xe_bo_restore_kernel
  FIXME: drm/xe: Add pad to ring and indirect state
  drm/xe: Enable indirect ring on media GT
  drm/xe: Don't add pinned mappings to VM bulk move
  drm/xe: Add exec queue post init extension processing
  drm/xe: Add support for mmapping doorbells to user space
  drm/xe: Add support for mmapping submission ring and indirect ring
    state to user space
  drm/xe/uapi: Define UMD exec queue mapping uAPI
  drm/xe: Add usermap exec queue extension
  drm/xe: Drop EXEC_QUEUE_FLAG_UMD_SUBMISSION flag
  drm/xe: Do not allow usermap exec queues in exec IOCTL
  drm/xe: Teach GuC backend to kill usermap queues
  drm/xe: Enable preempt fences on usermap queues
  drm/xe/uapi: Add uAPI to convert user semaphore to / from drm syncobj
  drm/xe: Add user fence IRQ handler
  drm/xe: Add xe_hw_fence_user_init
  drm/xe: Add a message lock to the Xe GPU scheduler
  drm/xe: Always wait on preempt fences in vma_check_userptr
  drm/xe: Teach xe_sync layer about drm_xe_semaphore
  drm/xe: Add VM convert fence IOCTL
  drm/xe: Add user fence TDR

Tejas Upadhyay (1):
  drm/xe/mmap: Add mmap support for PCI memory barrier

 drivers/dma-buf/Makefile                     |   2 +-
 drivers/dma-buf/dma-fence-preempt.c          | 134 ++++++
 drivers/dma-buf/dma-fence-user-fence.c       |  73 ++++
 drivers/gpu/drm/xe/xe_bo.c                   |  29 +-
 drivers/gpu/drm/xe/xe_bo.h                   |   5 +
 drivers/gpu/drm/xe/xe_bo_evict.c             |   8 +-
 drivers/gpu/drm/xe/xe_device.c               | 181 +++++++-
 drivers/gpu/drm/xe/xe_device_types.h         |   3 +
 drivers/gpu/drm/xe/xe_exec.c                 |   3 +-
 drivers/gpu/drm/xe/xe_exec_queue.c           | 175 +++++++-
 drivers/gpu/drm/xe/xe_exec_queue.h           |   5 +
 drivers/gpu/drm/xe/xe_exec_queue_types.h     |  13 +
 drivers/gpu/drm/xe/xe_execlist.c             |   2 +-
 drivers/gpu/drm/xe/xe_ggtt.c                 |  19 +-
 drivers/gpu/drm/xe/xe_ggtt.h                 |   2 +
 drivers/gpu/drm/xe/xe_gpu_scheduler.c        |  19 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler.h        |  12 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler_types.h  |   2 +
 drivers/gpu/drm/xe/xe_guc_exec_queue_types.h |   9 +-
 drivers/gpu/drm/xe/xe_guc_submit.c           | 177 +++++++-
 drivers/gpu/drm/xe/xe_guc_submit_types.h     |   2 +
 drivers/gpu/drm/xe/xe_hw_engine.c            |   4 +-
 drivers/gpu/drm/xe/xe_hw_engine_group.c      |   4 +-
 drivers/gpu/drm/xe/xe_hw_fence.c             |  17 +
 drivers/gpu/drm/xe/xe_hw_fence.h             |   3 +
 drivers/gpu/drm/xe/xe_lrc.c                  | 176 ++++++--
 drivers/gpu/drm/xe/xe_lrc.h                  |   4 +-
 drivers/gpu/drm/xe/xe_lrc_types.h            |  16 +-
 drivers/gpu/drm/xe/xe_pci.c                  |   1 +
 drivers/gpu/drm/xe/xe_preempt_fence.c        |  89 ++--
 drivers/gpu/drm/xe/xe_preempt_fence.h        |   2 +-
 drivers/gpu/drm/xe/xe_preempt_fence_types.h  |  11 +-
 drivers/gpu/drm/xe/xe_pt.c                   |   5 +-
 drivers/gpu/drm/xe/xe_sync.c                 |  90 ++++
 drivers/gpu/drm/xe/xe_sync.h                 |   8 +
 drivers/gpu/drm/xe/xe_sync_types.h           |   5 +-
 drivers/gpu/drm/xe/xe_vm.c                   | 423 ++++++++++++++++++-
 drivers/gpu/drm/xe/xe_vm.h                   |   4 +-
 drivers/gpu/drm/xe/xe_vm_types.h             |  26 ++
 include/linux/dma-fence-preempt.h            |  56 +++
 include/linux/dma-fence-user-fence.h         |  31 ++
 include/uapi/drm/xe_drm.h                    | 147 ++++++-
 42 files changed, 1798 insertions(+), 199 deletions(-)
 create mode 100644 drivers/dma-buf/dma-fence-preempt.c
 create mode 100644 drivers/dma-buf/dma-fence-user-fence.c
 create mode 100644 include/linux/dma-fence-preempt.h
 create mode 100644 include/linux/dma-fence-user-fence.h