mbox series

[v8,0/8] Add support for SVM atomics in Nouveau

Message ID 20210407084238.20443-1-apopple@nvidia.com (mailing list archive)
Headers show
Series Add support for SVM atomics in Nouveau | expand

Message

Alistair Popple April 7, 2021, 8:42 a.m. UTC
This is the eighth version of a series to add support to Nouveau for atomic
memory operations on OpenCL shared virtual memory (SVM) regions.

The main change for this version is a simplification of device exclusive
entry handling. Instead of copying entries for copy-on-write mappings
during fork they are removed instead. This is safer because there could be
unique corner cases when copying, particularly for pinned pages which
should follow the same logic as copy_present_page(). Removing entries
avoids this possiblity by treating them as normal ptes.

Exclusive device access is implemented by adding a new swap entry type
(SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main
difference is that on fault the original entry is immediately restored by
the fault handler instead of waiting.

Restoring the entry triggers calls to MMU notifers which allows a device
driver to revoke the atomic access permission from the GPU prior to the CPU
finalising the entry.

Patches 1 & 2 refactor existing migration and device private entry
functions.

Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated
functionality into separate functions - try_to_migrate_one() and
try_to_munlock_one(). These should not change any functionality, but any
help testing would be much appreciated as I have not been able to test
every usage of try_to_unmap_one().

Patch 5 contains the bulk of the implementation for device exclusive
memory.

Patch 6 contains some additions to the HMM selftests to ensure everything
works as expected.

Patch 7 is a cleanup for the Nouveau SVM implementation.

Patch 8 contains the implementation of atomic access for the Nouveau
driver.

This has been tested using the latest upstream Mesa userspace with a simple
OpenCL test program which checks the results of atomic GPU operations on a
SVM buffer whilst also writing to the same buffer from the CPU.

Alistair Popple (8):
  mm: Remove special swap entry functions
  mm/swapops: Rework swap entry manipulation code
  mm/rmap: Split try_to_munlock from try_to_unmap
  mm/rmap: Split migration into its own function
  mm: Device exclusive memory access
  mm: Selftests for exclusive device memory
  nouveau/svm: Refactor nouveau_range_fault
  nouveau/svm: Implement atomic SVM access

 Documentation/vm/hmm.rst                      |  19 +-
 Documentation/vm/unevictable-lru.rst          |  33 +-
 arch/s390/mm/pgtable.c                        |   2 +-
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 156 ++++-
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
 fs/proc/task_mmu.c                            |  23 +-
 include/linux/mmu_notifier.h                  |  26 +-
 include/linux/rmap.h                          |  11 +-
 include/linux/swap.h                          |   8 +-
 include/linux/swapops.h                       | 123 ++--
 lib/test_hmm.c                                | 126 +++-
 lib/test_hmm_uapi.h                           |   2 +
 mm/debug_vm_pgtable.c                         |  12 +-
 mm/hmm.c                                      |  12 +-
 mm/huge_memory.c                              |  45 +-
 mm/hugetlb.c                                  |  10 +-
 mm/memcontrol.c                               |   2 +-
 mm/memory.c                                   | 196 +++++-
 mm/migrate.c                                  |  51 +-
 mm/mlock.c                                    |  10 +-
 mm/mprotect.c                                 |  18 +-
 mm/page_vma_mapped.c                          |  15 +-
 mm/rmap.c                                     | 612 +++++++++++++++---
 tools/testing/selftests/vm/hmm-tests.c        | 158 +++++
 26 files changed, 1366 insertions(+), 312 deletions(-)

Comments

Alistair Popple May 6, 2021, 7:43 a.m. UTC | #1
Hi Andrew,

There is currently no outstanding feedback for this series so I am hoping it 
may be considered for inclusion (or at least the mm portions - I still need 
Reviews/Acks for the Nouveau bits). The main change for v8 was removal of 
entries on fork rather than copying in response to feedback from Jason so any 
follow up comments on patch 5 would also be welcome. The series contains a 
number of general clean-ups suggested by Christoph along with a feature to 
temporarily make selected user page mappings write-protected.

This is needed to support OpenCL atomic operations in Nouveau to shared 
virtual memory (SVM) regions allocated with the CL_MEM_SVM_ATOMICS clSVMAlloc 
flag. A more complete description of the OpenCL SVM feature is available at 
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/
OpenCL_API.html#_shared_virtual_memory .

I have been testing this with Mesa 21.1.0 and a simple OpenCL program which 
checks GPU atomic accesses to system memory are atomic. Without this series 
the test fails as there is no way of write-protecting the userspace page 
mapping which results in the device clobbering CPU writes. For reference the 
test is available at https://ozlabs.org/~apopple/opencl_svm_atomics/ .

 - Alistair

On Wednesday, 7 April 2021 6:42:30 PM AEST Alistair Popple wrote:
> This is the eighth version of a series to add support to Nouveau for atomic
> memory operations on OpenCL shared virtual memory (SVM) regions.
> 
> The main change for this version is a simplification of device exclusive
> entry handling. Instead of copying entries for copy-on-write mappings
> during fork they are removed instead. This is safer because there could be
> unique corner cases when copying, particularly for pinned pages which
> should follow the same logic as copy_present_page(). Removing entries
> avoids this possiblity by treating them as normal ptes.
> 
> Exclusive device access is implemented by adding a new swap entry type
> (SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main
> difference is that on fault the original entry is immediately restored by
> the fault handler instead of waiting.
> 
> Restoring the entry triggers calls to MMU notifers which allows a device
> driver to revoke the atomic access permission from the GPU prior to the CPU
> finalising the entry.
> 
> Patches 1 & 2 refactor existing migration and device private entry
> functions.
> 
> Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated
> functionality into separate functions - try_to_migrate_one() and
> try_to_munlock_one(). These should not change any functionality, but any
> help testing would be much appreciated as I have not been able to test
> every usage of try_to_unmap_one().
> 
> Patch 5 contains the bulk of the implementation for device exclusive
> memory.
> 
> Patch 6 contains some additions to the HMM selftests to ensure everything
> works as expected.
> 
> Patch 7 is a cleanup for the Nouveau SVM implementation.
> 
> Patch 8 contains the implementation of atomic access for the Nouveau
> driver.
> 
> This has been tested using the latest upstream Mesa userspace with a simple
> OpenCL test program which checks the results of atomic GPU operations on a
> SVM buffer whilst also writing to the same buffer from the CPU.
> 
> Alistair Popple (8):
>   mm: Remove special swap entry functions
>   mm/swapops: Rework swap entry manipulation code
>   mm/rmap: Split try_to_munlock from try_to_unmap
>   mm/rmap: Split migration into its own function
>   mm: Device exclusive memory access
>   mm: Selftests for exclusive device memory
>   nouveau/svm: Refactor nouveau_range_fault
>   nouveau/svm: Implement atomic SVM access
> 
>  Documentation/vm/hmm.rst                      |  19 +-
>  Documentation/vm/unevictable-lru.rst          |  33 +-
>  arch/s390/mm/pgtable.c                        |   2 +-
>  drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
>  drivers/gpu/drm/nouveau/nouveau_svm.c         | 156 ++++-
>  drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
>  .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c    |   6 +
>  fs/proc/task_mmu.c                            |  23 +-
>  include/linux/mmu_notifier.h                  |  26 +-
>  include/linux/rmap.h                          |  11 +-
>  include/linux/swap.h                          |   8 +-
>  include/linux/swapops.h                       | 123 ++--
>  lib/test_hmm.c                                | 126 +++-
>  lib/test_hmm_uapi.h                           |   2 +
>  mm/debug_vm_pgtable.c                         |  12 +-
>  mm/hmm.c                                      |  12 +-
>  mm/huge_memory.c                              |  45 +-
>  mm/hugetlb.c                                  |  10 +-
>  mm/memcontrol.c                               |   2 +-
>  mm/memory.c                                   | 196 +++++-
>  mm/migrate.c                                  |  51 +-
>  mm/mlock.c                                    |  10 +-
>  mm/mprotect.c                                 |  18 +-
>  mm/page_vma_mapped.c                          |  15 +-
>  mm/rmap.c                                     | 612 +++++++++++++++---
>  tools/testing/selftests/vm/hmm-tests.c        | 158 +++++
>  26 files changed, 1366 insertions(+), 312 deletions(-)
> 
>