Message ID: 20231031134059.171277-6-ishitatsuyuki@gmail.com (mailing list archive)
State: New, archived
Series: drm/amdgpu: Add flag to disable implicit sync for GEM operations.
On 10/31/23 14:40, Tatsuyuki Ishi wrote:
> In Vulkan, it is the application's responsibility to perform adequate
> synchronization before a sparse unmap, replace or BO destroy operation.
> Until now, the kernel applied the same rule as implicitly-synchronized
> APIs like OpenGL, which with per-VM BOs made page table updates stall the
> queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
> drivers to opt-out of this behavior, while still ensuring adequate implicit
> sync happens for kernel-initiated updates (e.g. BO moves).
>
> We record whether to use implicit sync or not for each freed mapping. To
> avoid increasing the mapping struct's size, this is union-ized with the
> interval tree field which is unused after the unmap.
>
> The reason this is done with a GEM ioctl flag, instead of being a VM /
> context global setting, is that the current libdrm implementation shares
> the DRM handle even between different kind of drivers (radeonsi vs radv).

Different drivers always use separate contexts though, even with the same
DRM file description, don't they?

FWIW, RADV will also want explicit sync in the CS ioctl.
On Tue, Oct 31, 2023 at 3:14 PM Michel Dänzer <michel.daenzer@mailbox.org> wrote:
> On 10/31/23 14:40, Tatsuyuki Ishi wrote:
> > In Vulkan, it is the application's responsibility to perform adequate
> > synchronization before a sparse unmap, replace or BO destroy operation.
> > Until now, the kernel applied the same rule as implicitly-synchronized
> > APIs like OpenGL, which with per-VM BOs made page table updates stall the
> > queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
> > drivers to opt-out of this behavior, while still ensuring adequate implicit
> > sync happens for kernel-initiated updates (e.g. BO moves).
> >
> > We record whether to use implicit sync or not for each freed mapping. To
> > avoid increasing the mapping struct's size, this is union-ized with the
> > interval tree field which is unused after the unmap.
> >
> > The reason this is done with a GEM ioctl flag, instead of being a VM /
> > context global setting, is that the current libdrm implementation shares
> > the DRM handle even between different kind of drivers (radeonsi vs radv).
>
> Different drivers always use separate contexts though, even with the same
> DRM file description, don't they?
>
> FWIW, RADV will also want explicit sync in the CS ioctl.

I think a crucial problem is that VA ioctls don't take a context, so a
per-context flag doesn't solve this (the previous attempt used it because
all the sync changes were on the CS submit side and not the VA ioctl side).
So I'd still like to solve that side for RADV, but I think the VA ioctl
flag makes sense here if we need to do anything different on the VA ioctl
side.

> --
> Earthling Michel Dänzer            |                  https://redhat.com
> Libre software enthusiast          |         Mesa and Xwayland developer
On 31.10.23 15:14, Michel Dänzer wrote:
> On 10/31/23 14:40, Tatsuyuki Ishi wrote:
>> In Vulkan, it is the application's responsibility to perform adequate
>> synchronization before a sparse unmap, replace or BO destroy operation.
>> Until now, the kernel applied the same rule as implicitly-synchronized
>> APIs like OpenGL, which with per-VM BOs made page table updates stall the
>> queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
>> drivers to opt-out of this behavior, while still ensuring adequate implicit
>> sync happens for kernel-initiated updates (e.g. BO moves).
>>
>> We record whether to use implicit sync or not for each freed mapping. To
>> avoid increasing the mapping struct's size, this is union-ized with the
>> interval tree field which is unused after the unmap.
>>
>> The reason this is done with a GEM ioctl flag, instead of being a VM /
>> context global setting, is that the current libdrm implementation shares
>> the DRM handle even between different kind of drivers (radeonsi vs radv).
> Different drivers always use separate contexts though, even with the same
> DRM file description, don't they?

Separate contexts don't help here since the VA space is shared between the
two.

> FWIW, RADV will also want explicit sync in the CS ioctl.

You can replace that with the DMA-buf IOCTLs like Faith is planning to do
for NVK.

Regards,
Christian.
On 10/31/23 15:34, Christian König wrote:
> On 31.10.23 15:14, Michel Dänzer wrote:
>
>> FWIW, RADV will also want explicit sync in the CS ioctl.
>
> You can replace that with the DMA-buf IOCTLs like Faith is planning to do
> for NVK.

Those ioctls cannot disable implicit sync for the CS ioctl. They can be
used for making implicit sync work correctly for individual BOs though,
once implicit sync is disabled for the CS ioctl.
Hi Tatsuyuki,
kernel test robot noticed the following build warnings:
[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on drm-exynos/exynos-drm-next drm-intel/for-linux-next-fixes linus/master v6.6]
[cannot apply to drm/drm-next drm-intel/for-linux-next drm-tip/drm-tip next-20231031]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Tatsuyuki-Ishi/drm-amdgpu-Don-t-implicit-sync-PRT-maps/20231031-224530
base: git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link: https://lore.kernel.org/r/20231031134059.171277-6-ishitatsuyuki%40gmail.com
patch subject: [PATCH 5/6] drm/amdgpu: Add flag to disable implicit sync for GEM operations.
config: arc-randconfig-001-20231101 (https://download.01.org/0day-ci/archive/20231101/202311011037.Bt6NSYwA-lkp@intel.com/config)
compiler: arceb-elf-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231101/202311011037.Bt6NSYwA-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311011037.Bt6NSYwA-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:178: warning: Function parameter or member 'evicted' not described in 'amdgpu_vm_bo_set_evicted'
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:178: warning: expecting prototype for amdgpu_vm_bo_evicted(). Prototype was for amdgpu_vm_bo_set_evicted() instead
>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1667: warning: Function parameter or member 'sync_unmap' not described in 'amdgpu_vm_bo_unmap'
vim +1667 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
d38ceaf99ed015 Alex Deucher 2015-04-20 1650
d38ceaf99ed015 Alex Deucher 2015-04-20 1651 /**
d38ceaf99ed015 Alex Deucher 2015-04-20 1652 * amdgpu_vm_bo_unmap - remove bo mapping from vm
d38ceaf99ed015 Alex Deucher 2015-04-20 1653 *
d38ceaf99ed015 Alex Deucher 2015-04-20 1654 * @adev: amdgpu_device pointer
d38ceaf99ed015 Alex Deucher 2015-04-20 1655 * @bo_va: bo_va to remove the address from
d38ceaf99ed015 Alex Deucher 2015-04-20 1656 * @saddr: where to the BO is mapped
d38ceaf99ed015 Alex Deucher 2015-04-20 1657 *
d38ceaf99ed015 Alex Deucher 2015-04-20 1658 * Remove a mapping of the BO at the specefied addr from the VM.
7fc48e5912795c Andrey Grodzovsky 2018-06-11 1659 *
7fc48e5912795c Andrey Grodzovsky 2018-06-11 1660 * Returns:
7fc48e5912795c Andrey Grodzovsky 2018-06-11 1661 * 0 for success, error for failure.
d38ceaf99ed015 Alex Deucher 2015-04-20 1662 *
49b02b180a541d Chunming Zhou 2015-11-13 1663 * Object has to be reserved and unreserved outside!
d38ceaf99ed015 Alex Deucher 2015-04-20 1664 */
1550024e9de031 Tatsuyuki Ishi 2023-10-31 1665 int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
1550024e9de031 Tatsuyuki Ishi 2023-10-31 1666 uint64_t saddr, bool sync_unmap)
d38ceaf99ed015 Alex Deucher 2015-04-20 @1667 {
d38ceaf99ed015 Alex Deucher 2015-04-20 1668 struct amdgpu_bo_va_mapping *mapping;
ec681545afe5a4 Christian König 2017-08-01 1669 struct amdgpu_vm *vm = bo_va->base.vm;
7fc11959018f8b Christian König 2015-07-30 1670 bool valid = true;
d38ceaf99ed015 Alex Deucher 2015-04-20 1671
6c7fc503a47f9b Christian König 2015-06-05 1672 saddr /= AMDGPU_GPU_PAGE_SIZE;
32b41ac21fde8f Christian König 2016-03-08 1673
7fc11959018f8b Christian König 2015-07-30 1674 list_for_each_entry(mapping, &bo_va->valids, list) {
a9f87f64525435 Christian König 2017-03-30 1675 if (mapping->start == saddr)
7fc11959018f8b Christian König 2015-07-30 1676 break;
7fc11959018f8b Christian König 2015-07-30 1677 }
7fc11959018f8b Christian König 2015-07-30 1678
7fc11959018f8b Christian König 2015-07-30 1679 if (&mapping->list == &bo_va->valids) {
7fc11959018f8b Christian König 2015-07-30 1680 valid = false;
7fc11959018f8b Christian König 2015-07-30 1681
7fc11959018f8b Christian König 2015-07-30 1682 list_for_each_entry(mapping, &bo_va->invalids, list) {
a9f87f64525435 Christian König 2017-03-30 1683 if (mapping->start == saddr)
d38ceaf99ed015 Alex Deucher 2015-04-20 1684 break;
d38ceaf99ed015 Alex Deucher 2015-04-20 1685 }
d38ceaf99ed015 Alex Deucher 2015-04-20 1686
32b41ac21fde8f Christian König 2016-03-08 1687 if (&mapping->list == &bo_va->invalids)
d38ceaf99ed015 Alex Deucher 2015-04-20 1688 return -ENOENT;
d38ceaf99ed015 Alex Deucher 2015-04-20 1689 }
32b41ac21fde8f Christian König 2016-03-08 1690
d38ceaf99ed015 Alex Deucher 2015-04-20 1691 list_del(&mapping->list);
a9f87f64525435 Christian König 2017-03-30 1692 amdgpu_vm_it_remove(mapping, &vm->va);
aebc5e6f50f770 Christian König 2017-09-06 1693 mapping->bo_va = NULL;
1550024e9de031 Tatsuyuki Ishi 2023-10-31 1694 mapping->sync_unmap = sync_unmap;
93e3e4385b69d8 Christian König 2015-06-09 1695 trace_amdgpu_vm_bo_unmap(bo_va, mapping);
d38ceaf99ed015 Alex Deucher 2015-04-20 1696
e17841b97587ad Christian König 2016-03-08 1697 if (valid)
d38ceaf99ed015 Alex Deucher 2015-04-20 1698 list_add(&mapping->list, &vm->freed);
e17841b97587ad Christian König 2016-03-08 1699 else
284710fa6c3a5f Christian König 2017-01-30 1700 amdgpu_vm_free_mapping(adev, vm, mapping,
284710fa6c3a5f Christian König 2017-01-30 1701 bo_va->last_pt_update);
d38ceaf99ed015 Alex Deucher 2015-04-20 1702
d38ceaf99ed015 Alex Deucher 2015-04-20 1703 return 0;
d38ceaf99ed015 Alex Deucher 2015-04-20 1704 }
d38ceaf99ed015 Alex Deucher 2015-04-20 1705
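For reference, the kernel-doc warning above would go away once the new parameter is described. A possible fixup to the quoted comment block (a sketch, not part of the posted series; the @sync_unmap wording mirrors the descriptions the patch adds elsewhere):

```c
/**
 * amdgpu_vm_bo_unmap - remove bo mapping from vm
 *
 * @adev: amdgpu_device pointer
 * @bo_va: bo_va to remove the address from
 * @saddr: where to the BO is mapped
 * @sync_unmap: wait for BO users before unmapping
 *
 * Remove a mapping of the BO at the specefied addr from the VM.
 *
 * Returns:
 * 0 for success, error for failure.
 *
 * Object has to be reserved and unreserved outside!
 */
```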
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7d6daf8d2bfa..10e129bff977 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1196,7 +1196,7 @@ static void unmap_bo_from_gpuvm(struct kgd_mem *mem,
 	struct amdgpu_device *adev = entry->adev;
 	struct amdgpu_vm *vm = bo_va->base.vm;
 
-	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
+	amdgpu_vm_bo_unmap(adev, bo_va, entry->va, true);
 
 	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
index 720011019741..612279e65bff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
@@ -122,7 +122,7 @@ int amdgpu_unmap_static_csa(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		}
 	}
 
-	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr);
+	r = amdgpu_vm_bo_unmap(adev, bo_va, csa_addr, true);
 	if (r) {
 		DRM_ERROR("failed to do bo_unmap on static CSA, err=%d\n", r);
 		goto error;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 01d3a97248b0..0d9496a06947 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -672,9 +672,9 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	const uint32_t valid_flags = AMDGPU_VM_DELAY_UPDATE |
 		AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE |
 		AMDGPU_VM_PAGE_EXECUTABLE | AMDGPU_VM_MTYPE_MASK |
-		AMDGPU_VM_PAGE_NOALLOC;
+		AMDGPU_VM_PAGE_NOALLOC | AMDGPU_VM_EXPLICIT_SYNC;
 	const uint32_t prt_flags = AMDGPU_VM_DELAY_UPDATE |
-		AMDGPU_VM_PAGE_PRT;
+		AMDGPU_VM_PAGE_PRT | AMDGPU_VM_EXPLICIT_SYNC;
 
 	struct drm_amdgpu_gem_va *args = data;
 	struct drm_gem_object *gobj;
@@ -685,6 +685,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 	struct drm_exec exec;
 	uint64_t va_flags;
 	uint64_t vm_size;
+	bool sync_unmap;
 	int r = 0;
 
 	if (args->va_address < AMDGPU_VA_RESERVED_SIZE) {
@@ -720,6 +721,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
+	sync_unmap = !(args->flags & AMDGPU_VM_EXPLICIT_SYNC);
+
 	switch (args->operation) {
 	case AMDGPU_VA_OP_MAP:
 	case AMDGPU_VA_OP_UNMAP:
@@ -779,19 +782,20 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
 				     va_flags);
 		break;
 	case AMDGPU_VA_OP_UNMAP:
-		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address);
+		r = amdgpu_vm_bo_unmap(adev, bo_va, args->va_address,
+				       sync_unmap);
 		break;
 
 	case AMDGPU_VA_OP_CLEAR:
 		r = amdgpu_vm_bo_clear_mappings(adev, &fpriv->vm,
 						args->va_address,
-						args->map_size);
+						args->map_size, sync_unmap);
 		break;
 	case AMDGPU_VA_OP_REPLACE:
 		va_flags = amdgpu_gem_va_map_flags(adev, args->flags);
 		r = amdgpu_vm_bo_replace_map(adev, bo_va, args->va_address,
 					     args->offset_in_bo, args->map_size,
-					     va_flags);
+					     va_flags, sync_unmap);
 		break;
 	default:
 		break;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index f3ee83cdf97e..28be03f1bbcf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -67,7 +67,12 @@ struct amdgpu_bo_va_mapping {
 	struct rb_node			rb;
 	uint64_t			start;
 	uint64_t			last;
-	uint64_t			__subtree_last;
+	union {
+		/* BOs in interval tree only */
+		uint64_t		__subtree_last;
+		/* Freed BOs only */
+		bool			sync_unmap;
+	};
 	uint64_t			offset;
 	uint64_t			flags;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
index 2fd1bfb35916..e71443c8c59b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
@@ -276,6 +276,7 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
 			     __field(long, last)
 			     __field(u64, offset)
 			     __field(u64, flags)
+			     __field(bool, sync_unmap)
 			     ),
 
 	    TP_fast_assign(
@@ -284,10 +285,11 @@ TRACE_EVENT(amdgpu_vm_bo_unmap,
 			   __entry->last = mapping->last;
 			   __entry->offset = mapping->offset;
 			   __entry->flags = mapping->flags;
+			   __entry->sync_unmap = mapping->sync_unmap;
 			   ),
 
-	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx",
+	    TP_printk("bo=%p, start=%lx, last=%lx, offset=%010llx, flags=%llx, sync_unmap=%d",
 		      __entry->bo, __entry->start, __entry->last,
-		      __entry->offset, __entry->flags)
+		      __entry->offset, __entry->flags, __entry->sync_unmap)
 );
 
 DECLARE_EVENT_CLASS(amdgpu_vm_mapping,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 50f7cee639ac..a15463e0bbc5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -861,6 +861,7 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
  * @immediate: immediate submission in a page fault
  * @unlocked: unlocked invalidation during MM callback
  * @flush_tlb: trigger tlb invalidation after update completed
+ * @sync_unmap: wait for BO users before unmapping
  * @resv: fences we need to sync to
  * @start: start of mapped range
  * @last: last mapped entry
@@ -878,8 +879,9 @@ static void amdgpu_vm_tlb_seq_cb(struct dma_fence *fence,
  */
 int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			   bool immediate, bool unlocked, bool flush_tlb,
-			   struct dma_resv *resv, uint64_t start, uint64_t last,
-			   uint64_t flags, uint64_t offset, uint64_t vram_base,
+			   bool sync_unmap, struct dma_resv *resv,
+			   uint64_t start, uint64_t last, uint64_t flags,
+			   uint64_t offset, uint64_t vram_base,
 			   struct ttm_resource *res, dma_addr_t *pages_addr,
 			   struct dma_fence **fence)
 {
@@ -919,7 +921,7 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	/* Implicitly sync to command submissions in the same VM before
 	 * unmapping. Sync to moving fences before mapping.
 	 */
-	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)))
+	if (!(flags & (AMDGPU_PTE_VALID | AMDGPU_PTE_PRT)) && sync_unmap)
 		sync_mode = AMDGPU_SYNC_EQ_OWNER;
 	else
 		sync_mode = AMDGPU_SYNC_EXPLICIT;
@@ -1165,10 +1167,10 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
 		trace_amdgpu_vm_bo_update(mapping);
 
 		r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb,
-					   resv, mapping->start, mapping->last,
-					   update_flags, mapping->offset,
-					   vram_base, mem, pages_addr,
-					   last_update);
+					   true, resv, mapping->start,
+					   mapping->last, update_flags,
+					   mapping->offset, vram_base, mem,
+					   pages_addr, last_update);
 		if (r)
 			return r;
 	}
@@ -1349,7 +1351,8 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 		    mapping->start < AMDGPU_GMC_HOLE_START)
 			init_pte_value = AMDGPU_PTE_DEFAULT_ATC;
 
-		r = amdgpu_vm_update_range(adev, vm, false, false, true, resv,
+		r = amdgpu_vm_update_range(adev, vm, false, false, true,
+					   mapping->sync_unmap, resv,
 					   mapping->start, mapping->last,
 					   init_pte_value, 0, 0, NULL, NULL,
 					   &f);
@@ -1589,6 +1592,7 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
  * @offset: requested offset in the BO
  * @size: BO size in bytes
  * @flags: attributes of pages (read/write/valid/etc.)
+ * @sync_unmap: wait for BO users before replacing existing mapping
  *
  * Add a mapping of the BO at the specefied addr into the VM. Replace existing
  * mappings as we do so.
@@ -1599,9 +1603,9 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
  * Object has to be reserved and unreserved outside!
  */
 int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
-			     struct amdgpu_bo_va *bo_va,
-			     uint64_t saddr, uint64_t offset,
-			     uint64_t size, uint64_t flags)
+			     struct amdgpu_bo_va *bo_va, uint64_t saddr,
+			     uint64_t offset, uint64_t size, uint64_t flags,
+			     bool sync_unmap)
 {
 	struct amdgpu_bo_va_mapping *mapping;
 	struct amdgpu_bo *bo = bo_va->base.bo;
@@ -1625,7 +1629,7 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
 	if (!mapping)
 		return -ENOMEM;
 
-	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size);
+	r = amdgpu_vm_bo_clear_mappings(adev, bo_va->base.vm, saddr, size, sync_unmap);
 	if (r) {
 		kfree(mapping);
 		return r;
@@ -1658,9 +1662,8 @@ int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
  *
  * Object has to be reserved and unreserved outside!
  */
-int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
-		       struct amdgpu_bo_va *bo_va,
-		       uint64_t saddr)
+int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
+		       uint64_t saddr, bool sync_unmap)
 {
 	struct amdgpu_bo_va_mapping *mapping;
 	struct amdgpu_vm *vm = bo_va->base.vm;
@@ -1688,6 +1691,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
 	list_del(&mapping->list);
 	amdgpu_vm_it_remove(mapping, &vm->va);
 	mapping->bo_va = NULL;
+	mapping->sync_unmap = sync_unmap;
 	trace_amdgpu_vm_bo_unmap(bo_va, mapping);
 
 	if (valid)
@@ -1706,6 +1710,7 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
  * @vm: VM structure to use
  * @saddr: start of the range
  * @size: size of the range
+ * @sync_unmap: wait for BO users before unmapping
  *
  * Remove all mappings in a range, split them as appropriate.
  *
@@ -1713,8 +1718,8 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
  * 0 for success, error for failure.
  */
 int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
-				struct amdgpu_vm *vm,
-				uint64_t saddr, uint64_t size)
+				struct amdgpu_vm *vm, uint64_t saddr,
+				uint64_t size, bool sync_unmap)
 {
 	struct amdgpu_bo_va_mapping *before, *after, *tmp, *next;
 	LIST_HEAD(removed);
@@ -1782,6 +1787,7 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
 			tmp->last = eaddr;
 
 		tmp->bo_va = NULL;
+		tmp->sync_unmap = sync_unmap;
 		list_add(&tmp->list, &vm->freed);
 		trace_amdgpu_vm_bo_unmap(NULL, tmp);
 	}
@@ -1899,6 +1905,7 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 		list_del(&mapping->list);
 		amdgpu_vm_it_remove(mapping, &vm->va);
 		mapping->bo_va = NULL;
+		mapping->sync_unmap = true;
 		trace_amdgpu_vm_bo_unmap(bo_va, mapping);
 		list_add(&mapping->list, &vm->freed);
 	}
@@ -2481,20 +2488,19 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	struct amdgpu_device *adev = drm_to_adev(dev);
 	struct amdgpu_fpriv *fpriv = filp->driver_priv;
 
-	/* No valid flags defined yet */
-	if (args->in.flags)
-		return -EINVAL;
-
 	switch (args->in.op) {
 	case AMDGPU_VM_OP_RESERVE_VMID:
+		if (args->in.flags)
+			return -EINVAL;
 		/* We only have requirement to reserve vmid from gfxhub */
 		if (!fpriv->vm.reserved_vmid[AMDGPU_GFXHUB(0)]) {
 			amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(0));
 			fpriv->vm.reserved_vmid[AMDGPU_GFXHUB(0)] = true;
 		}
-
 		break;
 	case AMDGPU_VM_OP_UNRESERVE_VMID:
+		if (args->in.flags)
+			return -EINVAL;
 		if (fpriv->vm.reserved_vmid[AMDGPU_GFXHUB(0)]) {
 			amdgpu_vmid_free_reserved(adev, AMDGPU_GFXHUB(0));
 			fpriv->vm.reserved_vmid[AMDGPU_GFXHUB(0)] = false;
@@ -2633,8 +2639,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
 		goto error_unlock;
 	}
 
-	r = amdgpu_vm_update_range(adev, vm, true, false, false, NULL, addr,
-				   addr, flags, value, 0, NULL, NULL, NULL);
+	r = amdgpu_vm_update_range(adev, vm, true, false, false, true, NULL,
+				   addr, addr, flags, value, 0, NULL, NULL,
+				   NULL);
 	if (r)
 		goto error_unlock;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index f91d4fcf80b8..3574987595d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -427,12 +427,12 @@ void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base,
 			    struct amdgpu_vm *vm, struct amdgpu_bo *bo);
 int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 			   bool immediate, bool unlocked, bool flush_tlb,
-			   struct dma_resv *resv, uint64_t start, uint64_t last,
-			   uint64_t flags, uint64_t offset, uint64_t vram_base,
+			   bool sync_unmap, struct dma_resv *resv,
+			   uint64_t start, uint64_t last, uint64_t flags,
+			   uint64_t offset, uint64_t vram_base,
 			   struct ttm_resource *res, dma_addr_t *pages_addr,
 			   struct dma_fence **fence);
-int amdgpu_vm_bo_update(struct amdgpu_device *adev,
-			struct amdgpu_bo_va *bo_va,
+int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
 			bool clear);
 bool amdgpu_vm_evictable(struct amdgpu_bo *bo);
 void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
@@ -448,15 +448,14 @@ int amdgpu_vm_bo_map(struct amdgpu_device *adev,
 		     uint64_t addr, uint64_t offset,
 		     uint64_t size, uint64_t flags);
 int amdgpu_vm_bo_replace_map(struct amdgpu_device *adev,
-			     struct amdgpu_bo_va *bo_va,
-			     uint64_t addr, uint64_t offset,
-			     uint64_t size, uint64_t flags);
-int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
-		       struct amdgpu_bo_va *bo_va,
-		       uint64_t addr);
+			     struct amdgpu_bo_va *bo_va, uint64_t addr,
+			     uint64_t offset, uint64_t size, uint64_t flags,
+			     bool sync_unmap);
+int amdgpu_vm_bo_unmap(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
+		       uint64_t addr, bool sync_unmap);
 int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
-				struct amdgpu_vm *vm,
-				uint64_t saddr, uint64_t size);
+				struct amdgpu_vm *vm, uint64_t saddr,
+				uint64_t size, bool sync_unmap);
 struct amdgpu_bo_va_mapping *amdgpu_vm_bo_lookup_mapping(struct amdgpu_vm *vm,
 							 uint64_t addr);
 void amdgpu_vm_bo_trace_cs(struct amdgpu_vm *vm, struct ww_acquire_ctx *ticket);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index bb16b795d1bc..6eb4a0a4bc84 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1291,9 +1291,9 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
 	pr_debug("[0x%llx 0x%llx]\n", start, last);
 
-	return amdgpu_vm_update_range(adev, vm, false, true, true, NULL, start,
-				      last, init_pte_value, 0, 0, NULL, NULL,
-				      fence);
+	return amdgpu_vm_update_range(adev, vm, false, true, true, true, NULL,
+				      start, last, init_pte_value, 0, 0, NULL,
+				      NULL, fence);
 }
 
 static int
@@ -1398,12 +1398,12 @@ svm_range_map_to_gpu(struct kfd_process_device *pdd, struct svm_range *prange,
 	 * different memory partition based on fpfn/lpfn, we should use
 	 * same vm_manager.vram_base_offset regardless memory partition.
 	 */
-	r = amdgpu_vm_update_range(adev, vm, false, false, flush_tlb, NULL,
-				   last_start, prange->start + i,
-				   pte_flags,
-				   (last_start - prange->start) << PAGE_SHIFT,
-				   bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
-				   NULL, dma_addr, &vm->last_update);
+	r = amdgpu_vm_update_range(
+		adev, vm, false, false, flush_tlb, true, NULL,
+		last_start, prange->start + i, pte_flags,
+		(last_start - prange->start) << PAGE_SHIFT,
+		bo_adev ? bo_adev->vm_manager.vram_base_offset : 0,
+		NULL, dma_addr, &vm->last_update);
 
 	for (j = last_start - prange->start; j <= i; j++)
 		dma_addr[j] |= last_domain;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index f477eda6a2b8..3cdcc299956e 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -556,6 +556,8 @@ struct drm_amdgpu_gem_op {
 #define AMDGPU_VM_MTYPE_RW		(5 << 5)
 /* don't allocate MALL */
 #define AMDGPU_VM_PAGE_NOALLOC		(1 << 9)
+/* don't sync on unmap */
+#define AMDGPU_VM_EXPLICIT_SYNC		(1 << 10)
 
 struct drm_amdgpu_gem_va {
 	/** GEM object handle */
In Vulkan, it is the application's responsibility to perform adequate
synchronization before a sparse unmap, replace or BO destroy operation.
Until now, the kernel applied the same rule as implicitly-synchronized
APIs like OpenGL, which with per-VM BOs made page table updates stall the
queue completely. The newly added AMDGPU_VM_EXPLICIT_SYNC flag allows
drivers to opt out of this behavior, while still ensuring adequate implicit
sync happens for kernel-initiated updates (e.g. BO moves).

We record whether to use implicit sync or not for each freed mapping. To
avoid increasing the mapping struct's size, this is union-ized with the
interval tree field, which is unused after the unmap.

The reason this is done with a GEM ioctl flag, instead of being a VM /
context global setting, is that the current libdrm implementation shares
the DRM handle even between different kinds of drivers (radeonsi vs radv).

Signed-off-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c      |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c      | 14 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h   |  7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h    |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c       | 55 +++++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h       | 23 ++++----
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c         | 18 +++---
 include/uapi/drm/amdgpu_drm.h                |  2 +
 9 files changed, 74 insertions(+), 55 deletions(-)