mbox series

[00/11] Introduce drm evictable lru

Message ID 20231102043306.2931989-1-oak.zeng@intel.com (mailing list archive)
Headers show
Series Introduce drm evictable lru | expand

Message

Zeng, Oak Nov. 2, 2023, 4:32 a.m. UTC
We plan to implement xe driver's shared virtual memory
manager (aka SVM) without buffer object concept. This
means we won't build our shared virtual memory manager
upon TTM infrastructure like amdgpu does.

Even though this approach is more efficient, it does
create a problem for memory eviction when there is
memory pressure: memory allocated by SVM and memory
allocated by TTM should be able to mutually evict
from each other. TTM's resource manager maintains
a LRU list for each memory type and this list is used
to pick up the memory eviction victim. Since we don't
use TTM for SVM implementation, SVM allocated memory
can't be added to TTM resource manager's LRU list. Thus
SVM allocated memory and TTM allocated memory are not
mutually evictable.

See more discussion on this topic here:
https://www.spinics.net/lists/dri-devel/msg410740.html

This series solve this problem by creating a shared
LRU list b/t SVM and TTM, or any other resource manager.

The basic idea is, abstract a drm_lru_entity structure
which is supposed to be embedded in ttm_resource structure,
or any other resource manager. The resource LRU list is a 
list of drm_lru_entity. drm_lru_entity has eviction function
pointers which can be used to call back drivers' specific
eviction function to evict a memory resource.

Introduce global drm_lru_manager to struct drm_device
to manage LRU lists. Each memory type or memory region
can have a LRU list. TTM resource manager's LRU list functions
including bulk move functions are moved to drm lru manager.
drm lru manager provides a evict_first function to evict
the first memory resource from LRU list. This function can
be called from TTM, SVM or any other resource manager, so
all the memory allocated in the drm sub-system can be mutually
evicted.

The lru_lock is also moved from struct ttm_device to struct 
drm_device.

Opens:
1) memory accounting: currently the ttm resource manager's
memory accounting functions is kept at ttm resource manager.
Since memory accounting should be cross TTM and SVM, it should
be ideally in the drm lru manager layer. This will be polished
in the future.

2) eviction callback function interface: The current eviction
function interface is designed to meet TTM memory eviction
requirements. When SVM is in the picture, this interface
need to be futher tunned to meet SVM requirement also. 

This series is not tested and it is only compiled for xe
driver. Some minor changes are needed for other driver
such as amdgpu, nouveau etc. I intended to send this out
as a request for comment series to get some early feedback,
to see whether this is the right direction to go. I will
futher polish this series after a direction is agreed.

Oak Zeng (11):
  drm/ttm: re-parameter ttm_device_init
  drm: move lru_lock from ttm_device to drm_device
  drm: introduce drm evictable LRU
  drm: Add evict function pointer to drm lru entity
  drm: Replace ttm macros with drm macros
  drm/ttm: Set lru manager to ttm resource manager
  drm/ttm: re-parameterize a few ttm functions
  drm: Initialize drm lru manager
  drm/ttm: Use drm LRU manager iterator
  drm/ttm: Implement ttm memory evict functions
  drm/ttm: Write ttm functions using drm lru manager functions

 drivers/gpu/drm/Makefile                      |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |   6 +
 .../gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c   |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |  10 +-
 drivers/gpu/drm/drm_drv.c                     |   1 +
 drivers/gpu/drm/drm_evictable_lru.c           | 266 ++++++++++++++++++
 drivers/gpu/drm/drm_gem_vram_helper.c         |  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |   6 +-
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |  10 +
 drivers/gpu/drm/i915/intel_region_ttm.c       |   4 +-
 drivers/gpu/drm/i915/selftests/mock_region.c  |   2 +-
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  10 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c         |  22 +-
 drivers/gpu/drm/qxl/qxl_ttm.c                 |   6 +-
 drivers/gpu/drm/radeon/radeon_ttm.c           |  10 +-
 drivers/gpu/drm/ttm/tests/ttm_device_test.c   |   2 +-
 drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |   2 +-
 drivers/gpu/drm/ttm/ttm_bo.c                  | 247 ++++++++++++----
 drivers/gpu/drm/ttm/ttm_bo_util.c             |  20 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c               |   2 +-
 drivers/gpu/drm/ttm/ttm_device.c              |  55 ++--
 drivers/gpu/drm/ttm/ttm_module.h              |   3 +-
 drivers/gpu/drm/ttm/ttm_range_manager.c       |  14 +-
 drivers/gpu/drm/ttm/ttm_resource.c            | 242 +++-------------
 drivers/gpu/drm/ttm/ttm_sys_manager.c         |   8 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c            |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_bo.h            |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |   6 +-
 .../gpu/drm/vmwgfx/vmwgfx_system_manager.c    |   6 +
 drivers/gpu/drm/xe/xe_bo.c                    |  48 ++--
 drivers/gpu/drm/xe/xe_bo.h                    |   5 +-
 drivers/gpu/drm/xe/xe_device.c                |   2 +-
 drivers/gpu/drm/xe/xe_dma_buf.c               |   4 +-
 drivers/gpu/drm/xe/xe_exec.c                  |   6 +-
 drivers/gpu/drm/xe/xe_migrate.c               |   6 +-
 drivers/gpu/drm/xe/xe_res_cursor.h            |  10 +-
 drivers/gpu/drm/xe/xe_ttm_sys_mgr.c           |   8 +-
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c          |  18 +-
 drivers/gpu/drm/xe/xe_vm.c                    |   6 +-
 drivers/gpu/drm/xe/xe_vm_types.h              |   2 +-
 include/drm/drm_device.h                      |  12 +
 include/drm/drm_evictable_lru.h               | 260 +++++++++++++++++
 include/drm/ttm/ttm_bo.h                      |  10 +-
 include/drm/ttm/ttm_device.h                  |  13 +-
 include/drm/ttm/ttm_range_manager.h           |  17 +-
 include/drm/ttm/ttm_resource.h                | 117 +++-----
 49 files changed, 1042 insertions(+), 501 deletions(-)
 create mode 100644 drivers/gpu/drm/drm_evictable_lru.c
 create mode 100644 include/drm/drm_evictable_lru.h

Comments

Thomas Hellstrom Dec. 21, 2023, 1:12 p.m. UTC | #1
Hi Oak, Christian

On 11/2/23 05:32, Oak Zeng wrote:
> We plan to implement xe driver's shared virtual memory
> manager (aka SVM) without buffer object concept. This
> means we won't build our shared virtual memory manager
> upon TTM infrastructure like amdgpu does.
>
> Even though this approach is more efficient, it does
> create a problem for memory eviction when there is
> memory pressure: memory allocated by SVM and memory
> allocated by TTM should be able to mutually evict
> from each other. TTM's resource manager maintains
> a LRU list for each memory type and this list is used
> to pick up the memory eviction victim. Since we don't
> use TTM for SVM implementation, SVM allocated memory
> can't be added to TTM resource manager's LRU list. Thus
> SVM allocated memory and TTM allocated memory are not
> mutually evictable.
>
> See more discussion on this topic here:
> https://www.spinics.net/lists/dri-devel/msg410740.html
>
> This series solve this problem by creating a shared
> LRU list b/t SVM and TTM, or any other resource manager.
>
> The basic idea is, abstract a drm_lru_entity structure
> which is supposed to be embedded in ttm_resource structure,
> or any other resource manager. The resource LRU list is a
> list of drm_lru_entity. drm_lru_entity has eviction function
> pointers which can be used to call back drivers' specific
> eviction function to evict a memory resource.
>
> Introduce global drm_lru_manager to struct drm_device
> to manage LRU lists. Each memory type or memory region
> can have a LRU list. TTM resource manager's LRU list functions
> including bulk move functions are moved to drm lru manager.
> drm lru manager provides a evict_first function to evict
> the first memory resource from LRU list. This function can
> be called from TTM, SVM or any other resource manager, so
> all the memory allocated in the drm sub-system can be mutually
> evicted.
>
> The lru_lock is also moved from struct ttm_device to struct
> drm_device.
>
> Opens:
> 1) memory accounting: currently the ttm resource manager's
> memory accounting functions is kept at ttm resource manager.
> Since memory accounting should be cross TTM and SVM, it should
> be ideally in the drm lru manager layer. This will be polished
> in the future.
>
> 2) eviction callback function interface: The current eviction
> function interface is designed to meet TTM memory eviction
> requirements. When SVM is in the picture, this interface
> need to be futher tunned to meet SVM requirement also.
>
> This series is not tested and it is only compiled for xe
> driver. Some minor changes are needed for other driver
> such as amdgpu, nouveau etc. I intended to send this out
> as a request for comment series to get some early feedback,
> to see whether this is the right direction to go. I will
> futher polish this series after a direction is agreed.
>
> Oak Zeng (11):
>    drm/ttm: re-parameter ttm_device_init
>    drm: move lru_lock from ttm_device to drm_device
>    drm: introduce drm evictable LRU
>    drm: Add evict function pointer to drm lru entity
>    drm: Replace ttm macros with drm macros
>    drm/ttm: Set lru manager to ttm resource manager
>    drm/ttm: re-parameterize a few ttm functions
>    drm: Initialize drm lru manager
>    drm/ttm: Use drm LRU manager iterator
>    drm/ttm: Implement ttm memory evict functions
>    drm/ttm: Write ttm functions using drm lru manager functions
>
>   drivers/gpu/drm/Makefile                      |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |   6 +
>   .../gpu/drm/amd/amdgpu/amdgpu_preempt_mgr.c   |   6 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  10 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |   6 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |  10 +-
>   drivers/gpu/drm/drm_drv.c                     |   1 +
>   drivers/gpu/drm/drm_evictable_lru.c           | 266 ++++++++++++++++++
>   drivers/gpu/drm/drm_gem_vram_helper.c         |  10 +-
>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c       |   6 +-
>   drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |  10 +
>   drivers/gpu/drm/i915/intel_region_ttm.c       |   4 +-
>   drivers/gpu/drm/i915/selftests/mock_region.c  |   2 +-
>   drivers/gpu/drm/loongson/lsdc_ttm.c           |  10 +-
>   drivers/gpu/drm/nouveau/nouveau_ttm.c         |  22 +-
>   drivers/gpu/drm/qxl/qxl_ttm.c                 |   6 +-
>   drivers/gpu/drm/radeon/radeon_ttm.c           |  10 +-
>   drivers/gpu/drm/ttm/tests/ttm_device_test.c   |   2 +-
>   drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c |   2 +-
>   drivers/gpu/drm/ttm/ttm_bo.c                  | 247 ++++++++++++----
>   drivers/gpu/drm/ttm/ttm_bo_util.c             |  20 +-
>   drivers/gpu/drm/ttm/ttm_bo_vm.c               |   2 +-
>   drivers/gpu/drm/ttm/ttm_device.c              |  55 ++--
>   drivers/gpu/drm/ttm/ttm_module.h              |   3 +-
>   drivers/gpu/drm/ttm/ttm_range_manager.c       |  14 +-
>   drivers/gpu/drm/ttm/ttm_resource.c            | 242 +++-------------
>   drivers/gpu/drm/ttm/ttm_sys_manager.c         |   8 +-
>   drivers/gpu/drm/vmwgfx/vmwgfx_bo.c            |   2 +-
>   drivers/gpu/drm/vmwgfx/vmwgfx_bo.h            |   2 +-
>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |   6 +-
>   .../gpu/drm/vmwgfx/vmwgfx_system_manager.c    |   6 +
>   drivers/gpu/drm/xe/xe_bo.c                    |  48 ++--
>   drivers/gpu/drm/xe/xe_bo.h                    |   5 +-
>   drivers/gpu/drm/xe/xe_device.c                |   2 +-
>   drivers/gpu/drm/xe/xe_dma_buf.c               |   4 +-
>   drivers/gpu/drm/xe/xe_exec.c                  |   6 +-
>   drivers/gpu/drm/xe/xe_migrate.c               |   6 +-
>   drivers/gpu/drm/xe/xe_res_cursor.h            |  10 +-
>   drivers/gpu/drm/xe/xe_ttm_sys_mgr.c           |   8 +-
>   drivers/gpu/drm/xe/xe_ttm_vram_mgr.c          |  18 +-
>   drivers/gpu/drm/xe/xe_vm.c                    |   6 +-
>   drivers/gpu/drm/xe/xe_vm_types.h              |   2 +-
>   include/drm/drm_device.h                      |  12 +
>   include/drm/drm_evictable_lru.h               | 260 +++++++++++++++++
>   include/drm/ttm/ttm_bo.h                      |  10 +-
>   include/drm/ttm/ttm_device.h                  |  13 +-
>   include/drm/ttm/ttm_range_manager.h           |  17 +-
>   include/drm/ttm/ttm_resource.h                | 117 +++-----
>   49 files changed, 1042 insertions(+), 501 deletions(-)
>   create mode 100644 drivers/gpu/drm/drm_evictable_lru.c
>   create mode 100644 include/drm/drm_evictable_lru.h
>
Some additional comments after looking through this patchset and the 
comments.

1) General: IMO a good start.

2) Where should this reside? I'm not a big fan of *prescribing* that a 
component should be part of a specific device structure. IMO that leads 
to dumping the whole drm namespace into this component rather than to 
keep it isolated with as few dependencies as possible. I would make the 
part that should reside in the drm device a "struct drm_lru_device", and 
the resource base class a "struct drm_lru_resource". Whether the drm 
device then wants to embed the struct drm_lru_device (nice if you need 
this component) or have a pointer to it (nice if you don't need this 
component) and whether the ttm_device should subclass the drm device 
becomes pretty orthogonal to the actual implementation, and we'd avoid 
hard to follow code cross-references all over the place. The drm_gpuvm 
IMO did a good job with this, but I know there are other opinions on 
this, and I don't want it to become a blocker.

3) Christian's comment about cursors into the LRU for LRU traversal restart
This is quickly becoming a must for the Xe driver, and a very-nice to 
have for a working TTM shrinker, but I think we should start to bring 
what we have in TTM currently over and do the cursors as a separate 
task. I have it pretty much on top of my priority list currently. Both 
downstream i915 and drm_gpuvm have a way to handle this nicely by 
permutating the LRU list just before the lock needs to be released, so I 
was thinking something similar. The complication would be the bulk move 
tracking.

4) The struct drm_lru_evict_ctx - This one needs to be common across all 
users since it's set up by the evictor and used by the evictee ops.

/Thomas