diff mbox series

[v5,12/13] drm/i915/ttm: use cached system pages when evicting lmem

Message ID 20210927114114.152310-12-matthew.auld@intel.com (mailing list archive)
State New, archived
Headers show
Series [v5,01/13] drm/ttm: stop calling tt_swapin in vm_access | expand

Commit Message

Matthew Auld Sept. 27, 2021, 11:41 a.m. UTC
This should let us do an accelerated copy directly to the shmem pages
when temporarily moving lmem-only objects, where the i915-gem shrinker
can later kick in to swap out the pages, if needed.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

Comments

Thomas Hellström Sept. 29, 2021, 11:54 a.m. UTC | #1
On Mon, 2021-09-27 at 12:41 +0100, Matthew Auld wrote:
> This should let us do an accelerated copy directly to the shmem pages
> when temporarily moving lmem-only objects, where the i915-gem
> shrinker
> can later kick in to swap out the pages, if needed.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index 194e5f1deda8..46d57541c0b2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -134,11 +134,11 @@ static enum ttm_caching
>  i915_ttm_select_tt_caching(const struct drm_i915_gem_object *obj)
>  {
>         /*
> -        * Objects only allowed in system get cached cpu-mappings.
> -        * Other objects get WC mapping for now. Even if in system.
> +        * Objects only allowed in system get cached cpu-mappings, or
> when
> +        * evicting lmem-only buffers to system for swapping. Other
> objects get
> +        * WC mapping for now. Even if in system.
>          */
> -       if (obj->mm.region->type == INTEL_MEMORY_SYSTEM &&
> -           obj->mm.n_placements <= 1)
> +       if (obj->mm.n_placements <= 1)
>                 return ttm_cached;
>  
>         return ttm_write_combined;

We should be aware that with TTM, even evicted bos can be mapped by
user-space while evicted, and this will appear to user-space like the
WC-mapped object suddenly became WB-mapped. But it appears like mesa
doesn't care about this as long as the mappings are fully coherent.

Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Michel Dänzer Sept. 30, 2021, 10:04 a.m. UTC | #2
On 2021-09-29 13:54, Thomas Hellström wrote:
> On Mon, 2021-09-27 at 12:41 +0100, Matthew Auld wrote:
>> This should let us do an accelerated copy directly to the shmem pages
>> when temporarily moving lmem-only objects, where the i915-gem
>> shrinker
>> can later kick in to swap out the pages, if needed.
>>
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> index 194e5f1deda8..46d57541c0b2 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> @@ -134,11 +134,11 @@ static enum ttm_caching
>>  i915_ttm_select_tt_caching(const struct drm_i915_gem_object *obj)
>>  {
>>         /*
>> -        * Objects only allowed in system get cached cpu-mappings.
>> -        * Other objects get WC mapping for now. Even if in system.
>> +        * Objects only allowed in system get cached cpu-mappings, or
>> when
>> +        * evicting lmem-only buffers to system for swapping. Other
>> objects get
>> +        * WC mapping for now. Even if in system.
>>          */
>> -       if (obj->mm.region->type == INTEL_MEMORY_SYSTEM &&
>> -           obj->mm.n_placements <= 1)
>> +       if (obj->mm.n_placements <= 1)
>>                 return ttm_cached;
>>  
>>         return ttm_write_combined;
> 
> We should be aware that with TTM, even evicted bos can be mapped by
> user-space while evicted, and this will appear to user-space like the
> WC-mapped object suddenly became WB-mapped. But it appears like mesa
> doesn't care about this as long as the mappings are fully coherent.

FWIW, the Mesa radeonsi driver avoids surprises due to this (e.g. some path which involves CPU access suddenly goes faster if the BO was evicted from VRAM) by asking for WC mapping of BOs intended to be in VRAM even while they're evicted (via the AMDGPU_GEM_CREATE_CPU_GTT_USWC flag).
Matthew Auld Sept. 30, 2021, 12:27 p.m. UTC | #3
On 30/09/2021 11:04, Michel Dänzer wrote:
> On 2021-09-29 13:54, Thomas Hellström wrote:
>> On Mon, 2021-09-27 at 12:41 +0100, Matthew Auld wrote:
>>> This should let us do an accelerated copy directly to the shmem pages
>>> when temporarily moving lmem-only objects, where the i915-gem
>>> shrinker
>>> can later kick in to swap out the pages, if needed.
>>>
>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 8 ++++----
>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> index 194e5f1deda8..46d57541c0b2 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>> @@ -134,11 +134,11 @@ static enum ttm_caching
>>>   i915_ttm_select_tt_caching(const struct drm_i915_gem_object *obj)
>>>   {
>>>          /*
>>> -        * Objects only allowed in system get cached cpu-mappings.
>>> -        * Other objects get WC mapping for now. Even if in system.
>>> +        * Objects only allowed in system get cached cpu-mappings, or
>>> when
>>> +        * evicting lmem-only buffers to system for swapping. Other
>>> objects get
>>> +        * WC mapping for now. Even if in system.
>>>           */
>>> -       if (obj->mm.region->type == INTEL_MEMORY_SYSTEM &&
>>> -           obj->mm.n_placements <= 1)
>>> +       if (obj->mm.n_placements <= 1)
>>>                  return ttm_cached;
>>>   
>>>          return ttm_write_combined;
>>
>> We should be aware that with TTM, even evicted bos can be mapped by
>> user-space while evicted, and this will appear to user-space like the
>> WC-mapped object suddenly became WB-mapped. But it appears like mesa
>> doesn't care about this as long as the mappings are fully coherent.
> 
> FWIW, the Mesa radeonsi driver avoids surprises due to this (e.g. some path which involves CPU access suddenly goes faster if the BO was evicted from VRAM) by asking for WC mapping of BOs intended to be in VRAM even while they're evicted (via the AMDGPU_GEM_CREATE_CPU_GTT_USWC flag).
> 

Ok, so amdgpu just defaults to cached system memory, even for evicted 
VRAM, unless userspace requests USWC, in which case it will use WC?

>
Michel Dänzer Sept. 30, 2021, 12:55 p.m. UTC | #4
On 2021-09-30 14:27, Matthew Auld wrote:
> On 30/09/2021 11:04, Michel Dänzer wrote:
>> On 2021-09-29 13:54, Thomas Hellström wrote:
>>> On Mon, 2021-09-27 at 12:41 +0100, Matthew Auld wrote:
>>>> This should let us do an accelerated copy directly to the shmem pages
>>>> when temporarily moving lmem-only objects, where the i915-gem
>>>> shrinker
>>>> can later kick in to swap out the pages, if needed.
>>>>
>>>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>>>> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> index 194e5f1deda8..46d57541c0b2 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>> @@ -134,11 +134,11 @@ static enum ttm_caching
>>>>   i915_ttm_select_tt_caching(const struct drm_i915_gem_object *obj)
>>>>   {
>>>>          /*
>>>> -        * Objects only allowed in system get cached cpu-mappings.
>>>> -        * Other objects get WC mapping for now. Even if in system.
>>>> +        * Objects only allowed in system get cached cpu-mappings, or
>>>> when
>>>> +        * evicting lmem-only buffers to system for swapping. Other
>>>> objects get
>>>> +        * WC mapping for now. Even if in system.
>>>>           */
>>>> -       if (obj->mm.region->type == INTEL_MEMORY_SYSTEM &&
>>>> -           obj->mm.n_placements <= 1)
>>>> +       if (obj->mm.n_placements <= 1)
>>>>                  return ttm_cached;
>>>>            return ttm_write_combined;
>>>
>>> We should be aware that with TTM, even evicted bos can be mapped by
>>> user-space while evicted, and this will appear to user-space like the
>>> WC-mapped object suddenly became WB-mapped. But it appears like mesa
>>> doesn't care about this as long as the mappings are fully coherent.
>>
>> FWIW, the Mesa radeonsi driver avoids surprises due to this (e.g. some path which involves CPU access suddenly goes faster if the BO was evicted from VRAM) by asking for WC mapping of BOs intended to be in VRAM even while they're evicted (via the AMDGPU_GEM_CREATE_CPU_GTT_USWC flag).
>>
> 
> Ok, so amdgpu just defaults to cached system memory, even for evicted VRAM, unless userspace requests USWC, in which case it will use WC?

Right.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 194e5f1deda8..46d57541c0b2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -134,11 +134,11 @@  static enum ttm_caching
 i915_ttm_select_tt_caching(const struct drm_i915_gem_object *obj)
 {
 	/*
-	 * Objects only allowed in system get cached cpu-mappings.
-	 * Other objects get WC mapping for now. Even if in system.
+	 * Objects only allowed in system get cached cpu-mappings, or when
+	 * evicting lmem-only buffers to system for swapping. Other objects get
+	 * WC mapping for now. Even if in system.
 	 */
-	if (obj->mm.region->type == INTEL_MEMORY_SYSTEM &&
-	    obj->mm.n_placements <= 1)
+	if (obj->mm.n_placements <= 1)
 		return ttm_cached;
 
 	return ttm_write_combined;