diff mbox series

[RFC,7/8] drm/i915: Lift the user PAT restriction from use_cpu_reloc

Message ID 20230727145504.1919316-8-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Another take on PAT/object cache mode refactoring | expand

Commit Message

Tvrtko Ursulin July 27, 2023, 2:55 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Now that i915 understands the caching modes behind PAT indices, we can
refine the check in use_cpu_reloc() to not reject the uncached PAT if it
was set by userspace.

Instead it can decide based on the presence of full coherency which
should be functionally equivalent on legacy platforms. We can ignore WT
since it is only used by the display, and we can ignore Meteorlake since
it will fail on the existing "has_llc" condition before the object cache
mode check.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Comments

Matt Roper July 28, 2023, 12:09 a.m. UTC | #1
On Thu, Jul 27, 2023 at 03:55:03PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Now that i915 understands the caching modes behind PAT indices, we can
> refine the check in use_cpu_reloc() to not reject the uncached PAT if it
> was set by userspace.
> 
> Instead it can decide based on the presence of full coherency which
> should be functionally equivalent on legacy platforms. We can ignore WT
> since it is only used by the display, and we can ignore Meteorlake since
> it will fail on the existing "has_llc" condition before the object cache
> mode check.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 9d6e49c8a4c6..f74b33670bad 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -640,16 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>  	if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
>  		return false;
>  
> -	/*
> -	 * For objects created by userspace through GEM_CREATE with pat_index
> -	 * set by set_pat extension, i915_gem_object_has_cache_level() always
> -	 * return true, otherwise the call would fall back to checking whether
> -	 * the object is un-cached.
> -	 */
>  	return (cache->has_llc ||
>  		obj->cache_dirty ||
> -		!(obj->pat_set_by_user ||
> -		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
> +		i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W));

My understanding of relocations is minimal, but does 2W actually matter
here (CPU snooping GPU caches)?  I would have expected only 1W coherency
to be necessary (GPU snooping CPU caches)?


Matt

>  }
>  
>  static int eb_reserve_vma(struct i915_execbuffer *eb,
> -- 
> 2.39.2
>
Tvrtko Ursulin July 28, 2023, 12:45 p.m. UTC | #2
On 28/07/2023 01:09, Matt Roper wrote:
> On Thu, Jul 27, 2023 at 03:55:03PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Now that i915 understands the caching modes behind PAT indices, we can
>> refine the check in use_cpu_reloc() to not reject the uncached PAT if it
>> was set by userspace.
>>
>> Instead it can decide based on the presence of full coherency which
>> should be functionally equivalent on legacy platforms. We can ignore WT
>> since it is only used by the display, and we can ignore Meteorlake since
>> it will fail on the existing "has_llc" condition before the object cache
>> mode check.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 9 +--------
>>   1 file changed, 1 insertion(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> index 9d6e49c8a4c6..f74b33670bad 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>> @@ -640,16 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>   	if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
>>   		return false;
>>   
>> -	/*
>> -	 * For objects created by userspace through GEM_CREATE with pat_index
>> -	 * set by set_pat extension, i915_gem_object_has_cache_level() always
>> -	 * return true, otherwise the call would fall back to checking whether
>> -	 * the object is un-cached.
>> -	 */
>>   	return (cache->has_llc ||
>>   		obj->cache_dirty ||
>> -		!(obj->pat_set_by_user ||
>> -		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
>> +		i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W));
> 
> My understanding of relocations is minimal, but does 2W actually matter
> here (CPU snooping GPU caches)?  I would have expected only 1W coherency
> to be necessary (GPU snooping CPU caches)?

I struggled with this one. Original code was:

         return (cache->has_llc ||
                 obj->cache_dirty ||
                 obj->cache_level != I915_CACHE_NONE);

And I struggled to figure out the intent. It is not "don't do CPU 
relocations for uncached" because it will do them when LLC or dirty 
regardless.

You could be right.. can we interpret it as any mode apart from uncached 
was viewed as coherent for CPU writes being seen by the GPU?

In which case should/could it be based on I915_BO_CACHE_COHERENT_FOR_WRITE?

Regards,

Tvrtko

> 
> 
> Matt
> 
>>   }
>>   
>>   static int eb_reserve_vma(struct i915_execbuffer *eb,
>> -- 
>> 2.39.2
>>
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 9d6e49c8a4c6..f74b33670bad 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -640,16 +640,9 @@  static inline int use_cpu_reloc(const struct reloc_cache *cache,
 	if (DBG_FORCE_RELOC == FORCE_GTT_RELOC)
 		return false;
 
-	/*
-	 * For objects created by userspace through GEM_CREATE with pat_index
-	 * set by set_pat extension, i915_gem_object_has_cache_level() always
-	 * return true, otherwise the call would fall back to checking whether
-	 * the object is un-cached.
-	 */
 	return (cache->has_llc ||
 		obj->cache_dirty ||
-		!(obj->pat_set_by_user ||
-		  i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC)));
+		i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH2W));
 }
 
 static int eb_reserve_vma(struct i915_execbuffer *eb,