diff mbox series

[RFC,2/2] drm/i915: Remove PAT hack from i915_gem_object_can_bypass_llc

Message ID 20230713152718.645488-2-tvrtko.ursulin@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series [RFC,1/2] drm/i915: Refactor PAT/object cache handling | expand

Commit Message

Tvrtko Ursulin July 13, 2023, 3:27 p.m. UTC
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

According to the comment in i915_gem_object_can_bypass_llc the purpose of
the function is to return false if the platform/object has a caching mode
where GPU can bypass the LLC.

So far the only platforms which allegedly can do this are Jasperlake and
Elkhartlake, and that via MOCS (not PAT).

Instead of blindly assuming that objects where userspace has set the PAT
index can (bypass the LLC), question is is there a such PAT index on a
platform. Probably starting with Meteorlake since that one is the only one
where set PAT extension can be currently used. Or if there is a MOCS entry
which can achieve the same thing on Meteorlake.

If there is such PAT, now that i915 can be made to understand them better,
we can make the check more fine grained. Or if there is a MOCS entry then
we probably should apply the blanket IS_METEORLAKE condition.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
Cc: Fei Yang <fei.yang@intel.com>
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------
 1 file changed, 6 deletions(-)

Comments

Yang, Fei July 14, 2023, 5:43 a.m. UTC | #1
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> According to the comment in i915_gem_object_can_bypass_llc the
> purpose of the function is to return false if the platform/object
> has a caching mode where GPU can bypass the LLC.
>
> So far the only platforms which allegedly can do this are Jasperlake
> and Elkhartlake, and that via MOCS (not PAT).
>
> Instead of blindly assuming that objects where userspace has set the
> PAT index can (bypass the LLC), question is is there a such PAT index
> on a platform. Probably starting with Meteorlake since that one is the
> only one where set PAT extension can be currently used. Or if there is
> a MOCS entry which can achieve the same thing on Meteorlake.
>
> If there is such PAT, now that i915 can be made to understand them
> better, we can make the check more fine grained. Or if there is a MOCS
> entry then we probably should apply the blanket IS_METEORLAKE condition.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> Cc: Fei Yang <fei.yang@intel.com>
> Cc: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: Matt Roper <matthew.d.roper@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------
>  1 file changed, 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 33a1e97d18b3..1e34171c4162 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>       if (!(obj->flags & I915_BO_ALLOC_USER))
>               return false;
>
> -     /*
> -      * Always flush cache for UMD objects at creation time.
> -      */
> -     if (obj->pat_set_by_user)

I'm afraid this is going to break MESA. Can we run MESA tests with this patch?

>       /*
>        * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
>        * possible for userspace to bypass the GTT caching bits set by the
> --
> 2.39.2
Tvrtko Ursulin July 14, 2023, 10:11 a.m. UTC | #2
On 14/07/2023 06:43, Yang, Fei wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> According to the comment in i915_gem_object_can_bypass_llc the
>> purpose of the function is to return false if the platform/object
>> has a caching mode where GPU can bypass the LLC.
>>
>> So far the only platforms which allegedly can do this are Jasperlake
>> and Elkhartlake, and that via MOCS (not PAT).
>>
>> Instead of blindly assuming that objects where userspace has set the
>> PAT index can (bypass the LLC), question is is there a such PAT index
>> on a platform. Probably starting with Meteorlake since that one is the
>> only one where set PAT extension can be currently used. Or if there is
>> a MOCS entry which can achieve the same thing on Meteorlake.
>>
>> If there is such PAT, now that i915 can be made to understand them
>> better, we can make the check more fine grained. Or if there is a MOCS
>> entry then we probably should apply the blanket IS_METEORLAKE condition.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>> Cc: Fei Yang <fei.yang@intel.com>
>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>> Cc: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------
>>   1 file changed, 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> index 33a1e97d18b3..1e34171c4162 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>        if (!(obj->flags & I915_BO_ALLOC_USER))
>>                return false;
>>
>> -     /*
>> -      * Always flush cache for UMD objects at creation time.
>> -      */
>> -     if (obj->pat_set_by_user)
> 
> I'm afraid this is going to break MESA. Can we run MESA tests with this patch?

I can't, but question is why it would break Mesa which would need a nice 
comment here?

For instance should the check be IS_METEORLAKE?

Or should it be "is wb" && "not has 1-way coherent"?

Or both?

Or, given how Meteorlake does not have LLC, how can anything bypass it 
there? Or is it about snooping on Meteorlake and how?

Regards,

Tvrtko

> 
>>        /*
>>         * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
>>         * possible for userspace to bypass the GTT caching bits set by the
>> --
>> 2.39.2
Yang, Fei July 14, 2023, 5:38 p.m. UTC | #3
> On 14/07/2023 06:43, Yang, Fei wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> According to the comment in i915_gem_object_can_bypass_llc the
>>> purpose of the function is to return false if the platform/object has
>>> a caching mode where GPU can bypass the LLC.
>>>
>>> So far the only platforms which allegedly can do this are Jasperlake
>>> and Elkhartlake, and that via MOCS (not PAT).
>>>
>>> Instead of blindly assuming that objects where userspace has set the
>>> PAT index can (bypass the LLC), question is is there a such PAT index
>>> on a platform. Probably starting with Meteorlake since that one is
>>> the only one where set PAT extension can be currently used. Or if
>>> there is a MOCS entry which can achieve the same thing on Meteorlake.
>>>
>>> If there is such PAT, now that i915 can be made to understand them
>>> better, we can make the check more fine grained. Or if there is a
>>> MOCS entry then we probably should apply the blanket IS_METEORLAKE condition.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of
>>> cache_level")
>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>> Cc: Fei Yang <fei.yang@intel.com>
>>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------
>>>   1 file changed, 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> index 33a1e97d18b3..1e34171c4162 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>>        if (!(obj->flags & I915_BO_ALLOC_USER))
>>>                return false;
>>>
>>> -     /*
>>> -      * Always flush cache for UMD objects at creation time.
>>> -      */
>>> -     if (obj->pat_set_by_user)
>>
>> I'm afraid this is going to break MESA. Can we run MESA tests with this patch?
>
> I can't, but question is why it would break Mesa which would need a
> nice comment here?

For objects with PAT index set by user, the KMD doesn't know whether the user
space would map it as cacheable or not for CPU access. So we want to enforce
a cache flush at BO creation time before handing the BO over to user space.
I remember the issue we had before is that MESA sees garbage data in a BO
that is supposed to be initialized with zero.

I understand your point, checking for obj->pat_set_by_user is not something
to be done in this function. I'm fine with the removal of these lines, but
the checking needs to be done somewhere, maybe just one level up?

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 1df74f7aa3dc..39cd903ba223 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -258,6 +258,7 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
         * the driver.
         */
        if (i915_gem_object_can_bypass_llc(obj) ||
+           obj->pat_set_by_user ||
            (!HAS_LLC(i915) && !IS_DG1(i915)))
                wbinvd_on_all_cpus();

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 8f1633c3fb93..770e02a851f6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -254,7 +254,8 @@ static int shmem_get_pages(struct drm_i915_gem_object *obj)
        if (i915_gem_object_needs_bit17_swizzle(obj))
                i915_gem_object_do_bit_17_swizzle(obj, st);

-       if (i915_gem_object_can_bypass_llc(obj))
+       if (i915_gem_object_can_bypass_llc(obj) ||
+           obj->pat_set_by_user)
                obj->cache_dirty = true;

        __i915_gem_object_set_pages(obj, st);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 1d3ebdf4069b..9e65e3324422 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -170,7 +170,8 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
        }

        WARN_ON_ONCE(!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE));
-       if (i915_gem_object_can_bypass_llc(obj))
+       if (i915_gem_object_can_bypass_llc(obj) ||
+           obj->pat_set_by_user)
                obj->cache_dirty = true;

        __i915_gem_object_set_pages(obj, st);

-Fei

> For instance should the check be IS_METEORLAKE?
>
> Or should it be "is wb" && "not has 1-way coherent"?
>
> Or both?
>
> Or, given how Meteorlake does not have LLC, how can anything bypass it
> there? Or is it about snooping on Meteorlake and how?
>
> Regards,
>
> Tvrtko
>
>>
>>>        /*
>>>         * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
>>>         * possible for userspace to bypass the GTT caching bits set
>>> by the
>>> --
>>> 2.39.2
Matt Roper July 15, 2023, 12:20 a.m. UTC | #4
On Fri, Jul 14, 2023 at 11:11:30AM +0100, Tvrtko Ursulin wrote:
> 
> On 14/07/2023 06:43, Yang, Fei wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > 
> > > According to the comment in i915_gem_object_can_bypass_llc the
> > > purpose of the function is to return false if the platform/object
> > > has a caching mode where GPU can bypass the LLC.
> > > 
> > > So far the only platforms which allegedly can do this are Jasperlake
> > > and Elkhartlake, and that via MOCS (not PAT).
> > > 
> > > Instead of blindly assuming that objects where userspace has set the
> > > PAT index can (bypass the LLC), question is is there a such PAT index
> > > on a platform. Probably starting with Meteorlake since that one is the
> > > only one where set PAT extension can be currently used. Or if there is
> > > a MOCS entry which can achieve the same thing on Meteorlake.
> > > 
> > > If there is such PAT, now that i915 can be made to understand them
> > > better, we can make the check more fine grained. Or if there is a MOCS
> > > entry then we probably should apply the blanket IS_METEORLAKE condition.
> > > 
> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > > Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
> > > Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
> > > Cc: Fei Yang <fei.yang@intel.com>
> > > Cc: Andi Shyti <andi.shyti@linux.intel.com>
> > > Cc: Matt Roper <matthew.d.roper@intel.com>
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------
> > >   1 file changed, 6 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > index 33a1e97d18b3..1e34171c4162 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > > @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
> > >        if (!(obj->flags & I915_BO_ALLOC_USER))
> > >                return false;
> > > 
> > > -     /*
> > > -      * Always flush cache for UMD objects at creation time.
> > > -      */
> > > -     if (obj->pat_set_by_user)
> > 
> > I'm afraid this is going to break MESA. Can we run MESA tests with this patch?
> 
> I can't, but question is why it would break Mesa which would need a nice
> comment here?
> 
> For instance should the check be IS_METEORLAKE?
> 
> Or should it be "is wb" && "not has 1-way coherent"?
> 
> Or both?
> 
> Or, given how Meteorlake does not have LLC, how can anything bypass it
> there? Or is it about snooping on Meteorlake and how?

I think the "LLC" in the function name is a bit misleading since this is
really all just about the ability to avoid coherency (which might come
from an LLC on some platforms or from snooping on others).

The concern is that the CPU writes to the buffer and those writes sit in
a CPU cache without making it to RAM immediately.  If the GPU then
reads the object with any of the non-coherent PAT settings that were
introduced in Xe_LPG, it will not snoop the CPU cache and will read old,
stale data from RAM.

So I think we'd want a condition like ("Xe_LPG or later" && "any non
coherent PAT").  The WB/WT/UC status of the GPU behavior shouldn't
matter here, just the coherency setting.


Matt

> 
> Regards,
> 
> Tvrtko
> 
> > 
> > >        /*
> > >         * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
> > >         * possible for userspace to bypass the GTT caching bits set by the
> > > --
> > > 2.39.2
Tvrtko Ursulin July 17, 2023, 10:55 a.m. UTC | #5
On 15/07/2023 01:20, Matt Roper wrote:
> On Fri, Jul 14, 2023 at 11:11:30AM +0100, Tvrtko Ursulin wrote:
>>
>> On 14/07/2023 06:43, Yang, Fei wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> According to the comment in i915_gem_object_can_bypass_llc the
>>>> purpose of the function is to return false if the platform/object
>>>> has a caching mode where GPU can bypass the LLC.
>>>>
>>>> So far the only platforms which allegedly can do this are Jasperlake
>>>> and Elkhartlake, and that via MOCS (not PAT).
>>>>
>>>> Instead of blindly assuming that objects where userspace has set the
>>>> PAT index can (bypass the LLC), question is is there a such PAT index
>>>> on a platform. Probably starting with Meteorlake since that one is the
>>>> only one where set PAT extension can be currently used. Or if there is
>>>> a MOCS entry which can achieve the same thing on Meteorlake.
>>>>
>>>> If there is such PAT, now that i915 can be made to understand them
>>>> better, we can make the check more fine grained. Or if there is a MOCS
>>>> entry then we probably should apply the blanket IS_METEORLAKE condition.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
>>>> Cc: Chris Wilson <chris.p.wilson@linux.intel.com>
>>>> Cc: Fei Yang <fei.yang@intel.com>
>>>> Cc: Andi Shyti <andi.shyti@linux.intel.com>
>>>> Cc: Matt Roper <matthew.d.roper@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------
>>>>    1 file changed, 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> index 33a1e97d18b3..1e34171c4162 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> @@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
>>>>         if (!(obj->flags & I915_BO_ALLOC_USER))
>>>>                 return false;
>>>>
>>>> -     /*
>>>> -      * Always flush cache for UMD objects at creation time.
>>>> -      */
>>>> -     if (obj->pat_set_by_user)
>>>
>>> I'm afraid this is going to break MESA. Can we run MESA tests with this patch?
>>
>> I can't, but question is why it would break Mesa which would need a nice
>> comment here?
>>
>> For instance should the check be IS_METEORLAKE?
>>
>> Or should it be "is wb" && "not has 1-way coherent"?
>>
>> Or both?
>>
>> Or, given how Meteorlake does not have LLC, how can anything bypass it
>> there? Or is it about snooping on Meteorlake and how?
> 
> I think the "LLC" in the function name is a bit misleading since this is
> really all just about the ability to avoid coherency (which might come
> from an LLC on some platforms or from snooping on others).
> 
> The concern is that the CPU writes to the buffer and those writes sit in
> a CPU cache without making it to RAM immediately.  If the GPU then
> reads the object with any of the non-coherent PAT settings that were
> introduced in Xe_LPG, it will not snoop the CPU cache and will read old,
> stale data from RAM.
> 
> So I think we'd want a condition like ("Xe_LPG or later" && "any non
> coherent PAT").  The WB/WT/UC status of the GPU behavior shouldn't
> matter here, just the coherency setting.

Right, sounds plausible to me. So with this series the new condition in this function would look like this:

i915_gem_object_can_bypass_llc(..)
{
...
	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH1W) != 1)
		return true;

("!= 1" in the condition meaning either it is not coherent, or i915 does not know due table being incomplete - like some PAT index on some future platform was forgotten to be defined.)

That would catch any platform with non-coherent WB, as long as the PAT-to-i915-cache-mode tables are correct. It would currently only apply to Meteorlake:

#define MTL_CACHE_MODES \
	.cache_modes = { \
		[0] = I915_CACHE(WB), \
		[1] = I915_CACHE(WT), \
		[2] = I915_CACHE(UC), \
		[3] = _I915_CACHE(WB, COH1W), \
		[4] = __I915_CACHE(WB, BIT(I915_CACHE_FLAG_COH1W) | BIT(I915_CACHE_FLAG_COH2W)), \
	}

Or are saying it should apply to UC and WT too somehow?

I'll also try to join sub-threads to Fei's reply here too.

So in terms of the stated issue with _CPU_ access from Mesa seeing stale data (non-zeroed pages) depending on the PAT index - I don't understand that yet. That seems like a completely CPU cache problem space and I do not understand how PAT index gets into the picture.

But the proposed patch from your email Fei looks like it would be covered by the snippet I have in this reply.

Regards,

Tvrtko
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 33a1e97d18b3..1e34171c4162 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -229,12 +229,6 @@  bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
 	if (!(obj->flags & I915_BO_ALLOC_USER))
 		return false;
 
-	/*
-	 * Always flush cache for UMD objects at creation time.
-	 */
-	if (obj->pat_set_by_user)
-		return true;
-
 	/*
 	 * EHL and JSL add the 'Bypass LLC' MOCS entry, which should make it
 	 * possible for userspace to bypass the GTT caching bits set by the