
[1/4] i915/gem: drop wbinvd_on_all_cpus usage

Message ID 20220319194227.297639-2-michael.cheng@intel.com (mailing list archive)
State New, archived
Series Drop wbinvd_on_all_cpus usage

Commit Message

Michael Cheng March 19, 2022, 7:42 p.m. UTC
Previous concern with using drm_clflush_sg was that we don't know what the
sg_table is pointing to, thus wbinvd_on_all_cpus was used to flush everything
at once out of paranoia.

To make i915 more architecture-neutral and less paranoid, let's attempt to
use drm_clflush_sg to flush the pages when the GPU wants to read from main
memory.

Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

Comments

Tvrtko Ursulin March 21, 2022, 10:30 a.m. UTC | #1
On 19/03/2022 19:42, Michael Cheng wrote:
> Previous concern with using drm_clflush_sg was that we don't know what the
> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
> everything at once to avoid paranoia.

And now we know, or we know it is not a concern?

> To make i915 more architecture-neutral and be less paranoid, lets attempt to

"Lets attempt" as we don't know if this will work and/or what can/will 
break?

> use drm_clflush_sg to flush the pages for when the GPU wants to read
> from main memory.
> 
> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index f5062d0c6333..b0a5baaebc43 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -8,6 +8,7 @@
>   #include <linux/highmem.h>
>   #include <linux/dma-resv.h>
>   #include <linux/module.h>
> +#include <drm/drm_cache.h>
>   
>   #include <asm/smp.h>
>   
> @@ -250,16 +251,10 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>   	 * DG1 is special here since it still snoops transactions even with
>   	 * CACHE_NONE. This is not the case with other HAS_SNOOP platforms. We
>   	 * might need to revisit this as we add new discrete platforms.
> -	 *
> -	 * XXX: Consider doing a vmap flush or something, where possible.
> -	 * Currently we just do a heavy handed wbinvd_on_all_cpus() here since
> -	 * the underlying sg_table might not even point to struct pages, so we
> -	 * can't just call drm_clflush_sg or similar, like we do elsewhere in
> -	 * the driver.
>   	 */
>   	if (i915_gem_object_can_bypass_llc(obj) ||
>   	    (!HAS_LLC(i915) && !IS_DG1(i915)))
> -		wbinvd_on_all_cpus();
> +		drm_clflush_sg(pages);

And as noticed before, drm_clflush_sg still can call wbinvd_on_all_cpus 
so are you just punting the issue somewhere else? How will it be solved 
there?
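
For reference, a simplified sketch of the drm_clflush_sg() logic being
discussed, paraphrased from drivers/gpu/drm/drm_cache.c around this time
(not verbatim kernel code): on x86 it flushes page by page when the CPU
supports CLFLUSH, falls back to wbinvd_on_all_cpus() otherwise, and on
non-x86 builds it only warns.

void drm_clflush_sg(struct sg_table *st)
{
#if defined(CONFIG_X86)
        if (static_cpu_has(X86_FEATURE_CLFLUSH)) {
                struct sg_page_iter sg_iter;

                mb(); /* order prior writes before the clflush loop */
                for_each_sgtable_page(st, &sg_iter, 0)
                        drm_clflush_page(sg_page_iter_page(&sg_iter));
                mb(); /* ensure every cache line flush has completed */

                return;
        }

        /* no CLFLUSH: the heavy global flush remains the fallback */
        if (wbinvd_on_all_cpus())
                pr_err("Timed out waiting for cache flush\n");
#else
        WARN_ONCE(1, "Architecture has no drm_cache.c support\n");
#endif
}

So the wbinvd fallback survives inside the helper on CLFLUSH-less x86,
which is what the question above is getting at, and on Arm the function
currently does nothing but warn.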

Regards,

Tvrtko

>   
>   	sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>   	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
Thomas Hellstrom March 21, 2022, 11:07 a.m. UTC | #2
On 3/21/22 11:30, Tvrtko Ursulin wrote:
>
> On 19/03/2022 19:42, Michael Cheng wrote:
>> Previous concern with using drm_clflush_sg was that we don't know 
>> what the
>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>> everything at once to avoid paranoia.
>
> And now we know, or we know it is not a concern?
>
>> To make i915 more architecture-neutral and be less paranoid, lets 
>> attempt to
>
> "Lets attempt" as we don't know if this will work and/or what can/will 
> break?
>
>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>> from main memory.
>>
>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> index f5062d0c6333..b0a5baaebc43 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> @@ -8,6 +8,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/dma-resv.h>
>>   #include <linux/module.h>
>> +#include <drm/drm_cache.h>
>>     #include <asm/smp.h>
>>   @@ -250,16 +251,10 @@ static int 
>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>        * DG1 is special here since it still snoops transactions even 
>> with
>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>> platforms. We
>>        * might need to revisit this as we add new discrete platforms.
>> -     *
>> -     * XXX: Consider doing a vmap flush or something, where possible.
>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() here 
>> since
>> -     * the underlying sg_table might not even point to struct pages, 
>> so we
>> -     * can't just call drm_clflush_sg or similar, like we do 
>> elsewhere in
>> -     * the driver.
>>        */
>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>> -        wbinvd_on_all_cpus();
>> +        drm_clflush_sg(pages);
>
> And as noticed before, drm_clflush_sg still can call wbinvd_on_all_cpus 
> so are you just punting the issue somewhere else? How will it be 
> solved there?

I think in this case, drm_clflush_sg() can't be immediately used, 
because pages may not contain actual page pointers; they might be just 
the dma addresses. It needs to be preceded by a dmabuf vmap.
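
A minimal sketch of what such a vmap-then-flush could look like for an
imported dma-buf; this is an assumption on my part, not code from this
series, error handling is trimmed, and struct iosys_map was still named
struct dma_buf_map in slightly older kernels:

        struct iosys_map map;
        int err;

        /* Map the dma-buf so we have CPU-visible addresses to flush. */
        err = dma_buf_vmap(obj->base.import_attach->dmabuf, &map);
        if (err)
                return err;

        /* Range-based flush over the mapping instead of wbinvd. */
        if (!map.is_iomem)
                drm_clflush_virt_range(map.vaddr, obj->base.size);

        dma_buf_vunmap(obj->base.import_attach->dmabuf, &map);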

But otherwise this change, I figure, falls into the "prefer range-aware 
apis" category; if the CPU supports it, flush the range only, otherwise 
fall back to wbinvd().

/Thomas



>
> Regards,
>
> Tvrtko
>
>>         sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>>       __i915_gem_object_set_pages(obj, pages, sg_page_sizes);
Michael Cheng March 21, 2022, 4:31 p.m. UTC | #3
On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:

>
> On 19/03/2022 19:42, Michael Cheng wrote:
>> Previous concern with using drm_clflush_sg was that we don't know 
>> what the
>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>> everything at once to avoid paranoia.
>
> And now we know, or we know it is not a concern?
>
>> To make i915 more architecture-neutral and be less paranoid, lets 
>> attempt to
>
> "Lets attempt" as we don't know if this will work and/or what can/will 
> break?

Yes, but it seems like there's no regression with IGT.

If there's a big hit in performance, or if this solution gets accepted 
and the bug reports come flying in, we can explore other solutions. But 
speaking to Dan Vetter, the ideal solution would be to avoid any direct 
calls to wbinvd and use drm helpers in their place.

+Daniel for any extra input.

>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>> from main memory.
>>
>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> index f5062d0c6333..b0a5baaebc43 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>> @@ -8,6 +8,7 @@
>>   #include <linux/highmem.h>
>>   #include <linux/dma-resv.h>
>>   #include <linux/module.h>
>> +#include <drm/drm_cache.h>
>>     #include <asm/smp.h>
>>   @@ -250,16 +251,10 @@ static int 
>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>        * DG1 is special here since it still snoops transactions even 
>> with
>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>> platforms. We
>>        * might need to revisit this as we add new discrete platforms.
>> -     *
>> -     * XXX: Consider doing a vmap flush or something, where possible.
>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() here 
>> since
>> -     * the underlying sg_table might not even point to struct pages, 
>> so we
>> -     * can't just call drm_clflush_sg or similar, like we do 
>> elsewhere in
>> -     * the driver.
>>        */
>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>> -        wbinvd_on_all_cpus();
>> +        drm_clflush_sg(pages);
>
> And as noticed before, drm_clflush_sg still can call wbinvd_on_all_cpus 
> so are you just punting the issue somewhere else? How will it be 
> solved there?
>
Instead of calling an x86 asm directly, we are using what's available to 
make the driver more architecture-neutral. Agreeing with Thomas, this 
solution falls within the "prefer range-aware clflush apis" category, and 
since some older generation platforms don't support clflushopt, it will 
fall back to using wbinvd.
> Regards,
>
> Tvrtko
>
>>         sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>>       __i915_gem_object_set_pages(obj, pages, sg_page_sizes);
Tvrtko Ursulin March 21, 2022, 5:28 p.m. UTC | #4
On 21/03/2022 16:31, Michael Cheng wrote:
> On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
> 
>>
>> On 19/03/2022 19:42, Michael Cheng wrote:
>>> Previous concern with using drm_clflush_sg was that we don't know 
>>> what the
>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>> everything at once to avoid paranoia.
>>
>> And now we know, or we know it is not a concern?
>>
>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>> attempt to
>>
>> "Lets attempt" as we don't know if this will work and/or what can/will 
>> break?
> 
> Yes, but it seems like there's no regression with IGT .
> 
> If there's a big hit in performance, or if this solution gets accepted 
> and the bug reports come flying in, we can explore other solutions. But 
> speaking to Dan Vetter, ideal solution would be to avoid any calls 
> directly to wbinvd, and use drm helpers in place.
> 
> +Daniel for any extra input.
> 
>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>> from main memory.
>>>
>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> index f5062d0c6333..b0a5baaebc43 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> @@ -8,6 +8,7 @@
>>>   #include <linux/highmem.h>
>>>   #include <linux/dma-resv.h>
>>>   #include <linux/module.h>
>>> +#include <drm/drm_cache.h>
>>>     #include <asm/smp.h>
>>>   @@ -250,16 +251,10 @@ static int 
>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>        * DG1 is special here since it still snoops transactions even 
>>> with
>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>> platforms. We
>>>        * might need to revisit this as we add new discrete platforms.
>>> -     *
>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() here 
>>> since
>>> -     * the underlying sg_table might not even point to struct pages, 
>>> so we
>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>> elsewhere in
>>> -     * the driver.
>>>        */
>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>> -        wbinvd_on_all_cpus();
>>> +        drm_clflush_sg(pages);
>>
>> And as noticed before, drm_clflush_sg still can call wbinvd_on_all_cpus 
>> so are you just punting the issue somewhere else? How will it be 
>> solved there?
>>
> Instead of calling an x86 asm directly, we are using what's available to 
> use to make the driver more architecture neutral. Agreeing with Thomas, 
> this solution falls within the "prefer range-aware clflush apis", and 
> since some other generation platform doesn't support clflushopt, it will 
> fall back to using wbinvd.

Right, I was trying to get the information on what drm_clflush_sg will 
do on Arm. Is it range based or global there, and does the latter even exist?

Regards,

Tvrtko
Michael Cheng March 21, 2022, 5:42 p.m. UTC | #5
On 2022-03-21 10:28 a.m., Tvrtko Ursulin wrote:
>
> On 21/03/2022 16:31, Michael Cheng wrote:
>> On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
>>
>>>
>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>> Previous concern with using drm_clflush_sg was that we don't know 
>>>> what the
>>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>>> everything at once to avoid paranoia.
>>>
>>> And now we know, or we know it is not a concern?
>>>
>>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>>> attempt to
>>>
>>> "Lets attempt" as we don't know if this will work and/or what 
>>> can/will break?
>>
>> Yes, but it seems like there's no regression with IGT .
>>
>> If there's a big hit in performance, or if this solution gets 
>> accepted and the bug reports come flying in, we can explore other 
>> solutions. But speaking to Dan Vetter, ideal solution would be to 
>> avoid any calls directly to wbinvd, and use drm helpers in place.
>>
>> +Daniel for any extra input.
>>
>>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>>> from main memory.
>>>>
>>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> index f5062d0c6333..b0a5baaebc43 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> @@ -8,6 +8,7 @@
>>>>   #include <linux/highmem.h>
>>>>   #include <linux/dma-resv.h>
>>>>   #include <linux/module.h>
>>>> +#include <drm/drm_cache.h>
>>>>     #include <asm/smp.h>
>>>>   @@ -250,16 +251,10 @@ static int 
>>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>>        * DG1 is special here since it still snoops transactions 
>>>> even with
>>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>>> platforms. We
>>>>        * might need to revisit this as we add new discrete platforms.
>>>> -     *
>>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() 
>>>> here since
>>>> -     * the underlying sg_table might not even point to struct 
>>>> pages, so we
>>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>>> elsewhere in
>>>> -     * the driver.
>>>>        */
>>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>>> -        wbinvd_on_all_cpus();
>>>> +        drm_clflush_sg(pages);
>>>
>>> And as noticed before, drm_clflush_sg still can call 
>>> wbinvd_on_all_cpus so are you just punting the issue somewhere else? 
>>> How will it be solved there?
>>>
>> Instead of calling an x86 asm directly, we are using what's available 
>> to use to make the driver more architecture neutral. Agreeing with 
>> Thomas, this solution falls within the "prefer range-aware clflush 
>> apis", and since some other generation platform doesn't support 
>> clflushopt, it will fall back to using wbinvd.
>
> Right, I was trying to get the information on what will drm_clflush_sg 
> do on Arm. Is it range based or global there, or if the latter exists.
>
I am not too sure about the ARM side. We are currently working that out 
with the ARM folks in a different thread.
> Regards,
>
> Tvrtko
Michael Cheng March 21, 2022, 5:51 p.m. UTC | #6
On 2022-03-21 10:28 a.m., Tvrtko Ursulin wrote:
>
> On 21/03/2022 16:31, Michael Cheng wrote:
>> On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
>>
>>>
>>> On 19/03/2022 19:42, Michael Cheng wrote:
>>>> Previous concern with using drm_clflush_sg was that we don't know 
>>>> what the
>>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>>> everything at once to avoid paranoia.
>>>
>>> And now we know, or we know it is not a concern?
>>>
>>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>>> attempt to
>>>
>>> "Lets attempt" as we don't know if this will work and/or what 
>>> can/will break?
>>
>> Yes, but it seems like there's no regression with IGT .
>>
>> If there's a big hit in performance, or if this solution gets 
>> accepted and the bug reports come flying in, we can explore other 
>> solutions. But speaking to Dan Vetter, ideal solution would be to 
>> avoid any calls directly to wbinvd, and use drm helpers in place.
>>
>> +Daniel for any extra input.
>>
>>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>>> from main memory.
>>>>
>>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> index f5062d0c6333..b0a5baaebc43 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>>> @@ -8,6 +8,7 @@
>>>>   #include <linux/highmem.h>
>>>>   #include <linux/dma-resv.h>
>>>>   #include <linux/module.h>
>>>> +#include <drm/drm_cache.h>
>>>>     #include <asm/smp.h>
>>>>   @@ -250,16 +251,10 @@ static int 
>>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>>        * DG1 is special here since it still snoops transactions 
>>>> even with
>>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>>> platforms. We
>>>>        * might need to revisit this as we add new discrete platforms.
>>>> -     *
>>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() 
>>>> here since
>>>> -     * the underlying sg_table might not even point to struct 
>>>> pages, so we
>>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>>> elsewhere in
>>>> -     * the driver.
>>>>        */
>>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>>> -        wbinvd_on_all_cpus();
>>>> +        drm_clflush_sg(pages);
>>>
>>> And as noticed before, drm_clflush_sg still can call 
>>> wbinvd_on_all_cpus so are you just punting the issue somewhere else? 
>>> How will it be solved there?
>>>
>> Instead of calling an x86 asm directly, we are using what's available 
>> to use to make the driver more architecture neutral. Agreeing with 
>> Thomas, this solution falls within the "prefer range-aware clflush 
>> apis", and since some other generation platform doesn't support 
>> clflushopt, it will fall back to using wbinvd.
>
> Right, I was trying to get the information on what will drm_clflush_sg 
> do on Arm. Is it range based or global there, or if the latter exists.
>
CCing a few ARM folks to see if they have any input.

+ Catalin And Robin

> Regards,
>
> Tvrtko
Michael Cheng March 21, 2022, 6:51 p.m. UTC | #7
On 2022-03-21 4:07 a.m., Thomas Hellström wrote:
>
> On 3/21/22 11:30, Tvrtko Ursulin wrote:
>>
>> On 19/03/2022 19:42, Michael Cheng wrote:
>>> Previous concern with using drm_clflush_sg was that we don't know 
>>> what the
>>> sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
>>> everything at once to avoid paranoia.
>>
>> And now we know, or we know it is not a concern?
>>
>>> To make i915 more architecture-neutral and be less paranoid, lets 
>>> attempt to
>>
>> "Lets attempt" as we don't know if this will work and/or what 
>> can/will break?
>>
>>> use drm_clflush_sg to flush the pages for when the GPU wants to read
>>> from main memory.
>>>
>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
>>>   1 file changed, 2 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
>>> b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> index f5062d0c6333..b0a5baaebc43 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>>> @@ -8,6 +8,7 @@
>>>   #include <linux/highmem.h>
>>>   #include <linux/dma-resv.h>
>>>   #include <linux/module.h>
>>> +#include <drm/drm_cache.h>
>>>     #include <asm/smp.h>
>>>   @@ -250,16 +251,10 @@ static int 
>>> i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
>>>        * DG1 is special here since it still snoops transactions even 
>>> with
>>>        * CACHE_NONE. This is not the case with other HAS_SNOOP 
>>> platforms. We
>>>        * might need to revisit this as we add new discrete platforms.
>>> -     *
>>> -     * XXX: Consider doing a vmap flush or something, where possible.
>>> -     * Currently we just do a heavy handed wbinvd_on_all_cpus() 
>>> here since
>>> -     * the underlying sg_table might not even point to struct 
>>> pages, so we
>>> -     * can't just call drm_clflush_sg or similar, like we do 
>>> elsewhere in
>>> -     * the driver.
>>>        */
>>>       if (i915_gem_object_can_bypass_llc(obj) ||
>>>           (!HAS_LLC(i915) && !IS_DG1(i915)))
>>> -        wbinvd_on_all_cpus();
>>> +        drm_clflush_sg(pages);
>>
>> And as noticed before, drm_clflush_sg still can call 
>> wbinvd_on_all_cpus so are you just punting the issue somewhere else? 
>> How will it be solved there?
>
> I think in this case, drm_clflush_sg() can't be immediately used, 
> because pages may not contain actual page pointers; they might be just 
> the dma addresses. It needs to be preceded by a dmabuf vmap.

Could you elaborate more on using a dmabuf vmap?

Doing a quick grep on drm_clflush_sg, were you thinking about something 
similar to the following?

if (obj->cache_dirty) {
        WARN_ON_ONCE(IS_DGFX(i915));
        obj->write_domain = 0;
        if (i915_gem_object_has_struct_page(obj))
                drm_clflush_sg(pages);
        obj->cache_dirty = false;
}


Thanks,

Michael Cheng

> But otherwise this change, I figure, falls into the "prefer 
> range-aware apis" category; If the CPU supports it, flush the range 
> only, otherwise fall back to wbinvd().
>
> /Thomas
>
>
>>
>> Regards,
>>
>> Tvrtko
>>
>>>         sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
>>>       __i915_gem_object_set_pages(obj, pages, sg_page_sizes);
Daniel Vetter March 22, 2022, 2:35 p.m. UTC | #8
On Mon, Mar 21, 2022 at 10:42:03AM -0700, Michael Cheng wrote:
> 
> On 2022-03-21 10:28 a.m., Tvrtko Ursulin wrote:
> > 
> > On 21/03/2022 16:31, Michael Cheng wrote:
> > > On 2022-03-21 3:30 a.m., Tvrtko Ursulin wrote:
> > > 
> > > > 
> > > > On 19/03/2022 19:42, Michael Cheng wrote:
> > > > > Previous concern with using drm_clflush_sg was that we don't
> > > > > know what the
> > > > > sg_table is pointing to, thus the usage of wbinvd_on_all_cpus to flush
> > > > > everything at once to avoid paranoia.
> > > > 
> > > > And now we know, or we know it is not a concern?
> > > > 
> > > > > To make i915 more architecture-neutral and be less paranoid,
> > > > > lets attempt to
> > > > 
> > > > "Lets attempt" as we don't know if this will work and/or what
> > > > can/will break?
> > > 
> > > Yes, but it seems like there's no regression with IGT .
> > > 
> > > If there's a big hit in performance, or if this solution gets
> > > accepted and the bug reports come flying in, we can explore other
> > > solutions. But speaking to Dan Vetter, ideal solution would be to
> > > avoid any calls directly to wbinvd, and use drm helpers in place.
> > > 
> > > +Daniel for any extra input.
> > > 
> > > > > use drm_clflush_sg to flush the pages for when the GPU wants to read
> > > > > from main memory.
> > > > > 
> > > > > Signed-off-by: Michael Cheng <michael.cheng@intel.com>
> > > > > ---
> > > > >   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 9 ++-------
> > > > >   1 file changed, 2 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > index f5062d0c6333..b0a5baaebc43 100644
> > > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> > > > > @@ -8,6 +8,7 @@
> > > > >   #include <linux/highmem.h>
> > > > >   #include <linux/dma-resv.h>
> > > > >   #include <linux/module.h>
> > > > > +#include <drm/drm_cache.h>
> > > > >     #include <asm/smp.h>
> > > > >   @@ -250,16 +251,10 @@ static int
> > > > > i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object
> > > > > *obj)
> > > > >        * DG1 is special here since it still snoops
> > > > > transactions even with
> > > > >        * CACHE_NONE. This is not the case with other
> > > > > HAS_SNOOP platforms. We
> > > > >        * might need to revisit this as we add new discrete platforms.
> > > > > -     *
> > > > > -     * XXX: Consider doing a vmap flush or something, where possible.
> > > > > -     * Currently we just do a heavy handed
> > > > > wbinvd_on_all_cpus() here since
> > > > > -     * the underlying sg_table might not even point to
> > > > > struct pages, so we
> > > > > -     * can't just call drm_clflush_sg or similar, like we
> > > > > do elsewhere in
> > > > > -     * the driver.
> > > > >        */
> > > > >       if (i915_gem_object_can_bypass_llc(obj) ||
> > > > >           (!HAS_LLC(i915) && !IS_DG1(i915)))
> > > > > -        wbinvd_on_all_cpus();
> > > > > +        drm_clflush_sg(pages);
> > > > 
> > > > And as noticed before, drm_clflush_sg still can call
> > > > wbinvd_on_all_cpus so are you just punting the issue somewhere
> > > > else? How will it be solved there?
> > > > 
> > > Instead of calling an x86 asm directly, we are using what's
> > > available to use to make the driver more architecture neutral.
> > > Agreeing with Thomas, this solution falls within the "prefer
> > > range-aware clflush apis", and since some other generation platform
> > > doesn't support clflushopt, it will fall back to using wbinvd.
> > 
> > Right, I was trying to get the information on what will drm_clflush_sg
> > do on Arm. Is it range based or global there, or if the latter exists.
> > 
> I am not too sure about the ARM side. We are currently working that out with
> the ARM folks in a different thread.

It won't do anything useful on arm. The _only_ way to get special memory
on arm is by specifying what you want at allocation time. Anything else is
busted, more or less. Which is why none of these code paths should run on
anything other than x86.

And even on x86 they're at best questionable, but some of these are
mistakes encoded into uapi and we're stuck.

We should still try to use drm_clflush_sg() imo to make the entire ordeal
less horrible, and if that turns out to be problematic, we need to bite
the bullet and fix the uapi architecture instead of trying to
retroshoehorn performance fixes into uapi that just can't do it properly.

In this case here this would mean fixing allocation flags with
GEM_CREATE_EXT and fixing userspace to use that when needed (it should
know already since pretty much all drivers have this issue in some form or
another).
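
As a rough illustration of the allocation-time mechanism being referred
to: userspace can already chain extensions onto
DRM_IOCTL_I915_GEM_CREATE_EXT, and a cacheability/coherency flag would
presumably ride the same extension chain. A hedged sketch using the
existing memory-regions extension, assuming an already-open DRM fd
(create_system_bo is a hypothetical helper name, not i915 uapi):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int create_system_bo(int fd, uint64_t size, uint32_t *handle)
{
        /* Pin the placement to system memory at creation time. */
        struct drm_i915_gem_memory_class_instance region = {
                .memory_class = I915_MEMORY_CLASS_SYSTEM,
                .memory_instance = 0,
        };
        struct drm_i915_gem_create_ext_memory_regions regions = {
                .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
                .num_regions = 1,
                .regions = (uintptr_t)&region,
        };
        struct drm_i915_gem_create_ext create = {
                .size = size,
                .extensions = (uintptr_t)&regions,
        };

        if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create))
                return -1;
        /* On success, create.handle names the new BO whose properties
         * were fixed at allocation time. */
        *handle = create.handle;
        return 0;
}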

Cheers, Daniel


> > Regards,
> > 
> > Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index f5062d0c6333..b0a5baaebc43 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -8,6 +8,7 @@ 
 #include <linux/highmem.h>
 #include <linux/dma-resv.h>
 #include <linux/module.h>
+#include <drm/drm_cache.h>
 
 #include <asm/smp.h>
 
@@ -250,16 +251,10 @@  static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
 	 * DG1 is special here since it still snoops transactions even with
 	 * CACHE_NONE. This is not the case with other HAS_SNOOP platforms. We
 	 * might need to revisit this as we add new discrete platforms.
-	 *
-	 * XXX: Consider doing a vmap flush or something, where possible.
-	 * Currently we just do a heavy handed wbinvd_on_all_cpus() here since
-	 * the underlying sg_table might not even point to struct pages, so we
-	 * can't just call drm_clflush_sg or similar, like we do elsewhere in
-	 * the driver.
 	 */
 	if (i915_gem_object_can_bypass_llc(obj) ||
 	    (!HAS_LLC(i915) && !IS_DG1(i915)))
-		wbinvd_on_all_cpus();
+		drm_clflush_sg(pages);
 
 	sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
 	__i915_gem_object_set_pages(obj, pages, sg_page_sizes);