[v2,2/4] drm/i915/gt: Re-work invalidate_csb_entries


Message ID

20220128221020.188253-3-michael.cheng@intel.com (mailing list archive)

State

New, archived

Headers

From: Michael Cheng <michael.cheng@intel.com>
To: intel-gfx@lists.freedesktop.org
Date: Fri, 28 Jan 2022 14:10:18 -0800
Message-Id: <20220128221020.188253-3-michael.cheng@intel.com>
In-Reply-To: <20220128221020.188253-1-michael.cheng@intel.com>
References: <20220128221020.188253-1-michael.cheng@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [Intel-gfx] [PATCH v2 2/4] drm/i915/gt: Re-work
 invalidate_csb_entries
Precedence: list
Cc: michael.cheng@intel.com, lucas.demarchi@intel.com, matthew.auld@intel.com,
 mika.kuoppala@intel.com
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Series

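Use drm_clflush* instead of clflush

[v2,0/4] Use drm_clflush* instead of clflush
[v2,1/4] drm/i915/gt: Re-work intel_write_status_page
[v2,2/4] drm/i915/gt: Re-work invalidate_csb_entries
[v2,3/4] drm/i915/gt: Re-work reset_csb
[v2,4/4] drm/i915/: Re-work clflush_write32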

Commit Message

Michael Cheng Jan. 28, 2022, 10:10 p.m. UTC

Re-work invalidate_csb_entries to use drm_clflush_virt_range. This will
prevent compiler errors when building for non-x86 architectures.

Signed-off-by: Michael Cheng <michael.cheng@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Bowman, Casey G Jan. 29, 2022, 7:21 a.m. UTC | #1

> -----Original Message-----
> From: Cheng, Michael <michael.cheng@intel.com>
> Sent: Friday, January 28, 2022 2:10 PM
> To: intel-gfx@lists.freedesktop.org
> Cc: Cheng, Michael <michael.cheng@intel.com>; Bowman, Casey G
> <casey.g.bowman@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>; Boyer, Wayne <wayne.boyer@intel.com>;
> ville.syrjala@linux.intel.com; Kuoppala, Mika <mika.kuoppala@intel.com>;
> Auld, Matthew <matthew.auld@intel.com>
> Subject: [PATCH v2 2/4] drm/i915/gt: Re-work invalidate_csb_entries
> 
> Re-work invalidate_csb_entries to use drm_clflush_virt_range. This will
> prevent compiler errors when building for non-x86 architectures.
> 
> Signed-off-by: Michael Cheng <michael.cheng@intel.com>

Reviewed-by: Casey Bowman <casey.g.bowman@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 960a9aaf4f3a..90b5daf9433d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1647,8 +1647,8 @@ cancel_port_requests(struct intel_engine_execlists * const execlists,
> 
>  static void invalidate_csb_entries(const u64 *first, const u64 *last)
>  {
> -	clflush((void *)first);
> -	clflush((void *)last);
> +	drm_clflush_virt_range((void *)first, sizeof(*first));
> +	drm_clflush_virt_range((void *)last, sizeof(*last));
>  }
> 
>  /*
> --
> 2.25.1

Tvrtko Ursulin Jan. 31, 2022, 1:51 p.m. UTC | #2

On 28/01/2022 22:10, Michael Cheng wrote:
> Re-work invalidate_csb_entries to use drm_clflush_virt_range. This will
> prevent compiler errors when building for non-x86 architectures.
> 
> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 960a9aaf4f3a..90b5daf9433d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -1647,8 +1647,8 @@ cancel_port_requests(struct intel_engine_execlists * const execlists,
>   
>   static void invalidate_csb_entries(const u64 *first, const u64 *last)
>   {
> -	clflush((void *)first);
> -	clflush((void *)last);
> +	drm_clflush_virt_range((void *)first, sizeof(*first));
> +	drm_clflush_virt_range((void *)last, sizeof(*last));

How about dropping the helper and from the single call site do:

drm_clflush_virt_range(&buf[0], num_entries * sizeof(buf[0]));

One less function call, and the CSB is a single cacheline before Gen11 anyway,
two afterwards, so overall a better conversion I think. How does that sound?

Regards,

Tvrtko

>   }
>   
>   /*
>

Mika Kuoppala Jan. 31, 2022, 2:15 p.m. UTC | #3

Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> writes:

> On 28/01/2022 22:10, Michael Cheng wrote:
>> Re-work invalidate_csb_entries to use drm_clflush_virt_range. This will
>> prevent compiler errors when building for non-x86 architectures.
>> 
>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> index 960a9aaf4f3a..90b5daf9433d 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>> @@ -1647,8 +1647,8 @@ cancel_port_requests(struct intel_engine_execlists * const execlists,
>>   
>>   static void invalidate_csb_entries(const u64 *first, const u64 *last)
>>   {
>> -	clflush((void *)first);
>> -	clflush((void *)last);
>> +	drm_clflush_virt_range((void *)first, sizeof(*first));
>> +	drm_clflush_virt_range((void *)last, sizeof(*last));
>
> How about dropping the helper and from the single call site do:
>
> drm_clflush_virt_range(&buf[0], num_entries * sizeof(buf[0]));
>
> One less function call, and the CSB is a single cacheline before Gen11 anyway,
> two afterwards, so overall a better conversion I think. How does that sound?

It would definitely work. Now trying to remember why it went into
explicit clflushes: IIRC, as this is about GPU/CPU coherency, the
wbinvd_on_all_cpus we get with *virt_range would then be just an
unnecessary perf hit.

-Mika

>
> Regards,
>
> Tvrtko
>
>>   }
>>   
>>   /*
>>

Tvrtko Ursulin Feb. 1, 2022, 9:32 a.m. UTC | #4

On 31/01/2022 14:15, Mika Kuoppala wrote:
> Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> writes:
> 
>> On 28/01/2022 22:10, Michael Cheng wrote:
>>> Re-work invalidate_csb_entries to use drm_clflush_virt_range. This will
>>> prevent compiler errors when building for non-x86 architectures.
>>>
>>> Signed-off-by: Michael Cheng <michael.cheng@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> index 960a9aaf4f3a..90b5daf9433d 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
>>> @@ -1647,8 +1647,8 @@ cancel_port_requests(struct intel_engine_execlists * const execlists,
>>>    
>>>    static void invalidate_csb_entries(const u64 *first, const u64 *last)
>>>    {
>>> -	clflush((void *)first);
>>> -	clflush((void *)last);
>>> +	drm_clflush_virt_range((void *)first, sizeof(*first));
>>> +	drm_clflush_virt_range((void *)last, sizeof(*last));
>>
>> How about dropping the helper and from the single call site do:
>>
>> drm_clflush_virt_range(&buf[0], num_entries * sizeof(buf[0]));
>>
>> One less function call, and the CSB is a single cacheline before Gen11 anyway,
>> two afterwards, so overall a better conversion I think. How does that sound?
> 
> It would definitely work. Now trying to remember why it went into
> explicit clflushes: IIRC, as this is about GPU/CPU coherency, the
> wbinvd_on_all_cpus we get with *virt_range would then be just an
> unnecessary perf hit.

Right, except that AFAICS wbinvd_on_all_cpus does not run on the
X86_FEATURE_CLFLUSH path of drm_clflush_virt_range, which made me think
invalidate_csb_entries might have been an optimisation that a) used the
knowledge that the CSB is at most two cachelines large, and b) relied on
there being no need for the memory barrier since, as you say, it is about
the CPU/GPU effect, so CPU ordering is not a concern.

Anyway, the larger hammer probably does little harm, except that it really
should be a single call to drm_clflush_virt_range.

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 960a9aaf4f3a..90b5daf9433d 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1647,8 +1647,8 @@ cancel_port_requests(struct intel_engine_execlists * const execlists,
 
 static void invalidate_csb_entries(const u64 *first, const u64 *last)
 {
-	clflush((void *)first);
-	clflush((void *)last);
+	drm_clflush_virt_range((void *)first, sizeof(*first));
+	drm_clflush_virt_range((void *)last, sizeof(*last));
 }
 
 /*
