drm/i915/mtl: workaround coherency issue for Media

Message ID	20230420113859.30965-1-nirmoy.das@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: Nirmoy Das <nirmoy.das@intel.com> To: intel-gfx@lists.freedesktop.org Subject: [PATCH] drm/i915/mtl: workaround coherency issue for Media Date: Thu, 20 Apr 2023 13:38:59 +0200 Message-Id: <20230420113859.30965-1-nirmoy.das@intel.com> MIME-Version: 1.0 Organization: Intel Deutschland GmbH, Registered Address: Am Campeon 10, 85579 Neubiberg, Germany, Commercial Register: Amtsgericht Muenchen HRB 186928 Content-Transfer-Encoding: 8bit Precedence: list Cc: Andrzej Hajda <andrzej.hajda@intel.com>, Nirmoy Das <nirmoy.das@intel.com>, Fei Yang <fei.yang@intel.com>, dri-devel@lists.freedesktop.org, Andi Shyti <andi.shyti@linux.intel.com> Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/i915/mtl: workaround coherency issue for Media \| expand drm/i915/mtl: workaround coherency issue for Media

Message ID

20230420113859.30965-1-nirmoy.das@intel.com (mailing list archive)

State

New, archived

Headers

From: Nirmoy Das <nirmoy.das@intel.com>
To: intel-gfx@lists.freedesktop.org
Subject: [PATCH] drm/i915/mtl: workaround coherency issue for Media
Date: Thu, 20 Apr 2023 13:38:59 +0200
Message-Id: <20230420113859.30965-1-nirmoy.das@intel.com>
MIME-Version: 1.0
Organization: Intel Deutschland GmbH, Registered Address: Am Campeon 10,
 85579 Neubiberg, Germany,
 Commercial Register: Amtsgericht Muenchen HRB 186928
Content-Transfer-Encoding: 8bit
Precedence: list
Cc: Andrzej Hajda <andrzej.hajda@intel.com>,
 Nirmoy Das <nirmoy.das@intel.com>,
 Fei Yang <fei.yang@intel.com>, dri-devel@lists.freedesktop.org,
 Andi Shyti <andi.shyti@linux.intel.com>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Series

drm/i915/mtl: workaround coherency issue for Media | expand

Commit Message

Nirmoy Das April 20, 2023, 11:38 a.m. UTC

From: Fei Yang <fei.yang@intel.com>

This patch implements Wa_22016122933.

In MTL, memory writes initiated by Media tile update the whole
cache line even for partial writes. This creates a coherency
problem for cacheable memory if both CPU and GPU are writing data
to different locations within a single cache line. CTB communication
is impacted by this issue because the head and tail pointers are
adjacent words within a cache line (see struct guc_ct_buffer_desc),
where one is written by GuC and the other by the host.
This patch circumvents the issue by making CPU/GPU shared memory
uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
CTB which is being updated by both CPU and GuC, mfence instruction
is added to make sure the CPU writes are visible to GPU right away
(flush the write combining buffer).

While fixing the CTB issue, we noticed some random GSC firmware
loading failure because the share buffers are cacheable (WB) on CPU
side but uncached on GPU side. To fix these issues we need to map
such shared buffers as WC on CPU side. Since such allocations are
not all done through GuC allocator, to avoid too many code changes,
the i915_coherent_map_type() is now hard coded to return WC for MTL.

BSpec: 45101

Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
 drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
 drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
 4 files changed, 30 insertions(+), 1 deletion(-)

Comments

Matt Roper April 20, 2023, 9:30 p.m. UTC | #1

On Thu, Apr 20, 2023 at 01:38:59PM +0200, Nirmoy Das wrote:
> From: Fei Yang <fei.yang@intel.com>
> 
> This patch implements Wa_22016122933.
> 
> In MTL, memory writes initiated by Media tile update the whole
> cache line even for partial writes. This creates a coherency
> problem for cacheable memory if both CPU and GPU are writing data
> to different locations within a single cache line. CTB communication
> is impacted by this issue because the head and tail pointers are
> adjacent words within a cache line (see struct guc_ct_buffer_desc),
> where one is written by GuC and the other by the host.
> This patch circumvents the issue by making CPU/GPU shared memory
> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
> CTB which is being updated by both CPU and GuC, mfence instruction
> is added to make sure the CPU writes are visible to GPU right away
> (flush the write combining buffer).

I posted a note about the commit message here on the original series
about an hour ago:

https://lore.kernel.org/intel-gfx/20230420205238.GA4085390@mdroper-desk1.amr.corp.intel.com/

Patch itself looks fine, I just think the last sentence above should be
simplified to avoid inaccuracy.


Matt

> 
> While fixing the CTB issue, we noticed some random GSC firmware
> loading failure because the share buffers are cacheable (WB) on CPU
> side but uncached on GPU side. To fix these issues we need to map
> such shared buffers as WC on CPU side. Since such allocations are
> not all done through GuC allocator, to avoid too many code changes,
> the i915_coherent_map_type() is now hard coded to return WC for MTL.
> 
> BSpec: 45101
> 
> Signed-off-by: Fei Yang <fei.yang@intel.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>  drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>  drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
>  4 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> index ecd86130b74f..89fc8ea6bcfc 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>  					  struct drm_i915_gem_object *obj,
>  					  bool always_coherent)
>  {
> -	if (i915_gem_object_is_lmem(obj))
> +	/*
> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
> +	 */
> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>  		return I915_MAP_WC;
>  	if (HAS_LLC(i915) || always_coherent)
>  		return I915_MAP_WB;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> index 1d9fdfb11268..236673c02f9a 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	if (obj->base.size < gsc->fw.size)
>  		return -ENOSPC;
>  
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	dst = i915_gem_object_pin_map_unlocked(obj,
>  					       i915_coherent_map_type(i915, obj, true));
>  	if (IS_ERR(dst))
> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>  	memset(dst, 0, obj->base.size);
>  	memcpy(dst, src, gsc->fw.size);
>  
> +	/*
> +	 * Wa_22016122933: Making sure the data in dst is
> +	 * visible to GSC right away
> +	 */
> +	intel_guc_write_barrier(&gt->uc.guc);
> +
>  	i915_gem_object_unpin_map(gsc->fw.obj);
>  	i915_gem_object_unpin_map(obj);
>  
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> index e89f16ecf1ae..c9f20385f6a0 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>  	if (IS_ERR(obj))
>  		return ERR_CAST(obj);
>  
> +	/*
> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
> +	 */
> +	if (IS_METEORLAKE(gt->i915))
> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> +
>  	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>  	if (IS_ERR(vma))
>  		goto err;
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> index 1803a633ed64..99a0a89091e7 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>  	/* now update descriptor */
>  	WRITE_ONCE(desc->head, head);
>  
> +	/*
> +	 * Wa_22016122933: Making sure the head update is
> +	 * visible to GuC right away
> +	 */
> +	intel_guc_write_barrier(ct_to_guc(ct));
> +
>  	return available - len;
>  
>  corrupted:
> -- 
> 2.39.0
>

Nirmoy Das April 21, 2023, 7:57 a.m. UTC | #2

On 4/20/2023 11:30 PM, Matt Roper wrote:
> On Thu, Apr 20, 2023 at 01:38:59PM +0200, Nirmoy Das wrote:
>> From: Fei Yang <fei.yang@intel.com>
>>
>> This patch implements Wa_22016122933.
>>
>> In MTL, memory writes initiated by Media tile update the whole
>> cache line even for partial writes. This creates a coherency
>> problem for cacheable memory if both CPU and GPU are writing data
>> to different locations within a single cache line. CTB communication
>> is impacted by this issue because the head and tail pointers are
>> adjacent words within a cache line (see struct guc_ct_buffer_desc),
>> where one is written by GuC and the other by the host.
>> This patch circumvents the issue by making CPU/GPU shared memory
>> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for
>> CTB which is being updated by both CPU and GuC, mfence instruction
>> is added to make sure the CPU writes are visible to GPU right away
>> (flush the write combining buffer).
> I posted a note about the commit message here on the original series
> about an hour ago:
>
> https://lore.kernel.org/intel-gfx/20230420205238.GA4085390@mdroper-desk1.amr.corp.intel.com/
>
> Patch itself looks fine, I just think the last sentence above should be
> simplified to avoid inaccuracy.

Thanks for your review, Matt. I will resend with that fixed.


Nirmoy

>
> Matt
>
>> While fixing the CTB issue, we noticed some random GSC firmware
>> loading failure because the share buffers are cacheable (WB) on CPU
>> side but uncached on GPU side. To fix these issues we need to map
>> such shared buffers as WC on CPU side. Since such allocations are
>> not all done through GuC allocator, to avoid too many code changes,
>> the i915_coherent_map_type() is now hard coded to return WC for MTL.
>>
>> BSpec: 45101
>>
>> Signed-off-by: Fei Yang <fei.yang@intel.com>
>> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>> Acked-by: Nirmoy Das <nirmoy.das@intel.com>
>> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
>> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_pages.c |  5 ++++-
>>   drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++
>>   drivers/gpu/drm/i915/gt/uc/intel_guc.c    |  7 +++++++
>>   drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c |  6 ++++++
>>   4 files changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>> index ecd86130b74f..89fc8ea6bcfc 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
>> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
>>   					  struct drm_i915_gem_object *obj,
>>   					  bool always_coherent)
>>   {
>> -	if (i915_gem_object_is_lmem(obj))
>> +	/*
>> +	 * Wa_22016122933: always return I915_MAP_WC for MTL
>> +	 */
>> +	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
>>   		return I915_MAP_WC;
>>   	if (HAS_LLC(i915) || always_coherent)
>>   		return I915_MAP_WB;
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
>> index 1d9fdfb11268..236673c02f9a 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
>> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>>   	if (obj->base.size < gsc->fw.size)
>>   		return -ENOSPC;
>>   
>> +	/*
>> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
>> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
>> +	 */
>> +	if (IS_METEORLAKE(i915))
>> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
>> +
>>   	dst = i915_gem_object_pin_map_unlocked(obj,
>>   					       i915_coherent_map_type(i915, obj, true));
>>   	if (IS_ERR(dst))
>> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
>>   	memset(dst, 0, obj->base.size);
>>   	memcpy(dst, src, gsc->fw.size);
>>   
>> +	/*
>> +	 * Wa_22016122933: Making sure the data in dst is
>> +	 * visible to GSC right away
>> +	 */
>> +	intel_guc_write_barrier(&gt->uc.guc);
>> +
>>   	i915_gem_object_unpin_map(gsc->fw.obj);
>>   	i915_gem_object_unpin_map(obj);
>>   
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> index e89f16ecf1ae..c9f20385f6a0 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
>> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
>>   	if (IS_ERR(obj))
>>   		return ERR_CAST(obj);
>>   
>> +	/*
>> +	 * Wa_22016122933: For MTL the shared memory needs to be mapped
>> +	 * as WC on CPU side and UC (PAT index 2) on GPU side
>> +	 */
>> +	if (IS_METEORLAKE(gt->i915))
>> +		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
>> +
>>   	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
>>   	if (IS_ERR(vma))
>>   		goto err;
>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> index 1803a633ed64..99a0a89091e7 100644
>> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
>> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
>>   	/* now update descriptor */
>>   	WRITE_ONCE(desc->head, head);
>>   
>> +	/*
>> +	 * Wa_22016122933: Making sure the head update is
>> +	 * visible to GuC right away
>> +	 */
>> +	intel_guc_write_barrier(ct_to_guc(ct));
>> +
>>   	return available - len;
>>   
>>   corrupted:
>> -- 
>> 2.39.0
>>

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index ecd86130b74f..89fc8ea6bcfc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -469,7 +469,10 @@  enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915,
 					  struct drm_i915_gem_object *obj,
 					  bool always_coherent)
 {
-	if (i915_gem_object_is_lmem(obj))
+	/*
+	 * Wa_22016122933: always return I915_MAP_WC for MTL
+	 */
+	if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915))
 		return I915_MAP_WC;
 	if (HAS_LLC(i915) || always_coherent)
 		return I915_MAP_WB;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
index 1d9fdfb11268..236673c02f9a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
@@ -110,6 +110,13 @@  static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
 	if (obj->base.size < gsc->fw.size)
 		return -ENOSPC;
 
+	/*
+	 * Wa_22016122933: For MTL the shared memory needs to be mapped
+	 * as WC on CPU side and UC (PAT index 2) on GPU side
+	 */
+	if (IS_METEORLAKE(i915))
+		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
 	dst = i915_gem_object_pin_map_unlocked(obj,
 					       i915_coherent_map_type(i915, obj, true));
 	if (IS_ERR(dst))
@@ -125,6 +132,12 @@  static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc)
 	memset(dst, 0, obj->base.size);
 	memcpy(dst, src, gsc->fw.size);
 
+	/*
+	 * Wa_22016122933: Making sure the data in dst is
+	 * visible to GSC right away
+	 */
+	intel_guc_write_barrier(&gt->uc.guc);
+
 	i915_gem_object_unpin_map(gsc->fw.obj);
 	i915_gem_object_unpin_map(obj);
 
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
index e89f16ecf1ae..c9f20385f6a0 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c
@@ -744,6 +744,13 @@  struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
+	/*
+	 * Wa_22016122933: For MTL the shared memory needs to be mapped
+	 * as WC on CPU side and UC (PAT index 2) on GPU side
+	 */
+	if (IS_METEORLAKE(gt->i915))
+		i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
+
 	vma = i915_vma_instance(obj, &gt->ggtt->vm, NULL);
 	if (IS_ERR(vma))
 		goto err;
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
index 1803a633ed64..99a0a89091e7 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c
@@ -902,6 +902,12 @@  static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg)
 	/* now update descriptor */
 	WRITE_ONCE(desc->head, head);
 
+	/*
+	 * Wa_22016122933: Making sure the head update is
+	 * visible to GuC right away
+	 */
+	intel_guc_write_barrier(ct_to_guc(ct));
+
 	return available - len;
 
 corrupted: