Message ID | 20230420113859.30965-1-nirmoy.das@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/mtl: workaround coherency issue for Media | expand |
On Thu, Apr 20, 2023 at 01:38:59PM +0200, Nirmoy Das wrote: > From: Fei Yang <fei.yang@intel.com> > > This patch implements Wa_22016122933. > > In MTL, memory writes initiated by Media tile update the whole > cache line even for partial writes. This creates a coherency > problem for cacheable memory if both CPU and GPU are writing data > to different locations within a single cache line. CTB communication > is impacted by this issue because the head and tail pointers are > adjacent words within a cache line (see struct guc_ct_buffer_desc), > where one is written by GuC and the other by the host. > This patch circumvents the issue by making CPU/GPU shared memory > uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for > CTB which is being updated by both CPU and GuC, mfence instruction > is added to make sure the CPU writes are visible to GPU right away > (flush the write combining buffer). I posted a note about the commit message here on the original series about an hour ago: https://lore.kernel.org/intel-gfx/20230420205238.GA4085390@mdroper-desk1.amr.corp.intel.com/ Patch itself looks fine, I just think the last sentence above should be simplified to avoid inaccuracy. Matt > > While fixing the CTB issue, we noticed some random GSC firmware > loading failure because the share buffers are cacheable (WB) on CPU > side but uncached on GPU side. To fix these issues we need to map > such shared buffers as WC on CPU side. Since such allocations are > not all done through GuC allocator, to avoid too many code changes, > the i915_coherent_map_type() is now hard coded to return WC for MTL. > > BSpec: 45101 > > Signed-off-by: Fei Yang <fei.yang@intel.com> > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> > Acked-by: Nirmoy Das <nirmoy.das@intel.com> > Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> > Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> > --- > drivers/gpu/drm/i915/gem/i915_gem_pages.c | 5 ++++- > drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++ > drivers/gpu/drm/i915/gt/uc/intel_guc.c | 7 +++++++ > drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 ++++++ > 4 files changed, 30 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c > index ecd86130b74f..89fc8ea6bcfc 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c > @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915, > struct drm_i915_gem_object *obj, > bool always_coherent) > { > - if (i915_gem_object_is_lmem(obj)) > + /* > + * Wa_22016122933: always return I915_MAP_WC for MTL > + */ > + if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915)) > return I915_MAP_WC; > if (HAS_LLC(i915) || always_coherent) > return I915_MAP_WB; > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c > index 1d9fdfb11268..236673c02f9a 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c > @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc) > if (obj->base.size < gsc->fw.size) > return -ENOSPC; > > + /* > + * Wa_22016122933: For MTL the shared memory needs to be mapped > + * as WC on CPU side and UC (PAT index 2) on GPU side > + */ > + if (IS_METEORLAKE(i915)) > + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); > + > dst = i915_gem_object_pin_map_unlocked(obj, > i915_coherent_map_type(i915, obj, true)); > if (IS_ERR(dst)) > @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc) > memset(dst, 0, obj->base.size); > memcpy(dst, src, gsc->fw.size); > > + /* > + * Wa_22016122933: Making sure the data in dst is > + * visible to GSC right away > + */ > + intel_guc_write_barrier(>->uc.guc); > + > i915_gem_object_unpin_map(gsc->fw.obj); > i915_gem_object_unpin_map(obj); > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c > index e89f16ecf1ae..c9f20385f6a0 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c > @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size) > if (IS_ERR(obj)) > return ERR_CAST(obj); > > + /* > + * Wa_22016122933: For MTL the shared memory needs to be mapped > + * as WC on CPU side and UC (PAT index 2) on GPU side > + */ > + if (IS_METEORLAKE(gt->i915)) > + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); > + > vma = i915_vma_instance(obj, >->ggtt->vm, NULL); > if (IS_ERR(vma)) > goto err; > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > index 1803a633ed64..99a0a89091e7 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) > /* now update descriptor */ > WRITE_ONCE(desc->head, head); > > + /* > + * Wa_22016122933: Making sure the head update is > + * visible to GuC right away > + */ > + intel_guc_write_barrier(ct_to_guc(ct)); > + > return available - len; > > corrupted: > -- > 2.39.0 >
On 4/20/2023 11:30 PM, Matt Roper wrote: > On Thu, Apr 20, 2023 at 01:38:59PM +0200, Nirmoy Das wrote: >> From: Fei Yang <fei.yang@intel.com> >> >> This patch implements Wa_22016122933. >> >> In MTL, memory writes initiated by Media tile update the whole >> cache line even for partial writes. This creates a coherency >> problem for cacheable memory if both CPU and GPU are writing data >> to different locations within a single cache line. CTB communication >> is impacted by this issue because the head and tail pointers are >> adjacent words within a cache line (see struct guc_ct_buffer_desc), >> where one is written by GuC and the other by the host. >> This patch circumvents the issue by making CPU/GPU shared memory >> uncacheable (WC on CPU side, and PAT index 2 for GPU). Also for >> CTB which is being updated by both CPU and GuC, mfence instruction >> is added to make sure the CPU writes are visible to GPU right away >> (flush the write combining buffer). > I posted a note about the commit message here on the original series > about an hour ago: > > https://lore.kernel.org/intel-gfx/20230420205238.GA4085390@mdroper-desk1.amr.corp.intel.com/ > > Patch itself looks fine, I just think the last sentence above should be > simplified to avoid inaccuracy. Thanks for your review, Matt. I will resend with that fixed. Nirmoy > > Matt > >> While fixing the CTB issue, we noticed some random GSC firmware >> loading failure because the share buffers are cacheable (WB) on CPU >> side but uncached on GPU side. To fix these issues we need to map >> such shared buffers as WC on CPU side. Since such allocations are >> not all done through GuC allocator, to avoid too many code changes, >> the i915_coherent_map_type() is now hard coded to return WC for MTL. >> >> BSpec: 45101 >> >> Signed-off-by: Fei Yang <fei.yang@intel.com> >> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> >> Acked-by: Nirmoy Das <nirmoy.das@intel.com> >> Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com> >> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> >> --- >> drivers/gpu/drm/i915/gem/i915_gem_pages.c | 5 ++++- >> drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c | 13 +++++++++++++ >> drivers/gpu/drm/i915/gt/uc/intel_guc.c | 7 +++++++ >> drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 6 ++++++ >> 4 files changed, 30 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c >> index ecd86130b74f..89fc8ea6bcfc 100644 >> --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c >> +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c >> @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915, >> struct drm_i915_gem_object *obj, >> bool always_coherent) >> { >> - if (i915_gem_object_is_lmem(obj)) >> + /* >> + * Wa_22016122933: always return I915_MAP_WC for MTL >> + */ >> + if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915)) >> return I915_MAP_WC; >> if (HAS_LLC(i915) || always_coherent) >> return I915_MAP_WB; >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c >> index 1d9fdfb11268..236673c02f9a 100644 >> --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c >> @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc) >> if (obj->base.size < gsc->fw.size) >> return -ENOSPC; >> >> + /* >> + * Wa_22016122933: For MTL the shared memory needs to be mapped >> + * as WC on CPU side and UC (PAT index 2) on GPU side >> + */ >> + if (IS_METEORLAKE(i915)) >> + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); >> + >> dst = i915_gem_object_pin_map_unlocked(obj, >> i915_coherent_map_type(i915, obj, true)); >> if (IS_ERR(dst)) >> @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc) >> memset(dst, 0, obj->base.size); >> memcpy(dst, src, gsc->fw.size); >> >> + /* >> + * Wa_22016122933: Making sure the data in dst is >> + * visible to GSC right away >> + */ >> + intel_guc_write_barrier(>->uc.guc); >> + >> i915_gem_object_unpin_map(gsc->fw.obj); >> i915_gem_object_unpin_map(obj); >> >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >> index e89f16ecf1ae..c9f20385f6a0 100644 >> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c >> @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size) >> if (IS_ERR(obj)) >> return ERR_CAST(obj); >> >> + /* >> + * Wa_22016122933: For MTL the shared memory needs to be mapped >> + * as WC on CPU side and UC (PAT index 2) on GPU side >> + */ >> + if (IS_METEORLAKE(gt->i915)) >> + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); >> + >> vma = i915_vma_instance(obj, >->ggtt->vm, NULL); >> if (IS_ERR(vma)) >> goto err; >> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c >> index 1803a633ed64..99a0a89091e7 100644 >> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c >> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c >> @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) >> /* now update descriptor */ >> WRITE_ONCE(desc->head, head); >> >> + /* >> + * Wa_22016122933: Making sure the head update is >> + * visible to GuC right away >> + */ >> + intel_guc_write_barrier(ct_to_guc(ct)); >> + >> return available - len; >> >> corrupted: >> -- >> 2.39.0 >>
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index ecd86130b74f..89fc8ea6bcfc 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -469,7 +469,10 @@ enum i915_map_type i915_coherent_map_type(struct drm_i915_private *i915, struct drm_i915_gem_object *obj, bool always_coherent) { - if (i915_gem_object_is_lmem(obj)) + /* + * Wa_22016122933: always return I915_MAP_WC for MTL + */ + if (i915_gem_object_is_lmem(obj) || IS_METEORLAKE(i915)) return I915_MAP_WC; if (HAS_LLC(i915) || always_coherent) return I915_MAP_WB; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c index 1d9fdfb11268..236673c02f9a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c @@ -110,6 +110,13 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc) if (obj->base.size < gsc->fw.size) return -ENOSPC; + /* + * Wa_22016122933: For MTL the shared memory needs to be mapped + * as WC on CPU side and UC (PAT index 2) on GPU side + */ + if (IS_METEORLAKE(i915)) + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); + dst = i915_gem_object_pin_map_unlocked(obj, i915_coherent_map_type(i915, obj, true)); if (IS_ERR(dst)) @@ -125,6 +132,12 @@ static int gsc_fw_load_prepare(struct intel_gsc_uc *gsc) memset(dst, 0, obj->base.size); memcpy(dst, src, gsc->fw.size); + /* + * Wa_22016122933: Making sure the data in dst is + * visible to GSC right away + */ + intel_guc_write_barrier(>->uc.guc); + i915_gem_object_unpin_map(gsc->fw.obj); i915_gem_object_unpin_map(obj); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index e89f16ecf1ae..c9f20385f6a0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -744,6 +744,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size) if (IS_ERR(obj)) return ERR_CAST(obj); + /* + * Wa_22016122933: For MTL the shared memory needs to be mapped + * as WC on CPU side and UC (PAT index 2) on GPU side + */ + if (IS_METEORLAKE(gt->i915)) + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); + vma = i915_vma_instance(obj, >->ggtt->vm, NULL); if (IS_ERR(vma)) goto err; diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c index 1803a633ed64..99a0a89091e7 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c @@ -902,6 +902,12 @@ static int ct_read(struct intel_guc_ct *ct, struct ct_incoming_msg **msg) /* now update descriptor */ WRITE_ONCE(desc->head, head); + /* + * Wa_22016122933: Making sure the head update is + * visible to GuC right away + */ + intel_guc_write_barrier(ct_to_guc(ct)); + return available - len; corrupted: