drm/i915: Force pte cacheline to main memory

Message ID 20200511160803.15407-1-mika.kuoppala@linux.intel.com (mailing list archive)
State New, archived
Series drm/i915: Force pte cacheline to main memory

Commit Message

Mika Kuoppala May 11, 2020, 4:08 p.m. UTC
We have problems with tgl not seeing a valid pte entry
when the iommu is enabled. Add heavy-handed flushing of
entry modifications by flushing the cpu, the cacheline
and then the wcb. This forces the pte out to main memory
past this point regardless of any promises of coherency.

This is an evolution of an experimental patch from
Chris Wilson that added a wmb for coherent partners;
a clflush is added here to force the cache->memory step.

Testcase: igt/gem_exec_fence/parallel
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
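
For readers skimming the archive, the flush sequence the commit message
describes boils down to the following (a minimal standalone sketch of the
same idea as the write_pte() helper added by the patch below; wmb() and
clflush() are the stock x86 kernel primitives, and the function name here
is illustrative only):

#include <linux/types.h>	/* u64 */
#include <asm/barrier.h>	/* wmb() */
#include <asm/special_insns.h>	/* clflush() */

/*
 * Store the PTE, drain the CPU write buffers into the cache, force the
 * dirty cacheline out to main memory, and then order the flush against
 * whatever the caller does next.
 */
static void flush_pte_write(u64 *pte, u64 val)
{
	*pte = val;	/* plain store into the (cached) PTE page */
	wmb();		/* cpu -> cache */
	clflush(pte);	/* cache -> memory */
	wmb();		/* make the flush visible to all observers */
}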

Comments

Chris Wilson May 11, 2020, 4:13 p.m. UTC | #1
Quoting Mika Kuoppala (2020-05-11 17:08:03)
> We have problems with tgl not seeing a valid pte entry
> when the iommu is enabled. Add heavy-handed flushing of
> entry modifications by flushing the cpu, the cacheline
> and then the wcb. This forces the pte out to main memory
> past this point regardless of any promises of coherency.
> 
> This is an evolution of an experimental patch from
> Chris Wilson that added a wmb for coherent partners;
> a clflush is added here to force the cache->memory step.
> 
> Testcase: igt/gem_exec_fence/parallel
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Not only does it help tgl, but it is also helping with a coherency
problem on Braswell. We see similar problems on gen9 and icl, and I have
a trybot run to see if it helps with those.

As it is helping on multiple platforms with diverse symptoms, even if
we can't explain why it helps, it clearly does. That makes it prudent to
apply it to improve the baseline and work from there.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Chris Wilson May 11, 2020, 4:16 p.m. UTC | #2
Quoting Chris Wilson (2020-05-11 17:13:52)
> Quoting Mika Kuoppala (2020-05-11 17:08:03)
> > We have problems with tgl not seeing a valid pte entry
> > when the iommu is enabled. Add heavy-handed flushing of
> > entry modifications by flushing the cpu, the cacheline
> > and then the wcb. This forces the pte out to main memory
> > past this point regardless of any promises of coherency.
> > 
> > This is an evolution of an experimental patch from
> > Chris Wilson that added a wmb for coherent partners;
> > a clflush is added here to force the cache->memory step.
> > 
> > Testcase: igt/gem_exec_fence/parallel
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 
> Not only does it help tgl, but it is also helping with a coherency
> problem on Braswell. We see similar problems on gen9 and icl, and I have
> a trybot run to see if it helps with those.

It should be noted that Braswell is using WC kmaps of the PTE, so this
should not even be necessary... But if we drop the WC and keep the
clflush, it fails. Just to add to the confusion.
-Chris
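
For context on the "WC kmaps" mentioned above: the page-table pages are
mapped write-combining, so CPU stores to the PTEs bypass the cache and a
clflush should have nothing to flush. Below is a rough sketch of such a
setup using the generic x86 set_pages_array_wc()/set_pages_array_wb()
helpers and kmap_atomic(); whether this matches the exact mechanism i915
uses on Braswell is an assumption, and the function names are illustrative
only (error handling omitted).

#include <linux/highmem.h>	/* kmap_atomic(), kunmap_atomic() */
#include <linux/types.h>	/* u64 */
#include <asm/set_memory.h>	/* set_pages_array_wc(), set_pages_array_wb() */

/*
 * Switch the kernel mapping of a page-table page to write-combining and
 * map it; on 64-bit, kmap_atomic() returns the direct-map address, which
 * now carries the WC attribute, so stores go straight to memory.
 */
static u64 *map_pt_page_wc(struct page *pt_page)
{
	set_pages_array_wc(&pt_page, 1);
	return kmap_atomic(pt_page);
}

static void unmap_pt_page_wc(struct page *pt_page, void *vaddr)
{
	kunmap_atomic(vaddr);
	set_pages_array_wb(&pt_page, 1);	/* restore write-back caching */
}

In practice the attribute change would be done once when the page is
allocated rather than around every map/unmap, since changing memory types
is expensive; the pairing above is only to keep the sketch self-contained.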

Patch

diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 94e746af8926..6b13408b0e38 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -389,6 +389,15 @@  static int gen8_ppgtt_alloc(struct i915_address_space *vm,
 	return err;
 }
 
+static __always_inline void
+write_pte(gen8_pte_t * const pte, const gen8_pte_t val)
+{
+	*pte = val;
+	wmb(); /* cpu to cache */
+	clflush((void *)pte); /* cache to memory */
+	wmb(); /* visible to all */
+}
+
 static __always_inline u64
 gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 		      struct i915_page_directory *pdp,
@@ -405,7 +414,8 @@  gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt,
 	vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1)));
 	do {
 		GEM_BUG_ON(iter->sg->length < I915_GTT_PAGE_SIZE);
-		vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma;
+		write_pte(&vaddr[gen8_pd_index(idx, 0)],
+			  pte_encode | iter->dma);
 
 		iter->dma += I915_GTT_PAGE_SIZE;
 		if (iter->dma >= iter->max) {
@@ -487,7 +497,7 @@  static void gen8_ppgtt_insert_huge(struct i915_vma *vma,
 
 		do {
 			GEM_BUG_ON(iter->sg->length < page_size);
-			vaddr[index++] = encode | iter->dma;
+			write_pte(&vaddr[index++], encode | iter->dma);
 
 			start += page_size;
 			iter->dma += page_size;