diff mbox

[2/2] drm/i915: Serialise updates to GGTT with access through GGTT on Braswell

Message ID 1445622212-32604-2-git-send-email-chris@chris-wilson.co.uk
State New, archived
Headers show

Commit Message

Chris Wilson Oct. 23, 2015, 5:43 p.m. UTC
When accessing through the GTT from one CPU whilst concurrently updating
the GGTT PTEs in another thread, the hardware likes to return random
data. As we have strong serialisation prevent us from modifying the PTE
of an active GTT mmapping, we have to conclude that it whilst modifying
other PTE's that error occurs. (I have not looked for any pattern such
as modifying PTE within the same page or cacheline as active PTE -
though checking whether revoking neighbouring objects should be enough
to test that theory.) The corruption also seems restricted to Braswell
and disappears with maxcpus=0. This patch stops all access through the
GTT by other CPUs when we update any PTE by stopping the machine around
the GGTT update.

Testcase: igt/gem_concurrent_blit
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89079
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Kconfig        |  1 +
 drivers/gpu/drm/i915/i915_gem_gtt.c | 24 ++++++++++++++++++++++++
 2 files changed, 25 insertions(+)

Comments

Chris Wilson Oct. 23, 2015, 7:45 p.m. UTC | #1
On Fri, Oct 23, 2015 at 06:43:32PM +0100, Chris Wilson wrote:
> (I have not looked for any pattern such
> as modifying PTE within the same page or cacheline as active PTE -
> though checking whether revoking neighbouring objects should be enough
> to test that theory.)

So, fwiw, I revoked the CPU PTE for the neighbouring objects (each object
is 1MiB in size and so should occupy 2KiB in the GGTT) but the
corruption persisted. If I didn't make a mistake that should squash the
theory that it is access through PTE neighbouring the update that are
trampled upon. I had also earlier ioremaped the GGTT (dev_priv->gtt.gsm)
as uncached, and thrown countless mb() around to rule out incomplete
writes to the GGTT prior to CPU access. I haven't tried idling the GPU,
as other tests within gem_concurrent_blit demonstrate that is not a GPU
flushing issue.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 051eab33e4c7..fcd77b27514d 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -10,6 +10,7 @@  config DRM_I915
 	# the shmem_readpage() which depends upon tmpfs
 	select SHMEM
 	select TMPFS
+	select STOP_MACHINE
 	select DRM_KMS_HELPER
 	select DRM_PANEL
 	select DRM_MIPI_DSI
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 016739eefd45..4d357e18e5d1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -24,6 +24,7 @@ 
  */
 
 #include <linux/seq_file.h>
+#include <linux/stop_machine.h>
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
@@ -2533,6 +2534,26 @@  static int ggtt_bind_vma(struct i915_vma *vma,
 	return 0;
 }
 
+struct ggtt_bind_vma__cb {
+	struct i915_vma *vma;
+	enum i915_cache_level cache_level;
+	u32 flags;
+};
+
+static int ggtt_bind_vma__cb(void *_arg)
+{
+	struct ggtt_bind_vma__cb *arg = _arg;
+	return ggtt_bind_vma(arg->vma, arg->cache_level, arg->flags);
+}
+
+static int ggtt_bind_vma__BKL(struct i915_vma *vma,
+			      enum i915_cache_level cache_level,
+			      u32 flags)
+{
+	struct ggtt_bind_vma__cb arg = { vma, cache_level, flags };
+	return stop_machine(ggtt_bind_vma__cb, &arg, NULL);
+}
+
 static int aliasing_gtt_bind_vma(struct i915_vma *vma,
 				 enum i915_cache_level cache_level,
 				 u32 flags)
@@ -3000,6 +3021,9 @@  static int gen8_gmch_probe(struct drm_device *dev,
 	dev_priv->gtt.base.bind_vma = ggtt_bind_vma;
 	dev_priv->gtt.base.unbind_vma = ggtt_unbind_vma;
 
+	if (IS_CHERRYVIEW(dev))
+		dev_priv->gtt.base.bind_vma = ggtt_bind_vma__BKL;
+
 	return ret;
 }