
[libdrm] intel: Use CPU mmap for unsynchronized map with linear buffers

Message ID 1442499542-16633-1-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Ville Syrjälä Sept. 17, 2015, 2:19 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

On LLC platforms there's no need to use GTT mmap for unsynchronized
maps if the object isn't tiled. So switch to using CPU mmap for linear
objects. This avoids having to use the GTT for GL buffer objects
entirely, and thus we can ignore the GTT mappable size limitation.
For tiled objects we still want the hardware to do the (de)tiling so
keep using GTT for such objects.
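
The policy described in this paragraph can be illustrated as a standalone sketch (the enum and function names here are made up for illustration and only mirror the i915 tiling modes; this is not libdrm code):

```c
#include <assert.h>

/* Sketch of the mapping policy: on LLC platforms, linear (untiled)
 * objects can use a cached CPU mmap, while tiled objects keep the
 * GTT mmap so the aperture hardware handles the (de)tiling. */
enum tiling { TILING_NONE, TILING_X, TILING_Y };
enum map_path { MAP_CPU, MAP_GTT };

static enum map_path choose_unsync_map(int has_llc, enum tiling tiling)
{
	if (has_llc && tiling == TILING_NONE)
		return MAP_CPU;	/* CPU cache is coherent with the GPU */
	return MAP_GTT;		/* GTT aperture performs (de)tiling */
}
```

A side effect of taking the CPU path, as the commit message notes, is that linear buffer objects no longer occupy space in the limited GTT mappable aperture.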

The display engine is not coherent even on LLC platforms, so this won't
work too well if we mix scanout and unsynchronized maps of linear bos.
Actually, that would only be a problem for an already uncached object;
otherwise it will get clflushed anyway when being made UC/WC prior to
scanout. The already-UC object case could be handled either by
clflushing straight from userspace, or by adding a new ioctl to
clflush or mark the object as cache_dirty so that it gets
clflushed just prior to scanout. I initially thought that a small
nop pwrite would have the desired effect, but in fact it would only
flush the cachelines it touches, so it wouldn't actually work, and
I doubt we want to pwrite the entire object just to get it clflushed.
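
For reference, the first option mentioned above, clflushing straight from userspace, is possible on x86 via the SSE2 clflush intrinsic. A minimal sketch, assuming 64-byte cachelines (true on the CPUs paired with i915 hardware); this helper is not part of the patch:

```c
#include <assert.h>
#include <emmintrin.h>	/* _mm_clflush, _mm_mfence (SSE2) */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CACHELINE 64	/* assumed cacheline size */

/* Flush every cacheline covering [addr, addr + len) out to memory. */
static void clflush_range(const void *addr, size_t len)
{
	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHELINE - 1);
	uintptr_t end = (uintptr_t)addr + len;

	_mm_mfence();		/* order prior stores before the flushes */
	for (; p < end; p += CACHELINE)
		_mm_clflush((const void *)p);
	_mm_mfence();		/* flushes complete before any GPU access */
}

/* Self-check: flushing writes the data back but must not alter it. */
static int clflush_roundtrip(void)
{
	unsigned char buf[256];
	size_t i;

	memset(buf, 0xa5, sizeof(buf));
	clflush_range(buf, sizeof(buf));
	for (i = 0; i < sizeof(buf); i++)
		if (buf[i] != 0xa5)
			return 0;
	return 1;
}
```

Unlike the nop-pwrite idea, this flushes every line in the range, not just the ones a write happens to touch.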

This fixes Ilia's arb_texture_buffer_object-max-size piglit test
on LLC platforms.

Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 intel/intel_bufmgr_gem.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

Comments

Chris Wilson Sept. 17, 2015, 2:26 p.m. UTC | #1
On Thu, Sep 17, 2015 at 05:19:02PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> [snip]
> 
> This fixes Ilia's arb_texture_buffer_object-max-size piglit test
> on LLC platforms.

Note that there have been patches to fix mesa/i965 for this issue on
both llc and !llc on the mailing list for a few months.
-Chris

Patch

diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index 63122d0..5e8335a 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -1337,11 +1337,10 @@  static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
-static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
+static int map_cpu(drm_intel_bo *bo)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
 	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
-	struct drm_i915_gem_set_domain set_domain;
 	int ret;
 
 	if (bo_gem->is_userptr) {
@@ -1350,8 +1349,6 @@  static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 		return 0;
 	}
 
-	pthread_mutex_lock(&bufmgr_gem->lock);
-
 	if (bo_gem->map_count++ == 0)
 		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
 
@@ -1384,6 +1381,24 @@  static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 	    bo_gem->mem_virtual);
 	bo->virtual = bo_gem->mem_virtual;
 
+	return 0;
+}
+
+static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_cpu(bo);
+	if (ret || bo_gem->is_userptr) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
 	memclear(set_domain);
 	set_domain.handle = bo_gem->gem_handle;
 	set_domain.read_domains = I915_GEM_DOMAIN_CPU;
@@ -1536,9 +1551,7 @@  int
 drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
-#ifdef HAVE_VALGRIND
 	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
-#endif
 	int ret;
 
 	/* If the CPU cache isn't coherent with the GTT, then use a
@@ -1553,7 +1566,10 @@  drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo)
 
 	pthread_mutex_lock(&bufmgr_gem->lock);
 
-	ret = map_gtt(bo);
+	if (bo_gem->tiling_mode == I915_TILING_NONE)
+		ret = map_cpu(bo);
+	else
+		ret = map_gtt(bo);
 	if (ret == 0) {
 		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
 		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->gtt_virtual, bo->size));
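
For contrast, the synchronized drm_intel_gem_bo_map() path above still follows map_cpu() with a SET_DOMAIN ioctl, and skipping that call is precisely what makes the unsynchronized variant unsynchronized. A standalone sketch of how that ioctl argument is prepared; the struct and the memclear() macro are local copies mirroring the i915 uapi and libdrm, and the drmIoctl() call itself is elided:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors libdrm's memclear(): zero the whole struct, padding included. */
#define memclear(s) memset(&(s), 0, sizeof(s))

/* Local copy of the i915 uapi struct and CPU domain bit, for illustration. */
struct drm_i915_gem_set_domain {
	uint32_t handle;
	uint32_t read_domains;
	uint32_t write_domain;
};
#define I915_GEM_DOMAIN_CPU 0x00000001

static struct drm_i915_gem_set_domain
prepare_cpu_domain(uint32_t handle, int write)
{
	struct drm_i915_gem_set_domain sd;

	memclear(sd);
	sd.handle = handle;
	sd.read_domains = I915_GEM_DOMAIN_CPU;
	sd.write_domain = write ? I915_GEM_DOMAIN_CPU : 0;
	/* real code: drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd); */
	return sd;
}
```

The ioctl moves the object into the CPU read (and optionally write) domain, clflushing as needed and waiting for the GPU; the unsynchronized path deliberately does neither.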