[v5] intel: New libdrm interface to create unbound wc user mappings for objects

Message ID 1457514593-12306-1-git-send-email-akash.goel@intel.com (mailing list archive)
State New, archived

Commit Message

akash.goel@intel.com March 9, 2016, 9:09 a.m. UTC
From: Akash Goel <akash.goel@intel.com>

A new libdrm interface, 'drm_intel_gem_bo_map_wc', is provided by this
patch. Through this interface Gfx clients can create write-combining
virtual mappings of a GEM object. It provides the same functionality as
the 'mmap_gtt' interface without the constraints of limited aperture
space, provided clients handle the linear-to-tiled conversion on their
own. This patch is intended to improve CPU write performance: with such
a mapping, writes are almost 50% faster than with mmap_gtt. It also
avoids the cache flush that would otherwise be needed when an object
updated from the CPU side is passed on to the GPU, as is the case if
the regular mmap interface is used. This type of mapping is especially
useful for sub-region updates, i.e. when only a portion of the object
is to be updated.
There is also support for an unsynchronized version of this interface,
'drm_intel_gem_bo_map_wc_unsynchronized', through which unsynchronized
wc virtual mappings of a GEM object can be created.
To ensure cache coherency before using this mapping, the GTT domain is
reused here: this provides the required cache flush if the object is in
the CPU domain, or synchronization against concurrent rendering.
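
For illustration, a minimal usage sketch (not part of this patch; the
helper name and parameters are hypothetical, and a real client would
also handle tiling and fallback to map_gtt):

    #include <string.h>
    #include <intel_bufmgr.h>

    /* Illustrative only: update a sub-region of a bo through the wc
     * mapping. map_wc returns -EINVAL on kernels without the extended
     * mmap ioctl, in which case a client could fall back to map_gtt. */
    static int write_subregion(drm_intel_bo *bo, size_t offset,
                               const void *data, size_t len)
    {
        int ret = drm_intel_gem_bo_map_wc(bo);
        if (ret)
            return ret;

        /* bo->virtual now points at the wc mapping; plain stores land
         * in memory, so no cache flush is needed before GPU use. */
        memcpy((char *)bo->virtual + offset, data, len);

        return drm_intel_gem_bo_unmap_wc(bo);
    }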

The drm_i915_gem_mmap structure (used by DRM_I915_GEM_MMAP_IOCTL) has
been extended with a new flags field (defaulting to 0 for existing
users). In order for userspace to detect the extended ioctl, a new
parameter, I915_PARAM_MMAP_VERSION, has been added for versioning the
ioctl interface.
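
As a hedged sketch (the helper name is illustrative), userspace could
probe for the extended ioctl like this, mirroring the check added to
drm_intel_bufmgr_gem_init() below:

    #include <string.h>
    #include <stdbool.h>
    #include <xf86drm.h>
    #include <i915_drm.h>

    /* Returns true when the kernel's mmap ioctl understands the new
     * flags field (and hence I915_MMAP_WC). */
    static bool has_ext_mmap(int fd)
    {
        drm_i915_getparam_t gp;
        int value = 0;

        memset(&gp, 0, sizeof(gp));
        gp.param = I915_PARAM_MMAP_VERSION;
        gp.value = &value;

        return drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0 && value > 0;
    }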

v2: Aligned with the v2 of the corresponding kernel patch (Chris)
v3: Added the unmap calls for the wc mapping (Damien)
    Added the param feature check before creating the wc mapping & reduced
    the vma limit (Chris)
v4: Removed the extraneous unlock call from map_wc function (Damien)
v5: Rebased.

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
---
 intel/intel_bufmgr.h     |   3 +
 intel/intel_bufmgr_gem.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 150 insertions(+)

Comments

Martin Peres March 10, 2016, 8:39 a.m. UTC | #1
On 09/03/16 11:09, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
>
> [snip: full commit message quoted above]

Chris, Damien,

Do you still stand by your S-o-b and R-b? If so, I can push the patch in 
the coming days.
Martin Peres March 14, 2016, 4:51 p.m. UTC | #2
On 10/03/16 10:39, Martin Peres wrote:
> On 09/03/16 11:09, akash.goel@intel.com wrote:
>> [snip: full commit message quoted above]
>
> Chris, Damien,
>
> Do you still stand by your S-o-b and R-b? If so, I can push the patch in
> the coming days.

Another ping, with both Chris and Damien CCed this time.
Daniel Vetter March 15, 2016, 8:41 a.m. UTC | #3
On Mon, Mar 14, 2016 at 06:51:54PM +0200, Martin Peres wrote:
> On 10/03/16 10:39, Martin Peres wrote:
> >On 09/03/16 11:09, akash.goel@intel.com wrote:
> >>[snip: full commit message quoted above]
> >
> >Chris, Damien,
> >
> >Do you still stand by your S-o-b and R-b? If so, I can push the patch in
> >the coming days.
> 
> Another ping, with both Chris and Damien CCed this time.

Not entirely sure, but we tended not to push libdrm patches if the open
source userspace wasn't there yet. That's why this one was hanging in
the air forever, IIRC.
-Daniel

Patch

diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index a1abbcd..061b500 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -183,6 +183,9 @@  void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
 int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
 int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
 int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc(drm_intel_bo *bo);
+int drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo);
 
 int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
 void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index dc28200..ca2f9ed 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -146,6 +146,7 @@  typedef struct _drm_intel_bufmgr_gem {
 	unsigned int bo_reuse : 1;
 	unsigned int no_exec : 1;
 	unsigned int has_vebox : 1;
+	unsigned int has_ext_mmap : 1;
 	bool fenced_relocs;
 
 	struct {
@@ -209,6 +210,8 @@  struct _drm_intel_bo_gem {
 
 	/** Mapped address for the buffer, saved across map/unmap cycles */
 	void *mem_virtual;
+	/** Uncached (WC) mapped address for the buffer, saved across map/unmap cycles */
+	void *mem_wc_virtual;
 	/** GTT virtual address for the buffer, saved across map/unmap cycles */
 	void *gtt_virtual;
 	/**
@@ -1192,6 +1195,10 @@  drm_intel_gem_bo_free(drm_intel_bo *bo)
 		drm_munmap(bo_gem->gtt_virtual, bo_gem->bo.size);
 		bufmgr_gem->vma_count--;
 	}
+	if (bo_gem->mem_wc_virtual) {
+		drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+		bufmgr_gem->vma_count--;
+	}
 
 	/* Close this object */
 	memclear(close);
@@ -1215,6 +1222,9 @@  drm_intel_gem_bo_mark_mmaps_incoherent(drm_intel_bo *bo)
 
 	if (bo_gem->gtt_virtual)
 		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->gtt_virtual, bo->size);
+
+	if (bo_gem->mem_wc_virtual)
+		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->mem_wc_virtual, bo->size);
 #endif
 }
 
@@ -1260,6 +1270,8 @@  static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 
 	/* We may need to evict a few entries in order to create new mmaps */
 	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
+	if (bufmgr_gem->has_ext_mmap)
+		limit -= bufmgr_gem->vma_open;
 	if (limit < 0)
 		limit = 0;
 
@@ -1282,6 +1294,11 @@  static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 			bo_gem->gtt_virtual = NULL;
 			bufmgr_gem->vma_count--;
 		}
+		if (bo_gem->mem_wc_virtual) {
+			drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+			bo_gem->mem_wc_virtual = NULL;
+			bufmgr_gem->vma_count--;
+		}
 	}
 }
 
@@ -1294,6 +1311,8 @@  static void drm_intel_gem_bo_close_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count++;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count++;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count++;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1306,6 +1325,8 @@  static void drm_intel_gem_bo_open_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count--;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count--;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count--;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1410,6 +1431,122 @@  static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
+static int
+map_wc(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	if (!bufmgr_gem->has_ext_mmap)
+		return -EINVAL;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	/* Get a mapping of the buffer if we haven't before. */
+	if (bo_gem->mem_wc_virtual == NULL) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map_wc: mmap %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		memclear(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		/* Ask the KMD for an uncached (write-combining) virtual mapping */
+		mmap_arg.flags = I915_MMAP_WC;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_wc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+
+	bo->virtual = bo_gem->mem_wc_virtual;
+
+	DBG("bo_map_wc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_wc_virtual);
+
+	return 0;
+}
+
+/* To be used in a similar way to mmap_gtt */
+int
+drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The domain change is done even for objects which are
+	 * not bound. For those, the pages are first acquired,
+	 * then the domain change is done.
+	 */
+	memclear(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+int
+drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+#ifdef HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+#endif
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret == 0) {
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	}
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
 static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
@@ -1695,6 +1832,12 @@  static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
 }
 
 int
+drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo)
+{
+	return drm_intel_gem_bo_unmap(bo);
+}
+
+int
 drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
 {
 	return drm_intel_gem_bo_unmap(bo);
@@ -3425,6 +3568,10 @@  drm_intel_bufmgr_gem_init(int fd, int batch_size)
 	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
 	bufmgr_gem->has_vebox = (ret == 0) & (*gp.value > 0);
 
+	gp.param = I915_PARAM_MMAP_VERSION;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_ext_mmap = (ret == 0) & (*gp.value > 0);
+
 	gp.param = I915_PARAM_HAS_EXEC_SOFTPIN;
 	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
 	if (ret == 0 && *gp.value > 0)