
[v3] intel: New libdrm interface to create unbound wc user mappings for objects

Message ID 1414501767-22479-1-git-send-email-akash.goel@intel.com (mailing list archive)
State New, archived

Commit Message

akash.goel@intel.com Oct. 28, 2014, 1:09 p.m. UTC
From: Akash Goel <akash.goel@intel.com>

A new libdrm interface 'drm_intel_gem_bo_map_wc' is provided by this
patch. Through this interface Gfx clients can create write-combining
virtual mappings of a GEM object. It provides the same functionality as
the 'mmap_gtt' interface, without the constraints of limited aperture
space, provided clients handle the linear-to-tiled conversion on their
own.
This patch is intended to improve CPU write performance, as with such a
mapping writes are almost 50% faster than with mmap_gtt. It also avoids
the cache flush otherwise needed when an object updated from the CPU
side is passed on to the GPU, which would be the case if the regular
mmap interface were used.
This type of mapping is especially useful for sub-region updates,
i.e. when only a portion of the object is to be updated.
There is also support for an unsynchronized version of this interface,
named 'drm_intel_gem_bo_map_wc_unsynchronized', through which
unsynchronized wc virtual mappings of a GEM object can be created.
To ensure cache coherency before using such a mapping, the GTT domain is
reused here. This provides the required cache flush if the object is in
the CPU domain, or synchronization against concurrent rendering.

The drm_i915_gem_mmap structure (for the DRM_IOCTL_I915_GEM_MMAP ioctl) has
been extended with a new flags field (defaulting to 0 for existing users).
In order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
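
As an illustrative usage sketch (not part of this patch; bufmgr, data and
size are placeholders), a client could try the wc mapping and fall back to
the GTT mapping on older kernels, since drm_intel_gem_bo_map_wc() returns
-EINVAL when the extended mmap ioctl is not available:

	drm_intel_bo *bo = drm_intel_bo_alloc(bufmgr, "upload", size, 4096);

	if (drm_intel_gem_bo_map_wc(bo) == 0) {
		/* WC write: no clflush needed before handing the bo to the GPU */
		memcpy(bo->virtual, data, size);
		drm_intel_gem_bo_unmap_wc(bo);
	} else {
		/* Kernel without the extended mmap ioctl: use the GTT mapping */
		drm_intel_gem_bo_map_gtt(bo);
		memcpy(bo->virtual, data, size);
		drm_intel_gem_bo_unmap_gtt(bo);
	}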

v2: Aligned with the v2 of the corresponding kernel patch (Chris)
v3: Added the unmap calls for the wc mapping (Damien)
    Added the param feature check before creating the wc mapping & reduced
    the vma limit (Chris)

Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 include/drm/i915_drm.h   |   9 +++
 intel/intel_bufmgr.h     |   3 +
 intel/intel_bufmgr_gem.c | 150 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 162 insertions(+)

Comments

Lespiau, Damien Oct. 28, 2014, 4:11 p.m. UTC | #1
On Tue, Oct 28, 2014 at 06:39:27PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> A new libdrm interface 'drm_intel_gem_bo_map_wc' is provided by this
> patch. Through this interface Gfx clients can create write combining
> virtual mappings of the Gem object. It will provide the same funtionality
> of 'mmap_gtt' interface without the constraints of limited aperture space,
> but provided clients handles the linear to tile conversion on their own.
> This patch is intended for improving the CPU write operation performance,
> as with such mapping, writes are almost 50% faster than with mmap_gtt.
> Also it avoids the Cache flush after update from CPU side, when object is
> passed on to GPU, which will be the case if regular mmap interface is used.
> This type of mapping is specially useful in case of sub-region
> update, i.e. when only a portion of the object is to be updated.
> Also there is a support for the unsynchronized version of this interface
> named 'drm_intel_gem_bo_map_wc_unsynchronized', through which wc virtual
> mappings, but unsynchronized one, can be created of the Gem object.
> To ensure the cache coherency, before using this mapping, the GTT domain has
> been reused here. This provides the required Cache flush if the object is in
> CPU domain or synchronization against the concurrent rendering
> 
> The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
> extended with a new flags field (defaulting to 0 for existent users). In
> order for userspace to detect the extended ioctl, a new parameter
> I915_PARAM_HAS_EXT_MMAP has been added for versioning the ioctl interface.
> 
> v2: Aligned with the v2 of the corresponding kernel patch (Chris)
> v3: Added the unmap calls for the wc mapping (Damien)
>     Added the param feature check before creating the wc mapping & reduced
>     the vma limit (Chris)
> 
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I'm still not super convinced by the full copy and paste of the function,
where all I can see changing is the pointer storing the virtual address
and the flags. Am I missing something? What do others think? (I'm not
the maintainer of anything in libdrm.)
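
Purely as an untested sketch of the kind of consolidation I mean (ignoring
the userptr special-casing), the common part could be parameterized on the
flags and the cached pointer, e.g.:

	/* Sketch only: shared mapping helper for the CPU and WC paths */
	static int map_internal(drm_intel_bo *bo, int wc)
	{
		drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
		drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
		/* The only real differences: which cached pointer is used and
		 * which flags are passed to the mmap ioctl. */
		void **virtual = wc ? &bo_gem->mem_wc_virtual : &bo_gem->mem_virtual;
		int ret;

		if (wc && !bufmgr_gem->has_ext_mmap)
			return -EINVAL;

		if (bo_gem->map_count++ == 0)
			drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);

		if (*virtual == NULL) {
			struct drm_i915_gem_mmap mmap_arg;

			VG_CLEAR(mmap_arg);
			mmap_arg.handle = bo_gem->gem_handle;
			mmap_arg.flags = wc ? I915_MMAP_WC : 0;
			mmap_arg.offset = 0;
			mmap_arg.size = bo->size;
			ret = drmIoctl(bufmgr_gem->fd,
				       DRM_IOCTL_I915_GEM_MMAP, &mmap_arg);
			if (ret != 0) {
				ret = -errno;
				if (--bo_gem->map_count == 0)
					drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
				return ret;
			}
			*virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
		}

		bo->virtual = *virtual;
		return 0;
	}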
Lespiau, Damien Dec. 3, 2014, 2:13 p.m. UTC | #2
On Tue, Oct 28, 2014 at 06:39:27PM +0530, akash.goel@intel.com wrote:
> @@ -1126,6 +1136,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
>  
>  	/* We may need to evict a few entries in order to create new mmaps */
>  	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
> +	if (bufmgr_gem->has_ext_mmap)
> +		limit -= bufmgr_gem->vma_open;
>  	if (limit < 0)
>  		limit = 0;
>  

Aren't we being overly aggressive in purging the vma cache with this new
code? I guess it depends on the workload, but I don't think the WC
mapping is likely to be used in addition to the other two often enough
to warrant a max - 3 * open limit.
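(Illustrative numbers only: with a hypothetical vma_max of 100 and
vma_open of 20, the purge limit drops from 100 - 2*20 = 60 to
100 - 3*20 = 40 once has_ext_mmap is set.)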

> +static int
> +map_wc(drm_intel_bo *bo)
> +{
> +	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> +	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> +	int ret;
> +
> +	if (bo_gem->is_userptr)
> +		return -EINVAL;
> +
> +	if (!bufmgr_gem->has_ext_mmap)
> +		return -EINVAL;
> +
> +	if (bo_gem->map_count++ == 0)
> +		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
> +
> +	/* Get a mapping of the buffer if we haven't before. */
> +	if (bo_gem->mem_wc_virtual == NULL) {
> +		struct drm_i915_gem_mmap mmap_arg;
> +
> +		DBG("bo_map_wc: mmap %d (%s), map_count=%d\n",
> +		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
> +
> +		VG_CLEAR(mmap_arg);
> +		mmap_arg.handle = bo_gem->gem_handle;
> +		/* To indicate the uncached virtual mapping to KMD */
> +		mmap_arg.flags = I915_MMAP_WC;
> +		mmap_arg.offset = 0;
> +		mmap_arg.size = bo->size;
> +		ret = drmIoctl(bufmgr_gem->fd,
> +			       DRM_IOCTL_I915_GEM_MMAP,
> +			       &mmap_arg);
> +		if (ret != 0) {
> +			ret = -errno;
> +			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
> +			    __FILE__, __LINE__, bo_gem->gem_handle,
> +			    bo_gem->name, strerror(errno));
> +			if (--bo_gem->map_count == 0)
> +				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
> +			pthread_mutex_unlock(&bufmgr_gem->lock);


There's an unlock() here (and no lock()!), but the locking is handled by
the higher-level functions.

> +			return ret;
> +		}
> +		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
> +		bo_gem->mem_wc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
> +	}
> +
> +	bo->virtual = bo_gem->mem_wc_virtual;
> +
> +	DBG("bo_map_wc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
> +	    bo_gem->mem_wc_virtual);
> +
> +	return 0;
> +}
> +
> +/* To be used in a similar way to mmap_gtt */
> +drm_public int
> +drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
> +	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> +	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> +	struct drm_i915_gem_set_domain set_domain;
> +	int ret;
> +
> +	pthread_mutex_lock(&bufmgr_gem->lock);
> +
> +	ret = map_wc(bo);
> +	if (ret) {
> +		pthread_mutex_unlock(&bufmgr_gem->lock);
> +		return ret;
> +	}
> +
> +	/* Now move it to the GTT domain so that the GPU and CPU
> +	 * caches are flushed and the GPU isn't actively using the
> +	 * buffer.
> +	 *
> +	 * The domain change is done even for the objects which
> +	 * are not bounded. For them first the pages are acquired,
> +	 * before the domain change.
> +	 */
> +	VG_CLEAR(set_domain);
> +	set_domain.handle = bo_gem->gem_handle;
> +	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
> +	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
> +	ret = drmIoctl(bufmgr_gem->fd,
> +		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
> +		       &set_domain);
> +	if (ret != 0) {
> +		DBG("%s:%d: Error setting domain %d: %s\n",
> +		    __FILE__, __LINE__, bo_gem->gem_handle,
> +		    strerror(errno));
> +	}

Why move the buffer to the GTT domain and not the CPU domain here?

> +	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
> +	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
> +	pthread_mutex_unlock(&bufmgr_gem->lock);
> +
> +	return 0;
> +}
> +
> +drm_public int
> +drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo) {
> +	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> +#ifdef HAVE_VALGRIND
> +	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> +#endif
> +	int ret;
> +
> +	pthread_mutex_lock(&bufmgr_gem->lock);
> +
> +	ret = map_wc(bo);
> +	if (ret == 0) {
> +		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
> +		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
> +	}
> +
> +	pthread_mutex_unlock(&bufmgr_gem->lock);
> +
> +	return ret;
> +}
> +
>  static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
>  {
>  	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> @@ -1293,6 +1432,7 @@ static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
>  
>  		VG_CLEAR(mmap_arg);
>  		mmap_arg.handle = bo_gem->gem_handle;
> +		mmap_arg.flags = 0;
>  		mmap_arg.offset = 0;
>  		mmap_arg.size = bo->size;
>  		ret = drmIoctl(bufmgr_gem->fd,
> @@ -1553,6 +1693,12 @@ static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
>  }
>  
>  drm_public int
> +drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo)
> +{
> +	return drm_intel_gem_bo_unmap(bo);
> +}
> +
> +drm_public int
>  drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
>  {
>  	return drm_intel_gem_bo_unmap(bo);
> @@ -3538,6 +3684,10 @@ drm_intel_bufmgr_gem_init(int fd, int batch_size)
>  	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
>  	bufmgr_gem->has_vebox = (ret == 0) & (*gp.value > 0);
>  
> +	gp.param = I915_PARAM_MMAP_VERSION;
> +	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
> +	bufmgr_gem->has_ext_mmap = (ret == 0) & (*gp.value > 0);

It looks like this works, but can we have && instead?
Chris Wilson Dec. 3, 2014, 6:18 p.m. UTC | #3
On Wed, Dec 03, 2014 at 02:13:58PM +0000, Damien Lespiau wrote:
> > +/* To be used in a similar way to mmap_gtt */
> > +drm_public int
> > +drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
> > +	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
> > +	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
> > +	struct drm_i915_gem_set_domain set_domain;
> > +	int ret;
> > +
> > +	pthread_mutex_lock(&bufmgr_gem->lock);
> > +
> > +	ret = map_wc(bo);
> > +	if (ret) {
> > +		pthread_mutex_unlock(&bufmgr_gem->lock);
> > +		return ret;
> > +	}
> > +
> > +	/* Now move it to the GTT domain so that the GPU and CPU
> > +	 * caches are flushed and the GPU isn't actively using the
> > +	 * buffer.
> > +	 *
> > +	 * The domain change is done even for the objects which
> > +	 * are not bounded. For them first the pages are acquired,
> > +	 * before the domain change.
> > +	 */
> > +	VG_CLEAR(set_domain);
> > +	set_domain.handle = bo_gem->gem_handle;
> > +	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
> > +	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
> > +	ret = drmIoctl(bufmgr_gem->fd,
> > +		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
> > +		       &set_domain);
> > +	if (ret != 0) {
> > +		DBG("%s:%d: Error setting domain %d: %s\n",
> > +		    __FILE__, __LINE__, bo_gem->gem_handle,
> > +		    strerror(errno));
> > +	}
> 
> Why move the buffer to the GTT domain and not the CPU domain here?

The entire point of the mmap(wc) interface is to access the buffer
outside of the CPU cache domain, without using the GTT indirection.

We have 3 cache domains: in GPU, in CPU, neither (aka GTT).
-Chris
Chris Wilson Dec. 9, 2014, 10:54 a.m. UTC | #4
On Wed, Dec 03, 2014 at 02:13:58PM +0000, Damien Lespiau wrote:
> On Tue, Oct 28, 2014 at 06:39:27PM +0530, akash.goel@intel.com wrote:
> > @@ -1126,6 +1136,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
> >  
> >  	/* We may need to evict a few entries in order to create new mmaps */
> >  	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
> > +	if (bufmgr_gem->has_ext_mmap)
> > +		limit -= bufmgr_gem->vma_open;
> >  	if (limit < 0)
> >  		limit = 0;
> >  
> 
> Aren't we being overly aggressive in purging the vma cache with this new
> code? I guess it depends on the workload but I don't think the WC
> mapping is likely to be used in addition to the other two often enough
> to warrant a max - 3 * open;

And if they do, you've just exceeded their limit. And it is conceivable
that you will have a bo with all 3 vma mappings, though I only expect 2 to
be active in any single lifetime. The patch is correct with respect to the
code as it exists; you may want to argue for finer tracking of vma_open and
vma_cached, but that would require a different approach.
 
> > +	/* Now move it to the GTT domain so that the GPU and CPU
> > +	 * caches are flushed and the GPU isn't actively using the
> > +	 * buffer.
> > +	 *
> > +	 * The domain change is done even for the objects which
> > +	 * are not bounded. For them first the pages are acquired,
> > +	 * before the domain change.
> > +	 */
> > +	VG_CLEAR(set_domain);
> > +	set_domain.handle = bo_gem->gem_handle;
> > +	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
> > +	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
> > +	ret = drmIoctl(bufmgr_gem->fd,
> > +		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
> > +		       &set_domain);
> > +	if (ret != 0) {
> > +		DBG("%s:%d: Error setting domain %d: %s\n",
> > +		    __FILE__, __LINE__, bo_gem->gem_handle,
> > +		    strerror(errno));
> > +	}
> 
> Why move the buffer to the GTT domain and not the CPU domain here?

There are 3 cache domains: GPU, CPU and NONE (aka GTT). The purpose of
mmap(wc) is to be able to access the buffer without going through the CPU
cache (and it should be out of the GPU cache to maintain coherency). The
NONE cache domain is exactly what we want, or else you end up in clflush
misery.
-Chris
Lespiau, Damien Dec. 9, 2014, 4:53 p.m. UTC | #5
On Tue, Dec 09, 2014 at 10:54:25AM +0000, Chris Wilson wrote:
> On Wed, Dec 03, 2014 at 02:13:58PM +0000, Damien Lespiau wrote:
> > On Tue, Oct 28, 2014 at 06:39:27PM +0530, akash.goel@intel.com wrote:
> > > @@ -1126,6 +1136,8 @@ static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
> > >  
> > >  	/* We may need to evict a few entries in order to create new mmaps */
> > >  	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
> > > +	if (bufmgr_gem->has_ext_mmap)
> > > +		limit -= bufmgr_gem->vma_open;
> > >  	if (limit < 0)
> > >  		limit = 0;
> > >  
> > 
> > Aren't we being overly aggressive in purging the vma cache with this new
> > code? I guess it depends on the workload but I don't think the WC
> > mapping is likely to be used in addition to the other two often enough
> > to warrant a max - 3 * open;
> 
> And if they do, you've just exceeded their limit. And it conceivably
> that you will have bo with all 3 vma mappings, though I only expect 2 to
> be active in any single lifetime. The patch is correct regarding the code
> as it exists, you may want to argue for finer tracking of vma_open and
> vma_cached, but that would require a different approach.
>  
> > > +	/* Now move it to the GTT domain so that the GPU and CPU
> > > +	 * caches are flushed and the GPU isn't actively using the
> > > +	 * buffer.
> > > +	 *
> > > +	 * The domain change is done even for the objects which
> > > +	 * are not bounded. For them first the pages are acquired,
> > > +	 * before the domain change.
> > > +	 */
> > > +	VG_CLEAR(set_domain);
> > > +	set_domain.handle = bo_gem->gem_handle;
> > > +	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
> > > +	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
> > > +	ret = drmIoctl(bufmgr_gem->fd,
> > > +		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
> > > +		       &set_domain);
> > > +	if (ret != 0) {
> > > +		DBG("%s:%d: Error setting domain %d: %s\n",
> > > +		    __FILE__, __LINE__, bo_gem->gem_handle,
> > > +		    strerror(errno));
> > > +	}
> > 
> > Why move the buffer to the GTT domain and not the CPU domain here?
> 
> There are 3 cache domains: GPU, CPU and NONE (aka GTT). The purpose of
> mmap(wc) is to be able to access the buffer without going through the CPU
> cache (and it should be out of the GPU cache to maintain coherency). The
> NONE cache domain is exactly what we want, or else you end up in clflush
> misery.

Right, that leaves the last point in my answer to v3. With that addressed,
this is:

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

--
Damien
Lespiau, Damien March 3, 2015, 2:20 p.m. UTC | #6
On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> Right that leaves the last point in my answer to v3. With that addressed
> this is:
> 
> Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>

Hum, it seems that we've upstreamed the kernel part without the libdrm
API. Is it time to fix this? Akash, Chris, any objection if I push this
v3? Is there a more up-to-date patch that was exercised by some driver?

Thanks,
Chris Wilson March 3, 2015, 5:05 p.m. UTC | #7
On Tue, Mar 03, 2015 at 02:20:03PM +0000, Damien Lespiau wrote:
> On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> > Right that leaves the last point in my answer to v3. With that addressed
> > this is:
> > 
> > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> 
> Hum it seems that we've upstreamed the kernel part without the libdrm
> API. Is it time to fix this? Akash, Chris, any objection if I push this
> v3? is there a more up-to-date patch that was exercised by some driver?

There's no actual user of the proposed libdrm API afaik. It should make
a nice drop-in improvement for mesa, but we may need to better expose
bufmgr_gem->has_ext_mmap (I don't think mesa needs to care about failure
with old CPUs if it only focuses on mesa/i965).
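
If that's needed, it could presumably be a tiny query along these lines
(name purely hypothetical, not part of this patch):

	/* Hypothetical helper: report whether WC mappings are available */
	drm_public int
	drm_intel_bufmgr_gem_can_map_wc(drm_intel_bufmgr *bufmgr)
	{
		drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bufmgr;

		return bufmgr_gem->has_ext_mmap;
	}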
-Chris
Lespiau, Damien March 3, 2015, 5:11 p.m. UTC | #8
On Tue, Mar 03, 2015 at 05:05:52PM +0000, Chris Wilson wrote:
> On Tue, Mar 03, 2015 at 02:20:03PM +0000, Damien Lespiau wrote:
> > On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> > > Right that leaves the last point in my answer to v3. With that addressed
> > > this is:
> > > 
> > > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> > 
> > Hum it seems that we've upstreamed the kernel part without the libdrm
> > API. Is it time to fix this? Akash, Chris, any objection if I push this
> > v3? is there a more up-to-date patch that was exercised by some driver?
> 
> There's no actual user of the proposed libdrm API afaik. It should make
> a nice drop-in improvement for mesa, but we may need to better expose
> bufmgr_gem->has_ext_mmap (I don't think mesa needs to care about failure
> with old CPUs if it only focuses on mesa/i965).

We've cleared that with Dave last time, and if the kernel API is in place
(because the DDX uses it) then we can upstream the libdrm one, for the
sake of completeness.
Chris Wilson March 3, 2015, 5:13 p.m. UTC | #9
On Tue, Mar 03, 2015 at 05:11:12PM +0000, Damien Lespiau wrote:
> On Tue, Mar 03, 2015 at 05:05:52PM +0000, Chris Wilson wrote:
> > On Tue, Mar 03, 2015 at 02:20:03PM +0000, Damien Lespiau wrote:
> > > On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> > > > Right that leaves the last point in my answer to v3. With that addressed
> > > > this is:
> > > > 
> > > > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> > > 
> > > Hum it seems that we've upstreamed the kernel part without the libdrm
> > > API. Is it time to fix this? Akash, Chris, any objection if I push this
> > > v3? is there a more up-to-date patch that was exercised by some driver?
> > 
> > There's no actual user of the proposed libdrm API afaik. It should make
> > a nice drop-in improvement for mesa, but we may need to better expose
> > bufmgr_gem->has_ext_mmap (I don't think mesa needs to care about failure
> > with old CPUs if it only focuses on mesa/i965).
> 
> We've cleared that with Dave last time and if there's the kernel API in
> place (because the DDX uses it) then we can upstream the libdrm one, for
> the sake of completeness.

I meant in terms of whether the API is sufficient for mesa.
-Chris
Lespiau, Damien March 3, 2015, 5:16 p.m. UTC | #10
On Tue, Mar 03, 2015 at 05:13:41PM +0000, Chris Wilson wrote:
> On Tue, Mar 03, 2015 at 05:11:12PM +0000, Damien Lespiau wrote:
> > On Tue, Mar 03, 2015 at 05:05:52PM +0000, Chris Wilson wrote:
> > > On Tue, Mar 03, 2015 at 02:20:03PM +0000, Damien Lespiau wrote:
> > > > On Tue, Dec 09, 2014 at 04:53:01PM +0000, Damien Lespiau wrote:
> > > > > Right that leaves the last point in my answer to v3. With that addressed
> > > > > this is:
> > > > > 
> > > > > Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
> > > > 
> > > > Hum it seems that we've upstreamed the kernel part without the libdrm
> > > > API. Is it time to fix this? Akash, Chris, any objection if I push this
> > > > v3? is there a more up-to-date patch that was exercised by some driver?
> > > 
> > > There's no actual user of the proposed libdrm API afaik. It should make
> > > a nice drop-in improvement for mesa, but we may need to better expose
> > > bufmgr_gem->has_ext_mmap (I don't think mesa needs to care about failure
> > > with old CPUs if it only focuses on mesa/i965).
> > 
> > We've cleared that with Dave last time and if there's the kernel API in
> > place (because the DDX uses it) then we can upstream the libdrm one, for
> > the sake of completeness.
> 
> I meant in terms of whether the API is sufficient for mesa.

Next time, I'll actually read your comment :)

Patch

diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index 15dd01d..a91a1d0 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -340,6 +340,7 @@  typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_HANDLE_LUT   26
 #define I915_PARAM_HAS_WT     	 	 27
 #define I915_PARAM_CMD_PARSER_VERSION	 28
+#define I915_PARAM_MMAP_VERSION		 29
 
 typedef struct drm_i915_getparam {
 	int param;
@@ -487,6 +488,14 @@  struct drm_i915_gem_mmap {
 	 * This is a fixed-size type for 32/64 compatibility.
 	 */
 	__u64 addr_ptr;
+
+	/**
+	 * Flags for extended behaviour.
+	 *
+	 * Added in version 2.
+	*/
+	__u64 flags;
+#define I915_MMAP_WC 0x1
 };
 
 struct drm_i915_gem_mmap_gtt {
diff --git a/intel/intel_bufmgr.h b/intel/intel_bufmgr.h
index be83a56..bda4115 100644
--- a/intel/intel_bufmgr.h
+++ b/intel/intel_bufmgr.h
@@ -177,6 +177,9 @@  void drm_intel_bufmgr_gem_set_vma_cache_size(drm_intel_bufmgr *bufmgr,
 int drm_intel_gem_bo_map_unsynchronized(drm_intel_bo *bo);
 int drm_intel_gem_bo_map_gtt(drm_intel_bo *bo);
 int drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo);
+int drm_intel_gem_bo_map_wc(drm_intel_bo *bo);
+int drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo);
 
 int drm_intel_gem_bo_get_reloc_count(drm_intel_bo *bo);
 void drm_intel_gem_bo_clear_relocs(drm_intel_bo *bo, int start);
diff --git a/intel/intel_bufmgr_gem.c b/intel/intel_bufmgr_gem.c
index f2f4fea..7b53a23 100644
--- a/intel/intel_bufmgr_gem.c
+++ b/intel/intel_bufmgr_gem.c
@@ -130,6 +130,7 @@  typedef struct _drm_intel_bufmgr_gem {
 	unsigned int bo_reuse : 1;
 	unsigned int no_exec : 1;
 	unsigned int has_vebox : 1;
+	unsigned int has_ext_mmap : 1;
 	bool fenced_relocs;
 
 	char *aub_filename;
@@ -184,6 +185,8 @@  struct _drm_intel_bo_gem {
 	int reloc_count;
 	/** Mapped address for the buffer, saved across map/unmap cycles */
 	void *mem_virtual;
+	/** Uncached Mapped address for the buffer, saved across map/unmap cycles */
+	void *mem_wc_virtual;
 	/** GTT virtual address for the buffer, saved across map/unmap cycles */
 	void *gtt_virtual;
 	/**
@@ -1057,6 +1060,10 @@  drm_intel_gem_bo_free(drm_intel_bo *bo)
 		drm_munmap(bo_gem->gtt_virtual, bo_gem->bo.size);
 		bufmgr_gem->vma_count--;
 	}
+	if (bo_gem->mem_wc_virtual) {
+		drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+		bufmgr_gem->vma_count--;
+	}
 
 	/* Close this object */
 	VG_CLEAR(close);
@@ -1081,6 +1088,9 @@  drm_intel_gem_bo_mark_mmaps_incoherent(drm_intel_bo *bo)
 
 	if (bo_gem->gtt_virtual)
 		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->gtt_virtual, bo->size);
+
+	if (bo_gem->mem_wc_virtual)
+		VALGRIND_MAKE_MEM_NOACCESS(bo_gem->mem_wc_virtual, bo->size);
 #endif
 }
 
@@ -1126,6 +1136,8 @@  static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 
 	/* We may need to evict a few entries in order to create new mmaps */
 	limit = bufmgr_gem->vma_max - 2*bufmgr_gem->vma_open;
+	if (bufmgr_gem->has_ext_mmap)
+		limit -= bufmgr_gem->vma_open;
 	if (limit < 0)
 		limit = 0;
 
@@ -1148,6 +1160,11 @@  static void drm_intel_gem_bo_purge_vma_cache(drm_intel_bufmgr_gem *bufmgr_gem)
 			bo_gem->gtt_virtual = NULL;
 			bufmgr_gem->vma_count--;
 		}
+		if (bo_gem->mem_wc_virtual) {
+			drm_munmap(bo_gem->mem_wc_virtual, bo_gem->bo.size);
+			bo_gem->mem_wc_virtual = NULL;
+			bufmgr_gem->vma_count--;
+		}
 	}
 }
 
@@ -1160,6 +1177,8 @@  static void drm_intel_gem_bo_close_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count++;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count++;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count++;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1172,6 +1191,8 @@  static void drm_intel_gem_bo_open_vma(drm_intel_bufmgr_gem *bufmgr_gem,
 		bufmgr_gem->vma_count--;
 	if (bo_gem->gtt_virtual)
 		bufmgr_gem->vma_count--;
+	if (bo_gem->mem_wc_virtual)
+		bufmgr_gem->vma_count--;
 	drm_intel_gem_bo_purge_vma_cache(bufmgr_gem);
 }
 
@@ -1267,6 +1288,124 @@  static void drm_intel_gem_bo_unreference(drm_intel_bo *bo)
 	}
 }
 
+static int
+map_wc(drm_intel_bo *bo)
+{
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	int ret;
+
+	if (bo_gem->is_userptr)
+		return -EINVAL;
+
+	if (!bufmgr_gem->has_ext_mmap)
+		return -EINVAL;
+
+	if (bo_gem->map_count++ == 0)
+		drm_intel_gem_bo_open_vma(bufmgr_gem, bo_gem);
+
+	/* Get a mapping of the buffer if we haven't before. */
+	if (bo_gem->mem_wc_virtual == NULL) {
+		struct drm_i915_gem_mmap mmap_arg;
+
+		DBG("bo_map_wc: mmap %d (%s), map_count=%d\n",
+		    bo_gem->gem_handle, bo_gem->name, bo_gem->map_count);
+
+		VG_CLEAR(mmap_arg);
+		mmap_arg.handle = bo_gem->gem_handle;
+		/* To indicate the uncached virtual mapping to KMD */
+		mmap_arg.flags = I915_MMAP_WC;
+		mmap_arg.offset = 0;
+		mmap_arg.size = bo->size;
+		ret = drmIoctl(bufmgr_gem->fd,
+			       DRM_IOCTL_I915_GEM_MMAP,
+			       &mmap_arg);
+		if (ret != 0) {
+			ret = -errno;
+			DBG("%s:%d: Error mapping buffer %d (%s): %s .\n",
+			    __FILE__, __LINE__, bo_gem->gem_handle,
+			    bo_gem->name, strerror(errno));
+			if (--bo_gem->map_count == 0)
+				drm_intel_gem_bo_close_vma(bufmgr_gem, bo_gem);
+			pthread_mutex_unlock(&bufmgr_gem->lock);
+			return ret;
+		}
+		VG(VALGRIND_MALLOCLIKE_BLOCK(mmap_arg.addr_ptr, mmap_arg.size, 0, 1));
+		bo_gem->mem_wc_virtual = (void *)(uintptr_t) mmap_arg.addr_ptr;
+	}
+
+	bo->virtual = bo_gem->mem_wc_virtual;
+
+	DBG("bo_map_wc: %d (%s) -> %p\n", bo_gem->gem_handle, bo_gem->name,
+	    bo_gem->mem_wc_virtual);
+
+	return 0;
+}
+
+/* To be used in a similar way to mmap_gtt */
+drm_public int
+drm_intel_gem_bo_map_wc(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+	struct drm_i915_gem_set_domain set_domain;
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret) {
+		pthread_mutex_unlock(&bufmgr_gem->lock);
+		return ret;
+	}
+
+	/* Now move it to the GTT domain so that the GPU and CPU
+	 * caches are flushed and the GPU isn't actively using the
+	 * buffer.
+	 *
+	 * The domain change is done even for the objects which
+	 * are not bounded. For them first the pages are acquired,
+	 * before the domain change.
+	 */
+	VG_CLEAR(set_domain);
+	set_domain.handle = bo_gem->gem_handle;
+	set_domain.read_domains = I915_GEM_DOMAIN_GTT;
+	set_domain.write_domain = I915_GEM_DOMAIN_GTT;
+	ret = drmIoctl(bufmgr_gem->fd,
+		       DRM_IOCTL_I915_GEM_SET_DOMAIN,
+		       &set_domain);
+	if (ret != 0) {
+		DBG("%s:%d: Error setting domain %d: %s\n",
+		    __FILE__, __LINE__, bo_gem->gem_handle,
+		    strerror(errno));
+	}
+	drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+	VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return 0;
+}
+
+drm_public int
+drm_intel_gem_bo_map_wc_unsynchronized(drm_intel_bo *bo) {
+	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
+#ifdef HAVE_VALGRIND
+	drm_intel_bo_gem *bo_gem = (drm_intel_bo_gem *) bo;
+#endif
+	int ret;
+
+	pthread_mutex_lock(&bufmgr_gem->lock);
+
+	ret = map_wc(bo);
+	if (ret == 0) {
+		drm_intel_gem_bo_mark_mmaps_incoherent(bo);
+		VG(VALGRIND_MAKE_MEM_DEFINED(bo_gem->mem_wc_virtual, bo->size));
+	}
+
+	pthread_mutex_unlock(&bufmgr_gem->lock);
+
+	return ret;
+}
+
 static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 {
 	drm_intel_bufmgr_gem *bufmgr_gem = (drm_intel_bufmgr_gem *) bo->bufmgr;
@@ -1293,6 +1432,7 @@  static int drm_intel_gem_bo_map(drm_intel_bo *bo, int write_enable)
 
 		VG_CLEAR(mmap_arg);
 		mmap_arg.handle = bo_gem->gem_handle;
+		mmap_arg.flags = 0;
 		mmap_arg.offset = 0;
 		mmap_arg.size = bo->size;
 		ret = drmIoctl(bufmgr_gem->fd,
@@ -1553,6 +1693,12 @@  static int drm_intel_gem_bo_unmap(drm_intel_bo *bo)
 }
 
 drm_public int
+drm_intel_gem_bo_unmap_wc(drm_intel_bo *bo)
+{
+	return drm_intel_gem_bo_unmap(bo);
+}
+
+drm_public int
 drm_intel_gem_bo_unmap_gtt(drm_intel_bo *bo)
 {
 	return drm_intel_gem_bo_unmap(bo);
@@ -3538,6 +3684,10 @@  drm_intel_bufmgr_gem_init(int fd, int batch_size)
 	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
 	bufmgr_gem->has_vebox = (ret == 0) & (*gp.value > 0);
 
+	gp.param = I915_PARAM_MMAP_VERSION;
+	ret = drmIoctl(bufmgr_gem->fd, DRM_IOCTL_I915_GETPARAM, &gp);
+	bufmgr_gem->has_ext_mmap = (ret == 0) & (*gp.value > 0);
+
 	if (bufmgr_gem->gen < 4) {
 		gp.param = I915_PARAM_NUM_FENCES_AVAIL;
 		gp.value = &bufmgr_gem->available_fences;