Message ID: 1399202305-5765-3-git-send-email-akash.goel@intel.com (mailing list archive)
State: New, archived
On Sun, May 04, 2014 at 04:48:25PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
>
> Moved the cache flush of the preallocated shmem pages outside
> the span of the 'struct_mutex' lock. This should not lead to any
> redundancy, as the cache flush of the newly allocated pages
> will be done anyway when the same buffer is submitted to the GPU, or
> when the domain of the object is changed from CPU to GTT
> in the GEM page fault handler.
>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b19ccb8..7ab2635 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1936,8 +1936,11 @@ i915_gem_object_shmem_preallocate(struct drm_i915_gem_object *obj)
>  		if (IS_ERR(page)) {
>  			DRM_DEBUG_DRIVER("Failure for obj(%p), size(%x) at page(%d)\n",
>  					 obj, obj->base.size, i);
> -			break;
> +			return;
>  		}
> +		/* Flush the cpu cache for the page now itself */
> +		drm_clflush_pages(&page, 1);
> +
>  		/* Decrement the extra ref count on the returned page,
>  		   otherwise when 'get_pages_gtt' will be called later on
>  		   in the regular path, it will also increment the ref count,
> @@ -1945,6 +1948,14 @@ i915_gem_object_shmem_preallocate(struct drm_i915_gem_object *obj)
>  		page_cache_release(page);
>  	}
>
> +	/*
> +	 * Reset the CPU domain now itself, so as to avoid the cache
> +	 * flush later (under 'struct_mutex' lock), as the all pages
> +	 * have been cache flushed.
> +	 * Hope this is safe enough to be done here.
> +	 */
> +	obj->base.write_domain = 0;

So cache-domain management is under the struct_mutex for good reason. However, if we follow the idea of moving this into create2, then adding a flag to also do the clflush upon allocation seems fine.

Note that I don't think doing it by default is a good match for everything, since my code and also the kernel already take advantage of the fact that fresh buffers are in the CPU cache domain, and use that to do any initial transfers in the CPU domain. So we'd have I915_CREATE_POPULATE and I915_CREATE_COHERENT?
-Chris