Message ID | 20210114201314.783648-3-imre.deak@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/gen12: Add display render clear color decompression support |
Quoting Imre Deak (2021-01-14 20:13:13)
> Add a simple helper to read data with the CPU from the page of a GEM
> object. Do the read either via a kmap if the object has struct pages
> or an iomap otherwise. This is needed by the next patch, reading a u64
> value from the object (w/o requiring the obj to be mapped to the GPU).
>
> Suggested by Chris.
>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 75 ++++++++++++++++++++++
>  drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 +
>  2 files changed, 77 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 00d24000b5e8..010f8d735e40 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -32,6 +32,7 @@
>  #include "i915_gem_mman.h"
>  #include "i915_gem_object.h"
>  #include "i915_globals.h"
> +#include "i915_memcpy.h"
>  #include "i915_trace.h"
>
>  static struct i915_global_object {
> @@ -383,6 +384,80 @@ void __i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj,
>  	}
>  }
>
> +static void
> +i915_gem_object_read_from_page_kmap(struct drm_i915_gem_object *obj, unsigned long offset, int size, void *dst)

[noted later about parameter order + types]

> +{
> +	const void *src_map;
> +	const void *src_ptr;
> +
> +	src_map = kmap_atomic(i915_gem_object_get_page(obj, offset >> PAGE_SHIFT));
> +
> +	src_ptr = src_map + offset_in_page(offset);
> +	if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ))
> +		drm_clflush_virt_range((void *)src_ptr, size);
> +	memcpy(dst, src_ptr, size);
> +
> +	kunmap_atomic((void *)src_map);

Live without marking the src pointers as const*.

> +}
> +
> +static void
> +i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, unsigned long offset, int size, void *dst)
> +{
> +	const void __iomem *src_map;
> +	const void __iomem *src_ptr;
> +
> +	src_map = io_mapping_map_wc(&obj->mm.region->iomap,
> +				    i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> +				    PAGE_SIZE);
> +
> +	src_ptr = src_map + offset_in_page(offset);
> +	if (!i915_memcpy_from_wc(dst, src_ptr, size))
> +		memcpy(dst, src_ptr, size);

Sparse will complain about the mixed __iomem/regular pointers. So you
might as well use memcpy_from_io() here.

Unfortunately memcpy_from_wc needs explicit casting. A task for a rainy
day is massaging i915_memcpy_from_wc() to be sparse clean for iomem.

> +
> +	io_mapping_unmap((void __iomem *)src_map);
> +}
> +
> +/**
> + * i915_gem_object_read_from_page - read data from the page of a GEM object
> + * @obj: GEM object to read from
> + * @offset: offset within the object
> + * @size: size to read
> + * @dst: buffer to store the read data
> + *
> + * Reads data from @obj after syncing against any pending GPU writes on it.
> + * The requested region to read from can't cross a page boundary.
> + *
> + * Returns 0 on sucess, negative error code on failre.
> + */
> +int i915_gem_object_read_from_page(struct drm_i915_gem_object *obj, unsigned long offset, size_t size, void *dst)

offset -> u64

size_t size? meh, it must only be an int

We use the convention of
	read_from_page(obj, offset_into_obj,
		       dst, length_of_read_into_dst)
for parameter ordering.

> +{
> +	int ret;
> +
> +	WARN_ON(offset + size > obj->base.size ||
> +		offset_in_page(offset) + size > PAGE_SIZE);

This is only from internal users, so GEM_BUG_ON() (or you would use
if (GEM_WARN_ON()) return -EINVAL).

	GEM_BUG_ON(offset > obj->base.size);
	GEM_BUG_ON(offset_in_page(offset) > PAGE_SIZE - size);

(since size is a multiple of pages)

> +
> +	i915_gem_object_lock(obj, NULL);
> +
> +	ret = i915_gem_object_wait(obj, 0, MAX_SCHEDULE_TIMEOUT);
> +	if (ret)
> +		goto unlock;

Is there an absolute requirement for this read to be serialised against
everything? If not, let the caller decide if they need some sort of
flush/wait before reading, and the lock can be removed.

In any case, always prefer interruptible waits and if there's a callpath
that absolutely must not be interruptible, pass that information along
the arguments.

> +	ret = i915_gem_object_pin_pages(obj);

So at present one would not need to lock the object for the pages. And
then we would not need to hold the lock across the read as we hold the
pages.

> +	if (ret)
> +		goto unlock;
> +
> +	if (i915_gem_object_has_struct_page(obj))
> +		i915_gem_object_read_from_page_kmap(obj, offset, size, dst);
> +	else

	else if (i915_gem_object_is_iomem(obj))

> +		i915_gem_object_read_from_page_iomap(obj, offset, size, dst);

	else
		ret = -ENODEV;

But on the whole, this works and is agnostic enough to handle current HW.
-Chris
On Thu, Jan 14, 2021 at 09:23:51PM +0000, Chris Wilson wrote:
> Quoting Imre Deak (2021-01-14 20:13:13)
> > Add a simple helper to read data with the CPU from the page of a GEM
> > object. Do the read either via a kmap if the object has struct pages
> > or an iomap otherwise. This is needed by the next patch, reading a u64
> > value from the object (w/o requiring the obj to be mapped to the GPU).
> >
> > Suggested by Chris.
> >
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_object.c | 75 ++++++++++++++++++++++
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 +
> >  2 files changed, 77 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > index 00d24000b5e8..010f8d735e40 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> > @@ -32,6 +32,7 @@
> >  #include "i915_gem_mman.h"
> >  #include "i915_gem_object.h"
> >  #include "i915_globals.h"
> > +#include "i915_memcpy.h"
> >  #include "i915_trace.h"
> >
> >  static struct i915_global_object {
> > @@ -383,6 +384,80 @@ void __i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj,
> >  	}
> >  }
> >
> > +static void
> > +i915_gem_object_read_from_page_kmap(struct drm_i915_gem_object *obj, unsigned long offset, int size, void *dst)
>
> [noted later about parameter order + types]
>
> > +{
> > +	const void *src_map;
> > +	const void *src_ptr;
> > +
> > +	src_map = kmap_atomic(i915_gem_object_get_page(obj, offset >> PAGE_SHIFT));
> > +
> > +	src_ptr = src_map + offset_in_page(offset);
> > +	if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ))
> > +		drm_clflush_virt_range((void *)src_ptr, size);
> > +	memcpy(dst, src_ptr, size);
> > +
> > +	kunmap_atomic((void *)src_map);
>
> Live without marking the src pointers as const*.

Ok.

> > +}
> > +
> > +static void
> > +i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, unsigned long offset, int size, void *dst)
> > +{
> > +	const void __iomem *src_map;
> > +	const void __iomem *src_ptr;
> > +
> > +	src_map = io_mapping_map_wc(&obj->mm.region->iomap,
> > +				    i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
> > +				    PAGE_SIZE);
> > +
> > +	src_ptr = src_map + offset_in_page(offset);
> > +	if (!i915_memcpy_from_wc(dst, src_ptr, size))
> > +		memcpy(dst, src_ptr, size);
>
> Sparse will complain about the mixed __iomem/regular pointers. So you
> might as well use memcpy_from_io() here.

Ok.

> Unfortunately memcpy_from_wc needs explicit casting.

Ok.

> A task for a rainy day is massaging i915_memcpy_from_wc() to be sparse
> clean for iomem.
>
> > +
> > +	io_mapping_unmap((void __iomem *)src_map);
> > +}
> > +
> > +/**
> > + * i915_gem_object_read_from_page - read data from the page of a GEM object
> > + * @obj: GEM object to read from
> > + * @offset: offset within the object
> > + * @size: size to read
> > + * @dst: buffer to store the read data
> > + *
> > + * Reads data from @obj after syncing against any pending GPU writes on it.
> > + * The requested region to read from can't cross a page boundary.
> > + *
> > + * Returns 0 on sucess, negative error code on failre.
> > + */
> > +int i915_gem_object_read_from_page(struct drm_i915_gem_object *obj, unsigned long offset, size_t size, void *dst)
>
> offset -> u64

Ok.

> size_t size? meh, it must only be an int

Yes, used int but forgot to change it here.

> We use the convention of
> 	read_from_page(obj, offset_into_obj,
> 		       dst, length_of_read_into_dst)
> for parameter ordering.

Ok.

> > +{
> > +	int ret;
> > +
> > +	WARN_ON(offset + size > obj->base.size ||
> > +		offset_in_page(offset) + size > PAGE_SIZE);
>
> This is only from internal users, so GEM_BUG_ON() (or you would use
> if (GEM_WARN_ON()) return -EINVAL).
>
> 	GEM_BUG_ON(offset > obj->base.size);
> 	GEM_BUG_ON(offset_in_page(offset) > PAGE_SIZE - size);
>
> (since size is a multiple of pages)

Ok, will use GEM_BUG_ON().

> > +
> > +	i915_gem_object_lock(obj, NULL);
> > +
> > +	ret = i915_gem_object_wait(obj, 0, MAX_SCHEDULE_TIMEOUT);
> > +	if (ret)
> > +		goto unlock;
>
> Is there an absolute requirement for this read to be serialised against
> everything?

No, especially not for the only current user, which has it already
synced. I thought that syncing against any write always makes sense,
but I suppose the user may want instead something more fine-grained.

> If not, let the caller decide if they need some sort of flush/wait
> before reading, and the lock can be removed.
>
> In any case, always prefer interruptible waits and if there's a callpath
> that absolutely must not be interruptible, pass that information along
> the arguments.

Atm it's only used from atomic_commit_tail() which can't fail, hence
went for uninterruptible. But I'll remove the lock.

> > +	ret = i915_gem_object_pin_pages(obj);
>
> So at present one would not need to lock the object for the pages. And
> then we would not need to hold the lock across the read as we hold the
> pages.

Ok, so will remove all of lock/wait/pin and leave instead only a

	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));

> > +	if (ret)
> > +		goto unlock;
> > +
> > +	if (i915_gem_object_has_struct_page(obj))
> > +		i915_gem_object_read_from_page_kmap(obj, offset, size, dst);
> > +	else
>
> 	else if (i915_gem_object_is_iomem(obj))
>
> > +		i915_gem_object_read_from_page_iomap(obj, offset, size, dst);
>
> 	else
> 		ret = -ENODEV;

Ok.

> But on the whole, this works and is agnostic enough to handle current HW.

Thanks for the review.

> -Chris
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 00d24000b5e8..010f8d735e40 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -32,6 +32,7 @@
 #include "i915_gem_mman.h"
 #include "i915_gem_object.h"
 #include "i915_globals.h"
+#include "i915_memcpy.h"
 #include "i915_trace.h"
 
 static struct i915_global_object {
@@ -383,6 +384,80 @@ void __i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj,
 	}
 }
 
+static void
+i915_gem_object_read_from_page_kmap(struct drm_i915_gem_object *obj, unsigned long offset, int size, void *dst)
+{
+	const void *src_map;
+	const void *src_ptr;
+
+	src_map = kmap_atomic(i915_gem_object_get_page(obj, offset >> PAGE_SHIFT));
+
+	src_ptr = src_map + offset_in_page(offset);
+	if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ))
+		drm_clflush_virt_range((void *)src_ptr, size);
+	memcpy(dst, src_ptr, size);
+
+	kunmap_atomic((void *)src_map);
+}
+
+static void
+i915_gem_object_read_from_page_iomap(struct drm_i915_gem_object *obj, unsigned long offset, int size, void *dst)
+{
+	const void __iomem *src_map;
+	const void __iomem *src_ptr;
+
+	src_map = io_mapping_map_wc(&obj->mm.region->iomap,
+				    i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
+				    PAGE_SIZE);
+
+	src_ptr = src_map + offset_in_page(offset);
+	if (!i915_memcpy_from_wc(dst, src_ptr, size))
+		memcpy(dst, src_ptr, size);
+
+	io_mapping_unmap((void __iomem *)src_map);
+}
+
+/**
+ * i915_gem_object_read_from_page - read data from the page of a GEM object
+ * @obj: GEM object to read from
+ * @offset: offset within the object
+ * @size: size to read
+ * @dst: buffer to store the read data
+ *
+ * Reads data from @obj after syncing against any pending GPU writes on it.
+ * The requested region to read from can't cross a page boundary.
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+int i915_gem_object_read_from_page(struct drm_i915_gem_object *obj, unsigned long offset, size_t size, void *dst)
+{
+	int ret;
+
+	WARN_ON(offset + size > obj->base.size ||
+		offset_in_page(offset) + size > PAGE_SIZE);
+
+	i915_gem_object_lock(obj, NULL);
+
+	ret = i915_gem_object_wait(obj, 0, MAX_SCHEDULE_TIMEOUT);
+	if (ret)
+		goto unlock;
+
+	ret = i915_gem_object_pin_pages(obj);
+	if (ret)
+		goto unlock;
+
+	if (i915_gem_object_has_struct_page(obj))
+		i915_gem_object_read_from_page_kmap(obj, offset, size, dst);
+	else
+		i915_gem_object_read_from_page_iomap(obj, offset, size, dst);
+
+	i915_gem_object_unpin_pages(obj);
+unlock:
+	i915_gem_object_unlock(obj);
+
+	return ret;
+}
+
 void i915_gem_init__objects(struct drm_i915_private *i915)
 {
 	INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index be14486f63a7..75223f472a2b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -540,4 +540,6 @@ i915_gem_object_invalidate_frontbuffer(struct drm_i915_gem_object *obj,
 	__i915_gem_object_invalidate_frontbuffer(obj, origin);
 }
 
+int i915_gem_object_read_from_page(struct drm_i915_gem_object *obj, unsigned long offset, size_t size, void *dst);
+
 #endif
Add a simple helper to read data with the CPU from the page of a GEM
object. Do the read either via a kmap if the object has struct pages
or an iomap otherwise. This is needed by the next patch, reading a u64
value from the object (w/o requiring the obj to be mapped to the GPU).

Suggested by Chris.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 75 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  2 +
 2 files changed, 77 insertions(+)