Message ID | 20220426165148.1784-3-balasubramani.vivekanandan@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Use the memcpy_from_wc function from drm | expand |
On Tue, Apr 26, 2022 at 10:21:43PM +0530, Balasubramani Vivekanandan wrote: >Fast copy using non-temporal instructions for x86 currently exists at two >locations. One is implemented in i915 driver at i915/i915_memcpy.c and >another copy at drm_cache.c. The plan is to remove the duplicate >implementation in i915 driver and use the functions from drm_cache.c. > >A variant of drm_memcpy_from_wc() is added in drm_cache.c which accepts >address as argument instead of iosys_map for destination. It is a very >common scenario in i915 to copy from a WC memory type, which may be an >io memory or a system memory to a destination address pointing to system >memory. To avoid the overhead of creating iosys_map type for the >destination, new variant is created to accept the address directly. > >Also a new function is exported in drm_cache.c to find if the fast copy >is supported by the platform or not. It is required for i915. > >v2: Added a new argument to drm_memcpy_from_wc_vaddr() which provides >the offset into the src address to start copy from. > >Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> >Cc: Maxime Ripard <mripard@kernel.org> >Cc: Thomas Zimmermann <tzimmermann@suse.de> >Cc: David Airlie <airlied@linux.ie> >Cc: Daniel Vetter <daniel@ffwll.ch> >Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com> ^ utf-8 error? > >Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> >Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> >Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> >--- > drivers/gpu/drm/drm_cache.c | 55 +++++++++++++++++++++++++++++++++++++ > include/drm/drm_cache.h | 3 ++ > 2 files changed, 58 insertions(+) > >diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c >index 2e2545df3310..8c7af755f7bc 100644 >--- a/drivers/gpu/drm/drm_cache.c >+++ b/drivers/gpu/drm/drm_cache.c >@@ -358,6 +358,55 @@ void drm_memcpy_from_wc(struct iosys_map *dst, > } > EXPORT_SYMBOL(drm_memcpy_from_wc); > >+/** >+ * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source >+ * that may be WC to a destination in system memory. >+ * @dst: The destination pointer >+ * @src: The source pointer >+ * @src_offset: The offset from which to copy >+ * @len: The size of the area to transfer in bytes >+ * >+ * Same as drm_memcpy_from_wc except destination is accepted as system memory >+ * address. Useful in situations where passing destination address as iosys_map >+ * is simply an overhead and can be avoided. >+ */ >+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src, >+ size_t src_offset, unsigned long len) >+{ >+ const void *src_addr = src->is_iomem ? >+ (void const __force *)src->vaddr_iomem : >+ src->vaddr; checking this again... this is a layer violation. We do have some in the codebase, but it'd be good not to add more. later I'd like these memcpy variants to be moved iosys-map. I'm not sure if Thomas Zimmermann or Christian agree. Cc'ing them here. Maybe we could scan the codebase and add a function in iosys-map like: void *iosys_map_get_ptr(struct iosys_map *map, bool is_lmem) ... which is similar to the ttm_kmap_obj_virtual() function we have in ttm. so the cases where the layer is violated at least they come from a single function call in iosys_map API rather than directly accessing is_iomem/vaddr_iomem/vaddr. Thomas/Christian, any opinion here? Lucas De Marchi >+ >+ if (WARN_ON(in_interrupt())) { >+ iosys_map_memcpy_from(dst, src, src_offset, len); >+ return; >+ } >+ >+ if (static_branch_likely(&has_movntdqa)) { >+ __drm_memcpy_from_wc(dst, src_addr + src_offset, len); >+ return; >+ } >+ >+ iosys_map_memcpy_from(dst, src, src_offset, len); >+} >+EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr); >+ >+/* >+ * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal >+ * instructions is supported >+ * >+ * Returns true if platform has support for fast copying from wc memory type >+ * using non-temporal instructions. Else false. >+ */ >+bool drm_memcpy_fastcopy_supported(void) >+{ >+ if (static_branch_likely(&has_movntdqa)) >+ return true; >+ >+ return false; >+} >+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported); >+ > /* > * drm_memcpy_init_early - One time initialization of the WC memcpy code > */ >@@ -382,6 +431,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst, > } > EXPORT_SYMBOL(drm_memcpy_from_wc); > >+bool drm_memcpy_fastcopy_supported(void) >+{ >+ return false; >+} >+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported); >+ > void drm_memcpy_init_early(void) > { > } >diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h >index 22deb216b59c..d1b57c84a659 100644 >--- a/include/drm/drm_cache.h >+++ b/include/drm/drm_cache.h >@@ -77,4 +77,7 @@ void drm_memcpy_init_early(void); > void drm_memcpy_from_wc(struct iosys_map *dst, > const struct iosys_map *src, > unsigned long len); >+bool drm_memcpy_fastcopy_supported(void); >+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src, >+ size_t src_offset, unsigned long len); > #endif >-- >2.25.1 >
Am 13.07.22 um 17:47 schrieb Lucas De Marchi: > On Tue, Apr 26, 2022 at 10:21:43PM +0530, Balasubramani Vivekanandan > wrote: >> Fast copy using non-temporal instructions for x86 currently exists at >> two >> locations. One is implemented in i915 driver at i915/i915_memcpy.c and >> another copy at drm_cache.c. The plan is to remove the duplicate >> implementation in i915 driver and use the functions from drm_cache.c. >> >> A variant of drm_memcpy_from_wc() is added in drm_cache.c which accepts >> address as argument instead of iosys_map for destination. It is a very >> common scenario in i915 to copy from a WC memory type, which may be an >> io memory or a system memory to a destination address pointing to system >> memory. To avoid the overhead of creating iosys_map type for the >> destination, new variant is created to accept the address directly. Well that is pretty much a NAK. iosys_map was created to avoid exactly this kind of usage. Creating a temporary iosys_map instance on the stack should have basically no overhead at all. >> >> Also a new function is exported in drm_cache.c to find if the fast copy >> is supported by the platform or not. It is required for i915. >> >> v2: Added a new argument to drm_memcpy_from_wc_vaddr() which provides >> the offset into the src address to start copy from. >> >> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> >> Cc: Maxime Ripard <mripard@kernel.org> >> Cc: Thomas Zimmermann <tzimmermann@suse.de> >> Cc: David Airlie <airlied@linux.ie> >> Cc: Daniel Vetter <daniel@ffwll.ch> >> Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com> > > ^ utf-8 error? > >> >> Signed-off-by: Balasubramani Vivekanandan >> <balasubramani.vivekanandan@intel.com> >> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> >> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> >> --- >> drivers/gpu/drm/drm_cache.c | 55 +++++++++++++++++++++++++++++++++++++ >> include/drm/drm_cache.h | 3 ++ >> 2 files changed, 58 insertions(+) >> >> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c >> index 2e2545df3310..8c7af755f7bc 100644 >> --- a/drivers/gpu/drm/drm_cache.c >> +++ b/drivers/gpu/drm/drm_cache.c >> @@ -358,6 +358,55 @@ void drm_memcpy_from_wc(struct iosys_map *dst, >> } >> EXPORT_SYMBOL(drm_memcpy_from_wc); >> >> +/** >> + * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy >> from a source >> + * that may be WC to a destination in system memory. >> + * @dst: The destination pointer >> + * @src: The source pointer >> + * @src_offset: The offset from which to copy >> + * @len: The size of the area to transfer in bytes >> + * >> + * Same as drm_memcpy_from_wc except destination is accepted as >> system memory >> + * address. Useful in situations where passing destination address >> as iosys_map >> + * is simply an overhead and can be avoided. >> + */ >> +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src, >> + size_t src_offset, unsigned long len) >> +{ >> + const void *src_addr = src->is_iomem ? >> + (void const __force *)src->vaddr_iomem : >> + src->vaddr; > > checking this again... this is a layer violation. We do have some in > the codebase, but it'd be good not to add more. Yes, agree completely. That here is something we should try hard to avoid. > > later I'd like these memcpy variants to be moved iosys-map. I'm not sure > if Thomas Zimmermann or Christian agree. Cc'ing them here. Again yes, that sounds like a good idea to me as well. > > Maybe we could scan the codebase and add a function in iosys-map like: > > void *iosys_map_get_ptr(struct iosys_map *map, bool is_lmem) > > ... which is similar to the ttm_kmap_obj_virtual() function we have in > ttm. > > so the cases where the layer is violated at least they come from a > single function call in iosys_map API rather than directly accessing > is_iomem/vaddr_iomem/vaddr. Thomas/Christian, any opinion here? As noted above creating an iosys_map instance on the stack has basically no overhead at all. So we should really switch those functions over to using the structure directly. Regards, Christian. > > Lucas De Marchi > >> + >> + if (WARN_ON(in_interrupt())) { >> + iosys_map_memcpy_from(dst, src, src_offset, len); >> + return; >> + } >> + >> + if (static_branch_likely(&has_movntdqa)) { >> + __drm_memcpy_from_wc(dst, src_addr + src_offset, len); >> + return; >> + } >> + >> + iosys_map_memcpy_from(dst, src, src_offset, len); >> +} >> +EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr); >> + >> +/* >> + * drm_memcpy_fastcopy_supported - Returns if fast copy using >> non-temporal >> + * instructions is supported >> + * >> + * Returns true if platform has support for fast copying from wc >> memory type >> + * using non-temporal instructions. Else false. >> + */ >> +bool drm_memcpy_fastcopy_supported(void) >> +{ >> + if (static_branch_likely(&has_movntdqa)) >> + return true; >> + >> + return false; >> +} >> +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported); >> + >> /* >> * drm_memcpy_init_early - One time initialization of the WC memcpy code >> */ >> @@ -382,6 +431,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst, >> } >> EXPORT_SYMBOL(drm_memcpy_from_wc); >> >> +bool drm_memcpy_fastcopy_supported(void) >> +{ >> + return false; >> +} >> +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported); >> + >> void drm_memcpy_init_early(void) >> { >> } >> diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h >> index 22deb216b59c..d1b57c84a659 100644 >> --- a/include/drm/drm_cache.h >> +++ b/include/drm/drm_cache.h >> @@ -77,4 +77,7 @@ void drm_memcpy_init_early(void); >> void drm_memcpy_from_wc(struct iosys_map *dst, >> const struct iosys_map *src, >> unsigned long len); >> +bool drm_memcpy_fastcopy_supported(void); >> +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src, >> + size_t src_offset, unsigned long len); >> #endif >> -- >> 2.25.1 >>
diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c index 2e2545df3310..8c7af755f7bc 100644 --- a/drivers/gpu/drm/drm_cache.c +++ b/drivers/gpu/drm/drm_cache.c @@ -358,6 +358,55 @@ void drm_memcpy_from_wc(struct iosys_map *dst, } EXPORT_SYMBOL(drm_memcpy_from_wc); +/** + * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source + * that may be WC to a destination in system memory. + * @dst: The destination pointer + * @src: The source pointer + * @src_offset: The offset from which to copy + * @len: The size of the area to transfer in bytes + * + * Same as drm_memcpy_from_wc except destination is accepted as system memory + * address. Useful in situations where passing destination address as iosys_map + * is simply an overhead and can be avoided. + */ +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src, + size_t src_offset, unsigned long len) +{ + const void *src_addr = src->is_iomem ? + (void const __force *)src->vaddr_iomem : + src->vaddr; + + if (WARN_ON(in_interrupt())) { + iosys_map_memcpy_from(dst, src, src_offset, len); + return; + } + + if (static_branch_likely(&has_movntdqa)) { + __drm_memcpy_from_wc(dst, src_addr + src_offset, len); + return; + } + + iosys_map_memcpy_from(dst, src, src_offset, len); +} +EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr); + +/* + * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal + * instructions is supported + * + * Returns true if platform has support for fast copying from wc memory type + * using non-temporal instructions. Else false. + */ +bool drm_memcpy_fastcopy_supported(void) +{ + if (static_branch_likely(&has_movntdqa)) + return true; + + return false; +} +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported); + /* * drm_memcpy_init_early - One time initialization of the WC memcpy code */ @@ -382,6 +431,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst, } EXPORT_SYMBOL(drm_memcpy_from_wc); +bool drm_memcpy_fastcopy_supported(void) +{ + return false; +} +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported); + void drm_memcpy_init_early(void) { } diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h index 22deb216b59c..d1b57c84a659 100644 --- a/include/drm/drm_cache.h +++ b/include/drm/drm_cache.h @@ -77,4 +77,7 @@ void drm_memcpy_init_early(void); void drm_memcpy_from_wc(struct iosys_map *dst, const struct iosys_map *src, unsigned long len); +bool drm_memcpy_fastcopy_supported(void); +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src, + size_t src_offset, unsigned long len); #endif