diff mbox series

[v3,2/7] drm: Add drm_memcpy_from_wc() variant which accepts destination address

Message ID 20220426165148.1784-3-balasubramani.vivekanandan@intel.com (mailing list archive)
State New, archived
Headers show
Series drm/i915: Use the memcpy_from_wc function from drm | expand

Commit Message

Vivekanandan, Balasubramani April 26, 2022, 4:51 p.m. UTC
Fast copy using non-temporal instructions for x86 currently exists at two
locations. One is implemented in i915 driver at i915/i915_memcpy.c and
another copy at drm_cache.c. The plan is to remove the duplicate
implementation in i915 driver and use the functions from drm_cache.c.

A variant of drm_memcpy_from_wc() is added in drm_cache.c which accepts
address as argument instead of iosys_map for destination. It is a very
common scenario in i915 to copy from a WC memory type, which may be an
io memory or a system memory to a destination address pointing to system
memory. To avoid the overhead of creating iosys_map type for the
destination, new variant is created to accept the address directly.

Also a new function is exported in drm_cache.c to find if the fast copy
is supported by the platform or not. It is required for i915.

v2: Added a new argument to drm_memcpy_from_wc_vaddr() which provides
the offset into the src address to start copy from.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
---
 drivers/gpu/drm/drm_cache.c | 55 +++++++++++++++++++++++++++++++++++++
 include/drm/drm_cache.h     |  3 ++
 2 files changed, 58 insertions(+)

Comments

Lucas De Marchi July 13, 2022, 3:47 p.m. UTC | #1
On Tue, Apr 26, 2022 at 10:21:43PM +0530, Balasubramani Vivekanandan wrote:
>Fast copy using non-temporal instructions for x86 currently exists at two
>locations. One is implemented in i915 driver at i915/i915_memcpy.c and
>another copy at drm_cache.c. The plan is to remove the duplicate
>implementation in i915 driver and use the functions from drm_cache.c.
>
>A variant of drm_memcpy_from_wc() is added in drm_cache.c which accepts
>address as argument instead of iosys_map for destination. It is a very
>common scenario in i915 to copy from a WC memory type, which may be an
>io memory or a system memory to a destination address pointing to system
>memory. To avoid the overhead of creating iosys_map type for the
>destination, new variant is created to accept the address directly.
>
>Also a new function is exported in drm_cache.c to find if the fast copy
>is supported by the platform or not. It is required for i915.
>
>v2: Added a new argument to drm_memcpy_from_wc_vaddr() which provides
>the offset into the src address to start copy from.
>
>Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>Cc: Maxime Ripard <mripard@kernel.org>
>Cc: Thomas Zimmermann <tzimmermann@suse.de>
>Cc: David Airlie <airlied@linux.ie>
>Cc: Daniel Vetter <daniel@ffwll.ch>
>Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>

		   ^ utf-8 error?

>
>Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
>Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
>Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
>---
> drivers/gpu/drm/drm_cache.c | 55 +++++++++++++++++++++++++++++++++++++
> include/drm/drm_cache.h     |  3 ++
> 2 files changed, 58 insertions(+)
>
>diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
>index 2e2545df3310..8c7af755f7bc 100644
>--- a/drivers/gpu/drm/drm_cache.c
>+++ b/drivers/gpu/drm/drm_cache.c
>@@ -358,6 +358,55 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
> }
> EXPORT_SYMBOL(drm_memcpy_from_wc);
>
>+/**
>+ * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source
>+ * that may be WC to a destination in system memory.
>+ * @dst: The destination pointer
>+ * @src: The source pointer
>+ * @src_offset: The offset from which to copy
>+ * @len: The size of the area to transfer in bytes
>+ *
>+ * Same as drm_memcpy_from_wc except destination is accepted as system memory
>+ * address. Useful in situations where passing destination address as iosys_map
>+ * is simply an overhead and can be avoided.
>+ */
>+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
>+			      size_t src_offset, unsigned long len)
>+{
>+	const void *src_addr = src->is_iomem ?
>+				(void const __force *)src->vaddr_iomem :
>+				src->vaddr;

checking this again... this is a layer violation. We do have some in
the codebase, but it'd be good not to add more.

later I'd like these memcpy variants to be moved iosys-map. I'm not sure
if Thomas Zimmermann or Christian agree. Cc'ing them here.

Maybe we could scan the codebase and add a function in iosys-map like:

	void *iosys_map_get_ptr(struct iosys_map *map, bool is_lmem)

... which is similar to the ttm_kmap_obj_virtual() function we have in
ttm.

so the cases where the layer is violated at least they come from a
single function call in iosys_map API rather than directly accessing
is_iomem/vaddr_iomem/vaddr. Thomas/Christian, any opinion here?

Lucas De Marchi

>+
>+	if (WARN_ON(in_interrupt())) {
>+		iosys_map_memcpy_from(dst, src, src_offset, len);
>+		return;
>+	}
>+
>+	if (static_branch_likely(&has_movntdqa)) {
>+		__drm_memcpy_from_wc(dst, src_addr + src_offset, len);
>+		return;
>+	}
>+
>+	iosys_map_memcpy_from(dst, src, src_offset, len);
>+}
>+EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr);
>+
>+/*
>+ * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal
>+ * instructions is supported
>+ *
>+ * Returns true if platform has support for fast copying from wc memory type
>+ * using non-temporal instructions. Else false.
>+ */
>+bool drm_memcpy_fastcopy_supported(void)
>+{
>+	if (static_branch_likely(&has_movntdqa))
>+		return true;
>+
>+	return false;
>+}
>+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
>+
> /*
>  * drm_memcpy_init_early - One time initialization of the WC memcpy code
>  */
>@@ -382,6 +431,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
> }
> EXPORT_SYMBOL(drm_memcpy_from_wc);
>
>+bool drm_memcpy_fastcopy_supported(void)
>+{
>+	return false;
>+}
>+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
>+
> void drm_memcpy_init_early(void)
> {
> }
>diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
>index 22deb216b59c..d1b57c84a659 100644
>--- a/include/drm/drm_cache.h
>+++ b/include/drm/drm_cache.h
>@@ -77,4 +77,7 @@ void drm_memcpy_init_early(void);
> void drm_memcpy_from_wc(struct iosys_map *dst,
> 			const struct iosys_map *src,
> 			unsigned long len);
>+bool drm_memcpy_fastcopy_supported(void);
>+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
>+			      size_t src_offset, unsigned long len);
> #endif
>-- 
>2.25.1
>
Christian König July 14, 2022, 11:20 a.m. UTC | #2
Am 13.07.22 um 17:47 schrieb Lucas De Marchi:
> On Tue, Apr 26, 2022 at 10:21:43PM +0530, Balasubramani Vivekanandan 
> wrote:
>> Fast copy using non-temporal instructions for x86 currently exists at 
>> two
>> locations. One is implemented in i915 driver at i915/i915_memcpy.c and
>> another copy at drm_cache.c. The plan is to remove the duplicate
>> implementation in i915 driver and use the functions from drm_cache.c.
>>
>> A variant of drm_memcpy_from_wc() is added in drm_cache.c which accepts
>> address as argument instead of iosys_map for destination. It is a very
>> common scenario in i915 to copy from a WC memory type, which may be an
>> io memory or a system memory to a destination address pointing to system
>> memory. To avoid the overhead of creating iosys_map type for the
>> destination, new variant is created to accept the address directly.

Well that is pretty much a NAK. iosys_map was created to avoid exactly 
this kind of usage.

Creating a temporary iosys_map instance on the stack should have 
basically no overhead at all.

>>
>> Also a new function is exported in drm_cache.c to find if the fast copy
>> is supported by the platform or not. It is required for i915.
>>
>> v2: Added a new argument to drm_memcpy_from_wc_vaddr() which provides
>> the offset into the src address to start copy from.
>>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Cc: Maxime Ripard <mripard@kernel.org>
>> Cc: Thomas Zimmermann <tzimmermann@suse.de>
>> Cc: David Airlie <airlied@linux.ie>
>> Cc: Daniel Vetter <daniel@ffwll.ch>
>> Cc: Thomas Hellstr_m <thomas.hellstrom@linux.intel.com>
>
>            ^ utf-8 error?
>
>>
>> Signed-off-by: Balasubramani Vivekanandan 
>> <balasubramani.vivekanandan@intel.com>
>> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>> drivers/gpu/drm/drm_cache.c | 55 +++++++++++++++++++++++++++++++++++++
>> include/drm/drm_cache.h     |  3 ++
>> 2 files changed, 58 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
>> index 2e2545df3310..8c7af755f7bc 100644
>> --- a/drivers/gpu/drm/drm_cache.c
>> +++ b/drivers/gpu/drm/drm_cache.c
>> @@ -358,6 +358,55 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
>> }
>> EXPORT_SYMBOL(drm_memcpy_from_wc);
>>
>> +/**
>> + * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy 
>> from a source
>> + * that may be WC to a destination in system memory.
>> + * @dst: The destination pointer
>> + * @src: The source pointer
>> + * @src_offset: The offset from which to copy
>> + * @len: The size of the area to transfer in bytes
>> + *
>> + * Same as drm_memcpy_from_wc except destination is accepted as 
>> system memory
>> + * address. Useful in situations where passing destination address 
>> as iosys_map
>> + * is simply an overhead and can be avoided.
>> + */
>> +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
>> +                  size_t src_offset, unsigned long len)
>> +{
>> +    const void *src_addr = src->is_iomem ?
>> +                (void const __force *)src->vaddr_iomem :
>> +                src->vaddr;
>
> checking this again... this is a layer violation. We do have some in
> the codebase, but it'd be good not to add more.

Yes, agree completely. That here is something we should try hard to avoid.

>
> later I'd like these memcpy variants to be moved iosys-map. I'm not sure
> if Thomas Zimmermann or Christian agree. Cc'ing them here.

Again yes, that sounds like a good idea to me as well.

>
> Maybe we could scan the codebase and add a function in iosys-map like:
>
>     void *iosys_map_get_ptr(struct iosys_map *map, bool is_lmem)
>
> ... which is similar to the ttm_kmap_obj_virtual() function we have in
> ttm.
>
> so the cases where the layer is violated at least they come from a
> single function call in iosys_map API rather than directly accessing
> is_iomem/vaddr_iomem/vaddr. Thomas/Christian, any opinion here?

As noted above creating an iosys_map instance on the stack has basically 
no overhead at all.

So we should really switch those functions over to using the structure 
directly.

Regards,
Christian.

>
> Lucas De Marchi
>
>> +
>> +    if (WARN_ON(in_interrupt())) {
>> +        iosys_map_memcpy_from(dst, src, src_offset, len);
>> +        return;
>> +    }
>> +
>> +    if (static_branch_likely(&has_movntdqa)) {
>> +        __drm_memcpy_from_wc(dst, src_addr + src_offset, len);
>> +        return;
>> +    }
>> +
>> +    iosys_map_memcpy_from(dst, src, src_offset, len);
>> +}
>> +EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr);
>> +
>> +/*
>> + * drm_memcpy_fastcopy_supported - Returns if fast copy using 
>> non-temporal
>> + * instructions is supported
>> + *
>> + * Returns true if platform has support for fast copying from wc 
>> memory type
>> + * using non-temporal instructions. Else false.
>> + */
>> +bool drm_memcpy_fastcopy_supported(void)
>> +{
>> +    if (static_branch_likely(&has_movntdqa))
>> +        return true;
>> +
>> +    return false;
>> +}
>> +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
>> +
>> /*
>>  * drm_memcpy_init_early - One time initialization of the WC memcpy code
>>  */
>> @@ -382,6 +431,12 @@ void drm_memcpy_from_wc(struct iosys_map *dst,
>> }
>> EXPORT_SYMBOL(drm_memcpy_from_wc);
>>
>> +bool drm_memcpy_fastcopy_supported(void)
>> +{
>> +    return false;
>> +}
>> +EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
>> +
>> void drm_memcpy_init_early(void)
>> {
>> }
>> diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
>> index 22deb216b59c..d1b57c84a659 100644
>> --- a/include/drm/drm_cache.h
>> +++ b/include/drm/drm_cache.h
>> @@ -77,4 +77,7 @@ void drm_memcpy_init_early(void);
>> void drm_memcpy_from_wc(struct iosys_map *dst,
>>             const struct iosys_map *src,
>>             unsigned long len);
>> +bool drm_memcpy_fastcopy_supported(void);
>> +void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
>> +                  size_t src_offset, unsigned long len);
>> #endif
>> -- 
>> 2.25.1
>>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 2e2545df3310..8c7af755f7bc 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -358,6 +358,55 @@  void drm_memcpy_from_wc(struct iosys_map *dst,
 }
 EXPORT_SYMBOL(drm_memcpy_from_wc);
 
+/**
+ * drm_memcpy_from_wc_vaddr - Perform the fastest available memcpy from a source
+ * that may be WC to a destination in system memory.
+ * @dst: The destination pointer
+ * @src: The source pointer
+ * @src_offset: The offset from which to copy
+ * @len: The size of the area to transfer in bytes
+ *
+ * Same as drm_memcpy_from_wc except destination is accepted as system memory
+ * address. Useful in situations where passing destination address as iosys_map
+ * is simply an overhead and can be avoided.
+ */
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+			      size_t src_offset, unsigned long len)
+{
+	const void *src_addr = src->is_iomem ?
+				(void const __force *)src->vaddr_iomem :
+				src->vaddr;
+
+	if (WARN_ON(in_interrupt())) {
+		iosys_map_memcpy_from(dst, src, src_offset, len);
+		return;
+	}
+
+	if (static_branch_likely(&has_movntdqa)) {
+		__drm_memcpy_from_wc(dst, src_addr + src_offset, len);
+		return;
+	}
+
+	iosys_map_memcpy_from(dst, src, src_offset, len);
+}
+EXPORT_SYMBOL(drm_memcpy_from_wc_vaddr);
+
+/*
+ * drm_memcpy_fastcopy_supported - Returns if fast copy using non-temporal
+ * instructions is supported
+ *
+ * Returns true if platform has support for fast copying from wc memory type
+ * using non-temporal instructions. Else false.
+ */
+bool drm_memcpy_fastcopy_supported(void)
+{
+	if (static_branch_likely(&has_movntdqa))
+		return true;
+
+	return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
 /*
  * drm_memcpy_init_early - One time initialization of the WC memcpy code
  */
@@ -382,6 +431,12 @@  void drm_memcpy_from_wc(struct iosys_map *dst,
 }
 EXPORT_SYMBOL(drm_memcpy_from_wc);
 
+bool drm_memcpy_fastcopy_supported(void)
+{
+	return false;
+}
+EXPORT_SYMBOL(drm_memcpy_fastcopy_supported);
+
 void drm_memcpy_init_early(void)
 {
 }
diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index 22deb216b59c..d1b57c84a659 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -77,4 +77,7 @@  void drm_memcpy_init_early(void);
 void drm_memcpy_from_wc(struct iosys_map *dst,
 			const struct iosys_map *src,
 			unsigned long len);
+bool drm_memcpy_fastcopy_supported(void);
+void drm_memcpy_from_wc_vaddr(void *dst, const struct iosys_map *src,
+			      size_t src_offset, unsigned long len);
 #endif