
[0/7] drm/i915: Use the memcpy_from_wc function from drm

Message ID 20220222145206.76118-1-balasubramani.vivekanandan@intel.com
Series drm/i915: Use the memcpy_from_wc function from drm

Message

Vivekanandan, Balasubramani Feb. 22, 2022, 2:51 p.m. UTC
drm_memcpy_from_wc() performs a fast copy from write-combined (WC) memory
using non-temporal load instructions. There are now two similar
implementations of this function: drm_memcpy_from_wc() in drm_cache.c and
i915_memcpy_from_wc() in i915/i915_memcpy.c. drm_memcpy_from_wc() was added
recently through the series
https://patchwork.freedesktop.org/patch/436276/?series=90681&rev=6

The goal of this patch series is to convert all users of
i915_memcpy_from_wc() to drm_memcpy_from_wc(), so that there is a single
common implementation in drm, and eventually to remove the copy from i915.

Another benefit of using the memcpy functions from drm is that
drm_memcpy_from_wc() is available on non-x86 architectures, whereas
i915_memcpy_from_wc() is implemented only for x86, which prevents building
i915 for ARM64.
On x86, drm_memcpy_from_wc() performs the fast copy using non-temporal
instructions; on other architectures it falls back to the memcpy() family
of functions.
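To make the architecture split concrete, here is a minimal userspace sketch (not the drm_cache.c code) of a copy helper that uses SSE4.1 non-temporal loads (_mm_stream_load_si128, i.e. MOVNTDQA) when built for x86-64 with SSE4.1 enabled, and plain memcpy() otherwise. The name sketch_memcpy_from_wc() is hypothetical, and the compile-time __SSE4_1__ check is an assumption standing in for the runtime CPU-feature detection a kernel implementation would use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__SSE4_1__)
#include <smmintrin.h> /* _mm_stream_load_si128 (MOVNTDQA), _mm_storeu_si128 */
#endif

/*
 * Hypothetical userspace sketch, not drm_cache.c: use non-temporal
 * loads when compiled for x86-64 with SSE4.1, otherwise memcpy().
 */
static void sketch_memcpy_from_wc(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) && defined(__SSE4_1__)
	/* MOVNTDQA needs a 16-byte-aligned source; the stores may be unaligned. */
	if (((uintptr_t)src & 15) == 0) {
		const __m128i *s = (const __m128i *)src;
		unsigned char *d = dst;
		size_t chunks = len / 16;

		for (size_t i = 0; i < chunks; i++) {
			__m128i v = _mm_stream_load_si128((__m128i *)(s + i));
			_mm_storeu_si128((__m128i *)(d + 16 * i), v);
		}
		/* Copy the remaining (len % 16) bytes with memcpy(). */
		memcpy(d + 16 * chunks, (const unsigned char *)src + 16 * chunks,
		       len % 16);
		return;
	}
#endif
	/* Non-x86 builds, no SSE4.1, or unaligned source: plain memcpy(). */
	memcpy(dst, src, len);
}
```

Because both paths produce the same bytes, a caller never needs to know which one ran, which is what allows the same caller code to build on ARM64.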

Another major difference is that, unlike i915_memcpy_from_wc(),
drm_memcpy_from_wc() does not fail if the passed address is not suitably
aligned for non-temporal load instructions, or if the platform lacks
support for those instructions (non-temporal loads are provided by the
SSE4.1 instruction set extension). Instead, drm_memcpy_from_wc() falls back
to other functions to complete the copy.
This relieves the caller of checking the return value of
i915_memcpy_from_wc() and explicitly invoking a fallback.
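The difference in calling convention can be modelled in a few lines of userspace C. This is a hypothetical sketch, not the kernel code: sketch_try_fast_copy() mimics an i915_memcpy_from_wc()-style helper that returns false when the request is unsuitable, and sketch_copy_from_wc() mimics the drm_memcpy_from_wc() contract of always completing the copy by falling back internally. The "dst, src and len must all be multiples of 16" rule and the have_nt_loads flag (standing in for the SSE4.1 check) are assumptions of the sketch, and memcpy() stands in for both the non-temporal copy loop and the fallback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static bool have_nt_loads = true;  /* hypothetical stand-in for an SSE4.1 feature check */
static unsigned int fallback_count; /* counts how often the slow path ran */

/* Old-style contract (hypothetical): may refuse, so callers must check the result. */
static bool sketch_try_fast_copy(void *dst, const void *src, size_t len)
{
	/* Refuse unless NT loads exist and dst, src and len are multiples of 16. */
	if (!have_nt_loads || (((uintptr_t)dst | (uintptr_t)src | len) & 15))
		return false;
	memcpy(dst, src, len); /* stand-in for the non-temporal copy loop */
	return true;
}

/* New-style contract (hypothetical): never fails, the fallback lives inside. */
static void sketch_copy_from_wc(void *dst, const void *src, size_t len)
{
	if (sketch_try_fast_copy(dst, src, len))
		return;
	fallback_count++;
	memcpy(dst, src, len); /* fallback path */
}
```

With the second form, a call site is just sketch_copy_from_wc(dst, src, len); with the first, every caller has to repeat the "if (!sketch_try_fast_copy(...)) memcpy(...)" check-and-fallback pattern described above.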

A follow-up series will remove the memcpy_from_wc functions from i915
once all dependencies on them are gone.

Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com> 
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris.p.wilson@intel.com> 
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>

Balasubramani Vivekanandan (7):
  drm: Relax alignment constraint for destination address
  drm: Add drm_memcpy_from_wc() variant which accepts destination
    address
  drm/i915: use the memcpy_from_wc call from the drm
  drm/i915/guc: use the memcpy_from_wc call from the drm
  drm/i915/selftests: use the memcpy_from_wc call from the drm
  drm/i915/gt: Avoid direct dereferencing of io memory
  drm/i915: Avoid dereferencing io mapped memory

 drivers/gpu/drm/drm_cache.c                   | 98 +++++++++++++++++--
 drivers/gpu/drm/i915/gem/i915_gem_object.c    |  8 +-
 drivers/gpu/drm/i915/gt/selftest_reset.c      | 21 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_log.c    | 11 ++-
 drivers/gpu/drm/i915/i915_gpu_error.c         | 45 +++++----
 .../drm/i915/selftests/intel_memory_region.c  |  8 +-
 include/drm/drm_cache.h                       |  3 +
 7 files changed, 148 insertions(+), 46 deletions(-)

Comments

Nirmoy Das Feb. 23, 2022, 9:02 a.m. UTC | #1
On 22/02/2022 15:51, Balasubramani Vivekanandan wrote:
> A follow-up series will remove the memcpy_from_wc functions from i915
> once all dependencies on them are gone.

Overall the series looks good to me, but I think you can add another patch to
remove i915_memcpy_from_wc(), as I don't see any other usages left after this
series. Maybe I am missing something?

Regards,
Nirmoy

Vivekanandan, Balasubramani Feb. 23, 2022, 11:08 a.m. UTC | #2
On 23.02.2022 10:02, Das, Nirmoy wrote:
> 
> Overall the series looks good to me, but I think you can add another patch to
> remove i915_memcpy_from_wc(), as I don't see any other usages left after this
> series. Maybe I am missing something?

I have changed all users of i915_memcpy_from_wc() to the drm function. But
there is another function, i915_unaligned_memcpy_from_wc(), in i915_memcpy.c
that blocks completely eliminating the i915_memcpy.c file from i915.
This function accepts an unaligned source address, does a fast copy only for
the aligned region of memory, and copies the remaining part using memcpy().
One option is to move i915_unaligned_memcpy_from_wc() to drm as well, but
since it is more platform-specific handling, I am not sure it makes sense to
keep it in drm.
Otherwise, I have to retain i915_unaligned_memcpy_from_wc() inside i915 and
refactor it to use drm_memcpy_from_wc() instead of __memcpy_ntdqu().
But before making more changes, I wanted feedback on the current ones, so I
went ahead and posted the series for review.
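The head/body split described above can be sketched as follows. This is hypothetical userspace C, not i915_unaligned_memcpy_from_wc() itself: copy_chunks16() is a memcpy()-based stand-in for the non-temporal 16-byte copy (the role __memcpy_ntdqu() plays in i915), and the split_stats struct exists only to show how many bytes take each path. The sketch also copies the sub-16-byte tail with memcpy() so it never reads past len; that is a choice made here, not necessarily how i915 handles the tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct split_stats {
	size_t head; /* bytes copied with memcpy() up to the first 16-byte boundary of src */
	size_t body; /* bytes copied in whole 16-byte chunks (the "fast" region) */
	size_t tail; /* leftover bytes after the last whole chunk */
};

/* Hypothetical stand-in for a non-temporal copy; src must be 16-byte aligned. */
static void copy_chunks16(unsigned char *dst, const unsigned char *src, size_t chunks)
{
	for (size_t i = 0; i < chunks; i++)
		memcpy(dst + 16 * i, src + 16 * i, 16);
}

static struct split_stats sketch_unaligned_copy_from_wc(void *dst, const void *src,
							 size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	struct split_stats st = { 0, 0, 0 };
	uintptr_t addr = (uintptr_t)s;

	/* 1. Unaligned head: advance src to the next 16-byte boundary. */
	if (addr & 15) {
		size_t head = 16 - (size_t)(addr & 15);

		if (head > len)
			head = len;
		memcpy(d, s, head);
		d += head;
		s += head;
		len -= head;
		st.head = head;
	}

	/* 2. Aligned body: whole 16-byte chunks go through the fast path. */
	size_t chunks = len / 16;

	copy_chunks16(d, s, chunks);
	st.body = chunks * 16;
	d += st.body;
	s += st.body;
	len -= st.body;

	/* 3. Tail shorter than 16 bytes: plain memcpy(). */
	memcpy(d, s, len);
	st.tail = len;
	return st;
}
```

Under the second option above, drm_memcpy_from_wc() would take the place of copy_chunks16() for the aligned body.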

Regards,
Bala

Nirmoy Das Feb. 23, 2022, 1:21 p.m. UTC | #3
On 23/02/2022 12:08, Balasubramani Vivekanandan wrote:
> On 23.02.2022 10:02, Das, Nirmoy wrote:
>> Overall the series looks good to me, but I think you can add another patch to
>> remove i915_memcpy_from_wc(), as I don't see any other usages left after this
>> series. Maybe I am missing something?
> I have changed all users of i915_memcpy_from_wc() to the drm function. But
> there is another function, i915_unaligned_memcpy_from_wc(), in i915_memcpy.c
> that blocks completely eliminating the i915_memcpy.c file from i915.
> This function accepts an unaligned source address, does a fast copy only for
> the aligned region of memory, and copies the remaining part using memcpy().
> One option is to move i915_unaligned_memcpy_from_wc() to drm as well, but
> since it is more platform-specific handling, I am not sure it makes sense to
> keep it in drm.
> Otherwise, I have to retain i915_unaligned_memcpy_from_wc() inside i915 and
> refactor it to use drm_memcpy_from_wc() instead of __memcpy_ntdqu().


I think for completeness it makes sense to remove i915_memcpy_from_wc() and
its helper functions in this series. I don't think we can have
i915_unaligned_memcpy_from_wc() if we want i915 on ARM[0], so I think you can
remove the usages of i915_unaligned_memcpy_from_wc() as well.

[0] IIUC, the CI_BUG_ON() check in i915_unaligned_memcpy_from_wc() will raise
a build error on ARM.

Regards,
Nirmoy

