mbox series

[v10,0/4] Replace fallback for IO memcpy and IO memset

Message ID 20241021133154.516847-1-jvetter@kalrayinc.com (mailing list archive)
Headers show
Series Replace fallback for IO memcpy and IO memset | expand

Message

Julian Vetter Oct. 21, 2024, 1:31 p.m. UTC
Thank you again for your remarks Arnd and Christoph! I have updated the
patchset, and placed the functions directly in asm-generic/io.h. I have
dropped the libs/iomem_copy.c and have updated/clarified the commit
message in the first patch.

Signed-off-by: Julian Vetter <jvetter@kalrayinc.com>
---
Changes for v10:
- Removed libs/iomem_copy.c again
- Replaced the three functions in asm-generic/io.h directly
- Updated description for the first patch to clarify the matter
- Slightly updated description in patches 2 to 4

Changes for v9:
- Moved functions into a new file iomem_copy.c which is built
  unconditionally
- Guard prototypes with '#ifndef memcpy_fromio', etc.
- Dropped patches 5 to 14 for now. I will send some of the changes in
  separate patches or patchsets to the appropriate mailinglists
- Added proper reviewed-by and acked-by to arm64 and csky patches

Changes for v8:
- Dropped the arch/um patch that adds dummy implementations for IO
  memcpy functions
- Added 3 new patches that fix the dependency problem for UM (added
  dependencies on HAS_IOMEM || INDIRECT_IOMEM)
- Added new patch for s390 to internally call the zpci_memcpy functions
  and not the generic ones from libs/iomap_copy.c
- Addressed reviewer comments and replaced 2 or 3 shifts by
  'qc *= ~0UL / 0xff;'
- Addressed reviewer comments on pasrisc (masking the int value)
- Addressed reviewer comments on alpha (masking the int value)

Changes for v7:
- Added dummy implementations for memcpy_{to,from}io and memset_io on um
  architecture so drivers that use these functions build for um
- Replaced all accesses and checks by long type
- Added function prototypes as extern to asm-generic/io.h
- Removed '__' from the 3 new function names
- Some archs implement their own version of these IO functions with
  slightly different prototypes. So, I added 3 new patches to align
  prototypes with new ones in iomap_copy.c + io.h

Changes for v6:
- Added include of linux/align.h to fix build on arm arch
- Replaced compile-time check by ifdef for the CONFIG_64BIT otherwise we
  get a warning for the 'qc << 32' for archs with 32bit int types
- Suffixed arch commits by arch name

Changes for v5:
- Added functions to iomap_copy.c as proposed by Arndt
- Removed again the new io_copy.c and related objects
- Removed GENERIC_IO_COPY symbol and instead rely on the existing
  HAS_IOMEM symbol
- Added prototypes of __memcpy_{to,from}io and __memset_io functions to
  asm-generic/io.h

Changes for v4:
- Replaced memcpy/memset in asm-generic/io.h by the new
  __memcpy_{to,from}io and __memset_io, so individual architectures can
  use it instead of using their own implementation.

Changes for v3:
- Replaced again 'if(IS_ENABLED(CONFIG_64BIT))' by '#ifdef CONFIG_64BIT'
  because on 32bit architectures (e.g., csky), __raw_{read,write}q are
  not defined. So, it leads to compilation errors

Changes for v2:
- Renamed io.c -> io_copy.c
- Updated flag to 'GENERIC_IO_COPY'
- Replaced pointer dereferences by 'put_unaligned()'/'get_unaligned()'
- Replaced '#ifdef CONFIG_64BIT' by 'if(IS_ENABLED(CONFIG_64BIT))'
- Removed '__raw_{read,write}_native' and replaced by
  'if(IS_ENABLED(CONFIG_64BIT))' -> '__raw_write{l,q}'
---
Julian Vetter (4):
  Replace fallback for IO memcpy and IO memset
  arm64: Use new fallback IO memcpy/memset
  csky: Use new fallback IO memcpy/memset
  loongarch: Use new fallback IO memcpy/memset

 arch/arm64/include/asm/io.h     |  11 ---
 arch/arm64/kernel/io.c          |  87 ------------------------
 arch/csky/include/asm/io.h      |  11 ---
 arch/csky/kernel/Makefile       |   2 +-
 arch/csky/kernel/io.c           |  91 -------------------------
 arch/loongarch/include/asm/io.h |  10 ---
 arch/loongarch/kernel/Makefile  |   2 +-
 arch/loongarch/kernel/io.c      |  94 --------------------------
 include/asm-generic/io.h        | 116 ++++++++++++++++++++++++++------
 9 files changed, 98 insertions(+), 326 deletions(-)
 delete mode 100644 arch/csky/kernel/io.c
 delete mode 100644 arch/loongarch/kernel/io.c

Comments

David Laight Oct. 21, 2024, 2:16 p.m. UTC | #1
From: Julian Vetter
> Sent: 21 October 2024 14:32
> 
> Thank you again for your remarks Arnd and Christoph! I have updated the
> patchset, and placed the functions directly in asm-generic/io.h. I have
> dropped the libs/iomem_copy.c and have updated/clarified the commit
> message in the first patch.

Apart from build 'issues' what is the justification for inlining
these functions?

They are quite large for inlining and some drivers could easily
call them many times.

The I/O cycles themselves are likely to be slow enough that
the cost of a function call is pretty much likely to be noise.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Arnd Bergmann Oct. 25, 2024, 2:20 p.m. UTC | #2
On Mon, Oct 21, 2024, at 14:16, David Laight wrote:
> From: Julian Vetter
>> Sent: 21 October 2024 14:32
>> 
>> Thank you again for your remarks Arnd and Christoph! I have updated the
>> patchset, and placed the functions directly in asm-generic/io.h. I have
>> dropped the libs/iomem_copy.c and have updated/clarified the commit
>> message in the first patch.
>
> Apart from build 'issues' what is the justification for inlining
> these functions?

I think I wasn't clear enough with my previous comment, and Julian
just misunderstood what I was asking him to do. Sorry about causing
extra work here.

> They are quite large for inlining and some drivers could easily
> call them many times.
>
> The I/O cycles themselves are likely to be slow enough that
> the cost of a function call is pretty much likely to be noise.

I'm not overly worried about the this, as the functions are
not that big and there are not that many callers. If a file
contains multiple calls to this function, we can expect the
compiler to be smart enough to keep it out of line, though it
still gets duplicated in each driver calling it.

The bit that I am worried about however is the extra #include
for linux/unaligned.h that pulls in fairly large headers
and may lead to circular header dependencies.

To be clear: what I had expected here was to not have any
changes to the v9 version of lib/iomem_copy.c and to simplify
the asm-generic/io.h change to the version below.

       Arnd

---
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -1211,7 +1211,6 @@ static inline void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
 #endif
 
 #ifndef memset_io
-#define memset_io memset_io
 /**
  * memset_io   Set a range of I/O memory to a constant value
  * @addr:      The beginning of the I/O-memory range to set
@@ -1220,15 +1219,10 @@ static inline void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
  *
  * Set a range of I/O memory to a given value.
  */
-static inline void memset_io(volatile void __iomem *addr, int value,
-                            size_t size)
-{
-       memset(__io_virt(addr), value, size);
-}
+void memset_io(volatile void __iomem *addr, int value, size_t size);
 #endif
 
 #ifndef memcpy_fromio
-#define memcpy_fromio memcpy_fromio
 /**
  * memcpy_fromio       Copy a block of data from I/O memory
  * @dst:               The (RAM) destination for the copy
@@ -1237,16 +1231,11 @@ static inline void memset_io(volatile void __iomem *addr, int value,
  *
  * Copy a block of data from I/O memory.
  */
-static inline void memcpy_fromio(void *buffer,
-                                const volatile void __iomem *addr,
-                                size_t size)
-{
-       memcpy(buffer, __io_virt(addr), size);
-}
+void memcpy_fromio(void *buffer, const volatile void __iomem *addr,
+                  size_t size);
 #endif
 
 #ifndef memcpy_toio
-#define memcpy_toio memcpy_toio
 /**
  * memcpy_toio         Copy a block of data into I/O memory
  * @dst:               The (I/O memory) destination for the copy
@@ -1255,11 +1244,8 @@ static inline void memcpy_fromio(void *buffer,
  *
  * Copy a block of data to I/O memory.
  */
-static inline void memcpy_toio(volatile void __iomem *addr, const void *buffer,
-                              size_t size)
-{
-       memcpy(__io_virt(addr), buffer, size);
-}
+void memcpy_toio(volatile void __iomem *addr, const void *buffer,
+                size_t size);
 #endif
 
 extern int devmem_is_allowed(unsigned long pfn);