
ARM: optimize memset_io()/memcpy_fromio()/memcpy_toio()

Message ID E1THMAC-0004nL-8B@rmk-PC.arm.linux.org.uk (mailing list archive)
State New, archived

Commit Message

Russell King Sept. 27, 2012, 10:02 p.m. UTC
If we are building for an LE platform, and we haven't overridden the
MMIO ops, then we can optimize the mem*io operations using the
standard string functions.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/include/asm/io.h |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

Comments

Nicolas Pitre Sept. 28, 2012, 4:17 a.m. UTC | #1
On Thu, 27 Sep 2012, Russell King wrote:

> If we are building for a LE platform, and we haven't overriden the
> MMIO ops, then we can optimize the mem*io operations using the
> standard string functions.
> 
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

We presume that the IO space is able to cope with a mixture of access 
widths other than byte accesses, which should be perfectly reasonable 
by default.  If so then...

Acked-by: Nicolas Pitre <nico@.linaro.org>

> ---
>  arch/arm/include/asm/io.h |   25 +++++++++++++++++++++++++
>  1 files changed, 25 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> index 815c669..ed0577e 100644
> --- a/arch/arm/include/asm/io.h
> +++ b/arch/arm/include/asm/io.h
> @@ -23,6 +23,7 @@
>  
>  #ifdef __KERNEL__
>  
> +#include <linux/string.h>
>  #include <linux/types.h>
>  #include <asm/byteorder.h>
>  #include <asm/memory.h>
> @@ -249,9 +250,33 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
>  #define writesw(p,d,l)		__raw_writesw(p,d,l)
>  #define writesl(p,d,l)		__raw_writesl(p,d,l)
>  
> +#ifndef __ARMBE__
> +static inline void memset_io(volatile void __iomem *dst, unsigned c,
> +	size_t count)
> +{
> +	memset((void __force *)dst, c, count);
> +}
> +#define memset_io(dst,c,count) memset_io(dst,c,count)
> +
> +static inline void memcpy_fromio(void *to, const volatile void __iomem *from,
> +	size_t count)
> +{
> +	memcpy(to, (const void __force *)from, count);
> +}
> +#define memcpy_fromio(to,from,count) memcpy_fromio(to,from,count)
> +
> +static inline void memcpy_toio(volatile void __iomem *to, const void *from,
> +	size_t count)
> +{
> +	memcpy((void __force *)to, from, count);
> +}
> +#define memcpy_toio(to,from,count) memcpy_toio(to,from,count)
> +
> +#else
>  #define memset_io(c,v,l)	_memset_io(c,(v),(l))
>  #define memcpy_fromio(a,c,l)	_memcpy_fromio((a),c,(l))
>  #define memcpy_toio(c,a,l)	_memcpy_toio(c,(a),(l))
> +#endif
>  
>  #endif	/* readl */
>  
> -- 
> 1.7.4.4
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Will Deacon Sept. 28, 2012, 9:58 a.m. UTC | #2
On Fri, Sep 28, 2012 at 05:17:53AM +0100, Nicolas Pitre wrote:
> On Thu, 27 Sep 2012, Russell King wrote:
> 
> > If we are building for a LE platform, and we haven't overriden the
> > MMIO ops, then we can optimize the mem*io operations using the
> > standard string functions.
> > 
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> We presume that the IO space is able to cope with a mixture of access 
> width other than byte access which should be perfectly reasonable by 
> default.  If so then...
> 
> Acked-by: Nicolas Pitre <nico@.linaro.org>

This looks pretty scary to me, but maybe I'm worrying too much. The first
thing to ensure is that the accesses are always aligned, which I believe is
true for the string operations. However, a quick glance at memset shows that
we do things like store multiple with writeback addressing modes. This is
bad for a few reasons:

	1. If an access other than the first one generated by the instruction
	   causes an abort, the CPU will ultimately re-execute the earlier
	   accesses, which could be problematic for a device.

	2. Writeback addressing modes when accessing MMIO peripherals cause
	   serious performance problems with virtualisation, as I have
	   described before.

	3. We have to guarantee that no single instruction causes accesses
	   that span a page boundary, as this leads to UNPREDICTABLE
	   behaviour.

So, unless we can guarantee that our accesses are all aligned, will never
fault, do not cross a page boundary and we are not running as a guest then
I'd be inclined to stick with byte-by-byte implementations for these
functions.

Will
Russell King - ARM Linux Sept. 28, 2012, 10:31 a.m. UTC | #3
On Fri, Sep 28, 2012 at 10:58:08AM +0100, Will Deacon wrote:
> On Fri, Sep 28, 2012 at 05:17:53AM +0100, Nicolas Pitre wrote:
> > On Thu, 27 Sep 2012, Russell King wrote:
> > 
> > > If we are building for a LE platform, and we haven't overriden the
> > > MMIO ops, then we can optimize the mem*io operations using the
> > > standard string functions.
> > > 
> > > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> > 
> > We presume that the IO space is able to cope with a mixture of access 
> > width other than byte access which should be perfectly reasonable by 
> > default.  If so then...
> > 
> > Acked-by: Nicolas Pitre <nico@.linaro.org>
> 
> This looks pretty scary to me, but maybe I'm worrying too much. The first
> thing to ensure is that the accesses are always aligned, which I believe is
> true for the string operations. However, a quick glance at memset shows that
> we do things like store multiple with writeback addressing modes. This is
> bad for a few reasons:
> 
> 	1. If an access other the first one generated by the instruction
> 	   causes an abort, the CPU will ultimately re-execute the earlier
> 	   accesses, which could be problematic to a device.

I don't think that's a problem for these.  They're used on RAM-like
regions.

> 	2. Writeback addressing modes when accessing MMIO peripherals causes
> 	   serious performance problems with virtualisation, as I have
> 	   described before.

Well, virtualisation is in its infancy on ARM, and I don't think it should
be worried about _too_ much when these operations are grossly unoptimized
for non-virtualised hardware.  The tradeoff is between grossly unoptimized
code on non-virtualised hardware vs performance problems with virtualised
hardware.

> 	3. We have to guarantee that no single instruction causes accesses
> 	   that span a page boundary, as this leads to UNPREDICTABLE
> 	   behaviour.

We do accesses in memset() 16 bytes at a time, so to guarantee that we
need to ensure that the pointer passed in was 16-byte aligned.  I'm not
sure that we can guarantee that in every case.
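For illustration, the usual way to make that safe is to byte-fill up to the
first aligned address and only then switch to wide stores, so no wide access
ever starts unaligned. A minimal userspace sketch (plain memset() stands in
for the optimized bulk fill, and memset_align16 is a made-up name for
illustration, not a kernel function):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model: fill the unaligned head one byte at a time,
 * then hand the 16-byte-aligned remainder to the bulk routine. */
static void memset_align16(void *dst, int c, size_t count)
{
	uint8_t *p = dst;

	/* Byte-fill until the pointer is 16-byte aligned (or count runs out). */
	while (count && ((uintptr_t)p & 15)) {
		*p++ = (uint8_t)c;
		count--;
	}

	/* Bulk fill: every access in here starts on a 16-byte boundary. */
	memset(p, c, count);
}
```

Whatever the caller's alignment, the head loop guarantees the bulk stores
never straddle an unexpected boundary.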

> So, unless we can guarantee that our accesses are all aligned, will never
> fault, do not cross a page boundary and we are not running as a guest then
> I'd be inclined to stick with byte-by-byte implementations for these
> functions.

Well, that rather sucks if you're memset_io'ing various sizes (in
megabytes - up to 8MB) of video memory.  We desperately need these
functions optimized.

Either that or we allow DRM to be a security hole by omitting any kind
of buffer clearing, because using the existing memset_io() is just far
too expensive to clear 8MB a byte at a time.
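For reference, the existing fallback issues one store per byte, which is
exactly what makes an 8MB clear so expensive. A userspace model of that
access pattern (a plain pointer stands in for the volatile void __iomem
argument; this sketches the loop shape, not the actual _memset_io
implementation):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of the byte-at-a-time fallback: one store per byte, as
 * _memset_io effectively does via writeb. For an 8MB buffer this
 * loop body runs over eight million times. */
static void memset_io_bytewise(void *dst, int c, size_t count)
{
	volatile uint8_t *p = dst;

	while (count--)
		*p++ = (uint8_t)c;
}
```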
Will Deacon Sept. 28, 2012, 10:42 a.m. UTC | #4
On Fri, Sep 28, 2012 at 11:31:10AM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 28, 2012 at 10:58:08AM +0100, Will Deacon wrote:
> > 	1. If an access other the first one generated by the instruction
> > 	   causes an abort, the CPU will ultimately re-execute the earlier
> > 	   accesses, which could be problematic to a device.
> 
> I don't think that's a problem for these.  They're used on RAM like
> regions.

[...]

> > So, unless we can guarantee that our accesses are all aligned, will never
> > fault, do not cross a page boundary and we are not running as a guest then
> > I'd be inclined to stick with byte-by-byte implementations for these
> > functions.
> 
> Well, that rather sucks if you're memset_io'ing various sizes (in
> megabytes - up to 8MB) of video memory.  We desperately need these
> functions optimized.

How are these regions mapped? If it's something like ioremap_wc, then this
gives us normal non-cacheable memory on ARMv7, which wouldn't be subject to
the restrictions that you have with device memory. 

> Either that or we allow DRM to be a security hole by omitting any kind
> of buffer clearing, because using the existing memset_io() is just far
> too expensive to clear 8MB a byte at a time.

If DRM is using normal memory, it sounds like the *_io string functions need
to be made to work with different memory types.

Will
Russell King - ARM Linux Sept. 28, 2012, 10:44 a.m. UTC | #5
On Fri, Sep 28, 2012 at 11:42:16AM +0100, Will Deacon wrote:
> How are these regions mapped? If it's something like ioremap_wc, then this
> gives us normal non-cacheable memory on ARMv7, which wouldn't be subject to
> the restrictions that you have with device memory. 

Yes, ioremap_wc(), but we can't guarantee that for all callers of memset_io.

> > Either that or we allow DRM to be a security hole by omitting any kind
> > of buffer clearing, because using the existing memset_io() is just far
> > too expensive to clear 8MB a byte at a time.
> 
> If DRM is using normal memory, it sounds like the *_io string functions need
> to be made to work with different memory types.

Take a look at the *_io string function declarations, and observe the use
of __iomem.  That means they're only used with memory returned from the
ioremap*() suite of functions (or they're being used non-compliantly and
sparse should be spitting warnings.)
Arnd Bergmann Sept. 28, 2012, 1:36 p.m. UTC | #6
On Thursday 27 September 2012, Russell King wrote:
> +#ifndef __ARMBE__
> +static inline void memset_io(volatile void __iomem *dst, unsigned c,
> +       size_t count)
> +{
> +       memset((void __force *)dst, c, count);
> +}
> +#define memset_io(dst,c,count) memset_io(dst,c,count)
> +
> +static inline void memcpy_fromio(void *to, const volatile void __iomem *from,
> +       size_t count)
> +{
> +       memcpy(to, (const void __force *)from, count);
> +}
> +#define memcpy_fromio(to,from,count) memcpy_fromio(to,from,count)
> +

I wonder whether we need any barriers here. PowerPC has the strongest
barriers before and after the access, at least since f007cacffc8870
"[POWERPC] Fix MMIO ops to provide expected barrier behaviour".

This seems to be done more out of caution and a wish to mimic the
x86 behavior (where all MMIO accesses are synchronous to some degree)
than to fix a particular bug that was observed.

Most other architectures are just using the raw string operations like
you do here.

Some architectures (powerpc and parisc at least) split the access into
naturally aligned accesses of up to 4 bytes. I think those are the
ones that allow unaligned accesses on RAM but not on MMIO pointers
(powerpc will checkstop or throw an exception).

arm64 actually copies the arm variant, and I suspect we can use the
simpler version there as well.

	Arnd
Catalin Marinas Sept. 28, 2012, 2:13 p.m. UTC | #7
On 28 September 2012 14:36, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 27 September 2012, Russell King wrote:
>> +#ifndef __ARMBE__
>> +static inline void memset_io(volatile void __iomem *dst, unsigned c,
>> +       size_t count)
>> +{
>> +       memset((void __force *)dst, c, count);
>> +}
>> +#define memset_io(dst,c,count) memset_io(dst,c,count)
>> +
>> +static inline void memcpy_fromio(void *to, const volatile void __iomem *from,
>> +       size_t count)
>> +{
>> +       memcpy(to, (const void __force *)from, count);
>> +}
>> +#define memcpy_fromio(to,from,count) memcpy_fromio(to,from,count)
>> +
>
> I wonder whether we need any barriers here. PowerPC has the strongest
> barriers befor eand after the access, at least since f007cacffc8870
> "[POWERPC] Fix MMIO ops to provide expected barrier behaviour".

In theory, we need barriers but only at the beginning and the end of
the operation, not for every iteration (and in practice we might even
be OK without any). The device accesses are ordered with each other by
hardware; it's only when you need them to be ordered with other types
of memory accesses (or with other devices) that barriers are required.

> arm64 actually copies the arm variant, and I suspect we can use the
> simpler version there as well.

For arm64 I have a set of optimised library functions which I left out
of the upstream branch just to make the review easier. I'll post the
patches and push them once the core code is upstream. But the memcpy()
implementation does not perform pointer alignment (it may at some
point, based on benchmarks), so it's not usable with I/O.

The simplest optimisation for arm64 (already suggested by Olof and on
my to-do list) is something like the powerpc implementation but using
write*_relaxed() to avoid barriers in a loop. The performance of the
compiled code shouldn't be far from hand-written assembly, especially
on ARMv8 with more out-of-order execution.
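A rough sketch of that shape, with the kernel accessors stubbed out as plain
stores and the barriers as no-ops so the structure compiles standalone (the
head/tail byte handling and the barrier placement are assumptions for
illustration, not the actual arm64 code):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins so the loop structure compiles in userspace: a relaxed
 * accessor is modeled as a plain volatile store, and the ordering
 * barrier as a no-op. In the kernel these would be writeb_relaxed(),
 * writel_relaxed() and the appropriate I/O barrier. */
#define writeb_relaxed(v, a)	(*(volatile uint8_t *)(a) = (v))
#define writel_relaxed(v, a)	(*(volatile uint32_t *)(a) = (v))
#define io_barrier()		do { } while (0)

/* Sketch of the suggested memset_io: barriers only at the ends,
 * relaxed 32-bit stores in the hot loop, byte stores for the
 * unaligned head and the tail. */
static void memset_io_relaxed(void *dst, int c, size_t count)
{
	uintptr_t addr = (uintptr_t)dst;
	uint32_t word = (uint8_t)c * 0x01010101u;

	io_barrier();			/* order against prior MMIO writes */

	while (count && (addr & 3)) {	/* unaligned head: byte stores */
		writeb_relaxed((uint8_t)c, addr);
		addr++;
		count--;
	}
	while (count >= 4) {		/* aligned bulk: word stores */
		writel_relaxed(word, addr);
		addr += 4;
		count -= 4;
	}
	while (count--) {		/* tail: byte stores */
		writeb_relaxed((uint8_t)c, addr);
		addr++;
	}

	io_barrier();			/* order before subsequent MMIO */
}
```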
Russell King - ARM Linux Sept. 28, 2012, 2:29 p.m. UTC | #8
On Fri, Sep 28, 2012 at 03:13:42PM +0100, Catalin Marinas wrote:
> On 28 September 2012 14:36, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 27 September 2012, Russell King wrote:
> >> +#ifndef __ARMBE__
> >> +static inline void memset_io(volatile void __iomem *dst, unsigned c,
> >> +       size_t count)
> >> +{
> >> +       memset((void __force *)dst, c, count);
> >> +}
> >> +#define memset_io(dst,c,count) memset_io(dst,c,count)
> >> +
> >> +static inline void memcpy_fromio(void *to, const volatile void __iomem *from,
> >> +       size_t count)
> >> +{
> >> +       memcpy(to, (const void __force *)from, count);
> >> +}
> >> +#define memcpy_fromio(to,from,count) memcpy_fromio(to,from,count)
> >> +
> >
> > I wonder whether we need any barriers here. PowerPC has the strongest
> > barriers befor eand after the access, at least since f007cacffc8870
> > "[POWERPC] Fix MMIO ops to provide expected barrier behaviour".
> 
> In theory, we need barriers but only at the beginning and the end of
> the operation, not for every iteration (and in practice we might even
> be OK without any).

Yes, we at least need a barrier before each.

The reason for a barrier before is that the memory being accessed may
depend on a previous writel() hitting the device, so we need to ensure
proper ordering wrt. other writel()s.

I don't think we need a barrier after, because any future device
writes by the write[bwl]() accessors will have a barrier before them,
which will ensure proper ordering.
Dirk Behme June 5, 2013, 6:02 a.m. UTC | #9
On 28.09.2012 00:02, Russell King wrote:
> If we are building for a LE platform, and we haven't overriden the
> MMIO ops, then we can optimize the mem*io operations using the
> standard string functions.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> ---
>   arch/arm/include/asm/io.h |   25 +++++++++++++++++++++++++
>   1 files changed, 25 insertions(+), 0 deletions(-)
>
> diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> index 815c669..ed0577e 100644
> --- a/arch/arm/include/asm/io.h
> +++ b/arch/arm/include/asm/io.h
> @@ -23,6 +23,7 @@
>
>   #ifdef __KERNEL__
>
> +#include <linux/string.h>
>   #include <linux/types.h>
>   #include <asm/byteorder.h>
>   #include <asm/memory.h>
> @@ -249,9 +250,33 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
>   #define writesw(p,d,l)		__raw_writesw(p,d,l)
>   #define writesl(p,d,l)		__raw_writesl(p,d,l)
>
> +#ifndef __ARMBE__
> +static inline void memset_io(volatile void __iomem *dst, unsigned c,
> +	size_t count)
> +{
> +	memset((void __force *)dst, c, count);
> +}
> +#define memset_io(dst,c,count) memset_io(dst,c,count)
> +
> +static inline void memcpy_fromio(void *to, const volatile void __iomem *from,
> +	size_t count)
> +{
> +	memcpy(to, (const void __force *)from, count);
> +}
> +#define memcpy_fromio(to,from,count) memcpy_fromio(to,from,count)
> +
> +static inline void memcpy_toio(volatile void __iomem *to, const void *from,
> +	size_t count)
> +{
> +	memcpy((void __force *)to, from, count);
> +}
> +#define memcpy_toio(to,from,count) memcpy_toio(to,from,count)
> +
> +#else
>   #define memset_io(c,v,l)	_memset_io(c,(v),(l))
>   #define memcpy_fromio(a,c,l)	_memcpy_fromio((a),c,(l))
>   #define memcpy_toio(c,a,l)	_memcpy_toio(c,(a),(l))
> +#endif
>
>   #endif	/* readl */

I'd like to ask if there is any chance to get anything like this or the 
even older proposal

http://lists.infradead.org/pipermail/linux-arm-kernel/2009-November/004077.html

merged? Maybe an updated version incorporating the discussion conclusions?


In the discussion of

http://lists.infradead.org/pipermail/linux-arm-kernel/2009-November/004077.html

there have been some doubts about whether it's a good idea to enable it 
for all SoCs.  Therefore that patch introduced a

config OPTIMIZED_IOMEM_MEMCPY

which would allow us to enable it only for SoCs known to be compatible. 
Just in case this helps to get it applied ;)
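For concreteness, such an option might look roughly like this (the option
name is taken from the 2009 proposal; the dependency and help text are
illustrative guesses, not the actual patch):

```kconfig
config OPTIMIZED_IOMEM_MEMCPY
	bool "Implement mem*io operations with the standard string functions"
	depends on !CPU_BIG_ENDIAN
	help
	  Use the optimized memset()/memcpy() implementations for
	  memset_io(), memcpy_fromio() and memcpy_toio() instead of the
	  byte-at-a-time fallbacks.  Only enable this on platforms whose
	  MMIO regions tolerate mixed access widths.
```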


Many thanks and best regards

Dirk
Dirk Behme June 11, 2013, 6:16 p.m. UTC | #10
On 05.06.2013 08:02, Dirk Behme wrote:
> On 28.09.2012 00:02, Russell King wrote:
>> If we are building for a LE platform, and we haven't overriden the
>> MMIO ops, then we can optimize the mem*io operations using the
>> standard string functions.
>>
>> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>> ---
>>   arch/arm/include/asm/io.h |   25 +++++++++++++++++++++++++
>>   1 files changed, 25 insertions(+), 0 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
>> index 815c669..ed0577e 100644
>> --- a/arch/arm/include/asm/io.h
>> +++ b/arch/arm/include/asm/io.h
>> @@ -23,6 +23,7 @@
>>
>>   #ifdef __KERNEL__
>>
>> +#include <linux/string.h>
>>   #include <linux/types.h>
>>   #include <asm/byteorder.h>
>>   #include <asm/memory.h>
>> @@ -249,9 +250,33 @@ extern void _memset_io(volatile void __iomem *,
>> int, size_t);
>>   #define writesw(p,d,l)        __raw_writesw(p,d,l)
>>   #define writesl(p,d,l)        __raw_writesl(p,d,l)
>>
>> +#ifndef __ARMBE__
>> +static inline void memset_io(volatile void __iomem *dst, unsigned c,
>> +    size_t count)
>> +{
>> +    memset((void __force *)dst, c, count);
>> +}
>> +#define memset_io(dst,c,count) memset_io(dst,c,count)
>> +
>> +static inline void memcpy_fromio(void *to, const volatile void
>> __iomem *from,
>> +    size_t count)
>> +{
>> +    memcpy(to, (const void __force *)from, count);
>> +}
>> +#define memcpy_fromio(to,from,count) memcpy_fromio(to,from,count)
>> +
>> +static inline void memcpy_toio(volatile void __iomem *to, const
>> void *from,
>> +    size_t count)
>> +{
>> +    memcpy((void __force *)to, from, count);
>> +}
>> +#define memcpy_toio(to,from,count) memcpy_toio(to,from,count)
>> +
>> +#else
>>   #define memset_io(c,v,l)    _memset_io(c,(v),(l))
>>   #define memcpy_fromio(a,c,l)    _memcpy_fromio((a),c,(l))
>>   #define memcpy_toio(c,a,l)    _memcpy_toio(c,(a),(l))
>> +#endif
>>
>>   #endif    /* readl */
>
> I'd like to ask if there is any chance to get anything like this or
> the even older proposal
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2009-November/004077.html
>
>
> merged? Maybe an updated version incorporating the discussion
> conclusions?
>
>
> In the discussion of
>
> http://lists.infradead.org/pipermail/linux-arm-kernel/2009-November/004077.html
>
>
> there have been some doubts if it's a good idea to enable it for all
> SoCs. Therefore that patch introduced a
>
> config OPTIMIZED_IOMEM_MEMCPY
>
> which would allow us to enable it for only known to be compatible
> SoCs. Just in case this helps to get it applied ;)

Any opinion on this?

Best regards

Dirk
Russell King - ARM Linux June 13, 2013, 11:05 a.m. UTC | #11
On Tue, Jun 11, 2013 at 08:16:18PM +0200, Dirk Behme wrote:
> On 05.06.2013 08:02, Dirk Behme wrote:
>> On 28.09.2012 00:02, Russell King wrote:
>>> If we are building for a LE platform, and we haven't overriden the
>>> MMIO ops, then we can optimize the mem*io operations using the
>>> standard string functions.
>>>
>>> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>>> ---
>>>   arch/arm/include/asm/io.h |   25 +++++++++++++++++++++++++
>>>   1 files changed, 25 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
>>> index 815c669..ed0577e 100644
>>> --- a/arch/arm/include/asm/io.h
>>> +++ b/arch/arm/include/asm/io.h
>>> @@ -23,6 +23,7 @@
>>>
>>>   #ifdef __KERNEL__
>>>
>>> +#include <linux/string.h>
>>>   #include <linux/types.h>
>>>   #include <asm/byteorder.h>
>>>   #include <asm/memory.h>
>>> @@ -249,9 +250,33 @@ extern void _memset_io(volatile void __iomem *,
>>> int, size_t);
>>>   #define writesw(p,d,l)        __raw_writesw(p,d,l)
>>>   #define writesl(p,d,l)        __raw_writesl(p,d,l)
>>>
>>> +#ifndef __ARMBE__
>>> +static inline void memset_io(volatile void __iomem *dst, unsigned c,
>>> +    size_t count)
>>> +{
>>> +    memset((void __force *)dst, c, count);
>>> +}
>>> +#define memset_io(dst,c,count) memset_io(dst,c,count)
>>> +
>>> +static inline void memcpy_fromio(void *to, const volatile void
>>> __iomem *from,
>>> +    size_t count)
>>> +{
>>> +    memcpy(to, (const void __force *)from, count);
>>> +}
>>> +#define memcpy_fromio(to,from,count) memcpy_fromio(to,from,count)
>>> +
>>> +static inline void memcpy_toio(volatile void __iomem *to, const
>>> void *from,
>>> +    size_t count)
>>> +{
>>> +    memcpy((void __force *)to, from, count);
>>> +}
>>> +#define memcpy_toio(to,from,count) memcpy_toio(to,from,count)
>>> +
>>> +#else
>>>   #define memset_io(c,v,l)    _memset_io(c,(v),(l))
>>>   #define memcpy_fromio(a,c,l)    _memcpy_fromio((a),c,(l))
>>>   #define memcpy_toio(c,a,l)    _memcpy_toio(c,(a),(l))
>>> +#endif
>>>
>>>   #endif    /* readl */
>>
>> I'd like to ask if there is any chance to get anything like this or
>> the even older proposal
>>
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2009-November/004077.html
>>
>>
>> merged? Maybe an updated version incorporating the discussion
>> conclusions?
>>
>>
>> In the discussion of
>>
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2009-November/004077.html
>>
>>
>> there have been some doubts if it's a good idea to enable it for all
>> SoCs. Therefore that patch introduced a
>>
>> config OPTIMIZED_IOMEM_MEMCPY
>>
>> which would allow us to enable it for only known to be compatible
>> SoCs. Just in case this helps to get it applied ;)
>
> Any opinion on this?

Yes, unfortunately it should be selectable by platform, though it needs
to be done carefully.  Some ARM platforms can't cope with full 32-bit
accesses to MMIO.

I don't think any of those intersect with the single zImage project, so
those should be fine to have it always enabled - it's only the older
platforms that might have issues.
Arnd Bergmann June 13, 2013, 12:47 p.m. UTC | #12
On Thursday 13 June 2013 12:05:02 Russell King - ARM Linux wrote:
> Yes, unfortunately it should be selectable by platform, though it needs
> to be done carefully.  Some ARM platforms can't cope with full 32-bit
> accesses to MMIO.
> 
> I don't think any of those intersect with the single zImage project, so
> those should be fine to have it always enabled - it's only the older
> platforms that might have issues.

Are those all StrongARM and XScale based platforms? If so, I definitely
don't expect them to go multiplatform.

Can we combine this with the existing CONFIG_NEED_MACH_IO_H?

	Arnd
Russell King - ARM Linux June 13, 2013, 5:13 p.m. UTC | #13
On Thu, Jun 13, 2013 at 02:47:35PM +0200, Arnd Bergmann wrote:
> On Thursday 13 June 2013 12:05:02 Russell King - ARM Linux wrote:
> > Yes, unfortunately it should be selectable by platform, though it needs
> > to be done carefully.  Some ARM platforms can't cope with full 32-bit
> > accesses to MMIO.
> > 
> > I don't think any of those intersect with the single zImage project, so
> > those should be fine to have it always enabled - it's only the older
> > platforms that might have issues.
> 
> Are those all StrongARM and XScale based platforms? If so, I definitely
> don't expect them to go multiplatform.

I'm not 100% positive on that, but...

> Can we combine this with the existing CONFIG_NEED_MACH_IO_H?

I think we probably can - because that implies that the generic set of
IO accessors is already being used, which means 8, 16 and 32-bit
accesses must be supportable without any kind of trickery.
Dirk Behme June 23, 2013, 6:46 a.m. UTC | #14
Hi Arnd,

On 13.06.2013 19:13, Russell King - ARM Linux wrote:
> On Thu, Jun 13, 2013 at 02:47:35PM +0200, Arnd Bergmann wrote:
>> On Thursday 13 June 2013 12:05:02 Russell King - ARM Linux wrote:
>>> Yes, unfortunately it should be selectable by platform, though it needs
>>> to be done carefully.  Some ARM platforms can't cope with full 32-bit
>>> accesses to MMIO.
>>>
>>> I don't think any of those intersect with the single zImage project, so
>>> those should be fine to have it always enabled - it's only the older
>>> platforms that might have issues.
>>
>> Are those all StrongARM and XScale based platforms? If so, I definitely
>> don't expect them to go multiplatform.
>
> I'm not 100% positive on that, but...
>
>> Can we combine this with the existing CONFIG_NEED_MACH_IO_H?
>
> I think we probably can - because that implies that the generic set of
> IO accessors are already being used, which means 8, 16 and 32 bit
> accesses must be supportable without any kind of trickery.

Do you like to send a patch with this CONFIG_NEED_MACH_IO_H proposal?

Many thanks and best regards

Dirk

P.S.: Just for reference, the initial patch from Russell we are 
talking about:

http://permalink.gmane.org/gmane.linux.ports.arm.kernel/190902

Patch

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 815c669..ed0577e 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -23,6 +23,7 @@ 
 
 #ifdef __KERNEL__
 
+#include <linux/string.h>
 #include <linux/types.h>
 #include <asm/byteorder.h>
 #include <asm/memory.h>
@@ -249,9 +250,33 @@  extern void _memset_io(volatile void __iomem *, int, size_t);
 #define writesw(p,d,l)		__raw_writesw(p,d,l)
 #define writesl(p,d,l)		__raw_writesl(p,d,l)
 
+#ifndef __ARMBE__
+static inline void memset_io(volatile void __iomem *dst, unsigned c,
+	size_t count)
+{
+	memset((void __force *)dst, c, count);
+}
+#define memset_io(dst,c,count) memset_io(dst,c,count)
+
+static inline void memcpy_fromio(void *to, const volatile void __iomem *from,
+	size_t count)
+{
+	memcpy(to, (const void __force *)from, count);
+}
+#define memcpy_fromio(to,from,count) memcpy_fromio(to,from,count)
+
+static inline void memcpy_toio(volatile void __iomem *to, const void *from,
+	size_t count)
+{
+	memcpy((void __force *)to, from, count);
+}
+#define memcpy_toio(to,from,count) memcpy_toio(to,from,count)
+
+#else
 #define memset_io(c,v,l)	_memset_io(c,(v),(l))
 #define memcpy_fromio(a,c,l)	_memcpy_fromio((a),c,(l))
 #define memcpy_toio(c,a,l)	_memcpy_toio(c,(a),(l))
+#endif
 
 #endif	/* readl */