diff mbox

ARM: supplementing IO accessors with 64 bit capability

Message ID 1413993983-17310-1-git-send-email-mathieu.poirier@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Mathieu Poirier Oct. 22, 2014, 4:06 p.m. UTC
From: Mathieu Poirier <mathieu.poirier@linaro.org>

Some drivers on ARMv7 need 64 bit read and writes.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 arch/arm/include/asm/io.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Russell King - ARM Linux Oct. 22, 2014, 4:11 p.m. UTC | #1
On Wed, Oct 22, 2014 at 10:06:23AM -0600, mathieu.poirier@linaro.org wrote:
> @@ -306,10 +324,13 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
>  					__raw_readw(c)); __r; })
>  #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \
>  					__raw_readl(c)); __r; })
> +#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64) \
> +					__raw_readq(c)); __r; })
>  
>  #define writeb_relaxed(v,c)	__raw_writeb(v,c)
>  #define writew_relaxed(v,c)	__raw_writew((__force u16) cpu_to_le16(v),c)
>  #define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)
> +#define writeq_relaxed(v,c)	__raw_writeq((__force u64) cpu_to_le64(v),c)

You should only define these if we have the corresponding __raw_ versions
too.
Mathieu Poirier Oct. 22, 2014, 4:22 p.m. UTC | #2
On 22 October 2014 18:11, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Oct 22, 2014 at 10:06:23AM -0600, mathieu.poirier@linaro.org wrote:
>> @@ -306,10 +324,13 @@ extern void _memset_io(volatile void __iomem *, int, size_t);
>>                                       __raw_readw(c)); __r; })
>>  #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \
>>                                       __raw_readl(c)); __r; })
>> +#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64) \
>> +                                     __raw_readq(c)); __r; })
>>
>>  #define writeb_relaxed(v,c)  __raw_writeb(v,c)
>>  #define writew_relaxed(v,c)  __raw_writew((__force u16) cpu_to_le16(v),c)
>>  #define writel_relaxed(v,c)  __raw_writel((__force u32) cpu_to_le32(v),c)
>> +#define writeq_relaxed(v,c)  __raw_writeq((__force u64) cpu_to_le64(v),c)
>
> You should only define these if we have the corresponding __raw_ versions
> too.

I had this conversation with a colleague who reviewed the work.  If
the architecture is < 5 the __raw_ versions aren't included and the
compiler won't complain until someone tries to use the macros.  We
achieve the same result - the macros aren't accessible when the
architecture doesn't support it - while saving an #if condition in the
file.

I'm not strongly opinionated on this - I can enclose the macros in an
#if statement.

>
> --
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.
Catalin Marinas Oct. 22, 2014, 4:44 p.m. UTC | #3
On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
> +#if __LINUX_ARM_ARCH__ >= 5

My old ARMv5 book does not list LDRD/STRD. It looks like they only come
with ARMv5TE. Are there any processors prior to this supported by the
kernel?

> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
> +{
> +	asm volatile("strd %1, %0"
> +		     : "+Qo" (*(volatile u64 __force *)addr)
> +		     : "r" (val));
> +}
> +
> +static inline u64 __raw_readq(const volatile void __iomem *addr)
> +{
> +	u64 val;
> +	asm volatile("ldrd %1, %0"
> +		     : "+Qo" (*(volatile u64 __force *)addr),
> +		       "=r" (val));
> +	return val;
> +}
> +#endif

I'm curious why you need these. Do you have a device that needs a 64-bit
single access or you are trying to read two consecutive registers?
Russell King - ARM Linux Oct. 22, 2014, 5:19 p.m. UTC | #4
On Wed, Oct 22, 2014 at 06:22:09PM +0200, Mathieu Poirier wrote:
> I had this conversation with a colleague who reviewed the work.  If
> the architecture is < 5 the __raw_ versions aren't included and the
> compiler won't complain until someone tries to use the macros.  We
> achieve the same result - the macros aren't accessible when the
> architecture doesn't support it - while saving an #if condition in the
> file.
> 
> I'm not strongly opinionated on this - I can enclose the macros in an
> #if statement.

What you're missing is that some driver may test for their presence by
doing:

#ifndef readq_relaxed
... do something else

which would now break as a detection method, as the macro is always
defined no matter whether it's present or not.
Mathieu Poirier Oct. 22, 2014, 5:55 p.m. UTC | #5
On 22 October 2014 19:19, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Oct 22, 2014 at 06:22:09PM +0200, Mathieu Poirier wrote:
>> I had this conversation with a colleague who reviewed the work.  If
>> the architecture is < 5 the __raw_ versions aren't included and the
>> compiler won't complain until someone tries to use the macros.  We
>> achieve the same result - the macros aren't accessible when the
>> architecture doesn't support it - while saving an #if condition in the
>> file.
>>
>> I'm not strongly opinionated on this - I can enclose the macros in an
>> #if statement.
>
> What you're missing is that some driver may test for their presence by
> doing:
>
> #ifndef readq_relaxed
> ... do something else
>
> which would now break as a detection method, as the macro is always
> defined no matter whether it's present or not.

Good point - thanks for showing me the light.

>
> --
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.
Mathieu Poirier Oct. 22, 2014, 7:10 p.m. UTC | #6
On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
>> +#if __LINUX_ARM_ARCH__ >= 5
>
> My old ARMv5 book does not list LDRD/STRD. It looks like they only come
> with ARMv5TE. Are there any processors prior to this supported by the
> kernel?
>
>> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
>> +{
>> +     asm volatile("strd %1, %0"
>> +                  : "+Qo" (*(volatile u64 __force *)addr)
>> +                  : "r" (val));
>> +}
>> +
>> +static inline u64 __raw_readq(const volatile void __iomem *addr)
>> +{
>> +     u64 val;
>> +     asm volatile("ldrd %1, %0"
>> +                  : "+Qo" (*(volatile u64 __force *)addr),
>> +                    "=r" (val));
>> +     return val;
>> +}
>> +#endif
>
> I'm curious why you need these. Do you have a device that needs a 64-bit
> single access or you are trying to read two consecutive registers?

The fundamental data size of Coresight STM32 for ARMv7 is
implementation defined and can be 32 or 64bit.  As such stimulus ports
can support transaction sizes of up to 64 bit.

Mathieu

>
> --
> Catalin
Nicolas Pitre Oct. 23, 2014, 7:47 p.m. UTC | #7
On Wed, 22 Oct 2014, Catalin Marinas wrote:

> On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
> > +#if __LINUX_ARM_ARCH__ >= 5
> 
> My old ARMv5 book does not list LDRD/STRD. It looks like they only come
> with ARMv5TE. Are there any processors prior to this supported by the
> kernel?

We still supports ARMv4 targets.

As far as I know, all the ARMv5 targets we support are also TE capable.


Nicolas
Russell King - ARM Linux Oct. 23, 2014, 7:51 p.m. UTC | #8
On Thu, Oct 23, 2014 at 03:47:32PM -0400, Nicolas Pitre wrote:
> On Wed, 22 Oct 2014, Catalin Marinas wrote:
> 
> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
> > > +#if __LINUX_ARM_ARCH__ >= 5
> > 
> > My old ARMv5 book does not list LDRD/STRD. It looks like they only come
> > with ARMv5TE. Are there any processors prior to this supported by the
> > kernel?
> 
> We still supports ARMv4 targets.
> 
> As far as I know, all the ARMv5 targets we support are also TE capable.

Not quite.  We have ARM1020, which according to our proc-*.S files is
only ARMv5T, not ARMv5TE.
Nicolas Pitre Oct. 23, 2014, 8:15 p.m. UTC | #9
On Thu, 23 Oct 2014, Russell King - ARM Linux wrote:

> On Thu, Oct 23, 2014 at 03:47:32PM -0400, Nicolas Pitre wrote:
> > On Wed, 22 Oct 2014, Catalin Marinas wrote:
> > 
> > > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
> > > > +#if __LINUX_ARM_ARCH__ >= 5
> > > 
> > > My old ARMv5 book does not list LDRD/STRD. It looks like they only come
> > > with ARMv5TE. Are there any processors prior to this supported by the
> > > kernel?
> > 
> > We still supports ARMv4 targets.
> > 
> > As far as I know, all the ARMv5 targets we support are also TE capable.
> 
> Not quite.  We have ARM1020, which according to our proc-*.S files is
> only ARMv5T, not ARMv5TE.

Oh well.  Never saw such a beast in the field though.

Maybe to be on the very safe side, given that no ARMV5TE is likely to 
need 64-bit IO accessors at this point, this could simply be 
__LINUX_ARM_ARCH__ >= 6 instead.


Nicolas
Catalin Marinas Oct. 24, 2014, 9:23 a.m. UTC | #10
On Thu, Oct 23, 2014 at 08:47:32PM +0100, Nicolas Pitre wrote:
> On Wed, 22 Oct 2014, Catalin Marinas wrote:
> 
> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
> > > +#if __LINUX_ARM_ARCH__ >= 5
> > 
> > My old ARMv5 book does not list LDRD/STRD. It looks like they only come
> > with ARMv5TE. Are there any processors prior to this supported by the
> > kernel?
> 
> We still supports ARMv4 targets.

Yes. What I meant is non-TE ARMv5 targets.
Catalin Marinas Oct. 24, 2014, 9:28 a.m. UTC | #11
On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote:
> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
> >> +{
> >> +     asm volatile("strd %1, %0"
> >> +                  : "+Qo" (*(volatile u64 __force *)addr)
> >> +                  : "r" (val));
> >> +}
> >> +
> >> +static inline u64 __raw_readq(const volatile void __iomem *addr)
> >> +{
> >> +     u64 val;
> >> +     asm volatile("ldrd %1, %0"
> >> +                  : "+Qo" (*(volatile u64 __force *)addr),
> >> +                    "=r" (val));
> >> +     return val;
> >> +}
> >> +#endif
> >
> > I'm curious why you need these. Do you have a device that needs a 64-bit
> > single access or you are trying to read two consecutive registers?
> 
> The fundamental data size of Coresight STM32 for ARMv7 is
> implementation defined and can be 32 or 64bit.  As such stimulus ports
> can support transaction sizes of up to 64 bit.

The STM programmer's model spec recommends something else (though I find
the "3.6 Data sizes" chapter a bit confusing):

  To ensure that code is portable between processor micro-architectures
  and system implementations, ARM recommends that only the native data
  size of the machine is used, and smaller sizes. For the 32-bit ARMv7
  architecture, only 8, 16, and 32-bit transfers are recommended. For an
  ARMv8 processor that supports the AArch64 Execution state, it is
  recommended that the fundamental data size of 64-bits is implemented.

Which means that you should not use readq/writeq on a 32-bit system.
Arnd Bergmann Oct. 24, 2014, 10:37 a.m. UTC | #12
On Thursday 23 October 2014 16:15:19 Nicolas Pitre wrote:
> On Thu, 23 Oct 2014, Russell King - ARM Linux wrote:
> 
> > On Thu, Oct 23, 2014 at 03:47:32PM -0400, Nicolas Pitre wrote:
> > > On Wed, 22 Oct 2014, Catalin Marinas wrote:
> > > 
> > > > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
> > > > > +#if __LINUX_ARM_ARCH__ >= 5
> > > > 
> > > > My old ARMv5 book does not list LDRD/STRD. It looks like they only come
> > > > with ARMv5TE. Are there any processors prior to this supported by the
> > > > kernel?
> > > 
> > > We still supports ARMv4 targets.
> > > 
> > > As far as I know, all the ARMv5 targets we support are also TE capable.
> > 
> > Not quite.  We have ARM1020, which according to our proc-*.S files is
> > only ARMv5T, not ARMv5TE.

Does this actually work when we are building with -march=armv5te?

The Makefile contains this line:

arch-$(CONFIG_CPU_32v5) =-D__LINUX_ARM_ARCH__=5 $(call cc-option,-march=armv5te,-march=armv4t)

which looks like it would break for ARM1020.


On a related note, I also wonder about this part:

tune-$(CONFIG_CPU_ARM946E)      =$(call cc-option,-mtune=arm9e,-mtune=arm9tdmi)
tune-$(CONFIG_CPU_ARM920T)      =-mtune=arm9tdmi
tune-$(CONFIG_CPU_ARM922T)      =-mtune=arm9tdmi
tune-$(CONFIG_CPU_ARM925T)      =-mtune=arm9tdmi
tune-$(CONFIG_CPU_ARM926T)      =-mtune=arm9tdmi

I stumbled over this a while ago and couldn't figure it out. Does ARM926T
actually exist, or is that a mistake that should actually be ARM926E?

If this is always ARM926E, shouldn't we build with -mtune=arm9e as we do
for ARM946E?

> Oh well.  Never saw such a beast in the field though.

The only ARM10 implementation aside from integrator/realview that I'm aware
of is an ARM1026E based Conexant/Ikanos DSL modem SoC (CX94xxx), and that
is of course ARMv5TE.

> Maybe to be on the very safe side, given that no ARMV5TE is likely to 
> need 64-bit IO accessors at this point, this could simply be 
> __LINUX_ARM_ARCH__ >= 6 instead.

Which drivers need that support anyway? We definitely need to ensure
that we don't try to build them on architectures without this support
when CONFIG_COMPILE_TEST is set.

	Arnd
Mathieu Poirier Oct. 24, 2014, 3:05 p.m. UTC | #13
On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote:
>> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
>> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
>> >> +{
>> >> +     asm volatile("strd %1, %0"
>> >> +                  : "+Qo" (*(volatile u64 __force *)addr)
>> >> +                  : "r" (val));
>> >> +}
>> >> +
>> >> +static inline u64 __raw_readq(const volatile void __iomem *addr)
>> >> +{
>> >> +     u64 val;
>> >> +     asm volatile("ldrd %1, %0"
>> >> +                  : "+Qo" (*(volatile u64 __force *)addr),
>> >> +                    "=r" (val));
>> >> +     return val;
>> >> +}
>> >> +#endif
>> >
>> > I'm curious why you need these. Do you have a device that needs a 64-bit
>> > single access or you are trying to read two consecutive registers?
>>
>> The fundamental data size of Coresight STM32 for ARMv7 is
>> implementation defined and can be 32 or 64bit.  As such stimulus ports
>> can support transaction sizes of up to 64 bit.
>
> The STM programmer's model spec recommends something else (though I find
> the "3.6 Data sizes" chapter a bit confusing):
>
>   To ensure that code is portable between processor micro-architectures
>   and system implementations, ARM recommends that only the native data
>   size of the machine is used, and smaller sizes. For the 32-bit ARMv7
>   architecture, only 8, 16, and 32-bit transfers are recommended. For an
>   ARMv8 processor that supports the AArch64 Execution state, it is
>   recommended that the fundamental data size of 64-bits is implemented.
>
> Which means that you should not use readq/writeq on a 32-bit system.

Not quite.  ARM documentation IHI0054B (ARM System Trace Macrocell:
Programmers' Model Architecture Specification) stipulate that "For
systems with an ARMv7 processor, ARM recommends configuration 1 or
configuration 2.", where configuration 2 has a fundamental size of 64
bit.

Mathieu

>
> --
> Catalin
Catalin Marinas Oct. 24, 2014, 4:16 p.m. UTC | #14
On Fri, Oct 24, 2014 at 04:05:13PM +0100, Mathieu Poirier wrote:
> On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote:
> >> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
> >> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
> >> >> +{
> >> >> +     asm volatile("strd %1, %0"
> >> >> +                  : "+Qo" (*(volatile u64 __force *)addr)
> >> >> +                  : "r" (val));
> >> >> +}
> >> >> +
> >> >> +static inline u64 __raw_readq(const volatile void __iomem *addr)
> >> >> +{
> >> >> +     u64 val;
> >> >> +     asm volatile("ldrd %1, %0"
> >> >> +                  : "+Qo" (*(volatile u64 __force *)addr),
> >> >> +                    "=r" (val));
> >> >> +     return val;
> >> >> +}
> >> >> +#endif
> >> >
> >> > I'm curious why you need these. Do you have a device that needs a 64-bit
> >> > single access or you are trying to read two consecutive registers?
> >>
> >> The fundamental data size of Coresight STM32 for ARMv7 is
> >> implementation defined and can be 32 or 64bit.  As such stimulus ports
> >> can support transaction sizes of up to 64 bit.
> >
> > The STM programmer's model spec recommends something else (though I find
> > the "3.6 Data sizes" chapter a bit confusing):
> >
> >   To ensure that code is portable between processor micro-architectures
> >   and system implementations, ARM recommends that only the native data
> >   size of the machine is used, and smaller sizes. For the 32-bit ARMv7
> >   architecture, only 8, 16, and 32-bit transfers are recommended. For an
> >   ARMv8 processor that supports the AArch64 Execution state, it is
> >   recommended that the fundamental data size of 64-bits is implemented.
> >
> > Which means that you should not use readq/writeq on a 32-bit system.
> 
> Not quite.  ARM documentation IHI0054B (ARM System Trace Macrocell:
> Programmers' Model Architecture Specification) stipulate that "For
> systems with an ARMv7 processor, ARM recommends configuration 1 or
> configuration 2.", where configuration 2 has a fundamental size of 64
> bit.

As I said, it's confusing. Anyway, you can go ahead and add the
readq/writeq for ARMv6 and later, though it won't be guaranteed to have
a 64-bit access, it depends on the bus.

BTW, do you need to define the non-relaxed accessors as well? That would
be readq/writeq.
Mathieu Poirier Oct. 24, 2014, 5:54 p.m. UTC | #15
On 24 October 2014 10:16, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, Oct 24, 2014 at 04:05:13PM +0100, Mathieu Poirier wrote:
>> On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote:
>> >> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> >> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
>> >> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
>> >> >> +{
>> >> >> +     asm volatile("strd %1, %0"
>> >> >> +                  : "+Qo" (*(volatile u64 __force *)addr)
>> >> >> +                  : "r" (val));
>> >> >> +}
>> >> >> +
>> >> >> +static inline u64 __raw_readq(const volatile void __iomem *addr)
>> >> >> +{
>> >> >> +     u64 val;
>> >> >> +     asm volatile("ldrd %1, %0"
>> >> >> +                  : "+Qo" (*(volatile u64 __force *)addr),
>> >> >> +                    "=r" (val));
>> >> >> +     return val;
>> >> >> +}
>> >> >> +#endif
>> >> >
>> >> > I'm curious why you need these. Do you have a device that needs a 64-bit
>> >> > single access or you are trying to read two consecutive registers?
>> >>
>> >> The fundamental data size of Coresight STM32 for ARMv7 is
>> >> implementation defined and can be 32 or 64bit.  As such stimulus ports
>> >> can support transaction sizes of up to 64 bit.
>> >
>> > The STM programmer's model spec recommends something else (though I find
>> > the "3.6 Data sizes" chapter a bit confusing):
>> >
>> >   To ensure that code is portable between processor micro-architectures
>> >   and system implementations, ARM recommends that only the native data
>> >   size of the machine is used, and smaller sizes. For the 32-bit ARMv7
>> >   architecture, only 8, 16, and 32-bit transfers are recommended. For an
>> >   ARMv8 processor that supports the AArch64 Execution state, it is
>> >   recommended that the fundamental data size of 64-bits is implemented.
>> >
>> > Which means that you should not use readq/writeq on a 32-bit system.
>>
>> Not quite.  ARM documentation IHI0054B (ARM System Trace Macrocell:
>> Programmers' Model Architecture Specification) stipulate that "For
>> systems with an ARMv7 processor, ARM recommends configuration 1 or
>> configuration 2.", where configuration 2 has a fundamental size of 64
>> bit.
>
> As I said, it's confusing. Anyway, you can go ahead and add the
> readq/writeq for ARMv6 and later, though it won't be guaranteed to have
> a 64-bit access, it depends on the bus.

Agreed.

>
> BTW, do you need to define the non-relaxed accessors as well? That would
> be readq/writeq.

No other master in the system is consuming this data and the channels
return '0' on read.  As such I didn't plan on implementing the
non-relaxed accessors.  Would you like me to do so for completeness?

>
> --
> Catalin
Will Deacon Oct. 27, 2014, 3:54 p.m. UTC | #16
On Fri, Oct 24, 2014 at 05:16:34PM +0100, Catalin Marinas wrote:
> On Fri, Oct 24, 2014 at 04:05:13PM +0100, Mathieu Poirier wrote:
> > On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote:
> > >> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > >> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
> > >> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
> > >> >> +{
> > >> >> +     asm volatile("strd %1, %0"
> > >> >> +                  : "+Qo" (*(volatile u64 __force *)addr)
> > >> >> +                  : "r" (val));
> > >> >> +}
> > >> >> +
> > >> >> +static inline u64 __raw_readq(const volatile void __iomem *addr)
> > >> >> +{
> > >> >> +     u64 val;
> > >> >> +     asm volatile("ldrd %1, %0"
> > >> >> +                  : "+Qo" (*(volatile u64 __force *)addr),
> > >> >> +                    "=r" (val));
> > >> >> +     return val;
> > >> >> +}
> > >> >> +#endif
> > >> >
> > >> > I'm curious why you need these. Do you have a device that needs a 64-bit
> > >> > single access or you are trying to read two consecutive registers?
> > >>
> > >> The fundamental data size of Coresight STM32 for ARMv7 is
> > >> implementation defined and can be 32 or 64bit.  As such stimulus ports
> > >> can support transaction sizes of up to 64 bit.
> > >
> > > The STM programmer's model spec recommends something else (though I find
> > > the "3.6 Data sizes" chapter a bit confusing):
> > >
> > >   To ensure that code is portable between processor micro-architectures
> > >   and system implementations, ARM recommends that only the native data
> > >   size of the machine is used, and smaller sizes. For the 32-bit ARMv7
> > >   architecture, only 8, 16, and 32-bit transfers are recommended. For an
> > >   ARMv8 processor that supports the AArch64 Execution state, it is
> > >   recommended that the fundamental data size of 64-bits is implemented.
> > >
> > > Which means that you should not use readq/writeq on a 32-bit system.
> > 
> > Not quite.  ARM documentation IHI0054B (ARM System Trace Macrocell:
> > Programmers' Model Architecture Specification) stipulate that "For
> > systems with an ARMv7 processor, ARM recommends configuration 1 or
> > configuration 2.", where configuration 2 has a fundamental size of 64
> > bit.
> 
> As I said, it's confusing. Anyway, you can go ahead and add the
> readq/writeq for ARMv6 and later, though it won't be guaranteed to have
> a 64-bit access, it depends on the bus.

I'm really not comfortable with this... we don't make any guarantees for
32-bit CPUs that a double-word access will be single-copy atomic for MMIO.
That means it could be subjected to things like reordering and merging,
which I think means that it depends on both the bus *and* the endpoint as to
whether or not this will work. Worse still, the endpoint could decide to
return a SLVERR, which would appear as an external abort.

Is it not possible to use 32-bit MMIO accesses for this driver?

Will
Mathieu Poirier Oct. 27, 2014, 10:14 p.m. UTC | #17
On 27 October 2014 09:54, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Oct 24, 2014 at 05:16:34PM +0100, Catalin Marinas wrote:
>> On Fri, Oct 24, 2014 at 04:05:13PM +0100, Mathieu Poirier wrote:
>> > On 24 October 2014 03:28, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > > On Wed, Oct 22, 2014 at 08:10:27PM +0100, Mathieu Poirier wrote:
>> > >> On 22 October 2014 18:44, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > >> > On Wed, Oct 22, 2014 at 05:06:23PM +0100, mathieu.poirier@linaro.org wrote:
>> > >> >> +static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
>> > >> >> +{
>> > >> >> +     asm volatile("strd %1, %0"
>> > >> >> +                  : "+Qo" (*(volatile u64 __force *)addr)
>> > >> >> +                  : "r" (val));
>> > >> >> +}
>> > >> >> +
>> > >> >> +static inline u64 __raw_readq(const volatile void __iomem *addr)
>> > >> >> +{
>> > >> >> +     u64 val;
>> > >> >> +     asm volatile("ldrd %1, %0"
>> > >> >> +                  : "+Qo" (*(volatile u64 __force *)addr),
>> > >> >> +                    "=r" (val));
>> > >> >> +     return val;
>> > >> >> +}
>> > >> >> +#endif
>> > >> >
>> > >> > I'm curious why you need these. Do you have a device that needs a 64-bit
>> > >> > single access or you are trying to read two consecutive registers?
>> > >>
>> > >> The fundamental data size of Coresight STM32 for ARMv7 is
>> > >> implementation defined and can be 32 or 64bit.  As such stimulus ports
>> > >> can support transaction sizes of up to 64 bit.
>> > >
>> > > The STM programmer's model spec recommends something else (though I find
>> > > the "3.6 Data sizes" chapter a bit confusing):
>> > >
>> > >   To ensure that code is portable between processor micro-architectures
>> > >   and system implementations, ARM recommends that only the native data
>> > >   size of the machine is used, and smaller sizes. For the 32-bit ARMv7
>> > >   architecture, only 8, 16, and 32-bit transfers are recommended. For an
>> > >   ARMv8 processor that supports the AArch64 Execution state, it is
>> > >   recommended that the fundamental data size of 64-bits is implemented.
>> > >
>> > > Which means that you should not use readq/writeq on a 32-bit system.
>> >
>> > Not quite.  ARM documentation IHI0054B (ARM System Trace Macrocell:
>> > Programmers' Model Architecture Specification) stipulate that "For
>> > systems with an ARMv7 processor, ARM recommends configuration 1 or
>> > configuration 2.", where configuration 2 has a fundamental size of 64
>> > bit.
>>
>> As I said, it's confusing. Anyway, you can go ahead and add the
>> readq/writeq for ARMv6 and later, though it won't be guaranteed to have
>> a 64-bit access, it depends on the bus.
>
> I'm really not comfortable with this... we don't make any guarantees for
> 32-bit CPUs that a double-word access will be single-copy atomic for MMIO.
> That means it could be subjected to things like reordering and merging,
> which I think means that it depends on both the bus *and* the endpoint as to
> whether or not this will work. Worse still, the endpoint could decide to
> return a SLVERR, which would appear as an external abort.

I agree on all of the point you bring up.  The person using these
should know their architecture and the target endpoint support this
kind of access.  If they don't then a problem will show up pretty
quickly.

>
> Is it not possible to use 32-bit MMIO accesses for this driver?

Sure it is but we wouldn't be using the HW to it's full capability.
Another solution is to move the accessors to the driver itself where
nobody else in the 32 bit world will have access to them.  Russell,
what you're opinion on that?

>
> Will
Will Deacon Oct. 28, 2014, 12:23 p.m. UTC | #18
On Mon, Oct 27, 2014 at 10:14:41PM +0000, Mathieu Poirier wrote:
> On 27 October 2014 09:54, Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Oct 24, 2014 at 05:16:34PM +0100, Catalin Marinas wrote:
> >> As I said, it's confusing. Anyway, you can go ahead and add the
> >> readq/writeq for ARMv6 and later, though it won't be guaranteed to have
> >> a 64-bit access, it depends on the bus.
> >
> > I'm really not comfortable with this... we don't make any guarantees for
> > 32-bit CPUs that a double-word access will be single-copy atomic for MMIO.
> > That means it could be subjected to things like reordering and merging,
> > which I think means that it depends on both the bus *and* the endpoint as to
> > whether or not this will work. Worse still, the endpoint could decide to
> > return a SLVERR, which would appear as an external abort.
> 
> I agree on all of the point you bring up.  The person using these
> should know their architecture and the target endpoint support this
> kind of access.  If they don't then a problem will show up pretty
> quickly.

That goes against the I/O abstractions provided by the kernel to allow for
portable device drivers. readq/writeq *must* have some portable semantics
and I don't think that we should implement them on a best-effort basis
in io.h.

> >
> > Is it not possible to use 32-bit MMIO accesses for this driver?
> 
> Sure it is but we wouldn't be using the HW to it's full capability.
> Another solution is to move the accessors to the driver itself where
> nobody else in the 32 bit world will have access to them.  Russell,
> what you're opinion on that?

FWIW, I'd much prefer that, but I'd be interested to know how much of a
a couple of {read,write}l_relaxed operations really cost you by
comparison.

Will
diff mbox

Patch

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 1805674..861e52c 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -118,6 +118,24 @@  static inline u32 __raw_readl(const volatile void __iomem *addr)
 	return val;
 }
 
+#if __LINUX_ARM_ARCH__ >= 5
+static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
+{
+	asm volatile("strd %1, %0"
+		     : "+Qo" (*(volatile u64 __force *)addr)
+		     : "r" (val));
+}
+
+static inline u64 __raw_readq(const volatile void __iomem *addr)
+{
+	u64 val;
+	asm volatile("ldrd %1, %0"
+		     : "+Qo" (*(volatile u64 __force *)addr),
+		       "=r" (val));
+	return val;
+}
+#endif
+
 /*
  * Architecture ioremap implementation.
  */
@@ -306,10 +324,13 @@  extern void _memset_io(volatile void __iomem *, int, size_t);
 					__raw_readw(c)); __r; })
 #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) \
 					__raw_readl(c)); __r; })
+#define readq_relaxed(c) ({ u64 __r = le64_to_cpu((__force __le64) \
+					__raw_readq(c)); __r; })
 
 #define writeb_relaxed(v,c)	__raw_writeb(v,c)
 #define writew_relaxed(v,c)	__raw_writew((__force u16) cpu_to_le16(v),c)
 #define writel_relaxed(v,c)	__raw_writel((__force u32) cpu_to_le32(v),c)
+#define writeq_relaxed(v,c)	__raw_writeq((__force u64) cpu_to_le64(v),c)
 
 #define readb(c)		({ u8  __v = readb_relaxed(c); __iormb(); __v; })
 #define readw(c)		({ u16 __v = readw_relaxed(c); __iormb(); __v; })