diff mbox

ARM: io: avoid writeback addressing modes for __raw_ accessors

Message ID 20120820144927.GP25864@mudshark.cambridge.arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Will Deacon Aug. 20, 2012, 2:49 p.m. UTC
On Mon, Aug 20, 2012 at 02:29:31PM +0100, Nicolas Pitre wrote:
> On Mon, 20 Aug 2012, Will Deacon wrote:
> > Translates the following code from amba_device_add:
> > 
> > 		for (pid = 0, i = 0; i < 4; i++)
> > 			pid |= (readl(tmp + size - 0x20 + 4 * i) & 255) <<
> > 				(i * 8);
> > 		for (cid = 0, i = 0; i < 4; i++)
> > 			cid |= (readl(tmp + size - 0x10 + 4 * i) & 255) <<
> > 				(i * 8);
> > 
> > into:
> [...]
> 
> OK, I can see how the compiler will fold the loop increment into the IO 
> access instruction.  But my point is: is this common?  And when this 
> happens, is this a critical path?

Sure, this case isn't such a big deal (basically just boot-time probing) but
GCC could still generate this stuff in a driver, where it could end up being
an I/O bottleneck for a guest OS.

> > whilst the addressing modes are still nice, the dsb is going to be the
> > performance limitation here. That said, we could try the "Qo" constraints,
> > which I seem to remember don't generate writebacks. I'll have a play.
> 
> OK.  That would be excellent.

Looks like we have a winner (diff against the original patch below). I now
see:

00000340 <hw_init>:
 340:   e1a03000        mov     r3, r0
 344:   e3a00000        mov     r0, #0
 348:   e5830010        str     r0, [r3, #16]
 34c:   e3e02000        mvn     r2, #0
 350:   e5832014        str     r2, [r3, #20]
 354:   e5932018        ldr     r2, [r3, #24]
 358:   e38220ff        orr     r2, r2, #255    ; 0xff
 35c:   e5832030        str     r2, [r3, #48]   ; 0x30
 360:   e12fff1e        bx      lr

with the new code, which is basically the same as the old code but the mvn and
a str have switched places. The same difference occurs when targetting Thumb2.

> > To deal with this, the hypervisor will likely have to stop the virtual world
> > when emulating any MMIO accesses that report incomplete fault information to
> > avoid racing with a TLB invalidation from another virtual CPU. That will
> > certainly be more expensive than an additional instruction on each access.
> 
> I totally agree with you here.
> 
> However, for completeness and above all for security reasons, the 
> hypervisor will _ahve_ to support that case anyway.

Support it, yes, but perhaps not efficiently.

> So it is now a matter of compromise between performance and code size.  
> If the pathological case you brought up above is the exception and not 
> the rule then I think that we can live with the performance impact in 
> that case and keep the optimal pre-indexed addressing for the common 
> cases.

It looks like the new code does exactly what we want, so I think we could
actually have the best of both worlds: pre-index addressing and no
writeback.

Will

--- >8

Comments

Nicolas Pitre Aug. 20, 2012, 4:09 p.m. UTC | #1
On Mon, 20 Aug 2012, Will Deacon wrote:
> On Mon, Aug 20, 2012 at 02:29:31PM +0100, Nicolas Pitre wrote:
> > On Mon, 20 Aug 2012, Will Deacon wrote:
> > > That said, we could try the "Qo" constraints, which I seem to 
> > > remember don't generate writebacks. I'll have a play.
> > 
> > OK.  That would be excellent.
> 
> Looks like we have a winner (diff against the original patch below). I now
> see:
> 
> 00000340 <hw_init>:
>  340:   e1a03000        mov     r3, r0
>  344:   e3a00000        mov     r0, #0
>  348:   e5830010        str     r0, [r3, #16]
>  34c:   e3e02000        mvn     r2, #0
>  350:   e5832014        str     r2, [r3, #20]
>  354:   e5932018        ldr     r2, [r3, #24]
>  358:   e38220ff        orr     r2, r2, #255    ; 0xff
>  35c:   e5832030        str     r2, [r3, #48]   ; 0x30
>  360:   e12fff1e        bx      lr
> 
> with the new code, which is basically the same as the old code but the mvn and
> a str have switched places. The same difference occurs when targetting Thumb2.

OK.  Of course the compiler cannot have the same cost evaluation when 
instructions are hidden inside an asm statement which might change the 
instruction scheduling slightly.  But I don't think that matters much 
for IO accesses.

> --- >8
> 
> diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> index b54d687..bbc94c2 100644
> --- a/arch/arm/include/asm/io.h
> +++ b/arch/arm/include/asm/io.h
> @@ -63,38 +63,50 @@ extern void __raw_readsl(const void __iomem *addr, void *data, int longlen);
>   */
>  static inline void __raw_writew(u16 val, volatile void __iomem *addr)
>  {
> -       asm volatile("strh %0, [%1]" : : "r" (val), "r" (addr));
> +       asm volatile("strh %1, %0"
> +                    : "=Qo" (*(volatile u16 __force *)addr)
> +                    : "r" (val));
>  }
>  
>  static inline u16 __raw_readw(const volatile void __iomem *addr)
>  {
>         u16 val;
> -       asm volatile("ldrh %0, [%1]" : "=r" (val) : "r" (addr));
> +       asm volatile("ldrh %0, %1"
> +                    : "=r" (val)
> +                    : "Qo" (*(const volatile u16 __force *)addr));
>         return val;
>  }

Semantically, I think the qualifier on the Qo constraint should be + as 
in "+Qo" listed in the input operand list in both cases since we may not 
assume anything about the memory location when it is referring to IO 
registers.  It is not because you write to it that previous writes can 
be optimized away, and it is not because you read from it that the 
accessed memory location will remain the same after the read.  Granted, 
the volatile should take care of that, but it doesn't hurt to be 
explicit.

If you fix that then you can add...

Reviewed-by: Nicolas Pitre <nico@linaro.org>


Nicolas
Will Deacon Aug. 20, 2012, 4:54 p.m. UTC | #2
On Mon, Aug 20, 2012 at 05:09:06PM +0100, Nicolas Pitre wrote:
> On Mon, 20 Aug 2012, Will Deacon wrote:
> > diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> > index b54d687..bbc94c2 100644
> > --- a/arch/arm/include/asm/io.h
> > +++ b/arch/arm/include/asm/io.h
> > @@ -63,38 +63,50 @@ extern void __raw_readsl(const void __iomem *addr, void *data, int longlen);
> >   */
> >  static inline void __raw_writew(u16 val, volatile void __iomem *addr)
> >  {
> > -       asm volatile("strh %0, [%1]" : : "r" (val), "r" (addr));
> > +       asm volatile("strh %1, %0"
> > +                    : "=Qo" (*(volatile u16 __force *)addr)
> > +                    : "r" (val));
> >  }
> >  
> >  static inline u16 __raw_readw(const volatile void __iomem *addr)
> >  {
> >         u16 val;
> > -       asm volatile("ldrh %0, [%1]" : "=r" (val) : "r" (addr));
> > +       asm volatile("ldrh %0, %1"
> > +                    : "=r" (val)
> > +                    : "Qo" (*(const volatile u16 __force *)addr));
> >         return val;
> >  }
> 
> Semantically, I think the qualifier on the Qo constraint should be + as 
> in "+Qo" listed in the input operand list in both cases since we may not 
> assume anything about the memory location when it is referring to IO 
> registers.  It is not because you write to it that previous writes can 
> be optimized away, and it is not because you read from it that the 
> accessed memory location will remain the same after the read.  Granted, 
> the volatile should take care of that, but it doesn't hurt to be 
> explicit.

Hmm, ok. I too would hope that the volatile keyword would sort that out but,
since the '+' doesn't seem to change the generated code, I can add that. It
does, however, mean we have to cast away the `const' in the read accessors
which makes the code even uglier.

> Reviewed-by: Nicolas Pitre <nico@linaro.org>

Cheers Nicolas, I'll post the patch independently with your tag.

Will
Nicolas Pitre Aug. 20, 2012, 6:04 p.m. UTC | #3
On Mon, 20 Aug 2012, Will Deacon wrote:

> On Mon, Aug 20, 2012 at 05:09:06PM +0100, Nicolas Pitre wrote:
> > On Mon, 20 Aug 2012, Will Deacon wrote:
> > > diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
> > > index b54d687..bbc94c2 100644
> > > --- a/arch/arm/include/asm/io.h
> > > +++ b/arch/arm/include/asm/io.h
> > > @@ -63,38 +63,50 @@ extern void __raw_readsl(const void __iomem *addr, void *data, int longlen);
> > >   */
> > >  static inline void __raw_writew(u16 val, volatile void __iomem *addr)
> > >  {
> > > -       asm volatile("strh %0, [%1]" : : "r" (val), "r" (addr));
> > > +       asm volatile("strh %1, %0"
> > > +                    : "=Qo" (*(volatile u16 __force *)addr)
> > > +                    : "r" (val));
> > >  }
> > >  
> > >  static inline u16 __raw_readw(const volatile void __iomem *addr)
> > >  {
> > >         u16 val;
> > > -       asm volatile("ldrh %0, [%1]" : "=r" (val) : "r" (addr));
> > > +       asm volatile("ldrh %0, %1"
> > > +                    : "=r" (val)
> > > +                    : "Qo" (*(const volatile u16 __force *)addr));
> > >         return val;
> > >  }
> > 
> > Semantically, I think the qualifier on the Qo constraint should be + as 
> > in "+Qo" listed in the input operand list in both cases since we may not 
> > assume anything about the memory location when it is referring to IO 
> > registers.  It is not because you write to it that previous writes can 
> > be optimized away, and it is not because you read from it that the 
> > accessed memory location will remain the same after the read.  Granted, 
> > the volatile should take care of that, but it doesn't hurt to be 
> > explicit.
> 
> Hmm, ok. I too would hope that the volatile keyword would sort that out but,
> since the '+' doesn't seem to change the generated code, I can add that. It
> does, however, mean we have to cast away the `const' in the read accessors
> which makes the code even uglier.

Nah... the const is wrong.  The way you wrote it means that addr may 
change but the pointed data is constant.  This is obviously wrong since 
we expect the pointed location to change even from a read.


Nicolas
Will Deacon Aug. 20, 2012, 6:10 p.m. UTC | #4
On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote:
> On Mon, 20 Aug 2012, Will Deacon wrote:
> > 
> > Hmm, ok. I too would hope that the volatile keyword would sort that out but,
> > since the '+' doesn't seem to change the generated code, I can add that. It
> > does, however, mean we have to cast away the `const' in the read accessors
> > which makes the code even uglier.
> 
> Nah... the const is wrong.  The way you wrote it means that addr may 
> change but the pointed data is constant.  This is obviously wrong since 
> we expect the pointed location to change even from a read.

That's the prototype for the read accessors though -- a bunch of other
architectures define them that way (including asm-generic), so I wonder what
the reasoning behind that was?

Will
Nicolas Pitre Aug. 20, 2012, 6:45 p.m. UTC | #5
On Mon, 20 Aug 2012, Will Deacon wrote:

> On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote:
> > On Mon, 20 Aug 2012, Will Deacon wrote:
> > > 
> > > Hmm, ok. I too would hope that the volatile keyword would sort that out but,
> > > since the '+' doesn't seem to change the generated code, I can add that. It
> > > does, however, mean we have to cast away the `const' in the read accessors
> > > which makes the code even uglier.
> > 
> > Nah... the const is wrong.  The way you wrote it means that addr may 
> > change but the pointed data is constant.  This is obviously wrong since 
> > we expect the pointed location to change even from a read.
> 
> That's the prototype for the read accessors though -- a bunch of other
> architectures define them that way (including asm-generic), so I wonder what
> the reasoning behind that was?

I'm still asserting that they're wrong.


Nicolas
Will Deacon Aug. 21, 2012, 9:02 a.m. UTC | #6
On Mon, Aug 20, 2012 at 07:45:20PM +0100, Nicolas Pitre wrote:
> On Mon, 20 Aug 2012, Will Deacon wrote:
> > On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote:
> > > Nah... the const is wrong.  The way you wrote it means that addr may 
> > > change but the pointed data is constant.  This is obviously wrong since 
> > > we expect the pointed location to change even from a read.
> > 
> > That's the prototype for the read accessors though -- a bunch of other
> > architectures define them that way (including asm-generic), so I wonder what
> > the reasoning behind that was?
> 
> I'm still asserting that they're wrong.

I did a bit of digging around and `const volatile void *' is apparently used
because a function with such a parameter type can be passed any old pointer
without warnings. Torvalds says something about them here:

  http://readlist.com/lists/vger.kernel.org/linux-kernel/14/72300.html

Personally, I too think that the const is misleading and who on Earth would
be passing in pointers to const for an I/O region? However, it's an argument
I'd rather avoid so, for the sake of consistency, I'll cast away the const in
the asm block.

Cheers,

Will
Arnd Bergmann Aug. 21, 2012, 10:11 a.m. UTC | #7
On Tuesday 21 August 2012, Will Deacon wrote:
> On Mon, Aug 20, 2012 at 07:45:20PM +0100, Nicolas Pitre wrote:
> > On Mon, 20 Aug 2012, Will Deacon wrote:
> > > On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote:
> > > > Nah... the const is wrong.  The way you wrote it means that addr may 
> > > > change but the pointed data is constant.  This is obviously wrong since 
> > > > we expect the pointed location to change even from a read.
> > > 
> > > That's the prototype for the read accessors though -- a bunch of other
> > > architectures define them that way (including asm-generic), so I wonder what
> > > the reasoning behind that was?
> > 
> > I'm still asserting that they're wrong.
> 
> I did a bit of digging around and `const volatile void *' is apparently used
> because a function with such a parameter type can be passed any old pointer
> without warnings. Torvalds says something about them here:
> 
>   http://readlist.com/lists/vger.kernel.org/linux-kernel/14/72300.html
> 
> Personally, I too think that the const is misleading and who on Earth would
> be passing in pointers to const for an I/O region? However, it's an argument
> I'd rather avoid so, for the sake of consistency, I'll cast away the const in
> the asm block.
> 

You could have a read-only register area, and I would regard it as sensible
to mark a pointer to it as "const", although there should not be any
optimizations based on that.

There is no need to cast away the constness of the pointer when passing
it into the inline assembly, as the "asm volatile" already implies that
gcc cannot remove the contents.

On a related topic, thank you very much for introducing the inline
assemblies here, as they finally solve a lingering problem that has
hit us in the past [1] and that could happen again in other drivers.

	Arnd

[1] http://old.nabble.com/ARM-unaligned-MMIO-access-with-attribute%28%28packed%29%29-td30827280.html
Nicolas Pitre Aug. 21, 2012, 12:33 p.m. UTC | #8
On Tue, 21 Aug 2012, Will Deacon wrote:

> On Mon, Aug 20, 2012 at 07:45:20PM +0100, Nicolas Pitre wrote:
> > On Mon, 20 Aug 2012, Will Deacon wrote:
> > > On Mon, Aug 20, 2012 at 07:04:01PM +0100, Nicolas Pitre wrote:
> > > > Nah... the const is wrong.  The way you wrote it means that addr may 
> > > > change but the pointed data is constant.  This is obviously wrong since 
> > > > we expect the pointed location to change even from a read.
> > > 
> > > That's the prototype for the read accessors though -- a bunch of other
> > > architectures define them that way (including asm-generic), so I wonder what
> > > the reasoning behind that was?
> > 
> > I'm still asserting that they're wrong.
> 
> I did a bit of digging around and `const volatile void *' is apparently used
> because a function with such a parameter type can be passed any old pointer
> without warnings. Torvalds says something about them here:
> 
>   http://readlist.com/lists/vger.kernel.org/linux-kernel/14/72300.html
> 
> Personally, I too think that the const is misleading and who on Earth would
> be passing in pointers to const for an I/O region? However, it's an argument
> I'd rather avoid so, for the sake of consistency, I'll cast away the const in
> the asm block.

OK.  Either that or your previous patch should do then.


Nicolas
diff mbox

Patch

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index b54d687..bbc94c2 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -63,38 +63,50 @@  extern void __raw_readsl(const void __iomem *addr, void *data, int longlen);
  */
 static inline void __raw_writew(u16 val, volatile void __iomem *addr)
 {
-       asm volatile("strh %0, [%1]" : : "r" (val), "r" (addr));
+       asm volatile("strh %1, %0"
+                    : "=Qo" (*(volatile u16 __force *)addr)
+                    : "r" (val));
 }
 
 static inline u16 __raw_readw(const volatile void __iomem *addr)
 {
        u16 val;
-       asm volatile("ldrh %0, [%1]" : "=r" (val) : "r" (addr));
+       asm volatile("ldrh %0, %1"
+                    : "=r" (val)
+                    : "Qo" (*(const volatile u16 __force *)addr));
        return val;
 }
 #endif
 
 static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
 {
-       asm volatile("strb %0, [%1]" : : "r" (val), "r" (addr));
+       asm volatile("strb %1, %0"
+                    : "=Qo" (*(volatile u8 __force *)addr)
+                    : "r" (val));
 }
 
 static inline void __raw_writel(u32 val, volatile void __iomem *addr)
 {
-       asm volatile("str %0, [%1]" : : "r" (val), "r" (addr));
+       asm volatile("str %1, %0"
+                    : "=Qo" (*(volatile u32 __force *)addr)
+                    : "r" (val));
 }
 
 static inline u8 __raw_readb(const volatile void __iomem *addr)
 {
        u8 val;
-       asm volatile("ldrb %0, [%1]" : "=r" (val) : "r" (addr));
+       asm volatile("ldrb %0, %1"
+                    : "=r" (val)
+                    : "Qo" (*(const volatile u8 __force *)addr));
        return val;
 }
 
 static inline u32 __raw_readl(const volatile void __iomem *addr)
 {
        u32 val;
-       asm volatile("ldr %0, [%1]" : "=r" (val) : "r" (addr));
+       asm volatile("ldr %0, %1"
+                    : "=r" (val)
+                    : "Qo" (*(const volatile u32 __force *)addr));
        return val;
 }