Message ID | E1ZaL45-0003fs-1k@rmk-PC.arm.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Russell King <rmk+kernel@arm.linux.org.uk> writes:

> We need to have memory dependencies on get_domain/set_domain to avoid
> the compiler over-optimising these inline assembly instructions.
>
> Loads/stores must not be reordered across a set_domain(), so introduce
> a compiler barrier for that assembly.
>
> The value of get_domain() must not be cached across a set_domain(), but
> we still want to allow the compiler to optimise it away. Introduce a
> dependency on current_thread_info()->cpu_domain to avoid this; the new
> memory clobber in set_domain() should therefore cause the compiler to
> re-load this. The other advantage of using this is we should have its
> address in the register set already, or very soon after at most call
> sites.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>

The test fails without these 2 patches; with them, an unaligned access
is fixed and the generated code looks good:

     7ac:	ee13cf10 	mrc	15, 0, ip, cr3, cr0, {0}
     7b0:	e3cc300c 	bic	r3, ip, #12
     7b4:	e58dc014 	str	ip, [sp, #20]
     7b8:	e3833004 	orr	r3, r3, #4
     7bc:	ee033f10 	mcr	15, 0, r3, cr3, cr0, {0}

Cheers.

--
Robert
On Fri, Sep 11, 2015 at 04:56:27PM +0200, Robert Jarzmik wrote:
> Russell King <rmk+kernel@arm.linux.org.uk> writes:
> > We need to have memory dependencies on get_domain/set_domain to avoid
> > the compiler over-optimising these inline assembly instructions.
> >
> > Loads/stores must not be reordered across a set_domain(), so introduce
> > a compiler barrier for that assembly.
> >
> > The value of get_domain() must not be cached across a set_domain(), but
> > we still want to allow the compiler to optimise it away. Introduce a
> > dependency on current_thread_info()->cpu_domain to avoid this; the new
> > memory clobber in set_domain() should therefore cause the compiler to
> > re-load this. The other advantage of using this is we should have its
> > address in the register set already, or very soon after at most call
> > sites.
> >
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>
> Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
>
> The test fails without these 2 patches; with them, an unaligned access
> is fixed and the generated code looks good:
>      7ac:	ee13cf10 	mrc	15, 0, ip, cr3, cr0, {0}
>      7b0:	e3cc300c 	bic	r3, ip, #12
>      7b4:	e58dc014 	str	ip, [sp, #20]
>      7b8:	e3833004 	orr	r3, r3, #4
>      7bc:	ee033f10 	mcr	15, 0, r3, cr3, cr0, {0}

Thanks Robert, I've queued these as fixes now.

If you want to put the bug alignment patch in the patch system, I'll get
that off to Linus this weekend too.
Russell King - ARM Linux <linux@arm.linux.org.uk> writes:

> If you want to put the bug alignment patch in the patch system, I'll get
> that off to Linus this weekend too.

Sure, that will be done before tomorrow evening.

Cheers.
diff --git a/arch/arm/include/asm/domain.h b/arch/arm/include/asm/domain.h
index e878129f2fee..fc8ba1663601 100644
--- a/arch/arm/include/asm/domain.h
+++ b/arch/arm/include/asm/domain.h
@@ -12,6 +12,7 @@
 
 #ifndef __ASSEMBLY__
 #include <asm/barrier.h>
+#include <asm/thread_info.h>
 #endif
 
 /*
@@ -89,7 +90,8 @@ static inline unsigned int get_domain(void)
 
 	asm(
 	"mrc	p15, 0, %0, c3, c0	@ get domain"
-	 : "=r" (domain));
+	 : "=r" (domain)
+	 : "m" (current_thread_info()->cpu_domain));
 
 	return domain;
 }
@@ -98,7 +100,7 @@ static inline void set_domain(unsigned val)
 {
 	asm volatile(
 	"mcr	p15, 0, %0, c3, c0	@ set domain"
-	 : : "r" (val));
+	 : : "r" (val) : "memory");
 	isb();
 }
 
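For readers less familiar with GCC inline assembly, the effect of the new
"memory" clobber can be reproduced outside the kernel. Below is a minimal
user-space sketch, not kernel code: the names barrier_write and flag are
invented for illustration, and the empty asm template stands in for the
mcr instruction. An asm volatile statement with a "memory" clobber acts as
a compiler barrier, so surrounding loads and stores can be neither
reordered across it nor eliminated.

	/* Minimal user-space model of the set_domain() change; not kernel
	 * code.  The empty asm template stands in for the real mcr. */
	#include <stdio.h>

	static int flag;

	static inline void barrier_write(unsigned int val)
	{
		/* The "memory" clobber tells the compiler this asm may read
		 * or write any memory, so stores to 'flag' cannot be moved
		 * across it or merged with each other. */
		asm volatile("" : : "r" (val) : "memory");
	}

	int main(void)
	{
		flag = 1;		/* kept: the asm below may observe it */
		barrier_write(42);
		flag = 2;		/* cannot be hoisted above the barrier */
		printf("flag=%d\n", flag);
		return 0;
	}

Without the clobber, the compiler would be entitled to delete flag = 1 as
a dead store, or to sink flag = 2 above the asm.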
We need to have memory dependencies on get_domain/set_domain to avoid
the compiler over-optimising these inline assembly instructions.

Loads/stores must not be reordered across a set_domain(), so introduce
a compiler barrier for that assembly.

The value of get_domain() must not be cached across a set_domain(), but
we still want to allow the compiler to optimise it away. Introduce a
dependency on current_thread_info()->cpu_domain to avoid this; the new
memory clobber in set_domain() should therefore cause the compiler to
re-load this. The other advantage of using this is we should have its
address in the register set already, or very soon after at most call
sites.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/include/asm/domain.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
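The get_domain() side of the patch can be modelled the same way. In the
sketch below, all names are invented and shadow_domain stands in for
current_thread_info()->cpu_domain. The asm is deliberately not volatile,
so GCC may still CSE back-to-back reads, but the "m" input ties it to a
specific memory location, and any intervening "memory" clobber forces a
re-read — exactly the behaviour the commit message describes.

	/* User-space model of get_domain()/set_domain(); not kernel code. */
	#include <stdio.h>

	static unsigned int shadow_domain;	/* stands in for ->cpu_domain */

	static inline unsigned int model_get_domain(void)
	{
		unsigned int domain;

		/* No 'volatile': the asm is treated as a pure function of
		 * its inputs, so duplicate calls may be optimised away.
		 * The "0" constraint routes the input value into the output
		 * register; "m" adds the memory dependency from the patch. */
		asm("" : "=r" (domain)
		       : "0" (shadow_domain), "m" (shadow_domain));
		return domain;
	}

	static inline void model_set_domain(unsigned int val)
	{
		shadow_domain = val;
		/* Compiler barrier, as in the patched set_domain(). */
		asm volatile("" : : "r" (val) : "memory");
	}

	int main(void)
	{
		model_set_domain(1);
		unsigned int a = model_get_domain();
		unsigned int b = model_get_domain();	/* may be CSE'd with 'a' */
		model_set_domain(3);
		unsigned int c = model_get_domain();	/* must be re-issued */
		printf("a=%u b=%u c=%u\n", a, b, c);
		return 0;
	}

This is the trade-off the commit message describes: unlike a blanket
volatile, the "m" dependency lets the compiler drop or combine redundant
reads, while the "memory" clobber in set_domain() invalidates any cached
value at exactly the points where the hardware register changes.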