
[RFC] arm64: cmpxchg.h: Bring ldxr and stxr closer

Message ID 1425067776-2794-1-git-send-email-bobby.prani@gmail.com (mailing list archive)
State New, archived

Commit Message

Pranith Kumar Feb. 27, 2015, 8:09 p.m. UTC
The ARM64 documentation recommends keeping exclusive loads and stores as close
together as possible. Any instruction which does not depend on the loaded value
should be moved outside the exclusive sequence.

In the current implementation of cmpxchg(), there is a mov instruction which can
be pulled before the load-exclusive instruction without any change in
functionality. This patch makes that change.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 arch/arm64/include/asm/cmpxchg.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
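
For reference, here is a sketch (not compiler output) of the reordered
byte-sized sequence, with hypothetical register assignments: %0 -> w0
(store-exclusive status), %1 -> w1 (loaded value), the memory operand at
[x2], %w3 = old, %w4 = new. Unlike the double-word case discussed in the
comments below, the status register and the loaded value are distinct
registers here:

	mov	w0, #0		// preset status: no store attempted yet
	ldxrb	w1, [x2]	// exclusive load of the current byte
	cmp	w1, w3		// compare against the expected old value
	b.ne	1f		// mismatch: skip the store, status stays 0
	stxrb	w0, w4, [x2]	// match: attempt the store, w0 = 0 on success
1: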

Comments

Will Deacon Feb. 27, 2015, 8:15 p.m. UTC | #1
On Fri, Feb 27, 2015 at 08:09:17PM +0000, Pranith Kumar wrote:
> The ARM64 documentation recommends keeping exclusive loads and stores as close
> together as possible. Any instruction which does not depend on the loaded value
> should be moved outside the exclusive sequence.
> 
> In the current implementation of cmpxchg(), there is a mov instruction which can
> be pulled before the load-exclusive instruction without any change in
> functionality. This patch makes that change.
> 
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  arch/arm64/include/asm/cmpxchg.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)

[...]

> @@ -166,11 +166,11 @@ static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
>  		VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
>  		do {
>  			asm volatile("// __cmpxchg_double8\n"
> +			"	mov	%w0, #0\n"
>  			"	ldxp	%0, %1, %2\n"

Seriously, you might want to test this before you mindlessly make changes to
low-level synchronisation code. Not only is the change completely unnecessary
but it is actively harmful.

Have a good weekend,

Will
Pranith Kumar Feb. 27, 2015, 8:17 p.m. UTC | #2
On Fri, Feb 27, 2015 at 3:15 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Feb 27, 2015 at 08:09:17PM +0000, Pranith Kumar wrote:
>> The ARM64 documentation recommends keeping exclusive loads and stores as close
>> together as possible. Any instruction which does not depend on the loaded value
>> should be moved outside the exclusive sequence.
>>
>> In the current implementation of cmpxchg(), there is a mov instruction which can
>> be pulled before the load-exclusive instruction without any change in
>> functionality. This patch makes that change.
>>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>>  arch/arm64/include/asm/cmpxchg.h | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> [...]
>
>> @@ -166,11 +166,11 @@ static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
>>               VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
>>               do {
>>                       asm volatile("// __cmpxchg_double8\n"
>> +                     "       mov     %w0, #0\n"
>>                       "       ldxp    %0, %1, %2\n"
>
> Seriously, you might want to test this before you mindlessly make changes to
> low-level synchronisation code. Not only is the change completely unnecessary
> but it is actively harmful.
>

Oops, I apologize for this. I should have looked more closely. It is
wrong to do this in cmpxchg_double(). What about the other cases?
Pranith Kumar Feb. 27, 2015, 8:29 p.m. UTC | #3
On Fri, Feb 27, 2015 at 3:17 PM, Pranith Kumar <bobby.prani@gmail.com> wrote:
> On Fri, Feb 27, 2015 at 3:15 PM, Will Deacon <will.deacon@arm.com> wrote:
>>> @@ -166,11 +166,11 @@ static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
>>>               VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
>>>               do {
>>>                       asm volatile("// __cmpxchg_double8\n"
>>> +                     "       mov     %w0, #0\n"
>>>                       "       ldxp    %0, %1, %2\n"
>>
>> Seriously, you might want to test this before you mindlessly make changes to
>> low-level synchronisation code. Not only is the change completely unnecessary
>> but it is actively harmful.
>>
>
> Oops, I apologize for this. I should have looked more closely. It is
> wrong to do this in cmpxchg_double(). What about the other cases?
>

Hi Will,

I tried looking closely at what might be the problem here. I am
waiting on a HiKey arm64 board, and I agree I should not send in
changes without running and testing them first.

Could you please explain (for educational purposes) why you think this
change is harmful?


Thanks,
Catalin Marinas March 3, 2015, 2:34 p.m. UTC | #4
On Fri, Feb 27, 2015 at 03:29:38PM -0500, Pranith Kumar wrote:
> On Fri, Feb 27, 2015 at 3:17 PM, Pranith Kumar <bobby.prani@gmail.com> wrote:
> > On Fri, Feb 27, 2015 at 3:15 PM, Will Deacon <will.deacon@arm.com> wrote:
> >>> @@ -166,11 +166,11 @@ static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
> >>>               VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
> >>>               do {
> >>>                       asm volatile("// __cmpxchg_double8\n"
> >>> +                     "       mov     %w0, #0\n"
> >>>                       "       ldxp    %0, %1, %2\n"
> >>
> >> Seriously, you might want to test this before you mindlessly make changes to
> >> low-level synchronisation code. Not only is the change completely unnecessary
> >> but it is actively harmful.
> >>
> >
> > Oops, I apologize for this. I should have looked more closely. It is
> > wrong to do this in cmpxchg_double(). What about the other cases?
> 
> I tried looking closely at what might be the problem here. I am
> waiting on a HiKey arm64 board, and I agree I should not send in
> changes without running and testing them first.
> 
> Could you please explain (for educational purposes) why you think this
> change is harmful?

Do you mean the cmpxchg_double() change? Because %w0 and %0 are the same
physical register. You set it to 0 and immediately overwrite it with
ldxp.
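
Spelling that out with a hypothetical register allocation (%0 -> x0,
%1 -> x1, memory at [x2], old1/old2 in x3/x4, new1/new2 in x5/x6), the
patched __cmpxchg_double8 sequence would look roughly like this (a sketch,
not the compiler's actual output):

	mov	w0, #0		// w0 is the 32-bit view of x0 ...
	ldxp	x0, x1, [x2]	// ... so the zero is clobbered right here
	eor	x0, x0, x3	// x0 = loaded1 ^ old1
	eor	x1, x1, x4	// x1 = loaded2 ^ old2
	orr	x1, x0, x1	// x1 != 0  =>  the values did not match
	cbnz	x1, 1f		// mismatch: branch out with x0 still holding
				// loaded1 ^ old1 instead of 0
	stxp	w0, x5, x6, [x2]
1:

In the original ordering the mov runs after the eor/orr, so %0 is
guaranteed to be zero whenever the store is skipped and the enclosing
retry loop can exit correctly.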
Pranith Kumar March 3, 2015, 4:58 p.m. UTC | #5
On Tue, Mar 3, 2015 at 9:34 AM, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> Do you mean the cmpxchg_double() change? Because %w0 and %0 are the same
> physical register. You set it to 0 and immediately overwrite it with
> ldxp.
>

Thanks, Catalin. I realized the blunder a while after Will pointed it
out. The asm here was a bit confusing: %0 usually refers to the first
input/output operand, but for the ldxp instruction (which is just a
double-word load, not exclusive) it refers to the physical registers.

What about the changes in cmpxchg()? Why do we need to set %w0 to 0
after the ldxrh instruction? Also, could you please point me to any
arm64 asm reference?

Thanks!
Catalin Marinas March 3, 2015, 5:22 p.m. UTC | #6
On Tue, Mar 03, 2015 at 11:58:26AM -0500, Pranith Kumar wrote:
> On Tue, Mar 3, 2015 at 9:34 AM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Do you mean the cmpxchg_double() change? Because %w0 and %0 are the same
> > physical register. You set it to 0 and immediately overwrite it with
> > ldxp.
> 
> Thanks, Catalin. I realized the blunder a while after Will pointed it
> out. The asm here was a bit confusing: %0 usually refers to the first
> input/output operand, but for the ldxp instruction (which is just a
> double-word load, not exclusive) it refers to the physical registers.

OK, so please try not to touch any asm code until you understand (a) the
AArch64 instruction set and (b) the gcc inline assembly syntax ;).

> What about the changes in cmpxchg()? Why do we need to set %w0 to 0
> after the ldxrh instruction? Also, could you please point me to any
> arm64 asm reference?

"info gcc" for the inline assembly syntax and the ARMv8 ARM for the
description of the AArch64 instruction set. It also helps if you look at
the code generated by the compiler (e.g. objdump -d).
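
As a minimal, purely illustrative example of the operand numbering: %0 and
%w0 name the same output operand, selecting the 64-bit x-register view or
the 32-bit w-register view of whatever register the compiler allocates
(file and function names below are made up for the example):

	/* operand_demo.c - illustrative only, not kernel code */
	#include <stdio.h>

	static unsigned long demo(unsigned long a)
	{
		unsigned long out;

		asm("mov	%w0, #0\n\t"	/* writing the w view zeroes the
						   whole 64-bit register */
		    "add	%0, %0, %1"	/* same operand, as an x register */
		    : "=&r" (out)		/* %0: early-clobber output */
		    : "r" (a));			/* %1: input */
		return out;
	}

	int main(void)
	{
		printf("%lu\n", demo(42));	/* prints 42 */
		return 0;
	}

Building with an AArch64 toolchain and disassembling shows which physical
register was chosen, e.g.:

	aarch64-linux-gnu-gcc -O2 -c operand_demo.c
	aarch64-linux-gnu-objdump -d operand_demo.o

(the exact toolchain prefix depends on your setup).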

Patch

diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index cb95930..8057735 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -89,8 +89,8 @@  static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	case 1:
 		do {
 			asm volatile("// __cmpxchg1\n"
-			"	ldxrb	%w1, %2\n"
 			"	mov	%w0, #0\n"
+			"	ldxrb	%w1, %2\n"
 			"	cmp	%w1, %w3\n"
 			"	b.ne	1f\n"
 			"	stxrb	%w0, %w4, %2\n"
@@ -104,8 +104,8 @@  static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	case 2:
 		do {
 			asm volatile("// __cmpxchg2\n"
-			"	ldxrh	%w1, %2\n"
 			"	mov	%w0, #0\n"
+			"	ldxrh	%w1, %2\n"
 			"	cmp	%w1, %w3\n"
 			"	b.ne	1f\n"
 			"	stxrh	%w0, %w4, %2\n"
@@ -119,8 +119,8 @@  static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	case 4:
 		do {
 			asm volatile("// __cmpxchg4\n"
-			"	ldxr	%w1, %2\n"
 			"	mov	%w0, #0\n"
+			"	ldxr	%w1, %2\n"
 			"	cmp	%w1, %w3\n"
 			"	b.ne	1f\n"
 			"	stxr	%w0, %w4, %2\n"
@@ -134,8 +134,8 @@  static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	case 8:
 		do {
 			asm volatile("// __cmpxchg8\n"
-			"	ldxr	%1, %2\n"
 			"	mov	%w0, #0\n"
+			"	ldxr	%1, %2\n"
 			"	cmp	%1, %3\n"
 			"	b.ne	1f\n"
 			"	stxr	%w0, %4, %2\n"
@@ -166,11 +166,11 @@  static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
 		VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
 		do {
 			asm volatile("// __cmpxchg_double8\n"
+			"	mov	%w0, #0\n"
 			"	ldxp	%0, %1, %2\n"
 			"	eor	%0, %0, %3\n"
 			"	eor	%1, %1, %4\n"
 			"	orr	%1, %0, %1\n"
-			"	mov	%w0, #0\n"
 			"	cbnz	%1, 1f\n"
 			"	stxp	%w0, %5, %6, %2\n"
 			"1:\n"