Message ID: 1425067776-2794-1-git-send-email-bobby.prani@gmail.com (mailing list archive)
State: New, archived
On Fri, Feb 27, 2015 at 08:09:17PM +0000, Pranith Kumar wrote:
> ARM64 documentation recommends keeping exclusive loads and stores as close as
> possible. Any instructions which do not depend on the value loaded should be
> moved outside.
>
> In the current implementation of cmpxchg(), there is a mov instruction which can
> be pulled before the load exclusive instruction without any change in
> functionality. This patch does that change.
>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  arch/arm64/include/asm/cmpxchg.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)

[...]

> @@ -166,11 +166,11 @@ static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
>  	VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
>  	do {
>  		asm volatile("// __cmpxchg_double8\n"
> +		"	mov	%w0, #0\n"
>  		"	ldxp	%0, %1, %2\n"

Seriously, you might want to test this before you mindlessly make changes to
low-level synchronisation code. Not only is the change completely unnecessary
but it is actively harmful.

Have a good weekend,

Will
On Fri, Feb 27, 2015 at 3:15 PM, Will Deacon <will.deacon@arm.com> wrote:
> On Fri, Feb 27, 2015 at 08:09:17PM +0000, Pranith Kumar wrote:
>> ARM64 documentation recommends keeping exclusive loads and stores as close as
>> possible. Any instructions which do not depend on the value loaded should be
>> moved outside.
>>
>> In the current implementation of cmpxchg(), there is a mov instruction which can
>> be pulled before the load exclusive instruction without any change in
>> functionality. This patch does that change.
>>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>>  arch/arm64/include/asm/cmpxchg.h | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> [...]
>
>> @@ -166,11 +166,11 @@ static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
>>  	VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
>>  	do {
>>  		asm volatile("// __cmpxchg_double8\n"
>> +		"	mov	%w0, #0\n"
>>  		"	ldxp	%0, %1, %2\n"
>
> Seriously, you might want to test this before you mindlessly make changes to
> low-level synchronisation code. Not only is the change completely unnecessary
> but it is actively harmful.

Oops, I apologize for this. I should have looked more closely. It is wrong
to do this in cmpxchg_double(). What about the other cases?
On Fri, Feb 27, 2015 at 3:17 PM, Pranith Kumar <bobby.prani@gmail.com> wrote:
> On Fri, Feb 27, 2015 at 3:15 PM, Will Deacon <will.deacon@arm.com> wrote:
>>> @@ -166,11 +166,11 @@ static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
>>>  	VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
>>>  	do {
>>>  		asm volatile("// __cmpxchg_double8\n"
>>> +		"	mov	%w0, #0\n"
>>>  		"	ldxp	%0, %1, %2\n"
>>
>> Seriously, you might want to test this before you mindlessly make changes to
>> low-level synchronisation code. Not only is the change completely unnecessary
>> but it is actively harmful.
>
> Oops, I apologize for this. I should have looked more closely. It is
> wrong to do this in cmpxchg_double(). What about the other cases?

Hi Will,

I tried looking closely at what might be the problem here. I am waiting on
a HiKey arm64 board, and I agree I should not send in changes without
running/testing them first.

Could you please explain (for educational purposes) why you think this
change is harmful?

Thanks,
On Fri, Feb 27, 2015 at 03:29:38PM -0500, Pranith Kumar wrote:
> On Fri, Feb 27, 2015 at 3:17 PM, Pranith Kumar <bobby.prani@gmail.com> wrote:
> > On Fri, Feb 27, 2015 at 3:15 PM, Will Deacon <will.deacon@arm.com> wrote:
> >>> @@ -166,11 +166,11 @@ static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
> >>>  	VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
> >>>  	do {
> >>>  		asm volatile("// __cmpxchg_double8\n"
> >>> +		"	mov	%w0, #0\n"
> >>>  		"	ldxp	%0, %1, %2\n"
> >>
> >> Seriously, you might want to test this before you mindlessly make changes to
> >> low-level synchronisation code. Not only is the change completely unnecessary
> >> but it is actively harmful.
> >
> > Oops, I apologize for this. I should have looked more closely. It is
> > wrong to do this in cmpxchg_double(). What about the other cases?
>
> I tried looking closely at what might be the problem here. I am
> waiting on a HiKey arm64 board and I agree I should not send in
> changes without running/testing them first.
>
> Could you please explain (for educational purposes) why you think this
> change is harmful?

Do you mean the cmpxchg_double() change? Because %w0 and %0 are the same
physical register. You set it to 0 and immediately override it with
ldxp.
On Tue, Mar 3, 2015 at 9:34 AM, Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> Do you mean the cmpxchg_double() change? Because %w0 and %0 are the same
> physical register. You set it to 0 and immediately override it with
> ldxp.

Thanks Catalin. I realized the blunder a while after Will pointed it out.
The asm here was a bit confusing. %0 usually refers to the first
input/output variable. But for the ldxp instruction (which is just a
double-word load, not exclusive), it refers to the physical registers.

What about the changes in cmpxchg()? Why do we need to set %w0 to 0 after
the ldxrh instruction? Also, could you please point me to any arm64 asm
reference?

Thanks!
On Tue, Mar 03, 2015 at 11:58:26AM -0500, Pranith Kumar wrote:
> On Tue, Mar 3, 2015 at 9:34 AM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Do you mean the cmpxchg_double() change? Because %w0 and %0 are the same
> > physical register. You set it to 0 and immediately override it with
> > ldxp.
>
> Thanks Catalin. I realized the blunder a while after Will pointed it
> out. The asm here was a bit confusing. %0 usually refers to the first
> input/output variable. But for the ldxp instruction (which is just a
> double-word load, not exclusive), it refers to the physical registers.

OK, so please try not to touch any asm code until you understand (a) the
AArch64 instruction set and (b) the gcc inline assembly syntax ;).

> What about the changes in cmpxchg()? Why do we need to set %w0 to 0
> after the ldxrh instruction? Also, could you please point me to any
> arm64 asm reference?

"info gcc" for the inline assembly syntax and the ARMv8 ARM for the
description of the AArch64 instruction set. It also helps if you look at
the code generated by the compiler (e.g. objdump -d).
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index cb95930..8057735 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -89,8 +89,8 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	case 1:
 		do {
 			asm volatile("// __cmpxchg1\n"
-			"	ldxrb	%w1, %2\n"
 			"	mov	%w0, #0\n"
+			"	ldxrb	%w1, %2\n"
 			"	cmp	%w1, %w3\n"
 			"	b.ne	1f\n"
 			"	stxrb	%w0, %w4, %2\n"
@@ -104,8 +104,8 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	case 2:
 		do {
 			asm volatile("// __cmpxchg2\n"
-			"	ldxrh	%w1, %2\n"
 			"	mov	%w0, #0\n"
+			"	ldxrh	%w1, %2\n"
 			"	cmp	%w1, %w3\n"
 			"	b.ne	1f\n"
 			"	stxrh	%w0, %w4, %2\n"
@@ -119,8 +119,8 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	case 4:
 		do {
 			asm volatile("// __cmpxchg4\n"
-			"	ldxr	%w1, %2\n"
 			"	mov	%w0, #0\n"
+			"	ldxr	%w1, %2\n"
 			"	cmp	%w1, %w3\n"
 			"	b.ne	1f\n"
 			"	stxr	%w0, %w4, %2\n"
@@ -134,8 +134,8 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	case 8:
 		do {
 			asm volatile("// __cmpxchg8\n"
-			"	ldxr	%1, %2\n"
 			"	mov	%w0, #0\n"
+			"	ldxr	%1, %2\n"
 			"	cmp	%1, %3\n"
 			"	b.ne	1f\n"
 			"	stxr	%w0, %4, %2\n"
@@ -166,11 +166,11 @@ static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
 	VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
 	do {
 		asm volatile("// __cmpxchg_double8\n"
+		"	mov	%w0, #0\n"
 		"	ldxp	%0, %1, %2\n"
 		"	eor	%0, %0, %3\n"
 		"	eor	%1, %1, %4\n"
 		"	orr	%1, %0, %1\n"
-		"	mov	%w0, #0\n"
 		"	cbnz	%1, 1f\n"
 		"	stxp	%w0, %5, %6, %2\n"
 		"1:\n"
ARM64 documentation recommends keeping exclusive loads and stores as close as
possible. Any instructions which do not depend on the value loaded should be
moved outside.

In the current implementation of cmpxchg(), there is a mov instruction which
can be pulled before the load exclusive instruction without any change in
functionality. This patch does that change.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 arch/arm64/include/asm/cmpxchg.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)