Message ID | 20150902193840.GA4499@ls3530.box (mailing list archive) |
---|---|
State | Superseded |
The LWS locks are also used for futex operations. The shifts in
arch/parisc/include/asm/futex.h need a corresponding update.

Dave

On 2015-09-02 3:38 PM, Helge Deller wrote:
> Align the locks for the Light weight syscall (LWS) which is used for
> atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
> boundaries. This should speed up LWS calls on PA20 systems.
>
> Reported-by: John David Anglin <dave.anglin@bell.net>
> Signed-off-by: Helge Deller <deller@gmx.de>
>
> diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
> index 7ef22e3..80c2306 100644
> --- a/arch/parisc/kernel/syscall.S
> +++ b/arch/parisc/kernel/syscall.S
> @@ -561,9 +561,9 @@ lws_compare_and_swap:
> 	extru  %r26, 27, 4, %r20
>
> 	/* Find lock to use, the hash is either one of 0 to
> -	   15, multiplied by 16 (keep it 16-byte aligned)
> +	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
> 	   and add to the lock table offset. */
> -	shlw	%r20, 4, %r20
> +	shlw	%r20, L1_CACHE_SHIFT, %r20
> 	add	%r20, %r28, %r20
>
> # if ENABLE_LWS_DEBUG
> @@ -751,9 +751,9 @@ cas2_lock_start:
> 	extru  %r26, 27, 4, %r20
>
> 	/* Find lock to use, the hash is either one of 0 to
> -	   15, multiplied by 16 (keep it 16-byte aligned)
> +	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
> 	   and add to the lock table offset. */
> -	shlw	%r20, 4, %r20
> +	shlw	%r20, L1_CACHE_SHIFT, %r20
> 	add	%r20, %r28, %r20
>
> 	rsm	PSW_SM_I, %r0	/* Disable interrupts */
> @@ -931,11 +931,9 @@ END(sys_call_table64)
> ENTRY(lws_lock_start)
> 	/* lws locks */
> 	.rept 16
> -	/* Keep locks aligned at 16-bytes */
> +	/* Keep locks aligned to L1_CACHE_BYTES */
> 	.word 1
> -	.word 0
> -	.word 0
> -	.word 0
> +	.align	L1_CACHE_BYTES
> 	.endr
> END(lws_lock_start)
> 	.previous
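[Editorial aside: the coupling Dave points out is easy to miss, because the futex code indexes the very same lws_lock_start table from C, so the stride it assumes between lock slots has to change in step with the shlw above. The helper below is an illustrative sketch of that coupling, not the literal contents of arch/parisc/include/asm/futex.h; the helper name and macro are made up for the example.]

/*
 * Illustrative sketch only -- not the kernel's futex.h verbatim.
 *
 * syscall.S picks a lock with:
 *   extru %r26, 27, 4, %r20   ->  hash = (uaddr >> 4) & 0xf
 *   shlw  %r20, N, %r20       ->  byte offset = hash << N
 * where N is 4 today and L1_CACHE_SHIFT with this patch.  Any C code
 * (such as the futex helpers) that indexes lws_lock_start[] must use
 * the same N, otherwise the two paths can take different locks for
 * the same user address.
 */
#include <stdint.h>

#define LWS_LOCK_STRIDE_SHIFT	4	/* must track the shlw count in syscall.S */

extern uint32_t lws_lock_start[];	/* lock table defined in syscall.S */

static inline uint32_t *lws_lock_for(unsigned long uaddr)
{
	unsigned long hash = (uaddr >> 4) & 0xf;		/* 16 locks */
	unsigned long off  = hash << LWS_LOCK_STRIDE_SHIFT;	/* byte offset into the table */

	return (uint32_t *)((char *)lws_lock_start + off);
}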
On Wed, 2015-09-02 at 21:38 +0200, Helge Deller wrote:
> Align the locks for the Light weight syscall (LWS) which is used for
> atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
> boundaries. This should speed up LWS calls on PA20 systems.

Is there any evidence for this? The architectural requirement for ldcw
on which all this is based is pegged at 16 bytes. This implies that
the burst width on PA88/89 may indeed be 128 bytes, but the coherence
width for operations may still be 16 bytes. If that speculation is
true, there's no speed at all gained by aligning ldcw to 128 bytes and
all you do is waste space.

James
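[Editorial aside: to put rough numbers on the trade-off being debated here, using the 128-byte line size James cites for PA88/89 (the kernel's actual L1_CACHE_BYTES for a given config may differ), a back-of-the-envelope sketch:]

/* Footprint / sharing numbers for the 16-entry LWS lock table,
 * assuming a 128-byte L1 line as cited in the thread. */
#include <stdio.h>

int main(void)
{
	const int nlocks = 16, line = 128;

	/* Current layout: 16-byte stride -> several locks share a line. */
	printf("16-byte stride:   %4d bytes total, %d locks per %d-byte line\n",
	       nlocks * 16, line / 16, line);
	/* Patched layout: one lock per L1 line -> no sharing, more space. */
	printf("line-size stride: %4d bytes total, 1 lock per line\n",
	       nlocks * line);
	return 0;
}

[This prints 256 bytes with 8 locks per line for the current layout versus 2048 bytes with one lock per line for the patched one, which is both the false-sharing motivation for the patch and the space cost James objects to.]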
On 02.09.2015 23:32, James Bottomley wrote:
> On Wed, 2015-09-02 at 21:38 +0200, Helge Deller wrote:
>> Align the locks for the Light weight syscall (LWS) which is used for
>> atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
>> boundaries. This should speed up LWS calls on PA20 systems.
>
> Is there any evidence for this? The architectural requirement for ldcw
> on which all this is based is pegged at 16 bytes. This implies that the
> burst width on PA88/89 may indeed be 128 bytes, but the coherence width
> for operations may still be 16 bytes. If that speculation is true,
> there's no speed at all gained by aligning ldcw to 128 bytes and all
> you do is waste space.

Sure, we'll have to measure timings here...

Helge
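[Editorial aside: one way such timings could be taken from userspace is a rough sketch like the one below; thread and iteration counts are arbitrary, and it relies on the commit message's statement that gcc's atomic builtins on parisc go through the LWS compare-and-swap path.]

/*
 * Userspace micro-benchmark sketch: hammer a CAS loop from several
 * threads.  On parisc, gcc's __sync builtins are serviced by the LWS
 * path this patch touches, so per-iteration cost should reflect any
 * change in LWS lock behaviour.  Build with: gcc -O2 -pthread bench.c
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 4
#define ITERS    1000000

static volatile unsigned int counter;

static void *worker(void *arg)
{
	(void)arg;
	for (long i = 0; i < ITERS; i++) {
		unsigned int old, new;
		do {
			old = counter;
			new = old + 1;
		} while (__sync_val_compare_and_swap(&counter, old, new) != old);
	}
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];
	struct timespec a, b;

	clock_gettime(CLOCK_MONOTONIC, &a);
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &b);

	double secs = (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
	printf("%u increments in %.3f s\n", counter, secs);
	return 0;
}

[Run the same binary before and after the patch on the same machine to compare.]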
diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index 7ef22e3..80c2306 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -561,9 +561,9 @@ lws_compare_and_swap:
 	extru  %r26, 27, 4, %r20
 
 	/* Find lock to use, the hash is either one of 0 to
-	   15, multiplied by 16 (keep it 16-byte aligned)
+	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
 	   and add to the lock table offset. */
-	shlw	%r20, 4, %r20
+	shlw	%r20, L1_CACHE_SHIFT, %r20
 	add	%r20, %r28, %r20
 
 # if ENABLE_LWS_DEBUG
@@ -751,9 +751,9 @@ cas2_lock_start:
 	extru  %r26, 27, 4, %r20
 
 	/* Find lock to use, the hash is either one of 0 to
-	   15, multiplied by 16 (keep it 16-byte aligned)
+	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
 	   and add to the lock table offset. */
-	shlw	%r20, 4, %r20
+	shlw	%r20, L1_CACHE_SHIFT, %r20
 	add	%r20, %r28, %r20
 
 	rsm	PSW_SM_I, %r0	/* Disable interrupts */
@@ -931,11 +931,9 @@ END(sys_call_table64)
 ENTRY(lws_lock_start)
 	/* lws locks */
 	.rept 16
-	/* Keep locks aligned at 16-bytes */
+	/* Keep locks aligned to L1_CACHE_BYTES */
 	.word 1
-	.word 0
-	.word 0
-	.word 0
+	.align	L1_CACHE_BYTES
 	.endr
 END(lws_lock_start)
 	.previous
Align the locks for the Light weight syscall (LWS) which is used for
atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
boundaries. This should speed up LWS calls on PA20 systems.

Reported-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>