
parisc: Align locks for LWS syscalls to L1 cache size

Message ID 20150902193840.GA4499@ls3530.box (mailing list archive)
State Superseded

Commit Message

Helge Deller Sept. 2, 2015, 7:38 p.m. UTC
Align the locks used by the light-weight syscall (LWS) interface, which
implements atomic userspace operations (e.g. the gcc atomic builtins), on
L1 cache boundaries. This should speed up LWS calls on PA20 systems.

Reported-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
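
For readers who don't speak PA-RISC assembly, here is a minimal C model of
the lock selection the patch changes; the helper name lws_lock_for and the
hard-coded PA20 cache constants are illustrative only, not kernel code:

/* Minimal C model of the lock selection in lws_compare_and_swap
 * (arch/parisc/kernel/syscall.S).  Not kernel code: the helper name and
 * the hard-coded PA20 cache constants are illustrative only.
 */
#define L1_CACHE_SHIFT  7                       /* PA20: 128-byte L1 lines */
#define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)

extern char lws_lock_start[];                   /* table of 16 lock slots */

static void *lws_lock_for(const void *uaddr)
{
        /* extru %r26,27,4,%r20: hash = address bits 4..7, i.e. 0..15 */
        unsigned long hash = ((unsigned long)uaddr >> 4) & 0xf;

        /* Old stride: 16 bytes       (shlw %r20,4,%r20)
         * New stride: L1_CACHE_BYTES (shlw %r20,L1_CACHE_SHIFT,%r20),
         * so on PA20 each lock slot gets its own cache line. */
        return lws_lock_start + (hash << L1_CACHE_SHIFT);
}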


Comments

John David Anglin Sept. 2, 2015, 7:46 p.m. UTC | #1
The LWS locks are also used for futex operations.  The shifts in
arch/parisc/include/asm/futex.h need a corresponding update.

Dave
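
For context, the futex path selects a lock from the same lws_lock_start
table in C, so its index computation has to track the new stride. A rough
sketch of what the corresponding adjustment might look like; the helper
name and the exact expressions are illustrative, not the actual contents
of futex.h:

/* Illustrative sketch only: the real code lives in the
 * _futex_spin_lock_irqsave()/_futex_spin_unlock_irqrestore() helpers in
 * arch/parisc/include/asm/futex.h and may be written differently.  The
 * old 16-byte layout corresponds to something like
 *     index = ((unsigned long)uaddr & 0xf0) >> 2;
 * which no longer matches once the slots are L1_CACHE_BYTES apart.
 */
#include <asm/cache.h>          /* L1_CACHE_BYTES */

extern u32 lws_lock_start[];

static inline arch_spinlock_t *futex_lock_for(u32 __user *uaddr)
{
        unsigned long hash  = ((unsigned long)uaddr >> 4) & 0xf;
        unsigned long index = hash * (L1_CACHE_BYTES / sizeof(u32));

        return (arch_spinlock_t *)&lws_lock_start[index];
}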

On 2015-09-02 3:38 PM, Helge Deller wrote:
> Align the locks for the Light weight syscall (LWS) which is used for
> atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
> boundaries. This should speed up LWS calls on PA20 systems.
>
> Reported-by: John David Anglin <dave.anglin@bell.net>
> Signed-off-by: Helge Deller <deller@gmx.de>
>
> diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
> index 7ef22e3..80c2306 100644
> --- a/arch/parisc/kernel/syscall.S
> +++ b/arch/parisc/kernel/syscall.S
> @@ -561,9 +561,9 @@ lws_compare_and_swap:
>   	extru  %r26, 27, 4, %r20
>   
>   	/* Find lock to use, the hash is either one of 0 to
> -	   15, multiplied by 16 (keep it 16-byte aligned)
> +	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
>   	   and add to the lock table offset. */
> -	shlw	%r20, 4, %r20
> +	shlw	%r20, L1_CACHE_SHIFT, %r20
>   	add	%r20, %r28, %r20
>   
>   # if ENABLE_LWS_DEBUG
> @@ -751,9 +751,9 @@ cas2_lock_start:
>   	extru  %r26, 27, 4, %r20
>   
>   	/* Find lock to use, the hash is either one of 0 to
> -	   15, multiplied by 16 (keep it 16-byte aligned)
> +	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
>   	   and add to the lock table offset. */
> -	shlw	%r20, 4, %r20
> +	shlw	%r20, L1_CACHE_SHIFT, %r20
>   	add	%r20, %r28, %r20
>   
>   	rsm	PSW_SM_I, %r0			/* Disable interrupts */
> @@ -931,11 +931,9 @@ END(sys_call_table64)
>   ENTRY(lws_lock_start)
>   	/* lws locks */
>   	.rept 16
> -	/* Keep locks aligned at 16-bytes */
> +	/* Keep locks aligned to L1_CACHE_BYTES */
>   	.word 1
> -	.word 0
> -	.word 0
> -	.word 0
> +	.align	L1_CACHE_BYTES
>   	.endr
>   END(lws_lock_start)
>   	.previous
>
>
James Bottomley Sept. 2, 2015, 9:32 p.m. UTC | #2
On Wed, 2015-09-02 at 21:38 +0200, Helge Deller wrote:
> Align the locks for the Light weight syscall (LWS) which is used for
> atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
> boundaries. This should speed up LWS calls on PA20 systems.

Is there any evidence for this?  The architectural requirement for ldcw
on which all this is based is pegged at 16 bytes.  This implies that the
burst width on PA88/89 may indeed be 128 bytes, but the coherence width
for operations may still be 16 bytes.  If that speculation is true,
there's no speed at all gained by aligning ldcw to 128 bytes and all you
do is waste space.

James


Helge Deller Sept. 2, 2015, 10:18 p.m. UTC | #3
On 02.09.2015 23:32, James Bottomley wrote:
> On Wed, 2015-09-02 at 21:38 +0200, Helge Deller wrote:
>> Align the locks for the Light weight syscall (LWS) which is used for
>> atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
>> boundaries. This should speed up LWS calls on PA20 systems.
> 
> Is there any evidence for this?  The architectural requirement for ldcw
> on which all this is based is pegged at 16 bytes.  This implies that the
> burst width on PA88/89 may indeed be 128 bytes, but the coherence width
> for operations may still be 16 bytes.  If that speculation is true,
> there's no speed at all gained by aligning ldcw to 128 bytes and all you
> do is waste space.

Sure, we'll have to measure timings here...

Helge
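
A userspace micro-benchmark would give such numbers: on parisc the gcc
__sync builtins are routed through the LWS compare-and-swap path, so two
threads doing CAS on words whose lock buckets sit 16 bytes apart (i.e. in
the same 128-byte line on PA8800/8900 with the current layout) should
expose any false sharing, while a run with widely separated buckets serves
as the baseline. The space cost of the change is modest either way: the
table grows from 16 x 16 = 256 bytes to 16 x 128 = 2 KB on PA20. A rough
sketch, not from the thread (build with gcc -O2 -pthread):

/* Two threads hammer compare-and-swap on words that hash to adjacent LWS
 * lock buckets.  With 16-byte lock spacing both buckets share one 128-byte
 * cache line on PA8800/8900; re-run with word_b at offset 0x80 (bucket 8)
 * to compare against well-separated buckets. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define ITERS 10000000UL

/* LWS hashes address bits 4..7, so 16-byte spacing selects neighbouring
 * buckets (see the extru/shlw sequence in syscall.S). */
static char area[4096] __attribute__((aligned(4096)));
static uint32_t *word_a = (uint32_t *)&area[0x00];   /* bucket 0 */
static uint32_t *word_b = (uint32_t *)&area[0x10];   /* bucket 1 */

static void *hammer(void *p)
{
        uint32_t *w = p;

        for (unsigned long i = 0; i < ITERS; i++)
                __sync_val_compare_and_swap(w, i, i + 1);  /* LWS CAS on parisc */
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, hammer, word_a);
        pthread_create(&b, NULL, hammer, word_b);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("done - time this run for each bucket spacing");
        return 0;
}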


Patch

diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index 7ef22e3..80c2306 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -561,9 +561,9 @@  lws_compare_and_swap:
 	extru  %r26, 27, 4, %r20
 
 	/* Find lock to use, the hash is either one of 0 to
-	   15, multiplied by 16 (keep it 16-byte aligned)
+	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
 	   and add to the lock table offset. */
-	shlw	%r20, 4, %r20
+	shlw	%r20, L1_CACHE_SHIFT, %r20
 	add	%r20, %r28, %r20
 
 # if ENABLE_LWS_DEBUG
@@ -751,9 +751,9 @@  cas2_lock_start:
 	extru  %r26, 27, 4, %r20
 
 	/* Find lock to use, the hash is either one of 0 to
-	   15, multiplied by 16 (keep it 16-byte aligned)
+	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
 	   and add to the lock table offset. */
-	shlw	%r20, 4, %r20
+	shlw	%r20, L1_CACHE_SHIFT, %r20
 	add	%r20, %r28, %r20
 
 	rsm	PSW_SM_I, %r0			/* Disable interrupts */
@@ -931,11 +931,9 @@  END(sys_call_table64)
 ENTRY(lws_lock_start)
 	/* lws locks */
 	.rept 16
-	/* Keep locks aligned at 16-bytes */
+	/* Keep locks aligned to L1_CACHE_BYTES */
 	.word 1
-	.word 0 
-	.word 0
-	.word 0
+	.align	L1_CACHE_BYTES
 	.endr
 END(lws_lock_start)
 	.previous