
parisc: Align locks for LWS syscalls to L1 cache size

Message ID 20150902193840.GA4499@ls3530.box (mailing list archive)
State Superseded

Commit Message

Helge Deller Sept. 2, 2015, 7:38 p.m. UTC
Align the locks used by the light-weight syscall (LWS) interface, which
implements atomic userspace operations (e.g. the gcc atomic builtins), on
L1 cache boundaries. This should speed up LWS calls on PA20 systems.

Reported-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
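
For readers who don't speak PA-RISC assembly, here is a minimal C model of
the lock selection the patch changes; the helper name lws_lock_for and the
hard-coded PA20 cache constants are illustrative only, not kernel code:

/* Minimal C model of the lock selection in lws_compare_and_swap
 * (arch/parisc/kernel/syscall.S).  Not kernel code: the helper name and
 * the hard-coded PA20 cache constants are illustrative only.
 */
#define L1_CACHE_SHIFT  7                       /* PA20: 128-byte L1 lines */
#define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)

extern char lws_lock_start[];                   /* table of 16 lock slots */

static void *lws_lock_for(const void *uaddr)
{
        /* extru %r26,27,4,%r20: hash = address bits 4..7, i.e. 0..15 */
        unsigned long hash = ((unsigned long)uaddr >> 4) & 0xf;

        /* Old stride: 16 bytes       (shlw %r20,4,%r20)
         * New stride: L1_CACHE_BYTES (shlw %r20,L1_CACHE_SHIFT,%r20),
         * so on PA20 each lock slot gets its own cache line. */
        return lws_lock_start + (hash << L1_CACHE_SHIFT);
}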


Comments

John David Anglin Sept. 2, 2015, 7:46 p.m. UTC | #1
The LWS locks are also used for futex operations.  The shifts in
arch/parisc/include/asm/futex.h need a corresponding update.

Dave
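
For context, the futex path selects a lock from the same lws_lock_start
table in C, so its index computation has to track the new stride. A rough
sketch of what the corresponding adjustment might look like; the helper
name and the exact expressions are illustrative, not the actual contents
of futex.h:

/* Illustrative sketch only: the real code lives in the
 * _futex_spin_lock_irqsave()/_futex_spin_unlock_irqrestore() helpers in
 * arch/parisc/include/asm/futex.h and may be written differently.  The
 * old 16-byte layout corresponds to something like
 *     index = ((unsigned long)uaddr & 0xf0) >> 2;
 * which no longer matches once the slots are L1_CACHE_BYTES apart.
 */
#include <asm/cache.h>          /* L1_CACHE_BYTES */

extern u32 lws_lock_start[];

static inline arch_spinlock_t *futex_lock_for(u32 __user *uaddr)
{
        unsigned long hash  = ((unsigned long)uaddr >> 4) & 0xf;
        unsigned long index = hash * (L1_CACHE_BYTES / sizeof(u32));

        return (arch_spinlock_t *)&lws_lock_start[index];
}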

On 2015-09-02 3:38 PM, Helge Deller wrote:
> Align the locks for the Light weight syscall (LWS) which is used for
> atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
> boundaries. This should speed up LWS calls on PA20 systems.
>
> Reported-by: John David Anglin <dave.anglin@bell.net>
> Signed-off-by: Helge Deller <deller@gmx.de>
>
> diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
> index 7ef22e3..80c2306 100644
> --- a/arch/parisc/kernel/syscall.S
> +++ b/arch/parisc/kernel/syscall.S
> @@ -561,9 +561,9 @@ lws_compare_and_swap:
>   	extru  %r26, 27, 4, %r20
>   
>   	/* Find lock to use, the hash is either one of 0 to
> -	   15, multiplied by 16 (keep it 16-byte aligned)
> +	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
>   	   and add to the lock table offset. */
> -	shlw	%r20, 4, %r20
> +	shlw	%r20, L1_CACHE_SHIFT, %r20
>   	add	%r20, %r28, %r20
>   
>   # if ENABLE_LWS_DEBUG
> @@ -751,9 +751,9 @@ cas2_lock_start:
>   	extru  %r26, 27, 4, %r20
>   
>   	/* Find lock to use, the hash is either one of 0 to
> -	   15, multiplied by 16 (keep it 16-byte aligned)
> +	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
>   	   and add to the lock table offset. */
> -	shlw	%r20, 4, %r20
> +	shlw	%r20, L1_CACHE_SHIFT, %r20
>   	add	%r20, %r28, %r20
>   
>   	rsm	PSW_SM_I, %r0			/* Disable interrupts */
> @@ -931,11 +931,9 @@ END(sys_call_table64)
>   ENTRY(lws_lock_start)
>   	/* lws locks */
>   	.rept 16
> -	/* Keep locks aligned at 16-bytes */
> +	/* Keep locks aligned to L1_CACHE_BYTES */
>   	.word 1
> -	.word 0
> -	.word 0
> -	.word 0
> +	.align	L1_CACHE_BYTES
>   	.endr
>   END(lws_lock_start)
>   	.previous
>
>
James Bottomley Sept. 2, 2015, 9:32 p.m. UTC | #2
On Wed, 2015-09-02 at 21:38 +0200, Helge Deller wrote:
> Align the locks for the Light weight syscall (LWS) which is used for
> atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
> boundaries. This should speed up LWS calls on PA20 systems.

Is there any evidence for this?  The architectural requirement for ldcw
on which all this is based is pegged at 16 bytes.  This implies that the
burst width on PA88/89 may indeed be 128 bytes, but the coherence width
for operations may still be 16 bytes.  If that speculation is true,
there's no speed at all gained by aligning ldcw to 128 bytes and all you
do is waste space.

James


Helge Deller Sept. 2, 2015, 10:18 p.m. UTC | #3
On 02.09.2015 23:32, James Bottomley wrote:
> On Wed, 2015-09-02 at 21:38 +0200, Helge Deller wrote:
>> Align the locks for the Light weight syscall (LWS) which is used for
>> atomic userspace operations (e.g. gcc atomic builtins) on L1 cache
>> boundaries. This should speed up LWS calls on PA20 systems.
> 
> Is there any evidence for this?  The architectural requirement for ldcw
> on which all this is based is pegged at 16 bytes.  This implies that the
> burst width on PA88/89 may indeed be 128 bytes, but the coherence width
> for operations may still be 16 bytes.  If that speculation is true,
> there's no speed at all gained by aligning ldcw to 128 bytes and all you
> do is waste space.

Sure, we'll have to measure timings here...

Helge
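
A userspace micro-benchmark would give such numbers: on parisc the gcc
__sync builtins are routed through the LWS compare-and-swap path, so two
threads doing CAS on words whose lock buckets sit 16 bytes apart (i.e. in
the same 128-byte line on PA8800/8900 with the current layout) should
expose any false sharing, while a run with widely separated buckets serves
as the baseline. The space cost of the change is modest either way: the
table grows from 16 x 16 = 256 bytes to 16 x 128 = 2 KB on PA20. A rough
sketch, not from the thread (build with gcc -O2 -pthread):

/* Two threads hammer compare-and-swap on words that hash to adjacent LWS
 * lock buckets.  With 16-byte lock spacing both buckets share one 128-byte
 * cache line on PA8800/8900; re-run with word_b at offset 0x80 (bucket 8)
 * to compare against well-separated buckets. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define ITERS 10000000UL

/* LWS hashes address bits 4..7, so 16-byte spacing selects neighbouring
 * buckets (see the extru/shlw sequence in syscall.S). */
static char area[4096] __attribute__((aligned(4096)));
static uint32_t *word_a = (uint32_t *)&area[0x00];   /* bucket 0 */
static uint32_t *word_b = (uint32_t *)&area[0x10];   /* bucket 1 */

static void *hammer(void *p)
{
        uint32_t *w = p;

        for (unsigned long i = 0; i < ITERS; i++)
                __sync_val_compare_and_swap(w, i, i + 1);  /* LWS CAS on parisc */
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, hammer, word_a);
        pthread_create(&b, NULL, hammer, word_b);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("done - time this run for each bucket spacing");
        return 0;
}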


Patch

diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
index 7ef22e3..80c2306 100644
--- a/arch/parisc/kernel/syscall.S
+++ b/arch/parisc/kernel/syscall.S
@@ -561,9 +561,9 @@  lws_compare_and_swap:
 	extru  %r26, 27, 4, %r20
 
 	/* Find lock to use, the hash is either one of 0 to
-	   15, multiplied by 16 (keep it 16-byte aligned)
+	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
 	   and add to the lock table offset. */
-	shlw	%r20, 4, %r20
+	shlw	%r20, L1_CACHE_SHIFT, %r20
 	add	%r20, %r28, %r20
 
 # if ENABLE_LWS_DEBUG
@@ -751,9 +751,9 @@  cas2_lock_start:
 	extru  %r26, 27, 4, %r20
 
 	/* Find lock to use, the hash is either one of 0 to
-	   15, multiplied by 16 (keep it 16-byte aligned)
+	   15, multiplied by L1_CACHE_BYTES (keep it L1 cache aligned)
 	   and add to the lock table offset. */
-	shlw	%r20, 4, %r20
+	shlw	%r20, L1_CACHE_SHIFT, %r20
 	add	%r20, %r28, %r20
 
 	rsm	PSW_SM_I, %r0			/* Disable interrupts */
@@ -931,11 +931,9 @@  END(sys_call_table64)
 ENTRY(lws_lock_start)
 	/* lws locks */
 	.rept 16
-	/* Keep locks aligned at 16-bytes */
+	/* Keep locks aligned to L1_CACHE_BYTES */
 	.word 1
-	.word 0 
-	.word 0
-	.word 0
+	.align	L1_CACHE_BYTES
 	.endr
 END(lws_lock_start)
 	.previous