[2/5] arm64: atomics lse: define SUBs in terms of ADDs


Message ID

20211210151410.2782645-3-mark.rutland@arm.com (mailing list archive)

State

New, archived

Headers

From: Mark Rutland <mark.rutland@arm.com>
To: linux-arm-kernel@lists.infradead.org
Cc: boqun.feng@gmail.com, catalin.marinas@arm.com, mark.rutland@arm.com,
 peterz@infradead.org, will@kernel.org
Subject: [PATCH 2/5] arm64: atomics lse: define SUBs in terms of ADDs
Date: Fri, 10 Dec 2021 15:14:07 +0000
Message-Id: <20211210151410.2782645-3-mark.rutland@arm.com>
In-Reply-To: <20211210151410.2782645-1-mark.rutland@arm.com>
References: <20211210151410.2782645-1-mark.rutland@arm.com>
MIME-Version: 1.0
Precedence: list
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: 
 linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

Series

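arm64: atomics: cleanups and codegen improvements
  [0/5] arm64: atomics: cleanups and codegen improvements
  [1/5] arm64: atomics: format whitespace consistently
  [2/5] arm64: atomics lse: define SUBs in terms of ADDs
  [3/5] arm64: atomics: lse: define ANDs in terms of ANDNOTs
  [4/5] arm64: atomics: lse: improve constraints for simple ops
  [5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops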

Commit Message

Mark Rutland Dec. 10, 2021, 3:14 p.m. UTC

The FEAT_LSE atomic instructions include atomic ADD instructions
(`stadd*` and `ldadd*`), but do not include atomic SUB instructions, so
we must build all of the SUB operations using the ADD instructions. We
open-code these today, with each SUB op implemented as a copy of the
corresponding ADD op with a leading `neg` instruction in the inline
assembly to negate the `i` argument.

As the compiler has no visibility of the `neg`, this leads to suboptimal
code generation when generating `i` into a register. For example,
__lse_atomic_fetch_sub(1, v) can be compiled to:

	mov     w1, #0x1
	neg     w1, w1
	ldaddal w1, w1, [x2]

This patch addresses this by replacing the `neg` with a negation in C
before the inline assembly block, e.g.

	i = -i;

This allows the compiler to fold the negation into the value it
generates into a register, e.g.

	mov     w1, #0xffffffff
	ldaddal w1, w1, [x2]

With this change the assembly for each SUB op is identical to the
corresponding ADD op (including barriers and clobbers), so I've removed
the inline assembly and rewritten each SUB op in terms of the
corresponding ADD op, e.g.

| static inline void __lse_atomic_sub(int i, atomic_t *v)
| {
| 	__lse_atomic_add(-i, v);
| }

For clarity I've moved the definition of each SUB op immediately after
the corresponding ADD op, and used a single macro to create the RETURN
forms of both ops.

This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic_lse.h | 180 +++++++++-------------------
 1 file changed, 58 insertions(+), 122 deletions(-)

Comments

Will Deacon Dec. 13, 2021, 7:27 p.m. UTC

On Fri, Dec 10, 2021 at 03:14:07PM +0000, Mark Rutland wrote:
> [...]
>  arch/arm64/include/asm/atomic_lse.h | 180 +++++++++-------------------
>  1 file changed, 58 insertions(+), 122 deletions(-)

Great diffstat!

Acked-by: Will Deacon <will@kernel.org>

Will

Patch

diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index ab661375835e..7454febb6d77 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -25,6 +25,11 @@  ATOMIC_OP(or, stset)
 ATOMIC_OP(xor, steor)
 ATOMIC_OP(add, stadd)
 
+static inline void __lse_atomic_sub(int i, atomic_t *v)
+{
+	__lse_atomic_add(-i, v);
+}
+
 #undef ATOMIC_OP
 
 #define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...)			\
@@ -54,7 +59,20 @@  ATOMIC_FETCH_OPS(add, ldadd)
 #undef ATOMIC_FETCH_OP
 #undef ATOMIC_FETCH_OPS
 
-#define ATOMIC_OP_ADD_RETURN(name, mb, cl...)				\
+#define ATOMIC_FETCH_OP_SUB(name)					\
+static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v)	\
+{									\
+	return __lse_atomic_fetch_add##name(-i, v);			\
+}
+
+ATOMIC_FETCH_OP_SUB(_relaxed)
+ATOMIC_FETCH_OP_SUB(_acquire)
+ATOMIC_FETCH_OP_SUB(_release)
+ATOMIC_FETCH_OP_SUB(        )
+
+#undef ATOMIC_FETCH_OP_SUB
+
+#define ATOMIC_OP_ADD_SUB_RETURN(name, mb, cl...)			\
 static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
 {									\
 	u32 tmp;							\
@@ -68,14 +86,19 @@  static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
 	: cl);								\
 									\
 	return i;							\
+}									\
+									\
+static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
+{									\
+	return __lse_atomic_add_return##name(-i, v);			\
 }
 
-ATOMIC_OP_ADD_RETURN(_relaxed,   )
-ATOMIC_OP_ADD_RETURN(_acquire,  a, "memory")
-ATOMIC_OP_ADD_RETURN(_release,  l, "memory")
-ATOMIC_OP_ADD_RETURN(        , al, "memory")
+ATOMIC_OP_ADD_SUB_RETURN(_relaxed,   )
+ATOMIC_OP_ADD_SUB_RETURN(_acquire,  a, "memory")
+ATOMIC_OP_ADD_SUB_RETURN(_release,  l, "memory")
+ATOMIC_OP_ADD_SUB_RETURN(        , al, "memory")
 
-#undef ATOMIC_OP_ADD_RETURN
+#undef ATOMIC_OP_ADD_SUB_RETURN
 
 static inline void __lse_atomic_and(int i, atomic_t *v)
 {
@@ -108,61 +131,6 @@  ATOMIC_FETCH_OP_AND(        , al, "memory")
 
 #undef ATOMIC_FETCH_OP_AND
 
-static inline void __lse_atomic_sub(int i, atomic_t *v)
-{
-	asm volatile(
-	__LSE_PREAMBLE
-	"	neg	%w[i], %w[i]\n"
-	"	stadd	%w[i], %[v]"
-	: [i] "+&r" (i), [v] "+Q" (v->counter)
-	: "r" (v));
-}
-
-#define ATOMIC_OP_SUB_RETURN(name, mb, cl...)				\
-static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
-{									\
-	u32 tmp;							\
-									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	neg	%w[i], %w[i]\n"					\
-	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
-	"	add	%w[i], %w[i], %w[tmp]"				\
-	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
-}
-
-ATOMIC_OP_SUB_RETURN(_relaxed,   )
-ATOMIC_OP_SUB_RETURN(_acquire,  a, "memory")
-ATOMIC_OP_SUB_RETURN(_release,  l, "memory")
-ATOMIC_OP_SUB_RETURN(        , al, "memory")
-
-#undef ATOMIC_OP_SUB_RETURN
-
-#define ATOMIC_FETCH_OP_SUB(name, mb, cl...)				\
-static inline int __lse_atomic_fetch_sub##name(int i, atomic_t *v)	\
-{									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	neg	%w[i], %w[i]\n"					\
-	"	ldadd" #mb "	%w[i], %w[i], %[v]"			\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
-}
-
-ATOMIC_FETCH_OP_SUB(_relaxed,   )
-ATOMIC_FETCH_OP_SUB(_acquire,  a, "memory")
-ATOMIC_FETCH_OP_SUB(_release,  l, "memory")
-ATOMIC_FETCH_OP_SUB(        , al, "memory")
-
-#undef ATOMIC_FETCH_OP_SUB
-
 #define ATOMIC64_OP(op, asm_op)						\
 static inline void __lse_atomic64_##op(s64 i, atomic64_t *v)		\
 {									\
@@ -178,6 +146,11 @@  ATOMIC64_OP(or, stset)
 ATOMIC64_OP(xor, steor)
 ATOMIC64_OP(add, stadd)
 
+static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
+{
+	__lse_atomic64_add(-i, v);
+}
+
 #undef ATOMIC64_OP
 
 #define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...)			\
@@ -207,7 +180,20 @@  ATOMIC64_FETCH_OPS(add, ldadd)
 #undef ATOMIC64_FETCH_OP
 #undef ATOMIC64_FETCH_OPS
 
-#define ATOMIC64_OP_ADD_RETURN(name, mb, cl...)				\
+#define ATOMIC64_FETCH_OP_SUB(name)					\
+static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v)	\
+{									\
+	return __lse_atomic64_fetch_add##name(-i, v);			\
+}
+
+ATOMIC64_FETCH_OP_SUB(_relaxed)
+ATOMIC64_FETCH_OP_SUB(_acquire)
+ATOMIC64_FETCH_OP_SUB(_release)
+ATOMIC64_FETCH_OP_SUB(        )
+
+#undef ATOMIC64_FETCH_OP_SUB
+
+#define ATOMIC64_OP_ADD_SUB_RETURN(name, mb, cl...)			\
 static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
 {									\
 	unsigned long tmp;						\
@@ -221,14 +207,19 @@  static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
 	: cl);								\
 									\
 	return i;							\
+}									\
+									\
+static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)\
+{									\
+	return __lse_atomic64_add_return##name(-i, v);			\
 }
 
-ATOMIC64_OP_ADD_RETURN(_relaxed,   )
-ATOMIC64_OP_ADD_RETURN(_acquire,  a, "memory")
-ATOMIC64_OP_ADD_RETURN(_release,  l, "memory")
-ATOMIC64_OP_ADD_RETURN(        , al, "memory")
+ATOMIC64_OP_ADD_SUB_RETURN(_relaxed,   )
+ATOMIC64_OP_ADD_SUB_RETURN(_acquire,  a, "memory")
+ATOMIC64_OP_ADD_SUB_RETURN(_release,  l, "memory")
+ATOMIC64_OP_ADD_SUB_RETURN(        , al, "memory")
 
-#undef ATOMIC64_OP_ADD_RETURN
+#undef ATOMIC64_OP_ADD_SUB_RETURN
 
 static inline void __lse_atomic64_and(s64 i, atomic64_t *v)
 {
@@ -261,61 +252,6 @@  ATOMIC64_FETCH_OP_AND(        , al, "memory")
 
 #undef ATOMIC64_FETCH_OP_AND
 
-static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
-{
-	asm volatile(
-	__LSE_PREAMBLE
-	"	neg	%[i], %[i]\n"
-	"	stadd	%[i], %[v]"
-	: [i] "+&r" (i), [v] "+Q" (v->counter)
-	: "r" (v));
-}
-
-#define ATOMIC64_OP_SUB_RETURN(name, mb, cl...)				\
-static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)\
-{									\
-	unsigned long tmp;						\
-									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
-	"	add	%[i], %[i], %x[tmp]"				\
-	: [i] "+&r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
-}
-
-ATOMIC64_OP_SUB_RETURN(_relaxed,   )
-ATOMIC64_OP_SUB_RETURN(_acquire,  a, "memory")
-ATOMIC64_OP_SUB_RETURN(_release,  l, "memory")
-ATOMIC64_OP_SUB_RETURN(        , al, "memory")
-
-#undef ATOMIC64_OP_SUB_RETURN
-
-#define ATOMIC64_FETCH_OP_SUB(name, mb, cl...)				\
-static inline long __lse_atomic64_fetch_sub##name(s64 i, atomic64_t *v)	\
-{									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	neg	%[i], %[i]\n"					\
-	"	ldadd" #mb "	%[i], %[i], %[v]"			\
-	: [i] "+&r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
-}
-
-ATOMIC64_FETCH_OP_SUB(_relaxed,   )
-ATOMIC64_FETCH_OP_SUB(_acquire,  a, "memory")
-ATOMIC64_FETCH_OP_SUB(_release,  l, "memory")
-ATOMIC64_FETCH_OP_SUB(        , al, "memory")
-
-#undef ATOMIC64_FETCH_OP_SUB
-
 static inline s64 __lse_atomic64_dec_if_positive(atomic64_t *v)
 {
 	unsigned long tmp;
