From patchwork Fri Dec 10 15:14:10 2021
X-Patchwork-Submitter: Mark Rutland <mark.rutland@arm.com>
X-Patchwork-Id: 12695614
From: Mark Rutland <mark.rutland@arm.com>
To: linux-arm-kernel@lists.infradead.org
Cc: boqun.feng@gmail.com, catalin.marinas@arm.com, mark.rutland@arm.com,
 peterz@infradead.org, will@kernel.org
Subject: [PATCH 5/5] arm64: atomics: lse: define RETURN ops in terms of FETCH ops
Date: Fri, 10 Dec 2021 15:14:10 +0000
Message-Id: <20211210151410.2782645-6-mark.rutland@arm.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20211210151410.2782645-1-mark.rutland@arm.com>
References: <20211210151410.2782645-1-mark.rutland@arm.com>

The FEAT_LSE atomic instructions include LD* instructions which return
the original value of a memory location, and these can be used to
directly implement FETCH operations. Each RETURN op is implemented as a
copy of the corresponding FETCH op with a trailing instruction to
generate the new value of the memory location.
We only directly implement *_fetch_add*(), for which we have a trailing
`add` instruction.

As the compiler has no visibility of the `add`, this leads to less than
optimal code generation when consuming the result. For example, the
compiler cannot constant-fold the addition into later operations, and
currently GCC 11.1.0 will compile:

	return __lse_atomic_sub_return(1, v) == 0;

As:

	mov     w1, #0xffffffff
	ldaddal w1, w2, [x0]
	add     w1, w1, w2
	cmp     w1, #0x0
	cset    w0, eq  // eq = none
	ret

This patch improves this by replacing the `add` with C addition after
the inline assembly block, e.g.

	ret += i;

This allows the compiler to manipulate `i`, which permits it to merge
the `add` and `cmp` for the above, e.g.

	mov     w1, #0xffffffff
	ldaddal w1, w1, [x0]
	cmp     w1, #0x1
	cset    w0, eq  // eq = none
	ret

With this change the assembly for each RETURN op is identical to the
corresponding FETCH op (including barriers and clobbers), so I've
removed the inline assembly and rewritten each RETURN op in terms of
the corresponding FETCH op, e.g.

| static inline int __lse_atomic_add_return(int i, atomic_t *v)
| {
|	return __lse_atomic_fetch_add(i, v) + i;
| }

The new construction does not adversely affect the common case, and
before and after this patch GCC 11.1.0 can compile:

	__lse_atomic_add_return(i, v)

As:

	ldaddal w0, w2, [x1]
	add     w0, w0, w2

... while having the freedom to do better elsewhere.

This is intended as an optimization and cleanup. There should be no
functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic_lse.h | 48 +++++++++--------------------
 1 file changed, 14 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index e4c5c4c34ce6..d955ade5df7c 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -75,31 +75,21 @@ ATOMIC_FETCH_OP_SUB(        )
 
 #undef ATOMIC_FETCH_OP_SUB
 
-#define ATOMIC_OP_ADD_SUB_RETURN(name, mb, cl...)			\
+#define ATOMIC_OP_ADD_SUB_RETURN(name)					\
 static inline int __lse_atomic_add_return##name(int i, atomic_t *v)	\
 {									\
-	u32 tmp;							\
-									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	ldadd" #mb "	%w[i], %w[tmp], %[v]\n"			\
-	"	add	%w[i], %w[i], %w[tmp]"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
+	return __lse_atomic_fetch_add##name(i, v) + i;			\
 }									\
 									\
 static inline int __lse_atomic_sub_return##name(int i, atomic_t *v)	\
 {									\
-	return __lse_atomic_add_return##name(-i, v);			\
+	return __lse_atomic_fetch_sub##name(i, v) - i;			\
 }
 
-ATOMIC_OP_ADD_SUB_RETURN(_relaxed,   )
-ATOMIC_OP_ADD_SUB_RETURN(_acquire, a, "memory")
-ATOMIC_OP_ADD_SUB_RETURN(_release, l, "memory")
-ATOMIC_OP_ADD_SUB_RETURN(        , al, "memory")
+ATOMIC_OP_ADD_SUB_RETURN(_relaxed)
+ATOMIC_OP_ADD_SUB_RETURN(_acquire)
+ATOMIC_OP_ADD_SUB_RETURN(_release)
+ATOMIC_OP_ADD_SUB_RETURN(        )
 
 #undef ATOMIC_OP_ADD_SUB_RETURN
 
@@ -186,31 +176,21 @@ ATOMIC64_FETCH_OP_SUB(        )
 
 #undef ATOMIC64_FETCH_OP_SUB
 
-#define ATOMIC64_OP_ADD_SUB_RETURN(name, mb, cl...)			\
+#define ATOMIC64_OP_ADD_SUB_RETURN(name)				\
 static inline long __lse_atomic64_add_return##name(s64 i, atomic64_t *v)\
 {									\
-	unsigned long tmp;						\
-									\
-	asm volatile(							\
-	__LSE_PREAMBLE							\
-	"	ldadd" #mb "	%[i], %x[tmp], %[v]\n"			\
-	"	add	%[i], %[i], %x[tmp]"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)	\
-	: "r" (v)							\
-	: cl);								\
-									\
-	return i;							\
+	return __lse_atomic64_fetch_add##name(i, v) + i;		\
 }									\
 									\
 static inline long __lse_atomic64_sub_return##name(s64 i, atomic64_t *v)\
 {									\
-	return __lse_atomic64_add_return##name(-i, v);			\
+	return __lse_atomic64_fetch_sub##name(i, v) - i;		\
 }
 
-ATOMIC64_OP_ADD_SUB_RETURN(_relaxed,   )
-ATOMIC64_OP_ADD_SUB_RETURN(_acquire, a, "memory")
-ATOMIC64_OP_ADD_SUB_RETURN(_release, l, "memory")
-ATOMIC64_OP_ADD_SUB_RETURN(        , al, "memory")
+ATOMIC64_OP_ADD_SUB_RETURN(_relaxed)
+ATOMIC64_OP_ADD_SUB_RETURN(_acquire)
+ATOMIC64_OP_ADD_SUB_RETURN(_release)
+ATOMIC64_OP_ADD_SUB_RETURN(        )
 
 #undef ATOMIC64_OP_ADD_SUB_RETURN
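
As a standalone illustration of the pattern above (not part of the patch),
the sketch below shows an LSE fetch-add wrapper whose new value is computed
in C after the inline assembly. The helper names (my_fetch_add,
my_add_return, my_dec_and_test) are hypothetical, and building it assumes
an AArch64 toolchain with FEAT_LSE enabled (e.g. -march=armv8.1-a); it is
meant only to make the codegen effect described above easy to reproduce
outside the kernel tree.

/*
 * Standalone sketch -- not kernel code. The atomic LD* instruction
 * returns the old value, and the new value is computed in C where the
 * compiler can see (and fold) the addition.
 */
#include <stdbool.h>

static inline int my_fetch_add(int i, int *p)
{
	int old;

	/* LDADDAL: atomically add 'i' to *p and return the old value. */
	asm volatile("ldaddal %w[i], %w[old], %[v]"
		     : [old] "=&r" (old), [v] "+Q" (*p)
		     : [i] "r" (i)
		     : "memory");

	return old;
}

static inline int my_add_return(int i, int *p)
{
	/* The new value is computed in C, so the compiler is free to fold it. */
	return my_fetch_add(i, p) + i;
}

bool my_dec_and_test(int *p)
{
	/*
	 * With the addition visible to the compiler, the trailing 'add'
	 * and the comparison against zero can be merged into a single
	 * 'cmp' against 1, as in the commit message example.
	 */
	return my_add_return(-1, p) == 0;
}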