From patchwork Fri Dec 10 15:14:09 2021
X-Patchwork-Submitter: Mark Rutland <mark.rutland@arm.com>
X-Patchwork-Id: 12695613
From: Mark Rutland <mark.rutland@arm.com>
To: linux-arm-kernel@lists.infradead.org
Cc: boqun.feng@gmail.com, catalin.marinas@arm.com, mark.rutland@arm.com,
 peterz@infradead.org, will@kernel.org
Subject: [PATCH 4/5] arm64: atomics: lse: improve constraints for simple ops
Date: Fri, 10 Dec 2021 15:14:09 +0000
Message-Id: <20211210151410.2782645-5-mark.rutland@arm.com>
In-Reply-To: <20211210151410.2782645-1-mark.rutland@arm.com>
References: <20211210151410.2782645-1-mark.rutland@arm.com>

We have overly conservative assembly constraints for the basic FEAT_LSE
atomic instructions, and using more accurate and permissive constraints
will allow for better code generation.
The FEAT_LSE basic atomic instructions come in two forms:

	LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
	ST{op}{order}{size} <Rs>, [<Rn>]

The ST* forms are aliases of the LD* forms where:

	ST{op}{order}{size} <Rs>, [<Rn>]
Is:
	LD{op}{order}{size} <Rs>, XZR, [<Rn>]

For either form, both <Rs> and <Rn> are read but not written back to,
and <Rt> is written with the original value of the memory location.
Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
other register value(s) are consumed. There are no UNPREDICTABLE or
CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
<Rn> are the same register.

Our current inline assembly always uses <Rs> == <Rt>, treating this
register as both an input and an output (using a '+r' constraint). This
forces the compiler to do some unnecessary register shuffling and/or
redundant value generation.

For example, the compiler cannot reuse the <Rs> value, and currently
GCC 11.1.0 will compile:

	__lse_atomic_add(1, a);
	__lse_atomic_add(1, b);
	__lse_atomic_add(1, c);

As:

	mov     w3, #0x1
	mov     w4, w3
	stadd   w4, [x0]
	mov     w0, w3
	stadd   w0, [x1]
	stadd   w3, [x2]

We can improve this with more accurate constraints, separating <Rs> and
<Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
output-only value ('=r'). As <Rt> is written back after <Rs> is
consumed, it does not need to be earlyclobber ('=&r'), leaving the
compiler free to use the same register for both <Rs> and <Rt> where
this is desirable.

At the same time, the redundant 'r' constraint for `v` is removed, as
the `+Q` constraint is sufficient.

With this change, the above example becomes:

	mov     w3, #0x1
	stadd   w3, [x0]
	stadd   w3, [x1]
	stadd   w3, [x2]

I've made this change for the non-value-returning and FETCH ops. The
RETURN ops have a multi-instruction sequence for which we cannot use
the same constraints, and a subsequent patch will rewrite the RETURN
ops in terms of the FETCH ops, relying on the ability for the compiler
to reuse the <Rs> value.

This is intended as an optimization. There should be no functional
change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/atomic_lse.h | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_lse.h b/arch/arm64/include/asm/atomic_lse.h
index d707eafb7677..e4c5c4c34ce6 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -16,8 +16,8 @@ static inline void __lse_atomic_##op(int i, atomic_t *v)	\
 	asm volatile(							\
 	__LSE_PREAMBLE							\
 	"	" #asm_op "	%w[i], %[v]\n"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v));							\
+	: [v] "+Q" (v->counter)					\
+	: [i] "r" (i));						\
 }
 
 ATOMIC_OP(andnot, stclr)
@@ -35,14 +35,17 @@ static inline void __lse_atomic_sub(int i, atomic_t *v)
 #define ATOMIC_FETCH_OP(name, mb, op, asm_op, cl...)			\
 static inline int __lse_atomic_fetch_##op##name(int i, atomic_t *v)	\
 {									\
+	int old;							\
+									\
 	asm volatile(							\
 	__LSE_PREAMBLE							\
-	"	" #asm_op #mb "	%w[i], %w[i], %[v]"			\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v)							\
+	"	" #asm_op #mb "	%w[i], %w[old], %[v]"			\
+	: [v] "+Q" (v->counter),					\
+	  [old] "=r" (old)						\
+	: [i] "r" (i)							\
 	: cl);								\
 									\
-	return i;							\
+	return old;							\
 }
 
 #define ATOMIC_FETCH_OPS(op, asm_op)					\
@@ -124,8 +127,8 @@ static inline void __lse_atomic64_##op(s64 i, atomic64_t *v)	\
 	asm volatile(							\
 	__LSE_PREAMBLE							\
 	"	" #asm_op "	%[i], %[v]\n"				\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v));							\
+	: [v] "+Q" (v->counter)					\
+	: [i] "r" (i));						\
 }
 
 ATOMIC64_OP(andnot, stclr)
@@ -143,14 +146,17 @@ static inline void __lse_atomic64_sub(s64 i, atomic64_t *v)
 #define ATOMIC64_FETCH_OP(name, mb, op, asm_op, cl...)			\
 static inline long __lse_atomic64_fetch_##op##name(s64 i, atomic64_t *v)\
 {									\
+	s64 old;							\
+									\
 	asm volatile(							\
 	__LSE_PREAMBLE							\
-	"	" #asm_op #mb "	%[i], %[i], %[v]"			\
-	: [i] "+r" (i), [v] "+Q" (v->counter)				\
-	: "r" (v)							\
+	"	" #asm_op #mb "	%[i], %[old], %[v]"			\
+	: [v] "+Q" (v->counter),					\
+	  [old] "=r" (old)						\
+	: [i] "r" (i)							\
 	: cl);								\
 									\
-	return i;							\
+	return old;							\
 }
 
 #define ATOMIC64_FETCH_OPS(op, asm_op)					\
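
As a standalone illustration of the constraint change (not part of the
patch), the sketch below contrasts the old and new constraint styles for
a non-value-returning op. The sketch_* helpers and the my_atomic_t type
are invented for this example; only the constraint shapes are taken from
the diff above, and it assumes a userspace arm64 toolchain targeting
FEAT_LSE (e.g. gcc -march=armv8.1-a).

#include <stdio.h>

typedef struct { int counter; } my_atomic_t;

/* Old style: 'i' is read-write ('+r'), plus a redundant "r" (v) input. */
static inline void sketch_stadd_old(int i, my_atomic_t *v)
{
	asm volatile(
	"	stadd	%w[i], %[v]\n"
	: [i] "+r" (i), [v] "+Q" (v->counter)
	: "r" (v));
}

/* New style: 'i' is input-only ('r'); only the memory operand is read-write. */
static inline void sketch_stadd_new(int i, my_atomic_t *v)
{
	asm volatile(
	"	stadd	%w[i], %[v]\n"
	: [v] "+Q" (v->counter)
	: [i] "r" (i));
}

int main(void)
{
	my_atomic_t a = { 0 }, b = { 0 }, c = { 0 };

	/*
	 * With the new-style constraints the compiler is free to keep the
	 * constant 1 in a single register across all three calls, as in
	 * the three-stadd sequence shown in the commit message.
	 */
	sketch_stadd_new(1, &a);
	sketch_stadd_new(1, &b);
	sketch_stadd_new(1, &c);

	sketch_stadd_old(1, &a);

	printf("a=%d b=%d c=%d\n", a.counter, b.counter, c.counter);
	return 0;
}

Comparing the code generated for the two call sequences (e.g. with
objdump -d) should show the extra mov shuffling disappear for the
new-style version.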