From patchwork Fri Oct 24 12:22:20 2014
X-Patchwork-Submitter: Steve Capper
X-Patchwork-Id: 5146941
From: Steve Capper <steve.capper@linaro.org>
To: linux-arm-kernel@lists.infradead.org
Cc: catalin.marinas@arm.com, Liviu.Dudau@arm.com, will.deacon@arm.com,
	Steve Capper <steve.capper@linaro.org>
Subject: [PATCH V2] arm64: xchg: Implement cmpxchg_double
Date: Fri, 24 Oct 2014 13:22:20 +0100
Message-Id: <1414153340-19552-1-git-send-email-steve.capper@linaro.org>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1413374323-2062-1-git-send-email-steve.capper@linaro.org>
References: <1413374323-2062-1-git-send-email-steve.capper@linaro.org>

The arm64 architecture has the ability to exclusively load and store
a pair of registers from an address (ldxp/stxp). The SLUB allocator
can take advantage of a cmpxchg_double implementation to avoid taking
some locks.

This patch provides an implementation of cmpxchg_double for 64-bit
pairs, and activates the logic required for the SLUB to use these
functions (HAVE_ALIGNED_STRUCT_PAGE and HAVE_CMPXCHG_DOUBLE).

In addition, this_cpu_cmpxchg_8 and this_cpu_cmpxchg_double_8 are
wired up to cmpxchg_local and cmpxchg_double_local (rather than the
stock implementations, which perform non-atomic operations with
interrupts disabled), as they are used by the SLUB.

On a Juno platform running on only the A57s I get quite a noticeable
performance improvement with 5 runs of hackbench on v3.17:

         Baseline | With Patch
-----------------+-----------
Mean    119.2312 | 106.1782
StdDev    0.4919 |   0.4494

(times taken to complete `./hackbench 100 process 1000', in seconds)

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
Changed in V2: added the this_cpu_cmpxchg* definitions; these are used
by the fast path of the SLUB (without them our hackbench mean goes up
to 111.9 seconds). Cheers Liviu for pointing out this omission!

The performance measurements were taken against a newer kernel running
on a board with newer firmware, so the baseline is faster than the one
posted in V1.
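For anyone less familiar with the primitive: cmpxchg_double atomically
compares two adjacent 64-bit words against an expected pair and, only
if both still match, replaces both in a single step. Below is a minimal
userspace model of those semantics, not the kernel code: it assumes a
little-endian GCC/Clang target with 16-byte atomics (link with -latomic
if needed), and the union and function names are invented purely for
illustration.

/*
 * Minimal userspace model of cmpxchg_double semantics; illustration
 * only, not the kernel implementation. Assumes a little-endian target
 * with 16-byte atomics: gcc -O2 model.c -latomic
 */
#include <stdbool.h>
#include <stdio.h>

/* Two adjacent words overlaid with one 16-byte value; on little-endian,
 * 'a' occupies the low 64 bits of 'v'. */
union pair128 {
	struct { unsigned long a, b; } w;
	unsigned __int128 v;
};

static bool model_cmpxchg_double(union pair128 *p,
				 unsigned long old1, unsigned long old2,
				 unsigned long new1, unsigned long new2)
{
	unsigned __int128 expected = ((unsigned __int128)old2 << 64) | old1;
	unsigned __int128 desired  = ((unsigned __int128)new2 << 64) | new1;

	/* One atomic 16-byte compare-and-swap covering both words. */
	return __atomic_compare_exchange_n(&p->v, &expected, desired, false,
					   __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}

int main(void)
{
	union pair128 p = { .w = { 1, 2 } };

	/* Succeeds: both words match the expected pair. */
	printf("(1,2)->(3,4): %d\n", model_cmpxchg_double(&p, 1, 2, 3, 4));
	/* Fails: the pair is now (3,4), not (1,2). */
	printf("(1,2)->(5,6): %d\n", model_cmpxchg_double(&p, 1, 2, 5, 6));
	printf("final: (%lu,%lu)\n", p.w.a, p.w.b);
	return 0;
}

This single-step property is what lets SLUB update a freelist pointer
and its adjacent counters word together without taking a lock, and it
is why HAVE_ALIGNED_STRUCT_PAGE is selected: the pair must be naturally
aligned for the paired exclusives to cover it.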
---
 arch/arm64/Kconfig               |  2 ++
 arch/arm64/include/asm/cmpxchg.h | 71 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd4e81a..4a0f9a1 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -31,12 +31,14 @@ config ARM64
 	select GENERIC_STRNLEN_USER
 	select GENERIC_TIME_VSYSCALL
 	select HARDIRQS_SW_RESEND
+	select HAVE_ALIGNED_STRUCT_PAGE if SLUB
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_KGDB
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_C_RECORDMCOUNT
 	select HAVE_CC_STACKPROTECTOR
+	select HAVE_CMPXCHG_DOUBLE
 	select HAVE_DEBUG_BUGVERBOSE
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_DMA_API_DEBUG
diff --git a/arch/arm64/include/asm/cmpxchg.h b/arch/arm64/include/asm/cmpxchg.h
index ddb9d78..89e397b 100644
--- a/arch/arm64/include/asm/cmpxchg.h
+++ b/arch/arm64/include/asm/cmpxchg.h
@@ -19,6 +19,7 @@
 #define __ASM_CMPXCHG_H
 
 #include <linux/bug.h>
+#include <linux/mmdebug.h>
 
 #include <asm/barrier.h>
 
@@ -152,6 +153,51 @@ static inline unsigned long __cmpxchg(volatile void *ptr, unsigned long old,
 	return oldval;
 }
 
+#define system_has_cmpxchg_double()	1
+
+static inline int __cmpxchg_double(volatile void *ptr1, volatile void *ptr2,
+		unsigned long old1, unsigned long old2,
+		unsigned long new1, unsigned long new2, int size)
+{
+	unsigned long loop, lost;
+
+	switch (size) {
+	case 8:
+		VM_BUG_ON((unsigned long *)ptr2 - (unsigned long *)ptr1 != 1);
+		do {
+			asm volatile("// __cmpxchg_double8\n"
+			"	ldxp	%0, %1, %2\n"
+			"	eor	%0, %0, %3\n"
+			"	eor	%1, %1, %4\n"
+			"	orr	%1, %0, %1\n"
+			"	mov	%w0, #0\n"
+			"	cbnz	%1, 1f\n"
+			"	stxp	%w0, %5, %6, %2\n"
+			"1:\n"
+				: "=&r"(loop), "=&r"(lost), "+Q" (*(u64 *)ptr1)
+				: "r" (old1), "r"(old2), "r"(new1), "r"(new2));
+		} while (loop);
+		break;
+	default:
+		BUILD_BUG();
+	}
+
+	return !lost;
+}
+
+static inline int __cmpxchg_double_mb(volatile void *ptr1, volatile void *ptr2,
+		unsigned long old1, unsigned long old2,
+		unsigned long new1, unsigned long new2, int size)
+{
+	int ret;
+
+	smp_mb();
+	ret = __cmpxchg_double(ptr1, ptr2, old1, old2, new1, new2, size);
+	smp_mb();
+
+	return ret;
+}
+
 static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
 					 unsigned long new, int size)
 {
@@ -182,6 +228,31 @@ static inline unsigned long __cmpxchg_mb(volatile void *ptr, unsigned long old,
 	__ret; \
 })
 
+#define cmpxchg_double(ptr1, ptr2, o1, o2, n1, n2) \
+({\
+	int __ret;\
+	__ret = __cmpxchg_double_mb((ptr1), (ptr2), (unsigned long)(o1), \
+			(unsigned long)(o2), (unsigned long)(n1), \
+			(unsigned long)(n2), sizeof(*(ptr1)));\
+	__ret; \
+})
+
+#define cmpxchg_double_local(ptr1, ptr2, o1, o2, n1, n2) \
+({\
+	int __ret;\
+	__ret = __cmpxchg_double((ptr1), (ptr2), (unsigned long)(o1), \
+			(unsigned long)(o2), (unsigned long)(n1), \
+			(unsigned long)(n2), sizeof(*(ptr1)));\
+	__ret; \
+})
+
+#define this_cpu_cmpxchg_8(ptr, o, n) \
+	cmpxchg_local(raw_cpu_ptr(&(ptr)), o, n)
+
+#define this_cpu_cmpxchg_double_8(ptr1, ptr2, o1, o2, n1, n2) \
+	cmpxchg_double_local(raw_cpu_ptr(&(ptr1)), raw_cpu_ptr(&(ptr2)), \
+			o1, o2, n1, n2)
+
 #define cmpxchg64(ptr,o,n)		cmpxchg((ptr),(o),(n))
 #define cmpxchg64_local(ptr,o,n)	cmpxchg_local((ptr),(o),(n))
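A note on the structure of the inline asm in __cmpxchg_double: it is a
standard load-exclusive/store-exclusive retry loop. ldxp takes an
exclusive load of both words, the eor/eor/orr sequence leaves a
non-zero value in 'lost' if either word differs from the expected pair,
and stxp attempts the paired store, setting 'loop' non-zero if
exclusivity was lost so that the do/while retries. A rough C analogue
of that control flow, using a weak 16-byte compare-and-swap (which,
like stxp, may fail spuriously); this is entirely illustrative, with
invented names, and is not a drop-in replacement for the asm:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative analogue of the ldxp/stxp retry loop. A *weak* CAS may
 * fail spuriously, just as stxp fails when exclusivity is lost. */
static bool retry_loop_cas(unsigned __int128 *ptr,
			   unsigned __int128 old, unsigned __int128 new)
{
	for (;;) {
		unsigned __int128 expected = old;

		/* Plays the role of the ldxp ... stxp sequence. */
		if (__atomic_compare_exchange_n(ptr, &expected, new,
						true, /* weak */
						__ATOMIC_RELAXED,
						__ATOMIC_RELAXED))
			return true;	/* stxp succeeded: pair swapped */
		if (expected != old)
			return false;	/* pair differed: 'lost' != 0 */
		/* expected matched but the CAS still failed: a spurious
		 * stxp-style failure, so retry as do { } while (loop)
		 * does in the patch. */
	}
}

int main(void)
{
	unsigned __int128 v = 42;

	printf("42->43: %d\n", (int)retry_loop_cas(&v, 42, 43)); /* 1 */
	printf("42->44: %d\n", (int)retry_loop_cas(&v, 42, 44)); /* 0 */
	return 0;
}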