From patchwork Fri Jul 13 11:04:23 2012
X-Patchwork-Submitter: Will Deacon
X-Patchwork-Id: 1195211
From: Will Deacon <will.deacon@arm.com>
To: linux-arm-kernel@lists.infradead.org
Cc: kangshan0910@gmail.com, Will Deacon <will.deacon@arm.com>, Nicolas Pitre
Subject: [PATCH] ARM: mutex: use generic atomic_dec-based implementation for ARMv6+
Date: Fri, 13 Jul 2012 12:04:23 +0100
Message-Id: <1342177463-21238-1-git-send-email-will.deacon@arm.com>
X-Mailer: git-send-email 1.7.4.1

The open-coded mutex implementation for ARMv6+ cores suffers from a
couple of problems:

1. (major) There aren't any barriers in sight, so in the uncontended
   case we don't actually protect any accesses performed during the
   critical section.

2. (minor) If the strex indicates failure to complete the store, we
   assume that the lock is contended and run away down the failure
   (slow) path. This assumption isn't correct and the core may fail
   the strex for reasons other than contention.

This patch solves both of these problems by using the generic
atomic_dec-based implementation for mutexes on ARMv6+. This also has
the benefit of removing a fair amount of inline assembly code.

Cc: Nicolas Pitre
Reported-by: Shan Kang <kangshan0910@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Arnd Bergmann
Acked-by: Nicolas Pitre
---
Given that Shan reports that this has been observed to cause problems
in practice, I think this is certainly a candidate for -stable.
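For context, the generic fastpath this switches to reduces to an
atomic_dec_return() call. A simplified sketch of the lock path along
the lines of asm-generic/mutex-dec.h (paraphrased, not a verbatim copy
of any particular tree):

static inline void
__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
{
	/*
	 * atomic_dec_return() has full-barrier semantics on ARMv6+ SMP,
	 * so the uncontended acquisition now orders the accesses inside
	 * the critical section, addressing problem 1 above.
	 */
	if (unlikely(atomic_dec_return(count) < 0))
		fail_fn(count);
}

Since the ldrex/strex retry loop lives inside atomic_dec_return()
itself, a strex that fails for reasons other than contention no longer
diverts us to the slowpath, addressing problem 2.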
 arch/arm/include/asm/mutex.h | 113 +-----------------------------------------
 1 files changed, 2 insertions(+), 111 deletions(-)

diff --git a/arch/arm/include/asm/mutex.h b/arch/arm/include/asm/mutex.h
index 93226cf..bd68642 100644
--- a/arch/arm/include/asm/mutex.h
+++ b/arch/arm/include/asm/mutex.h
@@ -12,116 +12,7 @@
 /* On pre-ARMv6 hardware the swp based implementation is the most efficient. */
 # include <asm-generic/mutex-xchg.h>
 #else
-
-/*
- * Attempting to lock a mutex on ARMv6+ can be done with a bastardized
- * atomic decrement (it is not a reliable atomic decrement but it satisfies
- * the defined semantics for our purpose, while being smaller and faster
- * than a real atomic decrement or atomic swap. The idea is to attempt
- * decrementing the lock value only once. If once decremented it isn't zero,
- * or if its store-back fails due to a dispute on the exclusive store, we
- * simply bail out immediately through the slow path where the lock will be
- * reattempted until it succeeds.
- */
-static inline void
-__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	int __ex_flag, __res;
-
-	__asm__ (
-
-		"ldrex	%0, [%2]	\n\t"
-		"sub	%0, %0, #1	\n\t"
-		"strex	%1, %0, [%2]	"
-
-		: "=&r" (__res), "=&r" (__ex_flag)
-		: "r" (&(count)->counter)
-		: "cc","memory" );
-
-	__res |= __ex_flag;
-	if (unlikely(__res != 0))
-		fail_fn(count);
-}
-
-static inline int
-__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	int __ex_flag, __res;
-
-	__asm__ (
-
-		"ldrex	%0, [%2]	\n\t"
-		"sub	%0, %0, #1	\n\t"
-		"strex	%1, %0, [%2]	"
-
-		: "=&r" (__res), "=&r" (__ex_flag)
-		: "r" (&(count)->counter)
-		: "cc","memory" );
-
-	__res |= __ex_flag;
-	if (unlikely(__res != 0))
-		__res = fail_fn(count);
-	return __res;
-}
-
-/*
- * Same trick is used for the unlock fast path. However the original value,
- * rather than the result, is used to test for success in order to have
- * better generated assembly.
- */
-static inline void
-__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *))
-{
-	int __ex_flag, __res, __orig;
-
-	__asm__ (
-
-		"ldrex	%0, [%3]	\n\t"
-		"add	%1, %0, #1	\n\t"
-		"strex	%2, %1, [%3]	"
-
-		: "=&r" (__orig), "=&r" (__res), "=&r" (__ex_flag)
-		: "r" (&(count)->counter)
-		: "cc","memory" );
-
-	__orig |= __ex_flag;
-	if (unlikely(__orig != 0))
-		fail_fn(count);
-}
-
-/*
- * If the unlock was done on a contended lock, or if the unlock simply fails
- * then the mutex remains locked.
- */
-#define __mutex_slowpath_needs_to_unlock()	1
-
-/*
- * For __mutex_fastpath_trylock we use another construct which could be
- * described as a "single value cmpxchg".
- *
- * This provides the needed trylock semantics like cmpxchg would, but it is
- * lighter and less generic than a true cmpxchg implementation.
- */
-static inline int
-__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *))
-{
-	int __ex_flag, __res, __orig;
-
-	__asm__ (
-
-		"1: ldrex	%0, [%3]	\n\t"
-		"subs	%1, %0, #1	\n\t"
-		"strexeq	%2, %1, [%3]	\n\t"
-		"movlt	%0, #0	\n\t"
-		"cmpeq	%2, #0	\n\t"
-		"bgt	1b	"
-
-		: "=&r" (__orig), "=&r" (__res), "=&r" (__ex_flag)
-		: "r" (&count->counter)
-		: "cc", "memory" );
-
-	return __orig;
-}
-
+/* ARMv6+ can implement efficient atomic decrement using exclusive accessors. */
+# include <asm-generic/mutex-dec.h>
 #endif
 #endif
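For comparison, the ARMv6+ atomics that back the generic code retry
the exclusive sequence locally when the strex fails, rather than
treating the failure as contention. A rough sketch of the
atomic_sub_return() pattern used in arch/arm at the time (illustrative
rather than verbatim; treat the exact operand constraints as an
assumption):

static inline int atomic_sub_return(int i, atomic_t *v)
{
	unsigned long tmp;
	int result;

	smp_mb();	/* order prior accesses before the atomic op */

	__asm__ __volatile__("@ atomic_sub_return\n"
"1:	ldrex	%0, [%3]\n"		/* load-exclusive the counter */
"	sub	%0, %0, %4\n"
"	strex	%1, %0, [%3]\n"		/* %1 is non-zero if strex failed */
"	teq	%1, #0\n"
"	bne	1b"			/* failed strex: retry, don't bail */
	: "=&r" (result), "=&r" (tmp), "+Qo" (v->counter)
	: "r" (&v->counter), "Ir" (i)
	: "cc");

	smp_mb();	/* order the atomic op before later accesses */

	return result;
}

The two smp_mb() calls give the operation full-barrier semantics,
which is what supplies the ordering the removed fastpath lacked.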