From patchwork Fri Jul 13 17:16:07 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Will Deacon X-Patchwork-Id: 1196471 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) by patchwork2.kernel.org (Postfix) with ESMTP id 72A81DFFFD for ; Fri, 13 Jul 2012 17:22:09 +0000 (UTC) Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux)) id 1SpjTk-0004OV-So; Fri, 13 Jul 2012 17:16:32 +0000 Received: from cam-admin0.cambridge.arm.com ([217.140.96.50]) by merlin.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux)) id 1SpjTd-0004Nk-4t for linux-arm-kernel@lists.infradead.org; Fri, 13 Jul 2012 17:16:29 +0000 Received: from mudshark.cambridge.arm.com (mudshark.cambridge.arm.com [10.1.79.58]) by cam-admin0.cambridge.arm.com (8.12.6/8.12.6) with ESMTP id q6DHGDOK025233; Fri, 13 Jul 2012 18:16:13 +0100 (BST) Received: by mudshark.cambridge.arm.com (Postfix, from userid 1000) id E3A25C02D3; Fri, 13 Jul 2012 18:16:11 +0100 (BST) From: Will Deacon To: linux-arm-kernel@lists.infradead.org Subject: [PATCH v2] ARM: mutex: use generic xchg-based implementation for ARMv6+ Date: Fri, 13 Jul 2012 18:16:07 +0100 Message-Id: <1342199767-15108-1-git-send-email-will.deacon@arm.com> X-Mailer: git-send-email 1.7.4.1 X-Spam-Note: CRM114 invocation failed X-Spam-Score: -6.9 (------) X-Spam-Report: SpamAssassin version 3.3.2 on merlin.infradead.org summary: Content analysis details: (-6.9 points) pts rule name description ---- ---------------------- -------------------------------------------------- -5.0 RCVD_IN_DNSWL_HI RBL: Sender listed at http://www.dnswl.org/, high trust [217.140.96.50 listed in list.dnswl.org] -0.0 T_RP_MATCHES_RCVD Envelope sender domain matches handover relay domain -0.0 SPF_PASS SPF: sender matches SPF record -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] Cc: kangshan0910@gmail.com, Will Deacon , arnd@arndb.de, nico@fluxnic.net X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: linux-arm-kernel-bounces@lists.infradead.org Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org The open-coded mutex implementation for ARMv6+ cores suffers from a severe lack of barriers, so in the uncontended case we don't actually protect any accesses performed during the critical section. Furthermore, the code is largely a duplication of the ARMv6+ atomic_dec code but optimised to remove a branch instruction, as the mutex fastpath was previously inlined. Now that this is executed out-of-line, we can reuse the atomic access code for the locking (in fact, we use the xchg code as this produces shorter critical sections). This patch uses the generic xchg based implementation for mutexes on ARMv6+, which introduces barriers to the lock/unlock operations and also has the benefit of removing a fair amount of inline assembly code. Acked-by: Arnd Bergmann Acked-by: Nicolas Pitre Reported-by: Shan Kang Signed-off-by: Will Deacon --- Ok, so here's v2 following the feedback from Arnd and Nico. We now use the xchg-based version for all versions of the architecture and I've updated the commit message to reflect both that and also to remove the bit about avoiding the slowpath on frivolous CPUs! arch/arm/include/asm/mutex.h | 119 ++---------------------------------------- 1 files changed, 4 insertions(+), 115 deletions(-) diff --git a/arch/arm/include/asm/mutex.h b/arch/arm/include/asm/mutex.h index 93226cf..b1479fd 100644 --- a/arch/arm/include/asm/mutex.h +++ b/arch/arm/include/asm/mutex.h @@ -7,121 +7,10 @@ */ #ifndef _ASM_MUTEX_H #define _ASM_MUTEX_H - -#if __LINUX_ARM_ARCH__ < 6 -/* On pre-ARMv6 hardware the swp based implementation is the most efficient. */ -# include -#else - /* - * Attempting to lock a mutex on ARMv6+ can be done with a bastardized - * atomic decrement (it is not a reliable atomic decrement but it satisfies - * the defined semantics for our purpose, while being smaller and faster - * than a real atomic decrement or atomic swap. The idea is to attempt - * decrementing the lock value only once. If once decremented it isn't zero, - * or if its store-back fails due to a dispute on the exclusive store, we - * simply bail out immediately through the slow path where the lock will be - * reattempted until it succeeds. + * On pre-ARMv6 hardware this results in a swp-based implementation, + * which is the most efficient. For ARMv6+, we emit a pair of exclusive + * accesses instead. */ -static inline void -__mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *)) -{ - int __ex_flag, __res; - - __asm__ ( - - "ldrex %0, [%2] \n\t" - "sub %0, %0, #1 \n\t" - "strex %1, %0, [%2] " - - : "=&r" (__res), "=&r" (__ex_flag) - : "r" (&(count)->counter) - : "cc","memory" ); - - __res |= __ex_flag; - if (unlikely(__res != 0)) - fail_fn(count); -} - -static inline int -__mutex_fastpath_lock_retval(atomic_t *count, int (*fail_fn)(atomic_t *)) -{ - int __ex_flag, __res; - - __asm__ ( - - "ldrex %0, [%2] \n\t" - "sub %0, %0, #1 \n\t" - "strex %1, %0, [%2] " - - : "=&r" (__res), "=&r" (__ex_flag) - : "r" (&(count)->counter) - : "cc","memory" ); - - __res |= __ex_flag; - if (unlikely(__res != 0)) - __res = fail_fn(count); - return __res; -} - -/* - * Same trick is used for the unlock fast path. However the original value, - * rather than the result, is used to test for success in order to have - * better generated assembly. - */ -static inline void -__mutex_fastpath_unlock(atomic_t *count, void (*fail_fn)(atomic_t *)) -{ - int __ex_flag, __res, __orig; - - __asm__ ( - - "ldrex %0, [%3] \n\t" - "add %1, %0, #1 \n\t" - "strex %2, %1, [%3] " - - : "=&r" (__orig), "=&r" (__res), "=&r" (__ex_flag) - : "r" (&(count)->counter) - : "cc","memory" ); - - __orig |= __ex_flag; - if (unlikely(__orig != 0)) - fail_fn(count); -} - -/* - * If the unlock was done on a contended lock, or if the unlock simply fails - * then the mutex remains locked. - */ -#define __mutex_slowpath_needs_to_unlock() 1 - -/* - * For __mutex_fastpath_trylock we use another construct which could be - * described as a "single value cmpxchg". - * - * This provides the needed trylock semantics like cmpxchg would, but it is - * lighter and less generic than a true cmpxchg implementation. - */ -static inline int -__mutex_fastpath_trylock(atomic_t *count, int (*fail_fn)(atomic_t *)) -{ - int __ex_flag, __res, __orig; - - __asm__ ( - - "1: ldrex %0, [%3] \n\t" - "subs %1, %0, #1 \n\t" - "strexeq %2, %1, [%3] \n\t" - "movlt %0, #0 \n\t" - "cmpeq %2, #0 \n\t" - "bgt 1b " - - : "=&r" (__orig), "=&r" (__res), "=&r" (__ex_flag) - : "r" (&count->counter) - : "cc", "memory" ); - - return __orig; -} - -#endif +#include #endif