From patchwork Wed Oct 9 18:50:31 2013
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 3011241
From: Ard Biesheuvel
To: linux-arm-kernel@lists.infradead.org
Cc: Ard Biesheuvel, linux@arm.linux.org.uk, dave.martin@arm.com, nico@linaro.org
Subject: [RFC v2 PATCH 1/4] ARM: add support for kernel mode NEON in atomic context
Date: Wed, 9 Oct 2013 20:50:31 +0200
Message-Id: <1381344634-14917-2-git-send-email-ard.biesheuvel@linaro.org>
In-Reply-To: <1381344634-14917-1-git-send-email-ard.biesheuvel@linaro.org>
References: <1381344634-14917-1-git-send-email-ard.biesheuvel@linaro.org>

Some applications, such as WPA
CCMP encryption, do substantial amounts of work in non-process context.
In order to support accelerated NEON implementations under these
circumstances, we need a way to preserve the NEON context that may
(a) belong to a completely unrelated userland process (if the NEON unit
    is currently turned off);
(b) belong to current userland;
(c) belong to current kernel mode in process context.

The best way to deal with this is to just stack whatever registers we are
going to use, and unstack them when we are done.

This patch adds kernel_neon_begin_atomic() and kernel_neon_end_atomic(),
which may be called from any context. In the !in_interrupt() case, they
just call their non-_atomic counterparts. In atomic context, they stack
and unstack, respectively, the number of NEON registers declared when
setting up the stack area using DEFINE_NEON_STACK_REGS().

Signed-off-by: Ard Biesheuvel
---
 arch/arm/include/asm/fpstate.h | 15 +++++++++++++-
 arch/arm/include/asm/neon.h    | 34 +++++++++++++++++++++++++++++++
 arch/arm/vfp/vfphw.S           | 46 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm/vfp/vfpmodule.c       |  3 +++
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/fpstate.h b/arch/arm/include/asm/fpstate.h
index 3ad4c10..7a6e100 100644
--- a/arch/arm/include/asm/fpstate.h
+++ b/arch/arm/include/asm/fpstate.h
@@ -19,7 +19,7 @@
  * - FPEXC, FPSCR, FPINST and FPINST2.
  * - 16 or 32 double precision data registers
  * - an implementation-dependent word of state for FLDMX/FSTMX (pre-ARMv6)
- *
+ *
  * FPEXC will always be non-zero once the VFP has been used in this process.
  */
 
@@ -52,6 +52,19 @@ union vfp_state {
 extern void vfp_flush_thread(union vfp_state *);
 extern void vfp_release_thread(union vfp_state *);
 
+/*
+ * Variable sized struct for stacking the bottom 'n' NEON registers.
+ */
+struct vfp_partial_state {
+	const __u32	num_regs;
+	__u32		fpexc;
+	__u32		fpscr;
+	__u8		qregs[] __aligned(16);
+} __aligned(16);
+
+extern void vfp_load_partial_state(struct vfp_partial_state *);
+extern void vfp_save_partial_state(struct vfp_partial_state *);
+
 #define FP_HARD_SIZE 35
 
 struct fp_hard_struct {
diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h
index 8f730fe..1efd9fc 100644
--- a/arch/arm/include/asm/neon.h
+++ b/arch/arm/include/asm/neon.h
@@ -8,10 +8,21 @@
  * published by the Free Software Foundation.
  */
 
+#include
+#include
+#include
 #include
 
 #define cpu_has_neon()		(!!(elf_hwcap & HWCAP_NEON))
 
+#define DEFINE_NEON_STACK_REGS(v, num)					\
+	struct {							\
+		struct vfp_partial_state regs;				\
+		u8 qregs[(num) > 16 ? 256 : 16 * (((num) + 1) & ~1U)];	\
+	} v = { .regs.num_regs = sizeof(v.qregs)/16 }
+
+#define DEFINE_NEON_STACK_REGS_ALL(name)	DEFINE_NEON_STACK_REGS(name,16)
+
 #ifdef __ARM_NEON__
 
 /*
@@ -30,7 +41,30 @@
 #define kernel_neon_begin() \
 	BUILD_BUG_ON_MSG(1, "kernel_neon_begin() called from NEON code")
 
+#define kernel_neon_begin_atomic(a) \
+	BUILD_BUG_ON_MSG(1, "kernel_neon_begin_atomic() called from NEON code")
+
 #else
 void kernel_neon_begin(void);
+#define kernel_neon_begin_atomic(name) __kernel_neon_begin_atomic(&(name).regs)
 #endif
+
+#define kernel_neon_end_atomic(name) __kernel_neon_end_atomic(&(name).regs)
+
 void kernel_neon_end(void);
+
+static inline void __kernel_neon_begin_atomic(struct vfp_partial_state *regs)
+{
+	if (!in_interrupt())
+		kernel_neon_begin();
+	else
+		vfp_save_partial_state(regs);
+}
+
+static inline void __kernel_neon_end_atomic(struct vfp_partial_state *regs)
+{
+	if (!in_interrupt())
+		kernel_neon_end();
+	else
+		vfp_load_partial_state(regs);
+}
diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 3e5d311..747e782 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -322,3 +322,49 @@ ENTRY(vfp_put_double)
 	.endr
 #endif
 ENDPROC(vfp_put_double)
+
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+	.fpu	neon
+ENTRY(vfp_save_partial_state)
+	VFPFMRX	r2, FPEXC		@ load the control registers
+	VFPFMRX	r3, FPSCR
+	strd	r2, r3, [r0, #4]	@ and save to memory
+	tst	r2, #FPEXC_EN
+	bne	0f
+	orr	r2, r2, #FPEXC_EN	@ enable VFP if it was disabled
+	VFPFMXR	FPEXC, r2
+0:	ldr	r1, [r0]		@ load # of regs to preserve
+	rsbs	r1, r1, #16
+	add	r2, r0, #16
+	beq	1f
+	adr	r3, 1f
+	add	r3, r3, r1, lsl #1
+THUMB(	orr	r3, r3, #1)
+	bx	r3
+1:	.irp	qq,q14-q15,q12-q13,q10-q11,q8-q9,q6-q7,q4-q5,q2-q3,q0-q1
+	vst1.8	{\qq}, [r2,:128]!
+	.endr
+	bx	lr
+ENDPROC(vfp_save_partial_state)
+
+ENTRY(vfp_load_partial_state)
+	ldr	r2, [r0]		@ load # of regs to preserve
+	rsbs	r1, r2, #16
+	add	r2, r0, #16
+	beq	0f
+	adr	r3, 0f
+	add	r3, r3, r1, lsl #1
+THUMB(	orr	r3, r3, #1)
+	bx	r3
+0:	.irp	qq,q14-q15,q12-q13,q10-q11,q8-q9,q6-q7,q4-q5,q2-q3,q0-q1
+	vld1.8	{\qq}, [r2,:128]!
+	.endr
+	ldrd	r2, r3, [r0, #4]
+	VFPFMXR	FPSCR, r3
+	VFPFMXR	FPEXC, r2
+	bx	lr
+ENDPROC(vfp_load_partial_state)
+
+#endif
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 52b8f40..3dea5ba 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -713,6 +713,9 @@ void kernel_neon_end(void)
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
+EXPORT_SYMBOL(vfp_save_partial_state);
+EXPORT_SYMBOL(vfp_load_partial_state);
+
 #endif /* CONFIG_KERNEL_MODE_NEON */
 
 /*
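
For anyone checking the stack sizing, the arithmetic in
DEFINE_NEON_STACK_REGS() and the computed branch in vfphw.S can be modelled
in plain C. This is only an illustrative userspace sketch and not part of
the patch: the helper names (neon_stack_qreg_bytes, neon_stack_num_regs,
neon_skip_offset_bytes) are hypothetical, and the offset model assumes that
each vst1.8/vld1.8 instruction in the .irp block occupies 4 bytes.

```c
#include <stddef.h>

/*
 * Bytes reserved for Q registers, mirroring the sizing expression in
 * DEFINE_NEON_STACK_REGS(): round num up to an even register count,
 * 16 bytes per Q register, capped at all 16 Q registers (256 bytes).
 */
static size_t neon_stack_qreg_bytes(unsigned int num)
{
	return num > 16 ? 256 : 16 * ((num + 1) & ~1U);
}

/* Value the macro stores in num_regs: sizeof(v.qregs) / 16. */
static unsigned int neon_stack_num_regs(unsigned int num)
{
	return (unsigned int)(neon_stack_qreg_bytes(num) / 16);
}

/*
 * Byte offset added to the address of the first vst1.8/vld1.8, as in
 * "rsbs r1, r1, #16" followed by "add r3, r3, r1, lsl #1".
 * Assumption: every instruction is 4 bytes and handles two Q registers,
 * so skipping (16 - num_regs) registers means (16 - num_regs) * 2 bytes.
 */
static unsigned int neon_skip_offset_bytes(unsigned int num_regs)
{
	return (16 - num_regs) << 1;
}
```

Under those assumptions, asking for 3 registers reserves 64 bytes (rounded
up to 4 Q registers), and the save routine branches 24 bytes past its
label 1, skipping six of the eight stores so that only q2-q3 and q0-q1 are
stacked.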