From patchwork Fri Aug 9 12:21:09 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 2841909 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id A5E65BF546 for ; Fri, 9 Aug 2013 12:21:55 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 52C8720204 for ; Fri, 9 Aug 2013 12:21:54 +0000 (UTC) Received: from casper.infradead.org (casper.infradead.org [85.118.1.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A8C68201EB for ; Fri, 9 Aug 2013 12:21:52 +0000 (UTC) Received: from merlin.infradead.org ([2001:4978:20e::2]) by casper.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1V7lhR-00074s-LK; Fri, 09 Aug 2013 12:21:45 +0000 Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1V7lhP-0003hs-I9; Fri, 09 Aug 2013 12:21:43 +0000 Received: from mail-wg0-f53.google.com ([74.125.82.53]) by merlin.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1V7lhM-0003h6-K6 for linux-arm-kernel@lists.infradead.org; Fri, 09 Aug 2013 12:21:41 +0000 Received: by mail-wg0-f53.google.com with SMTP id c11so3537062wgh.20 for ; Fri, 09 Aug 2013 05:21:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=weSD0A3OMBKEPKzgHPxFpvQBItpKuxZYeQ4ZCscLQVE=; b=HroMg2EMMZqnJ86I9bTBQAsiSKsphl2kjoIVsA/Xq/5QZKjcsEbLm7eih9nqgPoq7Q 8pt4OnsI+sVYpFgvL7zll4G8GyxmuiHkSpkFGe+Mc1sn48LPmP1GUM2Jt0U9IDNDbUp7 YIbeRjqapPmw7WpM467Rdu98DZRytgdpOcOfYTsYZbZrhK0RXBYq279JJp9PlYBfmTfl 3Nw7OMmY5Vkn299Cx6NoHi03g8hwNA9kwpMFYVjAzyiFpzS+Ufxerq0CpRK5EMPpgWpJ HeqJjCK6FZNr5Wl22AS1nYLNfJj31EOFtrcuii8sGQqwssdhEtheQ3yii0mXovQ1WKC1 QCeA== X-Gm-Message-State: ALoCoQlA4I0Awg9DqgFl07YxBHkTUl2qQh5knvC7IKj3CZfaQlIvxYKicA3g5x6FumKkZogn3qDh X-Received: by 10.180.149.204 with SMTP id uc12mr165776wib.47.1376050876713; Fri, 09 Aug 2013 05:21:16 -0700 (PDT) Received: from ards-mac-mini.homenet.telecomitalia.it ([95.235.231.18]) by mx.google.com with ESMTPSA id w4sm2520490wia.9.2013.08.09.05.21.15 for (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 09 Aug 2013 05:21:16 -0700 (PDT) From: Ard Biesheuvel To: linux-arm-kernel@lists.infradead.org Subject: [PATCH] ARM: document the use of NEON in kernel mode Date: Fri, 9 Aug 2013 14:21:09 +0200 Message-Id: <1376050869-29255-1-git-send-email-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 1.8.1.2 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20130809_082140_865219_D0D9CE4B X-CRM114-Status: GOOD ( 22.54 ) X-Spam-Score: -2.6 (--) Cc: linux@arm.linux.org.uk, Ard Biesheuvel X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Ard Biesheuvel Reviewed-by: Nicolas Pitre --- Documentation/arm/kernel_mode_neon.txt | 132 +++++++++++++++++++++++++++++++++ 1 file changed, 132 insertions(+) create mode 100644 Documentation/arm/kernel_mode_neon.txt diff --git a/Documentation/arm/kernel_mode_neon.txt b/Documentation/arm/kernel_mode_neon.txt new file mode 100644 index 0000000..4c2de85 --- /dev/null +++ b/Documentation/arm/kernel_mode_neon.txt @@ -0,0 +1,132 @@ +Kernel mode NEON +================ + +TL;DR summary +------------- +* Use only NEON instructions, or VFP instructions that don't rely on support + code +* Isolate your NEON code in a separate compilation unit, and compile it with + '-mfpu=neon -mfloat-abi=softfp' +* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your + NEON code +* Don't sleep in your NEON code, and be aware that it will be executed with + preemption disabled + + +Introduction +------------ +It is possible to use NEON instructions (and in some cases, VFP instructions) in +code that runs in kernel mode. However, for performance reasons, the NEON/VFP +register file is not preserved and restored at every context switch or taken +exception like the normal register file is, so some manual intervention is +required. Furthermore, special care is required for code that may sleep [i.e., +may call schedule()], as NEON or VFP instructions will be executed in a +non-preemptible section for reasons outlined below. + + +Lazy preserve and restore +------------------------- +The NEON/VFP register file is managed using lazy preserve (on UP systems) and +lazy restore (on both SMP and UP systems). This means that the register file is +kept 'live', and is only preserved and restored when multiple tasks are +contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to +another core). Lazy restore is implemented by disabling the NEON/VFP unit after +every context switch, resulting in a trap when subsequently a NEON/VFP +instruction is issued, allowing the kernel to step in and perform the restore if +necessary. + +Any use of the NEON/VFP unit in kernel mode should not interfere with this, so +it is required to do an 'eager' preserve of the NEON/VFP register file, and +enable the NEON/VFP unit explicitly so no exceptions are generated on first +subsequent use. This is handled by the function kernel_neon_begin(), which +should be called before any kernel mode NEON or VFP instructions are issued. +Likewise, the NEON/VFP unit should be disabled again after use to make sure user +mode will hit the lazy restore trap upon next use. This is handled by the +function kernel_neon_end(). + + +Interruptions in kernel mode +---------------------------- +For reasons of performance and simplicity, it was decided that there shall be no +preserve/restore mechanism for the kernel mode NEON/VFP register contents. This +implies that interruptions of a kernel mode NEON section can only be allowed if +they are guaranteed not to touch the NEON/VFP registers. For this reason, the +following rules and restrictions apply in the kernel: +* NEON/VFP code is not allowed in interrupt context; +* NEON/VFP code is not allowed to sleep; +* NEON/VFP code is executed with preemption disabled. + +If latency is a concern, it is possible to put back to back calls to +kernel_neon_end() and kernel_neon_begin() in places in your code where none of +the NEON registers are live. (Additional calls to kernel_neon_begin() should be +reasonably cheap if no context switch occurred in the meantime) + + +VFP and support code +-------------------- +Earlier versions of VFP (prior to version 3) rely on software support for things +like IEEE-754 compliant underflow handling etc. When the VFP unit needs such +software assistance, it signals the kernel by raising an undefined instruction +exception. The kernel responds by inspecting the VFP control registers and the +current instruction and arguments, and emulates the instruction in software. + +Such software assistance is currently not implemented for VFP instructions +executed in kernel mode. If such a condition is encountered, the kernel will +fail and generate an OOPS. + + +Separating NEON code from ordinary code +--------------------------------------- +The compiler is not aware of the special significance of kernel_neon_begin() and +kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions +between calls to these respective functions. Furthermore, GCC may generate NEON +instructions of its own at -O3 level if -mfpu=neon is selected, and even if the +kernel is currently compiled at -O2, future changes may result in NEON/VFP +instructions appearing in unexpected places if no special care is taken. + +Therefore, the recommended and only supported way of using NEON/VFP in the +kernel is by adhering to the following rules: +* isolate the NEON code in a separate compilation unit and compile it with + '-mfpu=neon -mfloat-abi=softfp'; +* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls + into the unit containing the NEON code from a compilation unit which is *not* + built with the GCC flag '-mfpu=neon' set. + +As the kernel is compiled with '-msoft-float', the above will guarantee that +both NEON and VFP instructions will only ever appear in designated compilation +units at any optimization level. + + +NEON assembler +-------------- +NEON assembler is supported with no additional caveats as long as the rules +above are followed. + + +NEON code generated by GCC +-------------------------- +The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit +parallelism, and generates NEON code from ordinary C source code. This is fully +supported as long as the rules above are followed. + + +NEON intrinsics +--------------- +NEON intrinsics are also supported. However, as code using NEON intrinsics +relies on the GCC header 'arm_neon.h', the following tricks are necessary as +this header is not fully compatible with the kernel: +* Declare the interface between the NEON intrinsics code and its caller using + only plain old C types (i.e., avoid using uintXX_t and uXX types; they seem + unambiguous but they are not[1]); +* Compile the unit containing the NEON intrinsics with '-ffreestanding' so it + does not choke on the missing 'stdint.h' #included by 'arm_neon.h' (this is a + C99 header which the kernel does not supply) and don't include any ordinary + kernel headers; +* Call the NEON code from a separate compilation unit that does the interfacing + with the rest of the kernel, includes kernel headers, types etc. + +---- +[1] Neither the bare metal version nor the glibc version of GCC agrees with the + kernel on the definitions of int32_t, uint32_t and/or uintptr_t: this + becomes a problem when you try to include both and + in the same compilation unit.