[v2] ARM: document the use of NEON in kernel mode

Message ID	1376897504-24672-1-git-send-email-ard.biesheuvel@linaro.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> From: Ard Biesheuvel <ard.biesheuvel@linaro.org> To: linux@arm.linux.org.uk Subject: [PATCH v2] ARM: document the use of NEON in kernel mode Date: Mon, 19 Aug 2013 09:31:44 +0200 Message-Id: <1376897504-24672-1-git-send-email-ard.biesheuvel@linaro.org> Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel <ard.biesheuvel@linaro.org> Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Message ID

1376897504-24672-1-git-send-email-ard.biesheuvel@linaro.org (mailing list archive)

State

New, archived

Headers

From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux@arm.linux.org.uk
Subject: [PATCH v2] ARM: document the use of NEON in kernel mode
Date: Mon, 19 Aug 2013 09:31:44 +0200
Message-Id: <1376897504-24672-1-git-send-email-ard.biesheuvel@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Commit Message

Ard Biesheuvel Aug. 19, 2013, 7:31 a.m. UTC

Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
v2:
updated the NEON intrinsics section to reflect that the type ambiguity issue has
been addressed by patch 'ARM: add workaround for ambiguous C99 stdint.h types'

 Documentation/arm/kernel_mode_neon.txt | 121 +++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)
 create mode 100644 Documentation/arm/kernel_mode_neon.txt

Comments

Ard Biesheuvel Aug. 23, 2013, 1:48 p.m. UTC | #1

No objections apparently, so pushed to Russell's patch tracker [7825/1]

Regards,
Ard.

On 19 August 2013 09:31, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> Reviewed-by: Nicolas Pitre <nico@linaro.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
> v2:
> updated the NEON intrinsics section to reflect that the type ambiguity issue has
> been addressed by patch 'ARM: add workaround for ambiguous C99 stdint.h types'
>
>  Documentation/arm/kernel_mode_neon.txt | 121 +++++++++++++++++++++++++++++++++
>  1 file changed, 121 insertions(+)
>  create mode 100644 Documentation/arm/kernel_mode_neon.txt
>
> diff --git a/Documentation/arm/kernel_mode_neon.txt b/Documentation/arm/kernel_mode_neon.txt
> new file mode 100644
> index 0000000..5254527
> --- /dev/null
> +++ b/Documentation/arm/kernel_mode_neon.txt
> @@ -0,0 +1,121 @@
> +Kernel mode NEON
> +================
> +
> +TL;DR summary
> +-------------
> +* Use only NEON instructions, or VFP instructions that don't rely on support
> +  code
> +* Isolate your NEON code in a separate compilation unit, and compile it with
> +  '-mfpu=neon -mfloat-abi=softfp'
> +* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
> +  NEON code
> +* Don't sleep in your NEON code, and be aware that it will be executed with
> +  preemption disabled
> +
> +
> +Introduction
> +------------
> +It is possible to use NEON instructions (and in some cases, VFP instructions) in
> +code that runs in kernel mode. However, for performance reasons, the NEON/VFP
> +register file is not preserved and restored at every context switch or taken
> +exception like the normal register file is, so some manual intervention is
> +required. Furthermore, special care is required for code that may sleep [i.e.,
> +may call schedule()], as NEON or VFP instructions will be executed in a
> +non-preemptible section for reasons outlined below.
> +
> +
> +Lazy preserve and restore
> +-------------------------
> +The NEON/VFP register file is managed using lazy preserve (on UP systems) and
> +lazy restore (on both SMP and UP systems). This means that the register file is
> +kept 'live', and is only preserved and restored when multiple tasks are
> +contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
> +another core). Lazy restore is implemented by disabling the NEON/VFP unit after
> +every context switch, resulting in a trap when subsequently a NEON/VFP
> +instruction is issued, allowing the kernel to step in and perform the restore if
> +necessary.
> +
> +Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
> +it is required to do an 'eager' preserve of the NEON/VFP register file, and
> +enable the NEON/VFP unit explicitly so no exceptions are generated on first
> +subsequent use. This is handled by the function kernel_neon_begin(), which
> +should be called before any kernel mode NEON or VFP instructions are issued.
> +Likewise, the NEON/VFP unit should be disabled again after use to make sure user
> +mode will hit the lazy restore trap upon next use. This is handled by the
> +function kernel_neon_end().
> +
> +
> +Interruptions in kernel mode
> +----------------------------
> +For reasons of performance and simplicity, it was decided that there shall be no
> +preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
> +implies that interruptions of a kernel mode NEON section can only be allowed if
> +they are guaranteed not to touch the NEON/VFP registers. For this reason, the
> +following rules and restrictions apply in the kernel:
> +* NEON/VFP code is not allowed in interrupt context;
> +* NEON/VFP code is not allowed to sleep;
> +* NEON/VFP code is executed with preemption disabled.
> +
> +If latency is a concern, it is possible to put back to back calls to
> +kernel_neon_end() and kernel_neon_begin() in places in your code where none of
> +the NEON registers are live. (Additional calls to kernel_neon_begin() should be
> +reasonably cheap if no context switch occurred in the meantime)
> +
> +
> +VFP and support code
> +--------------------
> +Earlier versions of VFP (prior to version 3) rely on software support for things
> +like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
> +software assistance, it signals the kernel by raising an undefined instruction
> +exception. The kernel responds by inspecting the VFP control registers and the
> +current instruction and arguments, and emulates the instruction in software.
> +
> +Such software assistance is currently not implemented for VFP instructions
> +executed in kernel mode. If such a condition is encountered, the kernel will
> +fail and generate an OOPS.
> +
> +
> +Separating NEON code from ordinary code
> +---------------------------------------
> +The compiler is not aware of the special significance of kernel_neon_begin() and
> +kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
> +between calls to these respective functions. Furthermore, GCC may generate NEON
> +instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
> +kernel is currently compiled at -O2, future changes may result in NEON/VFP
> +instructions appearing in unexpected places if no special care is taken.
> +
> +Therefore, the recommended and only supported way of using NEON/VFP in the
> +kernel is by adhering to the following rules:
> +* isolate the NEON code in a separate compilation unit and compile it with
> +  '-mfpu=neon -mfloat-abi=softfp';
> +* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
> +  into the unit containing the NEON code from a compilation unit which is *not*
> +  built with the GCC flag '-mfpu=neon' set.
> +
> +As the kernel is compiled with '-msoft-float', the above will guarantee that
> +both NEON and VFP instructions will only ever appear in designated compilation
> +units at any optimization level.
> +
> +
> +NEON assembler
> +--------------
> +NEON assembler is supported with no additional caveats as long as the rules
> +above are followed.
> +
> +
> +NEON code generated by GCC
> +--------------------------
> +The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
> +parallelism, and generates NEON code from ordinary C source code. This is fully
> +supported as long as the rules above are followed.
> +
> +
> +NEON intrinsics
> +---------------
> +NEON intrinsics are also supported. However, as code using NEON intrinsics
> +relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
> +observe the following in addition to the rules above:
> +* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
> +  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
> +  does not supply);
> +* Include <arm_neon.h> last, or at least after <linux/types.h>
> --
> 1.8.1.2
>

diff --git a/Documentation/arm/kernel_mode_neon.txt b/Documentation/arm/kernel_mode_neon.txt
new file mode 100644
index 0000000..5254527
--- /dev/null
+++ b/Documentation/arm/kernel_mode_neon.txt
@@ -0,0 +1,121 @@ 
+Kernel mode NEON
+================
+
+TL;DR summary
+-------------
+* Use only NEON instructions, or VFP instructions that don't rely on support
+  code
+* Isolate your NEON code in a separate compilation unit, and compile it with
+  '-mfpu=neon -mfloat-abi=softfp'
+* Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your
+  NEON code
+* Don't sleep in your NEON code, and be aware that it will be executed with
+  preemption disabled
+
+
+Introduction
+------------
+It is possible to use NEON instructions (and in some cases, VFP instructions) in
+code that runs in kernel mode. However, for performance reasons, the NEON/VFP
+register file is not preserved and restored at every context switch or taken
+exception like the normal register file is, so some manual intervention is
+required. Furthermore, special care is required for code that may sleep [i.e.,
+may call schedule()], as NEON or VFP instructions will be executed in a
+non-preemptible section for reasons outlined below.
+
+
+Lazy preserve and restore
+-------------------------
+The NEON/VFP register file is managed using lazy preserve (on UP systems) and
+lazy restore (on both SMP and UP systems). This means that the register file is
+kept 'live', and is only preserved and restored when multiple tasks are
+contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to
+another core). Lazy restore is implemented by disabling the NEON/VFP unit after
+every context switch, resulting in a trap when subsequently a NEON/VFP
+instruction is issued, allowing the kernel to step in and perform the restore if
+necessary.
+
+Any use of the NEON/VFP unit in kernel mode should not interfere with this, so
+it is required to do an 'eager' preserve of the NEON/VFP register file, and
+enable the NEON/VFP unit explicitly so no exceptions are generated on first
+subsequent use. This is handled by the function kernel_neon_begin(), which
+should be called before any kernel mode NEON or VFP instructions are issued.
+Likewise, the NEON/VFP unit should be disabled again after use to make sure user
+mode will hit the lazy restore trap upon next use. This is handled by the
+function kernel_neon_end().
+
+
+Interruptions in kernel mode
+----------------------------
+For reasons of performance and simplicity, it was decided that there shall be no
+preserve/restore mechanism for the kernel mode NEON/VFP register contents. This
+implies that interruptions of a kernel mode NEON section can only be allowed if
+they are guaranteed not to touch the NEON/VFP registers. For this reason, the
+following rules and restrictions apply in the kernel:
+* NEON/VFP code is not allowed in interrupt context;
+* NEON/VFP code is not allowed to sleep;
+* NEON/VFP code is executed with preemption disabled.
+
+If latency is a concern, it is possible to put back to back calls to
+kernel_neon_end() and kernel_neon_begin() in places in your code where none of
+the NEON registers are live. (Additional calls to kernel_neon_begin() should be
+reasonably cheap if no context switch occurred in the meantime)
+
+
+VFP and support code
+--------------------
+Earlier versions of VFP (prior to version 3) rely on software support for things
+like IEEE-754 compliant underflow handling etc. When the VFP unit needs such
+software assistance, it signals the kernel by raising an undefined instruction
+exception. The kernel responds by inspecting the VFP control registers and the
+current instruction and arguments, and emulates the instruction in software.
+
+Such software assistance is currently not implemented for VFP instructions
+executed in kernel mode. If such a condition is encountered, the kernel will
+fail and generate an OOPS.
+
+
+Separating NEON code from ordinary code
+---------------------------------------
+The compiler is not aware of the special significance of kernel_neon_begin() and
+kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions
+between calls to these respective functions. Furthermore, GCC may generate NEON
+instructions of its own at -O3 level if -mfpu=neon is selected, and even if the
+kernel is currently compiled at -O2, future changes may result in NEON/VFP
+instructions appearing in unexpected places if no special care is taken.
+
+Therefore, the recommended and only supported way of using NEON/VFP in the
+kernel is by adhering to the following rules:
+* isolate the NEON code in a separate compilation unit and compile it with
+  '-mfpu=neon -mfloat-abi=softfp';
+* issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls
+  into the unit containing the NEON code from a compilation unit which is *not*
+  built with the GCC flag '-mfpu=neon' set.
+
+As the kernel is compiled with '-msoft-float', the above will guarantee that
+both NEON and VFP instructions will only ever appear in designated compilation
+units at any optimization level.
+
+
+NEON assembler
+--------------
+NEON assembler is supported with no additional caveats as long as the rules
+above are followed.
+
+
+NEON code generated by GCC
+--------------------------
+The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit
+parallelism, and generates NEON code from ordinary C source code. This is fully
+supported as long as the rules above are followed.
+
+
+NEON intrinsics
+---------------
+NEON intrinsics are also supported. However, as code using NEON intrinsics
+relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should
+observe the following in addition to the rules above:
+* Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC
+  uses its builtin version of <stdint.h> (this is a C99 header which the kernel
+  does not supply);
+* Include <arm_neon.h> last, or at least after <linux/types.h>

[v2] ARM: document the use of NEON in kernel mode

Commit Message

Comments

Patch