[2/2] ARM: move VFP init to an earlier boot stage

Message ID	1369294104-3849-2-git-send-email-ard.biesheuvel@linaro.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> From: Ard Biesheuvel <ard.biesheuvel@linaro.org> To: linux-arm-kernel@lists.infradead.org Subject: [PATCH 2/2] ARM: move VFP init to an earlier boot stage Date: Thu, 23 May 2013 09:28:24 +0200 Message-Id: <1369294104-3849-2-git-send-email-ard.biesheuvel@linaro.org> In-Reply-To: <1369294104-3849-1-git-send-email-ard.biesheuvel@linaro.org> References: <1369294104-3849-1-git-send-email-ard.biesheuvel@linaro.org> summary: Content analysis details: (-1.9 points) pts rule name description ---- ---------------------- -------------------------------------------------- -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>, nico@linaro.org Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Ard Biesheuvel May 23, 2013, 7:28 a.m. UTC

In order to use the NEON/VFP unit in the kernel, we should
initialize it a bit earlier in the boot process so NEON users
that like to do a quick benchmark at load time (like the
xor_blocks or RAID-6 code) find the NEON/VFP unit already
enabled.

Replaced late_initcall() with core_initcall().

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/vfp/vfpmodule.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Russell King - ARM Linux May 23, 2013, 8:41 a.m. UTC | #1

On Thu, May 23, 2013 at 09:28:24AM +0200, Ard Biesheuvel wrote:
> In order to use the NEON/VFP unit in the kernel, we should
> initialize it a bit earlier in the boot process so NEON users
> that like to do a quick benchmark at load time (like the
> xor_blocks or RAID-6 code) find the NEON/VFP unit already
> enabled.

So what are the results of those benchmarks that the RAID-6 code does?

This isn't something which will get accepted without there being a very
strong case for it; firstly, we've always said "no FP in the kernel".
Secondly, it makes it too easy for people to start thinking that FP is
safe in the kernel.  It isn't - this will go horribly wrong if the VFP
hardware bounces an instruction to the support code.

Lastly, what happens if you sleep between kernel_vfp_start() and
kernel_vfp_end() ?  That's a context switch which will mess up your
VFP state.  I think you need to take a spinlock (not _bh nor _irqsave
versions) in here to ensure that people don't get the idea that they
_can_ sleep between these two.

So, good attempt, but I think you need to think about this more, and
also justify the need for it.

Ard Biesheuvel May 23, 2013, 9:57 a.m. UTC | #2

On 23 May 2013 10:41, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> On Thu, May 23, 2013 at 09:28:24AM +0200, Ard Biesheuvel wrote:
>> In order to use the NEON/VFP unit in the kernel, we should
>> initialize it a bit earlier in the boot process so NEON users
>> that like to do a quick benchmark at load time (like the
>> xor_blocks or RAID-6 code) find the NEON/VFP unit already
>> enabled.
>
> So what are the results of those benchmarks that the RAID-6 code does?
>

Output captured from a Cortex-A15 @ 1.7 GHz:

    xor: measuring software checksum speed
       arm4regs  :  2261.600 MB/sec
       8regs     :  1771.600 MB/sec
       32regs    :  1441.600 MB/sec
       neon      :  3619.600 MB/sec
    xor: using function: neon (3619.600 MB/sec)

Output captured from a Cortex-A15 @ 1.7 GHz:

    raid6: int32x1    200 MB/s
    raid6: int32x2    304 MB/s
    raid6: int32x4    388 MB/s
    raid6: int32x8    423 MB/s
    raid6: neonx1     799 MB/s
    raid6: neonx2    1364 MB/s
    raid6: neonx4    1731 MB/s
    raid6: neonx8    1676 MB/s
    raid6: using algorithm neonx4 (1731 MB/s)
    raid6: using intx1 recovery algorithm

I can put up both patches for RFC if you like.

> This isn't something which will get accepted without there being a very
> strong case for it; firstly, we've always said "no FP in the kernel".
> Secondly, it makes it too easy for people to start thinking that FP is
> safe in the kernel.  It isn't - this will go horribly wrong if the VFP
> hardware bounces an instruction to the support code.
>

That is clear. In the two cases I have been playing with, the
begin/end calls are in distinct compilation units from the actual
functions containing the NEON code. Perhaps we should enforce this by
conditionally turning kernel_vfp_begin() into a macro which errors out
(i.e., if __ARM_NEON__ is defined)?

> Lastly, what happens if you sleep between kernel_vfp_start() and
> kernel_vfp_end() ?  That's a context switch which will mess up your
> VFP state.  I think you need to take a spinlock (not _bh nor _irqsave
> versions) in here to ensure that people don't get the idea that they
> _can_ sleep between these two.
>

So generally, sleeping under preempt_disable() (or get cpu()) in this
case) is not currently handled in that way?
That is a misunderstanding on my part, I will address it in the code.

Regards,

Nicolas Pitre May 23, 2013, 3:23 p.m. UTC | #3

On Thu, 23 May 2013, Russell King - ARM Linux wrote:

> On Thu, May 23, 2013 at 09:28:24AM +0200, Ard Biesheuvel wrote:
> > In order to use the NEON/VFP unit in the kernel, we should
> > initialize it a bit earlier in the boot process so NEON users
> > that like to do a quick benchmark at load time (like the
> > xor_blocks or RAID-6 code) find the NEON/VFP unit already
> > enabled.
> 
> So what are the results of those benchmarks that the RAID-6 code does?
> 
> This isn't something which will get accepted without there being a very
> strong case for it; firstly, we've always said "no FP in the kernel".

That was my opinion as well.  However the numbers are compelling.

And X86 and PPC do use FP in the kernel already for very specific tasks 
because of that.

> Secondly, it makes it too easy for people to start thinking that FP is
> safe in the kernel.  It isn't - this will go horribly wrong if the VFP
> hardware bounces an instruction to the support code.

Hmmm. good point.  We probably should set a flag in kernel_vfp_begin() 
and BUG() if the support code is invoked while that flag is set.

> Lastly, what happens if you sleep between kernel_vfp_start() and
> kernel_vfp_end() ?  That's a context switch which will mess up your
> VFP state.  I think you need to take a spinlock (not _bh nor _irqsave
> versions) in here to ensure that people don't get the idea that they
> _can_ sleep between these two.

Isn't preemption off sufficient to flag that case?  The scheduler is 
very noisy when schedule() is called with preemption disabled.

Nicolas

Ard Biesheuvel May 23, 2013, 8:47 p.m. UTC | #4

On 23 May 2013 17:23, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Thu, 23 May 2013, Russell King - ARM Linux wrote:
>
>> Secondly, it makes it too easy for people to start thinking that FP is
>> safe in the kernel.  It isn't - this will go horribly wrong if the VFP
>> hardware bounces an instruction to the support code.
>
> Hmmm. good point.  We probably should set a flag in kernel_vfp_begin()
> and BUG() if the support code is invoked while that flag is set.
>

Currently, the lazy restore is armed even in kernel mode, so it pulls
in the userland context if you issue NEON/VFP instructions in kernel
mode with the FP unit off. This is probably a side effect of the VFP
detection code relying on this code path as well.
As kernel_vfp_begin() disarms the lazy restore and takes care of
preserving the userland context itself, entering __und_svc because of
NEON/VFP instructions should only be allowed in this particular case,
i.e., VFP detection and all other instances should BUG() imho
(regardless of if we end up doing NEON in the kernel or not). Doing it
that way, I don't think there is a reason for keeping additional
flags.

>> Lastly, what happens if you sleep between kernel_vfp_start() and
>> kernel_vfp_end() ?  That's a context switch which will mess up your
>> VFP state.  I think you need to take a spinlock (not _bh nor _irqsave
>> versions) in here to ensure that people don't get the idea that they
>> _can_ sleep between these two.
>
> Isn't preemption off sufficient to flag that case?  The scheduler is
> very noisy when schedule() is called with preemption disabled.
>

That was my understanding as well.

Nicolas Pitre May 23, 2013, 8:57 p.m. UTC | #5

On Thu, 23 May 2013, Ard Biesheuvel wrote:

> On 23 May 2013 17:23, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Thu, 23 May 2013, Russell King - ARM Linux wrote:
> >
> >> Secondly, it makes it too easy for people to start thinking that FP is
> >> safe in the kernel.  It isn't - this will go horribly wrong if the VFP
> >> hardware bounces an instruction to the support code.
> >
> > Hmmm. good point.  We probably should set a flag in kernel_vfp_begin()
> > and BUG() if the support code is invoked while that flag is set.
> >
> 
> Currently, the lazy restore is armed even in kernel mode, so it pulls
> in the userland context if you issue NEON/VFP instructions in kernel
> mode with the FP unit off. This is probably a side effect of the VFP
> detection code relying on this code path as well.
> As kernel_vfp_begin() disarms the lazy restore and takes care of
> preserving the userland context itself, entering __und_svc because of
> NEON/VFP instructions should only be allowed in this particular case,
> i.e., VFP detection and all other instances should BUG() imho
> (regardless of if we end up doing NEON in the kernel or not). Doing it
> that way, I don't think there is a reason for keeping additional
> flags.

The concern is more about VFP instructions that could trigger an 
exception where support code is invoked to emulate that instruction.


Nicolas

Ard Biesheuvel May 24, 2013, 8:01 a.m. UTC | #6

On 23 May 2013 22:57, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Thu, 23 May 2013, Ard Biesheuvel wrote:
>> Currently, the lazy restore is armed even in kernel mode, so it pulls
>> in the userland context if you issue NEON/VFP instructions in kernel
>> mode with the FP unit off. This is probably a side effect of the VFP
>> detection code relying on this code path as well.
>> As kernel_vfp_begin() disarms the lazy restore and takes care of
>> preserving the userland context itself, entering __und_svc because of
>> NEON/VFP instructions should only be allowed in this particular case,
>> i.e., VFP detection and all other instances should BUG() imho
>> (regardless of if we end up doing NEON in the kernel or not). Doing it
>> that way, I don't think there is a reason for keeping additional
>> flags.
>
> The concern is more about VFP instructions that could trigger an
> exception where support code is invoked to emulate that instruction.
>

So perhaps that is where we should draw the line: I am not proposing
supporting arbitrary VFP code in the kernel, just NEON (which does not
bounce and hence does not require the support code). Maybe I should
rename to kernel_neon_begin/kernel_neon_end to reflect that?

Regards,
Ard.

[2/2] ARM: move VFP init to an earlier boot stage

Commit Message

Comments

Patch