[RT,v2] arm64: fpsimd: use a local_lock() in addition to local_bh_disable()

Message ID 20180713174937.5ddaqpylalcmc3jq@linutronix.de (mailing list archive)
State New, archived

Commit Message

Sebastian Andrzej Siewior July 13, 2018, 5:49 p.m. UTC
In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
code disables BH and expects that it is not preemptible. On -RT the
task remains preemptible but stays on the same CPU. This may corrupt the
content of the SIMD registers if the task is preempted during
saving/restoring those registers.
Add a local lock around this code. This ensures that no function
within the locked section is invoked concurrently on the same CPU.

kernel_neon_begin() can't be kept preemptible. If the task switch noticed
TIF_FOREIGN_FPSTATE it would restore the task's SIMD state and we would lose
the state of the registers used for in-kernel work. Avoiding that would require
additional storage for an in-kernel copy of the registers. The NEON crypto
code checks the need-resched flag between chunks, so it shouldn't be that bad.
The preempt_disable() avoids a context switch while the kernel uses the SIMD
registers. Unfortunately we have to balance out the migrate_disable() counter
because local_lock_bh() is invoked in a different context compared to its
unlock counterpart.

__efi_fpsimd_begin() should not use kernel_fpu_begin() due to its
preempt_disable() context and instead save the registers always in its
extra spot on RT.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---

This seems to work (crypto chacha20-neon + cyclictest). I have no
EFI so I have no clue whether saving SIMD state around EFI calls works.

 arch/arm64/kernel/fpsimd.c |   47 ++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

Comments

Mike Galbraith July 13, 2018, 10:03 p.m. UTC | #1
On Fri, 2018-07-13 at 19:49 +0200, Sebastian Andrzej Siewior wrote:
> In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
> code disables BH and expects that it is not preemptible. On -RT the
> task remains preemptible but remains the same CPU. This may corrupt the
> content of the SIMD registers if the task is preempted during
> saving/restoring those registers.
> Add a locallock around this process. This avoids that the any function
> within the locallock block is invoked more than once on the same CPU.
> 
> The kernel_neon_begin() can't be kept preemptible. If the task-switch notices
> TIF_FOREIGN_FPSTATE then it would restore task's SIMD state and we lose the
> state of registers used for in-kernel-work. We would require additional storage
> for the in-kernel copy of the registers. But then the NEON-crypto checks for
> the need-resched flag so it shouldn't that bad.
> The preempt_disable() avoids the context switch while the kernel uses the SIMD
> registers. Unfortunately we have to balance out the migrate_disable() counter
> because local_lock_bh() is invoked in different context compared to its unlock
> counterpart.
> 
> __efi_fpsimd_begin() should not use kernel_fpu_begin() due to its
> preempt_disable() context and instead save the registers always in its
> extra spot on RT.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> 
> This seems to make work (crypto chacha20-neon + cyclictest). I have no
> EFI so I have no clue if saving SIMD while calling to EFI works.

All is not well on cavium test box.  I'm seeing random errors ala...

./include/linux/fs.h:3137:11: internal compiler error: Segmentation fault
./include/linux/bio.h:175:1: internal compiler error: in grokdeclarator, at c/c-decl.c:7023

...during make -j96 (2*cpus) kbuild.  Turns out 4.14-rt has this issue
as well, which is unsurprising if it's related to fpsimd woes.  Box
does not exhibit the issue with NONRT kernels, PREEMPT or NOPREEMPT.

To file under FWIW: an arm64-configured SLE15-RT, 4.12 based kernel
containing virgin @stable arch/arm64/kernel/fpsimd.c, does not exhibit
the problem. (Relevant? Dunno, it may be unrelated to fpsimd.c.)

	-Mike
Dave Martin July 16, 2018, 3:17 p.m. UTC | #2
On Fri, Jul 13, 2018 at 07:49:38PM +0200, Sebastian Andrzej Siewior wrote:
> In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
> code disables BH and expects that it is not preemptible. On -RT the
> task remains preemptible but remains the same CPU. This may corrupt the
> content of the SIMD registers if the task is preempted during
> saving/restoring those registers.
> Add a locallock around this process. This avoids that the any function
> within the locallock block is invoked more than once on the same CPU.

The overall shape of this looks reasonable -- I have some comments and
concerns though, below.

I'm to blame for much of the current shape of this code, so please Cc
me on reposts.

> The kernel_neon_begin() can't be kept preemptible. If the task-switch notices
> TIF_FOREIGN_FPSTATE then it would restore task's SIMD state and we lose the
> state of registers used for in-kernel-work. We would require additional storage

Are you trying to say that a kernel_neon_begin()...kernel_neon_end()
critical section can't be preemptible?

Whether kernel_neon_begin() itself is preemptible is more of an
implementation detail.

> for the in-kernel copy of the registers. But then the NEON-crypto checks for
> the need-resched flag so it shouldn't that bad.

This sounds like an (understandably) confused description of how the
existing code is intended to work, rather than what the patch does.

Can we say something along the lines of:

"kernel_neon_begin() ... kernel_neon_end() are intended to delimit a
critical section where the executing context has exclusive access to
the cpu's FPSIMD/SVE registers.  This means that we must not allow
preemption by another context or migration to another cpu during this
region [...]"

(Not trying to be pedantic here: this -rt patch should help to clarify
the intended semantics of the code here beyond what is currently
explicit.  What I truly intended is less clear than I'd like... even to
me.)

> The preempt_disable() avoids the context switch while the kernel uses the SIMD
> registers. Unfortunately we have to balance out the migrate_disable() counter
> because local_lock_bh() is invoked in different context compared to its unlock
> counterpart.
> 
> __efi_fpsimd_begin() should not use kernel_fpu_begin() due to its

Confused -- is this a typo for "kernel_neon_begin()"?

> preempt_disable() context and instead save the registers always in its
> extra spot on RT.

I'm not sure this works.  Some EFI runtime service calls are interruptible,
so we arrange here to stash off the state as normal if an EFI call is
made from an interruptible context; otherwise we stash in a dedicated
percpu side buffer and restore eagerly in __efi_fpsimd_end().

If we now unconditionally use the side-buffer with
IS_ENABLED(CONFIG_PREEMPT_RT_BASE), I think a nested EFI call will save
state over the outer context's saved state, leading to corruption of the
vector registers.

TBH, I think EFI calls need to be non-preemptible and non-migratable.
I doubt anything else will conform to the EFI spec.

IIRC EFI runtime services are not specified to be in any way real-time,
but I don't think there's a lot we can do about that.

> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> 
> This seems to make work (crypto chacha20-neon + cyclictest). I have no
> EFI so I have no clue if saving SIMD while calling to EFI works.
> 
>  arch/arm64/kernel/fpsimd.c |   47 ++++++++++++++++++++++++++-------------------
>  1 file changed, 28 insertions(+), 19 deletions(-)
> 
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -38,6 +38,7 @@
>  #include <linux/signal.h>
>  #include <linux/slab.h>
>  #include <linux/sysctl.h>
> +#include <linux/locallock.h>
>  
>  #include <asm/fpsimd.h>
>  #include <asm/cputype.h>
> @@ -235,7 +236,7 @@ static void sve_user_enable(void)
>   *    whether TIF_SVE is clear or set, since these are not vector length
>   *    dependent.
>   */
> -
> +static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);

Does this really disable IRQs, and if so, why?

If so, this creates an IRQ blackout around task_fpsimd_save(), which is
something we explicitly tried to avoid in the original design.  It can
be costly.

Also, this belongs alongside the declaration for fpsimd_last_state,
since that's (part of) what the lock protects.

>  /*
>   * Update current's FPSIMD/SVE registers from thread_struct.
>   *
> @@ -594,7 +595,7 @@ int sve_set_vector_length(struct task_st
>  	 * non-SVE thread.
>  	 */
>  	if (task == current) {
> -		local_bh_disable();
> +		local_lock_bh(fpsimd_lock);
>  
>  		task_fpsimd_save();
>  		set_thread_flag(TIF_FOREIGN_FPSTATE);
> @@ -605,7 +606,7 @@ int sve_set_vector_length(struct task_st
>  		sve_to_fpsimd(task);
>  
>  	if (task == current)
> -		local_bh_enable();
> +		local_unlock_bh(fpsimd_lock);
>  
>  	/*
>  	 * Force reallocation of task SVE state to the correct size
> @@ -837,7 +838,7 @@ asmlinkage void do_sve_acc(unsigned int
>  
>  	sve_alloc(current);
>  
> -	local_bh_disable();
> +	local_lock_bh(fpsimd_lock);
>  
>  	task_fpsimd_save();
>  	fpsimd_to_sve(current);
> @@ -849,7 +850,7 @@ asmlinkage void do_sve_acc(unsigned int
>  	if (test_and_set_thread_flag(TIF_SVE))
>  		WARN_ON(1); /* SVE access shouldn't have trapped */
>  
> -	local_bh_enable();
> +	local_unlock_bh(fpsimd_lock);
>  }
>  
>  /*
> @@ -925,7 +926,7 @@ void fpsimd_flush_thread(void)
>  	if (!system_supports_fpsimd())
>  		return;
>  
> -	local_bh_disable();
> +	local_lock_bh(fpsimd_lock);
>  
>  	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
>  	fpsimd_flush_task_state(current);
> @@ -967,7 +968,7 @@ void fpsimd_flush_thread(void)
>  
>  	set_thread_flag(TIF_FOREIGN_FPSTATE);
>  
> -	local_bh_enable();
> +	local_unlock_bh(fpsimd_lock);
>  }
>  
>  /*
> @@ -979,9 +980,9 @@ void fpsimd_preserve_current_state(void)
>  	if (!system_supports_fpsimd())
>  		return;
>  
> -	local_bh_disable();
> +	local_lock_bh(fpsimd_lock);
>  	task_fpsimd_save();
> -	local_bh_enable();
> +	local_unlock_bh(fpsimd_lock);
>  }
>  
>  /*
> @@ -1021,14 +1022,14 @@ void fpsimd_restore_current_state(void)
>  	if (!system_supports_fpsimd())
>  		return;
>  
> -	local_bh_disable();
> +	local_lock_bh(fpsimd_lock);
>  
>  	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>  		task_fpsimd_load();
>  		fpsimd_bind_to_cpu();
>  	}
>  
> -	local_bh_enable();
> +	local_unlock_bh(fpsimd_lock);
>  }
>  
>  /*
> @@ -1041,7 +1042,7 @@ void fpsimd_update_current_state(struct
>  	if (!system_supports_fpsimd())
>  		return;
>  
> -	local_bh_disable();
> +	local_lock_bh(fpsimd_lock);
>  
>  	current->thread.fpsimd_state.user_fpsimd = *state;
>  	if (system_supports_sve() && test_thread_flag(TIF_SVE))
> @@ -1052,7 +1053,7 @@ void fpsimd_update_current_state(struct
>  	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE))
>  		fpsimd_bind_to_cpu();
>  
> -	local_bh_enable();
> +	local_unlock_bh(fpsimd_lock);
>  }
>  
>  /*
> @@ -1115,7 +1116,7 @@ void kernel_neon_begin(void)
>  
>  	BUG_ON(!may_use_simd());
>  
> -	local_bh_disable();
> +	local_lock_bh(fpsimd_lock);
>  
>  	__this_cpu_write(kernel_neon_busy, true);
>  
> @@ -1129,8 +1130,14 @@ void kernel_neon_begin(void)
>  	fpsimd_flush_cpu_state();
>  
>  	preempt_disable();
> -
> -	local_bh_enable();
> +	/*
> +	 * ballance atomic vs !atomic context of migrate_disable().
> +	 * local_lock_bh = get_local_var() + spin_lock_bh (2x migrate_disable)
> +	 */
> +	migrate_disable();
> +	migrate_disable();
> +	migrate_disable();
> +	local_unlock_bh(fpsimd_lock);

This looks fragile.

Do we really need to do it 3 times?

Showing my ignorance here, but I'd naively expect that the migrate-
disableness changes as follows:

	/* 0 */
	local_lock_bh(); /* +3 */

	...

	preempt_disable(); /* +3 */

	migrate_disable(); /* +4 */
	migrate_disable(); /* +5 */
	migrate_disable(); /* +6 */

	local_unlock_bh(); /* +3 */


If so, I don't understand why it's not OK to call migrate_disable()
just once, leaving the count at +1  (?)

I'm making assumptions about the relationship between preempt_disable()
and migrate_disable() here.

>  }
>  EXPORT_SYMBOL(kernel_neon_begin);
>  
> @@ -1154,6 +1161,10 @@ void kernel_neon_end(void)
>  	WARN_ON(!busy);	/* No matching kernel_neon_begin()? */
>  
>  	preempt_enable();
> +	/* balance migrate_disable(). See kernel_neon_begin() */
> +	migrate_enable();
> +	migrate_enable();
> +	migrate_enable();
>  }
>  EXPORT_SYMBOL(kernel_neon_end);
>  
> @@ -1185,9 +1196,7 @@ void __efi_fpsimd_begin(void)
>  	if (!system_supports_fpsimd())
>  		return;
>  
> -	WARN_ON(preemptible());
> -
> -	if (may_use_simd()) {
> +	if (!IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && may_use_simd()) {

I suspect this is wrong -- see comments on the commit message.

>  		kernel_neon_begin();
>  	} else {

[...]

Cheers
---Dave
Sebastian Andrzej Siewior July 18, 2018, 9:24 a.m. UTC | #3
On 2018-07-18 11:12:10 [+0200], To Dave Martin wrote:
> > > -	if (may_use_simd()) {
> > > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && may_use_simd()) {
> > 
> > I suspect this is wrong -- see comments on the commit message.

I'm sorry, I pressed send too early, I was aiming for the draft folder.
So yes, based on that EFI that might be interruptible, let me try to
look at the initial issue again and maybe I get another idea how to deal
with this.
One question: If EFI is interruptible that means, we call into EFI - how
do we get out? Does EFI enable interrupts and the kernel receives an
interrupt and treats this EFI call like a regular context switch?

> > >  		kernel_neon_begin();
> > >  	} else {
> > 
> > [...]
> > 
> > Cheers
> > ---Dave

Sebastian
Sebastian Andrzej Siewior July 18, 2018, 9:27 a.m. UTC | #4
On 2018-07-14 00:03:44 [+0200], Mike Galbraith wrote:
> > This seems to make work (crypto chacha20-neon + cyclictest). I have no
> > EFI so I have no clue if saving SIMD while calling to EFI works.
> 
> All is not well on cavium test box.  I'm seeing random errors ala...
> 
> ./include/linux/fs.h:3137:11: internal compiler error: Segmentation fault
> ./include/linux/bio.h:175:1: internal compiler error: in grokdeclarator, at c/c-decl.c:7023
> 
> ...during make -j96 (2*cpus) kbuild.  Turns out 4.14-rt has this issue
> as well, which is unsurprising if it's related to fpsimd woes.  Box
> does not exhibit the issue with NONRT kernels, PREEMPT or NOPREEMPT.
> 
> To file under FWIW, arm64 configured SLE15-RT, 4.12 based kernel
> containing virgin @stable arch/arm64/kernel/fpsimd.c, does not exhibit
> the problem. (relevant? dunno, it may be unrelated to fpsimd.c).

Okay, so you did not test this because you can't compile. But the
"internal compiler error" is usually something the gcc folks are
interested in :)

> 	-Mike

Sebastian
Mike Galbraith July 18, 2018, 10:28 a.m. UTC | #5
On Wed, 2018-07-18 at 11:27 +0200, Sebastian Andrzej Siewior wrote:
> On 2018-07-14 00:03:44 [+0200], Mike Galbraith wrote:
> > > This seems to make work (crypto chacha20-neon + cyclictest). I have no
> > > EFI so I have no clue if saving SIMD while calling to EFI works.
> > 
> > All is not well on cavium test box.  I'm seeing random errors ala...
> > 
> > ./include/linux/fs.h:3137:11: internal compiler error: Segmentation fault
> > ./include/linux/bio.h:175:1: internal compiler error: in grokdeclarator, at c/c-decl.c:7023
> > 
> > ...during make -j96 (2*cpus) kbuild.  Turns out 4.14-rt has this issue
> > as well, which is unsurprising if it's related to fpsimd woes.  Box
> > does not exhibit the issue with NONRT kernels, PREEMPT or NOPREEMPT.
> > 
> > To file under FWIW, arm64 configured SLE15-RT, 4.12 based kernel
> > containing virgin @stable arch/arm64/kernel/fpsimd.c, does not exhibit
> > the problem. (relevant? dunno, it may be unrelated to fpsimd.c).
> 
> Okay, so you did not test this because you can't compile.

Nope, the running kernel, the one that is doing the segfaulting etc,
has the patches applied.

It is exhibiting that symptom because those patches do not cure this
symptom, one which I verified to be present in virgin 4.14-rt as well. 
The pseudo-patch I sent, disabling preemption where it is assumed to be
disabled instead, does cure it.  With preemption so disabled, I can
beat on affected kernels (>=4.14-rt) as long as I like.

This particular 48-core Cavium is very slow, maybe that makes it easier
to reproduce, dunno.  According to pipe-test, the thing is essentially
a dozen RPis super-glued together.  pipe-test pinned to a single core
can only context switch at ~40KHz with PREEMPT_RT, or ~90KHz with
NOPREEMPT, comparable to measurements done on a real RPi.

	-Mike
Sebastian Andrzej Siewior July 18, 2018, 10:36 a.m. UTC | #6
On 2018-07-18 12:28:48 [+0200], Mike Galbraith wrote:
> > Okay, so you did not test this because you can't compile.
> 
> Nope, the running kernel, the one that is doing the segfaulting etc,
> has the patches applied.
> 
> It is exhibiting that symptom because those patches do not cure this
> symptom, one which I verified to be present in virgin 4.14-rt as well. 
> The pseudo-patch I sent, disabling preemption where it is assumed to be
> disabled instead, does cure it.  With preemption so disabled, I can
> beat on affected kernels (>=4.14-rt) as long as I like.

ah. So gcc segfaulting while running on the patched kernel is the
symptom, rather than gcc failing to build the patch. Okay. Let me stare
at this a little more…

> This particular 48 core Cavium is very slow, maybe that makes it easier
> to reproduce, dunno.  According to pipe-test, the thing is essentially
> a dozen RPi super-glued together.  pipe-test pinned to a single core
> can only context switch at ~40KHz with PREEMPT_RT, or ~90 with
> NOPREEMPT, comparable to measurement done in real deal RPi.
> 
> 	-Mike

Sebastian
Steven Rostedt July 24, 2018, 1:46 p.m. UTC | #7
On Wed, 18 Jul 2018 11:12:10 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> > > @@ -1115,7 +1116,7 @@ void kernel_neon_begin(void)
> > >  
> > >  	BUG_ON(!may_use_simd());
> > >  
> > > -	local_bh_disable();
> > > +	local_lock_bh(fpsimd_lock);
> > >  
> > >  	__this_cpu_write(kernel_neon_busy, true);
> > >  
> > > @@ -1129,8 +1130,14 @@ void kernel_neon_begin(void)
> > >  	fpsimd_flush_cpu_state();
> > >  
> > >  	preempt_disable();
> > > -
> > > -	local_bh_enable();
> > > +	/*
> > > +	 * ballance atomic vs !atomic context of migrate_disable().
> > > +	 * local_lock_bh = get_local_var() + spin_lock_bh (2x migrate_disable)
> > > +	 */
> > > +	migrate_disable();
> > > +	migrate_disable();
> > > +	migrate_disable();
> > > +	local_unlock_bh(fpsimd_lock);  
> > 
> > This looks fragile.
> > 
> > Do we really need to do it 3 times?  
> 
> Unfortunately yes.

Then we need to find another solution, because this is way too ugly and
as Dave said, fragile to keep.


> 
> > Showing my ignorance here, but I'd naively expect that the migrate-
> > disableness changes as follows:
> > 
> > 	/* 0 */
> > 	local_lock_bh(); /* +3 */
> > 
> > 	...
> > 
> > 	preempt_disable(); /* +3 */
> > 
> > 	migrate_disable(); /* +4 */
> > 	migrate_disable(); /* +5 */
> > 	migrate_disable(); /* +6 */
> > 
> > 	local_unlock_bh(); /* +3 */
> > 
> > 
> > If so, I don't understand why it's not OK to call migrate_disable()
> > just once, leaving the count at +1  (?)
> > 
> > I'm making assumptions about the relationship between preempt_disable()
> > and migrate_disable() here.  
> 
> Exactly. So local_lock_bh() does +3 which I try to explain why it is 3.

How does local_lock_bh() do a +3? I'm not seeing it in the code. And
leaking the internal implementation of local_lock_bh() into other parts
of the kernel is a no-no.

> Now migrate_disable() has two counters: One for atomic context (irq or
> preemption disabled) and on for non-atomic context. The difference is
> that in atomic context migrate_disable() does nothing and in !atomic
> context it does other things which can't be done in atomic context
> anyway (I can explain this in full detail but you may lose interest so I
> keep it short). To update your example:
> 
> 	/* ATOMIC COUNTER | non-ATOMIC COUNTER  | change */
> 			 /* 0 | 0 | - */
> 	local_lock_bh(); /* 0 | 3 | N+3 */
>  
>  	...
>  
> 	preempt_disable(); /* atomic context */
>  
> 	migrate_disable(); /* 1 | 3 | A+1 */
> 	migrate_disable(); /* 2 | 3 | A+1 */
> 	migrate_disable(); /* 3 | 3 | A+1 */
>  
> 	local_unlock_bh(); /* 0 | 3 | A-3 */
> 
> /* later */
> 	preempt_enable();  /* non-atomic context */
> 	migrate_enable();  /* 0 | 2 | N-1 */
> 	migrate_enable();  /* 0 | 1 | N-1 */
> 	migrate_enable();  /* 0 | 0 | N-1 */

If anything, this needs a wrapper. local_lock_preempt_fixup() ?

-- Steve

> 
> > >  }
> > >  EXPORT_SYMBOL(kernel_neon_begin);
> > >  
> > > @@ -1154,6 +1161,10 @@ void kernel_neon_end(void)
> > >  	WARN_ON(!busy);	/* No matching kernel_neon_begin()? */
> > >  
> > >  	preempt_enable();
> > > +	/* balance migrate_disable(). See kernel_neon_begin() */
> > > +	migrate_enable();
> > > +	migrate_enable();
> > > +	migrate_enable();
> > >  }
> > >  EXPORT_SYMBOL(kernel_neon_end);
> > >  
> > > @@ -1185,9 +1196,7 @@ void __efi_fpsimd_begin(void)
> > >  	if (!system_supports_fpsimd())
> > >  		return;
> > >  
> > > -	WARN_ON(preemptible());
> > > -
> > > -	if (may_use_simd()) {
> > > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && may_use_simd()) {  
> > 
> > I suspect this is wrong -- see comments on the commit message.
> >   
> > >  		kernel_neon_begin();
> > >  	} else {  
> > 
> > [...]
> > 
> > Cheers
> > ---Dave
Sebastian Andrzej Siewior July 24, 2018, 1:57 p.m. UTC | #8
On 2018-07-24 09:46:23 [-0400], Steven Rostedt wrote:
> > Unfortunately yes.
> 
> Then we need to find another solution, because this is way too ugly and
> as Dave said, fragile to keep.

Yes. I have something new where Mike said it works (while this causes
Mike's gcc to segfault). Need to test this myself…

> How does local_lock_bh() do a +3 (not seeing it in the code). And

get_local_var() +1
spin_lock_bh() +2 because
	local_bh_disable() +1
	spin_lock() +1

Sebastian
Dave Martin July 24, 2018, 2:45 p.m. UTC | #9
On Wed, Jul 18, 2018 at 11:24:48AM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-07-18 11:12:10 [+0200], To Dave Martin wrote:
> > > > -	if (may_use_simd()) {
> > > > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && may_use_simd()) {
> > > 
> > > I suspect this is wrong -- see comments on the commit message.
> 
> I'm sorry, I pressed send too early, I was aiming for the draft folder.
> So yes, based on that EFI that might be interruptible, let me try to
> look at the initial issue again and maybe I get another idea how to deal
> with this.
> One question: If EFI is interruptible that means, we call into EFI - how
> do we get out? Does EFI enable interrupts and the kernel receives an
> interrupt and treats this EFI call like a regular context switch?

AFAIK the only safe way to get out permanently is for the call to
return.  Note, I've not gone through the spec in fine detail myself.

The OS may handle interrupts occurring during the EFI call, but we
still have to return to EFI afterwards to finish off the call.  From
the Linux perspective, I think this means that EFI calls are non-
preemptible.

Under RT, I'm pretty sure that we can't safely resume the interrupted
EFI call on a different cpu from the one it was interrupted on.  Even
if it doesn't say this explicitly in the UEFI spec, I think it will be
assumed in implementations.


Certain EFI calls are not long-running and may need to be called from
interrupt context in Linux, which means that there may be live kernel-
mode NEON state.  This is why there are separate FPSIMD/SVE percpu stash
buffers for EFI specifically.

Does this make sense?  It is probably not very clear, but I'm trying
to hide the fact that I haven't looked at the UEFI spec for ages...

Cheers
---Dave
Ard Biesheuvel July 24, 2018, 3:15 p.m. UTC | #10
On 24 July 2018 at 16:45, Dave Martin <Dave.Martin@arm.com> wrote:
> On Wed, Jul 18, 2018 at 11:24:48AM +0200, Sebastian Andrzej Siewior wrote:
>> On 2018-07-18 11:12:10 [+0200], To Dave Martin wrote:
>> > > > -       if (may_use_simd()) {
>> > > > +       if (!IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && may_use_simd()) {
>> > >
>> > > I suspect this is wrong -- see comments on the commit message.
>>
>> I'm sorry, I pressed send too early, I was aiming for the draft folder.
>> So yes, based on that EFI that might be interruptible, let me try to
>> look at the initial issue again and maybe I get another idea how to deal
>> with this.
>> One question: If EFI is interruptible that means, we call into EFI - how
>> do we get out? Does EFI enable interrupts and the kernel receives an
>> interrupt and treats this EFI call like a regular context switch?
>
> AFAIK the only safe way to get out permanently is for the call to
> return.  Note, I've not gone through the spec in fine detail myself.
>
> The OS may handle interrupts occurring during the EFI call, but we
> still have to return to EFI afterwards to finish off the call.  From
> the Linux perspective, I think this means that EFI calls are non-
> preemptible.
>
> Under RT, I'm pretty sure that we can't safely resume the interrupted
> EFI call on a different cpu from the one it was interrupted on.  Even
> if it doesn't say this explicitly in the UEFI spec, I think it will be
> assumed in implementations.
>

This is a can of worms I would rather not open, although I don't think
the UEFI spec makes it explicit that you cannot migrate runtime
service calls while in progress.

Also, I don't think EFI calls are worth obsessing about, given that
they shouldn't be that common under normal operation. I know that RT
is not about the common case but about the worst case, though. What
problem is migrating a non-preemptible task intended to solve?

>
> Certain EFI calls are not long-running and may need to be called from
> interrupt context in Linux,

This suggests that the need to be called from interrupt context is a
property of the firmware implementation, but it is not.

In the kernel, we have efi-pstore which may attempt to invoke runtime
services to record the kernel's dying gasp into a EFI variable. Other
than that, I don't think there are any reasons to call EFI services
from non-process context.

> which means that there may be live kernel-
> mode NEON state.  This is why there are separate FPSIMD/SVE percpu stash
> buffers for EFI specifically.
>

So given the above, and the fact that you can panic() in an interrupt
handler, we needed those buffers, although I wonder if we'd ever reach
the point where we end up resuming execution of the code that uses the
restored contents of those buffers.

> Does this make sense?  It is probably not very clear, but I'm trying
> to hide the fact that I haven't looked at the UEFI spec for ages...
>
> Cheers
> ---Dave
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Patch

--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -38,6 +38,7 @@ 
 #include <linux/signal.h>
 #include <linux/slab.h>
 #include <linux/sysctl.h>
+#include <linux/locallock.h>
 
 #include <asm/fpsimd.h>
 #include <asm/cputype.h>
@@ -235,7 +236,7 @@  static void sve_user_enable(void)
  *    whether TIF_SVE is clear or set, since these are not vector length
  *    dependent.
  */
-
+static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
 /*
  * Update current's FPSIMD/SVE registers from thread_struct.
  *
@@ -594,7 +595,7 @@  int sve_set_vector_length(struct task_st
 	 * non-SVE thread.
 	 */
 	if (task == current) {
-		local_bh_disable();
+		local_lock_bh(fpsimd_lock);
 
 		task_fpsimd_save();
 		set_thread_flag(TIF_FOREIGN_FPSTATE);
@@ -605,7 +606,7 @@  int sve_set_vector_length(struct task_st
 		sve_to_fpsimd(task);
 
 	if (task == current)
-		local_bh_enable();
+		local_unlock_bh(fpsimd_lock);
 
 	/*
 	 * Force reallocation of task SVE state to the correct size
@@ -837,7 +838,7 @@  asmlinkage void do_sve_acc(unsigned int
 
 	sve_alloc(current);
 
-	local_bh_disable();
+	local_lock_bh(fpsimd_lock);
 
 	task_fpsimd_save();
 	fpsimd_to_sve(current);
@@ -849,7 +850,7 @@  asmlinkage void do_sve_acc(unsigned int
 	if (test_and_set_thread_flag(TIF_SVE))
 		WARN_ON(1); /* SVE access shouldn't have trapped */
 
-	local_bh_enable();
+	local_unlock_bh(fpsimd_lock);
 }
 
 /*
@@ -925,7 +926,7 @@  void fpsimd_flush_thread(void)
 	if (!system_supports_fpsimd())
 		return;
 
-	local_bh_disable();
+	local_lock_bh(fpsimd_lock);
 
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
 	fpsimd_flush_task_state(current);
@@ -967,7 +968,7 @@  void fpsimd_flush_thread(void)
 
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
 
-	local_bh_enable();
+	local_unlock_bh(fpsimd_lock);
 }
 
 /*
@@ -979,9 +980,9 @@  void fpsimd_preserve_current_state(void)
 	if (!system_supports_fpsimd())
 		return;
 
-	local_bh_disable();
+	local_lock_bh(fpsimd_lock);
 	task_fpsimd_save();
-	local_bh_enable();
+	local_unlock_bh(fpsimd_lock);
 }
 
 /*
@@ -1021,14 +1022,14 @@  void fpsimd_restore_current_state(void)
 	if (!system_supports_fpsimd())
 		return;
 
-	local_bh_disable();
+	local_lock_bh(fpsimd_lock);
 
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		task_fpsimd_load();
 		fpsimd_bind_to_cpu();
 	}
 
-	local_bh_enable();
+	local_unlock_bh(fpsimd_lock);
 }
 
 /*
@@ -1041,7 +1042,7 @@  void fpsimd_update_current_state(struct
 	if (!system_supports_fpsimd())
 		return;
 
-	local_bh_disable();
+	local_lock_bh(fpsimd_lock);
 
 	current->thread.fpsimd_state.user_fpsimd = *state;
 	if (system_supports_sve() && test_thread_flag(TIF_SVE))
@@ -1052,7 +1053,7 @@  void fpsimd_update_current_state(struct
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE))
 		fpsimd_bind_to_cpu();
 
-	local_bh_enable();
+	local_unlock_bh(fpsimd_lock);
 }
 
 /*
@@ -1115,7 +1116,7 @@  void kernel_neon_begin(void)
 
 	BUG_ON(!may_use_simd());
 
-	local_bh_disable();
+	local_lock_bh(fpsimd_lock);
 
 	__this_cpu_write(kernel_neon_busy, true);
 
@@ -1129,8 +1130,14 @@  void kernel_neon_begin(void)
 	fpsimd_flush_cpu_state();
 
 	preempt_disable();
-
-	local_bh_enable();
+	/*
+	 * balance atomic vs !atomic context of migrate_disable().
+	 * local_lock_bh = get_local_var() + spin_lock_bh (2x migrate_disable)
+	 */
+	migrate_disable();
+	migrate_disable();
+	migrate_disable();
+	local_unlock_bh(fpsimd_lock);
 }
 EXPORT_SYMBOL(kernel_neon_begin);
 
@@ -1154,6 +1161,10 @@  void kernel_neon_end(void)
 	WARN_ON(!busy);	/* No matching kernel_neon_begin()? */
 
 	preempt_enable();
+	/* balance migrate_disable(). See kernel_neon_begin() */
+	migrate_enable();
+	migrate_enable();
+	migrate_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
@@ -1185,9 +1196,7 @@  void __efi_fpsimd_begin(void)
 	if (!system_supports_fpsimd())
 		return;
 
-	WARN_ON(preemptible());
-
-	if (may_use_simd()) {
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT_BASE) && may_use_simd()) {
 		kernel_neon_begin();
 	} else {
 		/*