[RT] arm64: fpsimd: use a local_lock() in addition to local_bh_disable()

Message ID 20180517124006.ohygrrpg7z2moqqt@linutronix.de (mailing list archive)
State New, archived

Commit Message

Sebastian Andrzej Siewior May 17, 2018, 12:40 p.m. UTC
In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
code disables BH and expects that it is not preemptible. On -RT the
task remains preemptible but remains on the same CPU. This may corrupt the
content of the SIMD registers if the task is preempted during
saving/restoring those registers.
Add a local_lock() next to the local_bh_disable(). This fulfills the
requirement that the code is not invoked again in a different context on
the same CPU while it remains preemptible.

The preempt_disable() + local_bh_enable() combo in kernel_neon_begin()
is not working on -RT. We don't use NEON in kernel mode on RT right now
but this still should be addressed.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/arm64/kernel/fpsimd.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Comments

Dave Martin May 17, 2018, 6:19 p.m. UTC | #1
On Thu, May 17, 2018 at 02:40:06PM +0200, Sebastian Andrzej Siewior wrote:
> In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
> code disables BH and expects that it is not preemptible. On -RT the
> task remains preemptible but remains on the same CPU. This may corrupt the

Also, watch out for [1] which adds more of this stuff for KVM.  This
is not merged yet, but likely to land in v4.18.

> content of the SIMD registers if the task is preempted during
> saving/restoring those registers.
> Add a local_lock() next to the local_bh_disable(). This fulfills the
> requirement that the code is not invoked again in a different context on
> the same CPU while it remains preemptible.

Thanks for this.

*WARNING*: My RT-fu is weak to nonexistent, so don't assume that
anything I suggest below is correct without thinking carefully about
it :)

Anyway:

What we're really trying to achieve with the local_bh_disable/enable
stuff is exclusive access to the CPU FPSIMD registers and associated
metadata that tracks who they belong to.

> The preempt_disable() + local_bh_enable() combo in kernel_neon_begin()
> is not working on -RT. We don't use NEON in kernel mode on RT right now
> but this still should be addressed.

I think we effectively have two levels of locking here.

At the outer level, we want exclusive access to the FPSIMD registers.
This is what is needed between kernel_neon_begin() and
kernel_neon_end(), and maps onto the preempt_disable()/_enable() done
by these functions.

In context switch critical code, that's insufficient, and we also
need exclusive access to the metadata that tracks which task or context
owns the FPSIMD registers.  This is what the local_bh_disable()/
_enable() achieves.


So does it make sense to have two locks (I'm assuming local locks are
implicitly percpu ?)

static inline void local_fpsimd_context_lock(void)
{
	local_bh_disable();
	local_lock(fpsimd_lock);
	local_lock(fpsimd_context_lock);
}

static inline void local_fpsimd_context_unlock(void)
{
	local_unlock(fpsimd_context_lock);
	local_unlock(fpsimd_lock);
	local_bh_enable();
}


kernel_neon_begin() could then do

	local_fpsimd_context_lock();

	/* ... */

	preempt_disable();
	local_unlock(fpsimd_context_lock);

... with the following in kernel_neon_end():

	local_unlock(fpsimd_lock);
	preempt_enable();


If kernel-mode NEON was considered harmful to RT due to the context
switch overheads, then the above might be overkill.  SVE will be worse
in that regard, and also needs thinking about at some point -- I've not
looked at it from the RT angle at all.

In either case, I think abstracting the lock/unlock sequences out to
make the purpose clearer may be a good idea, even if we just have a
single local lock to manage.


There is one place where I mess with the FPSIMD context with no lock held
because of a need to copy_from_user() straight into the context backing
store (we can't copy_from_user() with preemption disabled...)
I'm not sure whether RT will have any impact on this, but it probably
needs thinking about.

One more comment below...

> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/arm64/kernel/fpsimd.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index e7226c4c7493..3a5cd1908874 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -38,6 +38,7 @@
>  #include <linux/signal.h>
>  #include <linux/slab.h>
>  #include <linux/sysctl.h>
> +#include <linux/locallock.h>
>  
>  #include <asm/fpsimd.h>
>  #include <asm/cputype.h>
> @@ -235,7 +236,7 @@ static void sve_user_enable(void)
>   *    whether TIF_SVE is clear or set, since these are not vector length
>   *    dependent.
>   */
> -
> +static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
>  /*
>   * Update current's FPSIMD/SVE registers from thread_struct.
>   *
> @@ -594,6 +595,7 @@ int sve_set_vector_length(struct task_struct *task,
>  	 * non-SVE thread.
>  	 */
>  	if (task == current) {
> +		local_lock(fpsimd_lock);
>  		local_bh_disable();
>  
>  		task_fpsimd_save();
> @@ -604,8 +606,10 @@ int sve_set_vector_length(struct task_struct *task,
>  	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
>  		sve_to_fpsimd(task);
>  
> -	if (task == current)
> +	if (task == current) {
> +		local_unlock(fpsimd_lock);

Is this misordered against local_bh_enable(), or doesn't it matter?

>  		local_bh_enable();
> +	}
>  
>  	/*
>  	 * Force reallocation of task SVE state to the correct size
> @@ -838,6 +842,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>  	sve_alloc(current);
>  
>  	local_bh_disable();
> +	local_lock(fpsimd_lock);
>  
>  	task_fpsimd_save();
>  	fpsimd_to_sve(current);
> @@ -849,6 +854,7 @@ asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>  	if (test_and_set_thread_flag(TIF_SVE))
>  		WARN_ON(1); /* SVE access shouldn't have trapped */
>  
> +	local_unlock(fpsimd_lock);
>  	local_bh_enable();
>  }
>  
> @@ -926,6 +932,7 @@ void fpsimd_flush_thread(void)
>  		return;
>  
>  	local_bh_disable();
> +	local_lock(fpsimd_lock);
>  
>  	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
>  	fpsimd_flush_task_state(current);
> @@ -967,6 +974,7 @@ void fpsimd_flush_thread(void)
>  
>  	set_thread_flag(TIF_FOREIGN_FPSTATE);
>  
> +	local_unlock(fpsimd_lock);
>  	local_bh_enable();
>  }
>  
> @@ -980,7 +988,9 @@ void fpsimd_preserve_current_state(void)
>  		return;
>  
>  	local_bh_disable();
> +	local_lock(fpsimd_lock);
>  	task_fpsimd_save();
> +	local_unlock(fpsimd_lock);
>  	local_bh_enable();
>  }
>  
> @@ -1022,12 +1032,14 @@ void fpsimd_restore_current_state(void)
>  		return;
>  
>  	local_bh_disable();
> +	local_lock(fpsimd_lock);
>  
>  	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>  		task_fpsimd_load();
>  		fpsimd_bind_to_cpu();
>  	}
>  
> +	local_unlock(fpsimd_lock);
>  	local_bh_enable();
>  }
>  
> @@ -1042,6 +1054,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
>  		return;
>  
>  	local_bh_disable();
> +	local_lock(fpsimd_lock);
>  
>  	current->thread.fpsimd_state.user_fpsimd = *state;
>  	if (system_supports_sve() && test_thread_flag(TIF_SVE))
> @@ -1052,6 +1065,7 @@ void fpsimd_update_current_state(struct user_fpsimd_state const *state)
>  	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE))
>  		fpsimd_bind_to_cpu();
>  
> +	local_unlock(fpsimd_lock);
>  	local_bh_enable();
>  }
>  
> @@ -1116,6 +1130,7 @@ void kernel_neon_begin(void)
>  	BUG_ON(!may_use_simd());
>  
>  	local_bh_disable();
> +	local_lock(fpsimd_lock);
>  
>  	__this_cpu_write(kernel_neon_busy, true);
>  
> @@ -1128,6 +1143,7 @@ void kernel_neon_begin(void)
>  	/* Invalidate any task state remaining in the fpsimd regs: */
>  	fpsimd_flush_cpu_state();
>  
> +	local_unlock(fpsimd_lock);
>  	preempt_disable();
>  
>  	local_bh_enable();

The general approach looks reasonable, based on my guesswork about what
the local_lock/_unlocks are doing here.

Cheers
---Dave

[...]

[1] [PATCH v7 00/16] KVM: arm64: Optimise FPSIMD context switching
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/576595.html
Dave Martin May 18, 2018, 12:46 p.m. UTC | #2
On Thu, May 17, 2018 at 07:19:43PM +0100, Dave Martin wrote:

[...]

> kernel_neon_begin() could then do
> 
> 	local_fpsimd_context_lock();
> 
> 	/* ... */
> 
> 	preempt_disable();
> 	local_unlock(fpsimd_context_lock);
> 
> ... with the following in kernel_neon_end():
> 
> 	local_unlock(fpsimd_lock);
> 	preempt_enable();
> 
> 
> If kernel-mode NEON was considered harmful to RT due to the context
> switch overheads, then the above might be overkill.  SVE will be worse
> in that regard, and also needs thinking about at some point -- I've not
> looked at it from the RT angle at all.

Hmmm, !KERNEL_MODE_NEON breaks EFI, so this probably does want looking
at.  Ard's recent rework to enable voluntary preemption in the crypto
backends for arm64 [1] should reduce the fpsimd_lock blackouts, but it
still depends on the backends playing nice.

Cheers
---Dave

[1] [PATCH resend 00/10] crypto: arm64 - play nice with CONFIG_PREEMPT
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-April/574819.html
Steven Rostedt May 22, 2018, 5:10 p.m. UTC | #3
On Thu, 17 May 2018 14:40:06 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> +static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
>  /*
>   * Update current's FPSIMD/SVE registers from thread_struct.
>   *
> @@ -594,6 +595,7 @@ int sve_set_vector_length(struct task_struct *task,
>  	 * non-SVE thread.
>  	 */
>  	if (task == current) {
> +		local_lock(fpsimd_lock);
>  		local_bh_disable();

I'm surprised that we don't have a "local_lock_bh()"?

-- Steve

>  
>  		task_fpsimd_save();
Sebastian Andrzej Siewior May 22, 2018, 5:21 p.m. UTC | #4
On 2018-05-22 13:10:04 [-0400], Steven Rostedt wrote:
> On Thu, 17 May 2018 14:40:06 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > +static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
> >  /*
> >   * Update current's FPSIMD/SVE registers from thread_struct.
> >   *
> > @@ -594,6 +595,7 @@ int sve_set_vector_length(struct task_struct *task,
> >  	 * non-SVE thread.
> >  	 */
> >  	if (task == current) {
> > +		local_lock(fpsimd_lock);
> >  		local_bh_disable();
> 
> I'm surprised that we don't have a "local_lock_bh()"?

right. Like the last time when we introduced a global lock with no
locking context? 

> -- Steve
> 
> >  
> >  		task_fpsimd_save();

Sebastian
Steven Rostedt May 22, 2018, 5:24 p.m. UTC | #5
On Tue, 22 May 2018 19:21:16 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2018-05-22 13:10:04 [-0400], Steven Rostedt wrote:
> > On Thu, 17 May 2018 14:40:06 +0200
> > Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >   
> > > +static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
> > >  /*
> > >   * Update current's FPSIMD/SVE registers from thread_struct.
> > >   *
> > > @@ -594,6 +595,7 @@ int sve_set_vector_length(struct task_struct *task,
> > >  	 * non-SVE thread.
> > >  	 */
> > >  	if (task == current) {
> > > +		local_lock(fpsimd_lock);
> > >  		local_bh_disable();  
> > 
> > I'm surprised that we don't have a "local_lock_bh()"?  
> 
> right. Like the last time when we introduced a global lock with no
> locking context? 
> 

I meant, we could have local_lock_bh(fpsimd_lock); that would turn into
a local_bh_disable() when !PREEMPT_RT.
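
(For illustration only -- a rough sketch of what such a wrapper could
look like on top of the locallock API of this era; the exact names and
the !PREEMPT_RT fallback below are assumptions, not code that exists in
the tree:)

	#ifdef CONFIG_PREEMPT_RT_FULL
	/* RT: take the per-CPU lock and keep softirqs off as well. */
	# define local_lock_bh(lvar)		\
		do {				\
			local_bh_disable();	\
			local_lock(lvar);	\
		} while (0)

	# define local_unlock_bh(lvar)		\
		do {				\
			local_unlock(lvar);	\
			local_bh_enable();	\
		} while (0)
	#else
	/* !RT: the per-CPU lock degenerates to plain BH disabling. */
	# define local_lock_bh(lvar)	local_bh_disable()
	# define local_unlock_bh(lvar)	local_bh_enable()
	#endif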

-- Steve
Sebastian Andrzej Siewior May 22, 2018, 5:33 p.m. UTC | #6
On 2018-05-22 13:24:29 [-0400], Steven Rostedt wrote:
> On Tue, 22 May 2018 19:21:16 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > On 2018-05-22 13:10:04 [-0400], Steven Rostedt wrote:
> > > On Thu, 17 May 2018 14:40:06 +0200
> > > Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > >   
> > > > +static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
> > > >  /*
> > > >   * Update current's FPSIMD/SVE registers from thread_struct.
> > > >   *
> > > > @@ -594,6 +595,7 @@ int sve_set_vector_length(struct task_struct *task,
> > > >  	 * non-SVE thread.
> > > >  	 */
> > > >  	if (task == current) {
> > > > +		local_lock(fpsimd_lock);
> > > >  		local_bh_disable();  
> > > 
> > > I'm surprised that we don't have a "local_lock_bh()"?  
> > 
> > right. Like the last time when we introduced a global lock with no
> > locking context? 
> > 
> 
> I meant, we could have local_lock_bh(fpsimd_lock); that would turn into
> a local_bh_disable() when !PREEMPT_RT.

Oh that part. That could be possible I guess. I need to look into the
second part which disables preemption while the FPU is taken.

> -- Steve

Sebastian
Sebastian Andrzej Siewior May 23, 2018, 2:31 p.m. UTC | #7
On 2018-05-17 19:19:43 [+0100], Dave Martin wrote:
> On Thu, May 17, 2018 at 02:40:06PM +0200, Sebastian Andrzej Siewior wrote:
> > In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
> > code disables BH and expects that it is not preemptible. On -RT the
> > task remains preemptible but remains on the same CPU. This may corrupt the
> 
> Also, watch out for [1] which adds more of this stuff for KVM.  This
> is not merged yet, but likely to land in v4.18.

yay. I will try to keep this in mind while moving forward.

> Anyway:
> 
> What we're really trying to achieve with the local_bh_disable/enable
> stuff is exclusive access to the CPU FPSIMD registers and associated
> metadata that tracks who they belong to.

I assumed this.

> > The preempt_disable() + local_bh_enable() combo in kernel_neon_begin()
> > is not working on -RT. We don't use NEON in kernel mode on RT right now
> > but this still should be addressed.
> 
> I think we effectively have two levels of locking here.
> 
> At the outer level, we want exclusive access to the FPSIMD registers.
> This is what is needed between kernel_neon_begin() and
> kernel_neon_end(), and maps onto the preempt_disable()/_enable() done
> by these functions.
> 
> In context switch critical code, that's insufficient, and we also
> need exclusive access to the metadata that tracks which task or context
> owns the FPSIMD registers.  This is what the local_bh_disable()/
> _enable() achieves.
> 
> 
> So does it make sense to have two locks (I'm assuming local locks are
> implicitly percpu ?)
> 
> static inline void local_fpsimd_context_lock(void)
> {
> 	local_bh_disable();
> 	local_lock(fpsimd_lock);
> 	local_lock(fpsimd_context_lock);
> }
> 
> static inline void local_fpsimd_context_unlock(void)
> {
> 	local_unlock(fpsimd_context_lock);
> 	local_unlock(fpsimd_lock);
> 	local_bh_enable();
> }
> 
> 
> kernel_neon_begin() could then do
> 
> 	local_fpsimd_context_lock();
> 
> 	/* ... */
> 
> 	preempt_disable();
> 	local_unlock(fpsimd_context_lock);
> 
> ... with the following in kernel_neon_end():
> 
> 	local_unlock(fpsimd_lock);
> 	preempt_enable();
> 
> 
> If kernel-mode NEON was considered harmful to RT due to the context
> switch overheads, then the above might be overkill.  SVE will be worse
> in that regard, and also needs thinking about at some point -- I've not
> > looked at it from the RT angle at all.
> 
> In either case, I think abstracting the lock/unlock sequences out to
> make the purpose clearer may be a good idea, even if we just have a
> single local lock to manage.

So the two locks you suggested make sense. However I would need to
replace this preempt_disable() with one such lock. A local_lock() on -RT
is a per-CPU spin_lock. RT's local_lock gets currently transformed into
preempt_disable() on !RT configs.
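
(Roughly, and glossing over implementation details, the mapping being
described is:)

	static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);

	/* On PREEMPT_RT_FULL, conceptually: */
	local_lock(fpsimd_lock);	/* migrate_disable() + per-CPU spin_lock */
	local_unlock(fpsimd_lock);	/* per-CPU spin_unlock + migrate_enable() */

	/* On !PREEMPT_RT_FULL, conceptually: */
	local_lock(fpsimd_lock);	/* preempt_disable() */
	local_unlock(fpsimd_lock);	/* preempt_enable() */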

> There is one place where I mess with the FPSIMD context with no lock held
> because of a need to copy_from_user() straight into the context backing
> store (we can't copy_from_user() with preemption disabled...)
> I'm not sure whether RT will have any impact on this, but it probably
> needs thinking about.

if no locks are held and the task can be preempted then it also might
happen on PREEMPT config. But copy_from_user() doesn't sound like
something that will go straight to HW.

> One more comment below...
> 
> > 
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> >  arch/arm64/kernel/fpsimd.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > index e7226c4c7493..3a5cd1908874 100644
> > --- a/arch/arm64/kernel/fpsimd.c
> > +++ b/arch/arm64/kernel/fpsimd.c
> > @@ -38,6 +38,7 @@
> >  #include <linux/signal.h>
> >  #include <linux/slab.h>
> >  #include <linux/sysctl.h>
> > +#include <linux/locallock.h>
> >  
> >  #include <asm/fpsimd.h>
> >  #include <asm/cputype.h>
> > @@ -235,7 +236,7 @@ static void sve_user_enable(void)
> >   *    whether TIF_SVE is clear or set, since these are not vector length
> >   *    dependent.
> >   */
> > -
> > +static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
> >  /*
> >   * Update current's FPSIMD/SVE registers from thread_struct.
> >   *
> > @@ -594,6 +595,7 @@ int sve_set_vector_length(struct task_struct *task,
> >  	 * non-SVE thread.
> >  	 */
> >  	if (task == current) {
> > +		local_lock(fpsimd_lock);
> >  		local_bh_disable();
> >  
> >  		task_fpsimd_save();
> > @@ -604,8 +606,10 @@ int sve_set_vector_length(struct task_struct *task,
> >  	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
> >  		sve_to_fpsimd(task);
> >  
> > -	if (task == current)
> > +	if (task == current) {
> > +		local_unlock(fpsimd_lock);
> 
> Is this misordered against local_bh_enable(), or doesn't it matter?

It actually is, but it does not matter. On RT local_bh_disable() is
turned into migrate_disable(), which in turn disables migration of the
task to another CPU (while it is still preemptible). This
migrate_disable() is also part of local_lock(). Maybe I will add a
local_lock_bh() and replace this local_bh_disable() with it so it will
look better :)
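
(To illustrate -- on RT both the patch's order and the strictly nested
order reduce to stacked migrate_disable() sections, and migrate_disable()
is reference-counted, so releasing in non-LIFO order is harmless:)

	local_lock(fpsimd_lock);	/* migrate_disable() + per-CPU lock */
	local_bh_disable();		/* on RT: another migrate_disable() */

	/* ... */

	local_unlock(fpsimd_lock);	/* per-CPU unlock + migrate_enable() */
	local_bh_enable();		/* migrate_enable() */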

> >  		local_bh_enable();
> > +	}

Sebastian
Sebastian Andrzej Siewior May 23, 2018, 2:34 p.m. UTC | #8
On 2018-05-18 13:46:36 [+0100], Dave Martin wrote:
> On Thu, May 17, 2018 at 07:19:43PM +0100, Dave Martin wrote:
> 
> [...]
> 
> > kernel_neon_begin() could then do
> > 
> > 	local_fpsimd_context_lock();
> > 
> > 	/* ... */
> > 
> > 	preempt_disable();
> > 	local_unlock(fpsimd_context_lock);
> > 
> > ... with the following in kernel_neon_end():
> > 
> > 	local_unlock(fpsimd_lock);
> > 	preempt_enable();
> > 
> > 
> > If kernel-mode NEON was considered harmful to RT due to the context
> > switch overheads, then the above might be overkill.  SVE will be worse
> > in that regard, and also needs thinking about at some point -- I've not
> > looked at it from the RT angle at all.
> 
> Hmmm, !KERNEL_MODE_NEON breaks EFI, so this probably does want looking
> at.  Ard's recent rework to enable voluntary preemption in the crypto
> backends for arm64 [1] should reduce the fpsimd_lock blackouts, but it
> still depends on the backends playing nice.

oh. I've seen the series, I more or less asked for this and I am glad
for it. I will try to unbreak this EFI problem then…
I remember the good old days when the boot loader was gone after it
jumped to the kernel…

> Cheers
> ---Dave

Sebastian
Dave Martin May 23, 2018, 2:55 p.m. UTC | #9
On Wed, May 23, 2018 at 04:31:56PM +0200, Sebastian Andrzej Siewior wrote:
> On 2018-05-17 19:19:43 [+0100], Dave Martin wrote:
> > On Thu, May 17, 2018 at 02:40:06PM +0200, Sebastian Andrzej Siewior wrote:
> > > In v4.16-RT I noticed a number of warnings from task_fpsimd_load(). The
> > > code disables BH and expects that it is not preemptible. On -RT the
> > > task remains preemptible but remains on the same CPU. This may corrupt the
> > 
> > Also, watch out for [1] which adds more of this stuff for KVM.  This
> > is not merged yet, but likely to land in v4.18.
> 
> yay. I will try to keep this in mind while moving forward.
> 
> > Anyway:
> > 
> > What we're really trying to achieve with the local_bh_disable/enable
> > stuff is exclusive access to the CPU FPSIMD registers and associated
> > metadata that tracks who they belong to.
> 
> I assumed this.

Yup, it's a common pattern cross-arch.

> > > The preempt_disable() + local_bh_enable() combo in kernel_neon_begin()
> > > is not working on -RT. We don't use NEON in kernel mode on RT right now
> > > but this still should be addressed.
> > 
> > I think we effectively have two levels of locking here.
> > 
> > At the outer level, we want exclusive access to the FPSIMD registers.
> > This is what is needed between kernel_neon_begin() and
> > kernel_neon_end(), and maps onto the preempt_disable()/_enable() done
> > by these functions.
> > 
> > In context switch critical code, that's insufficient, and we also
> > need exclusive access to the metadata that tracks which task or context
> > owns the FPSIMD registers.  This is what the local_bh_disable()/
> > _enable() achieves.
> > 
> > 
> > So does it make sense to have two locks (I'm assuming local locks are
> > implicitly percpu ?)
> > 
> > static inline void local_fpsimd_context_lock(void)
> > {
> > 	local_bh_disable();
> > 	local_lock(fpsimd_lock);
> > 	local_lock(fpsimd_context_lock);
> > }
> > 
> > static inline void local_fpsimd_context_unlock(void)
> > {
> > 	local_unlock(fpsimd_context_lock);
> > 	local_unlock(fpsimd_lock);
> > 	local_bh_enable();
> > }
> > 
> > 
> > kernel_neon_begin() could then do
> > 
> > 	local_fpsimd_context_lock();
> > 
> > 	/* ... */
> > 
> > 	preempt_disable();
> > 	local_unlock(fpsimd_context_lock);
> > 
> > ... with the following in kernel_neon_end():
> > 
> > 	local_unlock(fpsimd_lock);
> > 	preempt_enable();
> > 
> > 
> > If kernel-mode NEON was considered harmful to RT due to the context
> > switch overheads, then the above might be overkill.  SVE will be worse
> > in that regard, and also needs thinking about at some point -- I've not
> > > looked at it from the RT angle at all.
> > 
> > In either case, I think abstracting the lock/unlock sequences out to
> > make the purpose clearer may be a good idea, even if we just have a
> > single local lock to manage.
> 
> So the two locks you suggested make sense. However I would need to
> replace this preempt_disable() with one such lock. A local_lock() on -RT
> is a per-CPU spin_lock. RT's local_lock gets currently transformed into
> preempt_disable() on !RT configs.

Thinking about this again, I'm guessing it only really makes sense to
have two local locks if there are two independent reasons to inhibit
migration.

There is only one such reason here: preempt_disable() for the purpose
of using the FPSIMD registers, where local_bh_disable() implies this
and also inhibits local softirqs -- which I'm guessing works just
the same with RT.


So maybe there should really be only one local lock as you suggest.

This gives:

static inline void local_fpsimd_context_lock(void)
{
	local_lock(fpsimd_lock);
	local_bh_disable();
}

static inline void local_fpsimd_context_unlock(void)
{
	local_bh_enable();
	local_unlock(fpsimd_lock);
}

kernel_neon_begin(...)
{
	local_lock(fpsimd_lock);
	local_bh_disable();

	/* ... */

	local_bh_enable();
}

kernel_neon_end(...)
{
	/* ... */

	local_unlock(fpsimd_lock);
}

If the local_{,un}lock() gets transmuted to preempt_{dis,en}able()
for !RT, then this seems to map on to what we currently have.
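
(Illustration only -- substituting preempt_{dis,en}able() into the
sketch above gives the assumed !RT expansion:)

	kernel_neon_begin(...)
	{
		preempt_disable();	/* was: local_lock(fpsimd_lock) */
		local_bh_disable();

		/* ... */

		local_bh_enable();
	}

	kernel_neon_end(...)
	{
		/* ... */

		preempt_enable();	/* was: local_unlock(fpsimd_lock) */
	}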

(Still guessing about how RT works here, so a health warning is
implied.)


If you abstract things this way, can you please split the non-RT changes
into a separate patch and Cc me?  That would be a nice cleanup for
mainline rather than just having the bare local_bh_ and preempt_ stuff
everywhere.

> 
> > There is one place where I mess with the FPSIMD context with no lock held
> > because of a need to copy_from_user() straight into the context backing
> > store (we can't copy_from_user() with preemption disabled...)
> > I'm not sure whether RT will have any impact on this, but it probably
> > needs thinking about.
> 
> if no locks are held and the task can be preempted then it also might
> happen on PREEMPT config. But copy_from_user() doesn't sound like
> something that will go straight to HW.

We're just copying to memory, so I guess so long as program order is
respected when the task is migrated (which is ensured via heavy barriers
somewhere in the scheduler and/or arch context switch code IIRC) then I
think there should be no problem.  This is a prerequisite for
preemptive migration to work at all, not just on RT.  So I suppose
there's no new problem here.

> 
> > One more comment below...
> > 
> > > 
> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > ---
> > >  arch/arm64/kernel/fpsimd.c | 20 ++++++++++++++++++--
> > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > > index e7226c4c7493..3a5cd1908874 100644
> > > --- a/arch/arm64/kernel/fpsimd.c
> > > +++ b/arch/arm64/kernel/fpsimd.c
> > > @@ -38,6 +38,7 @@
> > >  #include <linux/signal.h>
> > >  #include <linux/slab.h>
> > >  #include <linux/sysctl.h>
> > > +#include <linux/locallock.h>
> > >  
> > >  #include <asm/fpsimd.h>
> > >  #include <asm/cputype.h>
> > > @@ -235,7 +236,7 @@ static void sve_user_enable(void)
> > >   *    whether TIF_SVE is clear or set, since these are not vector length
> > >   *    dependent.
> > >   */
> > > -
> > > +static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
> > >  /*
> > >   * Update current's FPSIMD/SVE registers from thread_struct.
> > >   *
> > > @@ -594,6 +595,7 @@ int sve_set_vector_length(struct task_struct *task,
> > >  	 * non-SVE thread.
> > >  	 */
> > >  	if (task == current) {
> > > +		local_lock(fpsimd_lock);
> > >  		local_bh_disable();
> > >  
> > >  		task_fpsimd_save();
> > > @@ -604,8 +606,10 @@ int sve_set_vector_length(struct task_struct *task,
> > >  	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
> > >  		sve_to_fpsimd(task);
> > >  
> > > -	if (task == current)
> > > +	if (task == current) {
> > > +		local_unlock(fpsimd_lock);
> > 
> > Is this misordered against local_bh_enable(), or doesn't it matter?
> 
> It actually is, but it does not matter. On RT local_bh_disable() is
> turned into migrate_disable(), which in turn disables migration of the
> task to another CPU (while it is still preemptible). This
> migrate_disable() is also part of local_lock(). Maybe I will add a
> local_lock_bh() and replace this local_bh_disable() with it so it will
> look better :)

Fair enough.

Cheers
---Dave

> 
> > >  		local_bh_enable();
> > > +	}
Steven Rostedt July 11, 2018, 1:25 p.m. UTC | #10
On Tue, 22 May 2018 19:33:33 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2018-05-22 13:24:29 [-0400], Steven Rostedt wrote:
> > On Tue, 22 May 2018 19:21:16 +0200
> > Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> >   
> > > On 2018-05-22 13:10:04 [-0400], Steven Rostedt wrote:  
> > > > On Thu, 17 May 2018 14:40:06 +0200
> > > > Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > > >     
> > > > > +static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
> > > > >  /*
> > > > >   * Update current's FPSIMD/SVE registers from thread_struct.
> > > > >   *
> > > > > @@ -594,6 +595,7 @@ int sve_set_vector_length(struct task_struct *task,
> > > > >  	 * non-SVE thread.
> > > > >  	 */
> > > > >  	if (task == current) {
> > > > > +		local_lock(fpsimd_lock);
> > > > >  		local_bh_disable();    
> > > > 
> > > > I'm surprised that we don't have a "local_lock_bh()"?    
> > > 
> > > right. Like the last time when we introduced a global lock with no
> > > locking context? 
> > >   
> > 
> > I meant, we could have local_lock_bh(fpsimd_lock); that would turn into
> > a local_bh_disable() when !PREEMPT_RT.  
> 
> Oh that part. That could be possible I guess. I need to look into the
> second part which disables preemption while the FPU is taken.
>

Did you decide to create a local_lock_bh(lock) function? I don't see it.

And should this be backported to 4.14-rt too? You state you saw this in
4.16-rt, but did you start doing something different then, or did the
kernel change?

Thanks!

-- Steve
Sebastian Andrzej Siewior July 11, 2018, 1:31 p.m. UTC | #11
On 2018-07-11 09:25:55 [-0400], Steven Rostedt wrote:
> Did you decide to create a local_lock_bh(lock) function? I don't see it.
> 
> And should this be backported to 4.14-rt too? You state you saw this in
> 4.16-rt, but did you start doing something different then, or did the
> kernel change?

I wanted to re-evaluate the whole situation here but didn't get to it
yet.

> Thanks!
> 
> -- Steve

Sebastian
Steven Rostedt July 11, 2018, 1:33 p.m. UTC | #12
On Wed, 11 Jul 2018 15:31:57 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2018-07-11 09:25:55 [-0400], Steven Rostedt wrote:
> > Did you decide to create a local_lock_bh(lock) function? I don't see it.
> > 
> > And should this be backported to 4.14-rt too? You state you saw this in
> > 4.16-rt, but did you start doing something different then, or did the
> > kernel change?  
> 
> I wanted to re-evaluate the whole situation here but didn't get to it
> yet.

Thanks, I won't add this then.

-- Steve

Patch

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index e7226c4c7493..3a5cd1908874 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -38,6 +38,7 @@ 
 #include <linux/signal.h>
 #include <linux/slab.h>
 #include <linux/sysctl.h>
+#include <linux/locallock.h>
 
 #include <asm/fpsimd.h>
 #include <asm/cputype.h>
@@ -235,7 +236,7 @@  static void sve_user_enable(void)
  *    whether TIF_SVE is clear or set, since these are not vector length
  *    dependent.
  */
-
+static DEFINE_LOCAL_IRQ_LOCK(fpsimd_lock);
 /*
  * Update current's FPSIMD/SVE registers from thread_struct.
  *
@@ -594,6 +595,7 @@  int sve_set_vector_length(struct task_struct *task,
 	 * non-SVE thread.
 	 */
 	if (task == current) {
+		local_lock(fpsimd_lock);
 		local_bh_disable();
 
 		task_fpsimd_save();
@@ -604,8 +606,10 @@  int sve_set_vector_length(struct task_struct *task,
 	if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
 		sve_to_fpsimd(task);
 
-	if (task == current)
+	if (task == current) {
+		local_unlock(fpsimd_lock);
 		local_bh_enable();
+	}
 
 	/*
 	 * Force reallocation of task SVE state to the correct size
@@ -838,6 +842,7 @@  asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	sve_alloc(current);
 
 	local_bh_disable();
+	local_lock(fpsimd_lock);
 
 	task_fpsimd_save();
 	fpsimd_to_sve(current);
@@ -849,6 +854,7 @@  asmlinkage void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	if (test_and_set_thread_flag(TIF_SVE))
 		WARN_ON(1); /* SVE access shouldn't have trapped */
 
+	local_unlock(fpsimd_lock);
 	local_bh_enable();
 }
 
@@ -926,6 +932,7 @@  void fpsimd_flush_thread(void)
 		return;
 
 	local_bh_disable();
+	local_lock(fpsimd_lock);
 
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
 	fpsimd_flush_task_state(current);
@@ -967,6 +974,7 @@  void fpsimd_flush_thread(void)
 
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
 
+	local_unlock(fpsimd_lock);
 	local_bh_enable();
 }
 
@@ -980,7 +988,9 @@  void fpsimd_preserve_current_state(void)
 		return;
 
 	local_bh_disable();
+	local_lock(fpsimd_lock);
 	task_fpsimd_save();
+	local_unlock(fpsimd_lock);
 	local_bh_enable();
 }
 
@@ -1022,12 +1032,14 @@  void fpsimd_restore_current_state(void)
 		return;
 
 	local_bh_disable();
+	local_lock(fpsimd_lock);
 
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		task_fpsimd_load();
 		fpsimd_bind_to_cpu();
 	}
 
+	local_unlock(fpsimd_lock);
 	local_bh_enable();
 }
 
@@ -1042,6 +1054,7 @@  void fpsimd_update_current_state(struct user_fpsimd_state const *state)
 		return;
 
 	local_bh_disable();
+	local_lock(fpsimd_lock);
 
 	current->thread.fpsimd_state.user_fpsimd = *state;
 	if (system_supports_sve() && test_thread_flag(TIF_SVE))
@@ -1052,6 +1065,7 @@  void fpsimd_update_current_state(struct user_fpsimd_state const *state)
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE))
 		fpsimd_bind_to_cpu();
 
+	local_unlock(fpsimd_lock);
 	local_bh_enable();
 }
 
@@ -1116,6 +1130,7 @@  void kernel_neon_begin(void)
 	BUG_ON(!may_use_simd());
 
 	local_bh_disable();
+	local_lock(fpsimd_lock);
 
 	__this_cpu_write(kernel_neon_busy, true);
 
@@ -1128,6 +1143,7 @@  void kernel_neon_begin(void)
 	/* Invalidate any task state remaining in the fpsimd regs: */
 	fpsimd_flush_cpu_state();
 
+	local_unlock(fpsimd_lock);
 	preempt_disable();
 
 	local_bh_enable();