[07/22] x86/fpu: Remove fpu->initialized
diff mbox series

Message ID 20190109114744.10936-8-bigeasy@linutronix.de
State New
Headers show
Series
  • [v6] x86: load FPU registers on return to userland
Related show

Commit Message

Sebastian Andrzej Siewior Jan. 9, 2019, 11:47 a.m. UTC
The `initialized' member of the fpu struct is always set to one user
tasks and zero for kernel tasks. This avoids saving/restoring the FPU
registers for kernel threads.

I expect that fpu->initialized is always 1 and the 0 case has been
removed or is not important. For instance fpu__drop() sets the value to
zero and its caller call either fpu__initialize() (which would
set it back to one) or don't return to userland.

The context switch code (switch_fpu_prepare() + switch_fpu_finish())
can't unconditionally save/restore registers for kernel threads. I have
no idea what will happen if we restore a zero FPU context for the kernel
thread (since it never was initialized). Also it has been agreed that
for PKRU we don't want a random state (inherited from the previous task)
but a deterministic one.

For kernel_fpu_begin() (+end) the situation is similar: The kernel test
bot told me, that EFI with runtime services uses this before
alternatives_patched is true. Which means that this function is used too
early and it wasn't the case before.

For those two cases current->mm is used to determine between user &
kernel thread. For kernel_fpu_begin() we skip save/restore of the FPU
registers.
During the context switch into a kernel thread we don't do anything.
There is no reason to save the FPU state of a kernel thread.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/ia32/ia32_signal.c         | 17 +++-----
 arch/x86/include/asm/fpu/internal.h | 15 +++----
 arch/x86/include/asm/fpu/types.h    |  9 ----
 arch/x86/include/asm/trace/fpu.h    |  5 +--
 arch/x86/kernel/fpu/core.c          | 68 ++++++++---------------------
 arch/x86/kernel/fpu/init.c          |  2 -
 arch/x86/kernel/fpu/regset.c        | 19 ++------
 arch/x86/kernel/fpu/xstate.c        |  2 -
 arch/x86/kernel/process_32.c        |  4 +-
 arch/x86/kernel/process_64.c        |  4 +-
 arch/x86/kernel/signal.c            | 17 +++-----
 arch/x86/mm/pkeys.c                 |  7 +--
 12 files changed, 49 insertions(+), 120 deletions(-)

Comments

Borislav Petkov Jan. 24, 2019, 1:34 p.m. UTC | #1
On Wed, Jan 09, 2019 at 12:47:29PM +0100, Sebastian Andrzej Siewior wrote:
> The `initialized' member of the fpu struct is always set to one user
								 ^
								 for

> tasks and zero for kernel tasks. This avoids saving/restoring the FPU
> registers for kernel threads.
> 
> I expect that fpu->initialized is always 1 and the 0 case has been

"I expect" reads funny. Make that impartial and passive and "expecting"
is the wrong formulation. It needs to be a clear statement talking about
the FPU context's state and why that state is always initialized.

> removed or is not important. For instance fpu__drop() sets the value to
> zero and its caller call either fpu__initialize() (which would

"its callers invoke" or so

> set it back to one) or don't return to userland.
> 
> The context switch code (switch_fpu_prepare() + switch_fpu_finish())
> can't unconditionally save/restore registers for kernel threads. I have
> no idea what will happen if we restore a zero FPU context for the kernel
> thread (since it never was initialized).

Yeah, avoid those "author is wondering" statements.

> Also it has been agreed that
> for PKRU we don't want a random state (inherited from the previous task)
> but a deterministic one.

Rewrite that to state what the PKRU state is going to be.

> For kernel_fpu_begin() (+end) the situation is similar: The kernel test
> bot told me, that EFI with runtime services uses this before
> alternatives_patched is true. Which means that this function is used too
> early and it wasn't the case before.
> 
> For those two cases current->mm is used to determine between user &
> kernel thread.

Now that we start looking at ->mm, I think we should document this
somewhere prominently, maybe

  arch/x86/include/asm/fpu/internal.h

or so along with all the logic this patchset changes wrt FPU handling.
Then we wouldn't have to wonder in the future why stuff is being done
the way it is done.

Like the FPU saving on the user stack frame or why this was needed:

-	/* Update the thread's fxstate to save the fsave header. */
-	if (ia32_fxstate)
-		copy_fxregs_to_kernel(fpu);

Some sort of a high-level invariants written down would save us a lot of
head scratching in the future.

> For kernel_fpu_begin() we skip save/restore of the FPU
> registers.
> During the context switch into a kernel thread we don't do anything.
> There is no reason to save the FPU state of a kernel thread.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/ia32/ia32_signal.c         | 17 +++-----
>  arch/x86/include/asm/fpu/internal.h | 15 +++----
>  arch/x86/include/asm/fpu/types.h    |  9 ----
>  arch/x86/include/asm/trace/fpu.h    |  5 +--
>  arch/x86/kernel/fpu/core.c          | 68 ++++++++---------------------
>  arch/x86/kernel/fpu/init.c          |  2 -
>  arch/x86/kernel/fpu/regset.c        | 19 ++------
>  arch/x86/kernel/fpu/xstate.c        |  2 -
>  arch/x86/kernel/process_32.c        |  4 +-
>  arch/x86/kernel/process_64.c        |  4 +-
>  arch/x86/kernel/signal.c            | 17 +++-----
>  arch/x86/mm/pkeys.c                 |  7 +--
>  12 files changed, 49 insertions(+), 120 deletions(-)

...

> diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
> index 069c04be15076..bd65f6ba950f8 100644
> --- a/arch/x86/include/asm/trace/fpu.h
> +++ b/arch/x86/include/asm/trace/fpu.h
> @@ -13,22 +13,19 @@ DECLARE_EVENT_CLASS(x86_fpu,
>  
>  	TP_STRUCT__entry(
>  		__field(struct fpu *, fpu)
> -		__field(bool, initialized)
>  		__field(u64, xfeatures)
>  		__field(u64, xcomp_bv)
>  		),

Yikes, can you do that?

rostedt has been preaching that adding members at the end of tracepoints
is ok but not changing them in the middle as that breaks ABI.

Might wanna ping him about it first.

>  
>  	TP_fast_assign(
>  		__entry->fpu		= fpu;
> -		__entry->initialized	= fpu->initialized;
>  		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
>  			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
>  			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
>  		}
>  	),
> -	TP_printk("x86/fpu: %p initialized: %d xfeatures: %llx xcomp_bv: %llx",
> +	TP_printk("x86/fpu: %p xfeatures: %llx xcomp_bv: %llx",
>  			__entry->fpu,
> -			__entry->initialized,
>  			__entry->xfeatures,
>  			__entry->xcomp_bv
>  	)
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index e43296854e379..3a4668c9d24f1 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -101,7 +101,7 @@ static void __kernel_fpu_begin(void)
>  
>  	kernel_fpu_disable();
>  
> -	if (fpu->initialized) {
> +	if (current->mm) {
>  		/*
>  		 * Ignore return value -- we don't care if reg state
>  		 * is clobbered.
> @@ -116,7 +116,7 @@ static void __kernel_fpu_end(void)
>  {
>  	struct fpu *fpu = &current->thread.fpu;
>  
> -	if (fpu->initialized)
> +	if (current->mm)
>  		copy_kernel_to_fpregs(&fpu->state);
>  
>  	kernel_fpu_enable();
> @@ -147,10 +147,9 @@ void fpu__save(struct fpu *fpu)
>  
>  	preempt_disable();
>  	trace_x86_fpu_before_save(fpu);
> -	if (fpu->initialized) {
> -		if (!copy_fpregs_to_fpstate(fpu)) {
> -			copy_kernel_to_fpregs(&fpu->state);
> -		}
> +
> +	if (!copy_fpregs_to_fpstate(fpu)) {
> +		copy_kernel_to_fpregs(&fpu->state);
>  	}

WARNING: braces {} are not necessary for single statement blocks
#217: FILE: arch/x86/kernel/fpu/core.c:151:
+       if (!copy_fpregs_to_fpstate(fpu)) {
+               copy_kernel_to_fpregs(&fpu->state);
        }


...

> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 7888a41a03cdb..77d9eb43ccac8 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -288,10 +288,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  	if (prev->gs | next->gs)
>  		lazy_load_gs(next->gs);
>  
> -	switch_fpu_finish(next_fpu, cpu);
> -
>  	this_cpu_write(current_task, next_p);
>  
> +	switch_fpu_finish(next_fpu, cpu);
> +
>  	/* Load the Intel cache allocation PQR MSR. */
>  	resctrl_sched_in();
>  
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index e1983b3a16c43..ffea7c557963a 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -566,14 +566,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
>  
>  	x86_fsgsbase_load(prev, next);
>  
> -	switch_fpu_finish(next_fpu, cpu);
> -
>  	/*
>  	 * Switch the PDA and FPU contexts.
>  	 */
>  	this_cpu_write(current_task, next_p);
>  	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
>  
> +	switch_fpu_finish(next_fpu, cpu);
> +
>  	/* Reload sp0. */
>  	update_task_stack(next_p);
>  

Those moves need at least a comment in the commit message or a separate
patch.
Sebastian Andrzej Siewior Feb. 5, 2019, 6:03 p.m. UTC | #2
On 2019-01-24 14:34:49 [+0100], Borislav Petkov wrote:
> > set it back to one) or don't return to userland.
> > 
> > The context switch code (switch_fpu_prepare() + switch_fpu_finish())
> > can't unconditionally save/restore registers for kernel threads. I have
> > no idea what will happen if we restore a zero FPU context for the kernel
> > thread (since it never was initialized).
> 
> Yeah, avoid those "author is wondering" statements.

So I am no longer unsure about certain thing. Understood.

> > Also it has been agreed that
> > for PKRU we don't want a random state (inherited from the previous task)
> > but a deterministic one.
> 
> Rewrite that to state what the PKRU state is going to be.
I dropped that part. It was part for this patch in an earlier version
but it was moved.

> > For kernel_fpu_begin() (+end) the situation is similar: The kernel test
> > bot told me, that EFI with runtime services uses this before
> > alternatives_patched is true. Which means that this function is used too
> > early and it wasn't the case before.
> > 
> > For those two cases current->mm is used to determine between user &
> > kernel thread.
> 
> Now that we start looking at ->mm, I think we should document this
> somewhere prominently, maybe
> 
>   arch/x86/include/asm/fpu/internal.h
> 
> or so along with all the logic this patchset changes wrt FPU handling.
> Then we wouldn't have to wonder in the future why stuff is being done
> the way it is done.

Well, nothing changes in regard to the logic. Earlier we had a variable
which helped us to distinguish between user & kernel thread. Now we have
a different one. 
I'm going to add a comment to switch_fpu_prepare() about ->mm since you
insist but I would like to avoid it.

> Like the FPU saving on the user stack frame or why this was needed:
> 
> -	/* Update the thread's fxstate to save the fsave header. */
> -	if (ia32_fxstate)
> -		copy_fxregs_to_kernel(fpu);
> 
> Some sort of a high-level invariants written down would save us a lot of
> head scratching in the future.

We have a comment, it is just not helping.

> > diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
> > index 069c04be15076..bd65f6ba950f8 100644
> > --- a/arch/x86/include/asm/trace/fpu.h
> > +++ b/arch/x86/include/asm/trace/fpu.h
> > @@ -13,22 +13,19 @@ DECLARE_EVENT_CLASS(x86_fpu,
> >  
> >  	TP_STRUCT__entry(
> >  		__field(struct fpu *, fpu)
> > -		__field(bool, initialized)
> >  		__field(u64, xfeatures)
> >  		__field(u64, xcomp_bv)
> >  		),
> 
> Yikes, can you do that?
> 
> rostedt has been preaching that adding members at the end of tracepoints
> is ok but not changing them in the middle as that breaks ABI.
> 
> Might wanna ping him about it first.

Steven said on IRC that it can be removed.

> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> > index e43296854e379..3a4668c9d24f1 100644
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -147,10 +147,9 @@ void fpu__save(struct fpu *fpu)
> >  
> >  	preempt_disable();
> >  	trace_x86_fpu_before_save(fpu);
> > -	if (fpu->initialized) {
> > -		if (!copy_fpregs_to_fpstate(fpu)) {
> > -			copy_kernel_to_fpregs(&fpu->state);
> > -		}
> > +
> > +	if (!copy_fpregs_to_fpstate(fpu)) {
> > +		copy_kernel_to_fpregs(&fpu->state);
> >  	}
> 
> WARNING: braces {} are not necessary for single statement blocks
> #217: FILE: arch/x86/kernel/fpu/core.c:151:
> +       if (!copy_fpregs_to_fpstate(fpu)) {
> +               copy_kernel_to_fpregs(&fpu->state);
>         }
removed.

> 
> ...
> 
> > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> > index 7888a41a03cdb..77d9eb43ccac8 100644
> > --- a/arch/x86/kernel/process_32.c
> > +++ b/arch/x86/kernel/process_32.c
> > @@ -288,10 +288,10 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
> >  	if (prev->gs | next->gs)
> >  		lazy_load_gs(next->gs);
> >  
> > -	switch_fpu_finish(next_fpu, cpu);
> > -
> >  	this_cpu_write(current_task, next_p);
> >  
> > +	switch_fpu_finish(next_fpu, cpu);
> > +
> >  	/* Load the Intel cache allocation PQR MSR. */
> >  	resctrl_sched_in();
> >  
> > diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> > index e1983b3a16c43..ffea7c557963a 100644
> > --- a/arch/x86/kernel/process_64.c
> > +++ b/arch/x86/kernel/process_64.c
> > @@ -566,14 +566,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
> >  
> >  	x86_fsgsbase_load(prev, next);
> >  
> > -	switch_fpu_finish(next_fpu, cpu);
> > -
> >  	/*
> >  	 * Switch the PDA and FPU contexts.
> >  	 */
> >  	this_cpu_write(current_task, next_p);
> >  	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
> >  
> > +	switch_fpu_finish(next_fpu, cpu);
> > +
> >  	/* Reload sp0. */
> >  	update_task_stack(next_p);
> >  
> 
> Those moves need at least a comment in the commit message or a separate
> patch.

This needs to be part of this patch. I add a note to the commit message.

Sebastian
Borislav Petkov Feb. 6, 2019, 2:01 p.m. UTC | #3
On Tue, Feb 05, 2019 at 07:03:37PM +0100, Sebastian Andrzej Siewior wrote:
> Well, nothing changes in regard to the logic. Earlier we had a variable
> which helped us to distinguish between user & kernel thread. Now we have
> a different one. 
> I'm going to add a comment to switch_fpu_prepare() about ->mm since you
> insist but I would like to avoid it.

I don't understand what that aversion is towards commenting stuff,
especially important stuff like the meaning of the presence of ->mm for
the FPU code. What is the downside to documenting that?

Considering that in this very thread we ourselves encountered the fact
that stuff is not documented and we complained that it wasn't!

> We have a comment, it is just not helping.

Why is it not helping?

> Steven said on IRC that it can be removed.

Did he give an explanation why is it ok?

Thx.
Sebastian Andrzej Siewior Feb. 7, 2019, 10:13 a.m. UTC | #4
On 2019-02-06 15:01:14 [+0100], Borislav Petkov wrote:
> On Tue, Feb 05, 2019 at 07:03:37PM +0100, Sebastian Andrzej Siewior wrote:
> > Well, nothing changes in regard to the logic. Earlier we had a variable
> > which helped us to distinguish between user & kernel thread. Now we have
> > a different one. 
> > I'm going to add a comment to switch_fpu_prepare() about ->mm since you
> > insist but I would like to avoid it.
> 
> I don't understand what that aversion is towards commenting stuff,
> especially important stuff like the meaning of the presence of ->mm for
> the FPU code. What is the downside to documenting that?

I don't like commenting the obvious things in code but I might be wrong
on what I consider here obvious. The important part is probably that we
don't save/restore FPU registers for kernel threads but this isn't new,
it was always like that (more or less implicit). The ->mm part is an
implementation detail (and is used in other places).
That said I already added this:
|@@ -525,11 +525,14 @@ static inline void fpregs_activate(struc
|  *
|  *  - switch_fpu_finish() restores the new state as
|  *    necessary.
|+ *
|+ * The FPU context is only stored/restore for user task and ->mm is used to
|+ * distinguish between kernel and user threads.
|  */
| static inline void
| switch_fpu_prepare(struct fpu *old_fpu, int cpu)
| {

and I *think* that this is enough. This *what* we do and not *why*. I
don't have an answer towards *why*.

> Considering that in this very thread we ourselves encountered the fact
> that stuff is not documented and we complained that it wasn't!

Yes. We had no idea why we save the FPU registers on user's stack during
signal handling. Was this an implementation detail on kernel side as
part of signal handling or is this required/ expected by the user as
part of a use case? We have now the explanation that signals may
cascade. Do we know by now if userland is supposed to use it or it
accessed the register because they were available?
The MPX code did access the MPX part of the xsave area (others do it for
"testing/debug" as per my I google research). This kind of things should
be part of the ABI document and not only a comment in the kernel.
Are the MAGIC constants only in-kernel use (to check if the user
accidentally overwrote its stack) or should be checked by the user
during signal handling to ensure that the xsave area is available.

> > We have a comment, it is just not helping.
> 
> Why is it not helping?

The part you referred to was:
|-       /* Update the thread's fxstate to save the fsave header. */
|-       if (ia32_fxstate) 
|-               copy_fxregs_to_kernel(fpu);

and it is not helping because it does not explain why it is done. I can
see based on the code that the FXstate is saved in case of a 32bit
frame. It is saved into thread's state. It does not explain why it
needs to be done. That is the "not helping" part.

> > Steven said on IRC that it can be removed.
> 
> Did he give an explanation why is it ok?

I can forward you the IRC pieces offlist if you like. He said I can
remove it if there are no users and I am not aware of any. He pointed
out that sched_wakeup had a "success" member which was relied on by
tools so it remained in order not to break them. So we have
	__entry->success        = 1; /* rudiment, kill when possible */

in the tree now. I can loop him in if this is not enough.

> Thx.

Sebastian
Borislav Petkov Feb. 7, 2019, 10:37 a.m. UTC | #5
On Thu, Feb 07, 2019 at 11:13:01AM +0100, Sebastian Andrzej Siewior wrote:
> and I *think* that this is enough. This *what* we do and not *why*. I
> don't have an answer towards *why*.

Well, it is a start.

You now have everything in your L1 and it is all clear but I'm sure all
the details will be LRU-evicted out soon :) and then you'd wish you'd
written down at least a small hint explaining the grand scheme at least.

> 
> > Considering that in this very thread we ourselves encountered the fact
> > that stuff is not documented and we complained that it wasn't!
> 
> Yes. We had no idea why we save the FPU registers on user's stack during
> signal handling. Was this an implementation detail on kernel side as
> part of signal handling or is this required/ expected by the user as
> part of a use case?

Well, at least a comment over get_sigframe() would've helped a long way,
right?

Instead of scratching heads why is this being done this way.

> We have now the explanation that signals may cascade. Do we know by
> now if userland is supposed to use it or it accessed the register
> because they were available? The MPX code did access the MPX part of
> the xsave area (others do it for "testing/debug" as per my I google
> research). This kind of things should be part of the ABI document and
> not only a comment in the kernel.

Absolutely agreed.

> Are the MAGIC constants only in-kernel use (to check if the user
> accidentally overwrote its stack) or should be checked by the user
> during signal handling to ensure that the xsave area is available.

I don't think the user should care but what do I know?!

> The part you referred to was:
> |-       /* Update the thread's fxstate to save the fsave header. */
> |-       if (ia32_fxstate) 
> |-               copy_fxregs_to_kernel(fpu);
> 
> and it is not helping because it does not explain why it is done. I can
> see based on the code that the FXstate is saved in case of a 32bit
> frame. It is saved into thread's state. It does not explain why it
> needs to be done. That is the "not helping" part.

This is *exactly* why I propose that we should have a
"grand-scheme-of-things" explanation somewhere about what we're doing
with the FPU context.

Figuring out what exactly to do in which context should be easier then.
I hope.

> I can forward you the IRC pieces offlist if you like. He said I can
> remove it if there are no users and I am not aware of any. He pointed
> out that sched_wakeup had a "success" member which was relied on by
> tools so it remained in order not to break them. So we have
> 	__entry->success        = 1; /* rudiment, kill when possible */
> 
> in the tree now. I can loop him in if this is not enough.

So you're replacing the old member with the new, AFAICT and I guess
that doesn't change offsets so even tools which don't use libtraceevent
should be fine but we better make sure before we break userspace because
we don't break userpace :)

Patch
diff mbox series

diff --git a/arch/x86/ia32/ia32_signal.c b/arch/x86/ia32/ia32_signal.c
index 321fe5f5d0e96..6eeb3249f22ff 100644
--- a/arch/x86/ia32/ia32_signal.c
+++ b/arch/x86/ia32/ia32_signal.c
@@ -216,8 +216,7 @@  static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 				 size_t frame_size,
 				 void __user **fpstate)
 {
-	struct fpu *fpu = &current->thread.fpu;
-	unsigned long sp;
+	unsigned long sp, fx_aligned, math_size;
 
 	/* Default to using normal stack */
 	sp = regs->sp;
@@ -231,15 +230,11 @@  static void __user *get_sigframe(struct ksignal *ksig, struct pt_regs *regs,
 		 ksig->ka.sa.sa_restorer)
 		sp = (unsigned long) ksig->ka.sa.sa_restorer;
 
-	if (fpu->initialized) {
-		unsigned long fx_aligned, math_size;
-
-		sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
-		*fpstate = (struct _fpstate_32 __user *) sp;
-		if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
-				    math_size) < 0)
-			return (void __user *) -1L;
-	}
+	sp = fpu__alloc_mathframe(sp, 1, &fx_aligned, &math_size);
+	*fpstate = (struct _fpstate_32 __user *) sp;
+	if (copy_fpstate_to_sigframe(*fpstate, (void __user *)fx_aligned,
+				     math_size) < 0)
+		return (void __user *) -1L;
 
 	sp -= frame_size;
 	/* Align the stack pointer according to the i386 ABI,
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 9f0b3ff8c9b7b..3d5121d2bc0bc 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -529,7 +529,7 @@  static inline void fpregs_activate(struct fpu *fpu)
 static inline void
 switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 {
-	if (static_cpu_has(X86_FEATURE_FPU) && old_fpu->initialized) {
+	if (static_cpu_has(X86_FEATURE_FPU) && current->mm) {
 		if (!copy_fpregs_to_fpstate(old_fpu))
 			old_fpu->last_cpu = -1;
 		else
@@ -537,8 +537,7 @@  switch_fpu_prepare(struct fpu *old_fpu, int cpu)
 
 		/* But leave fpu_fpregs_owner_ctx! */
 		trace_x86_fpu_regs_deactivated(old_fpu);
-	} else
-		old_fpu->last_cpu = -1;
+	}
 }
 
 /*
@@ -551,12 +550,12 @@  switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  */
 static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 {
-	bool preload = static_cpu_has(X86_FEATURE_FPU) &&
-		       new_fpu->initialized;
+	if (static_cpu_has(X86_FEATURE_FPU)) {
+		if (!fpregs_state_valid(new_fpu, cpu)) {
+			if (current->mm)
+				copy_kernel_to_fpregs(&new_fpu->state);
+		}
 
-	if (preload) {
-		if (!fpregs_state_valid(new_fpu, cpu))
-			copy_kernel_to_fpregs(&new_fpu->state);
 		fpregs_activate(new_fpu);
 	}
 }
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 202c53918ecfa..c5a6edd92de4f 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -293,15 +293,6 @@  struct fpu {
 	 */
 	unsigned int			last_cpu;
 
-	/*
-	 * @initialized:
-	 *
-	 * This flag indicates whether this context is initialized: if the task
-	 * is not running then we can restore from this context, if the task
-	 * is running then we should save into this context.
-	 */
-	unsigned char			initialized;
-
 	/*
 	 * @state:
 	 *
diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h
index 069c04be15076..bd65f6ba950f8 100644
--- a/arch/x86/include/asm/trace/fpu.h
+++ b/arch/x86/include/asm/trace/fpu.h
@@ -13,22 +13,19 @@  DECLARE_EVENT_CLASS(x86_fpu,
 
 	TP_STRUCT__entry(
 		__field(struct fpu *, fpu)
-		__field(bool, initialized)
 		__field(u64, xfeatures)
 		__field(u64, xcomp_bv)
 		),
 
 	TP_fast_assign(
 		__entry->fpu		= fpu;
-		__entry->initialized	= fpu->initialized;
 		if (boot_cpu_has(X86_FEATURE_OSXSAVE)) {
 			__entry->xfeatures = fpu->state.xsave.header.xfeatures;
 			__entry->xcomp_bv  = fpu->state.xsave.header.xcomp_bv;
 		}
 	),
-	TP_printk("x86/fpu: %p initialized: %d xfeatures: %llx xcomp_bv: %llx",
+	TP_printk("x86/fpu: %p xfeatures: %llx xcomp_bv: %llx",
 			__entry->fpu,
-			__entry->initialized,
 			__entry->xfeatures,
 			__entry->xcomp_bv
 	)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index e43296854e379..3a4668c9d24f1 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -101,7 +101,7 @@  static void __kernel_fpu_begin(void)
 
 	kernel_fpu_disable();
 
-	if (fpu->initialized) {
+	if (current->mm) {
 		/*
 		 * Ignore return value -- we don't care if reg state
 		 * is clobbered.
@@ -116,7 +116,7 @@  static void __kernel_fpu_end(void)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (fpu->initialized)
+	if (current->mm)
 		copy_kernel_to_fpregs(&fpu->state);
 
 	kernel_fpu_enable();
@@ -147,10 +147,9 @@  void fpu__save(struct fpu *fpu)
 
 	preempt_disable();
 	trace_x86_fpu_before_save(fpu);
-	if (fpu->initialized) {
-		if (!copy_fpregs_to_fpstate(fpu)) {
-			copy_kernel_to_fpregs(&fpu->state);
-		}
+
+	if (!copy_fpregs_to_fpstate(fpu)) {
+		copy_kernel_to_fpregs(&fpu->state);
 	}
 	trace_x86_fpu_after_save(fpu);
 	preempt_enable();
@@ -190,7 +189,7 @@  int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
 {
 	dst_fpu->last_cpu = -1;
 
-	if (!src_fpu->initialized || !static_cpu_has(X86_FEATURE_FPU))
+	if (!static_cpu_has(X86_FEATURE_FPU))
 		return 0;
 
 	WARN_ON_FPU(src_fpu != &current->thread.fpu);
@@ -227,14 +226,10 @@  static void fpu__initialize(struct fpu *fpu)
 {
 	WARN_ON_FPU(fpu != &current->thread.fpu);
 
-	if (!fpu->initialized) {
-		fpstate_init(&fpu->state);
-		trace_x86_fpu_init_state(fpu);
+	fpstate_init(&fpu->state);
+	trace_x86_fpu_init_state(fpu);
 
-		trace_x86_fpu_activate_state(fpu);
-		/* Safe to do for the current task: */
-		fpu->initialized = 1;
-	}
+	trace_x86_fpu_activate_state(fpu);
 }
 
 /*
@@ -247,32 +242,20 @@  static void fpu__initialize(struct fpu *fpu)
  *
  * - or it's called for stopped tasks (ptrace), in which case the
  *   registers were already saved by the context-switch code when
- *   the task scheduled out - we only have to initialize the registers
- *   if they've never been initialized.
+ *   the task scheduled out.
  *
  * If the task has used the FPU before then save it.
  */
 void fpu__prepare_read(struct fpu *fpu)
 {
-	if (fpu == &current->thread.fpu) {
+	if (fpu == &current->thread.fpu)
 		fpu__save(fpu);
-	} else {
-		if (!fpu->initialized) {
-			fpstate_init(&fpu->state);
-			trace_x86_fpu_init_state(fpu);
-
-			trace_x86_fpu_activate_state(fpu);
-			/* Safe to do for current and for stopped child tasks: */
-			fpu->initialized = 1;
-		}
-	}
 }
 
 /*
  * This function must be called before we write a task's fpstate.
  *
- * If the task has used the FPU before then invalidate any cached FPU registers.
- * If the task has not used the FPU before then initialize its fpstate.
+ * Invalidate any cached FPU registers.
  *
  * After this function call, after registers in the fpstate are
  * modified and the child task has woken up, the child task will
@@ -289,17 +272,8 @@  void fpu__prepare_write(struct fpu *fpu)
 	 */
 	WARN_ON_FPU(fpu == &current->thread.fpu);
 
-	if (fpu->initialized) {
-		/* Invalidate any cached state: */
-		__fpu_invalidate_fpregs_state(fpu);
-	} else {
-		fpstate_init(&fpu->state);
-		trace_x86_fpu_init_state(fpu);
-
-		trace_x86_fpu_activate_state(fpu);
-		/* Safe to do for stopped child tasks: */
-		fpu->initialized = 1;
-	}
+	/* Invalidate any cached state: */
+	__fpu_invalidate_fpregs_state(fpu);
 }
 
 /*
@@ -316,17 +290,13 @@  void fpu__drop(struct fpu *fpu)
 	preempt_disable();
 
 	if (fpu == &current->thread.fpu) {
-		if (fpu->initialized) {
-			/* Ignore delayed exceptions from user space */
-			asm volatile("1: fwait\n"
-				     "2:\n"
-				     _ASM_EXTABLE(1b, 2b));
-			fpregs_deactivate(fpu);
-		}
+		/* Ignore delayed exceptions from user space */
+		asm volatile("1: fwait\n"
+			     "2:\n"
+			     _ASM_EXTABLE(1b, 2b));
+		fpregs_deactivate(fpu);
 	}
 
-	fpu->initialized = 0;
-
 	trace_x86_fpu_dropped(fpu);
 
 	preempt_enable();
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6abd83572b016..20d8fa7124c77 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -239,8 +239,6 @@  static void __init fpu__init_system_ctx_switch(void)
 
 	WARN_ON_FPU(!on_boot_cpu);
 	on_boot_cpu = 0;
-
-	WARN_ON_FPU(current->thread.fpu.initialized);
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 5dbc099178a88..d652b939ccfb5 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -15,16 +15,12 @@ 
  */
 int regset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
-	struct fpu *target_fpu = &target->thread.fpu;
-
-	return target_fpu->initialized ? regset->n : 0;
+	return regset->n;
 }
 
 int regset_xregset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
 {
-	struct fpu *target_fpu = &target->thread.fpu;
-
-	if (boot_cpu_has(X86_FEATURE_FXSR) && target_fpu->initialized)
+	if (boot_cpu_has(X86_FEATURE_FXSR))
 		return regset->n;
 	else
 		return 0;
@@ -370,16 +366,9 @@  int fpregs_set(struct task_struct *target, const struct user_regset *regset,
 int dump_fpu(struct pt_regs *regs, struct user_i387_struct *ufpu)
 {
 	struct task_struct *tsk = current;
-	struct fpu *fpu = &tsk->thread.fpu;
-	int fpvalid;
 
-	fpvalid = fpu->initialized;
-	if (fpvalid)
-		fpvalid = !fpregs_get(tsk, NULL,
-				      0, sizeof(struct user_i387_ia32_struct),
-				      ufpu, NULL);
-
-	return fpvalid;
+	return !fpregs_get(tsk, NULL, 0, sizeof(struct user_i387_ia32_struct),
+			   ufpu, NULL);
 }
 EXPORT_SYMBOL(dump_fpu);
 
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9cc108456d0be..914d9886c6ee8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -892,8 +892,6 @@  const void *get_xsave_field_ptr(int xsave_state)
 {
 	struct fpu *fpu = &current->thread.fpu;
 
-	if (!fpu->initialized)
-		return NULL;
 	/*
 	 * fpu__save() takes the CPU's xstate registers
 	 * and saves them off to the 'fpu memory buffer.
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 7888a41a03cdb..77d9eb43ccac8 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -288,10 +288,10 @@  __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	if (prev->gs | next->gs)
 		lazy_load_gs(next->gs);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	this_cpu_write(current_task, next_p);
 
+	switch_fpu_finish(next_fpu, cpu);
+
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e1983b3a16c43..ffea7c557963a 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -566,14 +566,14 @@  __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	x86_fsgsbase_load(prev, next);
 
-	switch_fpu_finish(next_fpu, cpu);
-
 	/*
 	 * Switch the PDA and FPU contexts.
 	 */
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
+	switch_fpu_finish(next_fpu, cpu);
+
 	/* Reload sp0. */
 	update_task_stack(next_p);
 
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 08dfd4c1a4f95..6f45f795690f6 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -246,7 +246,7 @@  get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 	unsigned long sp = regs->sp;
 	unsigned long buf_fx = 0;
 	int onsigstack = on_sig_stack(sp);
-	struct fpu *fpu = &current->thread.fpu;
+	int ret;
 
 	/* redzone */
 	if (IS_ENABLED(CONFIG_X86_64))
@@ -265,11 +265,9 @@  get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 		sp = (unsigned long) ka->sa.sa_restorer;
 	}
 
-	if (fpu->initialized) {
-		sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
-					  &buf_fx, &math_size);
-		*fpstate = (void __user *)sp;
-	}
+	sp = fpu__alloc_mathframe(sp, IS_ENABLED(CONFIG_X86_32),
+				  &buf_fx, &math_size);
+	*fpstate = (void __user *)sp;
 
 	sp = align_sigframe(sp - frame_size);
 
@@ -281,8 +279,8 @@  get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
 		return (void __user *)-1L;
 
 	/* save i387 and extended state */
-	if (fpu->initialized &&
-	    copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size) < 0)
+	ret = copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size);
+	if (ret < 0)
 		return (void __user *)-1L;
 
 	return (void __user *)sp;
@@ -763,8 +761,7 @@  handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 		/*
 		 * Ensure the signal handler starts with the new fpu state.
 		 */
-		if (fpu->initialized)
-			fpu__clear(fpu);
+		fpu__clear(fpu);
 	}
 	signal_setup_done(failed, ksig, stepping);
 }
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 047a77f6a10cb..05bb9a44eb1c3 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -39,17 +39,12 @@  int __execute_only_pkey(struct mm_struct *mm)
 	 * dance to set PKRU if we do not need to.  Check it
 	 * first and assume that if the execute-only pkey is
 	 * write-disabled that we do not have to set it
-	 * ourselves.  We need preempt off so that nobody
-	 * can make fpregs inactive.
+	 * ourselves.
 	 */
-	preempt_disable();
 	if (!need_to_set_mm_pkey &&
-	    current->thread.fpu.initialized &&
 	    !__pkru_allows_read(read_pkru(), execute_only_pkey)) {
-		preempt_enable();
 		return execute_only_pkey;
 	}
-	preempt_enable();
 
 	/*
 	 * Set up PKRU so that it denies access for everything