[08/22] x86/fpu: Remove user_fpu_begin()

Message ID 20190109114744.10936-9-bigeasy@linutronix.de
State New
Series
  • [v6] x86: load FPU registers on return to userland

Commit Message

Sebastian Andrzej Siewior Jan. 9, 2019, 11:47 a.m. UTC
user_fpu_begin() sets fpu_fpregs_owner_ctx to task's fpu struct. This is
always the case since there is no lazy FPU anymore.

fpu_fpregs_owner_ctx is used during context switch to decide if it needs
to load the saved registers or if the currently loaded registers are
valid. The load could be skipped during
	taskA -> kernel thread -> taskA

because the switch to kernel thread would not alter the CPU's FPU state.

Since this field is always updated during context switch and never
invalidated, setting it manually (in user context) makes no difference.
A kernel thread with a kernel_fpu_begin() block could set
fpu_fpregs_owner_ctx to NULL, but a kernel thread does not use
user_fpu_begin().
user_fpu_begin() is a leftover from the lazy-FPU time.

Remove user_fpu_begin() since it does not change fpu_fpregs_owner_ctx's
content.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/fpu/internal.h | 17 -----------------
 arch/x86/kernel/fpu/core.c          |  4 +---
 arch/x86/kernel/fpu/signal.c        |  1 -
 3 files changed, 1 insertion(+), 21 deletions(-)
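
The ownership check described in the commit message can be modelled in
user space. The following is only a rough sketch under stated
assumptions, not the kernel implementation: struct fpu is reduced to two
fields, the per-CPU fpu_fpregs_owner_ctx is a plain array, and
switch_to_user_task(), switch_to_kernel_thread() and
model_user_fpu_begin() are hypothetical names invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 2

/* Simplified stand-in for the kernel's struct fpu (assumption). */
struct fpu {
	int last_cpu;	/* CPU whose registers last held this state, -1 if none */
	int loads;	/* how often the state was copied into the registers */
};

/* Models the per-CPU fpu_fpregs_owner_ctx pointer. */
static struct fpu *fpregs_owner[NR_CPUS];

/* The registers are valid if this CPU still owns exactly this context. */
static bool fpregs_state_valid(const struct fpu *fpu, int cpu)
{
	return fpregs_owner[cpu] == fpu && fpu->last_cpu == cpu;
}

/* Switch to a user task: reload only if the registers are stale. */
static void switch_to_user_task(struct fpu *new_fpu, int cpu)
{
	if (!fpregs_state_valid(new_fpu, cpu)) {
		new_fpu->loads++;
		new_fpu->last_cpu = cpu;
	}
	/* The context switch always records the owner. */
	fpregs_owner[cpu] = new_fpu;
}

/* A kernel thread without kernel_fpu_begin() leaves the FPU state alone. */
static void switch_to_kernel_thread(int cpu)
{
	(void)cpu;
}

/* What user_fpu_begin() amounted to: point the owner at current's fpu. */
static void model_user_fpu_begin(struct fpu *cur, int cpu)
{
	fpregs_owner[cpu] = cur;
}
```

In this model, calling model_user_fpu_begin() for the running task
stores the pointer that the last context switch already stored, and
taskA -> kernel thread -> taskA does not increment taskA's load counter.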

Comments

Borislav Petkov Jan. 25, 2019, 3:18 p.m. UTC | #1
On Wed, Jan 09, 2019 at 12:47:30PM +0100, Sebastian Andrzej Siewior wrote:
> user_fpu_begin() sets fpu_fpregs_owner_ctx to task's fpu struct. This is
> always the case since there is no lazy FPU anymore.
> 
> fpu_fpregs_owner_ctx is used during context switch to decide if it needs
> to load the saved registers or if the currently loaded registers are
> valid. The load could be skipped during
> 	taskA -> kernel thread -> taskA
> 
> because the switch to kernel thread would not alter the CPU's FPU state.
> 
> Since this field is always updated during context switch and never
> invalidated, setting it manually (in user context) makes no difference.
> A kernel thread with a kernel_fpu_begin() block could set
> fpu_fpregs_owner_ctx to NULL, but a kernel thread does not use
> user_fpu_begin().
> user_fpu_begin() is a leftover from the lazy-FPU time.
> 
> Remove user_fpu_begin() since it does not change fpu_fpregs_owner_ctx's
> content.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  arch/x86/include/asm/fpu/internal.h | 17 -----------------
>  arch/x86/kernel/fpu/core.c          |  4 +---
>  arch/x86/kernel/fpu/signal.c        |  1 -
>  3 files changed, 1 insertion(+), 21 deletions(-)

Reviewed-by: Borislav Petkov <bp@suse.de>

Should we do this microoptimization in addition, to save us the
activation when the kernel thread here:

	taskA -> kernel thread -> taskA

doesn't call kernel_fpu_begin() and thus fpu_fpregs_owner_ctx remains
the same?

It would be a bit more correct as it won't invoke the
trace_x86_fpu_regs_activated() TP in case the FPU context is the same.

---
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index bfe0bfc7d0d1..ee1ac46a7820 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -510,7 +510,7 @@ switch_fpu_prepare(struct fpu *old_fpu, int cpu)
  * Set up the userspace FPU context for the new task, if the task
  * has used the FPU.
  */
-static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
+static inline void switch_fpu_finish(struct fpu *prev_fpu, struct fpu *new_fpu, int cpu)
 {
 	if (static_cpu_has(X86_FEATURE_FPU)) {
 		if (!fpregs_state_valid(new_fpu, cpu)) {
@@ -518,7 +518,8 @@ static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 				copy_kernel_to_fpregs(&new_fpu->state);
 		}
 
-		fpregs_activate(new_fpu);
+		if (prev_fpu != new_fpu)
+			fpregs_activate(new_fpu);
 	}
 }
 
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 77d9eb43ccac..f8205df2df1d 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -290,7 +290,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 
 	this_cpu_write(current_task, next_p);
 
-	switch_fpu_finish(next_fpu, cpu);
+	switch_fpu_finish(prev_fpu, next_fpu, cpu);
 
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ffea7c557963..5f153b963180 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -572,7 +572,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(current_task, next_p);
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
-	switch_fpu_finish(next_fpu, cpu);
+	switch_fpu_finish(prev_fpu, next_fpu, cpu);
 
 	/* Reload sp0. */
 	update_task_stack(next_p);
Sebastian Andrzej Siewior Feb. 5, 2019, 6:16 p.m. UTC | #2
On 2019-01-25 16:18:40 [+0100], Borislav Petkov wrote:
> Reviewed-by: Borislav Petkov <bp@suse.de>
thanks.

> Should we do this microoptimization in addition, to save us the
> activation when the kernel thread here:
> 
> 	taskA -> kernel thread -> taskA
> 
> doesn't call kernel_fpu_begin() and thus fpu_fpregs_owner_ctx remains
> the same?

This might work now but at the end of the series this case will be
handled. The switch
	taskA -> kernel thread

will save taskA's registers. The switch
	kernel thread -> taskA

will only set a TIF flag so that the FPU registers are restored on the
return to userland. The load happens only if the ctx pointer is different.

> It would be a bit more correct as it won't invoke the
> trace_x86_fpu_regs_activated() TP in case the FPU context is the same.

The trace point is not wrong. As of now the same context will be loaded
again.

Sebastian
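
The deferred-load behaviour described in the reply above can be sketched
as a small single-CPU model. This is an illustration under assumptions,
not code from the series: the need_fpu_load field stands in for the
thread flag (its real name is not given in this thread), and struct
task, context_switch() and return_to_user() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Counters only; the real register state is left out (assumption). */
struct fpu {
	int loads;	/* register loads performed on return to userland */
	int saves;	/* register saves performed at context switch */
};

struct task {
	struct fpu fpu;
	bool is_kthread;
	bool need_fpu_load;	/* hypothetical stand-in for the thread flag */
};

/* Models fpu_fpregs_owner_ctx for a single CPU. */
static struct fpu *fpregs_owner;

static void context_switch(struct task *prev, struct task *next)
{
	/* Switching away from a user task saves its registers. */
	if (!prev->is_kthread)
		prev->fpu.saves++;

	/* Switching to a user task only sets the flag; nothing is loaded. */
	if (!next->is_kthread)
		next->need_fpu_load = true;
}

static void return_to_user(struct task *t)
{
	if (!t->need_fpu_load)
		return;
	t->need_fpu_load = false;

	/* Load only if the registers belong to a different context. */
	if (fpregs_owner != &t->fpu) {
		t->fpu.loads++;
		fpregs_owner = &t->fpu;
	}
}
```

With this model, taskA -> kernel thread -> taskA saves taskA's registers
once and sets the flag, but the return to userland finds the owner
pointer unchanged and skips the load.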

Patch

diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 3d5121d2bc0bc..03acb9aeb32fc 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -560,23 +560,6 @@  static inline void switch_fpu_finish(struct fpu *new_fpu, int cpu)
 	}
 }
 
-/*
- * Needs to be preemption-safe.
- *
- * NOTE! user_fpu_begin() must be used only immediately before restoring
- * the save state. It does not do any saving/restoring on its own. In
- * lazy FPU mode, it is just an optimization to avoid a #NM exception,
- * the task can lose the FPU right after preempt_enable().
- */
-static inline void user_fpu_begin(void)
-{
-	struct fpu *fpu = &current->thread.fpu;
-
-	preempt_disable();
-	fpregs_activate(fpu);
-	preempt_enable();
-}
-
 /*
  * MXCSR and XCR definitions:
  */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3a4668c9d24f1..78d8037635932 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -335,10 +335,8 @@  void fpu__clear(struct fpu *fpu)
 	 * Make sure fpstate is cleared and initialized.
 	 */
 	fpu__initialize(fpu);
-	if (static_cpu_has(X86_FEATURE_FPU)) {
-		user_fpu_begin();
+	if (static_cpu_has(X86_FEATURE_FPU))
 		copy_init_fpstate_to_fpregs();
-	}
 }
 
 /*
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 047390a45e016..555c469878874 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -325,7 +325,6 @@  static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		 * For 64-bit frames and 32-bit fsave frames, restore the user
 		 * state to the registers directly (with exceptions handled).
 		 */
-		user_fpu_begin();
 		if (copy_user_to_fpregs_zeroing(buf_fx, xfeatures, fx_only)) {
 			fpu__clear(fpu);
 			return -1;