Message ID | 1381666503-23726-4-git-send-email-ard.biesheuvel@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote: > --- a/arch/arm64/kernel/fpsimd.c > +++ b/arch/arm64/kernel/fpsimd.c > @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) > void fpsimd_thread_switch(struct task_struct *next) > { > /* check if not kernel threads */ > - if (current->mm) > + if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE)) > fpsimd_save_state(¤t->thread.fpsimd_state); Why does it need test_and_set_thread_flag() here? Some comments would be useful as it looks strange to check a reload flag to decide whether to save a state. Or change the name to something like 'dirty'. > if (next->mm) > fpsimd_load_state(&next->thread.fpsimd_state); This function could be optimised a bit more to avoid saving/restoring if the switch only happened between a user thread and a kernel one (and back again) since the FP state may not have been dirtied. But what I had in mind was per-CPU fpstate (possibly pointer or some flag) rather than per-thread. > --- a/arch/arm64/kernel/signal.c > +++ b/arch/arm64/kernel/signal.c > @@ -416,4 +416,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, > clear_thread_flag(TIF_NOTIFY_RESUME); > tracehook_notify_resume(regs); > } > + if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE)) > + fpsimd_load_state(¤t->thread.fpsimd_state); I think this code can be preempted, it is run with IRQs enabled. And there is a small window where we cleared the flag but haven't loaded the state.
On 28 October 2013 11:12, Catalin Marinas <catalin.marinas@arm.com> wrote: > On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote: >> --- a/arch/arm64/kernel/fpsimd.c >> +++ b/arch/arm64/kernel/fpsimd.c >> @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) >> void fpsimd_thread_switch(struct task_struct *next) >> { >> /* check if not kernel threads */ >> - if (current->mm) >> + if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE)) >> fpsimd_save_state(¤t->thread.fpsimd_state); > > Why does it need test_and_set_thread_flag() here? Some comments would be > useful as it looks strange to check a reload flag to decide whether to > save a state. Or change the name to something like 'dirty'. > Actually, it's test and clear. If the userland register file has already been preserved for the purpose of performing kernel mode NEON, it should not be saved again when the task gets scheduled out. The clearing could also be deferred to the time when the task gets scheduled in again. Or perhaps, it would be even better to always defer loading the userland state for the next task when that task in fact enters userland. >> if (next->mm) >> fpsimd_load_state(&next->thread.fpsimd_state); > > This function could be optimised a bit more to avoid saving/restoring if > the switch only happened between a user thread and a kernel one (and > back again) since the FP state may not have been dirtied. But what I had > in mind was per-CPU fpstate (possibly pointer or some flag) rather than > per-thread. > Well, then we are entering the realm of lazy restore, imo. There were some patches proposed for that already, I think? But I do agree that at his point, there is no need to restore the userland register contents yet, it can be deferred to the point when the task reenters userland (as mentioned above). >> --- a/arch/arm64/kernel/signal.c >> +++ b/arch/arm64/kernel/signal.c >> @@ -416,4 +416,6 @@ asmlinkage void do_noti fy_resume(struct pt_regs *regs, >> clear_thread_flag(TIF_NOTIFY_RESUME); >> tracehook_notify_resume(regs); >> } >> + if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE)) >> + fpsimd_load_state(¤t->thread.fpsimd_state); > > I think this code can be preempted, it is run with IRQs enabled. And > there is a small window where we cleared the flag but haven't loaded the > state. > If we are preempted at this point, the fpstate will be loaded in the normal way the next time this task runs, so I think this is harmless. Although I guess we may be restoring the fp state twice in that case? So in summary, what I need to do is: - rework to use a per_cpu flag rather than a TIF; - preserve the userland state (if it has one) when a task gets scheduled out; - restore the userland state when a task enters userland; I will propose an updated patch after I wrap up the work on the in_interrupt() kernel mode NEON, as these topics are really orthogonal and there is no reason to keep them combined in a single series.
On 28 Oct 2013, at 20:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 28 October 2013 11:12, Catalin Marinas <catalin.marinas@arm.com> wrote: >> On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote: >>> --- a/arch/arm64/kernel/fpsimd.c >>> +++ b/arch/arm64/kernel/fpsimd.c >>> @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) >>> void fpsimd_thread_switch(struct task_struct *next) >>> { >>> /* check if not kernel threads */ >>> - if (current->mm) >>> + if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE)) >>> fpsimd_save_state(¤t->thread.fpsimd_state); >> >> Why does it need test_and_set_thread_flag() here? Some comments would be >> useful as it looks strange to check a reload flag to decide whether to >> save a state. Or change the name to something like 'dirty'. > > Actually, it's test and clear. Yes, just a typo. > If the userland register file has > already been preserved for the purpose of performing kernel mode NEON, > it should not be saved again when the task gets scheduled out. The > clearing could also be deferred to the time when the task gets > scheduled in again. The above should be turned into a comment in the code. > Or perhaps, it would be even better to always > defer loading the userland state for the next task when that task in > fact enters userland. It needs some more thinking, its usually cleaner in the context switching code. >>> if (next->mm) >>> fpsimd_load_state(&next->thread.fpsimd_state); >> >> This function could be optimised a bit more to avoid saving/restoring if >> the switch only happened between a user thread and a kernel one (and >> back again) since the FP state may not have been dirtied. But what I had >> in mind was per-CPU fpstate (possibly pointer or some flag) rather than >> per-thread. > > Well, then we are entering the realm of lazy restore, imo. There were > some patches proposed for that already, I think? But I do agree that > at his point, there is no need to restore the userland register > contents yet, it can be deferred to the point when the task reenters > userland (as mentioned above). Not entirely lazy. What I dont want to see (without proper benchmarks) is disabling the FP at context switch and restoring the registers lazily via the fault mechanism. What Im proposing above is not lazy, just an optimisation for a clear case where the FP is not used in a kernel thread. >>> --- a/arch/arm64/kernel/signal.c >>> +++ b/arch/arm64/kernel/signal.c >>> @@ -416,4 +416,6 @@ asmlinkage void do_noti fy_resume(struct pt_regs *regs, >>> clear_thread_flag(TIF_NOTIFY_RESUME); >>> tracehook_notify_resume(regs); >>> } >>> + if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE)) >>> + fpsimd_load_state(¤t->thread.fpsimd_state); >> >> I think this code can be preempted, it is run with IRQs enabled. And >> there is a small window where we cleared the flag but haven't loaded the >> state. > > If we are preempted at this point, the fpstate will be loaded in the > normal way the next time this task runs, so I think this is harmless. > Although I guess we may be restoring the fp state twice in that case? Lets say task A does an svc and gets into kernel mode followed by kernel_neon_begin/end(). Before returning to user, the kernel runs the above test_and_clear_thread_flag(). Immediately after, an interrupt happens the task A is preempted. The fpsimd_thread_switch() function finds that TIF_RELOAD_FPSTATE is cleared and save the current FP state. But the FP regs contain whatever the kernel neon code did, so you corrupt the existing data. > So in summary, what I need to do is: > - rework to use a per_cpu flag rather than a TIF; > - preserve the userland state (if it has one) when a task gets scheduled out; > - restore the userland state when a task enters userland; Happy to discuss the algorithm before you code it (unless you prefer to write the code quickly). > I will propose an updated patch after I wrap up the work on the > in_interrupt() kernel mode NEON, as these topics are really orthogonal > and there is no reason to keep them combined in a single series. Sounds fine. Catalin
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index 23a3c47..3bdeab6 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -106,6 +106,7 @@ static inline struct thread_info *current_thread_info(void) #define TIF_SIGPENDING 0 #define TIF_NEED_RESCHED 1 #define TIF_NOTIFY_RESUME 2 /* callback before returning to user */ +#define TIF_RELOAD_FPSTATE 3 /* user FPSIMD context saved to mem */ #define TIF_SYSCALL_TRACE 8 #define TIF_POLLING_NRFLAG 16 #define TIF_MEMDIE 18 /* is terminating due to OOM killer */ @@ -118,10 +119,11 @@ static inline struct thread_info *current_thread_info(void) #define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME) +#define _TIF_RELOAD_FPSTATE (1 << TIF_RELOAD_FPSTATE) #define _TIF_32BIT (1 << TIF_32BIT) #define _TIF_WORK_MASK (_TIF_NEED_RESCHED | _TIF_SIGPENDING | \ - _TIF_NOTIFY_RESUME) + _TIF_NOTIFY_RESUME | _TIF_RELOAD_FPSTATE) #endif /* __KERNEL__ */ #endif /* __ASM_THREAD_INFO_H */ diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 3881fd1..2c6c7fb 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -589,7 +589,7 @@ fast_work_pending: str x0, [sp, #S_X0] // returned x0 work_pending: tbnz x1, #TIF_NEED_RESCHED, work_resched - /* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */ + /* TIF_SIGPENDING/TIF_NOTIFY_RESUME/TIF_RELOAD_FPSTATE case */ ldr x2, [sp, #S_PSTATE] mov x0, sp // 'regs' tst x2, #PSR_MODE_MASK // user mode regs? diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 1f2e4d5..a52affd 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) void fpsimd_thread_switch(struct task_struct *next) { /* check if not kernel threads */ - if (current->mm) + if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE)) fpsimd_save_state(¤t->thread.fpsimd_state); if (next->mm) fpsimd_load_state(&next->thread.fpsimd_state); @@ -95,16 +95,13 @@ void kernel_neon_begin(void) BUG_ON(in_interrupt()); preempt_disable(); - if (current->mm) + if (current->mm && !test_and_set_thread_flag(TIF_RELOAD_FPSTATE)) fpsimd_save_state(¤t->thread.fpsimd_state); } EXPORT_SYMBOL(kernel_neon_begin); void kernel_neon_end(void) { - if (current->mm) - fpsimd_load_state(¤t->thread.fpsimd_state); - preempt_enable(); } EXPORT_SYMBOL(kernel_neon_end); diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 890a591..da3a433 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -416,4 +416,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs, clear_thread_flag(TIF_NOTIFY_RESUME); tracehook_notify_resume(regs); } + if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE)) + fpsimd_load_state(¤t->thread.fpsimd_state); }
Modify kernel_neon_begin() and kernel_neon_end() so subsequent calls don't need to preserve/restore the userland FPSIMD state if the task has not entered userland in the mean time. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> --- arch/arm64/include/asm/thread_info.h | 4 +++- arch/arm64/kernel/entry.S | 2 +- arch/arm64/kernel/fpsimd.c | 7 ++----- arch/arm64/kernel/signal.c | 2 ++ 4 files changed, 8 insertions(+), 7 deletions(-)