diff mbox

[RFC,v3,3/7] ARM64: defer reloading a task's FPSIMD state to userland resume

Message ID 1381666503-23726-4-git-send-email-ard.biesheuvel@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Ard Biesheuvel Oct. 13, 2013, 12:14 p.m. UTC
Modify kernel_neon_begin() and kernel_neon_end() so subsequent calls
don't need to preserve/restore the userland FPSIMD state if the task
has not entered userland in the mean time.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/thread_info.h | 4 +++-
 arch/arm64/kernel/entry.S            | 2 +-
 arch/arm64/kernel/fpsimd.c           | 7 ++-----
 arch/arm64/kernel/signal.c           | 2 ++
 4 files changed, 8 insertions(+), 7 deletions(-)

Comments

Catalin Marinas Oct. 28, 2013, 6:12 p.m. UTC | #1
On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote:
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>  void fpsimd_thread_switch(struct task_struct *next)
>  {
>  	/* check if not kernel threads */
> -	if (current->mm)
> +	if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>  		fpsimd_save_state(&current->thread.fpsimd_state);

Why does it need test_and_set_thread_flag() here? Some comments would be
useful as it looks strange to check a reload flag to decide whether to
save a state. Or change the name to something like 'dirty'.

>  	if (next->mm)
>  		fpsimd_load_state(&next->thread.fpsimd_state);

This function could be optimised a bit more to avoid saving/restoring if
the switch only happened between a user thread and a kernel one (and
back again) since the FP state may not have been dirtied. But what I had
in mind was per-CPU fpstate (possibly pointer or some flag) rather than
per-thread.

> --- a/arch/arm64/kernel/signal.c
> +++ b/arch/arm64/kernel/signal.c
> @@ -416,4 +416,6 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
>  		clear_thread_flag(TIF_NOTIFY_RESUME);
>  		tracehook_notify_resume(regs);
>  	}
> +	if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
> +		fpsimd_load_state(&current->thread.fpsimd_state);

I think this code can be preempted, it is run with IRQs enabled. And
there is a small window where we cleared the flag but haven't loaded the
state.
Ard Biesheuvel Oct. 28, 2013, 8:32 p.m. UTC | #2
On 28 October 2013 11:12, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote:
>> --- a/arch/arm64/kernel/fpsimd.c
>> +++ b/arch/arm64/kernel/fpsimd.c
>> @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>>  void fpsimd_thread_switch(struct task_struct *next)
>>  {
>>       /* check if not kernel threads */
>> -     if (current->mm)
>> +     if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>>               fpsimd_save_state(&current->thread.fpsimd_state);
>
> Why does it need test_and_set_thread_flag() here? Some comments would be
> useful as it looks strange to check a reload flag to decide whether to
> save a state. Or change the name to something like 'dirty'.
>

Actually, it's test and clear. If the userland register file has
already been preserved for the purpose of performing kernel mode NEON,
it should not be saved again when the task gets scheduled out. The
clearing could also be deferred to the time when the task gets
scheduled in again. Or perhaps, it would be even better to always
defer loading the userland state for the next task when that task in
fact enters userland.

>>       if (next->mm)
>>               fpsimd_load_state(&next->thread.fpsimd_state);
>
> This function could be optimised a bit more to avoid saving/restoring if
> the switch only happened between a user thread and a kernel one (and
> back again) since the FP state may not have been dirtied. But what I had
> in mind was per-CPU fpstate (possibly pointer or some flag) rather than
> per-thread.
>

Well, then we are entering the realm of lazy restore, imo. There were
some patches proposed for that already, I think? But I do agree that
at his point, there is no need to restore the userland register
contents yet, it can be deferred to the point when the task reenters
userland (as mentioned above).

>> --- a/arch/arm64/kernel/signal.c
>> +++ b/arch/arm64/kernel/signal.c
>> @@ -416,4 +416,6 @@ asmlinkage void do_noti fy_resume(struct pt_regs *regs,
>>               clear_thread_flag(TIF_NOTIFY_RESUME);
>>               tracehook_notify_resume(regs);
>>       }
>> +     if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>> +             fpsimd_load_state(&current->thread.fpsimd_state);
>
> I think this code can be preempted, it is run with IRQs enabled. And
> there is a small window where we cleared the flag but haven't loaded the
> state.
>

If we are preempted at this point, the fpstate will be loaded in the
normal way the next time this task runs, so I think this is harmless.
Although I guess we may be restoring the fp state twice in that case?

So in summary, what I need to do is:
- rework to use a per_cpu flag rather than a TIF;
- preserve the userland state (if it has one) when a task gets scheduled out;
- restore the userland state when a task enters userland;

I will propose an updated patch after I wrap up the work on the
in_interrupt() kernel mode NEON, as these topics are really orthogonal
and there is no reason to keep them combined in a single series.
Catalin Marinas Oct. 28, 2013, 10:29 p.m. UTC | #3
On 28 Oct 2013, at 20:32, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 28 October 2013 11:12, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Sun, Oct 13, 2013 at 01:14:59PM +0100, Ard Biesheuvel wrote:
>>> --- a/arch/arm64/kernel/fpsimd.c
>>> +++ b/arch/arm64/kernel/fpsimd.c
>>> @@ -72,7 +72,7 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
>>> void fpsimd_thread_switch(struct task_struct *next)
>>> {
>>>      /* check if not kernel threads */
>>> -     if (current->mm)
>>> +     if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>>>              fpsimd_save_state(&current->thread.fpsimd_state);
>> 
>> Why does it need test_and_set_thread_flag() here? Some comments would be
>> useful as it looks strange to check a reload flag to decide whether to
>> save a state. Or change the name to something like 'dirty'.
> 
> Actually, it's test and clear.

Yes, just a typo.

> If the userland register file has
> already been preserved for the purpose of performing kernel mode NEON,
> it should not be saved again when the task gets scheduled out. The
> clearing could also be deferred to the time when the task gets
> scheduled in again.

The above should be turned into a comment in the code.

> Or perhaps, it would be even better to always
> defer loading the userland state for the next task when that task in
> fact enters userland.

It needs some more thinking, its usually cleaner in the context
switching code.

>>>      if (next->mm)
>>>              fpsimd_load_state(&next->thread.fpsimd_state);
>> 
>> This function could be optimised a bit more to avoid saving/restoring if
>> the switch only happened between a user thread and a kernel one (and
>> back again) since the FP state may not have been dirtied. But what I had
>> in mind was per-CPU fpstate (possibly pointer or some flag) rather than
>> per-thread.
> 
> Well, then we are entering the realm of lazy restore, imo. There were
> some patches proposed for that already, I think? But I do agree that
> at his point, there is no need to restore the userland register
> contents yet, it can be deferred to the point when the task reenters
> userland (as mentioned above).

Not entirely lazy.  What I dont want to see (without proper benchmarks)
is disabling the FP at context switch and restoring the registers lazily
via the fault mechanism.  What Im proposing above is not lazy, just an
optimisation for a clear case where the FP is not used in a kernel
thread.

>>> --- a/arch/arm64/kernel/signal.c
>>> +++ b/arch/arm64/kernel/signal.c
>>> @@ -416,4 +416,6 @@ asmlinkage void do_noti fy_resume(struct pt_regs *regs,
>>>              clear_thread_flag(TIF_NOTIFY_RESUME);
>>>              tracehook_notify_resume(regs);
>>>      }
>>> +     if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
>>> +             fpsimd_load_state(&current->thread.fpsimd_state);
>> 
>> I think this code can be preempted, it is run with IRQs enabled. And
>> there is a small window where we cleared the flag but haven't loaded the
>> state.
> 
> If we are preempted at this point, the fpstate will be loaded in the
> normal way the next time this task runs, so I think this is harmless.
> Although I guess we may be restoring the fp state twice in that case?

Lets say task A does an svc and gets into kernel mode followed by
kernel_neon_begin/end().  Before returning to user, the kernel runs the
above test_and_clear_thread_flag().  Immediately after, an interrupt
happens the task A is preempted.  The fpsimd_thread_switch() function
finds that TIF_RELOAD_FPSTATE is cleared and save the current FP state.
But the FP regs contain whatever the kernel neon code did, so you
corrupt the existing data.

> So in summary, what I need to do is:
> - rework to use a per_cpu flag rather than a TIF;
> - preserve the userland state (if it has one) when a task gets scheduled out;
> - restore the userland state when a task enters userland;

Happy to discuss the algorithm before you code it (unless you prefer to
write the code quickly).

> I will propose an updated patch after I wrap up the work on the
> in_interrupt() kernel mode NEON, as these topics are really orthogonal
> and there is no reason to keep them combined in a single series.

Sounds fine.

Catalin
diff mbox

Patch

diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 23a3c47..3bdeab6 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -106,6 +106,7 @@  static inline struct thread_info *current_thread_info(void)
 #define TIF_SIGPENDING		0
 #define TIF_NEED_RESCHED	1
 #define TIF_NOTIFY_RESUME	2	/* callback before returning to user */
+#define TIF_RELOAD_FPSTATE	3	/* user FPSIMD context saved to mem */
 #define TIF_SYSCALL_TRACE	8
 #define TIF_POLLING_NRFLAG	16
 #define TIF_MEMDIE		18	/* is terminating due to OOM killer */
@@ -118,10 +119,11 @@  static inline struct thread_info *current_thread_info(void)
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
+#define _TIF_RELOAD_FPSTATE	(1 << TIF_RELOAD_FPSTATE)
 #define _TIF_32BIT		(1 << TIF_32BIT)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
-				 _TIF_NOTIFY_RESUME)
+				 _TIF_NOTIFY_RESUME | _TIF_RELOAD_FPSTATE)
 
 #endif /* __KERNEL__ */
 #endif /* __ASM_THREAD_INFO_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 3881fd1..2c6c7fb 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -589,7 +589,7 @@  fast_work_pending:
 	str	x0, [sp, #S_X0]			// returned x0
 work_pending:
 	tbnz	x1, #TIF_NEED_RESCHED, work_resched
-	/* TIF_SIGPENDING or TIF_NOTIFY_RESUME case */
+	/* TIF_SIGPENDING/TIF_NOTIFY_RESUME/TIF_RELOAD_FPSTATE case */
 	ldr	x2, [sp, #S_PSTATE]
 	mov	x0, sp				// 'regs'
 	tst	x2, #PSR_MODE_MASK		// user mode regs?
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 1f2e4d5..a52affd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -72,7 +72,7 @@  void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs)
 void fpsimd_thread_switch(struct task_struct *next)
 {
 	/* check if not kernel threads */
-	if (current->mm)
+	if (current->mm && !test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
 		fpsimd_save_state(&current->thread.fpsimd_state);
 	if (next->mm)
 		fpsimd_load_state(&next->thread.fpsimd_state);
@@ -95,16 +95,13 @@  void kernel_neon_begin(void)
 	BUG_ON(in_interrupt());
 	preempt_disable();
 
-	if (current->mm)
+	if (current->mm && !test_and_set_thread_flag(TIF_RELOAD_FPSTATE))
 		fpsimd_save_state(&current->thread.fpsimd_state);
 }
 EXPORT_SYMBOL(kernel_neon_begin);
 
 void kernel_neon_end(void)
 {
-	if (current->mm)
-		fpsimd_load_state(&current->thread.fpsimd_state);
-
 	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 890a591..da3a433 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -416,4 +416,6 @@  asmlinkage void do_notify_resume(struct pt_regs *regs,
 		clear_thread_flag(TIF_NOTIFY_RESUME);
 		tracehook_notify_resume(regs);
 	}
+	if (test_and_clear_thread_flag(TIF_RELOAD_FPSTATE))
+		fpsimd_load_state(&current->thread.fpsimd_state);
 }