Message ID | 20200605073052.23044-1-wooy88.kim@samsung.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: fpsimd: Added API to manage fpsimd state inside kernel | expand |
Hi Wooyeon, There are a *lot* of people Cc' here, many of whomo will find this irrelevant. Please try to keep the Cc list constrained to a reasonable number of interested parties. On Fri, Jun 05, 2020 at 04:30:52PM +0900, Wooyeon Kim wrote: > From: Wooki Min <wooki.min@samsung.com> > > This is an patch to use FPSIMD register in Kernel space. > It need to manage to use FPSIMD register without damaging it > of the user task. > Following items have been implemented and added. Please introduce the problem you are trying to solve in more detail. We already have kernel_neon_{begin,end}() for kernel-mode NEON; why is that not sufficient for your needs? Please answer this before considering other details. What do you want to use this for? > > 1. Using FPSIMD in ISR (in_interrupt) > It can used __efi_fpsimd_begin/__efi_fpsimd_end > which is already implemented. > Save fpsimd state before entering ISR, > and restore fpsimd state after ISR ends. > For use in external kernel module, > it is declared as EXPORT_SYMBOL. This patch adds no in-tree modular users of this, so per the usual conventions, NAK to EXPORT_SYMBOL(). Thanks, Mark.
On Fri, Jun 05, 2020 at 11:37:05AM +0100, Mark Rutland wrote: > Hi Wooyeon, > > There are a *lot* of people Cc' here, many of whomo will find this > irrelevant. Please try to keep the Cc list constrained to a reasonable > number of interested parties. > > On Fri, Jun 05, 2020 at 04:30:52PM +0900, Wooyeon Kim wrote: > > From: Wooki Min <wooki.min@samsung.com> > > > > This is an patch to use FPSIMD register in Kernel space. > > It need to manage to use FPSIMD register without damaging it > > of the user task. > > Following items have been implemented and added. > > Please introduce the problem you are trying to solve in more detail. We > already have kernel_neon_{begin,end}() for kernel-mode NEON; why is that > not sufficient for your needs? Please answer this before considering > other details. > > What do you want to use this for? > > > > > 1. Using FPSIMD in ISR (in_interrupt) > > It can used __efi_fpsimd_begin/__efi_fpsimd_end > > which is already implemented. > > Save fpsimd state before entering ISR, > > and restore fpsimd state after ISR ends. > > For use in external kernel module, > > it is declared as EXPORT_SYMBOL. > > This patch adds no in-tree modular users of this, so per the usual > conventions, NAK to EXPORT_SYMBOL(). Ack, this looks supicious. Can you explain why your usecase _requires_ FPSIMD in hardirq context? For now, these functions are strictly for EFI use only and should never be used by modules. Cheers ---Dave
Dear Dave Martin & Mark Rutland Thank you for your kind answer. > On Fri, Jun 05, 2020 at 11:37:05AM +0100, Mark Rutland wrote: > > Hi Wooyeon, > > > > There are a *lot* of people Cc' here, many of whomo will find this > > irrelevant. Please try to keep the Cc list constrained to a reasonable > > number of interested parties. I have adjusted the mailing list according to your opinion. > > Please introduce the problem you are trying to solve in more detail. > > We already have kernel_neon_{begin,end}() for kernel-mode NEON; why is > > that not sufficient for your needs? Please answer this before > > considering other details. > > > > What do you want to use this for? > Ack, this looks supicious. Can you explain why your usecase _requires_ > FPSIMD in hardirq context? > > For now, these functions are strictly for EFI use only and should never be > used by modules. I am in charge of camera driver development in Samsung S.LSI division. In order to guarantee real time processing such as Camera 3A algorithm in current or ongoing projects, prebuilt binary is loaded and used in kernel space, rather than user space. Because the binary is built with other standard library which could use FPSIMD register, kernel API should keep the original FPSIMD state for other user tasks. It is mostly used for internal kernel operation including hardirq context. (ex> hardIRQ context, kernel API called by user, kernel task) In the case of the kernel_neon_begin / kernel_neon_end that you mentioned, there is a limitation that cannot be used in hardirq context. Also, if another kernel task switching occurs while kernel API is being used, fpsimd register corruption may occur. In addition to kernel_neon_* API, It was necessary to add and utilize API considering kernel context. It is for this reason that I have tried to utilize efi_fpsimd_begin/end. Regards Wooyeon Kim
(Previous Mail account information is broken and send again. Sorry for inconvenience) Dear Dave Martin & Mark Rutland Thank you for your kind answer. > On Fri, Jun 05, 2020 at 11:37:05AM +0100, Mark Rutland wrote: > > Hi Wooyeon, > > > > There are a *lot* of people Cc' here, many of whomo will find this > > irrelevant. Please try to keep the Cc list constrained to a > > reasonable number of interested parties. I have adjusted the mailing list according to your opinion. > > Please introduce the problem you are trying to solve in more detail. > > We already have kernel_neon_{begin,end}() for kernel-mode NEON; why > > is that not sufficient for your needs? Please answer this before > > considering other details. > > > > What do you want to use this for? > Ack, this looks supicious. Can you explain why your usecase > _requires_ FPSIMD in hardirq context? > > For now, these functions are strictly for EFI use only and should > never be used by modules. I am in charge of camera driver development in Samsung S.LSI division. In order to guarantee real time processing such as Camera 3A algorithm in current or ongoing projects, prebuilt binary is loaded and used in kernel space, rather than user space. Because the binary is built with other standard library which could use FPSIMD register, kernel API should keep the original FPSIMD state for other user tasks. It is mostly used for internal kernel operation including hardirq context. (ex> hardIRQ context, kernel API called by user, kernel task) In the case of the kernel_neon_begin / kernel_neon_end that you mentioned, there is a limitation that cannot be used in hardirq context. Also, if another kernel task switching occurs while kernel API is being used, fpsimd register corruption may occur. In addition to kernel_neon_* API, It was necessary to add and utilize API considering kernel context. It is for this reason that I have tried to utilize efi_fpsimd_begin/end. Regards Wooyeon Kim
On Thu, Jun 11, 2020 at 06:17:32PM +0900, ??? wrote: > > On Fri, Jun 05, 2020 at 11:37:05AM +0100, Mark Rutland wrote: > > > Please introduce the problem you are trying to solve in more detail. > > > We already have kernel_neon_{begin,end}() for kernel-mode NEON; why is > > > that not sufficient for your needs? Please answer this before > > > considering other details. > > > > > > What do you want to use this for? > > > Ack, this looks supicious. Can you explain why your usecase _requires_ > > FPSIMD in hardirq context? > > > > For now, these functions are strictly for EFI use only and should never be > > used by modules. > > I am in charge of camera driver development in Samsung S.LSI division. > > In order to guarantee real time processing > such as Camera 3A algorithm in current or ongoing projects, > prebuilt binary is loaded and used in kernel space, rather than user space. > Because the binary is built with other standard library which could use > FPSIMD register, > kernel API should keep the original FPSIMD state for other user tasks. > It is mostly used for internal kernel operation including hardirq context. > (ex> hardIRQ context, kernel API called by user, kernel task) That sounds incredibly dodgy to me, both from a correctness perspective and a licensing perspective. Upstream doesn't support out-of-tree modules, nor does upstream support binary blobs within the kernel, so the above cannot justify this change to the kernel. If you wish to do such processing within the kernel, I think you'll need to post a more complete in-tree solution for inclusion in mainline. However, I suspect that it will be difficult to justify NEON in hardirq context given preempt rt and friends. Thanks, Mark.
On Thu, Jun 11, 2020 at 06:42:12PM +0900, Wooyeon Kim wrote: > I am in charge of camera driver development in Samsung S.LSI division. > > In order to guarantee real time processing such as Camera 3A algorithm in > current or ongoing projects, prebuilt binary is loaded and used in kernel > space, rather than user space. Thanks for the additional details. If you do such intensive processing in an IRQ context you'd probably introduce additional IRQ latency. Wouldn't offloading such work to a real-time (user) thread help? In a non-preempt-rt kernel, I don't think you can get much in terms of (soft) guarantees for IRQ latency anyway. > Because the binary is built with other standard library which could use > FPSIMD register, kernel API should keep the original FPSIMD state for other > user tasks. Can you not recompile those libraries not to use FP? As Mark said, for a kernel API we require at least an in-kernel, upstreamed, user of that functionality. > In the case of the kernel_neon_begin / kernel_neon_end that you mentioned, > there is a limitation that cannot be used in hardirq context. > Also, if another kernel task switching occurs while kernel API is being > used, fpsimd register corruption may occur. kernel_neon_begin/end disable preemption, so you can't have a task switch (you can have interrupts though but we don't allow FPSIMD in IRQ context).
On Thu, Jun 11, 2020 at 03:11:02PM +0100, Catalin Marinas wrote: > On Thu, Jun 11, 2020 at 06:42:12PM +0900, Wooyeon Kim wrote: > > I am in charge of camera driver development in Samsung S.LSI division. > > > > In order to guarantee real time processing such as Camera 3A algorithm in > > current or ongoing projects, prebuilt binary is loaded and used in kernel > > space, rather than user space. > > Thanks for the additional details. I have to ask: there are other camera drivers in existence already. What makes your hardware so different that it requires all this data processing to be done inside the kernel? > If you do such intensive processing in an IRQ context you'd probably > introduce additional IRQ latency. Wouldn't offloading such work to a > real-time (user) thread help? In a non-preempt-rt kernel, I don't think > you can get much in terms of (soft) guarantees for IRQ latency anyway. > > > Because the binary is built with other standard library which could use > > FPSIMD register, kernel API should keep the original FPSIMD state for other > > user tasks. > > Can you not recompile those libraries not to use FP? > > As Mark said, for a kernel API we require at least an in-kernel, > upstreamed, user of that functionality. > > > In the case of the kernel_neon_begin / kernel_neon_end that you mentioned, > > there is a limitation that cannot be used in hardirq context. > > Also, if another kernel task switching occurs while kernel API is being > > used, fpsimd register corruption may occur. > > kernel_neon_begin/end disable preemption, so you can't have a task > switch (you can have interrupts though but we don't allow FPSIMD in IRQ > context). Note, the decision not to support kernel_neon_begin / kernel_neon_end in hardirq context was deliberate. hardirq handlers shouldn't usually do anything at all except ensure that something responds to the hardware event, by waking some other thread or scheduling a workqueue item for example. An IRQ handler that only does that has no need to do any data processing, and gains no advantage from using FPSIMD. Doing additional work in hardirq context will harm interrupt latency for the rest of the system. So, you should move the data processing work out of the hardirq handler. Is there a reason why this is not possible? Secondly, there is the question of whether FPSIMD can be used by kernel threads. Currently this is only supported in a limited way. Again, this a deliberate decision, for now. Can you split the processing work into small blocks using kernel_neon_begin/kernel_neon_end, similarly to the arm64 crypto drivers? This is the current accepted way of doing number crunching inside the kernel without harming preemption latency too much. Even so, it's primarily intended for things that affect critical paths inside the kernel, such as crypto or checksumming in the filesysem and network subsystems. Cheers ---Dave
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index 59f10dd13f12..462c434fc57c 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -167,6 +167,12 @@ static inline void sve_setup(void) { } extern void __efi_fpsimd_begin(void); extern void __efi_fpsimd_end(void); +void fpsimd_set_task_using(struct task_struct *t); +void fpsimd_clr_task_using(struct task_struct *t); + +void fpsimd_get(void); +void fpsimd_put(void); + #endif #endif diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index 240fe5e5b720..265669456bcb 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -139,6 +139,19 @@ struct thread_struct { unsigned long tp2_value; struct user_fpsimd_state fpsimd_state; } uw; + struct fpsimd_kernel_state fpsimd_kernel_state; + + /* + * indicate the depth of using FP/SIMD registers in kernel mode. + * above kernel state should be preserved at first time + * before FP/SIMD registers be used by other tasks + * and the state should be restored before they be used by own. + * + * a kernel thread which uses FP/SIMD registers have to + * set this depth and it could utilize for a tasks executes + * some NEON instructions without preemption disable. + */ + atomic_t fpsimd_kernel_depth; unsigned int fpsimd_cpu; void *sve_state; /* SVE registers, if any */ diff --git a/arch/arm64/include/uapi/asm/ptrace.h b/arch/arm64/include/uapi/asm/ptrace.h index 42cbe34d95ce..0327e719721e 100644 --- a/arch/arm64/include/uapi/asm/ptrace.h +++ b/arch/arm64/include/uapi/asm/ptrace.h @@ -105,6 +105,14 @@ struct user_hwdebug_state { } dbg_regs[16]; }; +/* Kernel structure for floating point */ +struct fpsimd_kernel_state { + __uint128_t vregs[32]; + __u32 fpsr; + __u32 fpcr; + unsigned int cpu; +}; + /* SVE/FP/SIMD state (NT_ARM_SVE) */ struct user_sve_header { diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 35cb5e66c504..07597423fcfc 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -269,9 +269,6 @@ static void sve_free(struct task_struct *task) */ static void task_fpsimd_load(void) { - WARN_ON(!system_supports_fpsimd()); - WARN_ON(!have_cpu_fpsimd_context()); - if (system_supports_sve() && test_thread_flag(TIF_SVE)) sve_load_state(sve_pffr(¤t->thread), ¤t->thread.uw.fpsimd_state.fpsr, @@ -290,9 +287,6 @@ static void fpsimd_save(void) this_cpu_ptr(&fpsimd_last_state); /* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */ - WARN_ON(!system_supports_fpsimd()); - WARN_ON(!have_cpu_fpsimd_context()); - if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { if (system_supports_sve() && test_thread_flag(TIF_SVE)) { if (WARN_ON(sve_get_vl() != last->sve_vl)) { @@ -982,6 +976,10 @@ void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) void fpsimd_thread_switch(struct task_struct *next) { bool wrong_task, wrong_cpu; + struct fpsimd_kernel_state *cur_kst + = ¤t->thread.fpsimd_kernel_state; + struct fpsimd_kernel_state *nxt_kst + = &next->thread.fpsimd_kernel_state; if (!system_supports_fpsimd()) return; @@ -991,6 +989,16 @@ void fpsimd_thread_switch(struct task_struct *next) /* Save unsaved fpsimd state, if any: */ fpsimd_save(); + if (atomic_read(¤t->thread.fpsimd_kernel_depth)) + fpsimd_save_state((struct user_fpsimd_state *)cur_kst); + + if (atomic_read(&next->thread.fpsimd_kernel_depth)) { + fpsimd_load_state((struct user_fpsimd_state *)nxt_kst); + this_cpu_write(fpsimd_last_state.st, + (struct user_fpsimd_state *)nxt_kst); + nxt_kst->cpu = smp_processor_id(); + } + /* * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's * state. For kernel threads, FPSIMD registers are never loaded @@ -1233,6 +1241,55 @@ void fpsimd_save_and_flush_cpu_state(void) __put_cpu_fpsimd_context(); } +void fpsimd_set_task_using(struct task_struct *t) +{ + atomic_set(&t->thread.fpsimd_kernel_depth, 1); +} +EXPORT_SYMBOL(fpsimd_set_task_using); + +void fpsimd_clr_task_using(struct task_struct *t) +{ + atomic_set(&t->thread.fpsimd_kernel_depth, 0); +} +EXPORT_SYMBOL(fpsimd_clr_task_using); + +void fpsimd_get(void) +{ + if (in_interrupt()) + return; + + if (atomic_inc_return(¤t->thread.fpsimd_kernel_depth) == 1) { + preempt_disable(); + if (current->mm) { + fpsimd_save(); + fpsimd_flush_task_state(current); + } + fpsimd_flush_cpu_state(); + preempt_enable(); + } +} +EXPORT_SYMBOL(fpsimd_get); + +void fpsimd_put(void) +{ + if (in_interrupt()) + return; + + WARN_ON(atomic_dec_return( + ¤t->thread.fpsimd_kernel_depth) < 0); + + if (atomic_read(¤t->thread.fpsimd_kernel_depth) == 0) { + preempt_disable(); + if (current->mm && test_thread_flag(TIF_FOREIGN_FPSTATE)) { + task_fpsimd_load(); + fpsimd_bind_task_to_cpu(); + clear_thread_flag(TIF_FOREIGN_FPSTATE); + } + preempt_enable(); + } +} +EXPORT_SYMBOL(fpsimd_put); + #ifdef CONFIG_KERNEL_MODE_NEON /* @@ -1338,6 +1395,7 @@ void __efi_fpsimd_begin(void) __this_cpu_write(efi_fpsimd_state_used, true); } } +EXPORT_SYMBOL(__efi_fpsimd_begin); /* * __efi_fpsimd_end(): clean up FPSIMD after an EFI runtime services call @@ -1364,6 +1422,7 @@ void __efi_fpsimd_end(void) } } } +EXPORT_SYMBOL(__efi_fpsimd_end); #endif /* CONFIG_EFI */