Message ID | 20210511160446.42871-4-broonie@kernel.org (mailing list archive)
---|---
State | New, archived
Series | arm64/sve: Trivial optimisation for 128 bit SVE vectors
On Tue, May 11, 2021 at 05:04:46PM +0100, Mark Brown wrote:
> When the SVE vector length is 128 bits then there are no bits in the Z
> registers which are not shared with the V registers so we can skip them
> when zeroing state not shared with FPSIMD, this results in a minor
> performance improvement.
>
> Signed-off-by: Mark Brown <broonie@kernel.org>
> ---
>  arch/arm64/include/asm/fpsimd.h  | 2 +-
>  arch/arm64/kernel/entry-fpsimd.S | 9 +++++++--
>  arch/arm64/kernel/fpsimd.c       | 6 ++++--
>  3 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 2599504674b5..c072161d5c65 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -69,7 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
>  extern void sve_save_state(void *state, u32 *pfpsr);
>  extern void sve_load_state(void const *state, u32 const *pfpsr,
> 			   unsigned long vq_minus_1);
> -extern void sve_flush_live(void);
> +extern void sve_flush_live(unsigned long vq_minus_1);
>  extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
> 				       unsigned long vq_minus_1);
>  extern unsigned int sve_get_vl(void);
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index dd8382e5ce82..87ef25836963 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -69,10 +69,15 @@ SYM_FUNC_START(sve_load_from_fpsimd_state)
>  	ret
>  SYM_FUNC_END(sve_load_from_fpsimd_state)
>
> -/* Zero all SVE registers but the first 128-bits of each vector */
> +/*
> + * Zero all SVE registers but the first 128-bits of each vector
> + *
> + * x0 = VQ - 1

This does require that ZCR_EL1.LEN has already been set to match x0, and
is not changed again before entering userspace.

It would be a good idea to at least describe this in a comment so that
this doesn't get forgotten later on, but there's a limit to how
foolproof this low-level backend code needs to be...

> + */
>  SYM_FUNC_START(sve_flush_live)
> +	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
>  	sve_flush_z
> -	sve_flush_p_ffr
> +1:	sve_flush_p_ffr
>  	ret
>  SYM_FUNC_END(sve_flush_live)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index ad3dd34a83cf..e57b23f95284 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -957,8 +957,10 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
>  	 * disabling the trap, otherwise update our in-memory copy.
>  	 */
>  	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> -		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
> -		sve_flush_live();
> +		unsigned long vq_minus_one =
> +			sve_vq_from_vl(current->thread.sve_vl) - 1;
> +		sve_set_vq(vq_minus_one);
> +		sve_flush_live(vq_minus_one);
>  		fpsimd_bind_task_to_cpu();
>  	} else {
>  		fpsimd_to_sve(current);
> --
> 2.20.1

With a comment added as outlined above,

Reviewed-by: Dave Martin <Dave.Martin@arm.com>

Cheers
---Dave
On Wed, May 12, 2021 at 02:49:09PM +0100, Dave Martin wrote:
> On Tue, May 11, 2021 at 05:04:46PM +0100, Mark Brown wrote:
> > +/*
> > + * Zero all SVE registers but the first 128-bits of each vector
> > + *
> > + * x0 = VQ - 1
>
> This does require that ZCR_EL1.LEN has already been set to match x0, and
> is not changed again before entering userspace.
>
> It would be a good idea to at least describe this in a comment so that
> this doesn't get forgotten later on, but there's a limit to how
> foolproof this low-level backend code needs to be...

Similar concerns exist for huge chunks of the existing SVE code: the "no
further changes to the vector length" constraint is pretty much universal,
and I'd say it is largely a "make sure you handle the register contents"
concern for anything that changes the vector length.
```diff
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 2599504674b5..c072161d5c65 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -69,7 +69,7 @@ static inline void *sve_pffr(struct thread_struct *thread)
 extern void sve_save_state(void *state, u32 *pfpsr);
 extern void sve_load_state(void const *state, u32 const *pfpsr,
			   unsigned long vq_minus_1);
-extern void sve_flush_live(void);
+extern void sve_flush_live(unsigned long vq_minus_1);
 extern void sve_load_from_fpsimd_state(struct user_fpsimd_state const *state,
				       unsigned long vq_minus_1);
 extern unsigned int sve_get_vl(void);
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index dd8382e5ce82..87ef25836963 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -69,10 +69,15 @@ SYM_FUNC_START(sve_load_from_fpsimd_state)
	ret
 SYM_FUNC_END(sve_load_from_fpsimd_state)

-/* Zero all SVE registers but the first 128-bits of each vector */
+/*
+ * Zero all SVE registers but the first 128-bits of each vector
+ *
+ * x0 = VQ - 1
+ */
 SYM_FUNC_START(sve_flush_live)
+	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
	sve_flush_z
-	sve_flush_p_ffr
+1:	sve_flush_p_ffr
	ret
 SYM_FUNC_END(sve_flush_live)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ad3dd34a83cf..e57b23f95284 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -957,8 +957,10 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
	 * disabling the trap, otherwise update our in-memory copy.
	 */
	if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
-		sve_set_vq(sve_vq_from_vl(current->thread.sve_vl) - 1);
-		sve_flush_live();
+		unsigned long vq_minus_one =
+			sve_vq_from_vl(current->thread.sve_vl) - 1;
+		sve_set_vq(vq_minus_one);
+		sve_flush_live(vq_minus_one);
		fpsimd_bind_task_to_cpu();
	} else {
		fpsimd_to_sve(current);
```
When the SVE vector length is 128 bits then there are no bits in the Z
registers which are not shared with the V registers, so we can skip them
when zeroing state not shared with FPSIMD; this results in a minor
performance improvement.

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/fpsimd.h  | 2 +-
 arch/arm64/kernel/entry-fpsimd.S | 9 +++++++--
 arch/arm64/kernel/fpsimd.c       | 6 ++++--
 3 files changed, 12 insertions(+), 5 deletions(-)