arm64: mte: optimize asynchronous tag check fault flag check

Message ID 20201118032051.1405907-1-pcc@google.com (mailing list archive)
State New, archived
Series arm64: mte: optimize asynchronous tag check fault flag check

Commit Message

Peter Collingbourne Nov. 18, 2020, 3:20 a.m. UTC
We don't need to check for MTE support before checking the flag
because it can only be set if the hardware supports MTE. As a result,
we can unconditionally check the flag bit, which is expected to be in
a register, so the check can be done in a single instruction instead
of first needing to load the hwcaps.

On a DragonBoard 845c, with a kernel built with CONFIG_ARM64_MTE=y and
the powersave governor, this reduces the cost of a kernel entry/exit
(invalid syscall) from 465.1ns to 463.8ns.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/If4dc3501fd4e4f287322f17805509613cfe47d24
---
 arch/arm64/kernel/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Catalin Marinas Nov. 18, 2020, 11:23 a.m. UTC | #1
On Tue, Nov 17, 2020 at 07:20:51PM -0800, Peter Collingbourne wrote:
> We don't need to check for MTE support before checking the flag
> because it can only be set if the hardware supports MTE. As a result,
> we can unconditionally check the flag bit, which is expected to be in
> a register, so the check can be done in a single instruction instead
> of first needing to load the hwcaps.
> 
> On a DragonBoard 845c, with a kernel built with CONFIG_ARM64_MTE=y and
> the powersave governor, this reduces the cost of a kernel entry/exit
> (invalid syscall) from 465.1ns to 463.8ns.

That's less than 0.3%, could be noise as well.

> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index e4c0dadf0d92..f533e83fd722 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
>  	local_daif_restore(DAIF_PROCCTX);
>  	user_exit();
>  
> -	if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
> +	if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {

system_supports_mte() is actually a static label (well, it expands to
two). So we get a combination of nop/branch jumping over the flags
check. Maybe the difference you are seeing is caused by the extra
instructions changing the cache alignment or confusing the branch
predictor.

I wonder whether changing __cpus_have_const_cap() to use
static_branch_likely() has any effect (given that we expect most
features to be enabled in future CPUs, such change would probably make
sense).
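
For reference, the mechanism being referred to looks roughly like this
(condensed from arch/arm64/include/asm/cpufeature.h of this era; a
simplified sketch, not verbatim kernel source):

static __always_inline bool __cpus_have_const_cap(int num)
{
	if (num >= ARM64_NCAPS)
		return false;
	/* static key #1: one per capability, patched once caps are finalised */
	return static_branch_unlikely(&cpu_hwcap_keys[num]);
}

static inline bool cpus_have_const_cap(int num)
{
	/* static key #2: fall back to the cap bitmap until caps are ready */
	if (static_branch_likely(&arm64_const_caps_ready))
		return __cpus_have_const_cap(num);
	else
		return cpus_have_cap(num);
}

static inline bool system_supports_mte(void)
{
	return IS_ENABLED(CONFIG_ARM64_MTE) &&
	       cpus_have_const_cap(ARM64_MTE);
}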

That said, I'm happy to take this patch, most likely it replaces some
static branch with tbnz on this path. Given that MTE is on in defconfig,
do we actually need the IS_ENABLED() check? If we remove it, it would be
more consistent with the similar check in do_notify_resume().
Peter Collingbourne Nov. 18, 2020, 4:53 p.m. UTC | #2
On Wed, Nov 18, 2020 at 3:24 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, Nov 17, 2020 at 07:20:51PM -0800, Peter Collingbourne wrote:
> > We don't need to check for MTE support before checking the flag
> > because it can only be set if the hardware supports MTE. As a result,
> > we can unconditionally check the flag bit, which is expected to be in
> > a register, so the check can be done in a single instruction instead
> > of first needing to load the hwcaps.
> >
> > On a DragonBoard 845c, with a kernel built with CONFIG_ARM64_MTE=y and
> > the powersave governor, this reduces the cost of a kernel entry/exit
> > (invalid syscall) from 465.1ns to 463.8ns.
>
> That's less than 0.3%, could be noise as well.

The numbers are quite stable following the methodology of [1] (one
thing that I forgot to mention there is running adb shell stop before
taking measurements). Out of 100 samples, about 75 fell in the faster
cluster, ranging from 464.9966ns to 465.1515ns before and from
463.7776ns to 463.9312ns after.
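
For completeness, a minimal userspace sketch of this kind of
measurement (the iteration count and timing details below are
illustrative assumptions, not the exact harness from [1]):

#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	enum { ITERS = 10000000 };
	struct timespec start, end;
	double ns;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < ITERS; i++)
		syscall(-1);	/* invalid syscall number: returns -ENOSYS */
	clock_gettime(CLOCK_MONOTONIC, &end);

	ns = (end.tv_sec - start.tv_sec) * 1e9 +
	     (end.tv_nsec - start.tv_nsec);
	printf("%.1f ns per entry/exit\n", ns / ITERS);
	return 0;
}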

> > diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> > index e4c0dadf0d92..f533e83fd722 100644
> > --- a/arch/arm64/kernel/syscall.c
> > +++ b/arch/arm64/kernel/syscall.c
> > @@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
> >       local_daif_restore(DAIF_PROCCTX);
> >       user_exit();
> >
> > -     if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
> > +     if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
>
> system_supports_mte() is actually a static label (well, it expands to
> two). So we get a combination of nop/branch jumping over the flags
> check. Maybe the difference you are seeing is caused by the extra
> instructions changing the cache alignment or confusing the branch
> predictor.

Okay, I wasn't aware of that mechanism. From the disassembly of
syscall.o it did appear that we were loading from the hwcaps every
time, but the __jump_table section in the compiler's -S output
suggests that the jump-label machinery is involved, which may explain
that. There is still a conditional branch on the presumably static
result of the hwcap check, which may also be where part of the
performance loss is coming from.
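
(For anyone following along, the __jump_table entries come from
something like the following; a simplified sketch of arm64's
arch_static_branch(), not the exact macro from
arch/arm64/include/asm/jump_label.h:)

static __always_inline bool arch_static_branch_sketch(struct static_key *key)
{
	asm_volatile_goto(
		"1:	nop\n"			/* patched to a branch at runtime */
		"	.pushsection __jump_table, \"aw\"\n"
		"	.align 3\n"
		"	.long 1b - ., %l[l_yes] - .\n"	/* site and target offsets */
		"	.quad %c0 - .\n"		/* static key address */
		"	.popsection\n"
		: : "i"(key) : : l_yes);

	return false;
l_yes:
	return true;
}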

> I wonder whether changing __cpus_have_const_cap() to use
> static_branch_likely() has any effect (given that we expect most
> features to be enabled in future CPUs, such change would probably make
> sense).
>
> That said, I'm happy to take this patch, most likely it replaces some
> static branch with tbnz on this path. Given that MTE is on in defconfig,
> do we actually need the IS_ENABLED() check? If we remove it, it would be
> more consistent with the similar check in do_notify_resume().

If we don't mind the additional conditional branch on
CONFIG_ARM64_MTE=n kernels we don't need the IS_ENABLED() check. I'll
remove it.

Peter

[1] https://lore.kernel.org/linux-arm-kernel/CAMn1gO7HCJiXEzDvBb-=T7znATqyaxE_t-zezqKL0dyXRCG-nQ@mail.gmail.com/
Catalin Marinas Nov. 18, 2020, 5:07 p.m. UTC | #3
On Wed, Nov 18, 2020 at 08:53:16AM -0800, Peter Collingbourne wrote:
> On Wed, Nov 18, 2020 at 3:24 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Tue, Nov 17, 2020 at 07:20:51PM -0800, Peter Collingbourne wrote:
> > > diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> > > index e4c0dadf0d92..f533e83fd722 100644
> > > --- a/arch/arm64/kernel/syscall.c
> > > +++ b/arch/arm64/kernel/syscall.c
> > > @@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
> > >       local_daif_restore(DAIF_PROCCTX);
> > >       user_exit();
> > >
> > > -     if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
> > > +     if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
[...]
> > That said, I'm happy to take this patch, most likely it replaces some
> > static branch with tbnz on this path. Given that MTE is on in defconfig,
> > do we actually need the IS_ENABLED() check? If we remove it, it would be
> > more consistent with the similar check in do_notify_resume().
> 
> If we don't mind the additional conditional branch on
> CONFIG_ARM64_MTE=n kernels we don't need the IS_ENABLED() check. I'll
> remove it.

No need to, I'll do it when applying the patch.
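
(With both the hwcap check and IS_ENABLED() gone, the condition in
el0_svc_common() presumably reduces to a bare flag test:

	if (flags & _TIF_MTE_ASYNC_FAULT) {

i.e. the single tbnz-style check discussed above.)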
Catalin Marinas Nov. 18, 2020, 5:48 p.m. UTC | #4
On Tue, 17 Nov 2020 19:20:51 -0800, Peter Collingbourne wrote:
> We don't need to check for MTE support before checking the flag
> because it can only be set if the hardware supports MTE. As a result,
> we can unconditionally check the flag bit, which is expected to be in
> a register, so the check can be done in a single instruction instead
> of first needing to load the hwcaps.
> 
> On a DragonBoard 845c, with a kernel built with CONFIG_ARM64_MTE=y and
> the powersave governor, this reduces the cost of a kernel entry/exit
> (invalid syscall) from 465.1ns to 463.8ns.

Applied to arm64 (for-next/misc), thanks!

[1/1] arm64: mte: optimize asynchronous tag check fault flag check
      https://git.kernel.org/arm64/c/739003c64283

Patch

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index e4c0dadf0d92..f533e83fd722 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	local_daif_restore(DAIF_PROCCTX);
 	user_exit();
 
-	if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
+	if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
 		/*
 		 * Process the asynchronous tag check fault before the actual
 		 * syscall. do_notify_resume() will send a signal to userspace