| Message ID | 20201118032051.1405907-1-pcc@google.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | arm64: mte: optimize asynchronous tag check fault flag check |
On Tue, Nov 17, 2020 at 07:20:51PM -0800, Peter Collingbourne wrote:
> We don't need to check for MTE support before checking the flag
> because it can only be set if the hardware supports MTE. As a result
> we can unconditionally check the flag bit which is expected to be in
> a register and therefore the check can be done in a single instruction
> instead of first needing to load the hwcaps.
>
> On a DragonBoard 845c with a kernel built with CONFIG_ARM64_MTE=y with
> the powersave governor this reduces the cost of a kernel entry/exit
> (invalid syscall) from 465.1ns to 463.8ns.

That's less than 0.3%, could be noise as well.

> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index e4c0dadf0d92..f533e83fd722 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
> 	local_daif_restore(DAIF_PROCCTX);
> 	user_exit();
>
> -	if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
> +	if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {

system_supports_mte() is actually a static label (well, it expands to
two). So we get a combination of nop/branch jumping over the flags
check. Maybe the difference you are seeing is caused by the extra
instructions changing the cache alignment or confusing the branch
predictor.

I wonder whether changing __cpus_have_const_cap() to use
static_branch_likely() has any effect (given that we expect most
features to be enabled in future CPUs, such change would probably make
sense).

That said, I'm happy to take this patch, most likely it replaces some
static branch with tbnz on this path. Given that MTE is on in defconfig,
do we actually need the IS_ENABLED() check? If we remove it, it would be
more consistent with the similar check in do_notify_resume().
On Wed, Nov 18, 2020 at 3:24 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Tue, Nov 17, 2020 at 07:20:51PM -0800, Peter Collingbourne wrote:
> > We don't need to check for MTE support before checking the flag
> > because it can only be set if the hardware supports MTE. As a result
> > we can unconditionally check the flag bit which is expected to be in
> > a register and therefore the check can be done in a single instruction
> > instead of first needing to load the hwcaps.
> >
> > On a DragonBoard 845c with a kernel built with CONFIG_ARM64_MTE=y with
> > the powersave governor this reduces the cost of a kernel entry/exit
> > (invalid syscall) from 465.1ns to 463.8ns.
>
> That's less than 0.3%, could be noise as well.

The numbers are quite stable following the methodology of [1] (one
thing that I forgot to mention there is running "adb shell stop" before
taking measurements). Out of 100 samples there were about 75 in the
faster cluster, ranging from 464.9966ns to 465.1515ns before and from
463.7776ns to 463.9312ns after.

> > diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> > index e4c0dadf0d92..f533e83fd722 100644
> > --- a/arch/arm64/kernel/syscall.c
> > +++ b/arch/arm64/kernel/syscall.c
> > @@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
> > 	local_daif_restore(DAIF_PROCCTX);
> > 	user_exit();
> >
> > -	if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
> > +	if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
>
> system_supports_mte() is actually a static label (well, it expands to
> two). So we get a combination of nop/branch jumping over the flags
> check. Maybe the difference you are seeing is caused by the extra
> instructions changing the cache alignment or confusing the branch
> predictor.

Okay, I wasn't aware of that mechanism. From the disassembly of
syscall.o it did appear that we were loading from hwcap every time.
But looking at the compiler's -S output it looks like there's something
going on with __jump_table which may lead to that happening. There is
still a conditional branch on the presumably static result of the hwcap
check, which may also be where part of the performance loss is coming
from.

> I wonder whether changing __cpus_have_const_cap() to use
> static_branch_likely() has any effect (given that we expect most
> features to be enabled in future CPUs, such change would probably make
> sense).
>
> That said, I'm happy to take this patch, most likely it replaces some
> static branch with tbnz on this path. Given that MTE is on in defconfig,
> do we actually need the IS_ENABLED() check? If we remove it, it would be
> more consistent with the similar check in do_notify_resume().

If we don't mind the additional conditional branch on
CONFIG_ARM64_MTE=n kernels we don't need the IS_ENABLED() check. I'll
remove it.

Peter

[1] https://lore.kernel.org/linux-arm-kernel/CAMn1gO7HCJiXEzDvBb-=T7znATqyaxE_t-zezqKL0dyXRCG-nQ@mail.gmail.com/
On Wed, Nov 18, 2020 at 08:53:16AM -0800, Peter Collingbourne wrote:
> On Wed, Nov 18, 2020 at 3:24 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Tue, Nov 17, 2020 at 07:20:51PM -0800, Peter Collingbourne wrote:
> > > diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> > > index e4c0dadf0d92..f533e83fd722 100644
> > > --- a/arch/arm64/kernel/syscall.c
> > > +++ b/arch/arm64/kernel/syscall.c
> > > @@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
> > > 	local_daif_restore(DAIF_PROCCTX);
> > > 	user_exit();
> > >
> > > -	if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
> > > +	if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
[...]
> > That said, I'm happy to take this patch, most likely it replaces some
> > static branch with tbnz on this path. Given that MTE is on in defconfig,
> > do we actually need the IS_ENABLED() check? If we remove it, it would be
> > more consistent with the similar check in do_notify_resume().
>
> If we don't mind the additional conditional branch on
> CONFIG_ARM64_MTE=n kernels we don't need the IS_ENABLED() check. I'll
> remove it.

No need to, I'll do it when applying the patch.
On Tue, 17 Nov 2020 19:20:51 -0800, Peter Collingbourne wrote:
> We don't need to check for MTE support before checking the flag
> because it can only be set if the hardware supports MTE. As a result
> we can unconditionally check the flag bit which is expected to be in
> a register and therefore the check can be done in a single instruction
> instead of first needing to load the hwcaps.
>
> On a DragonBoard 845c with a kernel built with CONFIG_ARM64_MTE=y with
> the powersave governor this reduces the cost of a kernel entry/exit
> (invalid syscall) from 465.1ns to 463.8ns.

Applied to arm64 (for-next/misc), thanks!

[1/1] arm64: mte: optimize asynchronous tag check fault flag check
      https://git.kernel.org/arm64/c/739003c64283
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index e4c0dadf0d92..f533e83fd722 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -123,7 +123,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
 	local_daif_restore(DAIF_PROCCTX);
 	user_exit();
 
-	if (system_supports_mte() && (flags & _TIF_MTE_ASYNC_FAULT)) {
+	if (IS_ENABLED(CONFIG_ARM64_MTE) && (flags & _TIF_MTE_ASYNC_FAULT)) {
 		/*
 		 * Process the asynchronous tag check fault before the actual
 		 * syscall. do_notify_resume() will send a signal to userspace
We don't need to check for MTE support before checking the flag
because it can only be set if the hardware supports MTE. As a result
we can unconditionally check the flag bit which is expected to be in
a register and therefore the check can be done in a single instruction
instead of first needing to load the hwcaps.

On a DragonBoard 845c with a kernel built with CONFIG_ARM64_MTE=y with
the powersave governor this reduces the cost of a kernel entry/exit
(invalid syscall) from 465.1ns to 463.8ns.

Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/If4dc3501fd4e4f287322f17805509613cfe47d24
---
 arch/arm64/kernel/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)