diff mbox series

[v3,21/26] arm64: Introduce asm/vdso/arch_timer.h

Message ID 20200313154345.56760-22-vincenzo.frascino@arm.com (mailing list archive)
State Superseded
Headers show
Series Introduce common headers for vDSO | expand

Commit Message

Vincenzo Frascino March 13, 2020, 3:43 p.m. UTC
The vDSO library should only include the necessary headers required for
a userspace library (UAPI and a minimal set of kernel headers). To make
this possible it is necessary to isolate from the kernel headers the
common parts that are strictly necessary to build the library.

Introduce asm/vdso/arch_timer.h to contain all the arm64 specific
code. This allows to replace the second isb() in __arch_get_hw_counter()
with a fake dependent stack read of the counter which improves the vdso
library peformances of ~4.5%. Below the results of vdsotest [1] ran for
100 iterations.

Before the patch:
=================
clock-gettime-monotonic: syscall: 771 nsec/call
clock-gettime-monotonic:    libc: 130 nsec/call
clock-gettime-monotonic:    vdso: 111 nsec/call
...
clock-gettime-realtime: syscall: 762 nsec/call
clock-gettime-realtime:    libc: 130 nsec/call
clock-gettime-realtime:    vdso: 111 nsec/call

After the patch:
================
clock-gettime-monotonic: syscall: 792 nsec/call
clock-gettime-monotonic:    libc: 124 nsec/call
clock-gettime-monotonic:    vdso: 106 nsec/call
...
clock-gettime-realtime: syscall: 776 nsec/call
clock-gettime-realtime:    libc: 124 nsec/call
clock-gettime-realtime:    vdso: 106 nsec/call

[1] https://github.com/nathanlynch/vdsotest

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
---
 arch/arm64/include/asm/arch_timer.h        | 29 ++++---------------
 arch/arm64/include/asm/vdso/arch_timer.h   | 33 ++++++++++++++++++++++
 arch/arm64/include/asm/vdso/gettimeofday.h |  7 +++--
 3 files changed, 42 insertions(+), 27 deletions(-)
 create mode 100644 arch/arm64/include/asm/vdso/arch_timer.h

Comments

Catalin Marinas March 15, 2020, 6:32 p.m. UTC | #1
On Fri, Mar 13, 2020 at 03:43:40PM +0000, Vincenzo Frascino wrote:
> The vDSO library should only include the necessary headers required for
> a userspace library (UAPI and a minimal set of kernel headers). To make
> this possible it is necessary to isolate from the kernel headers the
> common parts that are strictly necessary to build the library.
> 
> Introduce asm/vdso/arch_timer.h to contain all the arm64 specific
> code. This allows to replace the second isb() in __arch_get_hw_counter()
> with a fake dependent stack read of the counter which improves the vdso
> library peformances of ~4.5%. Below the results of vdsotest [1] ran for
> 100 iterations.

The subject seems to imply a non-functional change but as you read, it
gets a lot more complicated. Could you keep the functional change
separate from the header clean-up, maybe submit it as an independent
patch? And it shouldn't go in without Will's ack ;).
Mark Rutland March 16, 2020, 10:28 a.m. UTC | #2
Hi Vincenzo,

On Fri, Mar 13, 2020 at 03:43:40PM +0000, Vincenzo Frascino wrote:
> The vDSO library should only include the necessary headers required for
> a userspace library (UAPI and a minimal set of kernel headers). To make
> this possible it is necessary to isolate from the kernel headers the
> common parts that are strictly necessary to build the library.
> 
> Introduce asm/vdso/arch_timer.h to contain all the arm64 specific
> code. This allows to replace the second isb() in __arch_get_hw_counter()
> with a fake dependent stack read of the counter which improves the vdso
> library peformances of ~4.5%. Below the results of vdsotest [1] ran for
> 100 iterations.
> 
> Before the patch:
> =================
> clock-gettime-monotonic: syscall: 771 nsec/call
> clock-gettime-monotonic:    libc: 130 nsec/call
> clock-gettime-monotonic:    vdso: 111 nsec/call
> ...
> clock-gettime-realtime: syscall: 762 nsec/call
> clock-gettime-realtime:    libc: 130 nsec/call
> clock-gettime-realtime:    vdso: 111 nsec/call
> 
> After the patch:
> ================
> clock-gettime-monotonic: syscall: 792 nsec/call
> clock-gettime-monotonic:    libc: 124 nsec/call
> clock-gettime-monotonic:    vdso: 106 nsec/call
> ...
> clock-gettime-realtime: syscall: 776 nsec/call
> clock-gettime-realtime:    libc: 124 nsec/call
> clock-gettime-realtime:    vdso: 106 nsec/call
> 
> [1] https://github.com/nathanlynch/vdsotest
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Mark Rutland <Mark.Rutland@arm.com>
> Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> ---
>  arch/arm64/include/asm/arch_timer.h        | 29 ++++---------------
>  arch/arm64/include/asm/vdso/arch_timer.h   | 33 ++++++++++++++++++++++
>  arch/arm64/include/asm/vdso/gettimeofday.h |  7 +++--
>  3 files changed, 42 insertions(+), 27 deletions(-)
>  create mode 100644 arch/arm64/include/asm/vdso/arch_timer.h
> 
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index 7ae54d7d333a..7f22cd00ad45 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -164,24 +164,7 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
>  	isb();
>  }
>  
> -/*
> - * Ensure that reads of the counter are treated the same as memory reads
> - * for the purposes of ordering by subsequent memory barriers.
> - *
> - * This insanity brought to you by speculative system register reads,
> - * out-of-order memory accesses, sequence locks and Thomas Gleixner.
> - *
> - * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.html
> - */
> -#define arch_counter_enforce_ordering(val) do {				\
> -	u64 tmp, _val = (val);						\
> -									\
> -	asm volatile(							\
> -	"	eor	%0, %1, %1\n"					\
> -	"	add	%0, sp, %0\n"					\
> -	"	ldr	xzr, [%0]"					\
> -	: "=r" (tmp) : "r" (_val));					\
> -} while (0)
> +#include <asm/vdso/arch_timer.h>
>  
>  static __always_inline u64 __arch_counter_get_cntpct_stable(void)
>  {
> @@ -189,7 +172,7 @@ static __always_inline u64 __arch_counter_get_cntpct_stable(void)
>  
>  	isb();
>  	cnt = arch_timer_reg_read_stable(cntpct_el0);
> -	arch_counter_enforce_ordering(cnt);
> +	cnt = arch_counter_enforce_ordering(cnt);
>  	return cnt;

Why have you changed the structure of arch_counter_enforce_ordering() to
return a value? The commit message has no rationale for that.

If there is a reason to change that, I'd prefer the driver change as one
patch, before moving the definition.

[...]

> +/*
> + * Ensure that reads of the counter are treated the same as memory reads
> + * for the purposes of ordering by subsequent memory barriers.
> + *
> + * This insanity brought to you by speculative system register reads,
> + * out-of-order memory accesses, sequence locks and Thomas Gleixner.
> + *
> + * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.html
> + *
> + */
> +static u64 arch_counter_enforce_ordering(u64 val)
> +{
> +	u64 tmp, _val = (val);
> +
> +	asm volatile(
> +	"	eor	%0, %1, %1\n"
> +	"	add	%0, sp, %0\n"
> +	"	ldr	xzr, [%0]"
> +	: "=r" (tmp) : "r" (_val));
> +
> +	return _val;
> +}

This change has no functional effect. Since `_val` is only passed in as
an input parameter, the compiler can assume the assembly has no effect
on it.

As above, what is the rationale for changing this?

> @@ -82,10 +83,10 @@ static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
>  	isb();
>  	asm volatile("mrs %0, cntvct_el0" : "=r" (res) :: "memory");
>  	/*
> -	 * This isb() is required to prevent that the seq lock is
> -	 * speculated.#
> +	 * arch_counter_enforce_ordering() is required to prevent that
> +	 * the seq lock is speculated.
>  	 */
> -	isb();
> +	res = arch_counter_enforce_ordering(res);

Can we delete the comment entirely? We don't bother in <asm/arch_timer.h>.

Even better, can we factor out __arch_counter_get_cntvct(), and use
that?

Thanks,
Mark.
Vincenzo Frascino March 16, 2020, 3:37 p.m. UTC | #3
Hi Catalin,

On 3/15/20 6:32 PM, Catalin Marinas wrote:
> On Fri, Mar 13, 2020 at 03:43:40PM +0000, Vincenzo Frascino wrote:
>> The vDSO library should only include the necessary headers required for
>> a userspace library (UAPI and a minimal set of kernel headers). To make
>> this possible it is necessary to isolate from the kernel headers the
>> common parts that are strictly necessary to build the library.
>>
>> Introduce asm/vdso/arch_timer.h to contain all the arm64 specific
>> code. This allows to replace the second isb() in __arch_get_hw_counter()
>> with a fake dependent stack read of the counter which improves the vdso
>> library peformances of ~4.5%. Below the results of vdsotest [1] ran for
>> 100 iterations.
> 
> The subject seems to imply a non-functional change but as you read, it
> gets a lot more complicated. Could you keep the functional change
> separate from the header clean-up, maybe submit it as an independent
> patch? And it shouldn't go in without Will's ack ;).
> 

It is fine by me. I will repost the series with the required fixes and without
this patch. This will give to me enough time to address Mark's comments as well
and to Will to have a proper look.
Will Deacon April 9, 2020, 1:26 p.m. UTC | #4
Hi Vincenzo,

Sorry, I was on holiday when you posted this and it slipped through the
cracks.

On Mon, Mar 16, 2020 at 03:37:23PM +0000, Vincenzo Frascino wrote:
> > On Fri, Mar 13, 2020 at 03:43:40PM +0000, Vincenzo Frascino wrote:
> >> The vDSO library should only include the necessary headers required for
> >> a userspace library (UAPI and a minimal set of kernel headers). To make
> >> this possible it is necessary to isolate from the kernel headers the
> >> common parts that are strictly necessary to build the library.
> >>
> >> Introduce asm/vdso/arch_timer.h to contain all the arm64 specific
> >> code. This allows to replace the second isb() in __arch_get_hw_counter()
> >> with a fake dependent stack read of the counter which improves the vdso
> >> library peformances of ~4.5%. Below the results of vdsotest [1] ran for
> >> 100 iterations.
> > 
> > The subject seems to imply a non-functional change but as you read, it
> > gets a lot more complicated. Could you keep the functional change
> > separate from the header clean-up, maybe submit it as an independent
> > patch? And it shouldn't go in without Will's ack ;).
> > 
> 
> It is fine by me. I will repost the series with the required fixes and without
> this patch. This will give to me enough time to address Mark's comments as well
> and to Will to have a proper look.

Please can you post whatever is left at -rc1? I'll have a look then, but
let's stick to just moving code around rather than randomly changing it
at the same time, ok?

Thanks,

Will
Vincenzo Frascino April 9, 2020, 1:36 p.m. UTC | #5
Hi Will,

On 4/9/20 2:26 PM, Will Deacon wrote:
> Hi Vincenzo,
> 
> Sorry, I was on holiday when you posted this and it slipped through the
> cracks.
> 

No issue at all. Thank you for getting back to me.

> On Mon, Mar 16, 2020 at 03:37:23PM +0000, Vincenzo Frascino wrote:
>>> On Fri, Mar 13, 2020 at 03:43:40PM +0000, Vincenzo Frascino wrote:
>>>> The vDSO library should only include the necessary headers required for
>>>> a userspace library (UAPI and a minimal set of kernel headers). To make
>>>> this possible it is necessary to isolate from the kernel headers the
>>>> common parts that are strictly necessary to build the library.
>>>>
>>>> Introduce asm/vdso/arch_timer.h to contain all the arm64 specific
>>>> code. This allows to replace the second isb() in __arch_get_hw_counter()
>>>> with a fake dependent stack read of the counter which improves the vdso
>>>> library peformances of ~4.5%. Below the results of vdsotest [1] ran for
>>>> 100 iterations.
>>>
>>> The subject seems to imply a non-functional change but as you read, it
>>> gets a lot more complicated. Could you keep the functional change
>>> separate from the header clean-up, maybe submit it as an independent
>>> patch? And it shouldn't go in without Will's ack ;).
>>>
>>
>> It is fine by me. I will repost the series with the required fixes and without
>> this patch. This will give to me enough time to address Mark's comments as well
>> and to Will to have a proper look.
> 
> Please can you post whatever is left at -rc1? I'll have a look then, but
> let's stick to just moving code around rather than randomly changing it
> at the same time, ok?
> 

Sure, I will try to re-post it by -rc1 and take on board your comments.

> Thanks,
> 
> Will
>
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index 7ae54d7d333a..7f22cd00ad45 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -164,24 +164,7 @@  static inline void arch_timer_set_cntkctl(u32 cntkctl)
 	isb();
 }
 
-/*
- * Ensure that reads of the counter are treated the same as memory reads
- * for the purposes of ordering by subsequent memory barriers.
- *
- * This insanity brought to you by speculative system register reads,
- * out-of-order memory accesses, sequence locks and Thomas Gleixner.
- *
- * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.html
- */
-#define arch_counter_enforce_ordering(val) do {				\
-	u64 tmp, _val = (val);						\
-									\
-	asm volatile(							\
-	"	eor	%0, %1, %1\n"					\
-	"	add	%0, sp, %0\n"					\
-	"	ldr	xzr, [%0]"					\
-	: "=r" (tmp) : "r" (_val));					\
-} while (0)
+#include <asm/vdso/arch_timer.h>
 
 static __always_inline u64 __arch_counter_get_cntpct_stable(void)
 {
@@ -189,7 +172,7 @@  static __always_inline u64 __arch_counter_get_cntpct_stable(void)
 
 	isb();
 	cnt = arch_timer_reg_read_stable(cntpct_el0);
-	arch_counter_enforce_ordering(cnt);
+	cnt = arch_counter_enforce_ordering(cnt);
 	return cnt;
 }
 
@@ -199,7 +182,7 @@  static __always_inline u64 __arch_counter_get_cntpct(void)
 
 	isb();
 	cnt = read_sysreg(cntpct_el0);
-	arch_counter_enforce_ordering(cnt);
+	cnt = arch_counter_enforce_ordering(cnt);
 	return cnt;
 }
 
@@ -209,7 +192,7 @@  static __always_inline u64 __arch_counter_get_cntvct_stable(void)
 
 	isb();
 	cnt = arch_timer_reg_read_stable(cntvct_el0);
-	arch_counter_enforce_ordering(cnt);
+	cnt = arch_counter_enforce_ordering(cnt);
 	return cnt;
 }
 
@@ -219,12 +202,10 @@  static __always_inline u64 __arch_counter_get_cntvct(void)
 
 	isb();
 	cnt = read_sysreg(cntvct_el0);
-	arch_counter_enforce_ordering(cnt);
+	cnt = arch_counter_enforce_ordering(cnt);
 	return cnt;
 }
 
-#undef arch_counter_enforce_ordering
-
 static inline int arch_timer_arch_init(void)
 {
 	return 0;
diff --git a/arch/arm64/include/asm/vdso/arch_timer.h b/arch/arm64/include/asm/vdso/arch_timer.h
new file mode 100644
index 000000000000..a71bc83232f5
--- /dev/null
+++ b/arch/arm64/include/asm/vdso/arch_timer.h
@@ -0,0 +1,33 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020 ARM Ltd.
+ */
+#ifndef __ASM_VDSO_ARCH_TIMER_H
+#define __ASM_VDSO_ARCH_TIMER_H
+
+#include <uapi/linux/types.h>
+
+/*
+ * Ensure that reads of the counter are treated the same as memory reads
+ * for the purposes of ordering by subsequent memory barriers.
+ *
+ * This insanity brought to you by speculative system register reads,
+ * out-of-order memory accesses, sequence locks and Thomas Gleixner.
+ *
+ * http://lists.infradead.org/pipermail/linux-arm-kernel/2019-February/631195.html
+ *
+ */
+static u64 arch_counter_enforce_ordering(u64 val)
+{
+	u64 tmp, _val = (val);
+
+	asm volatile(
+	"	eor	%0, %1, %1\n"
+	"	add	%0, sp, %0\n"
+	"	ldr	xzr, [%0]"
+	: "=r" (tmp) : "r" (_val));
+
+	return _val;
+}
+
+#endif /* __ASM_VDSO_ARCH_TIMER_H */
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index afba6ba332f8..319808106625 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -8,6 +8,7 @@ 
 #ifndef __ASSEMBLY__
 
 #include <asm/unistd.h>
+#include <asm/vdso/arch_timer.h>
 
 #define VDSO_HAS_CLOCK_GETRES		1
 
@@ -82,10 +83,10 @@  static __always_inline u64 __arch_get_hw_counter(s32 clock_mode)
 	isb();
 	asm volatile("mrs %0, cntvct_el0" : "=r" (res) :: "memory");
 	/*
-	 * This isb() is required to prevent that the seq lock is
-	 * speculated.#
+	 * arch_counter_enforce_ordering() is required to prevent that
+	 * the seq lock is speculated.
 	 */
-	isb();
+	res = arch_counter_enforce_ordering(res);
 
 	return res;
 }