diff mbox series

[RFC,v4,08/11] lib: vdso: allow fixed clock mode

Message ID 1b278bc1f6859d4df734fb2cde61cf298e6e07fd.1579196675.git.christophe.leroy@c-s.fr (mailing list archive)
State New, archived
Headers show
Series powerpc: switch VDSO to C implementation. | expand

Commit Message

Christophe Leroy Jan. 16, 2020, 5:58 p.m. UTC
On arches like POWERPC, the clock is always the timebase, it
cannot be changed on the fly and it is always VDSO capable.

Therefore, give arches the opportunity to redefine the way
clock_mode is checked by moving the check into an overridable
__arch_vdso_capable() macro.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 lib/vdso/gettimeofday.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Thomas Gleixner Jan. 16, 2020, 8:13 p.m. UTC | #1
Christophe Leroy <christophe.leroy@c-s.fr> writes:

Can you please adjust the prefix for future patches to lib/vdso: and
start the sentence after the colon with an uppercase letter?

> On arches like POWERPC, the clock is always the timebase, it

Please spell out architectures. Changelogs are not space constraint.

> cannot be changed on the fly and it is always VDSO capable.

Also this sentence does not make sense as it might suggests that
architectures with a fixed compile time known clocksource have something
named timebase. Something like this is more clear:

Some architectures have a fixed clocksource which is known at compile
time and cannot be replaced or disabled at runtime, e.g. timebase on
PowerPC. For such cases the clock mode check in the VDSO code is
pointless.

Hmm?

Thanks,

        tglx
Andy Lutomirski Jan. 16, 2020, 8:19 p.m. UTC | #2
On Thu, Jan 16, 2020 at 12:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Christophe Leroy <christophe.leroy@c-s.fr> writes:
>
> Can you please adjust the prefix for future patches to lib/vdso: and
> start the sentence after the colon with an uppercase letter?
>
> > On arches like POWERPC, the clock is always the timebase, it
>
> Please spell out architectures. Changelogs are not space constraint.
>
> > cannot be changed on the fly and it is always VDSO capable.
>
> Also this sentence does not make sense as it might suggests that
> architectures with a fixed compile time known clocksource have something
> named timebase. Something like this is more clear:
>
> Some architectures have a fixed clocksource which is known at compile
> time and cannot be replaced or disabled at runtime, e.g. timebase on
> PowerPC. For such cases the clock mode check in the VDSO code is
> pointless.
>

I wonder if we should use this on x86 bare-metal if we have
sufficiently invariant TSC.  (Via static_cpu_has(), not compiled in.)
Maybe there is no such x86 machine.

I really really want Intel or AMD to introduce machines where the TSC
pinky-swears to count in actual nanoseconds.

--Andy
Thomas Gleixner Jan. 16, 2020, 9:07 p.m. UTC | #3
Andy Lutomirski <luto@kernel.org> writes:
> On Thu, Jan 16, 2020 at 12:14 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> Some architectures have a fixed clocksource which is known at compile
>> time and cannot be replaced or disabled at runtime, e.g. timebase on
>> PowerPC. For such cases the clock mode check in the VDSO code is
>> pointless.
>>
> I wonder if we should use this on x86 bare-metal if we have
> sufficiently invariant TSC.  (Via static_cpu_has(), not compiled in.)
>
> Maybe there is no such x86 machine.

There might be some, but every time I started to trust the TSC a bit
more someone reported the next variant of brokenness.

Admittedly it has become better at least up to two sockets.

For a start we could do that when the TSC is considered reliable, which
is the case when:

  - The TSC is the only available clocksource

  - tsc=reliable is on the kernel command line

> I really really want Intel or AMD to introduce machines where the TSC
> pinky-swears to count in actual nanoseconds.

and is guaranteed to be synchronized across any number of sockets/cpus
and has an enforcable protection against BIOS writers.

Ideally it'd have a writeable MSR attached which allows us to tweak the
frequency in the PPM range via NTP/PTP.

Guess how long quite some people including Linus and myself are asking
for this?

I know that Linus started bitching about the TSC before me, but it's
already a bit over 20 years on my side when I first talked to Intel and
AMD about the requirements for a reliable clocksource.

Just to set the time lines straight.

Constant frequency TSC surfaced on Intel in 2006 with the Core brand and
on AMD in 2007 with Barcelona (Fam 10h).

In 2008 the first TSC surfaced which was not affected by C-States and 5
years later in 2013 some Atoms came out where TSC even worked accross
S3.

The > 2 socket issue is still not resolved AFAICT, but we got at least
the TSC ADJUST MSR around 2012 which allowed us for the first time to
reliably detect and mitigate BIOS wreckage.

All the years I was envy on architectures which had simple designed and
just reliably working timers forever.

So now you can extrapolate how long it will take until you get your
pinky-swearing pony :)

Thanks,

        tglx
diff mbox series

Patch

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 9fa249809399..724b45c3e8ac 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -46,6 +46,13 @@  static inline bool __arch_vdso_hres_capable(void)
 }
 #endif
 
+#ifndef __arch_vdso_capable
+static inline bool __arch_vdso_capable(const struct vdso_data *vd)
+{
+	return vd->clock_mode != VDSO_CLOCKMODE_NONE;
+}
+#endif
+
 #ifdef CONFIG_TIME_NS
 static int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
 			  struct __kernel_timespec *ts)
@@ -66,7 +73,7 @@  static int do_hres_timens(const struct vdso_data *vdns, clockid_t clk,
 	do {
 		seq = vdso_read_begin(vd);
 
-		if (unlikely(vd->clock_mode == VDSO_CLOCKMODE_NONE))
+		if (unlikely(!__arch_vdso_capable(vd)))
 			return -1;
 
 		cycles = __arch_get_hw_counter(vd->clock_mode);
@@ -134,7 +141,7 @@  static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
 		}
 		smp_rmb();
 
-		if (unlikely(vd->clock_mode == VDSO_CLOCKMODE_NONE))
+		if (unlikely(!__arch_vdso_capable(vd)))
 			return -1;
 
 		cycles = __arch_get_hw_counter(vd->clock_mode);