| Message ID | c8ce9baaef0dc7273e4bcc31f353b17b655113d1.1579196675.git.christophe.leroy@c-s.fr (mailing list archive) |
|---|---|
| State | New, archived |
| Series | powerpc: switch VDSO to C implementation. |
On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>
> On powerpc/32, GCC (8.1) generates pretty bad code for the
> ns >>= vd->shift operation, considering that the shift is
> always < 32 and the upper part of the result is likely to be
> nul. GCC makes the reverse assumptions: it treats the shift
> as likely >= 32 and the upper part as likely not nul.
>
> unsigned long long shift(unsigned long long x, unsigned char s)
> {
> 	return x >> s;
> }
>
> results in:
>
> 00000018 <shift>:
>   18:	35 25 ff e0 	addic.  r9,r5,-32
>   1c:	41 80 00 10 	blt     2c <shift+0x14>
>   20:	7c 64 4c 30 	srw     r4,r3,r9
>   24:	38 60 00 00 	li      r3,0
>   28:	4e 80 00 20 	blr
>   2c:	54 69 08 3c 	rlwinm  r9,r3,1,0,30
>   30:	21 45 00 1f 	subfic  r10,r5,31
>   34:	7c 84 2c 30 	srw     r4,r4,r5
>   38:	7d 29 50 30 	slw     r9,r9,r10
>   3c:	7c 63 2c 30 	srw     r3,r3,r5
>   40:	7d 24 23 78 	or      r4,r9,r4
>   44:	4e 80 00 20 	blr
>
> Even when forcing the shift with an &= 31, it still considers
> the shift as likely >= 32.
>
> Define a vdso_shift_ns() macro that can be overridden by
> arches.

Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
assume the shift is less than 32? That function exists to benefit
32-bit arches.

--Andy
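For reference, a minimal sketch of how the mul_u64_u64_shr() alternative mentioned above could be applied to the cycle delta. The helper itself comes from linux/math64.h; the wrapper name and arguments here are purely illustrative, and the follow-ups below discuss whether a single multiply-and-shift can actually replace the open-coded sequence:

	#include <linux/math64.h>

	/*
	 * Illustrative only: fold the delta's multiply and shift into one
	 * math64 helper.  As the replies below explain, the base nanoseconds
	 * are kept left-shifted and must be added before the final shift,
	 * so this cannot simply replace the open-coded sequence.
	 */
	static __always_inline u64 delta_to_ns(u64 cycles, u64 last, u32 mult, u32 shift)
	{
		return mul_u64_u64_shr(cycles - last, mult, shift);
	}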
Andy Lutomirski <luto@kernel.org> writes:
> On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>
> Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
> assume the shift is less than 32? That function exists to benefit
> 32-bit arches.

We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:

    1 <= mult <= U32_MAX

    1 <= shift <= 32

If we want to enforce a shift < 32 we need to limit that conditionally
in the calculation/registration function.

Thanks,

        tglx
On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Andy Lutomirski <luto@kernel.org> writes:
> > On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
> >
> > Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
> > assume the shift is less than 32? That function exists to benefit
> > 32-bit arches.
>
> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>

That's what I meant to type...

> 1 <= mult <= U32_MAX
>
> 1 <= shift <= 32
>
> If we want to enforce a shift < 32 we need to limit that conditionally
> in the calculation/registration function.
>
> Thanks,
>
>         tglx
>
Andy Lutomirski <luto@kernel.org> writes:
> On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Andy Lutomirski <luto@kernel.org> writes:
>> > On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>> >
>> > Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
>> > assume the shift is less than 32? That function exists to benefit
>> > 32-bit arches.
>>
>> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>>
>
> That's what I meant to type...

Just that it does not work. The math is:

     ns = d->nsecs;    // That's the nsec value shifted left by d->shift

     ns += ((cur - d->last) & d->mask) * mult;

     ns >>= d->shift;

So we cannot use mul_u64_u32_shr() because we need the addition there
before shifting. And no, we can't drop the fractional part of
d->nsecs. Been there, done that, got sporadic time going backwards
problems as a reward. Need to look at that again as stuff has changed
over time.

On x86 we enforce that mask is 64bit, so the & operation is not there,
but due to the nasties of TSC we have that conditional

    if (cur > last)
    	return (cur - last) * mult;
    return 0;

Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
well, so you can spare that & operation there too.

Thanks,

        tglx
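Spelled out as a self-contained snippet, the conversion Thomas describes looks roughly like the following; the function and parameter names are illustrative, not the kernel's exact fields:

	/*
	 * The base nanoseconds are stored left-shifted by 'shift' (including
	 * their fractional part), so the scaled cycle delta is added in the
	 * same fixed point and a single shift happens at the very end.  This
	 * is why a plain mul_u64_u32_shr() of the delta alone would drop the
	 * fractional part of the base value.
	 */
	static inline u64 cycles_to_ns(u64 cur, u64 last, u64 mask,
				       u32 mult, u32 shift, u64 nsec_shifted)
	{
		u64 ns = nsec_shifted;			/* nsec << shift */

		ns += ((cur - last) & mask) * mult;	/* scaled cycle delta */
		return ns >> shift;
	}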
On 29/01/2020 at 08:14, Thomas Gleixner wrote:
> Andy Lutomirski <luto@kernel.org> writes:
>
>> On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>>
>>> Andy Lutomirski <luto@kernel.org> writes:
>>>> On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>>>>
>>>> Would mul_u64_u64_shr() be a good alternative? Could we adjust it to
>>>> assume the shift is less than 32? That function exists to benefit
>>>> 32-bit arches.
>>>
>>> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>>>
>>
>> That's what I meant to type...
>
> Just that it does not work. The math is:
>
>      ns = d->nsecs;    // That's the nsec value shifted left by d->shift
>
>      ns += ((cur - d->last) & d->mask) * mult;
>
>      ns >>= d->shift;
>
> So we cannot use mul_u64_u32_shr() because we need the addition there
> before shifting. And no, we can't drop the fractional part of
> d->nsecs. Been there, done that, got sporadic time going backwards
> problems as a reward. Need to look at that again as stuff has changed
> over time.
>
> On x86 we enforce that mask is 64bit, so the & operation is not there,
> but due to the nasties of TSC we have that conditional
>
>     if (cur > last)
>     	return (cur - last) * mult;
>     return 0;
>
> Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
> well, so you can spare that & operation there too.
>

Yes, I did it already. It spares reading d->mask, that's the main advantage.

Christophe
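A minimal sketch of the kind of arch override this refers to, assuming the clocksource mask is known to be all 64 bits; this is the shape such an override would take, not code copied from the powerpc series:

	/*
	 * With a full 64-bit mask, the & operation -- and the load of
	 * vd->mask -- can be dropped from the delta calculation.
	 */
	#define vdso_calc_delta vdso_calc_delta
	static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
	{
		return (cycles - last) * mult;
	}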
diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 724b45c3e8ac..9ba92058cfd7 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -39,6 +39,13 @@ u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
 }
 #endif
 
+#ifndef vdso_shift_ns
+static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
+{
+	return ns >> shift;
+}
+#endif
+
 #ifndef __arch_vdso_hres_capable
 static inline bool __arch_vdso_hres_capable(void)
 {
@@ -148,7 +155,7 @@ static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
 		ns = vdso_ts->nsec;
 		last = vd->cycle_last;
 		ns += vdso_calc_delta(cycles, last, vd->mask, vd->mult);
-		ns >>= vd->shift;
+		ns = vdso_shift_ns(ns, vd->shift);
 		sec = vdso_ts->sec;
 	} while (unlikely(vdso_read_retry(vd, seq)));
On powerpc/32, GCC (8.1) generates pretty bad code for the
ns >>= vd->shift operation, considering that the shift is
always < 32 and the upper part of the result is likely to be
nul. GCC makes the reverse assumptions: it treats the shift
as likely >= 32 and the upper part as likely not nul.

unsigned long long shift(unsigned long long x, unsigned char s)
{
	return x >> s;
}

results in:

00000018 <shift>:
  18:	35 25 ff e0 	addic.  r9,r5,-32
  1c:	41 80 00 10 	blt     2c <shift+0x14>
  20:	7c 64 4c 30 	srw     r4,r3,r9
  24:	38 60 00 00 	li      r3,0
  28:	4e 80 00 20 	blr
  2c:	54 69 08 3c 	rlwinm  r9,r3,1,0,30
  30:	21 45 00 1f 	subfic  r10,r5,31
  34:	7c 84 2c 30 	srw     r4,r4,r5
  38:	7d 29 50 30 	slw     r9,r9,r10
  3c:	7c 63 2c 30 	srw     r3,r3,r5
  40:	7d 24 23 78 	or      r4,r9,r4
  44:	4e 80 00 20 	blr

Even when forcing the shift with an &= 31, it still considers
the shift as likely >= 32.

Define a vdso_shift_ns() macro that can be overridden by
arches.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 lib/vdso/gettimeofday.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
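With the hook above in place, an arch header included ahead of the generic code could supply its own definition. A hedged sketch of such an override, assuming the arch guarantees 1 <= shift < 32 (illustrative only, not taken from the powerpc series):

	/*
	 * 64-bit right shift done with 32-bit operations only, valid for
	 * 1 <= shift <= 31, so the compiler never needs the >= 32 fallback
	 * path it pessimistically assumes for the generic ns >> shift.
	 */
	#define vdso_shift_ns vdso_shift_ns
	static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
	{
		u32 hi = ns >> 32;
		u32 lo = ns;

		lo >>= shift;			/* low word shifted in place */
		lo |= hi << (32 - shift);	/* bits spilling down from the high word */
		hi >>= shift;			/* high word shrinks */

		return ((u64)hi << 32) | lo;
	}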