[RFC,v4,10/11] lib: vdso: Allow arches to override the ns shift operation

Message ID c8ce9baaef0dc7273e4bcc31f353b17b655113d1.1579196675.git.christophe.leroy@c-s.fr (mailing list archive)
State New, archived
Series powerpc: switch VDSO to C implementation.

Commit Message

Christophe Leroy Jan. 16, 2020, 5:58 p.m. UTC
On powerpc/32, GCC (8.1) generates pretty bad code for the
ns >>= vd->shift operation. Although the shift is always < 32
and the upper part of the result is likely to be zero, GCC
makes the reverse assumptions, considering the shift to be
likely >= 32 and the upper part to be likely non-zero.

unsigned long long shift(unsigned long long x, unsigned char s)
{
	return x >> s;
}

results in:

00000018 <shift>:
  18:	35 25 ff e0 	addic.  r9,r5,-32
  1c:	41 80 00 10 	blt     2c <shift+0x14>
  20:	7c 64 4c 30 	srw     r4,r3,r9
  24:	38 60 00 00 	li      r3,0
  28:	4e 80 00 20 	blr
  2c:	54 69 08 3c 	rlwinm  r9,r3,1,0,30
  30:	21 45 00 1f 	subfic  r10,r5,31
  34:	7c 84 2c 30 	srw     r4,r4,r5
  38:	7d 29 50 30 	slw     r9,r9,r10
  3c:	7c 63 2c 30 	srw     r3,r3,r5
  40:	7d 24 23 78 	or      r4,r9,r4
  44:	4e 80 00 20 	blr

Even when the shift is forced into range with an &= 31, GCC
still considers the shift as likely >= 32.

Define a vdso_shift_ns() macro that can be overridden by
arches.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 lib/vdso/gettimeofday.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
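
For illustration, an arch override can do the shift with explicit
32-bit operations once the shift is known to stay in the 1..31
range. The sketch below is hypothetical and relies only on that
assumption; it is not the powerpc override from this series:

#define vdso_shift_ns vdso_shift_ns
static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
{
	u32 hi = ns >> 32;
	u32 lo = ns;

	/* Assumes 1 <= shift <= 31, so (32 - shift) never reaches 32 */
	lo = (lo >> shift) | (hi << (32 - shift));
	hi >>= shift;

	return ((u64)hi << 32) | lo;
}

Open-coding the two 32-bit halves like this removes the runtime
">= 32" branch that GCC emits for a generic 64-bit shift.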

Comments

Andy Lutomirski Jan. 16, 2020, 7:47 p.m. UTC | #1
On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
<christophe.leroy@c-s.fr> wrote:
>
> On powerpc/32, GCC (8.1) generates pretty bad code for the
> ns >>= vd->shift operation. Although the shift is always < 32
> and the upper part of the result is likely to be zero, GCC
> makes the reverse assumptions, considering the shift to be
> likely >= 32 and the upper part to be likely non-zero.
>
> unsigned long long shift(unsigned long long x, unsigned char s)
> {
>         return x >> s;
> }
>
> results in:
>
> 00000018 <shift>:
>   18:   35 25 ff e0     addic.  r9,r5,-32
>   1c:   41 80 00 10     blt     2c <shift+0x14>
>   20:   7c 64 4c 30     srw     r4,r3,r9
>   24:   38 60 00 00     li      r3,0
>   28:   4e 80 00 20     blr
>   2c:   54 69 08 3c     rlwinm  r9,r3,1,0,30
>   30:   21 45 00 1f     subfic  r10,r5,31
>   34:   7c 84 2c 30     srw     r4,r4,r5
>   38:   7d 29 50 30     slw     r9,r9,r10
>   3c:   7c 63 2c 30     srw     r3,r3,r5
>   40:   7d 24 23 78     or      r4,r9,r4
>   44:   4e 80 00 20     blr
>
> Even when the shift is forced into range with an &= 31, GCC
> still considers the shift as likely >= 32.
>
> Define a vdso_shift_ns() macro that can be overridden by
> arches.

Would mul_u64_u64_shr() be a good alternative?  Could we adjust it to
assume the shift is less than 32?  That function exists to benefit
32-bit arches.

--Andy
Thomas Gleixner Jan. 16, 2020, 7:57 p.m. UTC | #2
Andy Lutomirski <luto@kernel.org> writes:
> On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>
> Would mul_u64_u64_shr() be a good alternative?  Could we adjust it to
> assume the shift is less than 32?  That function exists to benefit
> 32-bit arches.

We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:

     1 <= mult <= U32_MAX

     1 <= shift <= 32

If we want to enforce a shift < 32 we need to limit that conditionally
in the calculation/registration function.

Thanks,

        tglx
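
As a sketch of that conditional limiting, a routine modeled on
clocks_calc_mult_shift() from kernel/time/clocksource.c could take
the maximum shift as a parameter. The max_shift argument and the
function name here are illustrative, not existing kernel API:

static void calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to,
			    u32 maxsec, u32 max_shift)
{
	u64 tmp;
	u32 sft, sftacc = 32;

	/* Shift factor limiting the conversion range to maxsec seconds */
	tmp = ((u64)maxsec * from) >> 32;
	while (tmp) {
		tmp >>= 1;
		sftacc--;
	}

	/* Best-accuracy mult/shift pair, with shift capped at max_shift */
	for (sft = max_shift; sft > 0; sft--) {
		tmp = ((u64)to << sft) + from / 2;
		do_div(tmp, from);
		if ((tmp >> sftacc) == 0)
			break;
	}
	*mult = tmp;
	*shift = sft;
}

Passing max_shift = 31 for an arch that overrides the shift would
guarantee the 1..31 range the vdso code could then rely on.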
Andy Lutomirski Jan. 16, 2020, 8:20 p.m. UTC | #3
On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Andy Lutomirski <luto@kernel.org> writes:
> > On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
> >
> > Would mul_u64_u64_shr() be a good alternative?  Could we adjust it to
> > assume the shift is less than 32?  That function exists to benefit
> > 32-bit arches.
>
> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>

That's what I meant to type...

>      1 <= mult <= U32_MAX
>
>      1 <= shift <= 32
>
> If we want to enforce a shift < 32 we need to limit that conditionally
> in the calculation/registration function.
>
> Thanks,
>
>         tglx
>
Thomas Gleixner Jan. 29, 2020, 7:14 a.m. UTC | #4
Andy Lutomirski <luto@kernel.org> writes:

> On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Andy Lutomirski <luto@kernel.org> writes:
>> > On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>> >
>> > Would mul_u64_u64_shr() be a good alternative?  Could we adjust it to
>> > assume the shift is less than 32?  That function exists to benefit
>> > 32-bit arches.
>>
>> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>>
>
> That's what I meant to type...

Just that it does not work. The math is:

     ns = d->nsecs;   // That's the nsec value shifted left by d->shift

     ns += ((cur - d->last) & d->mask) * mult;

     ns >>= d->shift;

So we cannot use mul_u64_u32_shr() because we need the addition there
before shifting. And no, we can't drop the fractional part of
d->nsecs. Been there, done that, got sporadic time going backwards
problems as a reward. Need to look at that again as stuff has changed
over time.

On x86 we enforce that mask is 64bit, so the & operation is not there,
but due to the nasties of TSC we have that conditional

    if (cur > last)
       return (cur - last) * mult;
    return 0;

Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
well, so you can spare that & operation there too.

Thanks,

        tglx
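
For context, mul_u64_u32_shr() computes (a * mul) >> shift; its
generic 32-bit fallback in include/linux/math64.h looks roughly like
the following (paraphrased, so check the header for the exact code):

static inline u64 mul_u64_u32_shr(u64 a, u32 mul, unsigned int shift)
{
	u32 ah = a >> 32, al = a;
	u64 ret;

	/* Assumes 1 <= shift <= 32 */
	ret = ((u64)al * mul) >> shift;
	if (ah)
		ret += ((u64)ah * mul) << (32 - shift);

	return ret;
}

The whole product is shifted inside the helper, so there is no spot
to add the pre-shift d->nsecs fraction first, which is the problem
Thomas describes above.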
Christophe Leroy Jan. 29, 2020, 7:26 a.m. UTC | #5
On 29/01/2020 at 08:14, Thomas Gleixner wrote:
> Andy Lutomirski <luto@kernel.org> writes:
> 
>> On Thu, Jan 16, 2020 at 11:57 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>>
>>> Andy Lutomirski <luto@kernel.org> writes:
>>>> On Thu, Jan 16, 2020 at 9:58 AM Christophe Leroy
>>>>
>>>> Would mul_u64_u64_shr() be a good alternative?  Could we adjust it to
>>>> assume the shift is less than 32?  That function exists to benefit
>>>> 32-bit arches.
>>>
>>> We'd want mul_u64_u32_shr() for this. The rules for mult and shift are:
>>>
>>
>> That's what I meant to type...
> 
> Just that it does not work. The math is:
> 
>       ns = d->nsecs;   // That's the nsec value shifted left by d->shift
> 
>       ns += ((cur - d->last) & d->mask) * mult;
> 
>       ns >>= d->shift;
> 
> So we cannot use mul_u64_u32_shr() because we need the addition there
> before shifting. And no, we can't drop the fractional part of
> d->nsecs. Been there, done that, got sporadic time going backwards
> problems as a reward. Need to look at that again as stuff has changed
> over time.
> 
> On x86 we enforce that mask is 64bit, so the & operation is not there,
> but due to the nasties of TSC we have that conditional
> 
>      if (cur > last)
>         return (cur - last) * mult;
>      return 0;
> 
> Christophe, on PPC the decrementer/RTC clocksource masks are 64bit as
> well, so you can spare that & operation there too.
> 

Yes, I did it already. It spares reading d->mask, that's the main advantage.

Christophe
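
For reference, the mask-free variant Christophe mentions would be a
per-arch vdso_calc_delta() override along these lines (a sketch,
assuming the clocksource mask is the full 64 bits):

static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
{
	/* mask is known to be ~0ULL here, so the & can be dropped */
	return (cycles - last) * mult;
}
#define vdso_calc_delta vdso_calc_delta

The #ifndef vdso_calc_delta guard visible in the generic code in the
patch below is what lets an arch header provide such a replacement.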

Patch

diff --git a/lib/vdso/gettimeofday.c b/lib/vdso/gettimeofday.c
index 724b45c3e8ac..9ba92058cfd7 100644
--- a/lib/vdso/gettimeofday.c
+++ b/lib/vdso/gettimeofday.c
@@ -39,6 +39,13 @@  u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
 }
 #endif
 
+#ifndef vdso_shift_ns
+static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift)
+{
+	return ns >> shift;
+}
+#endif
+
 #ifndef __arch_vdso_hres_capable
 static inline bool __arch_vdso_hres_capable(void)
 {
@@ -148,7 +155,7 @@  static __always_inline int do_hres(const struct vdso_data *vd, clockid_t clk,
 		ns = vdso_ts->nsec;
 		last = vd->cycle_last;
 		ns += vdso_calc_delta(cycles, last, vd->mask, vd->mult);
-		ns >>= vd->shift;
+		ns = vdso_shift_ns(ns, vd->shift);
 		sec = vdso_ts->sec;
 	} while (unlikely(vdso_read_retry(vd, seq)));