diff mbox

[v10,1/3] lib: Add strongly typed 64bit int_sqrt

Message ID 20171220142001.18161-1-cmo@melexis.com (mailing list archive)
State New, archived
Headers show

Commit Message

Crt Mori Dec. 20, 2017, 2:20 p.m. UTC
There is no option to perform 64bit integer sqrt on 32bit platform.
Added stronger typed int_sqrt64 enables the 64bit calculations to
be performed on 32bit platforms. Although int_sqrt() is a rough
approximation, the same algorithm is used in int_sqrt64() as good
enough on 32bit platform.

Signed-off-by: Crt Mori <cmo@melexis.com>
---
 include/linux/kernel.h |  9 +++++++++
 lib/int_sqrt.c         | 31 +++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

Comments

David Laight Dec. 20, 2017, 2:39 p.m. UTC | #1
From: Crt Mori
> Sent: 20 December 2017 14:20
> 
> There is no option to perform 64bit integer sqrt on 32bit platform.
> Added stronger typed int_sqrt64 enables the 64bit calculations to
> be performed on 32bit platforms. Although int_sqrt() is a rough
> approximation, the same algorithm is used in int_sqrt64() as good
> enough on 32bit platform.

The algorithm used gives an exact (rounded down) square root,
not an approximation.

With minor changes it ought to be possible to remove most of the
64bit arithmetic and shifts.

If you care about performance then using 32 bit maths will be much faster.

	David

--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Crt Mori Dec. 20, 2017, 3:41 p.m. UTC | #2
On 20 December 2017 at 15:39, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 14:20
>>
>> There is no option to perform 64bit integer sqrt on 32bit platform.
>> Added stronger typed int_sqrt64 enables the 64bit calculations to
>> be performed on 32bit platforms. Although int_sqrt() is a rough
>> approximation, the same algorithm is used in int_sqrt64() as good
>> enough on 32bit platform.
>
> The algorithm used gives an exact (rounded down) square root,
> not an approximation.
>

OK, noted. This is from new changed function indeed - will change in v11.

> With minor changes it ought to be possible to remove most of the
> 64bit arithmetic and shifts.
>
> If you care about performance then using 32 bit maths will be much faster.
>

I need precision not the performance. That is why I would like to
leave int_sqrt as one for good performance while my will be used when
you require a precision.

>         David
>
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Dec. 20, 2017, 4 p.m. UTC | #3
On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:

> With minor changes it ought to be possible to remove most of the
> 64bit arithmetic and shifts.
> 
> If you care about performance then using 32 bit maths will be much faster.

Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
indicated that improvement is possible. At the very least y can be made
a u32 I suppose.
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Crt Mori Dec. 20, 2017, 4:17 p.m. UTC | #4
On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
>
>> With minor changes it ought to be possible to remove most of the
>> 64bit arithmetic and shifts.
>>
>> If you care about performance then using 32 bit maths will be much faster.
>
> Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> indicated that improvement is possible. At the very least y can be made
> a u32 I suppose.

OK, is there any more easy optimizations you see?
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Dec. 20, 2017, 4:46 p.m. UTC | #5
From: Crt Mori

> Sent: 20 December 2017 16:17

> 

> On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:

> > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:

> >

> >> With minor changes it ought to be possible to remove most of the

> >> 64bit arithmetic and shifts.

> >>

> >> If you care about performance then using 32 bit maths will be much faster.

> >

> > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also

> > indicated that improvement is possible. At the very least y can be made

> > a u32 I suppose.

> 

> OK, is there any more easy optimizations you see?


I think this version works.
It doesn't have the optimisation for small values.

unsigned int sqrt64(unsigned long long x)
{
        unsigned int x_hi = x >> 32;

        unsigned int b = 0;
        unsigned int y = 0;
        unsigned int i;

        for (i = 0; i < 32; i++) {
                b <<= 2;
                b |= x_hi >> 30;
                x_hi <<= 2;
                if (i == 15)
                        x_hi = x;
                y <<= 1;
                if (b > y)
                        b -= ++y;
        }
        return y;
}

Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
and compare to that of your version.

	David
Joe Perches Dec. 20, 2017, 5:19 p.m. UTC | #6
On Wed, 2017-12-20 at 16:46 +0000, David Laight wrote:
> From: Crt Mori
> > Sent: 20 December 2017 16:17
> > 
> > On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
> > > 
> > > > With minor changes it ought to be possible to remove most of the
> > > > 64bit arithmetic and shifts.
> > > > 
> > > > If you care about performance then using 32 bit maths will be much faster.
> > > 
> > > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
> > > indicated that improvement is possible. At the very least y can be made
> > > a u32 I suppose.
> > 
> > OK, is there any more easy optimizations you see?
> 
> I think this version works.
> It doesn't have the optimisation for small values.
> 
> unsigned int sqrt64(unsigned long long x)
> {
>         unsigned int x_hi = x >> 32;
> 
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;
> 

Perhaps add:

	if (x <= UINT_MAX)
		return int_sqrt((unsigned long)x);

>         for (i = 0; i < 32; i++) {
>                 b <<= 2;
>                 b |= x_hi >> 30;
>                 x_hi <<= 2;
>                 if (i == 15)
>                         x_hi = x;
>                 y <<= 1;
>                 if (b > y)
>                         b -= ++y;
>         }
>         return y;
> }
> 
> Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> and compare to that of your version.
> 
> 	David
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Crt Mori Dec. 20, 2017, 5:30 p.m. UTC | #7
On 20 December 2017 at 17:46, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 16:17
>>
>> On 20 December 2017 at 17:00, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Wed, Dec 20, 2017 at 02:39:26PM +0000, David Laight wrote:
>> >
>> >> With minor changes it ought to be possible to remove most of the
>> >> 64bit arithmetic and shifts.
>> >>
>> >> If you care about performance then using 32 bit maths will be much faster.
>> >
>> > Some, u64 add/sub/shift isn't exactly expensive, but yes, I also
>> > indicated that improvement is possible. At the very least y can be made
>> > a u32 I suppose.
>>
>> OK, is there any more easy optimizations you see?
>
> I think this version works.
> It doesn't have the optimisation for small values.
>
> unsigned int sqrt64(unsigned long long x)
> {
>         unsigned int x_hi = x >> 32;
>
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;
>
>         for (i = 0; i < 32; i++) {
>                 b <<= 2;
>                 b |= x_hi >> 30;
>                 x_hi <<= 2;
>                 if (i == 15)
>                         x_hi = x;
>                 y <<= 1;
>                 if (b > y)
>                         b -= ++y;
>         }
>         return y;
> }
>
> Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> and compare to that of your version.
>
>         David
>

Wow, thanks. This seems like a major change.

I did a quick run through unit tests for the sensor and the results
are way off. On the sensor I had to convert double calculations to
integer calculations and target was to get end result within 0.02 degC
(with previous approximate sqrt implementation) in sensor working
range. This now gets into 3 degC delta at least and some are way off.
It might be off because of some scaling on the other hand during the
equation (not exactly comparing sqrt implementations here).

I will need a bit more testing of this to see if it is replaceable.
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Dec. 21, 2017, 10:02 a.m. UTC | #8
From: Joe Perches
> Sent: 20 December 2017 17:20
...
> > I think this version works.
> > It doesn't have the optimisation for small values.
> >
> > unsigned int sqrt64(unsigned long long x)
> > {
> >         unsigned int x_hi = x >> 32;
> >
> >         unsigned int b = 0;
> >         unsigned int y = 0;
> >         unsigned int i;
> >
> 
> Perhaps add:
> 
> 	if (x <= UINT_MAX)
> 		return int_sqrt((unsigned long)x);

Actually something like:
	i = 32;
	if (!x_hi) {
		x_hi = x;
		i = 16;
	}
	
	if (!(x_hi & 0xffff0000)) {
		x_hi <<= 16;
		i -= 8;
	}
Repeat for 0xff000000, 0xf0000000 and 0xc0000000 and adjust loop to count down.

	David

> 
> >         for (i = 0; i < 32; i++) {
> >                 b <<= 2;
> >                 b |= x_hi >> 30;
> >                 x_hi <<= 2;
> >                 if (i == 15)
> >                         x_hi = x;
> >                 y <<= 1;
> >                 if (b > y)
> >                         b -= ++y;
> >         }
> >         return y;
> > }
> >
> > Put it through cc -O3 -m32 -c -o sqrt64.o sqrt64.c and then objdump sqrt64.o
> > and compare to that of your version.
> >
> > 	David
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Dec. 21, 2017, 10:08 a.m. UTC | #9
RnJvbTogQ3J0IE1vcmkNCj4gU2VudDogMjAgRGVjZW1iZXIgMjAxNyAxNzozMA0KPiBUbzogRGF2
aWQgTGFpZ2h0DQouLi4NCj4gPiBJIHRoaW5rIHRoaXMgdmVyc2lvbiB3b3Jrcy4NCj4gPiBJdCBk
b2Vzbid0IGhhdmUgdGhlIG9wdGltaXNhdGlvbiBmb3Igc21hbGwgdmFsdWVzLg0KPiA+DQo+ID4g
dW5zaWduZWQgaW50IHNxcnQ2NCh1bnNpZ25lZCBsb25nIGxvbmcgeCkNCj4gPiB7DQo+ID4gICAg
ICAgICB1bnNpZ25lZCBpbnQgeF9oaSA9IHggPj4gMzI7DQo+ID4NCj4gPiAgICAgICAgIHVuc2ln
bmVkIGludCBiID0gMDsNCj4gPiAgICAgICAgIHVuc2lnbmVkIGludCB5ID0gMDsNCj4gPiAgICAg
ICAgIHVuc2lnbmVkIGludCBpOw0KPiA+DQo+ID4gICAgICAgICBmb3IgKGkgPSAwOyBpIDwgMzI7
IGkrKykgew0KPiA+ICAgICAgICAgICAgICAgICBiIDw8PSAyOw0KPiA+ICAgICAgICAgICAgICAg
ICBiIHw9IHhfaGkgPj4gMzA7DQo+ID4gICAgICAgICAgICAgICAgIHhfaGkgPDw9IDI7DQo+ID4g
ICAgICAgICAgICAgICAgIGlmIChpID09IDE1KQ0KPiA+ICAgICAgICAgICAgICAgICAgICAgICAg
IHhfaGkgPSB4Ow0KPiA+ICAgICAgICAgICAgICAgICB5IDw8PSAxOw0KPiA+ICAgICAgICAgICAg
ICAgICBpZiAoYiA+IHkpDQo+ID4gICAgICAgICAgICAgICAgICAgICAgICAgYiAtPSArK3k7DQo+
ID4gICAgICAgICB9DQo+ID4gICAgICAgICByZXR1cm4geTsNCj4gPiB9DQo+IA0KPiBXb3csIHRo
YW5rcy4gVGhpcyBzZWVtcyBsaWtlIGEgbWFqb3IgY2hhbmdlLg0KDQpOb3QgcmVhbGx5LCBpdCBp
cyBqdXN0IGRvaW5nIGl0IHNsaWdodGx5IGRpZmZlcmVudGx5Lg0KVGhlIGFsZ29yaXRobSBpcyB0
aGUgJ3NjaG9vbGJveScgb25lIHRoYXQgdXNlZCB0byBiZSB0YXVnaHQgYXQgc2Nob29sLg0KKEFs
dGhvdWdoIEkgZm91bmQgaXQgaW4gYSBib290IGZyb20gdGhlIDE5MzBzLikNCg0KSSBjb21wYXJl
ZCB0aGUgb2JqZWN0IGNvZGUgZm9yIGFtZDY0IChkb2Vzbid0IG5lZWQgdG8gcmVsb2FkDQoneF9o
aScgaGFsZiB3YXkgdGhyb3VnaCkgYWdhaW5zdCB0aGUgb3JpZ2luYWwgbG9vcC4NClRoZXkgYXJl
IG11Y2ggdGhlIHNhbWUgc2l6ZSwgZXhlY3V0aW9uIHNwZWVkIHdpbGwgZGVwZW5kIHRoZSBsZW5n
dGhzDQpvZiB0aGUgZGVwZW5kZW5jeSBjaGFpbnMuDQoNCj4gSSBkaWQgYSBxdWljayBydW4gdGhy
b3VnaCB1bml0IHRlc3RzIGZvciB0aGUgc2Vuc29yIGFuZCB0aGUgcmVzdWx0cw0KPiBhcmUgd2F5
IG9mZi4gT24gdGhlIHNlbnNvciBJIGhhZCB0byBjb252ZXJ0IGRvdWJsZSBjYWxjdWxhdGlvbnMg
dG8NCj4gaW50ZWdlciBjYWxjdWxhdGlvbnMgYW5kIHRhcmdldCB3YXMgdG8gZ2V0IGVuZCByZXN1
bHQgd2l0aGluIDAuMDIgZGVnQw0KPiAod2l0aCBwcmV2aW91cyBhcHByb3hpbWF0ZSBzcXJ0IGlt
cGxlbWVudGF0aW9uKSBpbiBzZW5zb3Igd29ya2luZw0KPiByYW5nZS4gVGhpcyBub3cgZ2V0cyBp
bnRvIDMgZGVnQyBkZWx0YSBhdCBsZWFzdCBhbmQgc29tZSBhcmUgd2F5IG9mZi4NCj4gSXQgbWln
aHQgYmUgb2ZmIGJlY2F1c2Ugb2Ygc29tZSBzY2FsaW5nIG9uIHRoZSBvdGhlciBoYW5kIGR1cmlu
ZyB0aGUNCj4gZXF1YXRpb24gKG5vdCBleGFjdGx5IGNvbXBhcmluZyBzcXJ0IGltcGxlbWVudGF0
aW9ucyBoZXJlKS4NCj4gDQo+IEkgd2lsbCBuZWVkIGEgYml0IG1vcmUgdGVzdGluZyBvZiB0aGlz
IHRvIHNlZSBpZiBpdCBpcyByZXBsYWNlYWJsZS4NCg0KWW91IHByb2JhYmx5IG5lZWQgdG8gcHV0
IHZhbHVlcyB0aHJvdWdoIGJvdGggKGFsbCAzKSBmdW5jdGlvbnMgdG8gc2VlDQp3aGVyZSB5b3Ug
YXJlIGdldHRpbmcgZXJyb3JzLg0KSXQgbWlnaHQgYmUgcm91bmRpbmcsIG9yIHRoZXJlIG1pZ2h0
IGJlIGEgZnViYXIgc29tZXdoZXJlLg0KDQoJRGF2aWQNCg0K
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Dec. 21, 2017, 10:59 a.m. UTC | #10
From: Crt Mori

> Sent: 20 December 2017 17:30

> >> OK, is there any more easy optimizations you see?

> >

> > I think this version works.

> > It doesn't have the optimisation for small values.

> >

> > unsigned int sqrt64(unsigned long long x)

> > {

> >         unsigned int x_hi = x >> 32;

> >

> >         unsigned int b = 0;

> >         unsigned int y = 0;

> >         unsigned int i;

> >

> >         for (i = 0; i < 32; i++) {

> >                 b <<= 2;

> >                 b |= x_hi >> 30;

> >                 x_hi <<= 2;

> >                 if (i == 15)

> >                         x_hi = x;

> >                 y <<= 1;

> >                 if (b > y)

> >                         b -= ++y;

> >         }

> >         return y;

> > }

..
> 

> I did a quick run through unit tests for the sensor and the results

> are way off. On the sensor I had to convert double calculations to

> integer calculations and target was to get end result within 0.02 degC

> (with previous approximate sqrt implementation) in sensor working

> range. This now gets into 3 degC delta at least and some are way off.

> It might be off because of some scaling on the other hand during the

> equation (not exactly comparing sqrt implementations here).


I didn't get it quite right...
The last few lines need to be:
		if (b > y) {	
			b -= ++y;
			y++;
		}
	}
	return y >> 1;
}

Although that then fails for inputs larger than 2^62.

	David
David Laight Dec. 21, 2017, 11:43 a.m. UTC | #11
From: Crt Mori

> Sent: 20 December 2017 17:30

> I did a quick run through unit tests for the sensor and the results

> are way off

> ...


Try this version instead:
unsigned int sqrt64(unsigned long long x_in)
{
        unsigned int x = x_in >> 32;

        unsigned int b = 0;
        unsigned int y = 0;
        unsigned int i;

        i = 31;
        if (!x) {
                x = x_in;
                i = 15;
        }
        if (!(x & 0xffff0000)) {
                x <<= 16;
                i -= 8;
        }
        if (!(x & 0xff000000)) {
                x <<= 8;
                i -= 4;
        }
        if (!(x & 0xf0000000)) {
                x <<= 4;
                i -= 2;
        }

        do {
                b <<= 2;
                b |= x >> 30;
                x <<= 2;
                if (i == 16)
                        x = x_in;
                y <<= 1;
                if (b > y) {
                        b -= ++y;
                        y++;
                }
        } while (--i);

        /* 'b' becomes 33 bits if the input is greater than 2^62 */
        b <<= 1;
        b |= x >> 31;
        if (b > y || (b == y && x & (1u << 30)))
                y |= 1;

        return y;
}

I've tested that one with more values.

	David
Crt Mori Dec. 21, 2017, 1:17 p.m. UTC | #12
On 21 December 2017 at 12:43, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori
>> Sent: 20 December 2017 17:30
>> I did a quick run through unit tests for the sensor and the results
>> are way off
>> ...
>
> Try this version instead:
> unsigned int sqrt64(unsigned long long x_in)
> {
>         unsigned int x = x_in >> 32;
>
>         unsigned int b = 0;
>         unsigned int y = 0;
>         unsigned int i;

i can be u8. And I will still use explicit typing.

>
>         i = 31;
>         if (!x) {
>                 x = x_in;
>                 i = 15;
>         }
>         if (!(x & 0xffff0000)) {
>                 x <<= 16;
>                 i -= 8;
>         }
>         if (!(x & 0xff000000)) {
>                 x <<= 8;
>                 i -= 4;
>         }
>         if (!(x & 0xf0000000)) {
>                 x <<= 4;
>                 i -= 2;
>         }
>

This part above looks like FLS

>         do {
>                 b <<= 2;
>                 b |= x >> 30;
>                 x <<= 2;
>                 if (i == 16)
>                         x = x_in;
>                 y <<= 1;
>                 if (b > y) {
>                         b -= ++y;
>                         y++;
>                 }
>         } while (--i);
>
>         /* 'b' becomes 33 bits if the input is greater than 2^62 */
>         b <<= 1;
>         b |= x >> 31;
>         if (b > y || (b == y && x & (1u << 30)))
>                 y |= 1;
>
>         return y;
> }
>
> I've tested that one with more values.
>
>         David
>

This one indeed works. I did some more testing this morning and I am
fine with either.

So question is: Do I make change as per David's suggestion with his
sign-off, or leave the version it was before the change?
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Dec. 21, 2017, 1:56 p.m. UTC | #13
RnJvbTogQ3J0IE1vcmkNCj4gU2VudDogMjEgRGVjZW1iZXIgMjAxNyAxMzoxOA0KLi4uDQo+ID4g
ICAgICAgICB1bnNpZ25lZCBpbnQgaTsNCj4gDQo+IGkgY2FuIGJlIHU4LiBBbmQgSSB3aWxsIHN0
aWxsIHVzZSBleHBsaWNpdCB0eXBpbmcuDQoNCnU4IHdpbGwgYWRkIGV4dHJhIGNvZGUsIHVuc2ln
bmVkIGludCBpcyBnb29kLg0KJ3gnIG5lZWRzIHRvIGJlIHUzMiwgJ3knIGFuZCAnYicgY291bGQg
YmUgbGFyZ2VyLg0KDQpJIHdhcyB0ZXN0aW5nIGluIHVzZXJzcGFjZS4NCg0KLi4uDQo+IFRoaXMg
cGFydCBhYm92ZSBsb29rcyBsaWtlIEZMUw0KSXQgYWxzbyBkb2VzIHRoZSByZXN0IG9mIHRoZSBy
ZXF1aXJlZCBzaGlmdHMuDQoNCi4uLg0KPiBUaGlzIG9uZSBpbmRlZWQgd29ya3MuIEkgZGlkIHNv
bWUgbW9yZSB0ZXN0aW5nIHRoaXMgbW9ybmluZyBhbmQgSSBhbQ0KPiBmaW5lIHdpdGggZWl0aGVy
Lg0KPiANCj4gU28gcXVlc3Rpb24gaXM6IERvIEkgbWFrZSBjaGFuZ2UgYXMgcGVyIERhdmlkJ3Mg
c3VnZ2VzdGlvbiB3aXRoIGhpcw0KPiBzaWduLW9mZiwgb3IgbGVhdmUgdGhlIHZlcnNpb24gaXQg
d2FzIGJlZm9yZSB0aGUgY2hhbmdlPw0KDQpJZiB5b3UgZ2VuZXJhdGUgdGhlIGFjdHVhbCBwYXRj
aCBJIGNhbiBhZGQgYW4gYXBwcm9wcmlhdGUgc2lnbi1vZmYNCihvciB3aGF0ZXZlciBpcyBuZWVk
ZWQpLg0KDQoJRGF2aWQNCg0K
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Dec. 21, 2017, 2:11 p.m. UTC | #14
On Thu, Dec 21, 2017 at 01:56:55PM +0000, David Laight wrote:
> > This part above looks like FLS
> It also does the rest of the required shifts.

Still, fls() + shift is way faster on hardware that has an fls
instruction.

Writing out that binary search doesn't make sense.
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Dec. 21, 2017, 2:48 p.m. UTC | #15
From: Peter Zijlstra
> Sent: 21 December 2017 14:12
...
> > > This part above looks like FLS
> > It also does the rest of the required shifts.
> 
> Still, fls() + shift is way faster on hardware that has an fls
> instruction.
> 
> Writing out that binary search doesn't make sense.

If the hardware doesn't have an appropriate fls instruction
the soft fls()will be worse.

If you used fls() you'd still need quite a bit of code
to generate the correct shift and loop count adjustment.
Given the cost of the loop iterations the 3 tests are noise.
The open coded version is obviously correct...

I didn't add the 4th one because the code always does 2 iterations.

If you were really worried about performance there are faster
algorithms (even doing 2 or 4 bits a time is faster).

	David

--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Crt Mori Dec. 22, 2017, 1:44 p.m. UTC | #16
From simple strong typing of existing int_sqrt we came to something a
bit more complex or better. Can we decide now which we want in, or I
submit v12 and we decide then (although it is not a v12, but whole new
thing)?

On 21 December 2017 at 15:48, David Laight <David.Laight@aculab.com> wrote:
> From: Peter Zijlstra
>> Sent: 21 December 2017 14:12
> ...
>> > > This part above looks like FLS
>> > It also does the rest of the required shifts.
>>
>> Still, fls() + shift is way faster on hardware that has an fls
>> instruction.
>>
>> Writing out that binary search doesn't make sense.
>
> If the hardware doesn't have an appropriate fls instruction
> the soft fls()will be worse.
>
> If you used fls() you'd still need quite a bit of code
> to generate the correct shift and loop count adjustment.
> Given the cost of the loop iterations the 3 tests are noise.
> The open coded version is obviously correct...
>
> I didn't add the 4th one because the code always does 2 iterations.
>
> If you were really worried about performance there are faster
> algorithms (even doing 2 or 4 bits a time is faster).
>
>         David
>
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Crt Mori Jan. 9, 2018, 3:18 p.m. UTC | #17
It has been some time now since this moved. I have decided not to use
David's implementation because I want to maintain also range above
2^62

Are there any additional objections for this not to go in as it is?

On 22 December 2017 at 14:44, Crt Mori <cmo@melexis.com> wrote:
>
> From simple strong typing of existing int_sqrt we came to something a
> bit more complex or better. Can we decide now which we want in, or I
> submit v12 and we decide then (although it is not a v12, but whole new
> thing)?
>
> On 21 December 2017 at 15:48, David Laight <David.Laight@aculab.com> wrote:
> > From: Peter Zijlstra
> >> Sent: 21 December 2017 14:12
> > ...
> >> > > This part above looks like FLS
> >> > It also does the rest of the required shifts.
> >>
> >> Still, fls() + shift is way faster on hardware that has an fls
> >> instruction.
> >>
> >> Writing out that binary search doesn't make sense.
> >
> > If the hardware doesn't have an appropriate fls instruction
> > the soft fls()will be worse.
> >
> > If you used fls() you'd still need quite a bit of code
> > to generate the correct shift and loop count adjustment.
> > Given the cost of the loop iterations the 3 tests are noise.
> > The open coded version is obviously correct...
> >
> > I didn't add the 4th one because the code always does 2 iterations.
> >
> > If you were really worried about performance there are faster
> > algorithms (even doing 2 or 4 bits a time is faster).
> >
> >         David
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight Jan. 12, 2018, 9:41 a.m. UTC | #18
RnJvbTogQ3J0IE1vcmkgW21haWx0bzpjbW9AbWVsZXhpcy5jb21dDQo+IFNlbnQ6IDA5IEphbnVh
cnkgMjAxOCAxNToxOA0KPiANCj4gSXQgaGFzIGJlZW4gc29tZSB0aW1lIG5vdyBzaW5jZSB0aGlz
IG1vdmVkLiBJIGhhdmUgZGVjaWRlZCBub3QgdG8gdXNlDQo+IERhdmlkJ3MgaW1wbGVtZW50YXRp
b24gYmVjYXVzZSBJIHdhbnQgdG8gbWFpbnRhaW4gYWxzbyByYW5nZSBhYm92ZQ0KPiAyXjYyDQoN
ClRoZSBsYXN0IHZlcnNpb24gSSBkaWQgc3VwcG9ydGVkIHRoZSBmdWxsIHJhbmdlLg0KDQoJRGF2
aWQNCg0K
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Crt Mori Jan. 15, 2018, 8:17 a.m. UTC | #19
On 12 January 2018 at 10:41, David Laight <David.Laight@aculab.com> wrote:
> From: Crt Mori [mailto:cmo@melexis.com]
>> Sent: 09 January 2018 15:18
>>
>> It has been some time now since this moved. I have decided not to use
>> David's implementation because I want to maintain also range above
>> 2^62
>
> The last version I did supported the full range.

Nothing changed below that comment, so I was assuming it is not
supported (or did I miss a mail?).

Also fls discussion had opposite opinions or it just seems inconclusive to me?

>
>         David
>

So you all agree David Laight version is much better and should be
used instead of currently proposed version?
--
To unsubscribe from this list: send the line "unsubscribe linux-iio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 0ad4c3044cf9..e4a3dc64e650 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -460,6 +460,15 @@  extern int func_ptr_is_kernel_text(void *ptr);
 
 unsigned long int_sqrt(unsigned long);
 
+#if BITS_PER_LONG < 64
+u32 int_sqrt64(u64 x);
+#else
+static inline u32 int_sqrt64(u64 x)
+{
+	return (u32)int_sqrt(x);
+}
+#endif
+
 extern void bust_spinlocks(int yes);
 extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
 extern int panic_timeout;
diff --git a/lib/int_sqrt.c b/lib/int_sqrt.c
index 1ef4cc344977..184da54677ec 100644
--- a/lib/int_sqrt.c
+++ b/lib/int_sqrt.c
@@ -7,6 +7,7 @@ 
 
 #include <linux/kernel.h>
 #include <linux/export.h>
+#include <linux/bitops.h>
 
 /**
  * int_sqrt - rough approximation to sqrt
@@ -36,3 +37,33 @@  unsigned long int_sqrt(unsigned long x)
 	return y;
 }
 EXPORT_SYMBOL(int_sqrt);
+
+#if BITS_PER_LONG < 64
+/**
+ * int_sqrt64 - strongly typed int_sqrt function when minimum 64 bit input
+ * is expected.
+ * @x: 64bit integer of which to calculate the sqrt
+ */
+u32 int_sqrt64(u64 x)
+{
+	u64 b, m, y = 0;
+
+	if (x <= 1)
+		return x;
+
+	m = 1ULL << (fls64(x) & ~1ULL);
+	while (m != 0) {
+		b = y + m;
+		y >>= 1;
+
+		if (x >= b) {
+			x -= b;
+			y += m;
+		}
+		m >>= 2;
+	}
+
+	return y;
+}
+EXPORT_SYMBOL(int_sqrt64);
+#endif