Message ID | alpine.DEB.2.21.2104200044060.44318@angie.orcam.me.uk (mailing list archive) |
---|---|
Headers | show |
Series | Reinstate and improve MIPS `do_div' implementation | expand |
On Tue, Apr 20, 2021 at 04:50:22AM +0200, Maciej W. Rozycki wrote: > Hi, > > As Huacai has recently discovered the MIPS backend for `do_div' has been > broken and inadvertently disabled with commit c21004cd5b4c ("MIPS: Rewrite > <asm/div64.h> to work with gcc 4.4.0."). As it is code I have originally > written myself and Huacai had issues bringing it back to life leading to a > request to discard it even I have decided to step in. > > In the end I have fixed the code and measured its performance to be ~100% > better on average than our generic code. I have decided it would be worth > having the test module I have prepared for correctness evaluation as well > as benchmarking, so I have included it with the series, also so that I can > refer to the results easily. > > In the end I have included four patches on this occasion: 1/4 is the test > module, 2/4 is an inline documentation fix/clarification for the `do_div' > wrapper, 3/4 enables the MIPS `__div64_32' backend and 4/4 adds a small > performance improvement to it. > > I have investigated a fifth change as a potential improvement where I > replaced the call to `do_div64_32' with a DIVU instruction for cases where > the high part of the intermediate divident is zero, but it has turned out > to regress performance a little, so I have discarded it. > > Also a follow-up change might be worth having to reduce the code size and > place `__div64_32' out of line for CC_OPTIMIZE_FOR_SIZE configurations, > but I have not fully prepared such a change at this time. I did use the > WIP form I have for performance evaluation however; see the figures quoted > with 4/4. > > These changes have been verified with a DECstation system with an R3400 > MIPS I processor @40MHz and a MTI Malta system with a 5Kc MIPS64 processor > @160MHz. > > See individual change descriptions and any additional discussions for > further details. > > Questions, comments or concerns? Otherwise please apply. series applied to mips-next. Thomas.
On Wed, 21 Apr 2021, Thomas Bogendoerfer wrote:
> series applied to mips-next.
Thanks! I was about to send v2 with a small fix for an issue discovered
with message formatting in the failure path of the test_div64 module,
which was supposed to align the expected result with one actually produced
for easier visual comparison. Somehow I failed to realise that format
specifiers will expand to text of a different length and therefore source
string alignment won't do.
I'll send an incremental change then.
Maciej
Hi, > Am 20.04.2021 um 04:50 schrieb Maciej W. Rozycki <macro@orcam.me.uk>: > > Hi, > > As Huacai has recently discovered the MIPS backend for `do_div' has been > broken and inadvertently disabled with commit c21004cd5b4c ("MIPS: Rewrite > <asm/div64.h> to work with gcc 4.4.0."). As it is code I have originally > written myself and Huacai had issues bringing it back to life leading to a > request to discard it even I have decided to step in. > > In the end I have fixed the code and measured its performance to be ~100% > better on average than our generic code. That would be good. > I have decided it would be worth > having the test module I have prepared for correctness evaluation as well > as benchmarking, so I have included it with the series, also so that I can > refer to the results easily. > > In the end I have included four patches on this occasion: 1/4 is the test > module, 2/4 is an inline documentation fix/clarification for the `do_div' > wrapper, 3/4 enables the MIPS `__div64_32' backend and 4/4 adds a small > performance improvement to it. How can I apply them to the kernel? There is something wrong which makes git am fail. > > I have investigated a fifth change as a potential improvement where I > replaced the call to `do_div64_32' with a DIVU instruction for cases where > the high part of the intermediate divident is zero, but it has turned out > to regress performance a little, so I have discarded it. > > Also a follow-up change might be worth having to reduce the code size and > place `__div64_32' out of line for CC_OPTIMIZE_FOR_SIZE configurations, > but I have not fully prepared such a change at this time. I did use the > WIP form I have for performance evaluation however; see the figures quoted > with 4/4. > > These changes have been verified with a DECstation system with an R3400 > MIPS I processor @40MHz and a MTI Malta system with a 5Kc MIPS64 processor > @160MHz. I'd like to test on ~320 MHz JZ4730. > > See individual change descriptions and any additional discussions for > further details. > > Questions, comments or concerns? Otherwise please apply. > > Maciej BR and thanks, Nikolaus Schaller
On Wed, 21 Apr 2021, H. Nikolaus Schaller wrote: > > In the end I have included four patches on this occasion: 1/4 is the test > > module, 2/4 is an inline documentation fix/clarification for the `do_div' > > wrapper, 3/4 enables the MIPS `__div64_32' backend and 4/4 adds a small > > performance improvement to it. > > How can I apply them to the kernel? There is something wrong which makes > git am fail. I don't know. The changes were made against vanilla 5.12-rc7, but then the pieces affected have not changed for ages. FWIW I can `git am' the series as received back just fine. > > These changes have been verified with a DECstation system with an R3400 > > MIPS I processor @40MHz and a MTI Malta system with a 5Kc MIPS64 processor > > @160MHz. > > I'd like to test on ~320 MHz JZ4730. Sure, I'd love to hear how this code performs with other implementations. Maciej
Hi, > Am 21.04.2021 um 21:04 schrieb Maciej W. Rozycki <macro@orcam.me.uk>: > > On Wed, 21 Apr 2021, H. Nikolaus Schaller wrote: > >>> In the end I have included four patches on this occasion: 1/4 is the test >>> module, 2/4 is an inline documentation fix/clarification for the `do_div' >>> wrapper, 3/4 enables the MIPS `__div64_32' backend and 4/4 adds a small >>> performance improvement to it. >> >> How can I apply them to the kernel? There is something wrong which makes >> git am fail. > > I don't know. The changes were made against vanilla 5.12-rc7, but then > the pieces affected have not changed for ages. FWIW I can `git am' the > series as received back just fine. Please can you point me to some download/pull/gitweb? It seems as if the series also did not appear at https://patchwork.kernel.org/project/linux-mips/list/ > >>> These changes have been verified with a DECstation system with an R3400 >>> MIPS I processor @40MHz and a MTI Malta system with a 5Kc MIPS64 processor >>> @160MHz. >> >> I'd like to test on ~320 MHz JZ4730. > > Sure, I'd love to hear how this code performs with other implementations. > > Maciej BR and thanks, Nikolaus Schaller
On Thu, Apr 22, 2021, at 1:53 PM, H. Nikolaus Schaller wrote: > Hi, > > > Am 21.04.2021 um 21:04 schrieb Maciej W. Rozycki <macro@orcam.me.uk>: > > > > On Wed, 21 Apr 2021, H. Nikolaus Schaller wrote: > > > >>> In the end I have included four patches on this occasion: 1/4 is the test > >>> module, 2/4 is an inline documentation fix/clarification for the `do_div' > >>> wrapper, 3/4 enables the MIPS `__div64_32' backend and 4/4 adds a small > >>> performance improvement to it. > >> > >> How can I apply them to the kernel? There is something wrong which makes > >> git am fail. > > > > I don't know. The changes were made against vanilla 5.12-rc7, but then > > the pieces affected have not changed for ages. FWIW I can `git am' the > > series as received back just fine. > > Please can you point me to some download/pull/gitweb? It seems as if the series > also did not appear at https://patchwork.kernel.org/project/linux-mips/list/ > You may try download raw from: https://lore.kernel.org/linux-mips/E6326E8A-50DA-4F81-9865-F29EE0E298A9@goldelico.com/T/#t
On Thu, 22 Apr 2021, Jiaxun Yang wrote: > > Please can you point me to some download/pull/gitweb? It seems as if the series > > also did not appear at https://patchwork.kernel.org/project/linux-mips/list/ > > > > You may try download raw from: > > https://lore.kernel.org/linux-mips/E6326E8A-50DA-4F81-9865-F29EE0E298A9@goldelico.com/T/#t Or you can cherry-pick the commits directly from mips-next: <git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git>. Maciej
Hi, > Am 22.04.2021 um 15:39 schrieb Jiaxun Yang <jiaxun.yang@flygoat.com>: > > > > On Thu, Apr 22, 2021, at 1:53 PM, H. Nikolaus Schaller wrote: >> Hi, >> >>> Am 21.04.2021 um 21:04 schrieb Maciej W. Rozycki <macro@orcam.me.uk>: >>> >>> On Wed, 21 Apr 2021, H. Nikolaus Schaller wrote: >>> >>>>> In the end I have included four patches on this occasion: 1/4 is the test >>>>> module, 2/4 is an inline documentation fix/clarification for the `do_div' >>>>> wrapper, 3/4 enables the MIPS `__div64_32' backend and 4/4 adds a small >>>>> performance improvement to it. >>>> >>>> How can I apply them to the kernel? There is something wrong which makes >>>> git am fail. >>> >>> I don't know. The changes were made against vanilla 5.12-rc7, but then >>> the pieces affected have not changed for ages. FWIW I can `git am' the >>> series as received back just fine. >> >> Please can you point me to some download/pull/gitweb? It seems as if the series >> also did not appear at https://patchwork.kernel.org/project/linux-mips/list/ >> > > You may try download raw from: > > https://lore.kernel.org/linux-mips/E6326E8A-50DA-4F81-9865-F29EE0E298A9@goldelico.com/T/#t I simply tried again and it seems that I had tried to git am it on top of the patches from Huacai which of course fails. Now I could run the tests: from [PATCH 4/4]: > This has passed correctness verification with test_div64 and reduced the > module's average execution time down to 1.0445s and 0.2619s from 1.0668s > and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz. test only [PATCH 1/4 and 2/4]: [ 256.301140] test_div64: Completed 64bit/32bit division and modulo test, 0.291154944s elapsed + [PATCH 3/4] [ 1698.698920] test_div64: Completed 64bit/32bit division and modulo test, 0.132142865s elapsed + [PATCH 4/4] [ 466.818349] test_div64: Completed 64bit/32bit division and modulo test, 0.134429075s elapsed So the new code is indeed faster than the default implementation. [PATCH 4/4] has no significant influence (wouldn't say it is slower because timer resolution isn't very high on this machine and the kernel has some scheduling issue [1]). Anyways the JZ4730 can boot and works with these patches included. BR and thanks, Nikolaus Schaller [1] we are preparing full support for the JZ4730 based Skytone Alpha machine. Most features are working except sound/I2S. I2C is a little unreliable and Ethernet has hickups. And scheduling which indicates some fundamental IRQ or timer issue we could not yet identify. So if anyone happens to have such a machine and can help debugging, please contact me.
On Thu, 22 Apr 2021, H. Nikolaus Schaller wrote: > > This has passed correctness verification with test_div64 and reduced the > > module's average execution time down to 1.0445s and 0.2619s from 1.0668s > > and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz. > > test only [PATCH 1/4 and 2/4]: > > [ 256.301140] test_div64: Completed 64bit/32bit division and modulo test, 0.291154944s elapsed > > + [PATCH 3/4] > > [ 1698.698920] test_div64: Completed 64bit/32bit division and modulo test, 0.132142865s elapsed > > + [PATCH 4/4] > > [ 466.818349] test_div64: Completed 64bit/32bit division and modulo test, 0.134429075s elapsed > > So the new code is indeed faster than the default implementation. > [PATCH 4/4] has no significant influence (wouldn't say it is slower because timer resolution > isn't very high on this machine and the kernel has some scheduling issue [1]). Have you used it as a module or at bootstrap? I have noticed that at bootstrap the initialisation of the random number generator sometimes interferes with the benchmark, which happens when there's an intervening message produced, e.g.: test_div64: Starting 64bit/32bit division and modulo test random: fast init done test_div64: Completed 64bit/32bit division and modulo test, 1.069906272s elapsed I think it can be worked around by configuration changes so that more stuff is run between the RNG and the test module, but instead I have simply inserted: mdelay(5000); at the beginning of `test_div64_init' instead, as for historical reasons I haven't got the systems involved set up for modules (beyond Linux 2.4) at this time. NB I have run the benchmark five times with each change and system and with the RNG taken out of the picture results were very stable as any fluctuation only started at the fifth decimal digit. Both the DECstation (the model I used anyway) and the Malta have a high-resolution clock source though, the I/O ASIC free-running counter register at 25MHz (used by David L. Mills, the original author of the NTP suite, for his reference implementation) and the CP0 Count register at 80MHz respectively. I would expect your JZ4730 device to have the CP0 Count register as well, as it has been architectural ever since MIPS III really, or is your system SMP with CP0 Count registers out of sync across CPUs due to sleep modes or whatever? Thanks for sharing your figures. > [1] we are preparing full support for the JZ4730 based Skytone Alpha machine. Most features > are working except sound/I2S. I2C is a little unreliable and Ethernet has hickups. And scheduling > which indicates some fundamental IRQ or timer issue we could not yet identify. Good luck with that! Maciej
> Am 22.04.2021 um 18:55 schrieb Maciej W. Rozycki <macro@orcam.me.uk>: > > > Have you used it as a module or at bootstrap? I did load it by insmod. > I would expect your JZ4730 device to have the CP0 Count register as well, > as it has been architectural ever since MIPS III really, or is your system > SMP with CP0 Count registers out of sync across CPUs due to sleep modes or > whatever? It switches clocksource to some operating system timers on the SoC which may have an influence on the resolution (or precision). > Thanks for sharing your figures. It was a pleasure towards better MIPS support... > >> [1] we are preparing full support for the JZ4730 based Skytone Alpha machine. Most features >> are working except sound/I2S. I2C is a little unreliable and Ethernet has hickups. And scheduling >> which indicates some fundamental IRQ or timer issue we could not yet identify. > > Good luck with that! BR and thanks, Niklaus