diff mbox series

[4/4] MIPS: Avoid DIVU in `__div64_32' is result would be zero

Message ID alpine.DEB.2.21.2104200331110.44318@angie.orcam.me.uk (mailing list archive)
State Accepted
Commit c1d337d45ec0a802299688e17d568c4e3a585895
Headers show
Series Reinstate and improve MIPS `do_div' implementation | expand

Commit Message

Maciej W. Rozycki April 20, 2021, 2:50 a.m. UTC
We already check the high part of the divident against zero to avoid the 
costly DIVU instruction in that case, needed to reduce the high part of 
the divident, so we may well check against the divisor instead and set 
the high part of the quotient to zero right away.  We need to treat the 
high part the divident in that case though as the remainder that would 
be calculated by the DIVU instruction we avoided.

This has passed correctness verification with test_div64 and reduced the
module's average execution time down to 1.0445s and 0.2619s from 1.0668s
and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
I have made an experimental change on top of this to put `__div64_32' out 
of line, and that increases the averages respectively up to 1.0785s and 
0.2705s.  Not a terrible loss, especially compared to generic times quoted 
with 3/4, but still, so I think it would best be made where optimising for 
size, as noted in the cover letter.
---
 arch/mips/include/asm/div64.h |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

H. Nikolaus Schaller April 21, 2021, 4:05 p.m. UTC | #1
> Am 20.04.2021 um 04:50 schrieb Maciej W. Rozycki <macro@orcam.me.uk>:
> 
> We already check the high part of the divident against zero to avoid the 

nit-picking: s/divident/dividend/

(seems to come from from Latin "dividendum" = the number that is to be divided).

> costly DIVU instruction in that case, needed to reduce the high part of 
> the divident, so we may well check against the divisor instead and set 
> the high part of the quotient to zero right away.  We need to treat the 
> high part the divident in that case though as the remainder that would 
> be calculated by the DIVU instruction we avoided.
> 
> This has passed correctness verification with test_div64 and reduced the
> module's average execution time down to 1.0445s and 0.2619s from 1.0668s
> and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.

Impressive.

> 
> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
> ---
> I have made an experimental change on top of this to put `__div64_32' out 
> of line, and that increases the averages respectively up to 1.0785s and 
> 0.2705s.  Not a terrible loss, especially compared to generic times quoted 
> with 3/4, but still, so I think it would best be made where optimising for 
> size, as noted in the cover letter.
> ---
> arch/mips/include/asm/div64.h |    6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Index: linux-3maxp-div64/arch/mips/include/asm/div64.h
> ===================================================================
> --- linux-3maxp-div64.orig/arch/mips/include/asm/div64.h
> +++ linux-3maxp-div64/arch/mips/include/asm/div64.h
> @@ -68,9 +68,11 @@
> 									\
> 	__high = __div >> 32;						\
> 	__low = __div;							\
> -	__upper = __high;						\
> 									\
> -	if (__high) {							\
> +	if (__high < __radix) {						\
> +		__upper = __high;					\
> +		__high = 0;						\
> +	} else {							\
> 		__asm__("divu	$0, %z1, %z2"				\
> 		: "=x" (__modquot)					\
> 		: "Jr" (__high), "Jr" (__radix));			\
Maciej W. Rozycki April 21, 2021, 4:16 p.m. UTC | #2
On Wed, 21 Apr 2021, H. Nikolaus Schaller wrote:

> > We already check the high part of the divident against zero to avoid the 
> 
> nit-picking: s/divident/dividend/

 Umm, I find this embarassing (and I hesitated reaching for a dictionary 
to double-check the spelling!).  Indeed why would this be different from 
subtrahend or minuend?

 Thomas, as this mistake has spread across three out of these patches,
both in change descriptions and actual code, would you mind dropping the 
series from mips-next altogether and I'll respin the series with all 
these issues corrected?

  Maciej
Thomas Bogendoerfer April 22, 2021, 7:56 a.m. UTC | #3
On Wed, Apr 21, 2021 at 06:16:18PM +0200, Maciej W. Rozycki wrote:
> On Wed, 21 Apr 2021, H. Nikolaus Schaller wrote:
> 
> > > We already check the high part of the divident against zero to avoid the 
> > 
> > nit-picking: s/divident/dividend/
> 
>  Umm, I find this embarassing (and I hesitated reaching for a dictionary 
> to double-check the spelling!).  Indeed why would this be different from 
> subtrahend or minuend?
> 
>  Thomas, as this mistake has spread across three out of these patches,
> both in change descriptions and actual code, would you mind dropping the 
> series from mips-next altogether and I'll respin the series with all 
> these issues corrected?

I'm sorry, but I don't rebase mips-next and the patches have been pushed
out before I received this mail.

Thomas.
Maciej W. Rozycki April 22, 2021, 9:12 a.m. UTC | #4
On Thu, 22 Apr 2021, Thomas Bogendoerfer wrote:

> >  Thomas, as this mistake has spread across three out of these patches,
> > both in change descriptions and actual code, would you mind dropping the 
> > series from mips-next altogether and I'll respin the series with all 
> > these issues corrected?
> 
> I'm sorry, but I don't rebase mips-next and the patches have been pushed
> out before I received this mail.

 Hmm, what about changes known to actually break something then?  Like 
with R6 here?  Those will undoubtedly cause someone a headache with 
bisection sometime in the future.  Are you sure your policy is the best 
possible?

 NB, I have benchmarked the update with my DECstation, however my Malta 
has not come up after a reboot last evening and neither it has after a few 
remote power cycles.  I have planned a visit in my lab next week anyway, 
so I'll see what has happened there; hopefully I'm able to bring the board 
back to life as I find it valuable for my purposes.  I had to replace the 
PSU it came with already a couple years back and the new one is supposedly 
high-quality, so I fear it's the board itself.

 Meanwhile I'll be able to give DECstation figures only.  I guess such 
limited results will suffice if I post the fix as an update rather than 
replacement.

  Maciej
Thomas Bogendoerfer April 22, 2021, 11:08 a.m. UTC | #5
On Thu, Apr 22, 2021 at 11:12:40AM +0200, Maciej W. Rozycki wrote:
> On Thu, 22 Apr 2021, Thomas Bogendoerfer wrote:
> 
> > >  Thomas, as this mistake has spread across three out of these patches,
> > > both in change descriptions and actual code, would you mind dropping the 
> > > series from mips-next altogether and I'll respin the series with all 
> > > these issues corrected?
> > 
> > I'm sorry, but I don't rebase mips-next and the patches have been pushed
> > out before I received this mail.
> 
>  Hmm, what about changes known to actually break something then?  Like 
> with R6 here?  Those will undoubtedly cause someone a headache with 
> bisection sometime in the future.  Are you sure your policy is the best 
> possible?

This is my Linus pull tree, so I'm following 

Documentation/maintainer/rebasing-and-merging.rst

>  Meanwhile I'll be able to give DECstation figures only.  I guess such 
> limited results will suffice if I post the fix as an update rather than 
> replacement.

thank you.

Thomas.
Andreas Schwab April 22, 2021, 11:17 a.m. UTC | #6
Subject typo: s/is/if/

Andreas.
Maciej W. Rozycki April 22, 2021, 8:47 p.m. UTC | #7
On Thu, 22 Apr 2021, Thomas Bogendoerfer wrote:

> >  Hmm, what about changes known to actually break something then?  Like 
> > with R6 here?  Those will undoubtedly cause someone a headache with 
> > bisection sometime in the future.  Are you sure your policy is the best 
> > possible?
> 
> This is my Linus pull tree, so I'm following 
> 
> Documentation/maintainer/rebasing-and-merging.rst

 Fair enough.

> >  Meanwhile I'll be able to give DECstation figures only.  I guess such 
> > limited results will suffice if I post the fix as an update rather than 
> > replacement.
> 
> thank you.

 I have requested 3/4 to be backported however, so I think in this case 
4/4 will have to follow as well as 2/2 of the fix series I have just 
posted.  Will you be able to resolve this somehow?

  Maciej
Maciej W. Rozycki April 27, 2021, 12:16 p.m. UTC | #8
On Thu, 22 Apr 2021, Maciej W. Rozycki wrote:

>  NB, I have benchmarked the update with my DECstation, however my Malta 
> has not come up after a reboot last evening and neither it has after a few 
> remote power cycles.  I have planned a visit in my lab next week anyway, 
> so I'll see what has happened there; hopefully I'm able to bring the board 
> back to life as I find it valuable for my purposes.  I had to replace the 
> PSU it came with already a couple years back and the new one is supposedly 
> high-quality, so I fear it's the board itself.

 For the record I have put the Malta back in service now.

 It was weird: the YAMON status shown on the ASCII display was "E:P_UNKN",
i.e. "Unknown PCI device on board (should never happen)" (according to the 
manual).  Indeed I never saw that message before.  By elimination I have 
tracked down the DEFPA FDDI network interface to be the culprit and upon a 
close inspection found a bent pin with the PFI ASIC (the PCI interface 
chip), a 160-pin 20-mil PQFP part.  It was IDSEL shorting to RST_L, the 
adjacent pin.  No wonder the card wreaked havoc with the system.  Fixed by 
carefully moving the pins apart with a razor blade (always good to have 
one at hand!).

 Now I wonder how this card happened to work with the pin bent for some
15 years and only finally developed a short now.  Weird indeed, and 
weirder even to see a second failure related to IC pins this year only.

  Maciej
diff mbox series

Patch

Index: linux-3maxp-div64/arch/mips/include/asm/div64.h
===================================================================
--- linux-3maxp-div64.orig/arch/mips/include/asm/div64.h
+++ linux-3maxp-div64/arch/mips/include/asm/div64.h
@@ -68,9 +68,11 @@ 
 									\
 	__high = __div >> 32;						\
 	__low = __div;							\
-	__upper = __high;						\
 									\
-	if (__high) {							\
+	if (__high < __radix) {						\
+		__upper = __high;					\
+		__high = 0;						\
+	} else {							\
 		__asm__("divu	$0, %z1, %z2"				\
 		: "=x" (__modquot)					\
 		: "Jr" (__high), "Jr" (__radix));			\