diff mbox series

[2/7] diffcore-rename: remove unnecessary if-clause

Message ID f96bb36930a6e5e42f0d3b9c5dfa3b2cc27c1f9d.1607223276.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series diffcore-rename improvements | expand

Commit Message

Elijah Newren Dec. 6, 2020, 2:54 a.m. UTC
From: Elijah Newren <newren@gmail.com>

diffcore-rename had two different checks of the form

    if ((a < limit || b < limit) &&
        a * b <= limit * limit)

Since these are all non-negative integers, this can be simplified to

    if (a * b <= limit * limit)

The only advantage of the former would be in avoiding a couple
multiplications in the rare case that both a and b are BOTH very large.
I see no reason for such an optimization given that this code is not in
any kind of loop.  Prefer code simplicity here and change to the latter
form.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Comments

Taylor Blau Dec. 9, 2020, 10:10 p.m. UTC | #1
On Sun, Dec 06, 2020 at 02:54:31AM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> diffcore-rename had two different checks of the form
>
>     if ((a < limit || b < limit) &&
>         a * b <= limit * limit)
>
> Since these are all non-negative integers, this can be simplified to
>
>     if (a * b <= limit * limit)

Makes sense.

> The only advantage of the former would be in avoiding a couple
> multiplications in the rare case that both a and b are BOTH very large.
> I see no reason for such an optimization given that this code is not in
> any kind of loop.  Prefer code simplicity here and change to the latter
> form.

If you were really paranoid, you could perform these checks with
unsigned_mult_overflows(), but I don't think that it's worth doing so
here.

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  diffcore-rename.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 68ddf51a2a1..0f8fce9293e 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -450,9 +450,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources,
>  	 */
>  	if (rename_limit <= 0)
>  		rename_limit = 32767;
> -	if ((num_targets <= rename_limit || num_sources <= rename_limit) &&
> -	    ((uint64_t)num_targets * (uint64_t)num_sources
> -	     <= (uint64_t)rename_limit * (uint64_t)rename_limit))
> +	if ((uint64_t)num_targets * (uint64_t)num_sources
> +	    <= (uint64_t)rename_limit * (uint64_t)rename_limit)

One small nit here (and below) is that not all of these need casting.
Only one operand of each multiplication needs to be widened for the
compiler to widen the other, too. So, I'd write this instead as:

> +	if ((num_targets * (uint64_t)num_sources) <=
> +     (rename_limit * (uint64_t)rename_limit))

or something. (I tend to prefer the cast on the right-most operand,
since it makes clear that there's no need to cast the "first" operand,
and that casting either will do the trick).

>  		return 0;
>
>  	options->needed_rename_limit =
> @@ -468,9 +467,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources,
>  			continue;
>  		limited_sources++;
>  	}
> -	if ((num_targets <= rename_limit || limited_sources <= rename_limit) &&
> -	    ((uint64_t)num_targets * (uint64_t)limited_sources
> -	     <= (uint64_t)rename_limit * (uint64_t)rename_limit))
> +	if ((uint64_t)num_targets * (uint64_t)limited_sources
> +	    <= (uint64_t)rename_limit * (uint64_t)rename_limit)

Same notes here, of course. I was hoping that we could only do this
multiplication once, but it looks like limited_sources grows between the
two checks, so we have to repeat it here, I suppose.

Thanks,
Taylor
Elijah Newren Dec. 10, 2020, 12:32 a.m. UTC | #2
On Wed, Dec 9, 2020 at 2:10 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Sun, Dec 06, 2020 at 02:54:31AM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > diffcore-rename had two different checks of the form
> >
> >     if ((a < limit || b < limit) &&
> >         a * b <= limit * limit)
> >
> > Since these are all non-negative integers, this can be simplified to
> >
> >     if (a * b <= limit * limit)
>
> Makes sense.
>
> > The only advantage of the former would be in avoiding a couple
> > multiplications in the rare case that both a and b are BOTH very large.
> > I see no reason for such an optimization given that this code is not in
> > any kind of loop.  Prefer code simplicity here and change to the latter
> > form.
>
> If you were really paranoid, you could perform these checks with
> unsigned_mult_overflows(), but I don't think that it's worth doing so
> here.
>
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  diffcore-rename.c | 10 ++++------
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/diffcore-rename.c b/diffcore-rename.c
> > index 68ddf51a2a1..0f8fce9293e 100644
> > --- a/diffcore-rename.c
> > +++ b/diffcore-rename.c
> > @@ -450,9 +450,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources,
> >        */
> >       if (rename_limit <= 0)
> >               rename_limit = 32767;
> > -     if ((num_targets <= rename_limit || num_sources <= rename_limit) &&
> > -         ((uint64_t)num_targets * (uint64_t)num_sources
> > -          <= (uint64_t)rename_limit * (uint64_t)rename_limit))
> > +     if ((uint64_t)num_targets * (uint64_t)num_sources
> > +         <= (uint64_t)rename_limit * (uint64_t)rename_limit)
>
> One small nit here (and below) is that not all of these need casting.
> Only one operand of each multiplication needs to be widened for the
> compiler to widen the other, too. So, I'd write this instead as:
>
> > +     if ((num_targets * (uint64_t)num_sources) <=
> > +     (rename_limit * (uint64_t)rename_limit))
>
> or something. (I tend to prefer the cast on the right-most operand,
> since it makes clear that there's no need to cast the "first" operand,
> and that casting either will do the trick).

Yeah, I think that reads slightly better too, but on the other hand it
does make the diff slightly harder to read.  *shrug*.

> >               return 0;
> >
> >       options->needed_rename_limit =
> > @@ -468,9 +467,8 @@ static int too_many_rename_candidates(int num_targets, int num_sources,
> >                       continue;
> >               limited_sources++;
> >       }
> > -     if ((num_targets <= rename_limit || limited_sources <= rename_limit) &&
> > -         ((uint64_t)num_targets * (uint64_t)limited_sources
> > -          <= (uint64_t)rename_limit * (uint64_t)rename_limit))
> > +     if ((uint64_t)num_targets * (uint64_t)limited_sources
> > +         <= (uint64_t)rename_limit * (uint64_t)rename_limit)
>
> Same notes here, of course. I was hoping that we could only do this
> multiplication once, but it looks like limited_sources grows between the
> two checks, so we have to repeat it here, I suppose.

The right hand side of the expression is the same -- rename_limit *
rename_limit -- so it could be stored off (though I don't think
there's much point in doing so unless it made the code clearer; this
is not remotely close to a hot path).  However, in the left hand side,
it's not so much that one of the variables has changed since the last
check, it's that it uses a different set of variables in the check.
In particular, it replaces 'num_sources' with 'limited_sources'.
Junio C Hamano Dec. 10, 2020, 2:03 a.m. UTC | #3
Taylor Blau <me@ttaylorr.com> writes:

> On Sun, Dec 06, 2020 at 02:54:31AM +0000, Elijah Newren via GitGitGadget wrote:
>> From: Elijah Newren <newren@gmail.com>
>>
>> diffcore-rename had two different checks of the form
>>
>>     if ((a < limit || b < limit) &&
>>         a * b <= limit * limit)
>>
>> Since these are all non-negative integers, this can be simplified to
>>
>>     if (a * b <= limit * limit)
>
> Makes sense.

I've always assumed that the original was for correctness (if a and
b are both larger than limit, a*b could end up being smaller than
limit*limit when the result of multiplication of the former wraps
around but not the latter) ...

>> The only advantage of the former would be in avoiding a couple
>> multiplications in the rare case that both a and b are BOTH very large.
>> I see no reason for such an optimization given that this code is not in
>> any kind of loop.  Prefer code simplicity here and change to the latter
>> form.
>
> If you were really paranoid, you could perform these checks with
> unsigned_mult_overflows(), but I don't think that it's worth doing so
> here.

... and in no way as an optimization.

So, I dunno.
Elijah Newren Dec. 10, 2020, 2:17 a.m. UTC | #4
On Wed, Dec 9, 2020 at 6:03 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Taylor Blau <me@ttaylorr.com> writes:
>
> > On Sun, Dec 06, 2020 at 02:54:31AM +0000, Elijah Newren via GitGitGadget wrote:
> >> From: Elijah Newren <newren@gmail.com>
> >>
> >> diffcore-rename had two different checks of the form
> >>
> >>     if ((a < limit || b < limit) &&
> >>         a * b <= limit * limit)
> >>
> >> Since these are all non-negative integers, this can be simplified to
> >>
> >>     if (a * b <= limit * limit)
> >
> > Makes sense.
>
> I've always assumed that the original was for correctness (if a and
> b are both larger than limit, a*b could end up being smaller than
> limit*limit when the result of multiplication of the former wraps
> around but not the latter) ...
>
> >> The only advantage of the former would be in avoiding a couple
> >> multiplications in the rare case that both a and b are BOTH very large.
> >> I see no reason for such an optimization given that this code is not in
> >> any kind of loop.  Prefer code simplicity here and change to the latter
> >> form.
> >
> > If you were really paranoid, you could perform these checks with
> > unsigned_mult_overflows(), but I don't think that it's worth doing so
> > here.
>
> ... and in no way as an optimization.
>
> So, I dunno.

Ah, so would you be okay replacing these with

   if (st_mult(num_targets, limited_sources) <= st_mult(rename_limit,
rename_limit))

?

That'd make the correctness intent far clearer, and allow us to drop
the casting as well since st_mult converts to size_t.  (If, on the off
chance you're on a platform where size_t is 32-bits and the
multiplications don't fit in that size, then the requested matrices
for computing rename detection won't fit in memory for you.)
Junio C Hamano Dec. 10, 2020, 6:56 a.m. UTC | #5
Elijah Newren <newren@gmail.com> writes:

> Ah, so would you be okay replacing these with
>
>    if (st_mult(num_targets, limited_sources) <= st_mult(rename_limit,
> rename_limit))

It's subtle that the use of size_t is a sensible thing to do, and
definitely deserves an in-line comment, but I think that is a good
way to go.
diff mbox series

Patch

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 68ddf51a2a1..0f8fce9293e 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -450,9 +450,8 @@  static int too_many_rename_candidates(int num_targets, int num_sources,
 	 */
 	if (rename_limit <= 0)
 		rename_limit = 32767;
-	if ((num_targets <= rename_limit || num_sources <= rename_limit) &&
-	    ((uint64_t)num_targets * (uint64_t)num_sources
-	     <= (uint64_t)rename_limit * (uint64_t)rename_limit))
+	if ((uint64_t)num_targets * (uint64_t)num_sources
+	    <= (uint64_t)rename_limit * (uint64_t)rename_limit)
 		return 0;
 
 	options->needed_rename_limit =
@@ -468,9 +467,8 @@  static int too_many_rename_candidates(int num_targets, int num_sources,
 			continue;
 		limited_sources++;
 	}
-	if ((num_targets <= rename_limit || limited_sources <= rename_limit) &&
-	    ((uint64_t)num_targets * (uint64_t)limited_sources
-	     <= (uint64_t)rename_limit * (uint64_t)rename_limit))
+	if ((uint64_t)num_targets * (uint64_t)limited_sources
+	    <= (uint64_t)rename_limit * (uint64_t)rename_limit)
 		return 2;
 	return 1;
 }