diff mbox series

[v3,3/9] diffcore-rename: simplify limit check

Message ID 7214fa73fdd13418744903f6c769fdb77c9512ce.1609272328.git.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit ad8a1be529ea9e0bd6346a3cb1d53fec16e3bd10
Headers show
Series diffcore-rename improvements | expand

Commit Message

Elijah Newren Dec. 29, 2020, 8:05 p.m. UTC
From: Elijah Newren <newren@gmail.com>

diffcore-rename had two different checks of the form

    if ((a < limit || b < limit) &&
        a * b <= limit * limit)

This can be simplified to

    if (st_mult(a, b) <= st_mult(limit, limit))

which makes it clearer how we are checking for overflow, and makes it
much easier to parse given the drop from 8 to 4 variable appearances.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Comments

Başar Uğur Nov. 9, 2021, 9:14 p.m. UTC | #1
Hi all,

First post on Git mailing list, to provide a comment on a patch. Hope
this works.

In cases where the `rename_limit` is already greater than 65535, the
`st_mult(rename_limit, rename_limit)` multiplication overflows and
process halts. But I don't think an 'overflow error' is very helpful
for the users in understanding what is wrong with their configuration;
i.e. `diff.renameLimit` documentation says nothing about a 'maximum
allowed value'. I would either clamp it to a reasonable range, or
inform the users about the limits, or maybe both.

Cheers,
Elijah Newren Nov. 10, 2021, 8:06 p.m. UTC | #2
On Tue, Nov 9, 2021 at 1:15 PM Başar Uğur <basarugur@gmail.com> wrote:
>
> Hi all,
>
> First post on Git mailing list, to provide a comment on a patch. Hope
> this works.
>
> In cases where the `rename_limit` is already greater than 65535, the
> `st_mult(rename_limit, rename_limit)` multiplication overflows and
> process halts.

Out of curiosity, what system are you on?  And how many renames do you
actually have?

We used to clamp to 32767, but one specific repository needed values
larger than that, in the range of ~50000.  However, due to a number of
rename detection related optimizations added since then, the git of
today can handle that same particular repository and support full
rename detection with a rename limit under 1000 for merge/cherry-pick
(sorry, forgot the exact numbers), and probably less than 10000 for
diff (just guestimating; I don't want to go and check).

Anyway, all that aside, I don't see any such overflow for rename_limit
being 2**16; we can push it much farther:

    size_t a = 4294967295;   /*  2**32 -1  */
    size_t b = a;
    size_t c = st_mult(a, b);
    printf("%"PRIuMAX" = %"PRIuMAX" * %"PRIuMAX"\n", c, a, b);

Output:

    18446744065119617025 = 4294967295 * 4294967295

Adding one to the value of a results in:

    fatal: size_t overflow: 4294967296 * 4294967296

> But I don't think an 'overflow error' is very helpful
> for the users in understanding what is wrong with their configuration;
> i.e. `diff.renameLimit` documentation says nothing about a 'maximum
> allowed value'. I would either clamp it to a reasonable range, or
> inform the users about the limits, or maybe both.

That sounds reasonable, but I'm not sure it's worth the effort in
practice.  2**32 - 1 is so impractically large for a rename_limit that
I don't see why anyone would need a value even remotely close to that
level (and I wouldn't at all be surprised if other things in git
scaling broke before we even got to that point).

Perhaps you're dealing with a 32-bit system?  Even then, the
repository that hit this was about 6.5GB packed .git history; and you
might need to be a lot larger than that given the optimization since
then in order to hit this.  Can 32-bit systems even handle that size
of repository without dying in several other ways?
Başar Uğur Nov. 11, 2021, 9:02 a.m. UTC | #3
On Wed, Nov 10, 2021 at 9:06 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Nov 9, 2021 at 1:15 PM Başar Uğur <basarugur@gmail.com> wrote:
> >
> > Hi all,
> >
> > First post on Git mailing list, to provide a comment on a patch. Hope
> > this works.
> >
> > In cases where the `rename_limit` is already greater than 65535, the
> > `st_mult(rename_limit, rename_limit)` multiplication overflows and
> > process halts.
>
> Out of curiosity, what system are you on?  And how many renames do you
> actually have?

I am on a 64-bit Windows 10; but following up on your question, it
became obvious that these limits have something to do with 'which git
executable' I was dealing with. This problem surfaced when I was
working on Visual Studio 2019, and was trying to rename not more than
10 files. My config had 999999 as the renameLimit, and VS2019 showed
the 'fatal error' in its git output. However, git bash was all fine
with listing the renamed files. And the difference between these
scenarios turned out to be, yes, different git executables. VS2019 has
its own copy of git which is 32-bit, and it hits this 999999 * 999999
overflow; whereas *my* copy of git used in bash is 64-bit which does
not have that overflow problem.

>
> We used to clamp to 32767, but one specific repository needed values
> larger than that, in the range of ~50000.  However, due to a number of
> rename detection related optimizations added since then, the git of
> today can handle that same particular repository and support full
> rename detection with a rename limit under 1000 for merge/cherry-pick
> (sorry, forgot the exact numbers), and probably less than 10000 for
> diff (just guestimating; I don't want to go and check).
>
> Anyway, all that aside, I don't see any such overflow for rename_limit
> being 2**16; we can push it much farther:
>
>     size_t a = 4294967295;   /*  2**32 -1  */
>     size_t b = a;
>     size_t c = st_mult(a, b);
>     printf("%"PRIuMAX" = %"PRIuMAX" * %"PRIuMAX"\n", c, a, b);
>
> Output:
>
>     18446744065119617025 = 4294967295 * 4294967295
>
> Adding one to the value of a results in:
>
>     fatal: size_t overflow: 4294967296 * 4294967296
>
> > But I don't think an 'overflow error' is very helpful
> > for the users in understanding what is wrong with their configuration;
> > i.e. `diff.renameLimit` documentation says nothing about a 'maximum
> > allowed value'. I would either clamp it to a reasonable range, or
> > inform the users about the limits, or maybe both.
>
> That sounds reasonable, but I'm not sure it's worth the effort in
> practice.  2**32 - 1 is so impractically large for a rename_limit that
> I don't see why anyone would need a value even remotely close to that
> level (and I wouldn't at all be surprised if other things in git
> scaling broke before we even got to that point).
>
> Perhaps you're dealing with a 32-bit system?  Even then, the
> repository that hit this was about 6.5GB packed .git history; and you
> might need to be a lot larger than that given the optimization since
> then in order to hit this.  Can 32-bit systems even handle that size
> of repository without dying in several other ways?

Good point, but the system aside, 2**16 - 1 = 65535 would remain to be
the limit for the 32-bit git executables, wherever they are used.
Therefore, maybe there is a point to curb it, or mention this
somewhere, as I have said before.
Elijah Newren Nov. 11, 2021, 4:19 p.m. UTC | #4
On Thu, Nov 11, 2021 at 1:03 AM Başar Uğur <basarugur@gmail.com> wrote:
>
> On Wed, Nov 10, 2021 at 9:06 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Tue, Nov 9, 2021 at 1:15 PM Başar Uğur <basarugur@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > First post on Git mailing list, to provide a comment on a patch. Hope
> > > this works.
> > >
> > > In cases where the `rename_limit` is already greater than 65535, the
> > > `st_mult(rename_limit, rename_limit)` multiplication overflows and
> > > process halts.
> >
> > Out of curiosity, what system are you on?  And how many renames do you
> > actually have?
>
> I am on a 64-bit Windows 10; but following up on your question, it
> became obvious that these limits have something to do with 'which git
> executable' I was dealing with. This problem surfaced when I was
> working on Visual Studio 2019, and was trying to rename not more than
> 10 files. My config had 999999 as the renameLimit, and VS2019 showed
> the 'fatal error' in its git output. However, git bash was all fine
> with listing the renamed files. And the difference between these
> scenarios turned out to be, yes, different git executables. VS2019 has
> its own copy of git which is 32-bit, and it hits this 999999 * 999999
> overflow; whereas *my* copy of git used in bash is 64-bit which does
> not have that overflow problem.

Ah, thanks for the extra info.

> >
> > We used to clamp to 32767, but one specific repository needed values
> > larger than that, in the range of ~50000.  However, due to a number of
> > rename detection related optimizations added since then, the git of
> > today can handle that same particular repository and support full
> > rename detection with a rename limit under 1000 for merge/cherry-pick
> > (sorry, forgot the exact numbers), and probably less than 10000 for
> > diff (just guestimating; I don't want to go and check).
> >
> > Anyway, all that aside, I don't see any such overflow for rename_limit
> > being 2**16; we can push it much farther:
> >
> >     size_t a = 4294967295;   /*  2**32 -1  */
> >     size_t b = a;
> >     size_t c = st_mult(a, b);
> >     printf("%"PRIuMAX" = %"PRIuMAX" * %"PRIuMAX"\n", c, a, b);
> >
> > Output:
> >
> >     18446744065119617025 = 4294967295 * 4294967295
> >
> > Adding one to the value of a results in:
> >
> >     fatal: size_t overflow: 4294967296 * 4294967296
> >
> > > But I don't think an 'overflow error' is very helpful
> > > for the users in understanding what is wrong with their configuration;
> > > i.e. `diff.renameLimit` documentation says nothing about a 'maximum
> > > allowed value'. I would either clamp it to a reasonable range, or
> > > inform the users about the limits, or maybe both.
> >
> > That sounds reasonable, but I'm not sure it's worth the effort in
> > practice.  2**32 - 1 is so impractically large for a rename_limit that
> > I don't see why anyone would need a value even remotely close to that
> > level (and I wouldn't at all be surprised if other things in git
> > scaling broke before we even got to that point).
> >
> > Perhaps you're dealing with a 32-bit system?  Even then, the
> > repository that hit this was about 6.5GB packed .git history; and you
> > might need to be a lot larger than that given the optimization since
> > then in order to hit this.  Can 32-bit systems even handle that size
> > of repository without dying in several other ways?
>
> Good point, but the system aside, 2**16 - 1 = 65535 would remain to be
> the limit for the 32-bit git executables, wherever they are used.
> Therefore, maybe there is a point to curb it, or mention this
> somewhere, as I have said before.

Fair enough.  If someone wants to contribute a patch to either provide
a nicer error message if the value is set too high, or to clamp it
with a warning, and in either case to make sure the too-large check is
specific to 32-bit systems (or uses appropriately different limits on
32-bit and 64-bit systems), then that would be a welcome contribution.
diff mbox series

Patch

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1d6675c040d..16553ab259f 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -447,12 +447,16 @@  static int too_many_rename_candidates(int num_destinations, int num_sources,
 	 * growing larger than a "rename_limit" square matrix, ie:
 	 *
 	 *    num_destinations * num_sources > rename_limit * rename_limit
+	 *
+	 * We use st_mult() to check overflow conditions; in the
+	 * exceptional circumstance that size_t isn't large enough to hold
+	 * the multiplication, the system won't be able to allocate enough
+	 * memory for the matrix anyway.
 	 */
 	if (rename_limit <= 0)
 		rename_limit = 32767;
-	if ((num_destinations <= rename_limit || num_sources <= rename_limit) &&
-	    ((uint64_t)num_destinations * (uint64_t)num_sources
-	     <= (uint64_t)rename_limit * (uint64_t)rename_limit))
+	if (st_mult(num_destinations, num_sources)
+	    <= st_mult(rename_limit, rename_limit))
 		return 0;
 
 	options->needed_rename_limit =
@@ -468,9 +472,8 @@  static int too_many_rename_candidates(int num_destinations, int num_sources,
 			continue;
 		limited_sources++;
 	}
-	if ((num_destinations <= rename_limit || limited_sources <= rename_limit) &&
-	    ((uint64_t)num_destinations * (uint64_t)limited_sources
-	     <= (uint64_t)rename_limit * (uint64_t)rename_limit))
+	if (st_mult(num_destinations, limited_sources)
+	    <= st_mult(rename_limit, rename_limit))
 		return 2;
 	return 1;
 }