diff mbox series

[1/1] Add a helper to reverse index_pos_to_insert_pos()

Message ID 81648344bbab4219c0bfc60d1e5f02473ea7d495.1570517329.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Fallout from azure-pipelines-msvc | expand

Commit Message

Johannes Schindelin via GitGitGadget Oct. 8, 2019, 6:48 a.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

We have just introduced the helper `index_pos_to_insert_pos()` to help
avoiding underflows when returning `-1 - pos` for cases where we want to
return an insert position, using the ones' complement (as `int`).

As pointed out during the review of the patch series that introduced it,
this helper wants to be accompanied by a helper that reverse that ones'
complement, for clarity. This patch does exactly that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 blame.c           | 5 +++--
 builtin/mv.c      | 2 +-
 cache.h           | 5 +++++
 merge-recursive.c | 4 ++--
 read-cache.c      | 2 +-
 rerere.c          | 2 +-
 sha1-name.c       | 2 +-
 submodule.c       | 2 +-
 unpack-trees.c    | 2 +-
 9 files changed, 16 insertions(+), 10 deletions(-)

Comments

Johannes Sixt Oct. 8, 2019, 9:03 p.m. UTC | #1
Am 08.10.19 um 08:48 schrieb Johannes Schindelin via GitGitGadget:
> We have just introduced the helper `index_pos_to_insert_pos()` to help
> avoiding underflows when returning `-1 - pos` for cases where we want to
> return an insert position, using the ones' complement (as `int`).

We do not want to have it for *all* cases, where we return -1 - pos, but
only for those cases, where the result was actually encoded by
index_pos_to_insert_pos(). That excludes all cases where the argument is
derived from index_name_pos(), and leaves just...

> --- a/rerere.c
> +++ b/rerere.c
> @@ -154,7 +154,7 @@ static struct rerere_dir *find_rerere_dir(const char *hex)
>  		rr_dir->status = NULL;
>  		rr_dir->status_nr = 0;
>  		rr_dir->status_alloc = 0;
> -		pos = -1 - pos;
> +		pos = insert_pos_to_index_pos(pos);

... this one...

>  
>  		/* Make sure the array is big enough ... */
>  		ALLOC_GROW(rerere_dir, rerere_dir_nr + 1, rerere_dir_alloc);
> diff --git a/sha1-name.c b/sha1-name.c
> index 49855ad24f..bee7ce39ee 100644
> --- a/sha1-name.c
> +++ b/sha1-name.c
> @@ -98,7 +98,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
>  		loose_objects = odb_loose_cache(odb, &ds->bin_pfx);
>  		pos = oid_array_lookup(loose_objects, &ds->bin_pfx);
>  		if (pos < 0)
> -			pos = -1 - pos;
> +			pos = insert_pos_to_index_pos(pos);

... and this one.

>  		while (!ds->ambiguous && pos < loose_objects->nr) {
>  			const struct object_id *oid;
>  			oid = loose_objects->oid + pos;

-- Hannes
Junio C Hamano Oct. 9, 2019, 1:19 a.m. UTC | #2
Johannes Sixt <j6t@kdbg.org> writes:

> We do not want to have it for *all* cases, where we return -1 - pos, but
> only for those cases, where the result was actually encoded by
> index_pos_to_insert_pos().

Yup, I agree with you that decoder should be fed only the data
emitted by the encoder.

But shouldn't the code that yielded 'pos' that later gets decoded by
computing "-1 -pos" without using the encoding helper be corrected
to use the encoder instead?  After all, the primary purpose of
inventing the encoder was to catch the arith overflow, wasn't it?

> That excludes all cases where the argument is
> derived from index_name_pos(), and leaves just...
>
>> --- a/rerere.c
>> +++ b/rerere.c
>> @@ -154,7 +154,7 @@ static struct rerere_dir *find_rerere_dir(const char *hex)
>>  		rr_dir->status = NULL;
>>  		rr_dir->status_nr = 0;
>>  		rr_dir->status_alloc = 0;
>> -		pos = -1 - pos;
>> +		pos = insert_pos_to_index_pos(pos);
>
> ... this one...
>
>>  
>>  		/* Make sure the array is big enough ... */
>>  		ALLOC_GROW(rerere_dir, rerere_dir_nr + 1, rerere_dir_alloc);
>> diff --git a/sha1-name.c b/sha1-name.c
>> index 49855ad24f..bee7ce39ee 100644
>> --- a/sha1-name.c
>> +++ b/sha1-name.c
>> @@ -98,7 +98,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
>>  		loose_objects = odb_loose_cache(odb, &ds->bin_pfx);
>>  		pos = oid_array_lookup(loose_objects, &ds->bin_pfx);
>>  		if (pos < 0)
>> -			pos = -1 - pos;
>> +			pos = insert_pos_to_index_pos(pos);
>
> ... and this one.
>
>>  		while (!ds->ambiguous && pos < loose_objects->nr) {
>>  			const struct object_id *oid;
>>  			oid = loose_objects->oid + pos;
>
> -- Hannes
Johannes Sixt Oct. 9, 2019, 5:51 a.m. UTC | #3
Am 09.10.19 um 03:19 schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
> 
>> We do not want to have it for *all* cases, where we return -1 - pos, but
>> only for those cases, where the result was actually encoded by
>> index_pos_to_insert_pos().
> 
> Yup, I agree with you that decoder should be fed only the data
> emitted by the encoder.
> 
> But shouldn't the code that yielded 'pos' that later gets decoded by
> computing "-1 -pos" without using the encoding helper be corrected
> to use the encoder instead?

That is the obvious conclusion, of course.

>  After all, the primary purpose of
> inventing the encoder was to catch the arith overflow, wasn't it?

That was *your* motivation for the helper function. But IMO it is a
wrong design decision. Whether or not the index calculation overflows is
a matter of the type that is used for the index, and that in turn is
dicated by the possible sizes of the collections that are indexed. IOW,
the index overflow check is (*if* it is necessary) a policy decision
that must be made at a higher level and must not be hidden away in a
helper function whose purpose (as suggested by its name) is something
entirely different.

Unless, of course, we declare "all our indexes are of type int". But
that ship has sailed long ago, because there are too many cases where we
are forced to use size_t as index (strlen, sizeof...).

Meta note: We know that we are painting a tiny shed here (Replacing a
one-liner by a one-liner, huh?) If anyone of you has better things to
do, please move on. My interest in this discussion are just the design
decisions that are made, not the actual outcome of this particular case.

-- Hannes
Johannes Schindelin Oct. 9, 2019, 8:11 a.m. UTC | #4
Hi Junio,


On Wed, 9 Oct 2019, Junio C Hamano wrote:

> Johannes Sixt <j6t@kdbg.org> writes:
>
> > We do not want to have it for *all* cases, where we return -1 - pos, but
> > only for those cases, where the result was actually encoded by
> > index_pos_to_insert_pos().
>
> Yup, I agree with you that decoder should be fed only the data
> emitted by the encoder.
>
> But shouldn't the code that yielded 'pos' that later gets decoded by
> computing "-1 -pos" without using the encoding helper be corrected
> to use the encoder instead?  After all, the primary purpose of
> inventing the encoder was to catch the arith overflow, wasn't it?

That was the primary purpose of the encoder. And it is used in those
places where we want to encode _unsigned_ positions.

All of the calls to `insert_pos_to_index_pos()` that I introduced in
this here patch pass _signed_ position values, though. They cannot
overflow nor underflow because `-1 - <int>` is just the one-complement,
i.e. all bits are flipped.

Ciao,
Dscho

>
> > That excludes all cases where the argument is
> > derived from index_name_pos(), and leaves just...
> >
> >> --- a/rerere.c
> >> +++ b/rerere.c
> >> @@ -154,7 +154,7 @@ static struct rerere_dir *find_rerere_dir(const char *hex)
> >>  		rr_dir->status = NULL;
> >>  		rr_dir->status_nr = 0;
> >>  		rr_dir->status_alloc = 0;
> >> -		pos = -1 - pos;
> >> +		pos = insert_pos_to_index_pos(pos);
> >
> > ... this one...
> >
> >>
> >>  		/* Make sure the array is big enough ... */
> >>  		ALLOC_GROW(rerere_dir, rerere_dir_nr + 1, rerere_dir_alloc);
> >> diff --git a/sha1-name.c b/sha1-name.c
> >> index 49855ad24f..bee7ce39ee 100644
> >> --- a/sha1-name.c
> >> +++ b/sha1-name.c
> >> @@ -98,7 +98,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
> >>  		loose_objects = odb_loose_cache(odb, &ds->bin_pfx);
> >>  		pos = oid_array_lookup(loose_objects, &ds->bin_pfx);
> >>  		if (pos < 0)
> >> -			pos = -1 - pos;
> >> +			pos = insert_pos_to_index_pos(pos);
> >
> > ... and this one.
> >
> >>  		while (!ds->ambiguous && pos < loose_objects->nr) {
> >>  			const struct object_id *oid;
> >>  			oid = loose_objects->oid + pos;
> >
> > -- Hannes
>
Johannes Schindelin Oct. 9, 2019, 8:15 a.m. UTC | #5
Hi Hannes,

On Tue, 8 Oct 2019, Johannes Sixt wrote:

> Am 08.10.19 um 08:48 schrieb Johannes Schindelin via GitGitGadget:
> > We have just introduced the helper `index_pos_to_insert_pos()` to help
> > avoiding underflows when returning `-1 - pos` for cases where we want to
> > return an insert position, using the ones' complement (as `int`).
>
> We do not want to have it for *all* cases, where we return -1 - pos, but
> only for those cases, where the result was actually encoded by
> index_pos_to_insert_pos(). That excludes all cases where the argument is
> derived from index_name_pos(), and leaves just...
>
> > --- a/rerere.c
> > +++ b/rerere.c
> > @@ -154,7 +154,7 @@ static struct rerere_dir *find_rerere_dir(const char *hex)
> >  		rr_dir->status = NULL;
> >  		rr_dir->status_nr = 0;
> >  		rr_dir->status_alloc = 0;
> > -		pos = -1 - pos;
> > +		pos = insert_pos_to_index_pos(pos);
>
> ... this one...

This `pos` comes from that line (unfortunately not part of the diff
context):

        pos = sha1_pos(hash, rerere_dir, rerere_dir_nr, rerere_dir_hash);

The `sha1_pos()` function was patched, as per Junio's suggestion, to...

        return index_pos_to_insert_pos(lo);

>
> >
> >  		/* Make sure the array is big enough ... */
> >  		ALLOC_GROW(rerere_dir, rerere_dir_nr + 1, rerere_dir_alloc);
> > diff --git a/sha1-name.c b/sha1-name.c
> > index 49855ad24f..bee7ce39ee 100644
> > --- a/sha1-name.c
> > +++ b/sha1-name.c
> > @@ -98,7 +98,7 @@ static void find_short_object_filename(struct disambiguate_state *ds)
> >  		loose_objects = odb_loose_cache(odb, &ds->bin_pfx);
> >  		pos = oid_array_lookup(loose_objects, &ds->bin_pfx);
> >  		if (pos < 0)
> > -			pos = -1 - pos;
> > +			pos = insert_pos_to_index_pos(pos);
>
> ... and this one.

This `pos` comes from that `oid_array_lookup()` function that _is_ part
of the diff context. What you don't see is the definition of that
function:

	int oid_array_lookup(struct oid_array *array, const struct object_id *oid)
	{
		if (!array->sorted)
				oid_array_sort(array);
		return sha1_pos(oid->hash, array->oid, array->nr, sha1_access);
	}

There you go. `sha1_pos()` again.

So yes, both of these instances were changed to call that helper on
purpose.

Ciao,
Dscho

>
> >  		while (!ds->ambiguous && pos < loose_objects->nr) {
> >  			const struct object_id *oid;
> >  			oid = loose_objects->oid + pos;
>
> -- Hannes
>
Johannes Schindelin Oct. 9, 2019, 8:17 a.m. UTC | #6
Hi Hannes,

On Wed, 9 Oct 2019, Johannes Sixt wrote:

> Am 09.10.19 um 03:19 schrieb Junio C Hamano:
> > Johannes Sixt <j6t@kdbg.org> writes:
>
> >  After all, the primary purpose of
> > inventing the encoder was to catch the arith overflow, wasn't it?
>
> That was *your* motivation for the helper function. But IMO it is a
> wrong design decision.

I wish that comment, and the argument following it, would have come as
part of the review of the patch series that already made it to `next`.

FWIW I actually agree with Junio about the helper, but in hindsight I
could have used a better name (not one that is tied to the "index").
Something like `unsigned_one_complement()`. But of course, that would
say _what_ it does, not _why_.

And yes, I would wish we had C++ templates so that the helper could use
the exact type of the caller.

Ciao,
Dscho

> Whether or not the index calculation overflows is a matter of the type
> that is used for the index, and that in turn is dicated by the
> possible sizes of the collections that are indexed. IOW, the index
> overflow check is (*if* it is necessary) a policy decision that must
> be made at a higher level and must not be hidden away in a helper
> function whose purpose (as suggested by its name) is something
> entirely different.
>
> Unless, of course, we declare "all our indexes are of type int". But
> that ship has sailed long ago, because there are too many cases where we
> are forced to use size_t as index (strlen, sizeof...).
>
> Meta note: We know that we are painting a tiny shed here (Replacing a
> one-liner by a one-liner, huh?) If anyone of you has better things to
> do, please move on. My interest in this discussion are just the design
> decisions that are made, not the actual outcome of this particular case.
>
> -- Hannes
>
Junio C Hamano Oct. 9, 2019, 11:44 a.m. UTC | #7
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> FWIW I actually agree with Junio about the helper, but in hindsight I
> could have used a better name (not one that is tied to the "index").
> Something like `unsigned_one_complement()`. But of course, that would
> say _what_ it does, not _why_.

I personally feel that the particular name is on the better side of
the borderline.  "st_add3(a, b, c)" says it is about adding three
size_t quantities, without saying why it exists and should be used
over a+b+c.  Existence of the helper and calling it alone should be
a good enough sign that we somehow feel a+b+c is not sufficient [ly
safe], so we do not call it st_add3_safe() or st_add3_wo_overflow().

Your unsigned-one-complement would fall into the same category, no?
Johannes Schindelin Oct. 9, 2019, 11:59 a.m. UTC | #8
Hi Junio,

On Wed, 9 Oct 2019, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > FWIW I actually agree with Junio about the helper, but in hindsight I
> > could have used a better name (not one that is tied to the "index").
> > Something like `unsigned_one_complement()`. But of course, that would
> > say _what_ it does, not _why_.
>
> I personally feel that the particular name is on the better side of
> the borderline.  "st_add3(a, b, c)" says it is about adding three
> size_t quantities, without saying why it exists and should be used
> over a+b+c.  Existence of the helper and calling it alone should be
> a good enough sign that we somehow feel a+b+c is not sufficient [ly
> safe], so we do not call it st_add3_safe() or st_add3_wo_overflow().
>
> Your unsigned-one-complement would fall into the same category, no?

Yes. That's what I meant to say with the "what vs why" argument.

Thanks,
Dscho
Junio C Hamano Oct. 9, 2019, 12:09 p.m. UTC | #9
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Wed, 9 Oct 2019, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > FWIW I actually agree with Junio about the helper, but in hindsight I
>> > could have used a better name (not one that is tied to the "index").
>> > Something like `unsigned_one_complement()`. But of course, that would
>> > say _what_ it does, not _why_.
>>
>> I personally feel that the particular name is on the better side of
>> the borderline.  "st_add3(a, b, c)" says it is about adding three
>> size_t quantities, without saying why it exists and should be used
>> over a+b+c.  Existence of the helper and calling it alone should be
>> a good enough sign that we somehow feel a+b+c is not sufficient [ly
>> safe], so we do not call it st_add3_safe() or st_add3_wo_overflow().
>>
>> Your unsigned-one-complement would fall into the same category, no?
>
> Yes. That's what I meant to say with the "what vs why" argument.

And what I wanted to say was that, even though we encourage use of
names that convey _why_, in a case like this, the name that conveys
only what without explicitly saying why is probably OK.
diff mbox series

Patch

diff --git a/blame.c b/blame.c
index 145eaf2faf..848355923d 100644
--- a/blame.c
+++ b/blame.c
@@ -109,8 +109,9 @@  static void verify_working_tree_path(struct repository *r,
 	pos = index_name_pos(r->index, path, strlen(path));
 	if (pos >= 0)
 		; /* path is in the index */
-	else if (-1 - pos < r->index->cache_nr &&
-		 !strcmp(r->index->cache[-1 - pos]->name, path))
+	else if (insert_pos_to_index_pos(pos) < r->index->cache_nr &&
+		 !strcmp(r->index->cache[insert_pos_to_index_pos(pos)]->name,
+			 path))
 		; /* path is in the index, unmerged */
 	else
 		die("no such path '%s' in HEAD", path);
diff --git a/builtin/mv.c b/builtin/mv.c
index be15ba7044..9ebb1ed0a2 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -102,7 +102,7 @@  static int index_range_of_same_dir(const char *src, int length,
 	if (first >= 0)
 		die(_("%.*s is in index"), len_w_slash, src_w_slash);
 
-	first = -1 - first;
+	first = insert_pos_to_index_pos(first);
 	for (last = first; last < active_nr; last++) {
 		const char *path = active_cache[last]->name;
 		if (strncmp(path, src_w_slash, len_w_slash))
diff --git a/cache.h b/cache.h
index 850e7b945a..8fb57db091 100644
--- a/cache.h
+++ b/cache.h
@@ -738,6 +738,11 @@  static inline int index_pos_to_insert_pos(uintmax_t pos)
 	return -1 - (int)pos;
 }
 
+static inline int insert_pos_to_index_pos(int pos)
+{
+	return -1 - pos;
+}
+
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
 #define ADD_CACHE_SKIP_DFCHECK 4	/* Ok to skip DF conflict checks */
diff --git a/merge-recursive.c b/merge-recursive.c
index d2e380b7ed..8dca01a279 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -753,7 +753,7 @@  static int dir_in_way(struct index_state *istate, const char *path,
 	pos = index_name_pos(istate, dirpath.buf, dirpath.len);
 
 	if (pos < 0)
-		pos = -1 - pos;
+		pos = insert_pos_to_index_pos(pos);
 	if (pos < istate->cache_nr &&
 	    !strncmp(dirpath.buf, istate->cache[pos]->name, dirpath.len)) {
 		strbuf_release(&dirpath);
@@ -822,7 +822,7 @@  static int would_lose_untracked(struct merge_options *opt, const char *path)
 	int pos = index_name_pos(istate, path, strlen(path));
 
 	if (pos < 0)
-		pos = -1 - pos;
+		pos = insert_pos_to_index_pos(pos);
 	while (pos < istate->cache_nr &&
 	       !strcmp(path, istate->cache[pos]->name)) {
 		/*
diff --git a/read-cache.c b/read-cache.c
index ec13953a21..ac3b0f8e5c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -641,7 +641,7 @@  static int index_name_pos_also_unmerged(struct index_state *istate,
 		return pos;
 
 	/* maybe unmerged? */
-	pos = -1 - pos;
+	pos = insert_pos_to_index_pos(pos);
 	if (pos >= istate->cache_nr ||
 			compare_name((ce = istate->cache[pos]), path, namelen))
 		return -1;
diff --git a/rerere.c b/rerere.c
index 17abb47321..122ebed5d8 100644
--- a/rerere.c
+++ b/rerere.c
@@ -154,7 +154,7 @@  static struct rerere_dir *find_rerere_dir(const char *hex)
 		rr_dir->status = NULL;
 		rr_dir->status_nr = 0;
 		rr_dir->status_alloc = 0;
-		pos = -1 - pos;
+		pos = insert_pos_to_index_pos(pos);
 
 		/* Make sure the array is big enough ... */
 		ALLOC_GROW(rerere_dir, rerere_dir_nr + 1, rerere_dir_alloc);
diff --git a/sha1-name.c b/sha1-name.c
index 49855ad24f..bee7ce39ee 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -98,7 +98,7 @@  static void find_short_object_filename(struct disambiguate_state *ds)
 		loose_objects = odb_loose_cache(odb, &ds->bin_pfx);
 		pos = oid_array_lookup(loose_objects, &ds->bin_pfx);
 		if (pos < 0)
-			pos = -1 - pos;
+			pos = insert_pos_to_index_pos(pos);
 		while (!ds->ambiguous && pos < loose_objects->nr) {
 			const struct object_id *oid;
 			oid = loose_objects->oid + pos;
diff --git a/submodule.c b/submodule.c
index 0f199c5137..4cab9100ea 100644
--- a/submodule.c
+++ b/submodule.c
@@ -37,7 +37,7 @@  int is_gitmodules_unmerged(const struct index_state *istate)
 {
 	int pos = index_name_pos(istate, GITMODULES_FILE, strlen(GITMODULES_FILE));
 	if (pos < 0) { /* .gitmodules not found or isn't merged */
-		pos = -1 - pos;
+		pos = insert_pos_to_index_pos(pos);
 		if (istate->cache_nr > pos) {  /* there is a .gitmodules */
 			const struct cache_entry *ce = istate->cache[pos];
 			if (ce_namelen(ce) == strlen(GITMODULES_FILE) &&
diff --git a/unpack-trees.c b/unpack-trees.c
index dab713203e..abb33ce259 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -553,7 +553,7 @@  static int locate_in_src_index(const struct cache_entry *ce,
 	int len = ce_namelen(ce);
 	int pos = index_name_pos(index, ce->name, len);
 	if (pos < 0)
-		pos = -1 - pos;
+		pos = insert_pos_to_index_pos(pos);
 	return pos;
 }