diff mbox series

[GSoC,v2] rm: fix sign comparison warnings

Message ID 71098ea7-9136-4ab2-8e15-27017773e054@gmail.com (mailing list archive)
State Superseded
Headers show
Series [GSoC,v2] rm: fix sign comparison warnings | expand

Commit Message

Arnav Bhate March 16, 2025, 10:13 a.m. UTC
There are multiple places in loops, where a signed and an
unsigned data type are compared. Git uses a mix of signed and unsigned
types to store lengths of arrays. This sometimes leads to using a signed
index for an array whose length is stored in an unsigned variable or
vice versa.

get_ours_cache_pos is a special case where i, though derived from a
signed variable is never negative. Move this part to the caller side
and make i an unsigned argument of the function. Rename i to
inverted_pos to make it more descriptive, now that it is a function
argument.

Replace signed data types with unsigned data types and vice versa
wherever necessary. Where both signed and unsigned data types have been
used, define a new variable in the scope of the for loop for use as the
iterator. Remove #define DISABLE_SIGN_COMPARE_WARNINGS.

Signed-off-by: Arnav Bhate <bhatearnav@gmail.com>
---
 builtin/rm.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

Comments

Junio C Hamano March 17, 2025, 4:47 p.m. UTC | #1
Arnav Bhate <bhatearnav@gmail.com> writes:

> -static int get_ours_cache_pos(const char *path, int pos)
> +static int get_ours_cache_pos(const char *path, unsigned int inverted_pos)

This renaming of parameter is not right.  

At this point when the value comes to this function, it *IS* the
position, there is nothing inverted about it.  It points at the
position in the .cache[] array where an cache_entry at a higher
stage would appear.

It is perfectly fine to state that the value that is returned from
index_name_pos() is potentially inverted.  The function is given a
path name (without any stage information) and

 - returns a non-negative number, the position in the .cache[] array,
   where a cache_entry at stage #0 (i.e. an entry for a path that does
   not require conflict resolution), or

 - returns a negative number, when there is no such cache_entry
   exists.  The caller can "invert" the value to recover a position
   in the .cache[] array, where a cache_entry for the path at stage
   #0 _would_ _have_ been found, if existed.  Due to the way the
   cache entries are sorted in the .cache[] array, when you are
   interested in finding cache entries for a path at higher stages,
   like this function is, you can start scanning at this point until
   you see an entry for a different path.

Calling the parameter "pos" is the right thing to do.  The value
used to come here _could_ have been called "inverted", and the
result of (-inverted_pos-1) can be assigned to "pos".  But because
the patch moves the inversion to the caller, what the code in the
while loop sees is no longer "inverted".

>  {
> -	int i = -pos - 1;
> -
> -	while ((i < the_repository->index->cache_nr) && !strcmp(the_repository->index->cache[i]->name, path)) {
> -		if (ce_stage(the_repository->index->cache[i]) == 2)
> -			return i;
> -		i++;
> +	while ((inverted_pos < the_repository->index->cache_nr) && !strcmp(the_repository->index->cache[inverted_pos]->name, path)) {
> +		if (ce_stage(the_repository->index->cache[inverted_pos]) == 2)
> +			return inverted_pos;
> +		inverted_pos++;
>  	}
>  	return -1;
>  }
> @@ -58,7 +55,7 @@ static void print_error_files(struct string_list *files_list,
>  			      int *errs)
>  {
>  	if (files_list->nr) {
> -		int i;
> +		unsigned int i;
>  		struct strbuf err_msg = STRBUF_INIT;
>  
>  		strbuf_addstr(&err_msg, main_msg);
> @@ -83,7 +80,7 @@ static void submodules_absorb_gitdir_if_needed(void)
>  
>  		pos = index_name_pos(the_repository->index, name, strlen(name));
>  		if (pos < 0) {

Here is where the caller notices that index_name_pos() did not see a
stage #0 entry.  This caller wants to see "ours" entry at stage #2,
so it "inverts" the returned value and asks the helper function if
it sees such an entry in the .cache[] array.

A handful of prerequisite pieces of knowledge to understand this
code are:

 - The index (i.e. the .cache[] array) is sorted by full path name
   (down from the top level of the working tree).

 - The index can have at most one stage #0 entry for each path name.
   When a stage #0 entry exists for a path name, there cannot be
   higher stage entries (the path is called "resolved").

 - The cache entries in the .cache[] array for the same path name
   are sorted by their stage number.

 - There can be at most one stage #2 entry for each path name, which
   are called "ours".  Entries at stage #1 are from common ancestor,
   entries at stage #3 are from "their" tree.  These higher (i.e.
   more than zero) stage entries appear only for "conflicting"
   paths in the .cache[] array.

With the understanding above, you can see why "our" position is
computed only when index_name_pos() returns negative in this hunk.

> -			pos = get_ours_cache_pos(name, pos);
> +			pos = get_ours_cache_pos(name, -pos - 1);
>  			if (pos < 0)
>  				continue;
>  		}
> @@ -131,7 +128,7 @@ static int check_local_mod(struct object_id *head, int index_only)
>  			 * Skip unmerged entries except for populated submodules
>  			 * that could lose history when removed.
>  			 */
> -			pos = get_ours_cache_pos(name, pos);
> +			pos = get_ours_cache_pos(name, -pos - 1);
>  			if (pos < 0)
>  				continue;

The above hunks are perfectly fine.  

> @@ -314,7 +311,7 @@ int cmd_rm(int argc,
>  	if (pathspec_needs_expanded_index(the_repository->index, &pathspec))
>  		ensure_full_index(the_repository->index);
>  
> -	for (i = 0; i < the_repository->index->cache_nr; i++) {
> +	for (unsigned int i = 0; i < the_repository->index->cache_nr; i++) {
>  		const struct cache_entry *ce = the_repository->index->cache[i];
>  
>  		if (!include_sparse &&

OK.

Thanks.
Arnav Bhate March 17, 2025, 5:05 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Arnav Bhate <bhatearnav@gmail.com> writes:
> 
>> -static int get_ours_cache_pos(const char *path, int pos)
>> +static int get_ours_cache_pos(const char *path, unsigned int inverted_pos)
> 
> This renaming of parameter is not right.  
> 
> At this point when the value comes to this function, it *IS* the
> position, there is nothing inverted about it.  It points at the
> position in the .cache[] array where an cache_entry at a higher
> stage would appear.
> 
> It is perfectly fine to state that the value that is returned from
> index_name_pos() is potentially inverted.  The function is given a
> path name (without any stage information) and
> 
>  - returns a non-negative number, the position in the .cache[] array,
>    where a cache_entry at stage #0 (i.e. an entry for a path that does
>    not require conflict resolution), or
> 
>  - returns a negative number, when there is no such cache_entry
>    exists.  The caller can "invert" the value to recover a position
>    in the .cache[] array, where a cache_entry for the path at stage
>    #0 _would_ _have_ been found, if existed.  Due to the way the
>    cache entries are sorted in the .cache[] array, when you are
>    interested in finding cache entries for a path at higher stages,
>    like this function is, you can start scanning at this point until
>    you see an entry for a different path.
> 
> Calling the parameter "pos" is the right thing to do.  The value
> used to come here _could_ have been called "inverted", and the
> result of (-inverted_pos-1) can be assigned to "pos".  But because
> the patch moves the inversion to the caller, what the code in the
> while loop sees is no longer "inverted".

My logic was that it was the inversion of the variable pos, but your
logic makes more sense. I'll make the change.
 
>>  {
>> -	int i = -pos - 1;
>> -
>> -	while ((i < the_repository->index->cache_nr) && !strcmp(the_repository->index->cache[i]->name, path)) {
>> -		if (ce_stage(the_repository->index->cache[i]) == 2)
>> -			return i;
>> -		i++;
>> +	while ((inverted_pos < the_repository->index->cache_nr) && !strcmp(the_repository->index->cache[inverted_pos]->name, path)) {
>> +		if (ce_stage(the_repository->index->cache[inverted_pos]) == 2)
>> +			return inverted_pos;
>> +		inverted_pos++;
>>  	}
>>  	return -1;
>>  }
>> @@ -58,7 +55,7 @@ static void print_error_files(struct string_list *files_list,
>>  			      int *errs)
>>  {
>>  	if (files_list->nr) {
>> -		int i;
>> +		unsigned int i;
>>  		struct strbuf err_msg = STRBUF_INIT;
>>  
>>  		strbuf_addstr(&err_msg, main_msg);
>> @@ -83,7 +80,7 @@ static void submodules_absorb_gitdir_if_needed(void)
>>  
>>  		pos = index_name_pos(the_repository->index, name, strlen(name));
>>  		if (pos < 0) {
> 
> Here is where the caller notices that index_name_pos() did not see a
> stage #0 entry.  This caller wants to see "ours" entry at stage #2,
> so it "inverts" the returned value and asks the helper function if
> it sees such an entry in the .cache[] array.
> 
> A handful of prerequisite pieces of knowledge to understand this
> code are:
> 
>  - The index (i.e. the .cache[] array) is sorted by full path name
>    (down from the top level of the working tree).
> 
>  - The index can have at most one stage #0 entry for each path name.
>    When a stage #0 entry exists for a path name, there cannot be
>    higher stage entries (the path is called "resolved").
> 
>  - The cache entries in the .cache[] array for the same path name
>    are sorted by their stage number.
> 
>  - There can be at most one stage #2 entry for each path name, which
>    are called "ours".  Entries at stage #1 are from common ancestor,
>    entries at stage #3 are from "their" tree.  These higher (i.e.
>    more than zero) stage entries appear only for "conflicting"
>    paths in the .cache[] array.
> 
> With the understanding above, you can see why "our" position is
> computed only when index_name_pos() returns negative in this hunk.

Thanks for the explanation, I was not able to get this from the code.

> 
>> -			pos = get_ours_cache_pos(name, pos);
>> +			pos = get_ours_cache_pos(name, -pos - 1);
>>  			if (pos < 0)
>>  				continue;
>>  		}
>> @@ -131,7 +128,7 @@ static int check_local_mod(struct object_id *head, int index_only)
>>  			 * Skip unmerged entries except for populated submodules
>>  			 * that could lose history when removed.
>>  			 */
>> -			pos = get_ours_cache_pos(name, pos);
>> +			pos = get_ours_cache_pos(name, -pos - 1);
>>  			if (pos < 0)
>>  				continue;
> 
> The above hunks are perfectly fine.  
> 
>> @@ -314,7 +311,7 @@ int cmd_rm(int argc,
>>  	if (pathspec_needs_expanded_index(the_repository->index, &pathspec))
>>  		ensure_full_index(the_repository->index);
>>  
>> -	for (i = 0; i < the_repository->index->cache_nr; i++) {
>> +	for (unsigned int i = 0; i < the_repository->index->cache_nr; i++) {
>>  		const struct cache_entry *ce = the_repository->index->cache[i];
>>  
>>  		if (!include_sparse &&
> 
> OK.
> 
> Thanks.
diff mbox series

Patch

diff --git a/builtin/rm.c b/builtin/rm.c
index 12ae086a55..a5c9fc644e 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -5,7 +5,6 @@ 
  */
 
 #define USE_THE_REPOSITORY_VARIABLE
-#define DISABLE_SIGN_COMPARE_WARNINGS
 
 #include "builtin.h"
 #include "advice.h"
@@ -40,14 +39,12 @@  static struct {
 	} *entry;
 } list;
 
-static int get_ours_cache_pos(const char *path, int pos)
+static int get_ours_cache_pos(const char *path, unsigned int inverted_pos)
 {
-	int i = -pos - 1;
-
-	while ((i < the_repository->index->cache_nr) && !strcmp(the_repository->index->cache[i]->name, path)) {
-		if (ce_stage(the_repository->index->cache[i]) == 2)
-			return i;
-		i++;
+	while ((inverted_pos < the_repository->index->cache_nr) && !strcmp(the_repository->index->cache[inverted_pos]->name, path)) {
+		if (ce_stage(the_repository->index->cache[inverted_pos]) == 2)
+			return inverted_pos;
+		inverted_pos++;
 	}
 	return -1;
 }
@@ -58,7 +55,7 @@  static void print_error_files(struct string_list *files_list,
 			      int *errs)
 {
 	if (files_list->nr) {
-		int i;
+		unsigned int i;
 		struct strbuf err_msg = STRBUF_INIT;
 
 		strbuf_addstr(&err_msg, main_msg);
@@ -83,7 +80,7 @@  static void submodules_absorb_gitdir_if_needed(void)
 
 		pos = index_name_pos(the_repository->index, name, strlen(name));
 		if (pos < 0) {
-			pos = get_ours_cache_pos(name, pos);
+			pos = get_ours_cache_pos(name, -pos - 1);
 			if (pos < 0)
 				continue;
 		}
@@ -131,7 +128,7 @@  static int check_local_mod(struct object_id *head, int index_only)
 			 * Skip unmerged entries except for populated submodules
 			 * that could lose history when removed.
 			 */
-			pos = get_ours_cache_pos(name, pos);
+			pos = get_ours_cache_pos(name, -pos - 1);
 			if (pos < 0)
 				continue;
 
@@ -314,7 +311,7 @@  int cmd_rm(int argc,
 	if (pathspec_needs_expanded_index(the_repository->index, &pathspec))
 		ensure_full_index(the_repository->index);
 
-	for (i = 0; i < the_repository->index->cache_nr; i++) {
+	for (unsigned int i = 0; i < the_repository->index->cache_nr; i++) {
 		const struct cache_entry *ce = the_repository->index->cache[i];
 
 		if (!include_sparse &&