diff mbox series

[02/13] msvc: avoid using minus operator on unsigned types

Message ID 2abe1e1fb0bf3025489c2e543b9a9c648a164827.1569486607.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series ci: include a Visual Studio build & test in our Azure Pipeline | expand

Commit Message

Johannes Schindelin via GitGitGadget Sept. 26, 2019, 8:30 a.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

MSVC complains about this with `-Wall`, which can be taken as a sign
that this is indeed a real bug. The symptom is:

	C4146: unary minus operator applied to unsigned type, result
	still unsigned

Let's avoid this warning in the minimal way, e.g. writing `-1 -
<unsigned value>` instead of `-<unsigned value> - 1`.

Note that the change in the `estimate_cache_size()` function is
needed because MSVC considers the "return type" of the `sizeof()`
operator to be `size_t`, i.e. unsigned, and therefore it cannot be
negated using the unary minus operator.

Even worse, that arithmetic is doing extra work, in vain. We want to
calculate the entry extra cache size as the difference between the
size of the `cache_entry` structure minus the size of the
`ondisk_cache_entry` structure, padded to the appropriate alignment
boundary.

To that end, we start by assigning that difference to the `per_entry`
variable, and then abuse the `len` parameter of the
`align_padding_size()` macro to take the negative size of the ondisk
entry size. Essentially, we try to avoid passing the already calculated
difference to that macro by passing the operands of that difference
instead, when the macro expects operands of an addition:

	#define align_padding_size(size, len) \
		((size + (len) + 8) & ~7) - (size + len)

Currently, we pass A and -B to that macro instead of passing A - B and
0, where A - B is already stored in the `per_entry` variable, ready to
be used.

This is neither necessary, nor intuitive. Let's fix this, and have code
that is both easier to read and that also does not trigger MSVC's
warning.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 read-cache.c  | 4 ++--
 sha1-lookup.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Denton Liu Sept. 26, 2019, 5:20 p.m. UTC | #1
Hi Dscho,

On Thu, Sep 26, 2019 at 01:30:10AM -0700, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> MSVC complains about this with `-Wall`, which can be taken as a sign
> that this is indeed a real bug. The symptom is:
> 
> 	C4146: unary minus operator applied to unsigned type, result
> 	still unsigned
> 
> Let's avoid this warning in the minimal way, e.g. writing `-1 -
> <unsigned value>` instead of `-<unsigned value> - 1`.

[...]

> ---
>  read-cache.c  | 4 ++--
>  sha1-lookup.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index c701f7f8b8..11f3357216 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1276,7 +1276,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>  	 */
>  	if (istate->cache_nr > 0 &&
>  		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
> -		pos = -istate->cache_nr - 1;
> +		pos = -1 - istate->cache_nr;

I've been thinking about this and I'm still not certain that this 100%
correct from a language-lawyer perspective.

If we do `-1 - istate->cache_nr`, then the unsignedness of
istate->cache_nr takes over and the whole expression is a very large
unsigned number.

Then, when we assign to `int pos`, we are converting an unsigned number
which is out of the range of the signed number. According to a
StackOverflow post citing the C99 standard[1]:

	Otherwise, the new type is signed and the value cannot be
	represented in it; either the result is implementation-defined
	or an implementation-defined signal is raised.

I'm sure that most platforms that we support will handle it sanely but
could we write this as

	pos = -1 - (int) istate->cache_nr;

to be doubly sure that no funny business will happen?

>  	else
>  		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
>  
> @@ -1894,7 +1894,7 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
>  	/*
>  	 * Account for potential alignment differences.
>  	 */
> -	per_entry += align_padding_size(sizeof(struct cache_entry), -sizeof(struct ondisk_cache_entry));
> +	per_entry += align_padding_size(per_entry, 0);
>  	return ondisk_size + entries * per_entry;
>  }
>  
> diff --git a/sha1-lookup.c b/sha1-lookup.c
> index 796ab68da8..c819687730 100644
> --- a/sha1-lookup.c
> +++ b/sha1-lookup.c
> @@ -97,7 +97,7 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
>  			lo = mi + 1;
>  		mi = lo + (hi - lo) / 2;
>  	} while (lo < hi);
> -	return -lo-1;
> +	return -1 - lo;

Same thing here.

[1]: https://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe

>  }
>  
>  int bsearch_hash(const unsigned char *sha1, const uint32_t *fanout_nbo,
> -- 
> gitgitgadget
>
Johannes Schindelin Sept. 26, 2019, 9:01 p.m. UTC | #2
Hi Denton,

On Thu, 26 Sep 2019, Denton Liu wrote:

> Hi Dscho,
>
> On Thu, Sep 26, 2019 at 01:30:10AM -0700, Johannes Schindelin via GitGitGadget wrote:
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > MSVC complains about this with `-Wall`, which can be taken as a sign
> > that this is indeed a real bug. The symptom is:
> >
> > 	C4146: unary minus operator applied to unsigned type, result
> > 	still unsigned
> >
> > Let's avoid this warning in the minimal way, e.g. writing `-1 -
> > <unsigned value>` instead of `-<unsigned value> - 1`.
>
> [...]
>
> > ---
> >  read-cache.c  | 4 ++--
> >  sha1-lookup.c | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/read-cache.c b/read-cache.c
> > index c701f7f8b8..11f3357216 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1276,7 +1276,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> >  	 */
> >  	if (istate->cache_nr > 0 &&
> >  		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
> > -		pos = -istate->cache_nr - 1;
> > +		pos = -1 - istate->cache_nr;
>
> I've been thinking about this and I'm still not certain that this 100%
> correct from a language-lawyer perspective.
>
> If we do `-1 - istate->cache_nr`, then the unsignedness of
> istate->cache_nr takes over and the whole expression is a very large
> unsigned number.
>
> Then, when we assign to `int pos`, we are converting an unsigned number
> which is out of the range of the signed number. According to a
> StackOverflow post citing the C99 standard[1]:
>
> 	Otherwise, the new type is signed and the value cannot be
> 	represented in it; either the result is implementation-defined
> 	or an implementation-defined signal is raised.
>
> I'm sure that most platforms that we support will handle it sanely but
> could we write this as
>
> 	pos = -1 - (int) istate->cache_nr;
>
> to be doubly sure that no funny business will happen?

I guess we should use `signed_add_overflows()` to make extra certain
that it does what we want it to do, kind of like `st_add()`. Or just do
the check explicitly, like so:

	if (istate->cache_nr > INT_MAX)
		die("overflow: -1 - %u", istate->cache_nr);
	pos = -1 - istate->cache_nr;
}
>
> >  	else
> >  		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
> >
> > @@ -1894,7 +1894,7 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
> >  	/*
> >  	 * Account for potential alignment differences.
> >  	 */
> > -	per_entry += align_padding_size(sizeof(struct cache_entry), -sizeof(struct ondisk_cache_entry));
> > +	per_entry += align_padding_size(per_entry, 0);
> >  	return ondisk_size + entries * per_entry;
> >  }
> >
> > diff --git a/sha1-lookup.c b/sha1-lookup.c
> > index 796ab68da8..c819687730 100644
> > --- a/sha1-lookup.c
> > +++ b/sha1-lookup.c
> > @@ -97,7 +97,7 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
> >  			lo = mi + 1;
> >  		mi = lo + (hi - lo) / 2;
> >  	} while (lo < hi);
> > -	return -lo-1;
> > +	return -1 - lo;
>
> Same thing here.

This is even more critical, as `lo` has the type `size_t`:

	if (lo > INT_MAX)
		die("overflow: -1 - %"PRIuMAX, (uintmax_t)lo);
	return -1 - lo;
>
What do you think?
Dscho

> [1]: https://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe
>
> >  }
> >
> >  int bsearch_hash(const unsigned char *sha1, const uint32_t *fanout_nbo,
> > --
> > gitgitgadget
> >
>
Denton Liu Sept. 26, 2019, 11:57 p.m. UTC | #3
Hi Dscho,

On Thu, Sep 26, 2019 at 11:01:32PM +0200, Johannes Schindelin wrote:
> Hi Denton,
> 
> On Thu, 26 Sep 2019, Denton Liu wrote:
> 
> > Hi Dscho,
> >
> > On Thu, Sep 26, 2019 at 01:30:10AM -0700, Johannes Schindelin via GitGitGadget wrote:
> > > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > >
> > > MSVC complains about this with `-Wall`, which can be taken as a sign
> > > that this is indeed a real bug. The symptom is:
> > >
> > > 	C4146: unary minus operator applied to unsigned type, result
> > > 	still unsigned
> > >
> > > Let's avoid this warning in the minimal way, e.g. writing `-1 -
> > > <unsigned value>` instead of `-<unsigned value> - 1`.
> >
> > [...]
> >
> > > ---
> > >  read-cache.c  | 4 ++--
> > >  sha1-lookup.c | 2 +-
> > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/read-cache.c b/read-cache.c
> > > index c701f7f8b8..11f3357216 100644
> > > --- a/read-cache.c
> > > +++ b/read-cache.c
> > > @@ -1276,7 +1276,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> > >  	 */
> > >  	if (istate->cache_nr > 0 &&
> > >  		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
> > > -		pos = -istate->cache_nr - 1;
> > > +		pos = -1 - istate->cache_nr;
> >
> > I've been thinking about this and I'm still not certain that this 100%
> > correct from a language-lawyer perspective.
> >
> > If we do `-1 - istate->cache_nr`, then the unsignedness of
> > istate->cache_nr takes over and the whole expression is a very large
> > unsigned number.
> >
> > Then, when we assign to `int pos`, we are converting an unsigned number
> > which is out of the range of the signed number. According to a
> > StackOverflow post citing the C99 standard[1]:
> >
> > 	Otherwise, the new type is signed and the value cannot be
> > 	represented in it; either the result is implementation-defined
> > 	or an implementation-defined signal is raised.
> >
> > I'm sure that most platforms that we support will handle it sanely but
> > could we write this as
> >
> > 	pos = -1 - (int) istate->cache_nr;
> >
> > to be doubly sure that no funny business will happen?
> 
> I guess we should use `signed_add_overflows()` to make extra certain
> that it does what we want it to do, kind of like `st_add()`. Or just do
> the check explicitly, like so:
> 
> 	if (istate->cache_nr > INT_MAX)
> 		die("overflow: -1 - %u", istate->cache_nr);
> 	pos = -1 - istate->cache_nr;

Could we change this to 

 	pos = -1 - (int) istate->cache_nr;

so that we alleviate the problem I was talking about above?

Other than that, it looks good. Well, it might break on one's complement
systems but I don't think we support UNIVACs anyway. ;)

> }
> >
> > >  	else
> > >  		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
> > >
> > > @@ -1894,7 +1894,7 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
> > >  	/*
> > >  	 * Account for potential alignment differences.
> > >  	 */
> > > -	per_entry += align_padding_size(sizeof(struct cache_entry), -sizeof(struct ondisk_cache_entry));
> > > +	per_entry += align_padding_size(per_entry, 0);
> > >  	return ondisk_size + entries * per_entry;
> > >  }
> > >
> > > diff --git a/sha1-lookup.c b/sha1-lookup.c
> > > index 796ab68da8..c819687730 100644
> > > --- a/sha1-lookup.c
> > > +++ b/sha1-lookup.c
> > > @@ -97,7 +97,7 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
> > >  			lo = mi + 1;
> > >  		mi = lo + (hi - lo) / 2;
> > >  	} while (lo < hi);
> > > -	return -lo-1;
> > > +	return -1 - lo;
> >
> > Same thing here.
> 
> This is even more critical, as `lo` has the type `size_t`:
> 
> 	if (lo > INT_MAX)
> 		die("overflow: -1 - %"PRIuMAX, (uintmax_t)lo);
> 	return -1 - lo;

Also, could we change this to

 	return -1 - (int) lo;

Thanks,

Denton

> >
> What do you think?
> Dscho
> 
> > [1]: https://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe
> >
> > >  }
> > >
> > >  int bsearch_hash(const unsigned char *sha1, const uint32_t *fanout_nbo,
> > > --
> > > gitgitgadget
> > >
> >
Johannes Schindelin Sept. 30, 2019, 9:50 a.m. UTC | #4
Hi Denton,

On Thu, 26 Sep 2019, Denton Liu wrote:

> On Thu, Sep 26, 2019 at 11:01:32PM +0200, Johannes Schindelin wrote:
> >
> > On Thu, 26 Sep 2019, Denton Liu wrote:
> >
> > > Hi Dscho,
> > >
> > > On Thu, Sep 26, 2019 at 01:30:10AM -0700, Johannes Schindelin via GitGitGadget wrote:
> > > > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > > >
> > > > MSVC complains about this with `-Wall`, which can be taken as a sign
> > > > that this is indeed a real bug. The symptom is:
> > > >
> > > > 	C4146: unary minus operator applied to unsigned type, result
> > > > 	still unsigned
> > > >
> > > > Let's avoid this warning in the minimal way, e.g. writing `-1 -
> > > > <unsigned value>` instead of `-<unsigned value> - 1`.
> > >
> > > [...]
> > >
> > > > ---
> > > >  read-cache.c  | 4 ++--
> > > >  sha1-lookup.c | 2 +-
> > > >  2 files changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/read-cache.c b/read-cache.c
> > > > index c701f7f8b8..11f3357216 100644
> > > > --- a/read-cache.c
> > > > +++ b/read-cache.c
> > > > @@ -1276,7 +1276,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> > > >  	 */
> > > >  	if (istate->cache_nr > 0 &&
> > > >  		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
> > > > -		pos = -istate->cache_nr - 1;
> > > > +		pos = -1 - istate->cache_nr;
> > >
> > > I've been thinking about this and I'm still not certain that this 100%
> > > correct from a language-lawyer perspective.
> > >
> > > If we do `-1 - istate->cache_nr`, then the unsignedness of
> > > istate->cache_nr takes over and the whole expression is a very large
> > > unsigned number.
> > >
> > > Then, when we assign to `int pos`, we are converting an unsigned number
> > > which is out of the range of the signed number. According to a
> > > StackOverflow post citing the C99 standard[1]:
> > >
> > > 	Otherwise, the new type is signed and the value cannot be
> > > 	represented in it; either the result is implementation-defined
> > > 	or an implementation-defined signal is raised.
> > >
> > > I'm sure that most platforms that we support will handle it sanely but
> > > could we write this as
> > >
> > > 	pos = -1 - (int) istate->cache_nr;
> > >
> > > to be doubly sure that no funny business will happen?
> >
> > I guess we should use `signed_add_overflows()` to make extra certain
> > that it does what we want it to do, kind of like `st_add()`. Or just do
> > the check explicitly, like so:
> >
> > 	if (istate->cache_nr > INT_MAX)
> > 		die("overflow: -1 - %u", istate->cache_nr);
> > 	pos = -1 - istate->cache_nr;
>
> Could we change this to
>
>  	pos = -1 - (int) istate->cache_nr;
>
> so that we alleviate the problem I was talking about above?

Thank you, I had missed that.

> Other than that, it looks good. Well, it might break on one's complement
> systems but I don't think we support UNIVACs anyway. ;)

Let's hope that nobody tries to run Git on such a system, indeed.

Thanks,
Dscho

>
> > }
> > >
> > > >  	else
> > > >  		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
> > > >
> > > > @@ -1894,7 +1894,7 @@ static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
> > > >  	/*
> > > >  	 * Account for potential alignment differences.
> > > >  	 */
> > > > -	per_entry += align_padding_size(sizeof(struct cache_entry), -sizeof(struct ondisk_cache_entry));
> > > > +	per_entry += align_padding_size(per_entry, 0);
> > > >  	return ondisk_size + entries * per_entry;
> > > >  }
> > > >
> > > > diff --git a/sha1-lookup.c b/sha1-lookup.c
> > > > index 796ab68da8..c819687730 100644
> > > > --- a/sha1-lookup.c
> > > > +++ b/sha1-lookup.c
> > > > @@ -97,7 +97,7 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
> > > >  			lo = mi + 1;
> > > >  		mi = lo + (hi - lo) / 2;
> > > >  	} while (lo < hi);
> > > > -	return -lo-1;
> > > > +	return -1 - lo;
> > >
> > > Same thing here.
> >
> > This is even more critical, as `lo` has the type `size_t`:
> >
> > 	if (lo > INT_MAX)
> > 		die("overflow: -1 - %"PRIuMAX, (uintmax_t)lo);
> > 	return -1 - lo;
>
> Also, could we change this to
>
>  	return -1 - (int) lo;
>
> Thanks,
>
> Denton
>
> > >
> > What do you think?
> > Dscho
> >
> > > [1]: https://stackoverflow.com/questions/50605/signed-to-unsigned-conversion-in-c-is-it-always-safe
> > >
> > > >  }
> > > >
> > > >  int bsearch_hash(const unsigned char *sha1, const uint32_t *fanout_nbo,
> > > > --
> > > > gitgitgadget
> > > >
> > >
>
diff mbox series

Patch

diff --git a/read-cache.c b/read-cache.c
index c701f7f8b8..11f3357216 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1276,7 +1276,7 @@  static int add_index_entry_with_check(struct index_state *istate, struct cache_e
 	 */
 	if (istate->cache_nr > 0 &&
 		strcmp(ce->name, istate->cache[istate->cache_nr - 1]->name) > 0)
-		pos = -istate->cache_nr - 1;
+		pos = -1 - istate->cache_nr;
 	else
 		pos = index_name_stage_pos(istate, ce->name, ce_namelen(ce), ce_stage(ce));
 
@@ -1894,7 +1894,7 @@  static size_t estimate_cache_size(size_t ondisk_size, unsigned int entries)
 	/*
 	 * Account for potential alignment differences.
 	 */
-	per_entry += align_padding_size(sizeof(struct cache_entry), -sizeof(struct ondisk_cache_entry));
+	per_entry += align_padding_size(per_entry, 0);
 	return ondisk_size + entries * per_entry;
 }
 
diff --git a/sha1-lookup.c b/sha1-lookup.c
index 796ab68da8..c819687730 100644
--- a/sha1-lookup.c
+++ b/sha1-lookup.c
@@ -97,7 +97,7 @@  int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
 			lo = mi + 1;
 		mi = lo + (hi - lo) / 2;
 	} while (lo < hi);
-	return -lo-1;
+	return -1 - lo;
 }
 
 int bsearch_hash(const unsigned char *sha1, const uint32_t *fanout_nbo,