mbox series

[0/9] commit-reach: -Wsign-compare follow-ups

Message ID 20241227-b4-pks-commit-reach-sign-compare-v1-0-07c59c2aa632@pks.im (mailing list archive)
Headers show
Series commit-reach: -Wsign-compare follow-ups | expand

Message

Patrick Steinhardt Dec. 27, 2024, 10:46 a.m. UTC
Hi,

this patch series is a follow-up for [1], fixing two smallish issues
introduced in that patch. Naturally I couldn't stop there and decided to
also make "commit-reach.c" and two other files -Wsign-compare-clean.

Thanks!

Patrick

[1]: <20241220084949.GA132704@coredump.intra.peff.net>

---
Patrick Steinhardt (9):
      prio-queue: fix type of `insertion_ctr`
      commit-reach: fix index used to loop through unsigned integer
      commit-reach: fix type of `min_commit_date`
      commit-reach: use `size_t` to track indices in `remove_redundant()`
      commit-reach: use `size_t` to track indices in `get_reachable_subset()`
      builtin/log: use `size_t` to track indices
      builtin/log: fix remaining -Wsign-compare warnings
      shallow: fix -Wsign-compare warnings
      commit-reach: use `size_t` to track indices when computing merge bases

 bisect.c              | 11 ++++----
 builtin/log.c         | 50 +++++++++++++++++-----------------
 builtin/merge-base.c  |  4 +--
 commit-reach.c        | 75 +++++++++++++++++++++++++++------------------------
 commit-reach.h        | 10 +++----
 commit.c              |  4 +--
 commit.h              |  2 +-
 prio-queue.h          |  4 +--
 ref-filter.c          |  2 +-
 remote.c              |  4 +--
 shallow.c             | 38 +++++++++++++-------------
 shallow.h             |  6 ++---
 t/helper/test-reach.c |  6 ++---
 13 files changed, 111 insertions(+), 105 deletions(-)


---
base-commit: 76cf4f61c87855ebf0784b88aaf737d6b09f504b
change-id: 20241227-b4-pks-commit-reach-sign-compare-235a03fffc78

Comments

Junio C Hamano Dec. 27, 2024, 7:47 p.m. UTC | #1
Speaking of -Wsign-compare (cf. [*1*]), we seem to be hitting this
error on linux32 CI job [*2*]:

Error: shallow.c:537:32: comparison of integer expressions of different signedness:
'unsigned int' and 'int' [-Werror=sign-compare]

    537 |  if (!info->pool_count || size > info->end - info->free) {

I didn't dig deeper than this.  After taking three tries to get
'seen' build with linux-meson job (needed merge-fix for ds/backfill
topic, which needed (1) a new built-in hence a new entry in
meson.build, (2) a new test hence a new entry in t/meson.build, and
(3) a new doc hence a new entry in Documentation/meson.build), I am
a bit exhausted right now.

Also breakage of linux-meson job we have at 'master' seems gone,
which probably has to do with your recent update with gitweb thing.

Thanks.


[References]

*1* 

https://staticthinking.wordpress.com/2023/07/25/wsign-compare-is-garbage/

*2*

https://github.com/git/git/actions/runs/12519901432/job/34924707204#step:6:181
Junio C Hamano Dec. 27, 2024, 8:08 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Error: shallow.c:537:32: comparison of integer expressions of different signedness:
> 'unsigned int' and 'int' [-Werror=sign-compare]
>
>     537 |  if (!info->pool_count || size > info->end - info->free) {
>
> I didn't dig deeper than this.

What we want to express seems to be:

    If the region between "end" and "free" is smaller than the
    required "size" then we are in trouble.

which can be said more naturally with

    If the region of size "size" starting at "free" pointer overruns the
    "end" pointer, we are in trouble.

So perhaps something like this would help?  We are no longer making
a comparison between two integers with this rewrite.

 shallow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git c/shallow.c w/shallow.c
index b8fcfbef0f..b54244ffa9 100644
--- c/shallow.c
+++ w/shallow.c
@@ -534,7 +534,7 @@ static uint32_t *paint_alloc(struct paint_info *info)
 	unsigned nr = DIV_ROUND_UP(info->nr_bits, 32);
 	unsigned size = nr * sizeof(uint32_t);
 	void *p;
-	if (!info->pool_count || size > info->end - info->free) {
+	if (!info->pool_count || info->end < info->free + size) {
 		if (size > POOL_SIZE)
 			BUG("pool size too small for %d in paint_alloc()",
 			    size);
Jeff King Dec. 27, 2024, 9:37 p.m. UTC | #3
On Fri, Dec 27, 2024 at 12:08:03PM -0800, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Error: shallow.c:537:32: comparison of integer expressions of different signedness:
> > 'unsigned int' and 'int' [-Werror=sign-compare]
> >
> >     537 |  if (!info->pool_count || size > info->end - info->free) {
> >
> > I didn't dig deeper than this.
> 
> What we want to express seems to be:
> 
>     If the region between "end" and "free" is smaller than the
>     required "size" then we are in trouble.
> 
> which can be said more naturally with
> 
>     If the region of size "size" starting at "free" pointer overruns the
>     "end" pointer, we are in trouble.
> 
> So perhaps something like this would help?  We are no longer making
> a comparison between two integers with this rewrite.
> 
>  shallow.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git c/shallow.c w/shallow.c
> index b8fcfbef0f..b54244ffa9 100644
> --- c/shallow.c
> +++ w/shallow.c
> @@ -534,7 +534,7 @@ static uint32_t *paint_alloc(struct paint_info *info)
>  	unsigned nr = DIV_ROUND_UP(info->nr_bits, 32);
>  	unsigned size = nr * sizeof(uint32_t);
>  	void *p;
> -	if (!info->pool_count || size > info->end - info->free) {
> +	if (!info->pool_count || info->end < info->free + size) {
>  		if (size > POOL_SIZE)
>  			BUG("pool size too small for %d in paint_alloc()",
>  			    size);

I doubt it is a practical problem in this instance, but as a general
rule, I think bounds checks like this should avoid using addition
because it can lead to overflow. In this code, imagine that the free
pool was allocated near the end of the address space, but somebody asked
for a very large "size", causing the addition to wrap around.

I think even without that overflow this might be undefined behavior,
because "info->free + size" is forming a pointer that may be outside the
allocated array.

So as a general rule, bounds comparisons like this should be formulated
as "how much free space do we have" compared to "how much are we asking
for". And there "info->end - info->free" is the right way to ask the
first half of that question.

Now where it gets weird with -Wsign-compare is that the pointer
difference is giving us a signed value. Which makes sense in the general
case (you could ask for "info->free - info->end", for example). But we
know that it will always be non-negative because we know that info->free
is <= info->end, coming from earlier in the same allocation.

I doubt there is a way to tell the compiler that (or that a compiler
could even switch to an unsigned ptrdiff type if it knew that). But I
wonder if there is a generalized helper we can devise that would avoid
simply casting here. I guess that could be a checked cast like:

  static inline size_t ptrdiff_to_size(ptrdiff_t v)
  {
	if (v < 0)
		BUG("surprising negative value: %"PRIdMAX, v);
	return (size_t)v;
  }

or even:

  static inline bool has_space(const void *vs, const void *ve, size_t want)
  {
	const char *s = vs, e = ve;
	return want <= ptrdiff_to_size(ve - vs);
  }

I don't love hiding basic things like this behind macros or inlines. But
allocation and bounds comparisons do have gotchas (especially against an
adversary that can try to create pathological situations). Maybe it's worth
having an easy way to do them safely without having to think about each
one. I dunno.

-Peff
Junio C Hamano Dec. 28, 2024, midnight UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> So perhaps something like this would help?  We are no longer making
> a comparison between two integers with this rewrite.

... and this gave us a first "pass" of the tip of 'seen' for all
jobs since for quite some time, like a few weeks.
Patrick Steinhardt Dec. 28, 2024, 8:27 a.m. UTC | #5
On Fri, Dec 27, 2024 at 04:00:38PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > So perhaps something like this would help?  We are no longer making
> > a comparison between two integers with this rewrite.
> 
> ... and this gave us a first "pass" of the tip of 'seen' for all
> jobs since for quite some time, like a few weeks.

Thanks for your fix. I'll have a look at whether I can include a 32 bit
job into GitLab CI for improved test coverage here so that it does not
fall on you to fix up things like this going forward.

Overall, CI has been in pretty rough shape recently. Besides the
mentioned failures there's also been the issue with flaky tests:

  - t5616-partial-clone regularly fails on macOS. [1] This seems like a
    race condition or to me:

    error: unable to open .git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack: No such file or directory
    fatal: unable to rename temporary file to '.git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack'
    fatal: could not finish pack-objects to repack local links
    fatal: index-pack failed
    error: last command exited with $?=128
    not ok 18 - fetch --refetch triggers repacking

  - The leak-checking jobs fail quite regularly in t0003 with something
    that feels like either a race caused by a leak or an issue with the
    sanitizer itself [2]:

    ==git==17055==ERROR: LeakSanitizer: detected memory leaks
    Direct leak of 32 byte(s) in 1 object(s) allocated from:
    #0 0x7aa0d03c7713 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98
    #1 0x7aa0d0221f69 in pthread_getattr_np (/lib/x86_64-linux-gnu/libc.so.6+0x9df69) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
    #2 0x7aa0d03d9544 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150
    #3 0x7aa0d03d96fa in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614
    #4 0x7aa0d03cb2b9 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53
    #5 0x7aa0d03c756a in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431
    #6 0x7aa0d0220a93  (/lib/x86_64-linux-gnu/libc.so.6+0x9ca93) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
    #7 0x7aa0d02adc3b  (/lib/x86_64-linux-gnu/libc.so.6+0x129c3b) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6)
    DEDUP_TOKEN: ___interceptor_realloc--pthread_getattr_np--__sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*)--__sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*)--__lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType)--ThreadStartFunc<false>----
    SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s).

  - Windows has been quite flaky since adding it to GitLab CI. No idea
    whether it's the same for GitHub Actions.

The thing is, the less reliable it becomes the more likely it is that
people are simply going to ignore its results.

I'll try to have a look at some of the problems on Monday.

Patrick

[1]: https://gitlab.com/git-scm/git/-/jobs/8671168627
[2]: https://gitlab.com/git-scm/git/-/jobs/8671168849
Patrick Steinhardt Dec. 28, 2024, 8:41 a.m. UTC | #6
On Fri, Dec 27, 2024 at 04:37:29PM -0500, Jeff King wrote:
> On Fri, Dec 27, 2024 at 12:08:03PM -0800, Junio C Hamano wrote:
> I doubt there is a way to tell the compiler that (or that a compiler
> could even switch to an unsigned ptrdiff type if it knew that). But I
> wonder if there is a generalized helper we can devise that would avoid
> simply casting here. I guess that could be a checked cast like:
> 
>   static inline size_t ptrdiff_to_size(ptrdiff_t v)
>   {
> 	if (v < 0)
> 		BUG("surprising negative value: %"PRIdMAX, v);
> 	return (size_t)v;
>   }
> 
> or even:
> 
>   static inline bool has_space(const void *vs, const void *ve, size_t want)
>   {
> 	const char *s = vs, e = ve;
> 	return want <= ptrdiff_to_size(ve - vs);
>   }
> 
> I don't love hiding basic things like this behind macros or inlines. But
> allocation and bounds comparisons do have gotchas (especially against an
> adversary that can try to create pathological situations). Maybe it's worth
> having an easy way to do them safely without having to think about each
> one. I dunno.

I think having a wrapper like `cast_ptrdiff_to_size_t()` would be a
sensible solution for now, also because it fits in nicely with
`cast_size_t_to_int()`. I'll introduce such a wrapper once I've got a
good excuse to do so.

Patrick
Junio C Hamano Dec. 28, 2024, 3:38 p.m. UTC | #7
Patrick Steinhardt <ps@pks.im> writes:

> Thanks for your fix. I'll have a look at whether I can include a 32 bit
> job into GitLab CI for improved test coverage here so that it does not
> fall on you to fix up things like this going forward.

I noticed it since it failed GitHub actions thing which already has 32-bit
job.

>   - t5616-partial-clone regularly fails on macOS. [1] This seems like a
>     race condition or to me:

I've seen it before as well at GitHub actions side.  Running
"t5616-*.sh --stress" locally on Debian (x86-64) did not help
isolate it very well.

>   - The leak-checking jobs fail quite regularly in t0003 with something
>     that feels like either a race caused by a leak or an issue with the
>     sanitizer itself [2]:

This one I am not aware of.

>   - Windows has been quite flaky since adding it to GitLab CI. No idea
>     whether it's the same for GitHub Actions.

Similar on GitHub CI front.  Not that I am playing favors between
GitHub and GitLab, but for historical reasons I've pushed to the
former myself but not to the latter, so I do not notice breakages on
the latter.


> The thing is, the less reliable it becomes the more likely it is that
> people are simply going to ignore its results.

Indeed.  Also, for macOS and Windows, I have no access to an
environment to let me debug, so it is really up to the platform
stakeholders to see what they can do to help.

Thanks.