Message ID | 20241227-b4-pks-commit-reach-sign-compare-v1-0-07c59c2aa632@pks.im (mailing list archive) |
---|---|
Headers | show |
Series | commit-reach: -Wsign-compare follow-ups | expand |
Speaking of -Wsign-compare (cf. [*1*]), we seem to be hitting this error on linux32 CI job [*2*]: Error: shallow.c:537:32: comparison of integer expressions of different signedness: 'unsigned int' and 'int' [-Werror=sign-compare] 537 | if (!info->pool_count || size > info->end - info->free) { I didn't dig deeper than this. After taking three tries to get 'seen' build with linux-meson job (needed merge-fix for ds/backfill topic, which needed (1) a new built-in hence a new entry in meson.build, (2) a new test hence a new entry in t/meson.build, and (3) a new doc hence a new entry in Documentation/meson.build), I am a bit exhausted right now. Also breakage of linux-meson job we have at 'master' seems gone, which probably has to do with your recent update with gitweb thing. Thanks. [References] *1* https://staticthinking.wordpress.com/2023/07/25/wsign-compare-is-garbage/ *2* https://github.com/git/git/actions/runs/12519901432/job/34924707204#step:6:181
Junio C Hamano <gitster@pobox.com> writes: > Error: shallow.c:537:32: comparison of integer expressions of different signedness: > 'unsigned int' and 'int' [-Werror=sign-compare] > > 537 | if (!info->pool_count || size > info->end - info->free) { > > I didn't dig deeper than this. What we want to express seems to be: If the region between "end" and "free" is smaller than the required "size" then we are in trouble. which can be said more naturally with If the region of size "size" starting at "free" pointer overruns the "end" pointer, we are in trouble. So perhaps something like this would help? We are no longer making a comparison between two integers with this rewrite. shallow.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git c/shallow.c w/shallow.c index b8fcfbef0f..b54244ffa9 100644 --- c/shallow.c +++ w/shallow.c @@ -534,7 +534,7 @@ static uint32_t *paint_alloc(struct paint_info *info) unsigned nr = DIV_ROUND_UP(info->nr_bits, 32); unsigned size = nr * sizeof(uint32_t); void *p; - if (!info->pool_count || size > info->end - info->free) { + if (!info->pool_count || info->end < info->free + size) { if (size > POOL_SIZE) BUG("pool size too small for %d in paint_alloc()", size);
On Fri, Dec 27, 2024 at 12:08:03PM -0800, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > Error: shallow.c:537:32: comparison of integer expressions of different signedness: > > 'unsigned int' and 'int' [-Werror=sign-compare] > > > > 537 | if (!info->pool_count || size > info->end - info->free) { > > > > I didn't dig deeper than this. > > What we want to express seems to be: > > If the region between "end" and "free" is smaller than the > required "size" then we are in trouble. > > which can be said more naturally with > > If the region of size "size" starting at "free" pointer overruns the > "end" pointer, we are in trouble. > > So perhaps something like this would help? We are no longer making > a comparison between two integers with this rewrite. > > shallow.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git c/shallow.c w/shallow.c > index b8fcfbef0f..b54244ffa9 100644 > --- c/shallow.c > +++ w/shallow.c > @@ -534,7 +534,7 @@ static uint32_t *paint_alloc(struct paint_info *info) > unsigned nr = DIV_ROUND_UP(info->nr_bits, 32); > unsigned size = nr * sizeof(uint32_t); > void *p; > - if (!info->pool_count || size > info->end - info->free) { > + if (!info->pool_count || info->end < info->free + size) { > if (size > POOL_SIZE) > BUG("pool size too small for %d in paint_alloc()", > size); I doubt it is a practical problem in this instance, but as a general rule, I think bounds checks like this should avoid using addition because it can lead to overflow. In this code, imagine that the free pool was allocated near the end of the address space, but somebody asked for a very large "size", causing the addition to wrap around. I think even without that overflow this might be undefined behavior, because "info->free + size" is forming a pointer that may be outside the allocated array. So as a general rule, bounds comparisons like this should be formulated as "how much free space do we have" compared to "how much are we asking for". And there "info->end - info->free" is the right way to ask the first half of that question. Now where it gets weird with -Wsign-compare is that the pointer difference is giving us a signed value. Which makes sense in the general case (you could ask for "info->free - info->end", for example). But we know that it will always be non-negative because we know that info->free is <= info->end, coming from earlier in the same allocation. I doubt there is a way to tell the compiler that (or that a compiler could even switch to an unsigned ptrdiff type if it knew that). But I wonder if there is a generalized helper we can devise that would avoid simply casting here. I guess that could be a checked cast like: static inline size_t ptrdiff_to_size(ptrdiff_t v) { if (v < 0) BUG("surprising negative value: %"PRIdMAX, v); return (size_t)v; } or even: static inline bool has_space(const void *vs, const void *ve, size_t want) { const char *s = vs, e = ve; return want <= ptrdiff_to_size(ve - vs); } I don't love hiding basic things like this behind macros or inlines. But allocation and bounds comparisons do have gotchas (especially against an adversary that can try to create pathological situations). Maybe it's worth having an easy way to do them safely without having to think about each one. I dunno. -Peff
Junio C Hamano <gitster@pobox.com> writes: > So perhaps something like this would help? We are no longer making > a comparison between two integers with this rewrite. ... and this gave us a first "pass" of the tip of 'seen' for all jobs since for quite some time, like a few weeks.
On Fri, Dec 27, 2024 at 04:00:38PM -0800, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > So perhaps something like this would help? We are no longer making > > a comparison between two integers with this rewrite. > > ... and this gave us a first "pass" of the tip of 'seen' for all > jobs since for quite some time, like a few weeks. Thanks for your fix. I'll have a look at whether I can include a 32 bit job into GitLab CI for improved test coverage here so that it does not fall on you to fix up things like this going forward. Overall, CI has been in pretty rough shape recently. Besides the mentioned failures there's also been the issue with flaky tests: - t5616-partial-clone regularly fails on macOS. [1] This seems like a race condition or to me: error: unable to open .git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack: No such file or directory fatal: unable to rename temporary file to '.git/objects/pack/pack-029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack' fatal: could not finish pack-objects to repack local links fatal: index-pack failed error: last command exited with $?=128 not ok 18 - fetch --refetch triggers repacking - The leak-checking jobs fail quite regularly in t0003 with something that feels like either a race caused by a leak or an issue with the sanitizer itself [2]: ==git==17055==ERROR: LeakSanitizer: detected memory leaks Direct leak of 32 byte(s) in 1 object(s) allocated from: #0 0x7aa0d03c7713 in realloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:98 #1 0x7aa0d0221f69 in pthread_getattr_np (/lib/x86_64-linux-gnu/libc.so.6+0x9df69) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6) #2 0x7aa0d03d9544 in __sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:150 #3 0x7aa0d03d96fa in __sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*) ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cpp:614 #4 0x7aa0d03cb2b9 in __lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType) ../../../../src/libsanitizer/lsan/lsan_posix.cpp:53 #5 0x7aa0d03c756a in ThreadStartFunc<false> ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:431 #6 0x7aa0d0220a93 (/lib/x86_64-linux-gnu/libc.so.6+0x9ca93) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6) #7 0x7aa0d02adc3b (/lib/x86_64-linux-gnu/libc.so.6+0x129c3b) (BuildId: 6d64b17fbac799e68da7ebd9985ddf9b5cb375e6) DEDUP_TOKEN: ___interceptor_realloc--pthread_getattr_np--__sanitizer::GetThreadStackTopAndBottom(bool, unsigned long*, unsigned long*)--__sanitizer::GetThreadStackAndTls(bool, unsigned long*, unsigned long*, unsigned long*, unsigned long*)--__lsan::ThreadStart(unsigned int, unsigned long long, __sanitizer::ThreadType)--ThreadStartFunc<false>---- SUMMARY: LeakSanitizer: 32 byte(s) leaked in 1 allocation(s). - Windows has been quite flaky since adding it to GitLab CI. No idea whether it's the same for GitHub Actions. The thing is, the less reliable it becomes the more likely it is that people are simply going to ignore its results. I'll try to have a look at some of the problems on Monday. Patrick [1]: https://gitlab.com/git-scm/git/-/jobs/8671168627 [2]: https://gitlab.com/git-scm/git/-/jobs/8671168849
On Fri, Dec 27, 2024 at 04:37:29PM -0500, Jeff King wrote: > On Fri, Dec 27, 2024 at 12:08:03PM -0800, Junio C Hamano wrote: > I doubt there is a way to tell the compiler that (or that a compiler > could even switch to an unsigned ptrdiff type if it knew that). But I > wonder if there is a generalized helper we can devise that would avoid > simply casting here. I guess that could be a checked cast like: > > static inline size_t ptrdiff_to_size(ptrdiff_t v) > { > if (v < 0) > BUG("surprising negative value: %"PRIdMAX, v); > return (size_t)v; > } > > or even: > > static inline bool has_space(const void *vs, const void *ve, size_t want) > { > const char *s = vs, e = ve; > return want <= ptrdiff_to_size(ve - vs); > } > > I don't love hiding basic things like this behind macros or inlines. But > allocation and bounds comparisons do have gotchas (especially against an > adversary that can try to create pathological situations). Maybe it's worth > having an easy way to do them safely without having to think about each > one. I dunno. I think having a wrapper like `cast_ptrdiff_to_size_t()` would be a sensible solution for now, also because it fits in nicely with `cast_size_t_to_int()`. I'll introduce such a wrapper once I've got a good excuse to do so. Patrick
Patrick Steinhardt <ps@pks.im> writes: > Thanks for your fix. I'll have a look at whether I can include a 32 bit > job into GitLab CI for improved test coverage here so that it does not > fall on you to fix up things like this going forward. I noticed it since it failed GitHub actions thing which already has 32-bit job. > - t5616-partial-clone regularly fails on macOS. [1] This seems like a > race condition or to me: I've seen it before as well at GitHub actions side. Running "t5616-*.sh --stress" locally on Debian (x86-64) did not help isolate it very well. > - The leak-checking jobs fail quite regularly in t0003 with something > that feels like either a race caused by a leak or an issue with the > sanitizer itself [2]: This one I am not aware of. > - Windows has been quite flaky since adding it to GitLab CI. No idea > whether it's the same for GitHub Actions. Similar on GitHub CI front. Not that I am playing favors between GitHub and GitLab, but for historical reasons I've pushed to the former myself but not to the latter, so I do not notice breakages on the latter. > The thing is, the less reliable it becomes the more likely it is that > people are simply going to ignore its results. Indeed. Also, for macOS and Windows, I have no access to an environment to let me debug, so it is really up to the platform stakeholders to see what they can do to help. Thanks.