Message ID | 20250205014055.737190-1-jelly.zhao.42@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [GSOC] apply: address -Wsign-comparison warnings | expand |
On Wed, Feb 05, 2025 at 01:40:55AM +0000, Zejun Zhao wrote: > There are several -Wsign-comparison warnings in "apply.c", which can be > classified into the following three types: > > 1. comparing a `length` of `size_t` type with a result of pointer > arithmetic of `int` type > > 2. comparing a `length` of `size_t` type with a `length` of `int` type > > 3. comparing a loop count `i` of `int` type with an unsigned loop > bound > > Fix these warnings following one basic principle: do not touch the > relevant logics and keep the behaviors of the code. Adopt three > different strategies for each of the above three types: > > 1. cast the result of pointer arithmetic to `size_t` type > > 2. try to change the type of the `length` to `size_t` (may fall back > to Strategy 1 if the variable is not guaranteed to be unsigned) > > 3. use a loop count `i` of `size_t` type You should split up this patch into a series, as it is really hard to follow what's going on. There are a couple of things happening: - You change types in `struct apply_state`, which bubbles up. - You adapt `git_hdr_len()` to receive different inputs, which bubbles up. - You perform small fixes in several places. It might also be a good idea to split out the loop counters into a separate commit, as those are trivially correct. > Signed-off-by: Zejun Zhao <jelly.zhao.42@gmail.com> > --- > apply.c | 73 +++++++++++++++++++++++++++------------------------------ > apply.h | 6 ++--- > 2 files changed, 38 insertions(+), 41 deletions(-) > > diff --git a/apply.c b/apply.c > index 4a7b6120ac..0c7b89dc3a 100644 > --- a/apply.c > +++ b/apply.c > @@ -8,7 +8,6 @@ > */ > > #define USE_THE_REPOSITORY_VARIABLE > -#define DISABLE_SIGN_COMPARE_WARNINGS > > #include "git-compat-util.h" > #include "abspath.h" > @@ -540,7 +539,7 @@ static size_t date_len(const char *line, size_t len) > !isdigit(*p++) || !isdigit(*p++)) /* Not a date. */ > return 0; > > - if (date - line >= strlen("19") && > + if ((size_t) (date - line) >= strlen("19") && We know that `date` is always bigger than or equal to `line`, so this is correct. > @@ -1087,11 +1086,11 @@ static int gitdiff_index(struct gitdiff_data *state, > * and optional space with octal mode. > */ > const char *ptr, *eol; > - int len; > - const unsigned hexsz = the_hash_algo->hexsz; > + size_t len; > + const size_t hexsz = the_hash_algo->hexsz; The change to `hexsz` shouldn't be needed, even if it makes us match the type of `hexsz` as declared in `git_hash_algo`. > ptr = strchr(line, '.'); > - if (!ptr || ptr[1] != '.' || hexsz < ptr - line) > + if (!ptr || ptr[1] != '.' || hexsz < (size_t) (ptr - line)) `ptr` is the reline of `strrchr(line)`, so it must be either `NULL` or greater than or equal to `line`, so this cast is fine. > @@ -1158,7 +1157,7 @@ static const char *skip_tree_prefix(int p_value, > */ > static char *git_header_name(int p_value, > const char *line, > - int llen) > + size_t llen) > { > const char *name; > const char *second = NULL; It would make sense to split this change out into a separate commit, as it bubbles up into calling functions, as well. > @@ -1207,7 +1206,7 @@ static char *git_header_name(int p_value, > cp = skip_tree_prefix(p_value, second, line + llen - second); > if (!cp) > goto free_and_fail1; > - if (line + llen - cp != first.len || > + if ((size_t) (line + llen - cp) != first.len || > memcmp(first.buf, cp, first.len)) > goto free_and_fail1; > return strbuf_detach(&first, NULL); This cast should be fine, too. > @@ -1240,7 +1239,7 @@ static char *git_header_name(int p_value, > goto free_and_fail2; > > len = sp.buf + sp.len - np; > - if (len < second - name && > + if (len < (size_t) (second - name) && > !strncmp(np, name, len) && > isspace(name[len])) { > /* Good */ This one, too. `second` is iterating through `name`, so it's always greater or equal to `name`. > @@ -1371,14 +1370,13 @@ int parse_git_diff_header(struct strbuf *root, > { "index ", gitdiff_index }, > { "", gitdiff_unrecognized }, > }; > - int i; > > len = linelen(line, size); > if (!len || line[len-1] != '\n') > break; > - for (i = 0; i < ARRAY_SIZE(optable); i++) { > + for (size_t i = 0; i < ARRAY_SIZE(optable); i++) { > const struct opentry *p = optable + i; > - int oplen = strlen(p->str); > + size_t oplen = strlen(p->str); > int res; > if (len < oplen || memcmp(p->str, line, oplen)) > continue; Makes sense. > @@ -1592,7 +1590,7 @@ static int find_header(struct apply_state *state, > size, patch); > if (git_hdr_len < 0) > return -128; > - if (git_hdr_len <= len) > + if ((size_t) git_hdr_len <= len) > continue; > *hdrsize = git_hdr_len; > return offset; We've asserted that `git_hdr_len` isn't negative, so the cast is fine. > @@ -2185,7 +2182,7 @@ static int parse_chunk(struct apply_state *state, char *buffer, unsigned long si > }; > int i; This may arguably be `size_t`, as well. > @@ -2257,12 +2255,12 @@ static void show_stats(struct apply_state *state, struct patch *patch) > } > > if (patch->is_binary) { > - printf(" %-*s | Bin\n", max, qname.buf); > + printf(" %-*s | Bin\n", (int) max, qname.buf); > strbuf_release(&qname); > return; > } > > - printf(" %-*s |", max, qname.buf); > + printf(" %-*s |", (int) max, qname.buf); > strbuf_release(&qname); > > /* Do we _know_ that `max` fits into an `int`? Patrick
> @@ -1087,11 +1086,11 @@ static int gitdiff_index(struct gitdiff_data *state, > * and optional space with octal mode. > */ > const char *ptr, *eol; > - int len; > - const unsigned hexsz = the_hash_algo->hexsz; > + size_t len; > + const size_t hexsz = the_hash_algo->hexsz; I thought that I already saw this discussed in another thread. The .hexsz of any hash algorithm would never be larger than what a platform natural "unsigned" integer type can hold, so using size_t for the member _is_ the wrong thing to do and the fix may be the other way around, no? There are genuinely good changes (correcting assigned variable from int to size_t when the value ultimately came from a system function that returns size_t) in this patch, but there are other annoying ones like these, so I am not sure. > ptr = strchr(line, '.'); > - if (!ptr || ptr[1] != '.' || hexsz < ptr - line) > + if (!ptr || ptr[1] != '.' || hexsz < (size_t) (ptr - line)) Is this about -Wsign-compare complaining about size_t vs ptrdiff_t? > @@ -1207,7 +1206,7 @@ static char *git_header_name(int p_value, > cp = skip_tree_prefix(p_value, second, line + llen - second); > if (!cp) > goto free_and_fail1; > - if (line + llen - cp != first.len || > + if ((size_t) (line + llen - cp) != first.len || Ditto. > - if (len < second - name && > + if (len < (size_t) (second - name) && Ditto. I said this before, but I am not sure if being strict about "-Wsign-compare" is really buying us much. If we are getting so many false positives that need to be squelched by churning the code with so many typecasts in order to find a new and real problem, is it really worth it?
Thank you for reviewing this patch. On Wed, Feb 05, 2025 08:47:06 +0100, Patrick Steinhardt wrote: > You should split up this patch into a series, as it is really hard to > follow what's going on. There are a couple of things happening: > > - You change types in `struct apply_state`, which bubbles up. > > - You adapt `git_hdr_len()` to receive different inputs, which bubbles > up. > > - You perform small fixes in several places. > > It might also be a good idea to split out the loop counters into a > separate commit, as those are trivially correct. Sure I'll come up with a v2 patch series, in which each kind of fixes will be put in a single commit and I'll state why I believe the type cast/change is safe for every single fix in the commit message. > > @@ -1087,11 +1086,11 @@ static int gitdiff_index(struct gitdiff_data *state, > > * and optional space with octal mode. > > */ > > const char *ptr, *eol; > > - int len; > > - const unsigned hexsz = the_hash_algo->hexsz; > > + size_t len; > > + const size_t hexsz = the_hash_algo->hexsz; > > The change to `hexsz` shouldn't be needed, even if it makes us match the > type of `hexsz` as declared in `git_hash_algo`. Yes it's not necessary here to change the type. And for the `hexsz` stuff, on Wed, Feb 05 2025 04:58:57 -0800, Junio C Hamano wrote, > I thought that I already saw this discussed in another thread. > > The .hexsz of any hash algorithm would never be larger than what a > platform natural "unsigned" integer type can hold, so using size_t > for the member _is_ the wrong thing to do and the fix may be the > other way around, no? I found the discussion mentioned at [1]. It seems like the change here only makes things worse so I'll see if I'd leave it untouched or change the type of `.hexsz` member to `int` or something. [1] https://lore.kernel.org/git/xmqqttaqw2eb.fsf@gitster.g/ > > @@ -2185,7 +2182,7 @@ static int parse_chunk(struct apply_state *state, char *buffer, unsigned long si > > }; > > int i; > > This may arguably be `size_t`, as well. It's OK for me to use a `size_t` loop count everywhere but I tried to keep the changes in this patch minimal (forget about the `hexsz` thing). I could apply this change if you insist. > > @@ -2257,12 +2255,12 @@ static void show_stats(struct apply_state *state, struct patch *patch) > > } > > > > if (patch->is_binary) { > > - printf(" %-*s | Bin\n", max, qname.buf); > > + printf(" %-*s | Bin\n", (int) max, qname.buf); > > strbuf_release(&qname); > > return; > > } > > > > - printf(" %-*s |", max, qname.buf); > > + printf(" %-*s |", (int) max, qname.buf); > > strbuf_release(&qname); > > > > /* > > Do we _know_ that `max` fits into an `int`? Yep we've set an upper bound for `max` before: > /* > * "scale" the filename > */ > max = state->max_len; > if (max > 50) > max = 50; so it must fit into an `int`.
Thank you for reviewing this patch. On Wed, Feb 05 2025 04:58:57 -0800, Junio C Hamano wrote, > > @@ -1087,11 +1086,11 @@ static int gitdiff_index(struct gitdiff_data *state, > > * and optional space with octal mode. > > */ > > const char *ptr, *eol; > > - int len; > > - const unsigned hexsz = the_hash_algo->hexsz; > > + size_t len; > > + const size_t hexsz = the_hash_algo->hexsz; > > I thought that I already saw this discussed in another thread. > > The .hexsz of any hash algorithm would never be larger than what a > platform natural "unsigned" integer type can hold, so using size_t > for the member _is_ the wrong thing to do and the fix may be the > other way around, no? Sorry I didn't saw the comment at [1]. You are right about this. I prefer changing types of the `git_hash_algo::*sz` family to `unsigned`. [1] https://lore.kernel.org/git/xmqqttaqw2eb.fsf@gitster.g/ > > ptr = strchr(line, '.'); > > - if (!ptr || ptr[1] != '.' || hexsz < ptr - line) > > + if (!ptr || ptr[1] != '.' || hexsz < (size_t) (ptr - line)) > > Is this about -Wsign-compare complaining about size_t vs ptrdiff_t? > > > @@ -1207,7 +1206,7 @@ static char *git_header_name(int p_value, > > cp = skip_tree_prefix(p_value, second, line + llen - second); > > if (!cp) > > goto free_and_fail1; > > - if (line + llen - cp != first.len || > > + if ((size_t) (line + llen - cp) != first.len || > > Ditto. > > > - if (len < second - name && > > + if (len < (size_t) (second - name) && > > Ditto. Yes it's comparing unsigned (`size_t`) with signed (`ptrdiff_t`). > I said this before, but I am not sure if being strict about > "-Wsign-compare" is really buying us much. If we are getting so > many false positives that need to be squelched by churning the code > with so many typecasts in order to find a new and real problem, > is it really worth it? Well, I contributed most before to a Rust OS kernel project so honestly I'm used to the idea of explicit type casting when necessary to avoid possible bugs. However, this is non-trivial. People should understand each relevant variable's semantics to decide what its type should be and do typecast on top of that. Even if it's already made a gradual process, it may still cost much from both the contributors and the reviewers but brings benefits that do not match up. Maybe a clear commit message that states why each type cast/change makes sense can help with the iteration? I'm not quite sure about that.
diff --git a/apply.c b/apply.c index 4a7b6120ac..0c7b89dc3a 100644 --- a/apply.c +++ b/apply.c @@ -8,7 +8,6 @@ */ #define USE_THE_REPOSITORY_VARIABLE -#define DISABLE_SIGN_COMPARE_WARNINGS #include "git-compat-util.h" #include "abspath.h" @@ -540,7 +539,7 @@ static size_t date_len(const char *line, size_t len) !isdigit(*p++) || !isdigit(*p++)) /* Not a date. */ return 0; - if (date - line >= strlen("19") && + if ((size_t) (date - line) >= strlen("19") && isdigit(date[-1]) && isdigit(date[-2])) /* 4-digit year */ date -= strlen("19"); @@ -1087,11 +1086,11 @@ static int gitdiff_index(struct gitdiff_data *state, * and optional space with octal mode. */ const char *ptr, *eol; - int len; - const unsigned hexsz = the_hash_algo->hexsz; + size_t len; + const size_t hexsz = the_hash_algo->hexsz; ptr = strchr(line, '.'); - if (!ptr || ptr[1] != '.' || hexsz < ptr - line) + if (!ptr || ptr[1] != '.' || hexsz < (size_t) (ptr - line)) return 0; len = ptr - line; memcpy(patch->old_oid_prefix, line, len); @@ -1158,7 +1157,7 @@ static const char *skip_tree_prefix(int p_value, */ static char *git_header_name(int p_value, const char *line, - int llen) + size_t llen) { const char *name; const char *second = NULL; @@ -1207,7 +1206,7 @@ static char *git_header_name(int p_value, cp = skip_tree_prefix(p_value, second, line + llen - second); if (!cp) goto free_and_fail1; - if (line + llen - cp != first.len || + if ((size_t) (line + llen - cp) != first.len || memcmp(first.buf, cp, first.len)) goto free_and_fail1; return strbuf_detach(&first, NULL); @@ -1240,7 +1239,7 @@ static char *git_header_name(int p_value, goto free_and_fail2; len = sp.buf + sp.len - np; - if (len < second - name && + if (len < (size_t) (second - name) && !strncmp(np, name, len) && isspace(name[len])) { /* Good */ @@ -1317,7 +1316,7 @@ int parse_git_diff_header(struct strbuf *root, int *linenr, int p_value, const char *line, - int len, + size_t len, unsigned int size, struct patch *patch) { @@ -1371,14 +1370,13 @@ int parse_git_diff_header(struct strbuf *root, { "index ", gitdiff_index }, { "", gitdiff_unrecognized }, }; - int i; len = linelen(line, size); if (!len || line[len-1] != '\n') break; - for (i = 0; i < ARRAY_SIZE(optable); i++) { + for (size_t i = 0; i < ARRAY_SIZE(optable); i++) { const struct opentry *p = optable + i; - int oplen = strlen(p->str); + size_t oplen = strlen(p->str); int res; if (len < oplen || memcmp(p->str, line, oplen)) continue; @@ -1592,7 +1590,7 @@ static int find_header(struct apply_state *state, size, patch); if (git_hdr_len < 0) return -128; - if (git_hdr_len <= len) + if ((size_t) git_hdr_len <= len) continue; *hdrsize = git_hdr_len; return offset; @@ -2097,7 +2095,6 @@ static void add_name_limit(struct apply_state *state, static int use_patch(struct apply_state *state, struct patch *p) { const char *pathname = p->new_name ? p->new_name : p->old_name; - int i; /* Paths outside are not touched regardless of "--include" */ if (state->prefix && *state->prefix) { @@ -2107,7 +2104,7 @@ static int use_patch(struct apply_state *state, struct patch *p) } /* See if it matches any of exclude/include rule */ - for (i = 0; i < state->limit_by_name.nr; i++) { + for (size_t i = 0; i < state->limit_by_name.nr; i++) { struct string_list_item *it = &state->limit_by_name.items[i]; if (!wildmatch(it->string, pathname, 0)) return (it->util != NULL); @@ -2185,7 +2182,7 @@ static int parse_chunk(struct apply_state *state, char *buffer, unsigned long si }; int i; for (i = 0; binhdr[i]; i++) { - int len = strlen(binhdr[i]); + size_t len = strlen(binhdr[i]); if (len < size - hd && !memcmp(binhdr[i], buffer + hd, len)) { state->linenr++; @@ -2238,7 +2235,8 @@ static void show_stats(struct apply_state *state, struct patch *patch) { struct strbuf qname = STRBUF_INIT; char *cp = patch->new_name ? patch->new_name : patch->old_name; - int max, add, del; + size_t max; + int add, del; quote_c_style(cp, &qname, NULL, 0); @@ -2257,12 +2255,12 @@ static void show_stats(struct apply_state *state, struct patch *patch) } if (patch->is_binary) { - printf(" %-*s | Bin\n", max, qname.buf); + printf(" %-*s | Bin\n", (int) max, qname.buf); strbuf_release(&qname); return; } - printf(" %-*s |", max, qname.buf); + printf(" %-*s |", (int) max, qname.buf); strbuf_release(&qname); /* @@ -2319,7 +2317,7 @@ static void update_pre_post_images(struct image *preimage, { struct image fixed_preimage = IMAGE_INIT; size_t insert_pos = 0; - int i, ctx, reduced; + size_t ctx, reduced; const char *fixed; /* @@ -2328,7 +2326,7 @@ static void update_pre_post_images(struct image *preimage, * free "oldlines". */ image_prepare(&fixed_preimage, buf, len, 1); - for (i = 0; i < fixed_preimage.line_nr; i++) + for (size_t i = 0; i < fixed_preimage.line_nr; i++) fixed_preimage.line[i].flag = preimage->line[i].flag; image_clear(preimage); *preimage = fixed_preimage; @@ -2337,7 +2335,7 @@ static void update_pre_post_images(struct image *preimage, /* * Adjust the common context lines in postimage. */ - for (i = reduced = ctx = 0; i < postimage->line_nr; i++) { + for (size_t i = reduced = ctx = 0; i < postimage->line_nr; i++) { size_t l_len = postimage->line[i].len; if (!(postimage->line[i].flag & LINE_COMMON)) { @@ -2417,9 +2415,9 @@ static int line_by_line_fuzzy_match(struct image *img, struct image *postimage, unsigned long current, int current_lno, - int preimage_limit) + size_t preimage_limit) { - int i; + size_t i; size_t imgoff = 0; size_t preoff = 0; size_t extra_chars; @@ -2486,12 +2484,12 @@ static int match_fragment(struct apply_state *state, unsigned ws_rule, int match_beginning, int match_end) { - int i; + size_t i; const char *orig, *target; struct strbuf fixed = STRBUF_INIT; char *fixed_buf; size_t fixed_len; - int preimage_limit; + size_t preimage_limit; int ret; if (preimage->line_nr + current_lno <= img->line_nr) { @@ -2663,12 +2661,11 @@ static int match_fragment(struct apply_state *state, for ( ; i < preimage->line_nr; i++) { size_t fixstart = fixed.len; /* start of the fixed preimage */ size_t oldlen = preimage->line[i].len; - int j; /* Try fixing the line in the preimage */ ws_fix_copy(&fixed, orig, oldlen, ws_rule, NULL); - for (j = fixstart; j < fixed.len; j++) { + for (size_t j = fixstart; j < fixed.len; j++) { if (!isspace(fixed.buf[j])) { ret = 0; goto out; @@ -2705,7 +2702,7 @@ static int find_pos(struct apply_state *state, { int i; unsigned long backwards, forwards, current; - int backwards_lno, forwards_lno, current_lno; + size_t backwards_lno, forwards_lno, current_lno; /* * When running with --allow-overlap, it is possible that a hunk is @@ -2790,7 +2787,7 @@ static int find_pos(struct apply_state *state, */ static void update_image(struct apply_state *state, struct image *img, - int applied_pos, + size_t applied_pos, struct image *preimage, struct image *postimage) { @@ -2798,11 +2795,11 @@ static void update_image(struct apply_state *state, * remove the copy of preimage at offset in img * and replace it with postimage */ - int i, nr; + int nr; size_t remove_count, insert_count, applied_at = 0; size_t result_alloc; char *result; - int preimage_limit; + size_t preimage_limit; /* * If we are removing blank lines at the end of img, @@ -2817,11 +2814,11 @@ static void update_image(struct apply_state *state, if (preimage_limit > img->line_nr - applied_pos) preimage_limit = img->line_nr - applied_pos; - for (i = 0; i < applied_pos; i++) + for (size_t i = 0; i < applied_pos; i++) applied_at += img->line[i].len; remove_count = 0; - for (i = 0; i < preimage_limit; i++) + for (size_t i = 0; i < preimage_limit; i++) remove_count += img->line[applied_pos + i].len; insert_count = postimage->buf.len; @@ -2850,7 +2847,7 @@ static void update_image(struct apply_state *state, img->line_nr - (applied_pos + preimage_limit)); COPY_ARRAY(img->line + applied_pos, postimage->line, postimage->line_nr); if (!state->allow_overlap) - for (i = 0; i < postimage->line_nr; i++) + for (size_t i = 0; i < postimage->line_nr; i++) img->line[applied_pos + i].flag |= LINE_PATCHED; img->line_nr = nr; } @@ -4287,19 +4284,19 @@ static void summary_patch_list(struct patch *patch) static void patch_stats(struct apply_state *state, struct patch *patch) { - int lines = patch->lines_added + patch->lines_deleted; + unsigned lines = patch->lines_added + patch->lines_deleted; if (lines > state->max_change) state->max_change = lines; if (patch->old_name) { - int len = quote_c_style(patch->old_name, NULL, NULL, 0); + size_t len = quote_c_style(patch->old_name, NULL, NULL, 0); if (!len) len = strlen(patch->old_name); if (len > state->max_len) state->max_len = len; } if (patch->new_name) { - int len = quote_c_style(patch->new_name, NULL, NULL, 0); + size_t len = quote_c_style(patch->new_name, NULL, NULL, 0); if (!len) len = strlen(patch->new_name); if (len > state->max_len) diff --git a/apply.h b/apply.h index 90e887ec0e..f53fbb1777 100644 --- a/apply.h +++ b/apply.h @@ -90,8 +90,8 @@ struct apply_state { * we've seen, and the longest filename. That allows us to do simple * scaling. */ - int max_change; - int max_len; + unsigned max_change; + size_t max_len; /* * Records filenames that have been touched, in order to handle @@ -170,7 +170,7 @@ int parse_git_diff_header(struct strbuf *root, int *linenr, int p_value, const char *line, - int len, + size_t len, unsigned int size, struct patch *patch);
There are several -Wsign-comparison warnings in "apply.c", which can be classified into the following three types: 1. comparing a `length` of `size_t` type with a result of pointer arithmetic of `int` type 2. comparing a `length` of `size_t` type with a `length` of `int` type 3. comparing a loop count `i` of `int` type with an unsigned loop bound Fix these warnings following one basic principle: do not touch the relevant logics and keep the behaviors of the code. Adopt three different strategies for each of the above three types: 1. cast the result of pointer arithmetic to `size_t` type 2. try to change the type of the `length` to `size_t` (may fall back to Strategy 1 if the variable is not guaranteed to be unsigned) 3. use a loop count `i` of `size_t` type Signed-off-by: Zejun Zhao <jelly.zhao.42@gmail.com> --- apply.c | 73 +++++++++++++++++++++++++++------------------------------ apply.h | 6 ++--- 2 files changed, 38 insertions(+), 41 deletions(-) base-commit: bc204b742735ae06f65bb20291c95985c9633b7f