Message ID | 51ee5660a1452797ac0a45819210141c57f3dcb9.1716983704.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Compile with `-Wwrite-strings` | expand |
Patrick Steinhardt <ps@pks.im> writes: > + char refspec_str[] = "refs/heads/*"; > > memset(&refspec, 0, sizeof(refspec)); > refspec.force = 0; > refspec.pattern = 1; > - refspec.src = refspec.dst = "refs/heads/*"; > + refspec.src = refspec.dst = refspec_str; > get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0); > matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"), > fetch_map, 1); I have no objections to changes along this line, but this makes me wonder if we somehow have ways to ensure that after everything is done, refspec_str[], empty_str[], and the like in other hunks remain to have their initial contents. Stepping back a bit, without applying this series, if we told the compiler to store these constant strings in read-only segment, any attempt to write into refspec.src or refspec.dst string would have been caught at runtime as an error. With the patch, that would no longer work. The piece of memory in refspec_str[] is a fair game to be overwritten by anybody. Which makes me wonder why these refspec.src and refspec.dst members are not "const char *" pointers in the first place? Obviously we do not expect "refs/heads/*" to be overwritten after storing the pointer to it in these members and making the get_fetch_map() call.
On Wed, May 29, 2024 at 10:25:47AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > + char refspec_str[] = "refs/heads/*"; > > > > memset(&refspec, 0, sizeof(refspec)); > > refspec.force = 0; > > refspec.pattern = 1; > > - refspec.src = refspec.dst = "refs/heads/*"; > > + refspec.src = refspec.dst = refspec_str; > > get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0); > > matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"), > > fetch_map, 1); > > I have no objections to changes along this line, but this makes me > wonder if we somehow have ways to ensure that after everything is > done, refspec_str[], empty_str[], and the like in other hunks remain > to have their initial contents. Stepping back a bit, without > applying this series, if we told the compiler to store these > constant strings in read-only segment, any attempt to write into > refspec.src or refspec.dst string would have been caught at runtime > as an error. With the patch, that would no longer work. The piece > of memory in refspec_str[] is a fair game to be overwritten by > anybody. True. Ideally, we wouldn't have to do these acrobatics here in the first place. That would likely require quite a lot more work and go beyond the scope of this patch series. > Which makes me wonder why these refspec.src and refspec.dst members > are not "const char *" pointers in the first place? Obviously we do > not expect "refs/heads/*" to be overwritten after storing the pointer > to it in these members and making the get_fetch_map() call. Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It does weird stuff where it writes the result into either `src` or `dst` depending on which of these fields is provided by the caller. Which means that one of the fields would typically be a constant, whereas the other one will be allocated. Really, once you start looking you find all kinds of weird interfaces where we cannot quite make up our mind whether fields are controlled by the caller or by the callee. This patch series certainly surfaces quite some places that left me scratching my head. Patrick
Patrick Steinhardt <ps@pks.im> writes: > Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It > does weird stuff where it writes the result into either `src` or `dst` > depending on which of these fields is provided by the caller. Which > means that one of the fields would typically be a constant, whereas the > other one will be allocated. Yes, <src, dst> is used as a pair of <key, value> to query one with the other (i.e. "where does this one go?" "where does this one come from?"). But we are not talking about const-ness of the member (iow, once you point a string with the member, you cannot repoint the pointer to another string), but we are talking about const-ness of the string that is pointed by the member (iow, not "char const *src" but "const char *src"), no? If I ask "I have this src, where does it go?" with a refspec element filled with src, the dst member may need to be updated to point at the string that is the answer of the query, but that still can be done with "const char *src, *dst", can't it? That was what I was wondering. And again you are conflating "allocated" with "read-write" here. It is often convenient if a variable that points at an allocated string is of type "char *" and not "const char *", because you do not cast it when calling free(). But if you want to make a structure member or a variable that holds an allocated string responsible for _owning_ the piece of memory, then you need to consistently have the member point at an allocated piece of memory (or NULL), no? What this patch does, i.e. prepare an on-stack refspec_str[] array that is initialized from a constant string, and have .src member point at it, would not make .src freeable. In other words, .src pointing at an allocated piece of string "some of the time" alone is not a good justification to make it a non-const pointer, I would think.
On Thu, May 30, 2024 at 12:38:45PM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It > > does weird stuff where it writes the result into either `src` or `dst` > > depending on which of these fields is provided by the caller. Which > > means that one of the fields would typically be a constant, whereas the > > other one will be allocated. > > Yes, <src, dst> is used as a pair of <key, value> to query one with > the other (i.e. "where does this one go?" "where does this one come > from?"). > > But we are not talking about const-ness of the member (iow, once you > point a string with the member, you cannot repoint the pointer to > another string), but we are talking about const-ness of the string > that is pointed by the member (iow, not "char const *src" but "const > char *src"), no? If I ask "I have this src, where does it go?" with > a refspec element filled with src, the dst member may need to be > updated to point at the string that is the answer of the query, but > that still can be done with "const char *src, *dst", can't it? That > was what I was wondering. Well, yes, the pointers can be updated. But that doesn't solve the underlying problem that the interface is not well-prepared to track ownership of its inputs and outputs. The root problem is that `struct refspec_item` is being abused for multiple purposes. One of these purposes it to store parsed refspec items. Another purpose is to actually serve as a query. The requirements for those two purposes regarding the lifetime of associated strings is different: - When using it for storage purposes the strings should be owned by the structure itself. Consequently, the lifetime should be managed by the structure too, namely via `refspec_item_clear()`. - When using it for parsing purposes the input string is owned by the caller, whereas the output string is owned by the callee and then handed over to the caller. Here we cannot rely on the structure itself to manage the lifetimes of both strings, at least not in the current form. Now of course, we could amend `struct refspec_item` to grow two additional fields `src_to_free` and `dst_to_free`. But that only doubles down on this misdesigned interface. The proper solution in my opinion is to detangle those two usecases and split them out into two independent interfaces. > And again you are conflating "allocated" with "read-write" here. It > is often convenient if a variable that points at an allocated string > is of type "char *" and not "const char *", because you do not cast > it when calling free(). But if you want to make a structure member > or a variable that holds an allocated string responsible for > _owning_ the piece of memory, then you need to consistently have the > member point at an allocated piece of memory (or NULL), no? What > this patch does, i.e. prepare an on-stack refspec_str[] array that > is initialized from a constant string, and have .src member point at > it, would not make .src freeable. In other words, .src pointing at > an allocated piece of string "some of the time" alone is not a good > justification to make it a non-const pointer, I would think. That's fair enough. I'm trying to work around a broken interface, but do not solve the underlying issue. Patrick
On Fri, May 31, 2024 at 03:00:36PM +0200, Patrick Steinhardt wrote: > On Thu, May 30, 2024 at 12:38:45PM -0700, Junio C Hamano wrote: > > Patrick Steinhardt <ps@pks.im> writes: > > > > > Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It > > > does weird stuff where it writes the result into either `src` or `dst` > > > depending on which of these fields is provided by the caller. Which > > > means that one of the fields would typically be a constant, whereas the > > > other one will be allocated. > > > > Yes, <src, dst> is used as a pair of <key, value> to query one with > > the other (i.e. "where does this one go?" "where does this one come > > from?"). > > > > But we are not talking about const-ness of the member (iow, once you > > point a string with the member, you cannot repoint the pointer to > > another string), but we are talking about const-ness of the string > > that is pointed by the member (iow, not "char const *src" but "const > > char *src"), no? If I ask "I have this src, where does it go?" with > > a refspec element filled with src, the dst member may need to be > > updated to point at the string that is the answer of the query, but > > that still can be done with "const char *src, *dst", can't it? That > > was what I was wondering. > > Well, yes, the pointers can be updated. But that doesn't solve the > underlying problem that the interface is not well-prepared to track > ownership of its inputs and outputs. > > The root problem is that `struct refspec_item` is being abused for > multiple purposes. One of these purposes it to store parsed refspec > items. Another purpose is to actually serve as a query. The requirements > for those two purposes regarding the lifetime of associated strings is > different: > > - When using it for storage purposes the strings should be owned by > the structure itself. Consequently, the lifetime should be managed > by the structure too, namely via `refspec_item_clear()`. > > - When using it for parsing purposes the input string is owned by the > caller, whereas the output string is owned by the callee and then > handed over to the caller. Here we cannot rely on the structure > itself to manage the lifetimes of both strings, at least not in the > current form. > > Now of course, we could amend `struct refspec_item` to grow two > additional fields `src_to_free` and `dst_to_free`. But that only doubles > down on this misdesigned interface. > > The proper solution in my opinion is to detangle those two usecases and > split them out into two independent interfaces. > > > And again you are conflating "allocated" with "read-write" here. It > > is often convenient if a variable that points at an allocated string > > is of type "char *" and not "const char *", because you do not cast > > it when calling free(). But if you want to make a structure member > > or a variable that holds an allocated string responsible for > > _owning_ the piece of memory, then you need to consistently have the > > member point at an allocated piece of memory (or NULL), no? What > > this patch does, i.e. prepare an on-stack refspec_str[] array that > > is initialized from a constant string, and have .src member point at > > it, would not make .src freeable. In other words, .src pointing at > > an allocated piece of string "some of the time" alone is not a good > > justification to make it a non-const pointer, I would think. > > That's fair enough. I'm trying to work around a broken interface, but do > not solve the underlying issue. > > Patrick In any case, I have now invested some more time into the individual sites where I was converting to use on-stack strings. In many of the cases we were indeed able to avoid the issue altogether without too much of a hassle. The end result is more pleasing indeed, I'd say. Patrick
Patrick Steinhardt <ps@pks.im> writes: > On Thu, May 30, 2024 at 12:38:45PM -0700, Junio C Hamano wrote: >> Patrick Steinhardt <ps@pks.im> writes: >> >> > Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It >> > does weird stuff where it writes the result into either `src` or `dst` >> > depending on which of these fields is provided by the caller. Which >> > means that one of the fields would typically be a constant, whereas the >> > other one will be allocated. > ... > Well, yes, the pointers can be updated. But that doesn't solve the > underlying problem that the interface is not well-prepared to track > ownership of its inputs and outputs. Oh, absolutely. That is exactly why I wondered why .src and .dst need to be "char *" and not "const char *". If the piece of memory these members point at are never modified via these pointers, marking them as "const char *" would help the compilers help us by catching when we try to break the promise. Sometimes, the "constness of a pointer should help compilers help us avoid modifying a piece of memory we promised not to modify through the pointer" contradicts with the idea of (ab)using the constness to signal ownership, i.e., "const pointers point at something owned by somebody else, but via non-const pointers you always own the memory". I wonder if we can do something to separate these two concerns apart, using a trick similar to what we often use with an extra variable "to_free". Doing so would bloat the refspec_item, but unlike the references themselves, there won't be thousands of them, so it may not be an issue, perhaps? >> And again you are conflating "allocated" with "read-write" here. It >> is often convenient if a variable that points at an allocated string >> is of type "char *" and not "const char *", because you do not cast >> it when calling free(). But if you want to make a structure member >> or a variable that holds an allocated string responsible for >> _owning_ the piece of memory, then you need to consistently have the >> member point at an allocated piece of memory (or NULL), no? What >> this patch does, i.e. prepare an on-stack refspec_str[] array that >> is initialized from a constant string, and have .src member point at >> it, would not make .src freeable. In other words, .src pointing at >> an allocated piece of string "some of the time" alone is not a good >> justification to make it a non-const pointer, I would think. > > That's fair enough. I'm trying to work around a broken interface, but do > not solve the underlying issue.
Patrick Steinhardt <ps@pks.im> writes: > In any case, I have now invested some more time into the individual > sites where I was converting to use on-stack strings. In many of the > cases we were indeed able to avoid the issue altogether without too much > of a hassle. The end result is more pleasing indeed, I'd say. Nice to hear that. Thanks.
On Fri, May 31, 2024 at 08:27:13AM -0700, Junio C Hamano wrote: > I wonder if we can do something to separate these two concerns > apart, using a trick similar to what we often use with an extra > variable "to_free". Doing so would bloat the refspec_item, but > unlike the references themselves, there won't be thousands of them, > so it may not be an issue, perhaps? I had a similar thought while looking at this spot a while ago, so I dug this attempt out of my stash. It's quite ugly, as you need to keep the storage pointer and the const pointer in sync. Especially because there's a lot of clever pointer indirection via match_name_with_pattern(). So I don't think it's the best way to refactor this code, but I present it here for your amusement/disgust. This is on top of what you have queued in ps/no-writable-strings. --- branch.c | 6 +++--- builtin/fetch.c | 8 ++++---- builtin/remote.c | 6 +++--- checkout.c | 13 +++++++------ refspec.c | 6 ++++-- refspec.h | 6 ++++-- remote.c | 30 +++++++++++++++++------------- transport.c | 2 +- 8 files changed, 43 insertions(+), 34 deletions(-) diff --git a/branch.c b/branch.c index df5d24fec6..e1585b10c2 100644 --- a/branch.c +++ b/branch.c @@ -38,7 +38,7 @@ static int find_tracked_branch(struct remote *remote, void *priv) if (!remote_find_tracking(remote, &tracking->spec)) { switch (++tracking->matches) { case 1: - string_list_append_nodup(tracking->srcs, tracking->spec.src); + string_list_append_nodup(tracking->srcs, tracking->spec.src_storage); tracking->remote = remote->name; break; case 2: @@ -47,7 +47,7 @@ static int find_tracked_branch(struct remote *remote, void *priv) /* fall through */ default: string_list_append(&ftb->ambiguous_remotes, remote->name); - free(tracking->spec.src); + free(tracking->spec.src_storage); string_list_clear(tracking->srcs, 0); break; } @@ -489,7 +489,7 @@ static int check_tracking_branch(struct remote *remote, void *cb_data) memset(&query, 0, sizeof(struct refspec_item)); query.dst = tracking_branch; res = !remote_find_tracking(remote, &query); - free(query.src); + free(query.src_storage); return res; } diff --git a/builtin/fetch.c b/builtin/fetch.c index 06b60867f5..00a67c99ef 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -443,7 +443,7 @@ static void filter_prefetch_refspec(struct refspec *rs) for (i = 0; i < rs->nr; i++) { struct strbuf new_dst = STRBUF_INIT; - char *old_dst; + const char *old_dst; const char *sub = NULL; if (rs->items[i].negative) @@ -454,8 +454,8 @@ static void filter_prefetch_refspec(struct refspec *rs) ref_namespace[NAMESPACE_TAGS].ref))) { int j; - free(rs->items[i].src); - free(rs->items[i].dst); + free(rs->items[i].src_storage); + free(rs->items[i].dst_storage); for (j = i + 1; j < rs->nr; j++) { rs->items[j - 1] = rs->items[j]; @@ -481,7 +481,7 @@ static void filter_prefetch_refspec(struct refspec *rs) rs->items[i].dst = strbuf_detach(&new_dst, NULL); rs->items[i].force = 1; - free(old_dst); + free(rs->items[i].dst_storage); } } diff --git a/builtin/remote.c b/builtin/remote.c index b44f580b8c..87b97f5ef1 100644 --- a/builtin/remote.c +++ b/builtin/remote.c @@ -496,8 +496,8 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat struct refspec_item refspec = { .force = 0, .pattern = 1, - .src = (char *) "refs/heads/*", - .dst = (char *) "refs/heads/*", + .src = "refs/heads/*", + .dst = "refs/heads/*", }; get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0); @@ -981,7 +981,7 @@ static int append_ref_to_tracked_list(const char *refname, return 0; memset(&refspec, 0, sizeof(refspec)); - refspec.dst = (char *)refname; + refspec.dst = refname; if (!remote_find_tracking(states->remote, &refspec)) string_list_append(&states->tracked, abbrev_branch(refspec.src)); diff --git a/checkout.c b/checkout.c index cfaea4bd10..327ab5f6e3 100644 --- a/checkout.c +++ b/checkout.c @@ -8,7 +8,7 @@ #include "strbuf.h" struct tracking_name_data { - /* const */ char *src_ref; + const char *src_ref; char *dst_ref; struct object_id *dst_oid; int num_matches; @@ -27,7 +27,7 @@ static int check_tracking_name(struct remote *remote, void *cb_data) query.src = cb->src_ref; if (remote_find_tracking(remote, &query) || repo_get_oid(the_repository, query.dst, cb->dst_oid)) { - free(query.dst); + free(query.dst_storage); return 0; } cb->num_matches++; @@ -38,10 +38,10 @@ static int check_tracking_name(struct remote *remote, void *cb_data) cb->default_dst_oid = dst; } if (cb->dst_ref) { - free(query.dst); + free(query.dst_storage); return 0; } - cb->dst_ref = query.dst; + cb->dst_ref = query.dst_storage; return 0; } @@ -50,14 +50,15 @@ char *unique_tracking_name(const char *name, struct object_id *oid, { struct tracking_name_data cb_data = TRACKING_NAME_DATA_INIT; const char *default_remote = NULL; + char *src_ref; if (!git_config_get_string_tmp("checkout.defaultremote", &default_remote)) cb_data.default_remote = default_remote; - cb_data.src_ref = xstrfmt("refs/heads/%s", name); + cb_data.src_ref = src_ref = xstrfmt("refs/heads/%s", name); cb_data.dst_oid = oid; for_each_remote(check_tracking_name, &cb_data); if (dwim_remotes_matched) *dwim_remotes_matched = cb_data.num_matches; - free(cb_data.src_ref); + free(src_ref); if (cb_data.num_matches == 1) { free(cb_data.default_dst_ref); free(cb_data.default_dst_oid); diff --git a/refspec.c b/refspec.c index 1df5de6c2f..a7473c9628 100644 --- a/refspec.c +++ b/refspec.c @@ -163,8 +163,10 @@ void refspec_item_init_or_die(struct refspec_item *item, const char *refspec, void refspec_item_clear(struct refspec_item *item) { - FREE_AND_NULL(item->src); - FREE_AND_NULL(item->dst); + FREE_AND_NULL(item->src_storage); + item->src = NULL; + FREE_AND_NULL(item->dst_storage); + item->dst = NULL; item->force = 0; item->pattern = 0; item->matching = 0; diff --git a/refspec.h b/refspec.h index 754be45cee..fbc4352397 100644 --- a/refspec.h +++ b/refspec.h @@ -24,8 +24,10 @@ struct refspec_item { unsigned exact_sha1 : 1; unsigned negative : 1; - char *src; - char *dst; + const char *src; + const char *dst; + char *src_storage; + char *dst_storage; }; #define REFSPEC_FETCH 1 diff --git a/remote.c b/remote.c index d319f28757..5cdc3a17cc 100644 --- a/remote.c +++ b/remote.c @@ -827,7 +827,8 @@ int remote_has_url(struct remote *remote, const char *url) } static int match_name_with_pattern(const char *key, const char *name, - const char *value, char **result) + const char *value, char **result, + const char **result_const) { const char *kstar = strchr(key, '*'); size_t klen; @@ -850,6 +851,8 @@ static int match_name_with_pattern(const char *key, const char *name, strbuf_add(&sb, name + klen, namelen - klen - ksuffixlen); strbuf_addstr(&sb, vstar + 1); *result = strbuf_detach(&sb, NULL); + if (result_const) + *result_const = *result; } return ret; } @@ -858,7 +861,7 @@ static int refspec_match(const struct refspec_item *refspec, const char *name) { if (refspec->pattern) - return match_name_with_pattern(refspec->src, name, NULL, NULL); + return match_name_with_pattern(refspec->src, name, NULL, NULL, NULL); return !strcmp(refspec->src, name); } @@ -927,7 +930,7 @@ static int query_matches_negative_refspec(struct refspec *rs, struct refspec_ite const char *key = refspec->dst ? refspec->dst : refspec->src; const char *value = refspec->src; - if (match_name_with_pattern(key, needle, value, &expn_name)) + if (match_name_with_pattern(key, needle, value, &expn_name, NULL)) string_list_append_nodup(&reversed, expn_name); } else if (refspec->matching) { /* For the special matching refspec, any query should match */ @@ -967,12 +970,12 @@ static void query_refspecs_multiple(struct refspec *rs, const char *key = find_src ? refspec->dst : refspec->src; const char *value = find_src ? refspec->src : refspec->dst; const char *needle = find_src ? query->dst : query->src; - char **result = find_src ? &query->src : &query->dst; + char **result = find_src ? &query->src_storage : &query->dst_storage; if (!refspec->dst || refspec->negative) continue; if (refspec->pattern) { - if (match_name_with_pattern(key, needle, value, result)) + if (match_name_with_pattern(key, needle, value, result, NULL)) string_list_append_nodup(results, *result); } else if (!strcmp(needle, key)) { string_list_append(results, value); @@ -985,7 +988,8 @@ int query_refspecs(struct refspec *rs, struct refspec_item *query) int i; int find_src = !query->src; const char *needle = find_src ? query->dst : query->src; - char **result = find_src ? &query->src : &query->dst; + char **result = find_src ? &query->src_storage : &query->dst_storage; + const char **result_const = find_src ? &query->src : &query->dst; if (find_src && !query->dst) BUG("query_refspecs: need either src or dst"); @@ -1001,12 +1005,12 @@ int query_refspecs(struct refspec *rs, struct refspec_item *query) if (!refspec->dst || refspec->negative) continue; if (refspec->pattern) { - if (match_name_with_pattern(key, needle, value, result)) { + if (match_name_with_pattern(key, needle, value, result, result_const)) { query->force = refspec->force; return 0; } } else if (!strcmp(needle, key)) { - *result = xstrdup(value); + *result_const = *result = xstrdup(value); query->force = refspec->force; return 0; } @@ -1019,12 +1023,12 @@ char *apply_refspecs(struct refspec *rs, const char *name) struct refspec_item query; memset(&query, 0, sizeof(struct refspec_item)); - query.src = (char *)name; + query.src = name; if (query_refspecs(rs, &query)) return NULL; - return query.dst; + return query.dst_storage; } int remote_find_tracking(struct remote *remote, struct refspec_item *refspec) @@ -1399,9 +1403,9 @@ static char *get_ref_match(const struct refspec *rs, const struct ref *ref, const char *dst_side = item->dst ? item->dst : item->src; int match; if (direction == FROM_SRC) - match = match_name_with_pattern(item->src, ref->name, dst_side, &name); + match = match_name_with_pattern(item->src, ref->name, dst_side, &name, NULL); else - match = match_name_with_pattern(dst_side, ref->name, item->src, &name); + match = match_name_with_pattern(dst_side, ref->name, item->src, &name, NULL); if (match) { matching_refs = i; break; @@ -2020,7 +2024,7 @@ static struct ref *get_expanded_map(const struct ref *remote_refs, if (strchr(ref->name, '^')) continue; /* a dereference item */ if (match_name_with_pattern(refspec->src, ref->name, - refspec->dst, &expn_name) && + refspec->dst, &expn_name, NULL) && !ignore_symref_update(expn_name, &scratch)) { struct ref *cpy = copy_ref(ref); diff --git a/transport.c b/transport.c index 83ddea8fbc..0cedda425c 100644 --- a/transport.c +++ b/transport.c @@ -550,7 +550,7 @@ static void update_one_tracking_ref(struct remote *remote, char *refname, refs_update_ref(get_main_ref_store(the_repository), "update by push", rs.dst, new_oid, NULL, 0, 0); - free(rs.dst); + free(rs.dst_storage); } }
Jeff King <peff@peff.net> writes: > On Fri, May 31, 2024 at 08:27:13AM -0700, Junio C Hamano wrote: > >> I wonder if we can do something to separate these two concerns >> apart, using a trick similar to what we often use with an extra >> variable "to_free". Doing so would bloat the refspec_item, but >> unlike the references themselves, there won't be thousands of them, >> so it may not be an issue, perhaps? > > I had a similar thought while looking at this spot a while ago, so I dug > this attempt out of my stash. It's quite ugly, as you need to keep the > storage pointer and the const pointer in sync. Especially because > there's a lot of clever pointer indirection via match_name_with_pattern(). Ah, true. The patch itself does not look _too_ bad, but that may simply be because the original is bad enough ;-)
On Wed, Jun 05, 2024 at 06:46:46AM -0400, Jeff King wrote: > On Fri, May 31, 2024 at 08:27:13AM -0700, Junio C Hamano wrote: > > > I wonder if we can do something to separate these two concerns > > apart, using a trick similar to what we often use with an extra > > variable "to_free". Doing so would bloat the refspec_item, but > > unlike the references themselves, there won't be thousands of them, > > so it may not be an issue, perhaps? > > I had a similar thought while looking at this spot a while ago, so I dug > this attempt out of my stash. It's quite ugly, as you need to keep the > storage pointer and the const pointer in sync. Especially because > there's a lot of clever pointer indirection via match_name_with_pattern(). > > So I don't think it's the best way to refactor this code, but I present > it here for your amusement/disgust. This is on top of what you have > queued in ps/no-writable-strings. Just for the record: I'll keep such a refactoring out of this patch series as it is already big enough. I do agree though that we should ideally fix this interface in a subsequent iteration. Patrick
On Wed, Jun 05, 2024 at 10:13:28AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > On Fri, May 31, 2024 at 08:27:13AM -0700, Junio C Hamano wrote: > > > >> I wonder if we can do something to separate these two concerns > >> apart, using a trick similar to what we often use with an extra > >> variable "to_free". Doing so would bloat the refspec_item, but > >> unlike the references themselves, there won't be thousands of them, > >> so it may not be an issue, perhaps? > > > > I had a similar thought while looking at this spot a while ago, so I dug > > this attempt out of my stash. It's quite ugly, as you need to keep the > > storage pointer and the const pointer in sync. Especially because > > there's a lot of clever pointer indirection via match_name_with_pattern(). > > Ah, true. The patch itself does not look _too_ bad, but that may > simply be because the original is bad enough ;-) A lot of it is not how bad the patch itself looks, but how hard it was to write. Each site which touches "src" needs to consider whether it should be "src_storage", and there was a lot of running the test suite with ASan, followed by head-scratching. I _think_ I got all of the right spots, but who knows. ;) Anyway, I think Patrick is quite sensible in not pursuing it further for this series. But just for posterity, here's a slightly less invasive version of the patch I showed earlier. I had added an extra arguments to match_name_with_pattern() to auto-assign the "const" version, but it turned out that only one caller cared about it (and had to manually assign the const version in its fallback code path anyway!). So this version has a little less noise in it. --- branch.c | 6 +++--- builtin/fetch.c | 8 ++++---- builtin/remote.c | 6 +++--- checkout.c | 13 +++++++------ refspec.c | 6 ++++-- refspec.h | 6 ++++-- remote.c | 12 +++++++----- transport.c | 2 +- 8 files changed, 33 insertions(+), 26 deletions(-) diff --git a/branch.c b/branch.c index df5d24fec6..e1585b10c2 100644 --- a/branch.c +++ b/branch.c @@ -38,7 +38,7 @@ static int find_tracked_branch(struct remote *remote, void *priv) if (!remote_find_tracking(remote, &tracking->spec)) { switch (++tracking->matches) { case 1: - string_list_append_nodup(tracking->srcs, tracking->spec.src); + string_list_append_nodup(tracking->srcs, tracking->spec.src_storage); tracking->remote = remote->name; break; case 2: @@ -47,7 +47,7 @@ static int find_tracked_branch(struct remote *remote, void *priv) /* fall through */ default: string_list_append(&ftb->ambiguous_remotes, remote->name); - free(tracking->spec.src); + free(tracking->spec.src_storage); string_list_clear(tracking->srcs, 0); break; } @@ -489,7 +489,7 @@ static int check_tracking_branch(struct remote *remote, void *cb_data) memset(&query, 0, sizeof(struct refspec_item)); query.dst = tracking_branch; res = !remote_find_tracking(remote, &query); - free(query.src); + free(query.src_storage); return res; } diff --git a/builtin/fetch.c b/builtin/fetch.c index 06b60867f5..00a67c99ef 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -443,7 +443,7 @@ static void filter_prefetch_refspec(struct refspec *rs) for (i = 0; i < rs->nr; i++) { struct strbuf new_dst = STRBUF_INIT; - char *old_dst; + const char *old_dst; const char *sub = NULL; if (rs->items[i].negative) @@ -454,8 +454,8 @@ static void filter_prefetch_refspec(struct refspec *rs) ref_namespace[NAMESPACE_TAGS].ref))) { int j; - free(rs->items[i].src); - free(rs->items[i].dst); + free(rs->items[i].src_storage); + free(rs->items[i].dst_storage); for (j = i + 1; j < rs->nr; j++) { rs->items[j - 1] = rs->items[j]; @@ -481,7 +481,7 @@ static void filter_prefetch_refspec(struct refspec *rs) rs->items[i].dst = strbuf_detach(&new_dst, NULL); rs->items[i].force = 1; - free(old_dst); + free(rs->items[i].dst_storage); } } diff --git a/builtin/remote.c b/builtin/remote.c index b44f580b8c..87b97f5ef1 100644 --- a/builtin/remote.c +++ b/builtin/remote.c @@ -496,8 +496,8 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat struct refspec_item refspec = { .force = 0, .pattern = 1, - .src = (char *) "refs/heads/*", - .dst = (char *) "refs/heads/*", + .src = "refs/heads/*", + .dst = "refs/heads/*", }; get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0); @@ -981,7 +981,7 @@ static int append_ref_to_tracked_list(const char *refname, return 0; memset(&refspec, 0, sizeof(refspec)); - refspec.dst = (char *)refname; + refspec.dst = refname; if (!remote_find_tracking(states->remote, &refspec)) string_list_append(&states->tracked, abbrev_branch(refspec.src)); diff --git a/checkout.c b/checkout.c index cfaea4bd10..327ab5f6e3 100644 --- a/checkout.c +++ b/checkout.c @@ -8,7 +8,7 @@ #include "strbuf.h" struct tracking_name_data { - /* const */ char *src_ref; + const char *src_ref; char *dst_ref; struct object_id *dst_oid; int num_matches; @@ -27,7 +27,7 @@ static int check_tracking_name(struct remote *remote, void *cb_data) query.src = cb->src_ref; if (remote_find_tracking(remote, &query) || repo_get_oid(the_repository, query.dst, cb->dst_oid)) { - free(query.dst); + free(query.dst_storage); return 0; } cb->num_matches++; @@ -38,10 +38,10 @@ static int check_tracking_name(struct remote *remote, void *cb_data) cb->default_dst_oid = dst; } if (cb->dst_ref) { - free(query.dst); + free(query.dst_storage); return 0; } - cb->dst_ref = query.dst; + cb->dst_ref = query.dst_storage; return 0; } @@ -50,14 +50,15 @@ char *unique_tracking_name(const char *name, struct object_id *oid, { struct tracking_name_data cb_data = TRACKING_NAME_DATA_INIT; const char *default_remote = NULL; + char *src_ref; if (!git_config_get_string_tmp("checkout.defaultremote", &default_remote)) cb_data.default_remote = default_remote; - cb_data.src_ref = xstrfmt("refs/heads/%s", name); + cb_data.src_ref = src_ref = xstrfmt("refs/heads/%s", name); cb_data.dst_oid = oid; for_each_remote(check_tracking_name, &cb_data); if (dwim_remotes_matched) *dwim_remotes_matched = cb_data.num_matches; - free(cb_data.src_ref); + free(src_ref); if (cb_data.num_matches == 1) { free(cb_data.default_dst_ref); free(cb_data.default_dst_oid); diff --git a/refspec.c b/refspec.c index 1df5de6c2f..a7473c9628 100644 --- a/refspec.c +++ b/refspec.c @@ -163,8 +163,10 @@ void refspec_item_init_or_die(struct refspec_item *item, const char *refspec, void refspec_item_clear(struct refspec_item *item) { - FREE_AND_NULL(item->src); - FREE_AND_NULL(item->dst); + FREE_AND_NULL(item->src_storage); + item->src = NULL; + FREE_AND_NULL(item->dst_storage); + item->dst = NULL; item->force = 0; item->pattern = 0; item->matching = 0; diff --git a/refspec.h b/refspec.h index 754be45cee..fbc4352397 100644 --- a/refspec.h +++ b/refspec.h @@ -24,8 +24,10 @@ struct refspec_item { unsigned exact_sha1 : 1; unsigned negative : 1; - char *src; - char *dst; + const char *src; + const char *dst; + char *src_storage; + char *dst_storage; }; #define REFSPEC_FETCH 1 diff --git a/remote.c b/remote.c index d319f28757..d1511ecf83 100644 --- a/remote.c +++ b/remote.c @@ -967,7 +967,7 @@ static void query_refspecs_multiple(struct refspec *rs, const char *key = find_src ? refspec->dst : refspec->src; const char *value = find_src ? refspec->src : refspec->dst; const char *needle = find_src ? query->dst : query->src; - char **result = find_src ? &query->src : &query->dst; + char **result = find_src ? &query->src_storage : &query->dst_storage; if (!refspec->dst || refspec->negative) continue; @@ -985,7 +985,8 @@ int query_refspecs(struct refspec *rs, struct refspec_item *query) int i; int find_src = !query->src; const char *needle = find_src ? query->dst : query->src; - char **result = find_src ? &query->src : &query->dst; + char **result = find_src ? &query->src_storage : &query->dst_storage; + const char **result_const = find_src ? &query->src : &query->dst; if (find_src && !query->dst) BUG("query_refspecs: need either src or dst"); @@ -1002,11 +1003,12 @@ int query_refspecs(struct refspec *rs, struct refspec_item *query) continue; if (refspec->pattern) { if (match_name_with_pattern(key, needle, value, result)) { + *result_const = *result; query->force = refspec->force; return 0; } } else if (!strcmp(needle, key)) { - *result = xstrdup(value); + *result_const = *result = xstrdup(value); query->force = refspec->force; return 0; } @@ -1019,12 +1021,12 @@ char *apply_refspecs(struct refspec *rs, const char *name) struct refspec_item query; memset(&query, 0, sizeof(struct refspec_item)); - query.src = (char *)name; + query.src = name; if (query_refspecs(rs, &query)) return NULL; - return query.dst; + return query.dst_storage; } int remote_find_tracking(struct remote *remote, struct refspec_item *refspec) diff --git a/transport.c b/transport.c index 83ddea8fbc..0cedda425c 100644 --- a/transport.c +++ b/transport.c @@ -550,7 +550,7 @@ static void update_one_tracking_ref(struct remote *remote, char *refname, refs_update_ref(get_main_ref_store(the_repository), "update by push", rs.dst, new_oid, NULL, 0, 0); - free(rs.dst); + free(rs.dst_storage); } }
diff --git a/builtin/remote.c b/builtin/remote.c index d52b1c0e10..0324a5d48d 100644 --- a/builtin/remote.c +++ b/builtin/remote.c @@ -494,11 +494,12 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat struct ref *ref, *matches; struct ref *fetch_map = NULL, **fetch_map_tail = &fetch_map; struct refspec_item refspec; + char refspec_str[] = "refs/heads/*"; memset(&refspec, 0, sizeof(refspec)); refspec.force = 0; refspec.pattern = 1; - refspec.src = refspec.dst = "refs/heads/*"; + refspec.src = refspec.dst = refspec_str; get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0); matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"), fetch_map, 1); diff --git a/diff.c b/diff.c index ffd867ef6c..1439a5a01d 100644 --- a/diff.c +++ b/diff.c @@ -7231,11 +7231,12 @@ size_t fill_textconv(struct repository *r, struct diff_filespec *df, char **outbuf) { + static char empty_str[] = ""; size_t size; if (!driver) { if (!DIFF_FILE_VALID(df)) { - *outbuf = ""; + *outbuf = empty_str; return 0; } if (diff_populate_filespec(r, df, NULL)) diff --git a/entry.c b/entry.c index b8c257f6f9..2fc06ac90c 100644 --- a/entry.c +++ b/entry.c @@ -175,6 +175,7 @@ int finish_delayed_checkout(struct checkout *state, int show_progress) struct string_list_item *filter, *path; struct progress *progress = NULL; struct delayed_checkout *dco = state->delayed_checkout; + char empty_str[] = ""; if (!state->delayed_checkout) return errs; @@ -189,7 +190,7 @@ int finish_delayed_checkout(struct checkout *state, int show_progress) if (!async_query_available_blobs(filter->string, &available_paths)) { /* Filter reported an error */ errs = 1; - filter->string = ""; + filter->string = empty_str; continue; } if (available_paths.nr <= 0) { @@ -199,7 +200,7 @@ int finish_delayed_checkout(struct checkout *state, int show_progress) * filter from the list (see * "string_list_remove_empty_items" call below). */ - filter->string = ""; + filter->string = empty_str; continue; } @@ -225,7 +226,7 @@ int finish_delayed_checkout(struct checkout *state, int show_progress) * Do not ask the filter for available blobs, * again, as the filter is likely buggy. */ - filter->string = ""; + filter->string = empty_str; continue; } ce = index_file_exists(state->istate, path->string, diff --git a/ident.c b/ident.c index cc7afdbf81..df7aa42802 100644 --- a/ident.c +++ b/ident.c @@ -46,9 +46,14 @@ static struct passwd *xgetpwuid_self(int *is_bogus) pw = getpwuid(getuid()); if (!pw) { static struct passwd fallback; - fallback.pw_name = "unknown"; + static char fallback_name[] = "unknown"; #ifndef NO_GECOS_IN_PWENT - fallback.pw_gecos = "Unknown"; + static char fallback_gcos[] = "Unknown"; +#endif + + fallback.pw_name = fallback_name; +#ifndef NO_GECOS_IN_PWENT + fallback.pw_gecos = fallback_gcos; #endif pw = &fallback; if (is_bogus) diff --git a/line-log.c b/line-log.c index 8ff6ccb772..d9bf2c8120 100644 --- a/line-log.c +++ b/line-log.c @@ -1032,6 +1032,7 @@ static int process_diff_filepair(struct rev_info *rev, struct range_set tmp; struct diff_ranges diff; mmfile_t file_parent, file_target; + char empty_str[] = ""; assert(pair->two->path); while (rg) { @@ -1056,7 +1057,7 @@ static int process_diff_filepair(struct rev_info *rev, file_parent.ptr = pair->one->data; file_parent.size = pair->one->size; } else { - file_parent.ptr = ""; + file_parent.ptr = empty_str; file_parent.size = 0; } diff --git a/object-file.c b/object-file.c index 610b1f465c..c9e374e57e 100644 --- a/object-file.c +++ b/object-file.c @@ -282,12 +282,13 @@ static struct cached_object { } *cached_objects; static int cached_object_nr, cached_object_alloc; +static char empty_tree_buf[] = ""; static struct cached_object empty_tree = { .oid = { .hash = EMPTY_TREE_SHA1_BIN_LITERAL, }, .type = OBJ_TREE, - .buf = "", + .buf = empty_tree_buf, }; static struct cached_object *find_cached_object(const struct object_id *oid) diff --git a/pretty.c b/pretty.c index ec05db5655..1a0030b32a 100644 --- a/pretty.c +++ b/pretty.c @@ -1583,9 +1583,10 @@ static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */ return 1; case 'D': { + char empty_str[] = ""; const struct decoration_options opts = { - .prefix = "", - .suffix = "" + .prefix = empty_str, + .suffix = empty_str, }; format_decorations(sb, commit, c->auto_color, &opts); diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c index 1af86bbdec..1908e74dea 100644 --- a/refs/reftable-backend.c +++ b/refs/reftable-backend.c @@ -1285,6 +1285,7 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data) struct strbuf errbuf = STRBUF_INIT; size_t logs_nr = 0, logs_alloc = 0, i; const char *committer_info; + char head[] = "HEAD"; int ret; committer_info = git_committer_info(0); @@ -1387,7 +1388,7 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data) if (append_head_reflog) { ALLOC_GROW(logs, logs_nr + 1, logs_alloc); logs[logs_nr] = logs[logs_nr - 1]; - logs[logs_nr].refname = "HEAD"; + logs[logs_nr].refname = head; logs_nr++; } } @@ -1463,7 +1464,7 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data) string_list_clear(&skip, 0); strbuf_release(&errbuf); for (i = 0; i < logs_nr; i++) { - if (!strcmp(logs[i].refname, "HEAD")) + if (logs[i].refname == head) continue; logs[i].refname = NULL; reftable_log_record_release(&logs[i]);
There are several cases where we initialize non-const fields with string constants. This is invalid and will cause warnings once we enable the `-Wwrite-strings` warning. Adapt those cases to instead use string arrays. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/remote.c | 3 ++- diff.c | 3 ++- entry.c | 7 ++++--- ident.c | 9 +++++++-- line-log.c | 3 ++- object-file.c | 3 ++- pretty.c | 5 +++-- refs/reftable-backend.c | 5 +++-- 8 files changed, 25 insertions(+), 13 deletions(-)