diff mbox series

[02/19] global: assign non-const strings as required

Message ID 51ee5660a1452797ac0a45819210141c57f3dcb9.1716983704.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Compile with `-Wwrite-strings` | expand

Commit Message

Patrick Steinhardt May 29, 2024, 12:44 p.m. UTC
There are several cases where we initialize non-const fields with string
constants. This is invalid and will cause warnings once we enable the
`-Wwrite-strings` warning. Adapt those cases to instead use string
arrays.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/remote.c        | 3 ++-
 diff.c                  | 3 ++-
 entry.c                 | 7 ++++---
 ident.c                 | 9 +++++++--
 line-log.c              | 3 ++-
 object-file.c           | 3 ++-
 pretty.c                | 5 +++--
 refs/reftable-backend.c | 5 +++--
 8 files changed, 25 insertions(+), 13 deletions(-)

Comments

Junio C Hamano May 29, 2024, 5:25 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> +	char refspec_str[] = "refs/heads/*";
>  
>  	memset(&refspec, 0, sizeof(refspec));
>  	refspec.force = 0;
>  	refspec.pattern = 1;
> -	refspec.src = refspec.dst = "refs/heads/*";
> +	refspec.src = refspec.dst = refspec_str;
>  	get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
>  	matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"),
>  				    fetch_map, 1);

I have no objections to changes along this line, but this makes me
wonder if we somehow have ways to ensure that after everything is
done, refspec_str[], empty_str[], and the like in other hunks remain
to have their initial contents.  Stepping back a bit, without
applying this series, if we told the compiler to store these
constant strings in read-only segment, any attempt to write into
refspec.src or refspec.dst string would have been caught at runtime
as an error.  With the patch, that would no longer work.  The piece
of memory in refspec_str[] is a fair game to be overwritten by
anybody.

Which makes me wonder why these refspec.src and refspec.dst members
are not "const char *" pointers in the first place?  Obviously we do
not expect "refs/heads/*" to be overwritten after storing the pointer
to it in these members and making the get_fetch_map() call.
Patrick Steinhardt May 30, 2024, 11:29 a.m. UTC | #2
On Wed, May 29, 2024 at 10:25:47AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > +	char refspec_str[] = "refs/heads/*";
> >  
> >  	memset(&refspec, 0, sizeof(refspec));
> >  	refspec.force = 0;
> >  	refspec.pattern = 1;
> > -	refspec.src = refspec.dst = "refs/heads/*";
> > +	refspec.src = refspec.dst = refspec_str;
> >  	get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
> >  	matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"),
> >  				    fetch_map, 1);
> 
> I have no objections to changes along this line, but this makes me
> wonder if we somehow have ways to ensure that after everything is
> done, refspec_str[], empty_str[], and the like in other hunks remain
> to have their initial contents.  Stepping back a bit, without
> applying this series, if we told the compiler to store these
> constant strings in read-only segment, any attempt to write into
> refspec.src or refspec.dst string would have been caught at runtime
> as an error.  With the patch, that would no longer work.  The piece
> of memory in refspec_str[] is a fair game to be overwritten by
> anybody.

True. Ideally, we wouldn't have to do these acrobatics here in the first
place. That would likely require quite a lot more work and go beyond the
scope of this patch series.

> Which makes me wonder why these refspec.src and refspec.dst members
> are not "const char *" pointers in the first place?  Obviously we do
> not expect "refs/heads/*" to be overwritten after storing the pointer
> to it in these members and making the get_fetch_map() call.

Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It
does weird stuff where it writes the result into either `src` or `dst`
depending on which of these fields is provided by the caller. Which
means that one of the fields would typically be a constant, whereas the
other one will be allocated.

Really, once you start looking you find all kinds of weird interfaces
where we cannot quite make up our mind whether fields are controlled by
the caller or by the callee. This patch series certainly surfaces quite
some places that left me scratching my head.

Patrick
Junio C Hamano May 30, 2024, 7:38 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It
> does weird stuff where it writes the result into either `src` or `dst`
> depending on which of these fields is provided by the caller. Which
> means that one of the fields would typically be a constant, whereas the
> other one will be allocated.

Yes, <src, dst> is used as a pair of <key, value> to query one with
the other (i.e. "where does this one go?" "where does this one come
from?").

But we are not talking about const-ness of the member (iow, once you
point a string with the member, you cannot repoint the pointer to
another string), but we are talking about const-ness of the string
that is pointed by the member (iow, not "char const *src" but "const
char *src"), no?  If I ask "I have this src, where does it go?" with
a refspec element filled with src, the dst member may need to be
updated to point at the string that is the answer of the query, but
that still can be done with "const char *src, *dst", can't it?  That
was what I was wondering.

And again you are conflating "allocated" with "read-write" here.  It
is often convenient if a variable that points at an allocated string
is of type "char *" and not "const char *", because you do not cast
it when calling free().  But if you want to make a structure member
or a variable that holds an allocated string responsible for
_owning_ the piece of memory, then you need to consistently have the
member point at an allocated piece of memory (or NULL), no?  What
this patch does, i.e. prepare an on-stack refspec_str[] array that
is initialized from a constant string, and have .src member point at
it, would not make .src freeable.  In other words, .src pointing at
an allocated piece of string "some of the time" alone is not a good
justification to make it a non-const pointer, I would think.
Patrick Steinhardt May 31, 2024, 1 p.m. UTC | #4
On Thu, May 30, 2024 at 12:38:45PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It
> > does weird stuff where it writes the result into either `src` or `dst`
> > depending on which of these fields is provided by the caller. Which
> > means that one of the fields would typically be a constant, whereas the
> > other one will be allocated.
> 
> Yes, <src, dst> is used as a pair of <key, value> to query one with
> the other (i.e. "where does this one go?" "where does this one come
> from?").
> 
> But we are not talking about const-ness of the member (iow, once you
> point a string with the member, you cannot repoint the pointer to
> another string), but we are talking about const-ness of the string
> that is pointed by the member (iow, not "char const *src" but "const
> char *src"), no?  If I ask "I have this src, where does it go?" with
> a refspec element filled with src, the dst member may need to be
> updated to point at the string that is the answer of the query, but
> that still can be done with "const char *src, *dst", can't it?  That
> was what I was wondering.

Well, yes, the pointers can be updated. But that doesn't solve the
underlying problem that the interface is not well-prepared to track
ownership of its inputs and outputs.

The root problem is that `struct refspec_item` is being abused for
multiple purposes. One of these purposes it to store parsed refspec
items. Another purpose is to actually serve as a query. The requirements
for those two purposes regarding the lifetime of associated strings is
different:

  - When using it for storage purposes the strings should be owned by
    the structure itself. Consequently, the lifetime should be managed
    by the structure too, namely via `refspec_item_clear()`.

  - When using it for parsing purposes the input string is owned by the
    caller, whereas the output string is owned by the callee and then
    handed over to the caller. Here we cannot rely on the structure
    itself to manage the lifetimes of both strings, at least not in the
    current form.

Now of course, we could amend `struct refspec_item` to grow two
additional fields `src_to_free` and `dst_to_free`. But that only doubles
down on this misdesigned interface.

The proper solution in my opinion is to detangle those two usecases and
split them out into two independent interfaces.

> And again you are conflating "allocated" with "read-write" here.  It
> is often convenient if a variable that points at an allocated string
> is of type "char *" and not "const char *", because you do not cast
> it when calling free().  But if you want to make a structure member
> or a variable that holds an allocated string responsible for
> _owning_ the piece of memory, then you need to consistently have the
> member point at an allocated piece of memory (or NULL), no?  What
> this patch does, i.e. prepare an on-stack refspec_str[] array that
> is initialized from a constant string, and have .src member point at
> it, would not make .src freeable.  In other words, .src pointing at
> an allocated piece of string "some of the time" alone is not a good
> justification to make it a non-const pointer, I would think.

That's fair enough. I'm trying to work around a broken interface, but do
not solve the underlying issue.

Patrick
Patrick Steinhardt May 31, 2024, 1:33 p.m. UTC | #5
On Fri, May 31, 2024 at 03:00:36PM +0200, Patrick Steinhardt wrote:
> On Thu, May 30, 2024 at 12:38:45PM -0700, Junio C Hamano wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> > 
> > > Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It
> > > does weird stuff where it writes the result into either `src` or `dst`
> > > depending on which of these fields is provided by the caller. Which
> > > means that one of the fields would typically be a constant, whereas the
> > > other one will be allocated.
> > 
> > Yes, <src, dst> is used as a pair of <key, value> to query one with
> > the other (i.e. "where does this one go?" "where does this one come
> > from?").
> > 
> > But we are not talking about const-ness of the member (iow, once you
> > point a string with the member, you cannot repoint the pointer to
> > another string), but we are talking about const-ness of the string
> > that is pointed by the member (iow, not "char const *src" but "const
> > char *src"), no?  If I ask "I have this src, where does it go?" with
> > a refspec element filled with src, the dst member may need to be
> > updated to point at the string that is the answer of the query, but
> > that still can be done with "const char *src, *dst", can't it?  That
> > was what I was wondering.
> 
> Well, yes, the pointers can be updated. But that doesn't solve the
> underlying problem that the interface is not well-prepared to track
> ownership of its inputs and outputs.
> 
> The root problem is that `struct refspec_item` is being abused for
> multiple purposes. One of these purposes it to store parsed refspec
> items. Another purpose is to actually serve as a query. The requirements
> for those two purposes regarding the lifetime of associated strings is
> different:
> 
>   - When using it for storage purposes the strings should be owned by
>     the structure itself. Consequently, the lifetime should be managed
>     by the structure too, namely via `refspec_item_clear()`.
> 
>   - When using it for parsing purposes the input string is owned by the
>     caller, whereas the output string is owned by the callee and then
>     handed over to the caller. Here we cannot rely on the structure
>     itself to manage the lifetimes of both strings, at least not in the
>     current form.
> 
> Now of course, we could amend `struct refspec_item` to grow two
> additional fields `src_to_free` and `dst_to_free`. But that only doubles
> down on this misdesigned interface.
> 
> The proper solution in my opinion is to detangle those two usecases and
> split them out into two independent interfaces.
> 
> > And again you are conflating "allocated" with "read-write" here.  It
> > is often convenient if a variable that points at an allocated string
> > is of type "char *" and not "const char *", because you do not cast
> > it when calling free().  But if you want to make a structure member
> > or a variable that holds an allocated string responsible for
> > _owning_ the piece of memory, then you need to consistently have the
> > member point at an allocated piece of memory (or NULL), no?  What
> > this patch does, i.e. prepare an on-stack refspec_str[] array that
> > is initialized from a constant string, and have .src member point at
> > it, would not make .src freeable.  In other words, .src pointing at
> > an allocated piece of string "some of the time" alone is not a good
> > justification to make it a non-const pointer, I would think.
> 
> That's fair enough. I'm trying to work around a broken interface, but do
> not solve the underlying issue.
> 
> Patrick

In any case, I have now invested some more time into the individual
sites where I was converting to use on-stack strings. In many of the
cases we were indeed able to avoid the issue altogether without too much
of a hassle. The end result is more pleasing indeed, I'd say.

Patrick
Junio C Hamano May 31, 2024, 3:27 p.m. UTC | #6
Patrick Steinhardt <ps@pks.im> writes:

> On Thu, May 30, 2024 at 12:38:45PM -0700, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> 
>> > Well, we do. Not in `get_fetch_map()`, but in `query_refspecs()`. It
>> > does weird stuff where it writes the result into either `src` or `dst`
>> > depending on which of these fields is provided by the caller. Which
>> > means that one of the fields would typically be a constant, whereas the
>> > other one will be allocated.
> ...
> Well, yes, the pointers can be updated. But that doesn't solve the
> underlying problem that the interface is not well-prepared to track
> ownership of its inputs and outputs.

Oh, absolutely.  That is exactly why I wondered why .src and .dst
need to be "char *" and not "const char *".  If the piece of memory
these members point at are never modified via these pointers, marking
them as "const char *" would help the compilers help us by catching
when we try to break the promise.

Sometimes, the "constness of a pointer should help compilers help us
avoid modifying a piece of memory we promised not to modify through
the pointer" contradicts with the idea of (ab)using the constness to
signal ownership, i.e., "const pointers point at something owned by
somebody else, but via non-const pointers you always own the memory".

I wonder if we can do something to separate these two concerns
apart, using a trick similar to what we often use with an extra
variable "to_free".  Doing so would bloat the refspec_item, but
unlike the references themselves, there won't be thousands of them,
so it may not be an issue, perhaps?

>> And again you are conflating "allocated" with "read-write" here.  It
>> is often convenient if a variable that points at an allocated string
>> is of type "char *" and not "const char *", because you do not cast
>> it when calling free().  But if you want to make a structure member
>> or a variable that holds an allocated string responsible for
>> _owning_ the piece of memory, then you need to consistently have the
>> member point at an allocated piece of memory (or NULL), no?  What
>> this patch does, i.e. prepare an on-stack refspec_str[] array that
>> is initialized from a constant string, and have .src member point at
>> it, would not make .src freeable.  In other words, .src pointing at
>> an allocated piece of string "some of the time" alone is not a good
>> justification to make it a non-const pointer, I would think.
>
> That's fair enough. I'm trying to work around a broken interface, but do
> not solve the underlying issue.
Junio C Hamano May 31, 2024, 3:27 p.m. UTC | #7
Patrick Steinhardt <ps@pks.im> writes:

> In any case, I have now invested some more time into the individual
> sites where I was converting to use on-stack strings. In many of the
> cases we were indeed able to avoid the issue altogether without too much
> of a hassle. The end result is more pleasing indeed, I'd say.

Nice to hear that.
Thanks.
Jeff King June 5, 2024, 10:46 a.m. UTC | #8
On Fri, May 31, 2024 at 08:27:13AM -0700, Junio C Hamano wrote:

> I wonder if we can do something to separate these two concerns
> apart, using a trick similar to what we often use with an extra
> variable "to_free".  Doing so would bloat the refspec_item, but
> unlike the references themselves, there won't be thousands of them,
> so it may not be an issue, perhaps?

I had a similar thought while looking at this spot a while ago, so I dug
this attempt out of my stash. It's quite ugly, as you need to keep the
storage pointer and the const pointer in sync. Especially because
there's a lot of clever pointer indirection via match_name_with_pattern().

So I don't think it's the best way to refactor this code, but I present
it here for your amusement/disgust. This is on top of what you have
queued in ps/no-writable-strings.

---
 branch.c         |  6 +++---
 builtin/fetch.c  |  8 ++++----
 builtin/remote.c |  6 +++---
 checkout.c       | 13 +++++++------
 refspec.c        |  6 ++++--
 refspec.h        |  6 ++++--
 remote.c         | 30 +++++++++++++++++-------------
 transport.c      |  2 +-
 8 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/branch.c b/branch.c
index df5d24fec6..e1585b10c2 100644
--- a/branch.c
+++ b/branch.c
@@ -38,7 +38,7 @@ static int find_tracked_branch(struct remote *remote, void *priv)
 	if (!remote_find_tracking(remote, &tracking->spec)) {
 		switch (++tracking->matches) {
 		case 1:
-			string_list_append_nodup(tracking->srcs, tracking->spec.src);
+			string_list_append_nodup(tracking->srcs, tracking->spec.src_storage);
 			tracking->remote = remote->name;
 			break;
 		case 2:
@@ -47,7 +47,7 @@ static int find_tracked_branch(struct remote *remote, void *priv)
 			/* fall through */
 		default:
 			string_list_append(&ftb->ambiguous_remotes, remote->name);
-			free(tracking->spec.src);
+			free(tracking->spec.src_storage);
 			string_list_clear(tracking->srcs, 0);
 		break;
 		}
@@ -489,7 +489,7 @@ static int check_tracking_branch(struct remote *remote, void *cb_data)
 	memset(&query, 0, sizeof(struct refspec_item));
 	query.dst = tracking_branch;
 	res = !remote_find_tracking(remote, &query);
-	free(query.src);
+	free(query.src_storage);
 	return res;
 }
 
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 06b60867f5..00a67c99ef 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -443,7 +443,7 @@ static void filter_prefetch_refspec(struct refspec *rs)
 
 	for (i = 0; i < rs->nr; i++) {
 		struct strbuf new_dst = STRBUF_INIT;
-		char *old_dst;
+		const char *old_dst;
 		const char *sub = NULL;
 
 		if (rs->items[i].negative)
@@ -454,8 +454,8 @@ static void filter_prefetch_refspec(struct refspec *rs)
 				 ref_namespace[NAMESPACE_TAGS].ref))) {
 			int j;
 
-			free(rs->items[i].src);
-			free(rs->items[i].dst);
+			free(rs->items[i].src_storage);
+			free(rs->items[i].dst_storage);
 
 			for (j = i + 1; j < rs->nr; j++) {
 				rs->items[j - 1] = rs->items[j];
@@ -481,7 +481,7 @@ static void filter_prefetch_refspec(struct refspec *rs)
 		rs->items[i].dst = strbuf_detach(&new_dst, NULL);
 		rs->items[i].force = 1;
 
-		free(old_dst);
+		free(rs->items[i].dst_storage);
 	}
 }
 
diff --git a/builtin/remote.c b/builtin/remote.c
index b44f580b8c..87b97f5ef1 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -496,8 +496,8 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat
 	struct refspec_item refspec = {
 		.force = 0,
 		.pattern = 1,
-		.src = (char *) "refs/heads/*",
-		.dst = (char *) "refs/heads/*",
+		.src = "refs/heads/*",
+		.dst = "refs/heads/*",
 	};
 
 	get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
@@ -981,7 +981,7 @@ static int append_ref_to_tracked_list(const char *refname,
 		return 0;
 
 	memset(&refspec, 0, sizeof(refspec));
-	refspec.dst = (char *)refname;
+	refspec.dst = refname;
 	if (!remote_find_tracking(states->remote, &refspec))
 		string_list_append(&states->tracked, abbrev_branch(refspec.src));
 
diff --git a/checkout.c b/checkout.c
index cfaea4bd10..327ab5f6e3 100644
--- a/checkout.c
+++ b/checkout.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 
 struct tracking_name_data {
-	/* const */ char *src_ref;
+	const char *src_ref;
 	char *dst_ref;
 	struct object_id *dst_oid;
 	int num_matches;
@@ -27,7 +27,7 @@ static int check_tracking_name(struct remote *remote, void *cb_data)
 	query.src = cb->src_ref;
 	if (remote_find_tracking(remote, &query) ||
 	    repo_get_oid(the_repository, query.dst, cb->dst_oid)) {
-		free(query.dst);
+		free(query.dst_storage);
 		return 0;
 	}
 	cb->num_matches++;
@@ -38,10 +38,10 @@ static int check_tracking_name(struct remote *remote, void *cb_data)
 		cb->default_dst_oid = dst;
 	}
 	if (cb->dst_ref) {
-		free(query.dst);
+		free(query.dst_storage);
 		return 0;
 	}
-	cb->dst_ref = query.dst;
+	cb->dst_ref = query.dst_storage;
 	return 0;
 }
 
@@ -50,14 +50,15 @@ char *unique_tracking_name(const char *name, struct object_id *oid,
 {
 	struct tracking_name_data cb_data = TRACKING_NAME_DATA_INIT;
 	const char *default_remote = NULL;
+	char *src_ref;
 	if (!git_config_get_string_tmp("checkout.defaultremote", &default_remote))
 		cb_data.default_remote = default_remote;
-	cb_data.src_ref = xstrfmt("refs/heads/%s", name);
+	cb_data.src_ref = src_ref = xstrfmt("refs/heads/%s", name);
 	cb_data.dst_oid = oid;
 	for_each_remote(check_tracking_name, &cb_data);
 	if (dwim_remotes_matched)
 		*dwim_remotes_matched = cb_data.num_matches;
-	free(cb_data.src_ref);
+	free(src_ref);
 	if (cb_data.num_matches == 1) {
 		free(cb_data.default_dst_ref);
 		free(cb_data.default_dst_oid);
diff --git a/refspec.c b/refspec.c
index 1df5de6c2f..a7473c9628 100644
--- a/refspec.c
+++ b/refspec.c
@@ -163,8 +163,10 @@ void refspec_item_init_or_die(struct refspec_item *item, const char *refspec,
 
 void refspec_item_clear(struct refspec_item *item)
 {
-	FREE_AND_NULL(item->src);
-	FREE_AND_NULL(item->dst);
+	FREE_AND_NULL(item->src_storage);
+	item->src = NULL;
+	FREE_AND_NULL(item->dst_storage);
+	item->dst = NULL;
 	item->force = 0;
 	item->pattern = 0;
 	item->matching = 0;
diff --git a/refspec.h b/refspec.h
index 754be45cee..fbc4352397 100644
--- a/refspec.h
+++ b/refspec.h
@@ -24,8 +24,10 @@ struct refspec_item {
 	unsigned exact_sha1 : 1;
 	unsigned negative : 1;
 
-	char *src;
-	char *dst;
+	const char *src;
+	const char *dst;
+	char *src_storage;
+	char *dst_storage;
 };
 
 #define REFSPEC_FETCH 1
diff --git a/remote.c b/remote.c
index d319f28757..5cdc3a17cc 100644
--- a/remote.c
+++ b/remote.c
@@ -827,7 +827,8 @@ int remote_has_url(struct remote *remote, const char *url)
 }
 
 static int match_name_with_pattern(const char *key, const char *name,
-				   const char *value, char **result)
+				   const char *value, char **result,
+				   const char **result_const)
 {
 	const char *kstar = strchr(key, '*');
 	size_t klen;
@@ -850,6 +851,8 @@ static int match_name_with_pattern(const char *key, const char *name,
 		strbuf_add(&sb, name + klen, namelen - klen - ksuffixlen);
 		strbuf_addstr(&sb, vstar + 1);
 		*result = strbuf_detach(&sb, NULL);
+		if (result_const)
+			*result_const = *result;
 	}
 	return ret;
 }
@@ -858,7 +861,7 @@ static int refspec_match(const struct refspec_item *refspec,
 			 const char *name)
 {
 	if (refspec->pattern)
-		return match_name_with_pattern(refspec->src, name, NULL, NULL);
+		return match_name_with_pattern(refspec->src, name, NULL, NULL, NULL);
 
 	return !strcmp(refspec->src, name);
 }
@@ -927,7 +930,7 @@ static int query_matches_negative_refspec(struct refspec *rs, struct refspec_ite
 			const char *key = refspec->dst ? refspec->dst : refspec->src;
 			const char *value = refspec->src;
 
-			if (match_name_with_pattern(key, needle, value, &expn_name))
+			if (match_name_with_pattern(key, needle, value, &expn_name, NULL))
 				string_list_append_nodup(&reversed, expn_name);
 		} else if (refspec->matching) {
 			/* For the special matching refspec, any query should match */
@@ -967,12 +970,12 @@ static void query_refspecs_multiple(struct refspec *rs,
 		const char *key = find_src ? refspec->dst : refspec->src;
 		const char *value = find_src ? refspec->src : refspec->dst;
 		const char *needle = find_src ? query->dst : query->src;
-		char **result = find_src ? &query->src : &query->dst;
+		char **result = find_src ? &query->src_storage : &query->dst_storage;
 
 		if (!refspec->dst || refspec->negative)
 			continue;
 		if (refspec->pattern) {
-			if (match_name_with_pattern(key, needle, value, result))
+			if (match_name_with_pattern(key, needle, value, result, NULL))
 				string_list_append_nodup(results, *result);
 		} else if (!strcmp(needle, key)) {
 			string_list_append(results, value);
@@ -985,7 +988,8 @@ int query_refspecs(struct refspec *rs, struct refspec_item *query)
 	int i;
 	int find_src = !query->src;
 	const char *needle = find_src ? query->dst : query->src;
-	char **result = find_src ? &query->src : &query->dst;
+	char **result = find_src ? &query->src_storage : &query->dst_storage;
+	const char **result_const = find_src ? &query->src : &query->dst;
 
 	if (find_src && !query->dst)
 		BUG("query_refspecs: need either src or dst");
@@ -1001,12 +1005,12 @@ int query_refspecs(struct refspec *rs, struct refspec_item *query)
 		if (!refspec->dst || refspec->negative)
 			continue;
 		if (refspec->pattern) {
-			if (match_name_with_pattern(key, needle, value, result)) {
+			if (match_name_with_pattern(key, needle, value, result, result_const)) {
 				query->force = refspec->force;
 				return 0;
 			}
 		} else if (!strcmp(needle, key)) {
-			*result = xstrdup(value);
+			*result_const = *result = xstrdup(value);
 			query->force = refspec->force;
 			return 0;
 		}
@@ -1019,12 +1023,12 @@ char *apply_refspecs(struct refspec *rs, const char *name)
 	struct refspec_item query;
 
 	memset(&query, 0, sizeof(struct refspec_item));
-	query.src = (char *)name;
+	query.src = name;
 
 	if (query_refspecs(rs, &query))
 		return NULL;
 
-	return query.dst;
+	return query.dst_storage;
 }
 
 int remote_find_tracking(struct remote *remote, struct refspec_item *refspec)
@@ -1399,9 +1403,9 @@ static char *get_ref_match(const struct refspec *rs, const struct ref *ref,
 			const char *dst_side = item->dst ? item->dst : item->src;
 			int match;
 			if (direction == FROM_SRC)
-				match = match_name_with_pattern(item->src, ref->name, dst_side, &name);
+				match = match_name_with_pattern(item->src, ref->name, dst_side, &name, NULL);
 			else
-				match = match_name_with_pattern(dst_side, ref->name, item->src, &name);
+				match = match_name_with_pattern(dst_side, ref->name, item->src, &name, NULL);
 			if (match) {
 				matching_refs = i;
 				break;
@@ -2020,7 +2024,7 @@ static struct ref *get_expanded_map(const struct ref *remote_refs,
 		if (strchr(ref->name, '^'))
 			continue; /* a dereference item */
 		if (match_name_with_pattern(refspec->src, ref->name,
-					    refspec->dst, &expn_name) &&
+					    refspec->dst, &expn_name, NULL) &&
 		    !ignore_symref_update(expn_name, &scratch)) {
 			struct ref *cpy = copy_ref(ref);
 
diff --git a/transport.c b/transport.c
index 83ddea8fbc..0cedda425c 100644
--- a/transport.c
+++ b/transport.c
@@ -550,7 +550,7 @@ static void update_one_tracking_ref(struct remote *remote, char *refname,
 			refs_update_ref(get_main_ref_store(the_repository),
 					"update by push", rs.dst, new_oid,
 					NULL, 0, 0);
-		free(rs.dst);
+		free(rs.dst_storage);
 	}
 }
Junio C Hamano June 5, 2024, 5:13 p.m. UTC | #9
Jeff King <peff@peff.net> writes:

> On Fri, May 31, 2024 at 08:27:13AM -0700, Junio C Hamano wrote:
>
>> I wonder if we can do something to separate these two concerns
>> apart, using a trick similar to what we often use with an extra
>> variable "to_free".  Doing so would bloat the refspec_item, but
>> unlike the references themselves, there won't be thousands of them,
>> so it may not be an issue, perhaps?
>
> I had a similar thought while looking at this spot a while ago, so I dug
> this attempt out of my stash. It's quite ugly, as you need to keep the
> storage pointer and the const pointer in sync. Especially because
> there's a lot of clever pointer indirection via match_name_with_pattern().

Ah, true.  The patch itself does not look _too_ bad, but that may
simply be because the original is bad enough ;-)
Patrick Steinhardt June 6, 2024, 10:36 a.m. UTC | #10
On Wed, Jun 05, 2024 at 06:46:46AM -0400, Jeff King wrote:
> On Fri, May 31, 2024 at 08:27:13AM -0700, Junio C Hamano wrote:
> 
> > I wonder if we can do something to separate these two concerns
> > apart, using a trick similar to what we often use with an extra
> > variable "to_free".  Doing so would bloat the refspec_item, but
> > unlike the references themselves, there won't be thousands of them,
> > so it may not be an issue, perhaps?
> 
> I had a similar thought while looking at this spot a while ago, so I dug
> this attempt out of my stash. It's quite ugly, as you need to keep the
> storage pointer and the const pointer in sync. Especially because
> there's a lot of clever pointer indirection via match_name_with_pattern().
> 
> So I don't think it's the best way to refactor this code, but I present
> it here for your amusement/disgust. This is on top of what you have
> queued in ps/no-writable-strings.

Just for the record: I'll keep such a refactoring out of this patch
series as it is already big enough. I do agree though that we should
ideally fix this interface in a subsequent iteration.

Patrick
Jeff King June 8, 2024, 10:59 a.m. UTC | #11
On Wed, Jun 05, 2024 at 10:13:28AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Fri, May 31, 2024 at 08:27:13AM -0700, Junio C Hamano wrote:
> >
> >> I wonder if we can do something to separate these two concerns
> >> apart, using a trick similar to what we often use with an extra
> >> variable "to_free".  Doing so would bloat the refspec_item, but
> >> unlike the references themselves, there won't be thousands of them,
> >> so it may not be an issue, perhaps?
> >
> > I had a similar thought while looking at this spot a while ago, so I dug
> > this attempt out of my stash. It's quite ugly, as you need to keep the
> > storage pointer and the const pointer in sync. Especially because
> > there's a lot of clever pointer indirection via match_name_with_pattern().
> 
> Ah, true.  The patch itself does not look _too_ bad, but that may
> simply be because the original is bad enough ;-)

A lot of it is not how bad the patch itself looks, but how hard it was
to write. Each site which touches "src" needs to consider whether it
should be "src_storage", and there was a lot of running the test suite
with ASan, followed by head-scratching. I _think_ I got all of the right
spots, but who knows. ;)

Anyway, I think Patrick is quite sensible in not pursuing it further for
this series. But just for posterity, here's a slightly less invasive
version of the patch I showed earlier. I had added an extra arguments to
match_name_with_pattern() to auto-assign the "const" version, but it
turned out that only one caller cared about it (and had to manually
assign the const version in its fallback code path anyway!).

So this version has a little less noise in it.

---
 branch.c         |  6 +++---
 builtin/fetch.c  |  8 ++++----
 builtin/remote.c |  6 +++---
 checkout.c       | 13 +++++++------
 refspec.c        |  6 ++++--
 refspec.h        |  6 ++++--
 remote.c         | 12 +++++++-----
 transport.c      |  2 +-
 8 files changed, 33 insertions(+), 26 deletions(-)

diff --git a/branch.c b/branch.c
index df5d24fec6..e1585b10c2 100644
--- a/branch.c
+++ b/branch.c
@@ -38,7 +38,7 @@ static int find_tracked_branch(struct remote *remote, void *priv)
 	if (!remote_find_tracking(remote, &tracking->spec)) {
 		switch (++tracking->matches) {
 		case 1:
-			string_list_append_nodup(tracking->srcs, tracking->spec.src);
+			string_list_append_nodup(tracking->srcs, tracking->spec.src_storage);
 			tracking->remote = remote->name;
 			break;
 		case 2:
@@ -47,7 +47,7 @@ static int find_tracked_branch(struct remote *remote, void *priv)
 			/* fall through */
 		default:
 			string_list_append(&ftb->ambiguous_remotes, remote->name);
-			free(tracking->spec.src);
+			free(tracking->spec.src_storage);
 			string_list_clear(tracking->srcs, 0);
 		break;
 		}
@@ -489,7 +489,7 @@ static int check_tracking_branch(struct remote *remote, void *cb_data)
 	memset(&query, 0, sizeof(struct refspec_item));
 	query.dst = tracking_branch;
 	res = !remote_find_tracking(remote, &query);
-	free(query.src);
+	free(query.src_storage);
 	return res;
 }
 
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 06b60867f5..00a67c99ef 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -443,7 +443,7 @@ static void filter_prefetch_refspec(struct refspec *rs)
 
 	for (i = 0; i < rs->nr; i++) {
 		struct strbuf new_dst = STRBUF_INIT;
-		char *old_dst;
+		const char *old_dst;
 		const char *sub = NULL;
 
 		if (rs->items[i].negative)
@@ -454,8 +454,8 @@ static void filter_prefetch_refspec(struct refspec *rs)
 				 ref_namespace[NAMESPACE_TAGS].ref))) {
 			int j;
 
-			free(rs->items[i].src);
-			free(rs->items[i].dst);
+			free(rs->items[i].src_storage);
+			free(rs->items[i].dst_storage);
 
 			for (j = i + 1; j < rs->nr; j++) {
 				rs->items[j - 1] = rs->items[j];
@@ -481,7 +481,7 @@ static void filter_prefetch_refspec(struct refspec *rs)
 		rs->items[i].dst = strbuf_detach(&new_dst, NULL);
 		rs->items[i].force = 1;
 
-		free(old_dst);
+		free(rs->items[i].dst_storage);
 	}
 }
 
diff --git a/builtin/remote.c b/builtin/remote.c
index b44f580b8c..87b97f5ef1 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -496,8 +496,8 @@ static int get_head_names(const struct ref *remote_refs, struct ref_states *stat
 	struct refspec_item refspec = {
 		.force = 0,
 		.pattern = 1,
-		.src = (char *) "refs/heads/*",
-		.dst = (char *) "refs/heads/*",
+		.src = "refs/heads/*",
+		.dst = "refs/heads/*",
 	};
 
 	get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
@@ -981,7 +981,7 @@ static int append_ref_to_tracked_list(const char *refname,
 		return 0;
 
 	memset(&refspec, 0, sizeof(refspec));
-	refspec.dst = (char *)refname;
+	refspec.dst = refname;
 	if (!remote_find_tracking(states->remote, &refspec))
 		string_list_append(&states->tracked, abbrev_branch(refspec.src));
 
diff --git a/checkout.c b/checkout.c
index cfaea4bd10..327ab5f6e3 100644
--- a/checkout.c
+++ b/checkout.c
@@ -8,7 +8,7 @@
 #include "strbuf.h"
 
 struct tracking_name_data {
-	/* const */ char *src_ref;
+	const char *src_ref;
 	char *dst_ref;
 	struct object_id *dst_oid;
 	int num_matches;
@@ -27,7 +27,7 @@ static int check_tracking_name(struct remote *remote, void *cb_data)
 	query.src = cb->src_ref;
 	if (remote_find_tracking(remote, &query) ||
 	    repo_get_oid(the_repository, query.dst, cb->dst_oid)) {
-		free(query.dst);
+		free(query.dst_storage);
 		return 0;
 	}
 	cb->num_matches++;
@@ -38,10 +38,10 @@ static int check_tracking_name(struct remote *remote, void *cb_data)
 		cb->default_dst_oid = dst;
 	}
 	if (cb->dst_ref) {
-		free(query.dst);
+		free(query.dst_storage);
 		return 0;
 	}
-	cb->dst_ref = query.dst;
+	cb->dst_ref = query.dst_storage;
 	return 0;
 }
 
@@ -50,14 +50,15 @@ char *unique_tracking_name(const char *name, struct object_id *oid,
 {
 	struct tracking_name_data cb_data = TRACKING_NAME_DATA_INIT;
 	const char *default_remote = NULL;
+	char *src_ref;
 	if (!git_config_get_string_tmp("checkout.defaultremote", &default_remote))
 		cb_data.default_remote = default_remote;
-	cb_data.src_ref = xstrfmt("refs/heads/%s", name);
+	cb_data.src_ref = src_ref = xstrfmt("refs/heads/%s", name);
 	cb_data.dst_oid = oid;
 	for_each_remote(check_tracking_name, &cb_data);
 	if (dwim_remotes_matched)
 		*dwim_remotes_matched = cb_data.num_matches;
-	free(cb_data.src_ref);
+	free(src_ref);
 	if (cb_data.num_matches == 1) {
 		free(cb_data.default_dst_ref);
 		free(cb_data.default_dst_oid);
diff --git a/refspec.c b/refspec.c
index 1df5de6c2f..a7473c9628 100644
--- a/refspec.c
+++ b/refspec.c
@@ -163,8 +163,10 @@ void refspec_item_init_or_die(struct refspec_item *item, const char *refspec,
 
 void refspec_item_clear(struct refspec_item *item)
 {
-	FREE_AND_NULL(item->src);
-	FREE_AND_NULL(item->dst);
+	FREE_AND_NULL(item->src_storage);
+	item->src = NULL;
+	FREE_AND_NULL(item->dst_storage);
+	item->dst = NULL;
 	item->force = 0;
 	item->pattern = 0;
 	item->matching = 0;
diff --git a/refspec.h b/refspec.h
index 754be45cee..fbc4352397 100644
--- a/refspec.h
+++ b/refspec.h
@@ -24,8 +24,10 @@ struct refspec_item {
 	unsigned exact_sha1 : 1;
 	unsigned negative : 1;
 
-	char *src;
-	char *dst;
+	const char *src;
+	const char *dst;
+	char *src_storage;
+	char *dst_storage;
 };
 
 #define REFSPEC_FETCH 1
diff --git a/remote.c b/remote.c
index d319f28757..d1511ecf83 100644
--- a/remote.c
+++ b/remote.c
@@ -967,7 +967,7 @@ static void query_refspecs_multiple(struct refspec *rs,
 		const char *key = find_src ? refspec->dst : refspec->src;
 		const char *value = find_src ? refspec->src : refspec->dst;
 		const char *needle = find_src ? query->dst : query->src;
-		char **result = find_src ? &query->src : &query->dst;
+		char **result = find_src ? &query->src_storage : &query->dst_storage;
 
 		if (!refspec->dst || refspec->negative)
 			continue;
@@ -985,7 +985,8 @@ int query_refspecs(struct refspec *rs, struct refspec_item *query)
 	int i;
 	int find_src = !query->src;
 	const char *needle = find_src ? query->dst : query->src;
-	char **result = find_src ? &query->src : &query->dst;
+	char **result = find_src ? &query->src_storage : &query->dst_storage;
+	const char **result_const = find_src ? &query->src : &query->dst;
 
 	if (find_src && !query->dst)
 		BUG("query_refspecs: need either src or dst");
@@ -1002,11 +1003,12 @@ int query_refspecs(struct refspec *rs, struct refspec_item *query)
 			continue;
 		if (refspec->pattern) {
 			if (match_name_with_pattern(key, needle, value, result)) {
+				*result_const = *result;
 				query->force = refspec->force;
 				return 0;
 			}
 		} else if (!strcmp(needle, key)) {
-			*result = xstrdup(value);
+			*result_const = *result = xstrdup(value);
 			query->force = refspec->force;
 			return 0;
 		}
@@ -1019,12 +1021,12 @@ char *apply_refspecs(struct refspec *rs, const char *name)
 	struct refspec_item query;
 
 	memset(&query, 0, sizeof(struct refspec_item));
-	query.src = (char *)name;
+	query.src = name;
 
 	if (query_refspecs(rs, &query))
 		return NULL;
 
-	return query.dst;
+	return query.dst_storage;
 }
 
 int remote_find_tracking(struct remote *remote, struct refspec_item *refspec)
diff --git a/transport.c b/transport.c
index 83ddea8fbc..0cedda425c 100644
--- a/transport.c
+++ b/transport.c
@@ -550,7 +550,7 @@ static void update_one_tracking_ref(struct remote *remote, char *refname,
 			refs_update_ref(get_main_ref_store(the_repository),
 					"update by push", rs.dst, new_oid,
 					NULL, 0, 0);
-		free(rs.dst);
+		free(rs.dst_storage);
 	}
 }
diff mbox series

Patch

diff --git a/builtin/remote.c b/builtin/remote.c
index d52b1c0e10..0324a5d48d 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -494,11 +494,12 @@  static int get_head_names(const struct ref *remote_refs, struct ref_states *stat
 	struct ref *ref, *matches;
 	struct ref *fetch_map = NULL, **fetch_map_tail = &fetch_map;
 	struct refspec_item refspec;
+	char refspec_str[] = "refs/heads/*";
 
 	memset(&refspec, 0, sizeof(refspec));
 	refspec.force = 0;
 	refspec.pattern = 1;
-	refspec.src = refspec.dst = "refs/heads/*";
+	refspec.src = refspec.dst = refspec_str;
 	get_fetch_map(remote_refs, &refspec, &fetch_map_tail, 0);
 	matches = guess_remote_head(find_ref_by_name(remote_refs, "HEAD"),
 				    fetch_map, 1);
diff --git a/diff.c b/diff.c
index ffd867ef6c..1439a5a01d 100644
--- a/diff.c
+++ b/diff.c
@@ -7231,11 +7231,12 @@  size_t fill_textconv(struct repository *r,
 		     struct diff_filespec *df,
 		     char **outbuf)
 {
+	static char empty_str[] = "";
 	size_t size;
 
 	if (!driver) {
 		if (!DIFF_FILE_VALID(df)) {
-			*outbuf = "";
+			*outbuf = empty_str;
 			return 0;
 		}
 		if (diff_populate_filespec(r, df, NULL))
diff --git a/entry.c b/entry.c
index b8c257f6f9..2fc06ac90c 100644
--- a/entry.c
+++ b/entry.c
@@ -175,6 +175,7 @@  int finish_delayed_checkout(struct checkout *state, int show_progress)
 	struct string_list_item *filter, *path;
 	struct progress *progress = NULL;
 	struct delayed_checkout *dco = state->delayed_checkout;
+	char empty_str[] = "";
 
 	if (!state->delayed_checkout)
 		return errs;
@@ -189,7 +190,7 @@  int finish_delayed_checkout(struct checkout *state, int show_progress)
 			if (!async_query_available_blobs(filter->string, &available_paths)) {
 				/* Filter reported an error */
 				errs = 1;
-				filter->string = "";
+				filter->string = empty_str;
 				continue;
 			}
 			if (available_paths.nr <= 0) {
@@ -199,7 +200,7 @@  int finish_delayed_checkout(struct checkout *state, int show_progress)
 				 * filter from the list (see
 				 * "string_list_remove_empty_items" call below).
 				 */
-				filter->string = "";
+				filter->string = empty_str;
 				continue;
 			}
 
@@ -225,7 +226,7 @@  int finish_delayed_checkout(struct checkout *state, int show_progress)
 					 * Do not ask the filter for available blobs,
 					 * again, as the filter is likely buggy.
 					 */
-					filter->string = "";
+					filter->string = empty_str;
 					continue;
 				}
 				ce = index_file_exists(state->istate, path->string,
diff --git a/ident.c b/ident.c
index cc7afdbf81..df7aa42802 100644
--- a/ident.c
+++ b/ident.c
@@ -46,9 +46,14 @@  static struct passwd *xgetpwuid_self(int *is_bogus)
 	pw = getpwuid(getuid());
 	if (!pw) {
 		static struct passwd fallback;
-		fallback.pw_name = "unknown";
+		static char fallback_name[] = "unknown";
 #ifndef NO_GECOS_IN_PWENT
-		fallback.pw_gecos = "Unknown";
+		static char fallback_gcos[] = "Unknown";
+#endif
+
+		fallback.pw_name = fallback_name;
+#ifndef NO_GECOS_IN_PWENT
+		fallback.pw_gecos = fallback_gcos;
 #endif
 		pw = &fallback;
 		if (is_bogus)
diff --git a/line-log.c b/line-log.c
index 8ff6ccb772..d9bf2c8120 100644
--- a/line-log.c
+++ b/line-log.c
@@ -1032,6 +1032,7 @@  static int process_diff_filepair(struct rev_info *rev,
 	struct range_set tmp;
 	struct diff_ranges diff;
 	mmfile_t file_parent, file_target;
+	char empty_str[] = "";
 
 	assert(pair->two->path);
 	while (rg) {
@@ -1056,7 +1057,7 @@  static int process_diff_filepair(struct rev_info *rev,
 		file_parent.ptr = pair->one->data;
 		file_parent.size = pair->one->size;
 	} else {
-		file_parent.ptr = "";
+		file_parent.ptr = empty_str;
 		file_parent.size = 0;
 	}
 
diff --git a/object-file.c b/object-file.c
index 610b1f465c..c9e374e57e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -282,12 +282,13 @@  static struct cached_object {
 } *cached_objects;
 static int cached_object_nr, cached_object_alloc;
 
+static char empty_tree_buf[] = "";
 static struct cached_object empty_tree = {
 	.oid = {
 		.hash = EMPTY_TREE_SHA1_BIN_LITERAL,
 	},
 	.type = OBJ_TREE,
-	.buf = "",
+	.buf = empty_tree_buf,
 };
 
 static struct cached_object *find_cached_object(const struct object_id *oid)
diff --git a/pretty.c b/pretty.c
index ec05db5655..1a0030b32a 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1583,9 +1583,10 @@  static size_t format_commit_one(struct strbuf *sb, /* in UTF-8 */
 		return 1;
 	case 'D':
 		{
+			char empty_str[] = "";
 			const struct decoration_options opts = {
-				.prefix = "",
-				.suffix = ""
+				.prefix = empty_str,
+				.suffix = empty_str,
 			};
 
 			format_decorations(sb, commit, c->auto_color, &opts);
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 1af86bbdec..1908e74dea 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1285,6 +1285,7 @@  static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 	struct strbuf errbuf = STRBUF_INIT;
 	size_t logs_nr = 0, logs_alloc = 0, i;
 	const char *committer_info;
+	char head[] = "HEAD";
 	int ret;
 
 	committer_info = git_committer_info(0);
@@ -1387,7 +1388,7 @@  static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 		if (append_head_reflog) {
 			ALLOC_GROW(logs, logs_nr + 1, logs_alloc);
 			logs[logs_nr] = logs[logs_nr - 1];
-			logs[logs_nr].refname = "HEAD";
+			logs[logs_nr].refname = head;
 			logs_nr++;
 		}
 	}
@@ -1463,7 +1464,7 @@  static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 	string_list_clear(&skip, 0);
 	strbuf_release(&errbuf);
 	for (i = 0; i < logs_nr; i++) {
-		if (!strcmp(logs[i].refname, "HEAD"))
+		if (logs[i].refname == head)
 			continue;
 		logs[i].refname = NULL;
 		reftable_log_record_release(&logs[i]);