diff mbox series

[v2,08/22] config: fix leaking comment character config

Message ID 8fbd72a1002d1a285847c62b5524041927a7b4d4.1723121979.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Memory leak fixes (pt.4) | expand

Commit Message

Patrick Steinhardt Aug. 8, 2024, 1:05 p.m. UTC
When the comment line character has been specified multiple times in the
configuration, then `git_default_core_config()` will cause a memory leak
because it unconditionally copies the string into `comment_line_str`
without free'ing the previous value. In fact, it can't easily free the
value in the first place because it may contain a string constant.

Refactor the code such that we track allocated comment character strings
via a separate non-constant variable `comment_line_str_allocated`. Adapt
sites that set `comment_line_str` to set both and free the old value
that was stored in `comment_line_str_allocated`.

This memory leak is being hit in t3404. As there are still other memory
leaks in that file we cannot yet mark it as passing with leak checking
enabled.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/commit.c | 7 +++++--
 config.c         | 4 +++-
 environment.c    | 1 +
 environment.h    | 1 +
 4 files changed, 10 insertions(+), 3 deletions(-)

Comments

Junio C Hamano Aug. 8, 2024, 5:12 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> diff --git a/config.c b/config.c
> index 6421894614..cb78b652ee 100644
> --- a/config.c
> +++ b/config.c
> @@ -1596,7 +1596,9 @@ static int git_default_core_config(const char *var, const char *value,
>  		else if (value[0]) {
>  			if (strchr(value, '\n'))
>  				return error(_("%s cannot contain newline"), var);
> -			comment_line_str = xstrdup(value);
> +			free(comment_line_str_allocated);
> +			comment_line_str = comment_line_str_allocated =
> +				xstrdup(value);

If you are to follow the _to_free pattern, you do not have to
allocate here, no?  We borrow the value in the configset and point
at it via comment_line_str, and clear comment_line_str_to_free
because there is nothing to free now.  I.e.

			comment_line_str = value;
			FREE_AND_NULL(comment_line_str_allocated);

I still think the approach taken by the previous iteration was
simpler and much less error prone, though.
Patrick Steinhardt Aug. 12, 2024, 7:45 a.m. UTC | #2
On Thu, Aug 08, 2024 at 10:12:26AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > diff --git a/config.c b/config.c
> > index 6421894614..cb78b652ee 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -1596,7 +1596,9 @@ static int git_default_core_config(const char *var, const char *value,
> >  		else if (value[0]) {
> >  			if (strchr(value, '\n'))
> >  				return error(_("%s cannot contain newline"), var);
> > -			comment_line_str = xstrdup(value);
> > +			free(comment_line_str_allocated);
> > +			comment_line_str = comment_line_str_allocated =
> > +				xstrdup(value);
> 
> If you are to follow the _to_free pattern, you do not have to
> allocate here, no?  We borrow the value in the configset and point
> at it via comment_line_str, and clear comment_line_str_to_free
> because there is nothing to free now.  I.e.
> 
> 			comment_line_str = value;
> 			FREE_AND_NULL(comment_line_str_allocated);

Only if it is guaranteed that the configuration will never be re-read,
which would end up discarding memory owned by the old string. Which
should be the case already, but to the best of my knowledge we do not
document the expected lifetime of config strings anywhere.

> I still think the approach taken by the previous iteration was
> simpler and much less error prone, though.

I personally prefer this iteration. I feel that it is way more
discoverable to have an explicit indicator that something needs to be
freed, which the `_allocated` suffix brings us. With the old version,
the caller needs to become aware that the constant string may sometimes
need to be freed, and that sometimes is figured out by comparing to a
magic variable, which feels worse to me.

Ultimately, both solutions are okay-ish, but I don't consider either of
them to be great. As mentioned elsewhere, I think the best solution
would be to adapt the `struct strbuf` interface to have an initializer
like `STRBUF_INIT_CONST("foobar")` that allows us to initialize it with
a string constant. There wouldn't be any need to have two variables
anymore, and the `strbuf` API would handle the lifecycle of its contents
for us. In any case, I'd say this is a #leftoverbit and is better done
in a subsequent patch series.

I don't really think it makes sense to reroll this version to swap out
the patch for the first version again, but am happy to adapt if you
prefer that.

Thanks!

Patrick
Junio C Hamano Aug. 12, 2024, 8:32 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> On Thu, Aug 08, 2024 at 10:12:26AM -0700, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> 
>> > diff --git a/config.c b/config.c
>> > index 6421894614..cb78b652ee 100644
>> > --- a/config.c
>> > +++ b/config.c
>> > @@ -1596,7 +1596,9 @@ static int git_default_core_config(const char *var, const char *value,
>> >  		else if (value[0]) {
>> >  			if (strchr(value, '\n'))
>> >  				return error(_("%s cannot contain newline"), var);
>> > -			comment_line_str = xstrdup(value);
>> > +			free(comment_line_str_allocated);
>> > +			comment_line_str = comment_line_str_allocated =
>> > +				xstrdup(value);
>> 
>> If you are to follow the _to_free pattern, you do not have to
>> allocate here, no?  We borrow the value in the configset and point
>> at it via comment_line_str, and clear comment_line_str_to_free
>> because there is nothing to free now.  I.e.
>> 
>> 			comment_line_str = value;
>> 			FREE_AND_NULL(comment_line_str_allocated);
>
> Only if it is guaranteed that the configuration will never be re-read,
> which would end up discarding memory owned by the old string. Which
> should be the case already, but to the best of my knowledge we do not
> document the expected lifetime of config strings anywhere.

Then let's mark it as #leftoverbits to document it.  Many other code
paths depend on it.

>> I still think the approach taken by the previous iteration was
>> simpler and much less error prone, though.
>
> I personally prefer this iteration.

If so, then let's fully take advantage of the fact that you have a
to-free variable dedicated for the comment_line_str variable.

I still think it is a maintenance burden to keep them always in sync
(which is another thing the developers have to remember---when they
are updating _this_ particular variable, an extra rule applies and
they need to take care of this _allocated thing associated with it),
and the first approach, by not forcing all the other assignment code
paths to worry about it, simplifies the mental model for developers
greatly (i.e. we know we do not own the initial value, but
everything else we allocate thus we free everything but the initial
value), in exchange for a slightly wasteful allocation.

The approach in the second patch is worse in two counds compared to
the original.  It does wasteful allocation (which we do not have
to---the fix was shown above).  It also burdens the developers to
know that they have to manually manage the _allocated half of the
two-variable pair.
Patrick Steinhardt Aug. 13, 2024, 6:54 a.m. UTC | #4
On Mon, Aug 12, 2024 at 01:32:53PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > On Thu, Aug 08, 2024 at 10:12:26AM -0700, Junio C Hamano wrote:
> >> Patrick Steinhardt <ps@pks.im> writes:
> >> 
> >> > diff --git a/config.c b/config.c
> >> > index 6421894614..cb78b652ee 100644
> >> > --- a/config.c
> >> > +++ b/config.c
> >> > @@ -1596,7 +1596,9 @@ static int git_default_core_config(const char *var, const char *value,
> >> >  		else if (value[0]) {
> >> >  			if (strchr(value, '\n'))
> >> >  				return error(_("%s cannot contain newline"), var);
> >> > -			comment_line_str = xstrdup(value);
> >> > +			free(comment_line_str_allocated);
> >> > +			comment_line_str = comment_line_str_allocated =
> >> > +				xstrdup(value);
> >> 
> >> If you are to follow the _to_free pattern, you do not have to
> >> allocate here, no?  We borrow the value in the configset and point
> >> at it via comment_line_str, and clear comment_line_str_to_free
> >> because there is nothing to free now.  I.e.
> >> 
> >> 			comment_line_str = value;
> >> 			FREE_AND_NULL(comment_line_str_allocated);
> >
> > Only if it is guaranteed that the configuration will never be re-read,
> > which would end up discarding memory owned by the old string. Which
> > should be the case already, but to the best of my knowledge we do not
> > document the expected lifetime of config strings anywhere.
> 
> Then let's mark it as #leftoverbits to document it.  Many other code
> paths depend on it.

Okay.

> >> I still think the approach taken by the previous iteration was
> >> simpler and much less error prone, though.
> >
> > I personally prefer this iteration.
> 
> If so, then let's fully take advantage of the fact that you have a
> to-free variable dedicated for the comment_line_str variable.

Can do.

> I still think it is a maintenance burden to keep them always in sync
> (which is another thing the developers have to remember---when they
> are updating _this_ particular variable, an extra rule applies and
> they need to take care of this _allocated thing associated with it),
> and the first approach, by not forcing all the other assignment code
> paths to worry about it, simplifies the mental model for developers
> greatly (i.e. we know we do not own the initial value, but
> everything else we allocate thus we free everything but the initial
> value), in exchange for a slightly wasteful allocation.
> 
> The approach in the second patch is worse in two counds compared to
> the original.  It does wasteful allocation (which we do not have
> to---the fix was shown above).  It also burdens the developers to
> know that they have to manually manage the _allocated half of the
> two-variable pair.

Well, the developer has to manage the allocation in both versions of
this patch series. In the first iteration it just wasn't as bad because
I didn't bother to adjust all sites where we set up the comment string.
So I was just about to convert it back to the first iteration, but then
again saw that we now have to carry this ugly construct everywhere:

    if (comment_line_str != comment_line_str_default)
        free((char *) comment_line_str);
    comme_line_str = xstrdup(value);

vs

    free(comment_line_str_to_free);
    comment_line_str = comment_line_str_to_free = xstrdup(value);

I certainly think that the maintenance headache of the first version is
higher than having to maintain both variables. The worst that can happen
for the second version is that we leak memory because we don't update
the `_to_free` string. The worst that can happen in the above code
snippet is that one isn't aware of the condition of when the string
needs to be freed and then unconditionally frees it, leading to a
segfault.

But as said, ultimately I think neither of these versions is great.

Patrick
diff mbox series

Patch

diff --git a/builtin/commit.c b/builtin/commit.c
index 66427ba82d..025b1c4686 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -684,7 +684,9 @@  static void adjust_comment_line_char(const struct strbuf *sb)
 	const char *p;
 
 	if (!memchr(sb->buf, candidates[0], sb->len)) {
-		comment_line_str = xstrfmt("%c", candidates[0]);
+		free(comment_line_str_allocated);
+		comment_line_str = comment_line_str_allocated =
+			xstrfmt("%c", candidates[0]);
 		return;
 	}
 
@@ -705,7 +707,8 @@  static void adjust_comment_line_char(const struct strbuf *sb)
 	if (!*p)
 		die(_("unable to select a comment character that is not used\n"
 		      "in the current commit message"));
-	comment_line_str = xstrfmt("%c", *p);
+	free(comment_line_str_allocated);
+	comment_line_str = comment_line_str_allocated = xstrfmt("%c", *p);
 }
 
 static void prepare_amend_commit(struct commit *commit, struct strbuf *sb,
diff --git a/config.c b/config.c
index 6421894614..cb78b652ee 100644
--- a/config.c
+++ b/config.c
@@ -1596,7 +1596,9 @@  static int git_default_core_config(const char *var, const char *value,
 		else if (value[0]) {
 			if (strchr(value, '\n'))
 				return error(_("%s cannot contain newline"), var);
-			comment_line_str = xstrdup(value);
+			free(comment_line_str_allocated);
+			comment_line_str = comment_line_str_allocated =
+				xstrdup(value);
 			auto_comment_line_char = 0;
 		} else
 			return error(_("%s must have at least one character"), var);
diff --git a/environment.c b/environment.c
index 5cea2c9f54..1a95798d5f 100644
--- a/environment.c
+++ b/environment.c
@@ -114,6 +114,7 @@  int protect_ntfs = PROTECT_NTFS_DEFAULT;
  * that is subject to stripspace.
  */
 const char *comment_line_str = "#";
+char *comment_line_str_allocated;
 int auto_comment_line_char;
 
 /* Parallel index stat data preload? */
diff --git a/environment.h b/environment.h
index e9f01d4d11..0e0906f125 100644
--- a/environment.h
+++ b/environment.h
@@ -9,6 +9,7 @@  struct strvec;
  * that is subject to stripspace.
  */
 extern const char *comment_line_str;
+extern char *comment_line_str_allocated;
 extern int auto_comment_line_char;
 
 /*