Message ID | 8fbd72a1002d1a285847c62b5524041927a7b4d4.1723121979.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Memory leak fixes (pt.4) | expand |
Patrick Steinhardt <ps@pks.im> writes: > diff --git a/config.c b/config.c > index 6421894614..cb78b652ee 100644 > --- a/config.c > +++ b/config.c > @@ -1596,7 +1596,9 @@ static int git_default_core_config(const char *var, const char *value, > else if (value[0]) { > if (strchr(value, '\n')) > return error(_("%s cannot contain newline"), var); > - comment_line_str = xstrdup(value); > + free(comment_line_str_allocated); > + comment_line_str = comment_line_str_allocated = > + xstrdup(value); If you are to follow the _to_free pattern, you do not have to allocate here, no? We borrow the value in the configset and point at it via comment_line_str, and clear comment_line_str_to_free because there is nothing to free now. I.e. comment_line_str = value; FREE_AND_NULL(comment_line_str_allocated); I still think the approach taken by the previous iteration was simpler and much less error prone, though.
On Thu, Aug 08, 2024 at 10:12:26AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > diff --git a/config.c b/config.c > > index 6421894614..cb78b652ee 100644 > > --- a/config.c > > +++ b/config.c > > @@ -1596,7 +1596,9 @@ static int git_default_core_config(const char *var, const char *value, > > else if (value[0]) { > > if (strchr(value, '\n')) > > return error(_("%s cannot contain newline"), var); > > - comment_line_str = xstrdup(value); > > + free(comment_line_str_allocated); > > + comment_line_str = comment_line_str_allocated = > > + xstrdup(value); > > If you are to follow the _to_free pattern, you do not have to > allocate here, no? We borrow the value in the configset and point > at it via comment_line_str, and clear comment_line_str_to_free > because there is nothing to free now. I.e. > > comment_line_str = value; > FREE_AND_NULL(comment_line_str_allocated); Only if it is guaranteed that the configuration will never be re-read, which would end up discarding memory owned by the old string. Which should be the case already, but to the best of my knowledge we do not document the expected lifetime of config strings anywhere. > I still think the approach taken by the previous iteration was > simpler and much less error prone, though. I personally prefer this iteration. I feel that it is way more discoverable to have an explicit indicator that something needs to be freed, which the `_allocated` suffix brings us. With the old version, the caller needs to become aware that the constant string may sometimes need to be freed, and that sometimes is figured out by comparing to a magic variable, which feels worse to me. Ultimately, both solutions are okay-ish, but I don't consider either of them to be great. As mentioned elsewhere, I think the best solution would be to adapt the `struct strbuf` interface to have an initializer like `STRBUF_INIT_CONST("foobar")` that allows us to initialize it with a string constant. There wouldn't be any need to have two variables anymore, and the `strbuf` API would handle the lifecycle of its contents for us. In any case, I'd say this is a #leftoverbit and is better done in a subsequent patch series. I don't really think it makes sense to reroll this version to swap out the patch for the first version again, but am happy to adapt if you prefer that. Thanks! Patrick
Patrick Steinhardt <ps@pks.im> writes: > On Thu, Aug 08, 2024 at 10:12:26AM -0700, Junio C Hamano wrote: >> Patrick Steinhardt <ps@pks.im> writes: >> >> > diff --git a/config.c b/config.c >> > index 6421894614..cb78b652ee 100644 >> > --- a/config.c >> > +++ b/config.c >> > @@ -1596,7 +1596,9 @@ static int git_default_core_config(const char *var, const char *value, >> > else if (value[0]) { >> > if (strchr(value, '\n')) >> > return error(_("%s cannot contain newline"), var); >> > - comment_line_str = xstrdup(value); >> > + free(comment_line_str_allocated); >> > + comment_line_str = comment_line_str_allocated = >> > + xstrdup(value); >> >> If you are to follow the _to_free pattern, you do not have to >> allocate here, no? We borrow the value in the configset and point >> at it via comment_line_str, and clear comment_line_str_to_free >> because there is nothing to free now. I.e. >> >> comment_line_str = value; >> FREE_AND_NULL(comment_line_str_allocated); > > Only if it is guaranteed that the configuration will never be re-read, > which would end up discarding memory owned by the old string. Which > should be the case already, but to the best of my knowledge we do not > document the expected lifetime of config strings anywhere. Then let's mark it as #leftoverbits to document it. Many other code paths depend on it. >> I still think the approach taken by the previous iteration was >> simpler and much less error prone, though. > > I personally prefer this iteration. If so, then let's fully take advantage of the fact that you have a to-free variable dedicated for the comment_line_str variable. I still think it is a maintenance burden to keep them always in sync (which is another thing the developers have to remember---when they are updating _this_ particular variable, an extra rule applies and they need to take care of this _allocated thing associated with it), and the first approach, by not forcing all the other assignment code paths to worry about it, simplifies the mental model for developers greatly (i.e. we know we do not own the initial value, but everything else we allocate thus we free everything but the initial value), in exchange for a slightly wasteful allocation. The approach in the second patch is worse in two counds compared to the original. It does wasteful allocation (which we do not have to---the fix was shown above). It also burdens the developers to know that they have to manually manage the _allocated half of the two-variable pair.
On Mon, Aug 12, 2024 at 01:32:53PM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > On Thu, Aug 08, 2024 at 10:12:26AM -0700, Junio C Hamano wrote: > >> Patrick Steinhardt <ps@pks.im> writes: > >> > >> > diff --git a/config.c b/config.c > >> > index 6421894614..cb78b652ee 100644 > >> > --- a/config.c > >> > +++ b/config.c > >> > @@ -1596,7 +1596,9 @@ static int git_default_core_config(const char *var, const char *value, > >> > else if (value[0]) { > >> > if (strchr(value, '\n')) > >> > return error(_("%s cannot contain newline"), var); > >> > - comment_line_str = xstrdup(value); > >> > + free(comment_line_str_allocated); > >> > + comment_line_str = comment_line_str_allocated = > >> > + xstrdup(value); > >> > >> If you are to follow the _to_free pattern, you do not have to > >> allocate here, no? We borrow the value in the configset and point > >> at it via comment_line_str, and clear comment_line_str_to_free > >> because there is nothing to free now. I.e. > >> > >> comment_line_str = value; > >> FREE_AND_NULL(comment_line_str_allocated); > > > > Only if it is guaranteed that the configuration will never be re-read, > > which would end up discarding memory owned by the old string. Which > > should be the case already, but to the best of my knowledge we do not > > document the expected lifetime of config strings anywhere. > > Then let's mark it as #leftoverbits to document it. Many other code > paths depend on it. Okay. > >> I still think the approach taken by the previous iteration was > >> simpler and much less error prone, though. > > > > I personally prefer this iteration. > > If so, then let's fully take advantage of the fact that you have a > to-free variable dedicated for the comment_line_str variable. Can do. > I still think it is a maintenance burden to keep them always in sync > (which is another thing the developers have to remember---when they > are updating _this_ particular variable, an extra rule applies and > they need to take care of this _allocated thing associated with it), > and the first approach, by not forcing all the other assignment code > paths to worry about it, simplifies the mental model for developers > greatly (i.e. we know we do not own the initial value, but > everything else we allocate thus we free everything but the initial > value), in exchange for a slightly wasteful allocation. > > The approach in the second patch is worse in two counds compared to > the original. It does wasteful allocation (which we do not have > to---the fix was shown above). It also burdens the developers to > know that they have to manually manage the _allocated half of the > two-variable pair. Well, the developer has to manage the allocation in both versions of this patch series. In the first iteration it just wasn't as bad because I didn't bother to adjust all sites where we set up the comment string. So I was just about to convert it back to the first iteration, but then again saw that we now have to carry this ugly construct everywhere: if (comment_line_str != comment_line_str_default) free((char *) comment_line_str); comme_line_str = xstrdup(value); vs free(comment_line_str_to_free); comment_line_str = comment_line_str_to_free = xstrdup(value); I certainly think that the maintenance headache of the first version is higher than having to maintain both variables. The worst that can happen for the second version is that we leak memory because we don't update the `_to_free` string. The worst that can happen in the above code snippet is that one isn't aware of the condition of when the string needs to be freed and then unconditionally frees it, leading to a segfault. But as said, ultimately I think neither of these versions is great. Patrick
diff --git a/builtin/commit.c b/builtin/commit.c index 66427ba82d..025b1c4686 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -684,7 +684,9 @@ static void adjust_comment_line_char(const struct strbuf *sb) const char *p; if (!memchr(sb->buf, candidates[0], sb->len)) { - comment_line_str = xstrfmt("%c", candidates[0]); + free(comment_line_str_allocated); + comment_line_str = comment_line_str_allocated = + xstrfmt("%c", candidates[0]); return; } @@ -705,7 +707,8 @@ static void adjust_comment_line_char(const struct strbuf *sb) if (!*p) die(_("unable to select a comment character that is not used\n" "in the current commit message")); - comment_line_str = xstrfmt("%c", *p); + free(comment_line_str_allocated); + comment_line_str = comment_line_str_allocated = xstrfmt("%c", *p); } static void prepare_amend_commit(struct commit *commit, struct strbuf *sb, diff --git a/config.c b/config.c index 6421894614..cb78b652ee 100644 --- a/config.c +++ b/config.c @@ -1596,7 +1596,9 @@ static int git_default_core_config(const char *var, const char *value, else if (value[0]) { if (strchr(value, '\n')) return error(_("%s cannot contain newline"), var); - comment_line_str = xstrdup(value); + free(comment_line_str_allocated); + comment_line_str = comment_line_str_allocated = + xstrdup(value); auto_comment_line_char = 0; } else return error(_("%s must have at least one character"), var); diff --git a/environment.c b/environment.c index 5cea2c9f54..1a95798d5f 100644 --- a/environment.c +++ b/environment.c @@ -114,6 +114,7 @@ int protect_ntfs = PROTECT_NTFS_DEFAULT; * that is subject to stripspace. */ const char *comment_line_str = "#"; +char *comment_line_str_allocated; int auto_comment_line_char; /* Parallel index stat data preload? */ diff --git a/environment.h b/environment.h index e9f01d4d11..0e0906f125 100644 --- a/environment.h +++ b/environment.h @@ -9,6 +9,7 @@ struct strvec; * that is subject to stripspace. */ extern const char *comment_line_str; +extern char *comment_line_str_allocated; extern int auto_comment_line_char; /*
When the comment line character has been specified multiple times in the configuration, then `git_default_core_config()` will cause a memory leak because it unconditionally copies the string into `comment_line_str` without free'ing the previous value. In fact, it can't easily free the value in the first place because it may contain a string constant. Refactor the code such that we track allocated comment character strings via a separate non-constant variable `comment_line_str_allocated`. Adapt sites that set `comment_line_str` to set both and free the old value that was stored in `comment_line_str_allocated`. This memory leak is being hit in t3404. As there are still other memory leaks in that file we cannot yet mark it as passing with leak checking enabled. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/commit.c | 7 +++++-- config.c | 4 +++- environment.c | 1 + environment.h | 1 + 4 files changed, 10 insertions(+), 3 deletions(-)