Message ID | 20240909231828.GF921834@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | f6ba781903f778b82e0b2fa11b61fb0403e1bfa5 |
Headers | show |
Series | ref-filter %(trailer) fixes | expand |
On Mon, Sep 09, 2024 at 07:18:28PM -0400, Jeff King wrote: > When we parse a placeholder like "%(trailers:key=foo)", our atom parsing > function is passed just the argument string "key=foo". We duplicate this > into its own string, but never free it, causing a leak. > > We do the duplication for two reasons: > > 1. There's a mismatch with the pretty.c trailer-formatting code that > we rely on. It expects to see a closing paren, like "key=foo)". So > we duplicate the argument string with that extra character to pass > along. > > This is probably something we could fix in the long run, but it's > somewhat non-trivial if we want to avoid regressing error cases for > things like "git log --format='%(trailer:oops'". So let's accept > it as a necessity for now. > > 2. The argument parser expects to store the list of "key" entries > ("foo" in this case) in a string-list. It also stores the length of > the string in the string-list "util" field. The original caller in > pretty.c uses this with a "nodup" string list to avoid making extra > copies, which creates a subtle dependency on the lifetime of the > original format string. > > We do the same here, which creates that same dependency. So we > can't simply free it as soon as the parsing is done. > > There are two possible solutions here. The first is to hold on to the > duplicated "argbuf" string in the used_atom struct, so that it lives as > long as the string_list which references it. > > But I think a less-subtle solution, and what this patch does, is to > switch to a duplicating string_list. That makes it self-contained, and > lets us free argbuf immediately. It may involve a few extra allocations, > but this parsing is something that happens once per program, not once > per output ref. Sensible. I found that in many cases, the `nodup` variants of string lists bring more pain than real benefit. > This clears up one case that LSan finds in t6300, but there are more. Yeah, there are a bunch of memory leaks around atom parsing in general exposed by t6300. Thanks for plugging some of them! Patrick
On Tue, Sep 10, 2024 at 08:09:03AM +0200, Patrick Steinhardt wrote: > > But I think a less-subtle solution, and what this patch does, is to > > switch to a duplicating string_list. That makes it self-contained, and > > lets us free argbuf immediately. It may involve a few extra allocations, > > but this parsing is something that happens once per program, not once > > per output ref. > > Sensible. I found that in many cases, the `nodup` variants of string > lists bring more pain than real benefit. I have found that, too. :) I've argued in the past for getting rid of it (and especially not propagating its world-view to other data structures like strmap, and so on). But Elijah presented some pretty compelling numbers that it does help for some of the large merge-ort strmaps. So I've softened my argument to asking people to use it judiciously. ;) > > This clears up one case that LSan finds in t6300, but there are more. > > Yeah, there are a bunch of memory leaks around atom parsing in general > exposed by t6300. Thanks for plugging some of them! Well, you're really going to like the rest of the series, then! -Peff
diff --git a/ref-filter.c b/ref-filter.c index ebddc041c7..54c5079dde 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -567,7 +567,8 @@ static int trailers_atom_parser(struct ref_format *format UNUSED, atom->u.contents.trailer_opts.no_divider = 1; if (arg) { - const char *argbuf = xstrfmt("%s)", arg); + char *argbuf = xstrfmt("%s)", arg); + const char *arg = argbuf; char *invalid_arg = NULL; struct ref_trailer_buf *tb; @@ -579,21 +580,23 @@ static int trailers_atom_parser(struct ref_format *format UNUSED, * They must be allocated in a separate, stable struct. */ atom->u.contents.trailer_buf = tb = xmalloc(sizeof(*tb)); - string_list_init_nodup(&tb->filter_list); + string_list_init_dup(&tb->filter_list); strbuf_init(&tb->sepbuf, 0); strbuf_init(&tb->kvsepbuf, 0); if (format_set_trailers_options(&atom->u.contents.trailer_opts, &tb->filter_list, &tb->sepbuf, &tb->kvsepbuf, - &argbuf, &invalid_arg)) { + &arg, &invalid_arg)) { if (!invalid_arg) strbuf_addf(err, _("expected %%(trailers:key=<value>)")); else strbuf_addf(err, _("unknown %%(trailers) argument: %s"), invalid_arg); free(invalid_arg); + free(argbuf); return -1; } + free(argbuf); } atom->u.contents.option = C_TRAILERS; return 0;
When we parse a placeholder like "%(trailers:key=foo)", our atom parsing function is passed just the argument string "key=foo". We duplicate this into its own string, but never free it, causing a leak. We do the duplication for two reasons: 1. There's a mismatch with the pretty.c trailer-formatting code that we rely on. It expects to see a closing paren, like "key=foo)". So we duplicate the argument string with that extra character to pass along. This is probably something we could fix in the long run, but it's somewhat non-trivial if we want to avoid regressing error cases for things like "git log --format='%(trailer:oops'". So let's accept it as a necessity for now. 2. The argument parser expects to store the list of "key" entries ("foo" in this case) in a string-list. It also stores the length of the string in the string-list "util" field. The original caller in pretty.c uses this with a "nodup" string list to avoid making extra copies, which creates a subtle dependency on the lifetime of the original format string. We do the same here, which creates that same dependency. So we can't simply free it as soon as the parsing is done. There are two possible solutions here. The first is to hold on to the duplicated "argbuf" string in the used_atom struct, so that it lives as long as the string_list which references it. But I think a less-subtle solution, and what this patch does, is to switch to a duplicating string_list. That makes it self-contained, and lets us free argbuf immediately. It may involve a few extra allocations, but this parsing is something that happens once per program, not once per output ref. This clears up one case that LSan finds in t6300, but there are more. Signed-off-by: Jeff King <peff@peff.net> --- ref-filter.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)