[6/9] ref-filter: fix leak of %(trailers) "argbuf"

Message ID	20240909231828.GF921834@coredump.intra.peff.net (mailing list archive)
State	Accepted
Commit	f6ba781903f778b82e0b2fa11b61fb0403e1bfa5
Headers	show Received: from cloud.peff.net (cloud.peff.net [104.130.231.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 14D03210EC for <git@vger.kernel.org>; Mon, 9 Sep 2024 23:18:29 +0000 (UTC) Date: Mon, 9 Sep 2024 19:18:28 -0400 From: Jeff King <peff@peff.net> To: git@vger.kernel.org Cc: Brooke Kuhlmann <brooke@alchemists.io> Subject: [PATCH 6/9] ref-filter: fix leak of %(trailers) "argbuf" Message-ID: <20240909231828.GF921834@coredump.intra.peff.net> References: <20240909230758.GA921697@coredump.intra.peff.net> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20240909230758.GA921697@coredump.intra.peff.net>
Series	ref-filter %(trailer) fixes \| expand [0/9] ref-filter %(trailer) fixes [1/9] t6300: drop newline from wrapped test title [2/9] ref-filter: avoid extra copies of payload/signature [3/9] ref-filter: strip signature when parsing tag trailers [4/9] ref-filter: drop useless cast in trailers_atom_parser() [5/9] ref-filter: store ref_trailer_buf data per-atom [6/9] ref-filter: fix leak of %(trailers) "argbuf" [7/9] ref-filter: fix leak with %(describe) arguments [8/9] ref-filter: fix leak when formatting %(push:remoteref) [9/9] ref-filter: add ref_format_clear() function [10/9] ref-filter: fix leak with unterminated %(if) atoms

Message ID

20240909231828.GF921834@coredump.intra.peff.net (mailing list archive)

State

Accepted

Commit

f6ba781903f778b82e0b2fa11b61fb0403e1bfa5

Headers

Date: Mon, 9 Sep 2024 19:18:28 -0400
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Brooke Kuhlmann <brooke@alchemists.io>
Subject: [PATCH 6/9] ref-filter: fix leak of %(trailers) "argbuf"
Message-ID: <20240909231828.GF921834@coredump.intra.peff.net>
References: <20240909230758.GA921697@coredump.intra.peff.net>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20240909230758.GA921697@coredump.intra.peff.net>

Series

ref-filter %(trailer) fixes | expand

Commit Message

Jeff King Sept. 9, 2024, 11:18 p.m. UTC

When we parse a placeholder like "%(trailers:key=foo)", our atom parsing
function is passed just the argument string "key=foo". We duplicate this
into its own string, but never free it, causing a leak.

We do the duplication for two reasons:

  1. There's a mismatch with the pretty.c trailer-formatting code that
     we rely on. It expects to see a closing paren, like "key=foo)". So
     we duplicate the argument string with that extra character to pass
     along.

     This is probably something we could fix in the long run, but it's
     somewhat non-trivial if we want to avoid regressing error cases for
     things like "git log --format='%(trailer:oops'". So let's accept
     it as a necessity for now.

  2. The argument parser expects to store the list of "key" entries
     ("foo" in this case) in a string-list. It also stores the length of
     the string in the string-list "util" field. The original caller in
     pretty.c uses this with a "nodup" string list to avoid making extra
     copies, which creates a subtle dependency on the lifetime of the
     original format string.

     We do the same here, which creates that same dependency. So we
     can't simply free it as soon as the parsing is done.

There are two possible solutions here. The first is to hold on to the
duplicated "argbuf" string in the used_atom struct, so that it lives as
long as the string_list which references it.

But I think a less-subtle solution, and what this patch does, is to
switch to a duplicating string_list. That makes it self-contained, and
lets us free argbuf immediately. It may involve a few extra allocations,
but this parsing is something that happens once per program, not once
per output ref.

This clears up one case that LSan finds in t6300, but there are more.

Signed-off-by: Jeff King <peff@peff.net>
---
 ref-filter.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Patrick Steinhardt Sept. 10, 2024, 6:09 a.m. UTC | #1

On Mon, Sep 09, 2024 at 07:18:28PM -0400, Jeff King wrote:
> When we parse a placeholder like "%(trailers:key=foo)", our atom parsing
> function is passed just the argument string "key=foo". We duplicate this
> into its own string, but never free it, causing a leak.
> 
> We do the duplication for two reasons:
> 
>   1. There's a mismatch with the pretty.c trailer-formatting code that
>      we rely on. It expects to see a closing paren, like "key=foo)". So
>      we duplicate the argument string with that extra character to pass
>      along.
> 
>      This is probably something we could fix in the long run, but it's
>      somewhat non-trivial if we want to avoid regressing error cases for
>      things like "git log --format='%(trailer:oops'". So let's accept
>      it as a necessity for now.
> 
>   2. The argument parser expects to store the list of "key" entries
>      ("foo" in this case) in a string-list. It also stores the length of
>      the string in the string-list "util" field. The original caller in
>      pretty.c uses this with a "nodup" string list to avoid making extra
>      copies, which creates a subtle dependency on the lifetime of the
>      original format string.
> 
>      We do the same here, which creates that same dependency. So we
>      can't simply free it as soon as the parsing is done.
> 
> There are two possible solutions here. The first is to hold on to the
> duplicated "argbuf" string in the used_atom struct, so that it lives as
> long as the string_list which references it.
> 
> But I think a less-subtle solution, and what this patch does, is to
> switch to a duplicating string_list. That makes it self-contained, and
> lets us free argbuf immediately. It may involve a few extra allocations,
> but this parsing is something that happens once per program, not once
> per output ref.

Sensible. I found that in many cases, the `nodup` variants of string
lists bring more pain than real benefit.

> This clears up one case that LSan finds in t6300, but there are more.

Yeah, there are a bunch of memory leaks around atom parsing in general
exposed by t6300. Thanks for plugging some of them!

Patrick

Jeff King Sept. 10, 2024, 6:33 a.m. UTC | #2

On Tue, Sep 10, 2024 at 08:09:03AM +0200, Patrick Steinhardt wrote:

> > But I think a less-subtle solution, and what this patch does, is to
> > switch to a duplicating string_list. That makes it self-contained, and
> > lets us free argbuf immediately. It may involve a few extra allocations,
> > but this parsing is something that happens once per program, not once
> > per output ref.
> 
> Sensible. I found that in many cases, the `nodup` variants of string
> lists bring more pain than real benefit.

I have found that, too. :) I've argued in the past for getting rid of it
(and especially not propagating its world-view to other data structures
like strmap, and so on). But Elijah presented some pretty compelling
numbers that it does help for some of the large merge-ort strmaps.

So I've softened my argument to asking people to use it judiciously. ;)

> > This clears up one case that LSan finds in t6300, but there are more.
> 
> Yeah, there are a bunch of memory leaks around atom parsing in general
> exposed by t6300. Thanks for plugging some of them!

Well, you're really going to like the rest of the series, then!

-Peff

diff --git a/ref-filter.c b/ref-filter.c
index ebddc041c7..54c5079dde 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -567,7 +567,8 @@  static int trailers_atom_parser(struct ref_format *format UNUSED,
 	atom->u.contents.trailer_opts.no_divider = 1;
 
 	if (arg) {
-		const char *argbuf = xstrfmt("%s)", arg);
+		char *argbuf = xstrfmt("%s)", arg);
+		const char *arg = argbuf;
 		char *invalid_arg = NULL;
 		struct ref_trailer_buf *tb;
 
@@ -579,21 +580,23 @@  static int trailers_atom_parser(struct ref_format *format UNUSED,
 		 * They must be allocated in a separate, stable struct.
 		 */
 		atom->u.contents.trailer_buf = tb = xmalloc(sizeof(*tb));
-		string_list_init_nodup(&tb->filter_list);
+		string_list_init_dup(&tb->filter_list);
 		strbuf_init(&tb->sepbuf, 0);
 		strbuf_init(&tb->kvsepbuf, 0);
 
 		if (format_set_trailers_options(&atom->u.contents.trailer_opts,
 						&tb->filter_list,
 						&tb->sepbuf, &tb->kvsepbuf,
-						&argbuf, &invalid_arg)) {
+						&arg, &invalid_arg)) {
 			if (!invalid_arg)
 				strbuf_addf(err, _("expected %%(trailers:key=<value>)"));
 			else
 				strbuf_addf(err, _("unknown %%(trailers) argument: %s"), invalid_arg);
 			free(invalid_arg);
+			free(argbuf);
 			return -1;
 		}
+		free(argbuf);
 	}
 	atom->u.contents.option = C_TRAILERS;
 	return 0;

[6/9] ref-filter: fix leak of %(trailers) "argbuf"

Commit Message

Comments

Patch